I need help creating a Capitalization Regular Expression

Yes. The trailing period in any acronyms of my special cases list was the source of the problem. I wondered why Alexander omitted it in his original example macro and thought it was just a typo. Now I see that it was intentional so as to avoid the wildcard match.

BTW: The term "Navy" doesn't need to be added as it gets handled properly just by using the Title Case action. Just in case anyone else is wondering...

Thanx.

Indeed it was a typo, and a very unfortunate one at that, as it made me overlook this edge case, leading to unnecessary confusion here.

As pointed out, the trailing word boundary \b creates the problem. The trailing period in "U.S.A." is a non-word character, so searching for "word boundary, followed by literal U.S.A., followed by word boundary" does not match: there is no word boundary between the trailing period and the end of the string, nor would there be one between the trailing period and a space or any other non-word character. So "u.s.a." being at the end of your example was not what broke it, as demonstrated by your u.s. government example.
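For anyone who wants to see the boundary behaviour outside Keyboard Maestro, here is a minimal Python sketch. It assumes a case-insensitive search and uses `re.escape` as a stand-in for the literal quoting; the sample strings are made up for illustration.

```python
import re

# \b + literal "u.s.a." + \b, searched case-insensitively (an assumption).
pattern = re.compile(r"\b" + re.escape("u.s.a.") + r"\b", re.IGNORECASE)

# Trailing period followed by end of string: no word boundary there.
end_of_string = pattern.search("made in the u.s.a.")

# Trailing period followed by a space: still no word boundary.
before_space = pattern.search("the u.s.a. is large")

# Contrived case: trailing period followed by a letter, so \b does hold.
before_letter = pattern.search("the u.s.a.x")

print(end_of_string, before_space, before_letter)
```

The first two searches find nothing, which is the failure described above; only the contrived third string matches, because a word character follows the final period.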

However, simply removing the trailing \b does not work, because searching for "ai" would then also match the first two characters of words like "AId", "AIr" and "AIkido".
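A quick Python sketch of that over-matching, assuming a case-insensitive search and a made-up, already title-cased input string:

```python
import re

# Leading \b kept, trailing \b removed.
pattern = re.compile(r"\b" + re.escape("ai"), re.IGNORECASE)

titled = "Aid Air Aikido Ai"
result = pattern.sub("AI", titled)

print(result)  # AId AIr AIkido AI
```

Only the last word was meant to change; the other three are the collateral damage.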

My proposed solution is to replace both word boundaries \b with a leading negative lookbehind (?<![[:alpha:]]) and a trailing negative lookahead (?![[:alpha:]]), both looking for any alphabetic character [[:alpha:]] directly in front of or after the specified literal search string.

The search field in the Search and Replace action should therefore be set up with the following string:

(?<![[:alpha:]])\Q%Variable%local__specialCase[1]__>__%\E(?![[:alpha:]])
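The same idea can be sketched in Python for testing outside Keyboard Maestro. This is not the macro itself: Python's `re` module does not support the POSIX class [[:alpha:]] or \Q...\E, so the sketch assumes ASCII-only text and substitutes `[A-Za-z]` and `re.escape`. The function name `apply_special_case` and the case-insensitive flag are assumptions as well.

```python
import re

def apply_special_case(text, search, replacement):
    """Replace `search` with `replacement` unless a letter touches either side."""
    pattern = re.compile(
        r"(?<![A-Za-z])"      # no letter directly before the literal
        + re.escape(search)   # literal search string (stand-in for \Q...\E)
        + r"(?![A-Za-z])",    # no letter directly after the literal
        re.IGNORECASE,
    )
    # A function replacement keeps any backslashes in `replacement` literal.
    return pattern.sub(lambda _match: replacement, text)

print(apply_special_case("Made In The U.s.a.", "u.s.a.", "U.S.A."))
print(apply_special_case("Aid For Ai Research", "ai", "AI"))
```

The first call fixes the acronym even though it ends in a period at the end of the string, and the second leaves "Aid" alone while still changing the standalone "Ai".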

And here's the updated macro:

Title Case- then deal with special cases v1.2.kmmacros (4.9 KB)
(KM v11.0.3)

EDIT: I've realized that it makes more sense to have the lookarounds assert for [[:alpha:]] instead of for \w. This is because \w as a character class also includes digits and the underscore. This would lead to "1,024kb" and "101.1MHz" not being matched, even if kb__>__KB and mhz__>__MHz were included in the special cases list. I've updated this post and the attached macro to reflect this.
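A small Python comparison of the two lookaround variants, again assuming ASCII text, a case-insensitive search, and `[A-Za-z]` standing in for [[:alpha:]] (which Python's `re` does not support). The helper name `build` is hypothetical.

```python
import re

def build(search, char_class):
    """Wrap a literal in negative lookarounds for the given character class."""
    return re.compile(
        rf"(?<!{char_class}){re.escape(search)}(?!{char_class})",
        re.IGNORECASE,
    )

word_based = build("kb", r"\w")         # digits and underscore block the match
alpha_based = build("kb", "[A-Za-z]")   # only letters block the match

text = "1,024kb"
word_result = word_based.sub("KB", text)
alpha_result = alpha_based.sub("KB", text)
mhz_result = build("mhz", "[A-Za-z]").sub("MHz", "101.1mhz")

print(word_result, alpha_result, mhz_result)
```

With \w the digit in front of "kb" blocks the match and the text comes back unchanged, while the letter-only variant produces the intended replacement.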

Brilliant!

I've updated my last post and its attached macro, as I realized that it makes more sense to have the lookarounds assert for [[:alpha:]] instead of for \w. This is because \w as a character class also includes digits and the underscore. This would lead to "1,024kb" and "101.1MHz" not being matched, even if kb__>__KB and mhz__>__MHz were included in the special cases list.