I need help creating a Capitalization Regular Expression

I'm trying to create a macro to help automate the capitalization of book titles. Using the Filter action to change a string to Title Case doesn't seem to work for me when the string contains terms such as AI, 3D, DIY or U.S.A.

For example, the phrase:
how to use ai to create diy 3d text in the u.s.a.
will result in:
How to Use Ai to Create Diy 3d Text in the u.s.A.

I tried using the Search and Replace using Regular Expression action with limited success. Since RegEx Search only returns the first match in the source text, I can't get it to properly convert a book title such as:
all in on ai: how to avoid pain while using ai in searches
into:
All In On AI: How to Avoid Pain While Using AI in Searches
(note that "AI" is capitalized when it is followed by the colon, but the "ai" inside the word "pain" is not)
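The example below is a minimal sketch of the difference between a first-match search and a global replace. It uses Python's `re` module as a stand-in for Keyboard Maestro's regex actions, on the assumption that `\b` behaves the same way for these plain-ASCII inputs. The `\b` word boundaries make the pattern match "ai" only as a standalone word, so the "ai" inside "pain" is skipped.

```python
import re

title = "all in on ai: how to avoid pain while using ai in searches"

# re.search stops at the first match, much like a single regex Search.
first = re.search(r"\bai\b", title)
print(first.start())  # 10

# re.sub replaces every non-overlapping match; \b leaves "pain" alone.
fixed = re.sub(r"\bai\b", "AI", title)
print(fixed)
# all in on AI: how to avoid pain while using AI in searches
```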

Are there any RegExp experts ("RegExperts"?) out there that can lend a hand? Or maybe suggest an alternate way I haven't thought of.

Thanx

Maybe this post can help you? Text Transformation from Jims

Or try these: Literary Toolbox IV and Text Toolbox II

Title Case

Capitalizes the selection without capitalizing small words (like “in,” “of,” “on,” etc.) while preserving other odd usage like camel case and Twitter handles. Intended primarily for headlines. It’s a little more aggressive than Keyboard Maestro v10’s new algorithm, but it’s based on exactly the same code.
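The sketch below shows the general idea of a small-word-aware Title Case and why abbreviations come out wrong. It is not the Keyboard Maestro or Toolbox algorithm. The `SMALL_WORDS` set is a hypothetical list, and the function skips rules such as capitalizing the final word or a word after a colon.

```python
# Hypothetical small-word list; real style guides differ.
SMALL_WORDS = {"a", "an", "and", "as", "at", "but", "by", "for", "in",
               "of", "on", "or", "the", "to", "via"}

def title_case(text: str) -> str:
    """Capitalize each word except mid-title small words. Words that
    already contain an uppercase letter (e.g. camelCase) are kept as-is."""
    out = []
    for i, word in enumerate(text.split(" ")):
        if any(ch.isupper() for ch in word):
            out.append(word)                      # preserve existing casing
        elif i > 0 and word.lower() in SMALL_WORDS:
            out.append(word.lower())              # small word mid-title
        else:
            out.append(word[:1].upper() + word[1:])
    return " ".join(out)

print(title_case("how to use ai to create diy 3d text in the u.s.a."))
# How to Use Ai to Create Diy 3d Text in the U.s.a.
```

As in the original question, "ai", "diy", and "3d" are handled like ordinary words. Without extra information, a Title Case routine cannot tell that they are abbreviations.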


How would the regex/macro decide to, e.g., ALLCAPS "diy" but Title Case "use"? Would you keep a finite list of special abbreviations and such, and how would you deal with them?

Have a look at placing your Search and Replace within a For Each.


A thread going into detail about this approach:

You can make a list of all the things that don't work and run a second regex to fix those.

The help page for regular expressions tells you how to address that problem (it explains how to run a loop):

https://wiki.keyboardmaestro.com/Regular_Expressions?s[]=regex
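The loop approach can be sketched outside Keyboard Maestro as a walk over every match, rebuilding the string piece by piece. This roughly mirrors a For Each over the matching substrings with a replacement inside it. The function name `replace_each` is hypothetical. In Python, `re.sub` does the same job in one call, and the explicit loop is shown only because it maps onto the For Each idea.

```python
import re

def replace_each(text: str, pattern: str, replacement: str) -> str:
    """Visit each match in order (like a For Each loop over matches)
    and splice in the replacement, keeping the text between matches."""
    pieces = []
    last_end = 0
    for match in re.finditer(pattern, text, flags=re.IGNORECASE):
        pieces.append(text[last_end:match.start()])  # text before the match
        pieces.append(replacement)                   # the corrected term
        last_end = match.end()
    pieces.append(text[last_end:])                   # tail after last match
    return "".join(pieces)

print(replace_each("All In On Ai: How To Avoid Pain While Using Ai",
                   r"\bai\b", "AI"))
# All In On AI: How To Avoid Pain While Using AI
```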

How would you think Title Case should work for the last word in this title?

my favourite three-toed sloth in south america is the ai

The point there is that Title Case can't solve every problem the way you want it because it does not know the meaning of the words. It can't understand anything.

Exactly. My approach involves: 1) applying the Title Case action to the whole title; and then 2) applying another pass with a dictionary of words that require special capitalization. Isolating those words is the part I was asking for help on. I think the ideal approach probably involves using SED/AWK, but I'm even worse with those than I am with RegExp.
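The second pass described above (a dictionary of special spellings applied after Title Case) could look roughly like this sketch. `SPECIAL_CASES` and `apply_special_cases` are hypothetical names, and the dictionary holds only a few sample entries. Each term is escaped so that any punctuation in it is matched literally, and the search ignores case so it catches whatever Title Case produced.

```python
import re

# Hypothetical exceptions dictionary: lowercase term -> desired spelling.
SPECIAL_CASES = {
    "ai": "AI",
    "diy": "DIY",
    "3d": "3D",
}

def apply_special_cases(title: str) -> str:
    """After Title Case, force each dictionary term to its preferred
    capitalization wherever it appears as a whole word."""
    for term, fixed in SPECIAL_CASES.items():
        pattern = r"\b" + re.escape(term) + r"\b"
        # A function replacement inserts `fixed` literally (no \1 expansion).
        title = re.sub(pattern, lambda _m, f=fixed: f, title,
                       flags=re.IGNORECASE)
    return title

print(apply_special_cases("How to Use Ai to Create Diy 3d Text"))
# How to Use AI to Create DIY 3D Text
```

These entries all begin and end with letters or digits. Terms that end in punctuation, such as "u.s.", interact differently with `\b`.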

Hopefully I won't encounter many books about pale-throated sloths. If I get an error in capitalizing those titles, I'll deal. 🙂

I saw that help page earlier. It confused me because its example searches in a loop, and I didn't know how to adapt it to use a RegExp Search & Replace inside the loop instead. Especially confusing was that the substring parameter in the For Each loop used the same RegExp as the one in the Search action. It seems the older I get, the less I understand.

I agree with you that that page isn't written clearly. It's not your fault if you are confused about it. I'm sure someone will help clarify that technique for you, but I'm busy for a few hours, so I can't help.

If you show me a list of exceptions that you want handled, I can probably whip something up for you. You are right that sed and awk might be the best tools for the job, but I can probably implement it in pure KM for you if that's what you want.

This is likely close to what @Airy would have contributed. But how far is it from something that could work for you?

Title Case- then deal with special cases.kmmacros (4.8 KB)
(KM v11.0.3)


I'm searching with "Regular Expression" only to get the handy word-boundary syntax \b. I'm also using \Q and \E, which treat whatever they enclose as literals, so special characters (like the periods in u.s.a) don't have to be escaped.
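Python's `re` module does not support `\Q...\E`, so this sketch uses `re.escape` to get the same effect: the periods in the term become literal. It is only an illustration of what literal quoting buys. Without it, each period is a wildcard and unrelated text can match.

```python
import re

term = "u.s.a"

# Unescaped, each "." is a wildcard, so "u-s-a" also matches.
loose = r"\b" + term + r"\b"
print(bool(re.search(loose, "the u-s-a tour")))   # True

# re.escape plays the role of \Q...\E: the periods become literal.
strict = r"\b" + re.escape(term) + r"\b"
print(bool(re.search(strict, "the u-s-a tour")))  # False
print(re.sub(strict, "U.S.A", "made in the u.s.a.", flags=re.IGNORECASE))
# made in the U.S.A.
```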


Thanx. I'll check it out. I appreciate your effort.

Works like a charm. I did come across one weird scenario that didn't work, though:

I added "u.s. navy" to the source text and then added "u.s." to the special cases list. For some reason it refuses to capitalize it. I thought it might be interfering with the USA special case, but even after I removed that, "u.s." still isn't capitalized.
It's not a biggie and all other cases have worked so far, but it's curious why this one particular term isn't working for me.

Spaces matter.

Sorry, that was a typo in my comment, not in the macro. Fixed it in the comment above to avoid further confusion. The term still does not capitalize properly.

Then the reason is likely that your original question was about changing u.s.a. to U.S.A., and now you are talking about changing u.s. navy to U.S. Navy. These are completely different requirements. Perhaps you should add another line to the macro to change u.s. to U.S.

Yes, spaces matter. As in, don't. Add this instead:

u.s.__>_U.S.
navy
>__Navy

And remove the trailing \b in the Search & Replace action too, I suppose.
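The likely reason "u.s." never matched can be shown with a small Python sketch, assuming the macro builds its pattern as `\b` + literal term + `\b`. A `\b` boundary needs a word character on exactly one side. After the final period of "u.s." the next character is a space, so neither side is a word character and the trailing `\b` fails. The lookaround version is an alternative technique, not something from the macro. It assumes a regex flavor with lookbehind and only requires "no word character here", which also suits terms that start or end with punctuation.

```python
import re

text = "how to avoid the u.s. navy"

# Trailing \b after "." fails when a space follows: no boundary exists.
with_boundary = r"\b" + re.escape("u.s.") + r"\b"
print(re.search(with_boundary, text))  # None

# Dropping the trailing \b lets the term match.
without_trailing = r"\b" + re.escape("u.s.")
print(re.sub(without_trailing, "U.S.", text, flags=re.IGNORECASE))
# how to avoid the U.S. navy

# Alternative: lookarounds that only forbid adjacent word characters.
lookaround = r"(?<!\w)" + re.escape("u.s.") + r"(?!\w)"
print(re.sub(lookaround, "U.S.", text, flags=re.IGNORECASE))
# how to avoid the U.S. navy
```

One trade-off: without any trailing check, "u.s." also matches the start of "u.s.a." and turns it into "U.S.a.". The `(?!\w)` lookahead rejects that case because a letter follows.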

I'm sorry. I think I didn't make myself clear. I was talking about adding another term that needed to be capitalized: "u.s.". The fact that I used "U.S. Navy" was just an example of where the term "U.S." would need to be capitalized. The word Navy is irrelevant. It could be any title such as "U.S. Government Policy" or "The U.S. Goes to War".

Please see my reply to Airy above.

Oh, I saw it. Then I made those changes to the macro and got the correct results. Try it.

"All in on AI: How to Avoid the U.S. Navy While Using AI in Searches, and How to Use AI to Create DIY 3D Text in the U.S.A.."

I think the \b is important. Your original substitution set omitted the final period in U.S.A., presumably because it appeared at the end of the sentence. But in a regular expression a period is a wildcard (meaning "any character"). So if you keep the \b and use:

u.s__>U.S
navy
>__Navy

You'll get:

All in on AI: How to Avoid the U.S. Navy While Using AI in Searches, and How to Use AI to Create DIY 3D Text in the U.S.A.

Then make the change the way you just described. You are entirely free to make any changes you want. I will not dictate what changes you need to make.