Automating REGEX

Thanks for everyone's contributions. Very useful to have all these methods in one place. As my main concern is quick workflow, here's another idea I had:

CleanShot 2023-01-30 at 13.06.58

Can't get much quicker than that. Yes, I know there's a limit to how complex it can be, but for simple find-and-replace stuff, I think it might be quite handy.


Without speaking a word of Awk, I do like the look of it because it's one single-line action. Can you do all these things with it?

(...as well as matching lines containing a string.)

If so, I'd be interested in making an Awk generator for these kinds of tasks, similar to the RegEx generator above. I just want a quickly accessible text-searching toolkit.

Sorry @noisneil, I certainly wouldn't want to imply ignorance. I was misunderstanding what you were trying to do.

A load of guff, made "problematic" by the speed test. But I can't bring myself to delete it! Of interest to completionists only...

And, obviously, I was wrong. [TEST] is a regex, just a very particular one that (unless you have case sensitivity off) matches that and only that string. I now see you are, in your example, popping it between .* and .* to catch the rest of the line.

The problem's going to come when "I want to replace TEST but sometimes I typed it as TSET..." and the string you'll want to enter is T(ES|SE)T, which'll get blatted by your escaping routine. If you don't allow patterns in your user-provided search string you won't have that problem but that, combined with you having to code each search "type" ahead of time, will severely hobble the utility of this -- perhaps to the point that you're better off using another method! For example:

Arrays! Split on the - string! Grab item 2! Put that into a "For Each" action, and Robert's your mother's brother! (I see I'm late to the party with that suggestion!) Similarly with "everything before a string" (item 1 of an array split on the string) and "after a string" (item 2, unless your string occurs more than once per line). Everything around the string is a "For Each Line... Find string and replace with <nothing>, append result to new variable". And so on.

Yes, you have to use a "For Each" loop for these -- but your regex is effectively a "for each line" loop too because, under the hood, an "All matches" regex of .*TEST.* is actually (?m)^.*TEST.*$ and is going through your source one line at a time, looking for a match somewhere between the start and end of that line. The main difference is that with a "For Each" you have to build an output rather than replacing to "source".
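To make the comparison concrete, here is a minimal sketch in Python rather than Keyboard Maestro actions. It assumes Python's `re` module behaves like Keyboard Maestro's regex engine for these simple patterns, and the sample text is invented for illustration. It shows the multiline regex, the equivalent line-by-line loop, and the "split and grab item 2" idea.

```python
import re

# Invented sample input, one record per line.
source = "alpha - one\nbeta TEST - two\ngamma - three\nTEST delta - four"

# Regex approach: (?m) lets ^ and $ match at every line boundary,
# so the pattern is tried against each line in turn.
regex_hits = re.findall(r"(?m)^.*TEST.*$", source)

# Loop approach: visit each line and build the output by hand.
loop_hits = [line for line in source.splitlines() if "TEST" in line]

# Split approach for "after a string": item 2 of each line split on " - ".
# maxsplit=1 keeps everything after the first occurrence in one piece.
after_dash = [
    line.split(" - ", 1)[1]
    for line in source.splitlines()
    if " - " in line
]

print(regex_hits)
print(loop_hits)
print(after_dash)
```

Both the regex and the loop return the same two lines here; the difference is only in who does the iterating.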

And, shooting myself in the foot here -- a big benefit of using a regex (a simple one, at least) is execution speed. I was expecting the opposite, but for 320 lines of input my test regex spat out a result in 3 milliseconds while a "For Each" doing the same took 1003 milliseconds!

So... What should happen to leading/trailing spaces in the "before/after/around a string" and "between two strings" cases? In the latter, are the two strings the same or different? "First/last n matching" could use a regex to extract matches and then head or tail...

Well I am fairly ignorant, or should I say "inexperienced"?

Do you see what I'm getting at with the post above?

Yes -- but, as always, the devil is in the details. For example for "around a string" when the string is test, what should the output be when the input is:

Test on the first line
We shall test in the middle
And an end test
What if there's too much testosterone?
Perhaps a contested result...
Or if we test and test again?
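As a rough illustration of why the answer isn't obvious, here is a Python sketch of two plausible readings of "around a string" applied to the lines above. Both function names are hypothetical, and neither is claimed to be what the macro should do; they only show how the readings diverge.

```python
import re

lines = [
    "Test on the first line",
    "We shall test in the middle",
    "And an end test",
    "What if there's too much testosterone?",
    "Perhaps a contested result...",
    "Or if we test and test again?",
]

def remove_substring(line, needle):
    # Reading 1: delete every occurrence, ignoring case, even inside words.
    return re.sub(re.escape(needle), "", line, flags=re.IGNORECASE)

def remove_whole_word(line, needle):
    # Reading 2: delete only occurrences that stand alone as a word.
    pattern = r"\b" + re.escape(needle) + r"\b"
    return re.sub(pattern, "", line, flags=re.IGNORECASE)

results = [(remove_substring(l, "test"), remove_whole_word(l, "test")) for l in lines]
for substring_version, word_version in results:
    print(repr(substring_version), "|", repr(word_version))
```

The two readings agree on the first three lines but not on "testosterone" or "contested", and both leave stray leading, trailing, or doubled spaces that still need a decision.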

[Neil stares solemnly at his shoes and whispers]

"I don't know".


All of these things are possible one way or another, but @Nige_S' observations in Post #25 are quite relevant.

Why don't you post some real-world examples – initial condition and desired outcome – and then we can think about solutions.

Here's a simple example. As part of a requested Auto Save macro for Ableton Live, I had to grab the window title

Project Name [Project Name]

and remove the square-bracketed text, including the brackets to get

Project Name

So, I want to match everything after the first occurrence of " [" on every line:

(?m)\s\[.*$

I can spit that out like so:

CleanShot 2023-02-01 at 15.51.02
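For anyone who wants to try the pattern outside Keyboard Maestro, here is a small Python sketch of the same search-and-replace. The second title is made up, and Python's `re` is assumed to treat this pattern the same way Keyboard Maestro does.

```python
import re

# Two window titles, one per line; the second is an invented example.
titles = "Project Name [Project Name]\nAnother Song [Another Song]"

# (?m) makes $ match at the end of every line, so each line is trimmed
# from the first " [" onwards.
cleaned = re.sub(r"(?m)\s\[.*$", "", titles)

print(cleaned)

# Caveat: the cut happens at the FIRST " [" on the line, so a project
# whose own name contains " [" loses more than the trailing copy.
tricky = re.sub(r"(?m)\s\[.*$", "", "Mix [v2] [Mix [v2]]")
print(tricky)
```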

I've added an Everything between two strings option, but the inclusion of the < character in the regex string breaks the XML.

This works: <string>(?m)b.*$</string>

This doesn't: <string>(?<=a)(.*?)(?=c)</string>

Full Broken XML Example
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<array>
	<dict>
		<key>Action</key>
		<string>IgnoreCaseRegEx</string>
		<key>ActionName</key>
		<string>RegEx: Everything between two strings</string>
		<key>ActionUID</key>
		<integer>13435524</integer>
		<key>Captures</key>
		<array>
			<string>Local__Output</string>
		</array>
		<key>MacroActionType</key>
		<string>SearchRegEx</string>
		<key>Search</key>
		<string>(?<=a)(.*?)(?=c)</string>
		<key>Source</key>
		<string>Variable</string>
		<key>Variable</key>
		<string>Local__Input</string>
	</dict>
</array>
</plist>

Is there a way around that?

Replace < with &lt;

You might also have problems with:
" -- replace with &quot;
' -- replace with &apos;
> -- replace with &gt;
& -- replace with &amp;

(all untested).
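A quick way to check the escaping is a Python sketch using only the standard library. In XML element content it is `<` and `&` that must be escaped; the other entities are harmless extras there. The `action` dictionary below is a cut-down, hypothetical stand-in for the real Keyboard Maestro action, not its full format.

```python
import plistlib
from xml.sax.saxutils import escape

pattern = "(?<=a)(.*?)(?=c)"

# Manual route: escape() replaces &, < and > with their entities.
escaped = escape(pattern)
print(escaped)

# Generated route: let plistlib write the XML so escaping is automatic.
# This dict is a simplified placeholder, not a complete KM action.
action = {"MacroActionType": "SearchRegEx", "Search": pattern}
xml_bytes = plistlib.dumps([action])
print(xml_bytes.decode("utf-8"))

# Reading the plist back returns the original, unescaped pattern.
round_trip = plistlib.loads(xml_bytes)
print(round_trip[0]["Search"])
```

Either route works; generating the plist avoids having to remember which characters need replacing.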


Perfect! Thanks @Nige_S!

@tiffle I know it's taken me a while to come around to this, but I think you're on to something with the subroutine idea!

Here's what I've got so far:

RegEx Macros.kmmacros (84.5 KB)

Subroutine Caller Actions.zip (4.9 KB)

If you think you can improve on or add to these, I'd love to see what you come up with.

Hi Neil - examined this with interest and found the following:

  1. I loaded up your macros and created a testing macro using your caller actions zip and found that none of your subroutines work! Here's the testing macro:

Download Macro(s): Testing Regex Subroutines.kmmacros (29 KB)

Macro-Image

Macro-Notes
  • Macros are always disabled when imported into the Keyboard Maestro Editor.
    • The user must ensure the macro is enabled.
    • The user must also ensure the macro's parent macro-group is enabled.
System Information
  • macOS 10.14.6
  • Keyboard Maestro v10.2

To perform the testing I just select the appropriate group and TRY it. I'd offer a solution but I don't have time right now. Maybe I'm doing something wrong or I don't understand what is supposed to happen?

  2. Whenever I see myself inserting the same KM actions over and over in several macros I think that it might be worth turning those actions into a subroutine. I see a bunch of actions that appear at least once in each of your subroutines; here they are:

image

So I've taken the liberty of turning them into a subroutine for you that looks like this:

image

Here's the downloadable version:

Download Macro(s): [SUB] Escape Regex String.kmmacros (17 KB)

and you can use it to replace the 5 occurrences of that bunch of actions. The advantage of doing that is (and I'm sorry if I'm teaching granny to suck eggs) that (a) once you've tested the subroutine you can be sure it will always work; (b) if you need to change the subroutine in future (like add error checking or an extra "escaping" action for example) you need do it in only one place and not the 5; and (c) it reduces the overall count of actions used.
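The same idea, sketched in Python rather than as Keyboard Maestro subroutines: one shared escaping helper that every pattern builder calls. All three function names are hypothetical, and `re.escape` stands in for whatever the escaping actions in the macro actually do.

```python
import re

def escape_regex_string(text):
    """The single place where user text is made regex-literal."""
    return re.escape(text)

def lines_containing(source, needle):
    # Return every line that contains the literal needle.
    pattern = r"(?m)^.*" + escape_regex_string(needle) + r".*$"
    return re.findall(pattern, source)

def everything_after(source, needle):
    # Return the text following the first occurrence of needle on each line.
    pattern = r"(?m)" + escape_regex_string(needle) + r"(.*)$"
    return re.findall(pattern, source)

source = "price (USD): 10\nno match here\nprice (USD): 25"
print(lines_containing(source, "(USD)"))
print(everything_after(source, "(USD): "))

# The flip side of escaping: a pattern typed by the user is treated as
# plain text, so T(ES|SE)T no longer matches TEST or TSET.
print(lines_containing("TEST\nTSET\nT(ES|SE)T", "T(ES|SE)T"))
```

If the escaping rules ever need to change, only `escape_regex_string` has to be edited.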


See, this is where it all falls down on my RegEx newbism. The reason I'm interested in this is that I'm dumbfounded by RegEx, but unfortunately it also means I can't fully incorporate it into these macros with any degree of competence. That said, I do think this idea has potential.

For example, the Everything BEFORE String subroutine does 'work' (sort of) in that it does what I told it to do. It removes everything after a string, if the string is found on that line.

<string>IgnoreCaseRegEx</string>
...becomes...
<string>IgnoreCaseRegEx

Now of course, this is where my incompetence comes into play, as I now realise this subroutine should be called "Remove Everything After a String On Each Line If That String Is Found" instead.
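The difference between the two names can be sketched in Python. This assumes the search string in the example above was `</string>`, which the post doesn't actually state, and the second input line is invented to show what happens to lines without a match.

```python
import re

# Assumed search string and a two-line sample; only line 1 contains it.
needle = "</string>"
source = "<string>IgnoreCaseRegEx</string>\n<key>ActionName</key>"

# "Remove everything after a string on each line if found":
# matching lines are trimmed, non-matching lines pass through untouched.
trimmed = re.sub(r"(?m)" + re.escape(needle) + r".*$", "", source)
print(trimmed)

# "Return everything before a string":
# only the text before the needle is returned, and lines without it vanish.
before_only = re.findall(r"(?m)^(.*?)" + re.escape(needle), source)
print(before_only)
```

Which of the two is wanted depends on whether the non-matching lines should survive in the output.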

Replace LINES Containing String for some reason isn't receiving Local__Replace With. I'm very confused by this. It's just blank. To be fair, I've never really used Subroutines, so I may be missing something obvious.

RegEx: Everything AFTER String doesn't work and I'm not sure why.

RegEx: Return Everything BETWEEN Two Strings seems to work fine.

NB: I did have to reconnect the callers to the subs when I imported your test macro, but presumably, they're all calling the appropriate things on your end...?

Very kind! I'll add that to the group! It's certainly more efficient, but I've never got into the habit of using subs when working on something I might share on the forum, as I prefer everything to be self-contained. I might change that mode of thinking...

Nice reply Neil. I’ll take a closer look later on unless someone else beats me to it!

BTW - don’t get discouraged! We all benefit from this stuff ‘cos we’re learning new things all the time! And that is never a bad thing!


Oh, it's not really incompetence. Don't be so hard on yourself.

It can be difficult to describe the precision of a regex in a few words that fit on a menu item. And there are often subtle variations that have to be accounted for.

For Text Toolbox, which implements common text conversions with regexes, I resorted to short menu "topics" that expanded, where necessary, into prompts for options with some explanatory text.

Text manipulation is complex. Describing it (in words, alas) is difficult. Keep fighting the good fight!


I'm not sure if it's something to do with the example inputs or something else, but these were all working yesterday. Slightly frustrating.

Just a quick question, Neil - do the results you’re seeing come from my testing macro or yours?

Yours.

So you reckon this is correct for

2023-04-12_21-04-56