Automating REGEX

Thank you for the example.

It's late here; I'll give some options tomorrow.

I hate to say this but what I find most difficult is writing the regular expression in the first place. In my Windows days I had a brief dalliance with RegexMagic which sometimes helped out with things. Now that's what I'd call automating the regex!!

I'll give you this full JS example to play with, but I think I would use a parser combinator for this task.

Instead of using a regex, you can treat it as a sequence of steps, each of which takes the output of the previous one as its input.

So, we define a function that operates on an arbitrary line (let's call it x):

const artistName = x => compose(
    strip,
    head,
    tail,
    splitOn("-")
)(x)

Reading the composition from right to left, it:

  • splits the line into a list, using "-" as the separator (splitOn("-")),
  • takes the second element of that list (head of the tail), and
  • strips leading and trailing whitespace from that element (strip).

Finally, we map artistName over the lines of the input string and join (unlines) the resulting list with a newline character.

unlines(
    map(artistName)(
        lines(s)
    )
)
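If you'd like to try the pipeline outside Keyboard Maestro (in Node, say), here's a minimal sketch with simplified stand-ins for the prelude functions. The stand-ins are my own simplified versions, not the prelude-jxa originals, and the "track - artist - album" sample lines are made up for illustration:

```javascript
// Simplified stand-ins for the prelude functions (not the originals).
// compose is right-to-left: compose(f, g)(x) === f(g(x)).
const compose = (...fs) => x => fs.reduceRight((acc, f) => f(acc), x);
const splitOn = sep => s => s.split(sep);
const tail = xs => xs.slice(1);
const head = xs => (xs.length ? xs[0] : null);
// Treat null (head of an empty list) as "" so lines without "-" don't throw.
const strip = s => (s || "").trim();
const lines = s => (s.length ? s.split(/[\r\n]+/) : []);
const unlines = xs => xs.join("\n");

// Second "-"-delimited field of a line, trimmed.
const artistName = x => compose(strip, head, tail, splitOn("-"))(x);

// Hypothetical input: one "track - artist - album" entry per line.
const input = "01 - Artist One - Album A\n02 - Artist Two - Album B";
console.log(unlines(lines(input).map(artistName)));
// Artist One
// Artist Two
```

One limitation of splitting on a bare "-": a name that itself contains a hyphen (Jay-Z, say) gets split too, so you'd get "Jay" rather than "Jay-Z".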

Project names.kmmacros (6.6 KB)

JavaScript source:
(() => {
    'use strict';

    // jxaContext :: IO ()
    const jxaContext = () => {
        // main :: IO ()
        const main = () => {
            const
                s = kmVar("localParameter")
            return unlines(
                map(artistName)(
                    lines(s)
                )
            )
        };

        // FUNCTIONS --
        const artistName = x => compose(
            strip,
            head,
            tail,
            splitOn("-")
        )(x)

        // GENERICS ----------------------------------------------------------------
        // JXA FOR AUTOMATION
        // kmVar :: String -> String
        const kmVar = strVariable => {
            const
                kmInst = standardAdditions().systemAttribute("KMINSTANCE"),
                kmeApp = Application("Keyboard Maestro Engine");
            return kmeApp.getvariable(strVariable, {
                instance: kmInst
            });
        };

        // standardAdditions :: () -> Application
        const standardAdditions = () =>
            Object.assign(Application.currentApplication(), {
                includeStandardAdditions: true
            });
            
        // https://github.com/RobTrew/prelude-jxa
        // JS Prelude --------------------------------------------------
        // Tuple (,) :: a -> b -> (a, b)
        const Tuple = a =>
            // A pair of values, possibly of
            // different types.
            b => ({
                type: "Tuple",
                "0": a,
                "1": b,
                length: 2,
                *[Symbol.iterator]() {
                    for (const k in this) {
                        if (!isNaN(k)) {
                            yield this[k];
                        }
                    }
                }
            });

        // compose (<<<) :: (b -> c) -> (a -> b) -> a -> c
        const compose = (...fs) =>
            // A function defined by the right-to-left
            // composition of all the functions in fs.
            fs.reduce(
                (f, g) => x => f(g(x)),
                x => x
            );

        // findIndices :: (a -> Bool) -> [a] -> [Int]
        // findIndices :: (String -> Bool) -> String -> [Int]
        const findIndices = p =>
            xs => {
                const ys = [...xs];

                return ys.flatMap(
                    (y, i) => p(y, i, ys) ? (
                        [i]
                    ) : []
                );
            };

        // head :: [a] -> a
        const head = xs =>
            // The first item (if any) in a list.
            xs.length ? (
                xs[0]
            ) : null;

        // lines :: String -> [String]
        const lines = s =>
            // A list of strings derived from a single
            // string delimited by newline and or CR.
            0 < s.length ? (
                s.split(/[\r\n]+/u)
            ) : [];

        // map :: (a -> b) -> [a] -> [b]
        const map = f =>
            // The list obtained by applying f
            // to each element of xs.
            // (The image of xs under f).
            xs => [...xs].map(f);

        // splitOn :: [a] -> [a] -> [[a]]
        // splitOn :: String -> String -> [String]
        const splitOn = pat => src =>
            // A list of the strings delimited by
            // instances of a given pattern in s.
            ("string" === typeof src) ? (
                src.split(pat)
            ) : (() => {
                const
                    lng = pat.length,
                    [a, b] = findIndices(matching(pat))(src).reduce(
                        ([x, y], i) => Tuple(
                            x.concat([src.slice(y, i)])
                        )(lng + i),
                        Tuple([])(0)
                    );

                return a.concat([src.slice(b)]);
            })();

        // strip :: String -> String
        const strip = s =>
            s.trim();

        // tail :: [a] -> [a]
        const tail = xs =>
            // A new list consisting of all
            // items of xs except the first.
            "GeneratorFunction" !== xs.constructor
            .constructor.name ? (
                0 < xs.length ? (
                    xs.slice(1)
                ) : undefined
            ) : (take(1)(xs), xs);

        // unlines :: [String] -> String
        const unlines = xs =>
            // A single string formed by the intercalation
            // of a list of strings with the newline character.
            xs.join("\n");

        // MAIN --
        return main();
    };

    return jxaContext();
})();

Well my approach uses neither a programming language nor a regex, but simply an in-built KM facility: an array with a custom delimiter as provided by @noisneil as an example.

Here goes:

Test Automating REGEX.kmmacros (3.9 KB)

Hey Neil,

Keyboard Maestro makes this relatively simple with its array notation for text variables:

(I see @tiffle beat me to this one... :-)

Extract Names of Albums (KM) v1.00.kmmacros (7.3 KB)

And Awk was born to do such things:

Extract Names of Albums (Awk) v1.00.kmmacros (6.8 KB)

Awk can use a regex for its field-separator, so that can easily be made more fault-tolerant.
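JavaScript's String.prototype.split also accepts a regex, so the same kind of fault-tolerance can be sketched there. This assumes "-" is the separator and that the spacing around it is inconsistent; fields and secondField are hypothetical names:

```javascript
// Split on "-" with any run of whitespace either side, so "a - b", "a-b" and "a  -b" all work.
const fields = line => line.split(/\s*-\s*/);
// Second field, or "" when the line has no separator.
const secondField = line => fields(line)[1] ?? "";

console.log(secondField("01 - Artist One - Album A"));    // Artist One
console.log(secondField("02-Artist Two  -Album B"));      // Artist Two
console.log(JSON.stringify(secondField("no separator"))); // ""
```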

Bread and butter for AppleScript's text item delimiters:

Extract Names of Albums (AppleScript) v1.00.kmmacros (7.1 KB)

Perl just for fun (and practice):

Extract Names of Albums (Perl) v1.00.kmmacros (6.9 KB)

Finally, here's some relatively crude JavaScript for practice (and fun):

Extract Names of Albums (JavaScript) v1.00.kmmacros (7.3 KB)

-Chris

Thanks for everyone's contributions. Very useful to have all these methods in one place. As my main concern is quick workflow, here's another idea I had:

[Screen recording: CleanShot 2023-01-30 at 13.06.58]

Can't get much quicker than that. Yes, I know there's a limit to how complex it can be, but for simple find-and-replace stuff, I think it might be quite handy.

Without speaking a word of Awk, I do like the look of it because it's a single-line action. Can you do all these things with it?

(...as well as matching lines containing a string.)

If so, I'd be interested in making an Awk generator for these kinds of tasks, similar to the RegEx generator above. I just want a quickly accessible text-searching toolkit.

Sorry @noisneil, I certainly wouldn't want to imply ignorance. I was misunderstanding what you were trying to do.

A load of guff, made "problematic" by the speed test. But I can't bring myself to delete it! Of interest to completionists only...

And, obviously, I was wrong. [TEST] is a regex, just a very particular one that (unless you have case sensitivity off) matches that and only that string. I now see you are, in your example, popping it between .* and .* to catch the rest of the line.

The problem's going to come when "I want to replace TEST but sometimes I typed it as TSET..." and the string you'll want to enter is T(ES|SE)T, which'll get blatted by your escaping routine. If you don't allow patterns in your user-provided search string you won't have that problem but that, combined with you having to code each search "type" ahead of time, will severely hobble the utility of this -- perhaps to the point that you're better off using another method! For example:
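A quick JavaScript illustration of that clash. escapeRegExp here is my own helper, and its character list covers JavaScript's regex syntax; Keyboard Maestro uses the ICU engine, whose metacharacters are similar, but I'm assuming rather than claiming that the list is complete for ICU:

```javascript
// Backslash-escape every character with special meaning in a JavaScript regex.
// (Assumption: ICU, which Keyboard Maestro uses, treats essentially the same set as special.)
const escapeRegExp = s => s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");

const userInput = "T(ES|SE)T";
const asPattern = new RegExp(userInput);               // user text used as a regex
const asLiteral = new RegExp(escapeRegExp(userInput)); // user text escaped first

console.log(escapeRegExp(userInput));         // T\(ES\|SE\)T
console.log(asPattern.test("I typed TSET"));  // true
console.log(asLiteral.test("I typed TSET"));  // false: only the literal "T(ES|SE)T" matches
```

So escaping protects users who type literal text, at the cost of taking away patterns from users who want them.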

Arrays! Split on the - string! Grab item 2! Put that into a "For Each" action, and Robert's your mother's brother! (I see I'm late to the party with that suggestion!) Similarly with "everything before a string" (item 1 of an array split on the string) and "after a string" (item 2, unless your string occurs more than once per line). Everything around the string is a "For Each Line... Find string and replace with <nothing>, append result to new variable". And so on.
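The split-based approach, sketched in JavaScript for a single line. afterString and aroundString are hypothetical helpers, and returning "" when the string isn't found is my assumption about what you'd want:

```javascript
// Everything after the first occurrence of s ("" if s isn't found: an assumption).
const afterString = (line, s) => {
  const i = line.indexOf(s);
  return i === -1 ? "" : line.slice(i + s.length);
};
// Everything around s: every occurrence of s removed.
const aroundString = (line, s) => line.split(s).join("");

const line = "Alpha - Beta - Gamma";
console.log(line.split(" - ")[1]);      // Beta  (item 2 stops at the next " - ")
console.log(afterString(line, " - "));  // Beta - Gamma
console.log(aroundString(line, " - ")); // AlphaBetaGamma
```

Slicing at the first indexOf sidesteps the "string occurs more than once per line" caveat that plain item 2 runs into.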

Yes, you have to use a "For Each" loop for these -- but your regex is effectively a "for each line" loop too. Because . doesn't match a newline (unless you turn on (?s)), an "All matches" regex of .*TEST.* behaves like (?m)^.*TEST.*$: each match is confined to a single line of your source, running from the start to the end of a line that contains TEST. The main difference is that with a "For Each" you have to build an output rather than replacing to "source".
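A small JavaScript check of that equivalence. JavaScript's . also stops at line breaks unless the s (dotAll) flag is set, much as ICU's does without (?s); the sample text is made up:

```javascript
const sample = "a TEST b\nno match\nx TEST";

// Without anchors or the s flag, each match still stops at the end of a line...
const plain = sample.match(/.*TEST.*/g);
// ...so it finds the same whole lines as the explicitly anchored, multiline version.
const anchored = sample.match(/^.*TEST.*$/gm);
// With the s flag, . matches newlines too, and one greedy match swallows the lot.
const dotAll = sample.match(/.*TEST.*/gs);

console.log(JSON.stringify(plain));    // ["a TEST b","x TEST"]
console.log(JSON.stringify(anchored)); // ["a TEST b","x TEST"]
console.log(JSON.stringify(dotAll));   // ["a TEST b\nno match\nx TEST"]
```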

And, shooting myself in the foot here -- a big benefit of using a regex (a simple one, at least) is execution speed. I was expecting the opposite, but for 320 lines of input my test regex spat out a result in 3 milliseconds while a "For Each" doing the same took 1003 milliseconds!

So... What should happen to leading/trailing spaces in the "before/after/around a string" and "between two strings" cases? In the latter, are the two strings the same or different? "First/last n matching" could use a regex to extract matches and then head or tail...
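For the "first/last n matching" case, here's a sketch of the "extract matches, then head or tail" idea in JavaScript; the function names and the sample text are hypothetical:

```javascript
// All lines containing a match for re (pass a non-global regex, so .test() keeps no state).
const matchingLines = (text, re) => text.split(/\r?\n/).filter(line => re.test(line));
const firstN = (xs, n) => xs.slice(0, n);
// Guard n <= 0: xs.slice(-0) is xs.slice(0), which would return everything.
const lastN = (xs, n) => (n > 0 ? xs.slice(-n) : []);

const text = "a TEST\nb\nc TEST\nd TEST";
const hits = matchingLines(text, /TEST/);

console.log(JSON.stringify(firstN(hits, 2))); // ["a TEST","c TEST"]
console.log(JSON.stringify(lastN(hits, 2)));  // ["c TEST","d TEST"]
console.log(JSON.stringify(lastN(hits, 0)));  // []
```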

Well I am fairly ignorant, or should I say "inexperienced"?

Do you see what I'm getting at with the post above?

Yes -- but, as always, the devil is in the details. For example, for "around a string" when the string is test, what should the output be when the input is:

Test on the first line
We shall test in the middle
And an end test
What if there's too much testosterone?
Perhaps a contested result...
Or if we test and test again?
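To make the ambiguity concrete, here's how three plausible readings of "lines containing test" treat that input in JavaScript. Which one is right is exactly the open question; the \b word-boundary version is just one possible answer:

```javascript
const input = [
  "Test on the first line",
  "We shall test in the middle",
  "And an end test",
  "What if there's too much testosterone?",
  "Perhaps a contested result...",
  "Or if we test and test again?",
];

// Case-sensitive substring: misses "Test" on the first line.
const caseSensitive = input.filter(line => /test/.test(line));
// Case-insensitive substring: also catches "testosterone" and "contested".
const anyCase = input.filter(line => /test/i.test(line));
// Case-insensitive whole word: only a standalone "test".
const wholeWord = input.filter(line => /\btest\b/i.test(line));

console.log(caseSensitive.length, anyCase.length, wholeWord.length); // 5 6 4

// "Around" on the last line: remove the first occurrence, or every occurrence?
const last = input[5];
console.log(JSON.stringify(last.replace(/\btest\b/i, "")));  // "Or if we  and test again?"
console.log(JSON.stringify(last.replace(/\btest\b/gi, ""))); // "Or if we  and  again?"
```

Note the doubled spaces left behind: whether surrounding whitespace should go too is yet another decision.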

[Neil stares solemnly at his shoes and whispers]

"I don't know".

All of these things are possible one way or another, but @Nige_S' observations in Post #25 are quite relevant.

Why don't you post some real-world examples – initial condition and desired outcome – and then we can think about solutions.

Here's a simple example. As part of a requested Auto Save macro for Ableton Live, I had to grab the window title

Project Name [Project Name]

and remove the square-bracketed text, including the brackets to get

Project Name

So, on every line, I want to match the first occurrence of " [" and everything after it:

(?m)\s\[.*$

I can spit that out like so:

[Screen recording: CleanShot 2023-02-01 at 15.51.02]
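For comparison, the same substitution in JavaScript. JavaScript takes the multiline option as the m flag rather than an inline (?m); stripBracketed is a hypothetical name, and the second, multi-line input is a made-up example to show the per-line behaviour:

```javascript
// Remove the first whitespace + "[" on each line, and everything after it on that line.
const stripBracketed = s => s.replace(/\s\[.*$/gm, "");

console.log(stripBracketed("Project Name [Project Name]")); // Project Name
console.log(JSON.stringify(stripBracketed("Set A [Live]\nSet B [backup] v2\nSet C")));
// "Set A\nSet B\nSet C"
```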

I've added an Everything between two strings option, but the inclusion of the < character in the regex string breaks the XML.

This works: <string>(?m)b.*$</string>

This doesn't: <string>(?<=a)(.*?)(?=c)</string>

Full Broken XML Example
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<array>
	<dict>
		<key>Action</key>
		<string>IgnoreCaseRegEx</string>
		<key>ActionName</key>
		<string>RegEx: Everything between two strings</string>
		<key>ActionUID</key>
		<integer>13435524</integer>
		<key>Captures</key>
		<array>
			<string>Local__Output</string>
		</array>
		<key>MacroActionType</key>
		<string>SearchRegEx</string>
		<key>Search</key>
		<string>(?<=a)(.*?)(?=c)</string>
		<key>Source</key>
		<string>Variable</string>
		<key>Variable</key>
		<string>Local__Input</string>
	</dict>
</array>
</plist>

Is there a way around that?

Replace < with &lt;

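You might also have problems with:
" -- replace with &quot;
' -- replace with &apos;
> -- replace with &gt;
& -- replace with &amp; (do this one before any of the others, including the < above, or the & in &lt; gets escaped again into &amp;lt;)

(all untested).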
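A sketch of those replacements as a JavaScript helper (escapeXml and wrapString are hypothetical names), with & handled first so that already-produced entities aren't escaped twice:

```javascript
// Escape the five XML special characters. & must go first, or the & in a
// freshly produced &lt; (etc.) would be escaped again into &amp;lt;.
const escapeXml = s =>
  s
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");

// Wrap a regex as a plist <string> element.
const wrapString = s => `<string>${escapeXml(s)}</string>`;

console.log(wrapString("(?<=a)(.*?)(?=c)")); // <string>(?&lt;=a)(.*?)(?=c)</string>
```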

Perfect! Thanks @Nige_S!

@tiffle I know it's taken me a while to come around to this, but I think you're on to something with the subroutine idea!

Here's what I've got so far:

RegEx Macros.kmmacros (84.5 KB)

Subroutine Caller Actions.zip (4.9 KB)

If you think you can improve on or add to these, I'd love to see what you come up with.

Hi Neil - examined this with interest and found the following:

  1. I loaded up your macros and created a testing macro using your caller actions zip and found that none of your subroutines work! Here's the testing macro:

Download Macro(s): Testing Regex Subroutines.kmmacros (29 KB)

Macro-Notes
  • Macros are always disabled when imported into the Keyboard Maestro Editor.
    • The user must ensure the macro is enabled.
    • The user must also ensure the macro's parent macro-group is enabled.
System Information
  • macOS 10.14.6
  • Keyboard Maestro v10.2

To perform the testing I just select the appropriate group and TRY it. I'd offer a solution but I don't have time right now. Maybe I'm doing something wrong or I don't understand what is supposed to happen?

  2. Whenever I find myself inserting the same KM actions over and over in several macros, I think it might be worth turning those actions into a subroutine. I see a bunch of actions that appear at least once in each of your subroutines; here they are:

So I've taken the liberty of turning them into a subroutine for you that looks like this:

Here's the downloadable version:

Download Macro(s): [SUB] Escape Regex String.kmmacros (17 KB)

and you can use it to replace the 5 occurrences of that bunch of actions. The advantage of doing that is (and I'm sorry if I'm teaching granny to suck eggs) that (a) once you've tested the subroutine you can be sure it will always work; (b) if you need to change the subroutine in future (to add error checking or an extra "escaping" action, for example) you need to do it in only one place rather than 5; and (c) it reduces the overall count of actions used.

See this is where it all falls down on my RegEx newbism. The reason I'm interested in this is that I'm dumbfounded by RegEx, but unfortunately it also means I can't fully incorporate it into these macros with any degree of competence. :joy: That said, I do think this idea has potential.

For example, the Everything BEFORE String subroutine does 'work' (sort of) in that it does what I told it to do. It removes everything after a string, if the string is found on that line.

<string>IgnoreCaseRegEx</string>
...becomes...
<string>IgnoreCaseRegEx

Now of course, this is where my incompetence comes into play as I now realise this subroutine should be called "Remove Everything After a String On Each Line If That String Is Found" instead. :man_facepalming:t2:
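A sketch of what that renamed subroutine does, written in JavaScript rather than as KM actions (removeFromString is a hypothetical name). It works on plain text with indexOf, so no regex escaping is needed, and it uses the example above as its test:

```javascript
// On each line, drop the first occurrence of s and everything after it.
// Lines that don't contain s are left unchanged.
const removeFromString = (text, s) =>
  text
    .split("\n")
    .map(line => {
      const i = line.indexOf(s);
      return i === -1 ? line : line.slice(0, i);
    })
    .join("\n");

console.log(removeFromString("<string>IgnoreCaseRegEx</string>\nno match here", "</"));
// <string>IgnoreCaseRegEx
// no match here
```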

Replace LINES Containing String for some reason isn't receiving Local__Replace With. I'm very confused by this. It's just blank. To be fair, I've never really used Subroutines, so I may be missing something obvious.

RegEx: Everything AFTER String doesn't work and I'm not sure why.

RegEx: Return Everything BETWEEN Two Strings seems to work fine. :man_shrugging:t2:

NB: I did have to reconnect the callers to the subs when I imported your test macro, but presumably, they're all calling the appropriate things on your end...?

Very kind! I'll add that to the group! It's certainly more efficient, but I've never got into the habit of using subs when working on something I might share on the forum, as I prefer everything to be self-contained. I might change that mode of thinking...

Nice reply Neil. I’ll take a closer look later on unless someone else beats me to it!

BTW - don’t get discouraged! We all benefit from this stuff ‘cos we’re learning new things all the time! And that is never a bad thing!
