Automating REGEX

Regex. I'm hopeless at it, and I don't think I'm alone. Googling for simple solutions to regex problems is often infuriating, because any answers you do find may not work with Keyboard Maestro's regex flavour. As if the whole thing wasn't nebulous enough!

So, I wondered if perhaps we could automate some of the simpler and more common regex tasks.

In this example, we want to replace every line in a variable that contains a certain string. I've used the Keyboard Maestro User Manual contents page as an example input. Following that is a group where we can specify:

  • A string to search the input variable for.
  • Something to replace each line that contains that string with.

Next up are three actions that automate the regex required to complete this task, and finally an action that displays the output.

EXAMPLE - Replace Lines Containing....kmmacros (24 KB)
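
For anyone who would rather read the logic than open the macro, here is a rough sketch of the same idea in JavaScript. It is not a transcription of the macro's actions, only an assumption about what they boil down to: escape every special character in the search string, wrap it in a pattern that matches the whole line, and replace. The names escapeRegExp and replaceLinesContaining are made up for illustration, and JavaScript's regex flavour is not identical to Keyboard Maestro's.

```javascript
// escapeRegExp :: String -> String
// (Hypothetical helper.) Backslash-escapes every regex
// metacharacter so the string is matched literally.
const escapeRegExp = s =>
    s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");

// replaceLinesContaining :: String -> String -> String -> String
// (Hypothetical helper.) Replaces each whole line of text
// that contains needle with replacement.
const replaceLinesContaining = (needle, replacement, text) =>
    text.replace(
        // "m" lets ^ and $ match at line boundaries,
        // "g" replaces every matching line.
        new RegExp(`^.*${escapeRegExp(needle)}.*$`, "gm"),
        // A function is used so that "$" in the replacement
        // is not treated as a special replacement pattern.
        () => replacement
    );

const contents = "Introduction\nMacro Groups\nMacros";
const result = replaceLinesContaining("Macro", "REPLACED", contents);
// result is "Introduction\nREPLACED\nREPLACED"
```

Because the search string is escaped first, something like [TEST] is treated as plain text rather than as a character class.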

Disclaimer: Did I mention I'm hopeless at regex? I may have made some kind of shortsighted error that means this won't work under all circumstances. Or maybe I nailed it...? I'm not qualified to say. However, my point is that the action group can be saved as a favourite and recalled whenever you need to do this frustratingly simple task. Anyone confident with making KM plugins could make it look all neat and pretty like a single native action.

So, if you think this is a good/bad idea, let me know. Better yet, if you want to make another one, please post it below!

You haven't made an error but perhaps you've been a bit short-sighted by not implementing this as a KM Subroutine.

This is my take on the subroutine approach:

[SUB] Replace Lines Containing.kmmacros (22 KB)

While the macro to test it out looks like this:

Test [SUB] Replace Lines Containing.kmmacros (22 KB)

YMMV!

The example was just to show the logic: auto-escaping special characters, etc. Calling a Subroutine is a great way to use it, of course. What I'm mostly interested in is how much of regex you think can be automated.

Of course, this is not a solution to your problem, but I'll share my two cents. Ever since I started learning regex, I've been constantly frustrated by it. That's why I rarely use it these days.

I would suggest @complexpoint's prelude-jxa library:

RobTrew/prelude-jxa: Generic functions for macOS and iOS scripting in Javascript – function names as in Hoogle

It is a joy to use for string processing tasks. Since it is written in JavaScript, and there is a RegExp object, you can get the best of both worlds. Debugging is also much easier with a set of composable building blocks like the ones in Rob's library.

For example, to obtain a list of strings from a newline-separated string:

assuming

str = "Hello\nWorld"

the function application

lines(str)

returns

["Hello", "World"]

And if you want to uppercase the resulting list, you would lift the function toUpper to a function that operates on lists:

map(toUpper)

And apply that to the output of lines(str)

map(toUpper)(
  lines(str)  
)

to obtain

["HELLO", "WORLD"]

Then join that list back into a newline-separated string using the function unlines

unlines(
  map(toUpper)(
    lines(str)  
  )  
)

to obtain

"HELLO\nWORLD"

To run this, we would have to paste in those four functions, but Rob has shared a macro in the past that lets you select them from a KM dialog to automate the process. In full, with a self-invoking anonymous function, the code could be written as:

(() => {
    'use strict';

    const main = () => {
        const
            str = "Hello\nWorld";
        return unlines(
            map(toUpper)(
                lines(str)
            )
        )
    };

    // GENERICS ----------------------------------------------------------------
    // https://github.com/RobTrew/prelude-jxa
    // JS Prelude --------------------------------------------------
    // lines :: String -> [String]
    const lines = s =>
        // A list of strings derived from a single
        // string delimited by newline and or CR.
        0 < s.length ? (
            s.split(/[\r\n]+/u)
        ) : [];

    // map :: (a -> b) -> [a] -> [b]
    const map = f =>
        // The list obtained by applying f
        // to each element of xs.
        // (The image of xs under f).
        xs => [...xs].map(f);

    // toUpper :: String -> String
    const toUpper = s =>
        s.toLocaleUpperCase();

    // unlines :: [String] -> String
    const unlines = xs =>
        // A single string formed by the intercalation
        // of a list of strings with the newline character.
        xs.join("\n");

    return main();
})()

I think the problem is that you'd need to know enough about regex to be able to write a search string that wasn't "misinterpreted". For example, if you input [TEST] you'd need some way to differentiate between "search for the string [TEST]" and "search for one of T, E, or S". You'd not only need to know what may or may not need to be escaped in which situation, you'd have to escape it -- at which point you could probably write the regex anyway!
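
To make that concrete, here is a small JavaScript sketch of the two readings of [TEST]. The sample text is invented, and JavaScript's regex flavour is only a stand-in for Keyboard Maestro's, so treat it as an illustration rather than a KM recipe.

```javascript
// Invented sample text, just for illustration.
const sample = "SET [TEST] done";

// Unescaped: [TEST] is a character class meaning
// "one of T, E or S".
const asPattern = new RegExp("[TEST]");

// Escaped: the brackets are matched literally.
const asLiteral = new RegExp("\\[TEST\\]");

const firstPatternMatch = asPattern.exec(sample)[0]; // "S"
const firstLiteralMatch = asLiteral.exec(sample)[0]; // "[TEST]"

// A plain "contains" test needs no escaping at all.
const containsLiteral = sample.includes("[TEST]"); // true
```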

I just tested with [TEST] and it found it as a string, because this particular automation escapes all special characters. I do understand that some things are going to be far too complex, but wouldn't it be nice to have a few basic search and replace functions available without having to know any regex? Perhaps they could all be limited to pure strings to get things started?

Everything before/after/around a string
Everything between two strings
First/last n matching strings
Top/bottom n lines

Do you think these might be doable?
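
As a starting point, here is a rough JavaScript sketch of a few of these, limited to pure strings so nothing needs escaping. All of the function names are hypothetical, and "first occurrence wins" is an assumption about what "before/after" should mean.

```javascript
// before :: everything before the first occurrence of needle
// (empty string if needle is not found).
const before = (needle, s) => {
    const i = s.indexOf(needle);
    return -1 === i ? "" : s.slice(0, i);
};

// after :: everything after the first occurrence of needle
// (empty string if needle is not found).
const after = (needle, s) => {
    const i = s.indexOf(needle);
    return -1 === i ? "" : s.slice(i + needle.length);
};

// between :: everything between the first start string
// and the next end string that follows it.
const between = (start, end, s) =>
    before(end, after(start, s));

// topLines :: the first n lines of s.
const topLines = (n, s) =>
    s.split("\n").slice(0, n).join("\n");

// bottomLines :: the last n lines of s.
const bottomLines = (n, s) =>
    0 < n ? s.split("\n").slice(-n).join("\n") : "";

// between("[", "]", "x [y] z")  is "y"
// topLines(2, "1\n2\n3")        is "1\n2"
// bottomLines(2, "1\n2\n3")     is "2\n3"
```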

And because it escapes all special characters it isn't actually a regex -- you might as well use a simple "contains" string search.

As soon as you start to use specials you need a way to differentiate between their literal and special use (or ban their literal use completely).

For these, I think a dialog would be the best approach. With fixed choices the task is much simpler, as you'd be inserting the search and replace string(s) into one of your preset patterns.

Although for "Top/bottom n lines" I'd be using "Execute Shell Script" with head and tail rather than regex :wink:

My take on this very interesting idea is that regex is a red herring. How the tasks get done doesn't really matter if the task can be done efficiently and all I have to do is supply some parameters. (That's where @tiffle's mention of subroutines is the key I think.)

For me the best way to do this in Keyboard Maestro is to make custom subroutines where the bits of data get fed into the subroutine. Once I have a subroutine that works and all I have to do is supply the parameters it doesn't really matter if the subroutine is using regex, or AppleScript, some other method, or Keyboard Maestro native actions to do the processing. I would then make the Action that calls the subroutine into a Favorite - which ends up being almost the same as having a custom Action to do the task.

That's my take on this! But your idea has inspired me to make a series of subroutines to do common text tasks :grinning:

Excuse my ignorance. I'm just trying to find a quick way to match lines containing a string. The quickest way (fewest actions) I know is searching using a regular expression, but coming up with the expression itself and escaping characters is a pain.

I agree, although one thing I considered was just a means of generating the correct (don't call it regex Neil!) regex to put in a Search and Replace action. That way, you don't need five or six actions or a subroutine.

Inspired by your original macro, I tried doing the same thing with Keyboard Maestro native actions. For me (completely useless at regex), this is easier to understand and edit going forward, and it could be made into a subroutine where the variables get fed in from a single subroutine call.

EXAMPLE - Replace Lines Containing... KM Native.kmmacros (6.3 KB)

Yup. I've done the same thing before as I'm easily confused by regex.

Replacing a whole line is easy enough but how about replacing everything on that line before/after the string? I wouldn't know how to do that without regex. I mean, I don't really know how to do it with regex either, but you know what I mean. :sweat_smile:

Yes, the idea in your first post was being able to plug variables into lines of Regex, which is a really powerful thought. And if the processing is best done with Regex, that means being able to make something that someone with no Regex knowledge can use.

But (Devil's advocate) as soon as the task changes even slightly... and judging by the number of different questions about these kinds of tasks on this forum, the tasks always seem to be slightly different... :rofl:

Could you give a couple of examples of these?

I tend to need things like this when I have a big dump of data I want to extract something particular from. Let's say I have a variable list like this:

01 Album Title - Morning Dew - Final Master
02 Rough Demo - Lazy Sunday - Work In Progress - v3
03 Project Name - Round The Houses - Premix for Master
...etc...

If I want to extract the names of the projects, it seems to me that the simplest way to go about it would be to match whatever is between the first and second "-" characters. I can't think of a way to do this other than using regex. (This is a totally made up example for the sake of simplicity.)
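
For what it's worth, the regex for this need not be long. Here is a sketch in JavaScript, assuming the fields are separated by hyphens with optional spaces around them. The name secondField is hypothetical, and Keyboard Maestro's regex flavour may need small adjustments.

```javascript
// secondField :: String -> String
// (Hypothetical helper.) Whatever sits between the first
// and second "-" in a line, with surrounding spaces trimmed.
// Returns "" when the line has fewer than two hyphens.
const secondField = line => {
    // ^[^-]*   everything up to the first hyphen
    // -\s*     the hyphen and any spaces after it
    // (.*?)    the shortest run of text before ...
    // \s*-     ... optional spaces and the second hyphen
    const m = /^[^-]*-\s*(.*?)\s*-/.exec(line);
    return m ? m[1] : "";
};

const tracks = [
    "01 Album Title - Morning Dew - Final Master",
    "02 Rough Demo - Lazy Sunday - Work In Progress - v3",
    "03 Project Name - Round The Houses - Premix for Master"
];

const projectNames = tracks.map(secondField);
// ["Morning Dew", "Lazy Sunday", "Round The Houses"]
```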

Thank you for the example.

Late here, I will give some options tomorrow.

I hate to say this but what I find most difficult is writing the regular expression in the first place. In my Windows days I had a brief dalliance with RegexMagic which sometimes helped out with things. Now that's what I'd call automating the regex!!

I'll give you this full JS example to play with, but I think I would use a parser combinator for this task.

Instead of using regex, you could see it as a sequence of steps, each of which takes the output of the previous one as its input.

So, we define a function that operates on an arbitrary line (let's call it x):

const artistName = x => compose(
    strip,
    head,
    tail,
    splitOn("-")
)(x)

Reading compose from right to left, this:

  • builds a list, using "-" as the separator (splitOn("-")),
  • takes the second element of that list (the head of its tail), and
  • strips leading and trailing whitespace from that element (strip).

Finally, we map artistName over the lines of the input string, and join (unlines) the resulting list using a newline character.

unlines(
    map(artistName)(
        lines(s)
    )
)

Project names.kmmacros (6.6 KB)

JavaScript source:
(() => {
    'use strict';

    // jxaContext :: IO ()
    const jxaContext = () => {
        // main :: IO ()
        const main = () => {
            const
                s = kmVar("localParameter")
            return unlines(
                map(artistName)(
                    lines(s)
                )
            )
        };

        // FUNCTIONS --
        const artistName = x => compose(
            strip,
            head,
            tail,
            splitOn("-")
        )(x)

        // GENERICS ----------------------------------------------------------------
        // JXA FOR AUTOMATION
        // kmVar :: String -> String
        const kmVar = strVariable => {
            const
                kmInst = standardAdditions().systemAttribute("KMINSTANCE"),
                kmeApp = Application("Keyboard Maestro Engine");
            return kmeApp.getvariable(strVariable, {
                instance: kmInst
            });
        };

        // standardAdditions :: () -> Application
        const standardAdditions = () =>
            Object.assign(Application.currentApplication(), {
                includeStandardAdditions: true
            });
            
        // https://github.com/RobTrew/prelude-jxa
        // JS Prelude --------------------------------------------------
        // Tuple (,) :: a -> b -> (a, b)
        const Tuple = a =>
            // A pair of values, possibly of
            // different types.
            b => ({
                type: "Tuple",
                "0": a,
                "1": b,
                length: 2,
                *[Symbol.iterator]() {
                    for (const k in this) {
                        if (!isNaN(k)) {
                            yield this[k];
                        }
                    }
                }
            });

        // compose (<<<) :: (b -> c) -> (a -> b) -> a -> c
        const compose = (...fs) =>
            // A function defined by the right-to-left
            // composition of all the functions in fs.
            fs.reduce(
                (f, g) => x => f(g(x)),
                x => x
            );

        // findIndices :: (a -> Bool) -> [a] -> [Int]
        // findIndices :: (String -> Bool) -> String -> [Int]
        const findIndices = p =>
            xs => {
                const ys = [...xs];

                return ys.flatMap(
                    (y, i) => p(y, i, ys) ? (
                        [i]
                    ) : []
                );
            };

        // head :: [a] -> a
        const head = xs =>
            // The first item (if any) in a list.
            xs.length ? (
                xs[0]
            ) : null;

        // lines :: String -> [String]
        const lines = s =>
            // A list of strings derived from a single
            // string delimited by newline and or CR.
            0 < s.length ? (
                s.split(/[\r\n]+/u)
            ) : [];

        // map :: (a -> b) -> [a] -> [b]
        const map = f =>
            // The list obtained by applying f
            // to each element of xs.
            // (The image of xs under f).
            xs => [...xs].map(f);

        // splitOn :: [a] -> [a] -> [[a]]
        // splitOn :: String -> String -> [String]
        const splitOn = pat => src =>
            // A list of the strings delimited by
            // instances of a given pattern in s.
            ("string" === typeof src) ? (
                src.split(pat)
            ) : (() => {
                const
                    lng = pat.length,
                    [a, b] = findIndices(matching(pat))(src).reduce(
                        ([x, y], i) => Tuple(
                            x.concat([src.slice(y, i)])
                        )(lng + i),
                        Tuple([])(0)
                    );

                return a.concat([src.slice(b)]);
            })();

        // strip :: String -> String
        const strip = s =>
            s.trim();

        // tail :: [a] -> [a]
        const tail = xs =>
            // A new list consisting of all
            // items of xs except the first.
            "GeneratorFunction" !== xs.constructor
            .constructor.name ? (
                0 < xs.length ? (
                    xs.slice(1)
                ) : undefined
            ) : (take(1)(xs), xs);

        // unlines :: [String] -> String
        const unlines = xs =>
            // A single string formed by the intercalation
            // of a list of strings with the newline character.
            xs.join("\n");

        // MAIN --
        return main();
    };

    return jxaContext();
})();

Well my approach uses neither a programming language nor a regex, but simply an in-built KM facility: an array with a custom delimiter as provided by @noisneil as an example.

Here goes:

Test Automating REGEX.kmmacros (3.9 KB)

Hey Neil,

Keyboard Maestro makes this relatively simple with its array notation for text variables:

( I see @tiffle beat me to this one... :-)

Extract Names of Albums (KM) v1.00.kmmacros (7.3 KB)

And Awk was born to do such things:

Extract Names of Albums (Awk) v1.00.kmmacros (6.8 KB)

Awk can use a regex for its field-separator, so that can easily be made more fault-tolerant.
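
The same idea carries over to other languages. As a rough JavaScript sketch (not the Awk script in the macro above), splitting on a regex separator tolerates inconsistent spacing around the hyphens. The name fieldAt and the sample line are invented for illustration.

```javascript
// fieldAt :: Int -> String -> String
// (Hypothetical helper.) The nth (1-based) field of a line,
// where fields are separated by a hyphen with any amount of
// whitespace on either side. Returns "" if there is no such field.
const fieldAt = (n, line) => {
    const fields = line.split(/\s*-\s*/);
    return n >= 1 && n <= fields.length ? fields[n - 1] : "";
};

// Invented sample with uneven spacing around the hyphens.
const messy = "03 Project Name -Round The Houses  -  Premix for Master";

const projectName = fieldAt(2, messy);
// projectName is "Round The Houses"
```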

Bread and butter for AppleScript's text item delimiters:

Extract Names of Albums (AppleScript) v1.00.kmmacros (7.1 KB)

Perl just for fun (and practice):

Extract Names of Albums (Perl) v1.00.kmmacros (6.9 KB)

Finally, here's some relatively crude JavaScript for practice (and fun):

Extract Names of Albums (JavaScript) v1.00.kmmacros (7.3 KB)

-Chris
