How To Do These RegEx Replacements on the Clipboard?

ALYB · August 4, 2022, 6:37am

I'm having problems using regular expressions in Find and Replace actions on the clipboard.

Here's the crucial part of my macro:

Find and Replace in current segment - test.kmmacros (4.0 KB)

This is an input string:

1.schroef 2.moer 3.slang 4.pomp 5.motor 6.ketting 7.hendel 8.behuizing 9.vulopening 10.deksel

And this is the expected result with initial uppercase:

1. Schroef 2. Moer 3. Slang 4. Pomp 5. Motor 6. Ketting 7. Hendel 8. Behuizing 9. Vulopening 10. Deksel

And a second expected result with added linefeeds:


1. Schroef 
2. Moer 
3. Slang 
4. Pomp 
5. Motor 
6. Ketting 
7. Hendel 
8. Behuizing 
9. Vulopening 
10. Deksel

In BBEdit I use:

And:

In Keyboard Maestro I have to use $1 instead of \1, but I don't get the desired result.

What am I doing wrong here?
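
For reference, the two target transformations can be sketched in plain JavaScript, outside Keyboard Maestro. This only illustrates the desired behaviour, not the macro itself: JavaScript regexes have no \u case modifier in replacement strings, so a replacement function does the capitalising, and the names `capitalise` and `onePerLine` are made up for this sketch.

```javascript
// Sample input from the post above.
const input = "1.schroef 2.moer 3.slang 4.pomp 5.motor 6.ketting 7.hendel 8.behuizing 9.vulopening 10.deksel";

// Insert a space after each dot and uppercase the character that follows it.
const capitalise = s => s.replace(/\.(\w)/g, (_, c) => `. ${c.toUpperCase()}`);

// Turn the space before each item number into a linefeed.
const onePerLine = s => s.replace(/ (?=\d+\.)/g, "\n");

console.log(capitalise(input));
console.log(onePerLine(capitalise(input)));
```

The second step looks for a space followed by digits and a dot, so the space inserted after each dot is left alone.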

Nige_S · August 4, 2022, 8:08am

The problem isn't the regex, it's in how your expressions are getting parsed from the text variable into the action's regex fields. So hard-coding works as expected:

I don't have an answer, but at least that'll stop you chasing the wrong problem!

ComplexPoint · August 4, 2022, 9:44am

FWIW:

Initial caps for enumerated words.kmmacros (2.7 KB)


(() => {
    "use strict";

    const main = () => {
        const
            s = Application("Keyboard Maestro Engine")
            .getvariable("enumeratedWords");

        return s.split(/\s(?=\d)/u).map(nw => {
            const [n, w] = nw.split(".");

            return `${n}. ${toSentence(w)}`;
        })
        .join("\n");
    };

    // --------------------- GENERIC ---------------------

    // toSentence :: String -> String
    const toSentence = s =>
    // Sentence case - initial char capitalized
    // and rest lowercase.
        Boolean(s.length) ? (
            s[0].toUpperCase() + s.slice(1)
            .toLowerCase()
        ) : s;


    return main();
})();
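
To try the same split-and-map idea outside Keyboard Maestro (for example in Node), the variable lookup can be swapped for a plain argument. This is a minimal sketch: `formatEnumerated` is a name made up here, not part of the macro, and it assumes every item contains a dot between number and word.

```javascript
// toSentence :: String -> String
// Initial character capitalised, rest lowercase.
const toSentence = s =>
    0 < s.length ? s[0].toUpperCase() + s.slice(1).toLowerCase() : s;

// formatEnumerated :: String -> String
// Split before each number, then split each item on its dot.
const formatEnumerated = s =>
    s.split(/\s(?=\d)/u)
    .map(nw => {
        const [n, w] = nw.split(".");

        return `${n}. ${toSentence(w)}`;
    })
    .join("\n");

console.log(formatEnumerated("1.schroef 2.moer 10.deksel"));
```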

tiffle · August 4, 2022, 10:43am

As @Nige_S says, it's the way KM handles what you're entering into your Prompt for User Input action that's the problem with the case-changing action.

This is what KM presents to your Search and Replace action:

= local__Find =
\.(w)

= local__Replace =
. $1

So obviously it won't perform the capitalisation that you want.

In any case, KM provides an easy way to do that capitalisation simply by using the Filter action like this:

KM 0 2022-08-04_11-40-45

which transforms your clipboard like this:

1.schroef 2.moer 3.slang 4.pomp 5.motor 6.ketting 7.hendel 8.behuizing 9.vulopening 10.deksel

1.Schroef 2.Moer 3.Slang 4.Pomp 5.Motor 6.Ketting 7.Hendel 8.Behuizing 9.Vulopening 10.Deksel

I think this method is somewhat simpler than most any other approach.

@peternlewis - is this a bug in the Prompt for User Input action?

Typing ". \u$1" into a text field results in ". $1" being passed into the variable.

EDIT: Or is it because (as I've just read in the KM manual) the text field is by default a "token text field", so that, since "\u" is a text token, KM interprets it before passing it to the variable? If that's the reason, can a gear menu option be added to choose Process text Normally/Tokens Only/Nothing, as in the Set Variable to Text action?

Nige_S · August 4, 2022, 2:26pm

Or is it the \ characters, also special in KM text fields -- which IMO is why the search is failing, so $1 isn't populated...

There are just too many possible problems in trying to use special characters in a prompt to fill a variable for use in a regex. I think @ComplexPoint's approach is better -- if you want a user-defined regex search and replace, hand it off.

Since using AppleScript means we can leverage KME's regex engine, with all the goodies we know and love (and nothing to do with me knowing zero JavaScript!) -- here's my go at it:

Find and Replace in current segment - AS.kmmacros (4.4 KB)


Works with OP's text/requirements, but I haven't tried many other regexs -- so test thoroughly!

ComplexPoint · August 4, 2022, 7:43pm

Even without JavaScript, I still feel that divide and conquer (split and conquer) can simplify:

Splitting on space and digit using a For Each action with substrings separated by matches on \s(?=\d)
splitting on the dot between each number and word using indexed arrays with custom delimiters

Initial caps for enumerated words by KM SPLITS.kmmacros (5.2 KB)

Nige_S · August 4, 2022, 8:01pm

Yes, but that's for this particular problem. I'm reading between the lines here, but I think @ALYB ultimately wants to:

Copy some text to the clipboard
View it
Decide on and write, on the fly, a search-and-replace regex
Use the prompt to get KM to do that S&R on the clipboard contents
Paste the changed contents

I'm normally munging known formats in a consistent way, and would use your split approach. But I think OP wants to free-style, else why use a dialog?

tiffle · August 4, 2022, 10:04pm

No - there's something else going on. If you pre-fill the prompt action with the default values like this:

KM 0 2022-08-04_22-48-46

Then when you run the macro the prompt as presented to the user looks this way:

KM 1 2022-08-04_22-51-04

You can see the Find field is OK but the Replace field has been changed by KM.

If you then re-type the Find field thus:

KM 2 2022-08-04_23-13-36

and then click OK, the macro then produces this as output:

1. \u$1chroef 2. \u$1oer 3. \u$1lang 4. \u$1omp 5. \u$1otor 6. \u$1etting 7. \u$1endel 8. \u$1ehuizing 9. \u$1ulopening 10. \u$1eksel

which is obviously nonsense.

KM is processing the text in the field as token text - as stated in the manual - but it's doing this prior to the display of the dialog. After the dialog is OK'd, for some reason what is then passed to the Search & Replace action,

= local__Find =
\.(\w)

= local__Replace =
. \u$1

although correct from a regex viewpoint, is not recognised properly and so the search/replace doesn't work. So I still think I'd like @peternlewis's perspective on what's actually going on!

Nige_S · August 4, 2022, 10:46pm

Yes, but you'll also notice the search didn't work properly -- the whole word after the full stop should be the first capture group, but it's actually find/replacing the period and the next character only.

I talk a load of rubbish sometimes. \w is a word character, not word. Right track, wrong reason...
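
The difference between \w and \w+ is easy to check in JavaScript. This is a small sketch using one item from the sample text, and the variable names are only for illustration.

```javascript
const sample = "1.schroef";

// \w matches one word character, so the capture group holds a single letter.
const oneChar = sample.match(/\.(\w)/)[1];

// \w+ matches a run of word characters, so the group holds the whole word.
const wholeWord = sample.match(/\.(\w+)/)[1];

console.log(oneChar, wholeWord);
```

For the capitalisation in this thread the single-character capture is what's wanted, since only the first letter has to change.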

What I haven't got my head round is when/if backslashes are backslashes in the various fields involved, and my feeling is that this is at the heart of the problem. My knee-jerk was that "if we want to pass a literal \u to the Replace we should use \\u in a text field", but that didn't seem to work.

It's almost like text tokens aren't processed in the Search/Replace fields -- because hard-coding works literally -- unless they are inside a variable token...

At which point my brain exploded and I turned to the AppleScript workaround! What's weird is that's doing almost the same thing -- taking variables from KME then feeding them back to the KME regex engine -- but with an explicit process tokens argument.

tiffle · August 4, 2022, 11:07pm

Yes - I have noticed that now you've spelled it out for me - thanks!

Ha! I just tried that out before reading your post. Could've saved me some time if I'd waited a bit...

I think your AppleScript workaround gets as close as anything to what the OP wanted, so that's great.

Meanwhile - the land of Nod calls...

thoffman666 · August 5, 2022, 12:32am

Did you try changing the type for the prompt variables from Automatic to Text (using the dropdown)?

peternlewis · August 5, 2022, 3:31am

The default values for the Prompt for User Input action are parsing the tokens. So the \u gets processed, because \u means something to Keyboard Maestro. The \. and \w do not mean anything so they are left alone. In all cases they would be better written with the \ doubled.

The search and replace fields are also processed for text tokens, so the variables are expanded in both fields.

However, the search engine then processes the first field, so things like . and \w mean something to the search engine.

The replacement field is also processed, and the variable is expanded, but after that the processing is complete. No further processing happens, so the . \u$1 does not mean anything. Just as if the variable contains the text %Variable%Whatever% or any other token, it would not be recursively expanded.
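
A toy model in JavaScript may help picture the single pass. It is not how Keyboard Maestro is implemented: `expandOnce` and the `vars` table are invented for this sketch, and only %Variable%Name% tokens are handled.

```javascript
// Hypothetical variable table, standing in for Keyboard Maestro variables.
const vars = {
    Outer: "%Variable%Inner%",
    Inner: "hello"
};

// Expand each %Variable%Name% token once; the inserted text is not rescanned.
const expandOnce = text =>
    text.replace(/%Variable%(\w+)%/g, (token, name) =>
        name in vars ? vars[name] : token
    );

const once = expandOnce("%Variable%Outer%");
const twice = expandOnce(once);

console.log(once);  // the inner token is still there
console.log(twice); // only a second pass resolves it
```

In the same way, a \u that arrives inside an expanded variable is just text by the time the replacement is applied.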

There isn't an easy way to get Keyboard Maestro to process the tokens twice in the Search and Replace action's replacement string. One option is to use AppleScript to ask Keyboard Maestro to do the search and replace. However, since the process tokens option applies to both the search and the replace field, and you don't really want the search field to be processed, it does mean you have to quote the search field. There should be a Filter for that, but there isn't; it simply means doubling the % and \ characters, so it is pretty straightforward. Note again, though, that you have to double each of those characters in the Search & Replace so they are correctly processed.

Quoting is hard, and understanding how and when the quoting and processing happens is important to really understanding what is happening.
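
The quoting step described above (doubling the % and \ characters) is simple enough to sketch in JavaScript. `quoteForTokens` is a made-up name, and covering only those two characters is an assumption taken from the description, not a documented rule.

```javascript
// Double every % and \ so that one round of token processing
// turns them back into single literal characters.
const quoteForTokens = s => s.replace(/[%\\]/g, "$&$&");

console.log(quoteForTokens(String.raw`\.(\w)`));
console.log(quoteForTokens("100%"));
```

`String.raw` is used so the backslashes in the sample pattern are not consumed by JavaScript's own string escaping, which is one more layer of quoting to keep track of.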

Find and Replace in current segment - test.kmmacros (6.5 KB)

tiffle · August 5, 2022, 6:51am

Yes but with no effect.

tiffle · August 5, 2022, 6:57am

Thanks for the explanation Peter. This is going to take a while to digest…

Nige_S · August 5, 2022, 10:47am

Time for some strong coffee and a careful read...

Echoing @tiffle -- thanks for the thorough explanation, @peternlewis. Above and beyond, as always.

ComplexPoint · August 5, 2022, 12:19pm

And where the search and replace model tends to overload and inflate the complexity of regexes (and the escaping decisions which they entail),
the split model eases the load on regular expressions, and lets them shrink and simplify.

Keyboard Maestro gives us:

single-character splits with indexed arrays and custom delimiters,
multi-character splits with For Each (substrings separated by matches),
and fully flexible splits through a scripting language.

ALYB · August 18, 2022, 6:41am

Thank you very much, @Nige_S! I have used your solution and came up with this variant of the dialogue box:

I put in some reminders, will enhance them when I create new replacements that I want to remember.

Dialogue box:

Executed replacement:

And of course there are many simple replacements: ö>oe, ß>ss etc.

One way to quickly insert alternative Find and Replace strings would be via TextExpander:

Or does someone know an easy way to do this via Keyboard Maestro?

Demo:

And:

Nige_S · August 18, 2022, 8:03am

Do you mean "use an abbreviation in the dialog that is then expanded to a regex pattern"? The easiest way would be "Switch/Case" actions that decide, if you're using the AS macro, what to set the Local__Find and Local__Replace variables to before executing the AppleScript. So if "n" was your abbreviation for "one or more numbers" and "w" was "one or more word characters":

Switch
    Case -- Local__Find is "n"
        Set variable Local__Find to "(\d)+"
    Case -- Local__Find is "w"
        Set variable Local__Find to "(\w)+"
    Otherwise
        Comment -- do nothing using the entered string as your Find pattern
End Switch
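
If the expansion were handed off to a script instead, the same Switch/Case logic could be sketched as a lookup table in JavaScript. The `abbreviations` table and the `expandFind` name are hypothetical; the patterns are copied from the cases above.

```javascript
// Hypothetical abbreviation table, mirroring the Switch/Case cases above.
const abbreviations = new Map([
    ["n", "(\\d)+"],
    ["w", "(\\w)+"]
]);

// Return the expanded pattern, or the entered string unchanged
// (the "Otherwise" case).
const expandFind = entered => abbreviations.get(entered) ?? entered;

console.log(expandFind("n"));
console.log(expandFind("\\.(\\w)"));
```

New abbreviations then only need a new entry in the table.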
