Keyboard Maestro Regular Expression Bug? Trim Whitespace from Start and End of a String

@ccstone I'm intrigued that the look-behind assertion in

gets around the situation described by Peter:

because (?<=\S) doesn't really match anything. But I guess it does. So I was playing around with other ways to avoid the empty-string pattern. Here's a solution with a Find+Replace keystroke count of 17.

Find:

\A\s*|(.)\s*\z

Replace:

\1•

Haven't tried with Billie Holiday, Gene Krupa, or Duke Ellington, but works on your example. Also works with no trailing white space.
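
For anyone who wants to try the idea outside Keyboard Maestro, here is a minimal JavaScript sketch. It is an approximation rather than the KM action itself: JavaScript has no \A or \z, so ^ and $ (without the "m" flag) stand in for them, "$1" plays the role of \1, and the name trimWithBullets is made up for illustration.

```javascript
// Sketch only: ^ and $ (no "m" flag) stand in for \A and \z,
// which JavaScript regular expressions do not support.
const trimWithBullets = text =>
    // Alternative 1 strips leading whitespace (group 1 is empty there).
    // Alternative 2 captures the last non-blank character and drops
    // whatever whitespace follows it, so no empty match is left over.
    text.replace(/^\s*|(.)\s*$/g, "$1•");

console.log(trimWithBullets("  \n hello world \n\n")); // → •hello world•
console.log(trimWithBullets("  hello world"));         // → •hello world•
```

Both calls print the same result, which mirrors the observation that the pattern also works when there is no trailing whitespace.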

Steve

Nice.

That's much more efficient than capturing the whole body text and, from my (albeit limited) understanding of regex engines, more efficient than the lookbehind.

I think I'd change the captured metacharacter just to make it read easier (for me).

\A\s*|(\S)\s*\z

-Chris

@ccstone But…but…but that's an extra keystroke! :open_mouth: :slightly_smiling_face:

And in a real-world case, wouldn't you use Search/Replace to simply trim the whitespace from both ends (replace \A\s*|\s*\z with null) and then "manually" add the bullets to the ends of the resulting string? Seems that combining the two modifications into a single Search/Replace operation is great intellectual exercise but perhaps not great procedural design.

I'm just adding the bullets for visual confirmation of the replace.

Keyboard Maestro trims text in Display Text in a Window actions, so you have to be careful when dealing with whitespace.

You can get around this by adding quotes or other characters to the text token you're displaying:

'%Variable%LocalOutput%'

But I'm testing in:

  • BBEdit with its native find/replace.
  • BBEdit with Perl.
  • Keyboard Maestro

So I'm looking for uniformity when testing all varieties of leading and trailing whitespace.

-Chris

@ccstone But (not trying to be difficult here), if what's being tested is uniformity of whitespace truncation in various implementations of Search/Replace (and what an interesting challenge!), all the more reason to use a separate mechanism (other than Search/Replace) to visually demarcate the result. No?

Steve

Actually, I wrote the regex I posted because search and replace seemed to directly address the specific problem: Search for a possible bit of leading white space, followed by ANYTHING (which can include internal whitespace), followed by a possible bit of trailing white space, then replace all of that with everything except the leading and trailing white space. In other words, \A\s*([\s\S]*?)\s*\z (by virtue of its capture group) means "choose absolutely everything, then discard only white space at either the very beginning or very end of the document, if any exists, and keep everything else".
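
A rough JavaScript sketch of that capture-everything approach follows. The assumptions are that ^ and $ (without the "m" flag) stand in for \A and \z, that "$1" plays the role of \1, and that the helper name trimByCapture is purely illustrative.

```javascript
// Sketch only: ^ and $ (no "m" flag) stand in for \A and \z.
const trimByCapture = text =>
    // [\s\S]*? lazily captures the body; the trailing \s*$ forces the
    // capture to stop before the final run of whitespace.
    text.replace(/^\s*([\s\S]*?)\s*$/, "$1");

console.log(JSON.stringify(trimByCapture("\t hello\n  world \n\n")));
// → "hello\n  world"
```

Internal whitespace, including the line break and the indentation before "world", survives because it sits inside the captured group.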

In case it is not clear to all, especially @peternlewis, the real issue is that the KM RegEx Replace Action ALWAYS does a GLOBAL search and replace. Every other RegEx tool I have ever used allows for Search/Replace of ONLY the first match.

Request

@peternlewis, please provide an option in the Search/Replace Action to apply it to ONLY the first match.

Thanks.

That’s AN issue; it’s not THE REAL issue.

Another point well taken, @JMichaelTX, which I hadn't fully grokked until your comment. But in the context of @ccstone's original Search pattern \A\s*|\s*\z, I don't think your request would help. The pattern is of the form "X|Y", and it would only work as intended if the Y half were matched after the X half succeeded. That is, the pattern presumed a global search/replace. So in this particular case, the problem isn't that a global search/replace was being done, but that the global search for Y matches in 2 locations rather than 1, where there's some debate as to whether that second location is legit.
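
To make the "Y matches in 2 locations" point concrete, here is a small JavaScript sketch that lists every match a global search finds. It assumes ^ and $ (without the "m" flag) as stand-ins for \A and \z, and listMatches is a hypothetical helper; other engines may differ in detail.

```javascript
// Sketch only: ^ and $ (no "m" flag) stand in for \A and \z.
const listMatches = (pattern, text) =>
    // Each entry records where a match starts and what it consumed.
    [...text.matchAll(pattern)].map(m => ({ index: m.index, text: m[0] }));

const found = listMatches(/^\s*|\s*$/g, "  hi  ");
console.log(found);
// Three matches: the leading "  " at 0, the trailing "  " at 4,
// and an empty string at 6, the very end of the input.

const replaced = "  hi  ".replace(/^\s*|\s*$/g, "•");
console.log(replaced); // → •hi••
```

The third, empty match is the second "Y" location under discussion, and it is what produces the doubled bullet.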

Steve

@thoffman666 Your reply landed while I was typing mine. Did I capture what you had in mind, or are you thinking of yet another issue?

Steve

I agree with your reply, @SLWorona. More generically, I was just pointing out that @JMichaelTX's complaint about KM's flavor of regex, while valid, is not really what we have been talking about in this thread.

BTW, I just completely coincidentally ran into exactly the same issue with BBEdit.

Source:

hello
there

Regex search (.*) and replace with \1: “\1”

Result:

hello: “hello”: “”
there: “there”: “”

Using (.+) resolves the issue.

So this is a fairly generic issue revolving around matching empty strings.
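
The same behaviour can be reproduced in JavaScript, which suggests it is not specific to BBEdit or Keyboard Maestro. This is only a sketch: "$1" replaces \1, and the variable names are illustrative.

```javascript
const source = "hello\nthere";

// (.*) also matches the empty string left at the end of each line,
// so every line is replaced twice.
const withStar = source.replace(/(.*)/g, "$1: “$1”");

// (.+) needs at least one character, so the empty match cannot occur.
const withPlus = source.replace(/(.+)/g, "$1: “$1”");

console.log(withStar);
console.log(withPlus);
```

The first call prints the doubled result shown above, and the second prints one replacement per line.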

Actually it fully relates to the issue Chris @ccstone is posting.
Chris tried to provide an alternate Regex pattern.

However, this Regex pattern always works WHEN you do NOT do a GLOBAL search/replace:

SEARCH FOR:
\R*$

Example Results from Regex101.com with GLOBAL flag turned OFF:

See Regex101.com Example

image

KM Test Macro that FAILS because it always does a GLOBAL search/replace:

Example Results From KM

image

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

MACRO:   Make sure text ends with LF

-~~~ VER: 1.0    2021-06-17 ~~~
Requires: KM 8.2.4+   macOS 10.11 (El Capitan)+
(Macro was written & tested using KM 9.0+ on macOS 10.14.5 (Mojave))

DOWNLOAD Macro File:

Make sure text ends with LF.kmmacros
Note: This Macro was uploaded in a DISABLED state. You must enable before it can be triggered.


Sorry, Peter, but this is incorrect.
BBEdit always does a GLOBAL search/replace, and has the "multi-line" mode ON.

So, if you turn OFF "multi-line", and only do a find next and replace, it works as expected:

Demo_BBEdit-Replace-AN

Sorry, what?

I just did exactly what I said, so it is not incorrect.

If you do a global search and replace, as I did, to transform words exactly as I wanted, I get the same bogus behaviour.

I don't care if it has other modes or other ways of operating, the fact is it behaves in exactly the same bogus way when doing a global search and replace on the document.

Adding (?-m) makes no difference in this case.

Clicking Find, and then Replace & Find over and over again works properly, but clicking Replace All shows exactly the same bogus handling of empty matches as seen in the original post.

You are totally missing the point. If you do NOT do a global search/replace, it works fine everywhere.
So, the point is that the KM Search/Replace needs to have a "First Match" option.

I am not missing your point - I am ignoring your point as irrelevant to the discussion.

In the OP's case and in my case, the goal is a global search & replace.

It does not work if you don't do a global search & replace because it would only replace the first line (or in the OP's case, the start of the text).

You are arguing that a bogus behaviour that causes multiple replacements doesn't happen if you don't do multiple replacements, which is obvious but not helpful since the purpose in both cases is to do a global search & replace.

Even if I add a switch to the Search & Replace action to do only a single replacement, it would not have resolved the OP's problem, nor would it have resolved my problem, so this discussion is in no way an argument in favour of adding such a switch.

I don't think that is correct:

The issue is the replacement at the END of the string.
When you have GLOBAL turned on, it fails.
When GLOBAL is OFF, it works.
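
The difference between the two modes at the end of a string can be sketched in JavaScript. Several things here are assumptions rather than details from the macro: (?:\r\n|\r|\n) approximates \R, $ without the "m" flag stands in for the end of the text, the replacement is taken to be a single LF (inferred from the macro name), and the function names are hypothetical.

```javascript
// Sketch only: approximates \R*$ and assumes the replacement is one LF.
const lineBreaksAtEnd = "(?:\\r\\n|\\r|\\n)*$";

// First match only: the run of trailing line breaks becomes one LF.
const endWithLfFirstMatch = text =>
    text.replace(new RegExp(lineBreaksAtEnd), "\n");

// Global: the empty match left at the very end is replaced as well.
const endWithLfGlobal = text =>
    text.replace(new RegExp(lineBreaksAtEnd, "g"), "\n");

console.log(JSON.stringify(endWithLfFirstMatch("abc\n\n"))); // → "abc\n"
console.log(JSON.stringify(endWithLfGlobal("abc\n\n")));     // → "abc\n\n"
```

With no trailing line break at all ("abc"), both variants return "abc\n" in this sketch, so the two modes only diverge when the text already ends with at least one line break.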

I know this because Chris @ccstone and I discussed and tested this privately.
It is unfortunate that the use case he presented at the top does not make that clear.

So, rather than belabor this any further, I will post a new topic with a new use case that clearly illustrates the issue.

BTW, Peter: You seem to be using the RegEx engine provided by AppKit ("TRE"). I used that in Find Any File initially, but ran into two issues:

  1. It tends to crash, especially on binary data.
  2. It does not support many advanced regex expressions, such as "(?!^ABC$)".

I resolved all of this by using PCRE2 instead. I had to build it and include it as a library, but since then I have not had a single regex-related crash.

Well, I think there is a role for a regular expression here, but perhaps we only need a simple one?

[\r\n]+

JS Source:

(() => {
    "use strict";

    const
        txt = Application("Keyboard Maestro Engine")
        .getvariable("testDataStr");

    const main = () =>
        unlines(
            lines(txt).flatMap(x => {
                const trimmed = x.trim();

                return trimmed ? (
                    [trimmed]
                ) : [];
            })
        );


    // --------------------- GENERIC ---------------------

    // lines :: String -> [String]
    const lines = s =>
        // A list of strings derived from a single
        // string delimited by newline and or CR.
        0 < s.length ? (
            s.split(/[\r\n]+/u)
        ) : [];

    // unlines :: [String] -> String
    const unlines = xs =>
        // A single string formed by the intercalation
        // of a list of strings with the newline character.
        xs.join("\n");

    return main();
})();