Keyboard Maestro Regular Expression Bug? Trim Whitespace from Start and End of a String

Hey @peternlewis,

This behavior is quite unexpected.

Why am I getting 2 bullets at the end of the output string?

Is it a bug or a feature?

-Chris

RegEx Test ⇢ String ⇢ KM Find and Replace RegEx.kmmacros (7.8 KB)

Hi, @ccstone

Sure seems like a bug.

In the interest of being thorough, I tested with lots of different singing groups and bands, including some from the 30s and 40s. Same result.

Same result with "xyz ", by the way. And with the simple regex "\s*\z".

Steve

It appears to be an issue with the NSRegularExpression enumerateMatchesInString API.

Basically, if you search for a regex that can match 0 or more characters, it will always find an additional empty match at the end of the string.

I agree it is not desirable, and will work around it in the future, but it is not technically incorrect. \s*\z first matches all of the returns at the end of the string, and then it matches a second time, matching nothing, at the very end of the string.
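
The two matches can be reproduced outside Keyboard Maestro. This is a minimal sketch that assumes Python 3.7 or later as a stand-in engine; it is not the NSRegularExpression code Keyboard Maestro runs. Python spells end-of-string as \Z rather than \z, and the sample string is made up for illustration.

```python
import re

sample = "xyz  \n"  # hypothetical input: three letters, then trailing whitespace

# Enumerate every match of "optional whitespace, then end of string".
spans = [m.span() for m in re.finditer(r"\s*\Z", sample)]

# The first span covers the trailing whitespace; the second is empty and sits
# at the very end of the string.
print(spans)  # [(3, 6), (6, 6)]
```

Replacing each match with a bullet would therefore produce two bullets at the end, one per span.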

I will special case the enumeration and stop as soon as it matches to the end of the string.

In the meantime, you can append a \n to the end of the string (unless you know there will always be at least one) and then use \s+\z.
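
A rough sketch of that workaround, again assuming Python 3.7+ as a stand-in engine (with \Z in place of \z). The function name mark_ends and the test strings are hypothetical.

```python
import re

def mark_ends(text: str) -> str:
    """Append a newline, then require at least one whitespace character at the end."""
    padded = text + "\n"
    # \s+ cannot match the empty string, so no second match occurs at the end.
    return re.sub(r"\A\s*|\s+\Z", "•", padded)

print(mark_ends("  xyz  "))  # •xyz•
print(mark_ends("xyz"))      # •xyz•
```

The appended newline guarantees that \s+ has something to consume even when the input has no trailing whitespace.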

Even Perl does this...

#!/usr/bin/env perl -0777 -nsw

# ------------------------------
# ERROR – two bullets at end.
# ------------------------------

s!\A\s*|\s*\z!•!g;

# OR

# s!^\s*|\s*$!•!g;

# ------------------------------

# Workaround 1 (simplest).
# $_ = $_ . "\n";
# s!\A\s*|\s+\z!•!g;

# Workaround 2
# s!\A\s*!•!;
# s!\s*\z!•!;

print;
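
For comparison, here is Workaround 2 sketched in Python 3.7+ (an assumed stand-in, not part of the Perl script above). Passing count=1 plays the role of a Perl substitution without the g flag; the function name is hypothetical.

```python
import re

def mark_ends_two_pass(text: str) -> str:
    # Pass 1: replace the leading whitespace (possibly empty) exactly once.
    text = re.sub(r"\A\s*", "•", text, count=1)
    # Pass 2: replace the trailing whitespace exactly once, so the extra
    # empty match at the end of the string is never visited.
    return re.sub(r"\s*\Z", "•", text, count=1)

print(mark_ends_two_pass("  xyz  \n"))  # •xyz•
```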

BBEdit does NOT do this, but I don't know if that's a function of PCRE or if Rich has worked around the issue.

-Chris

This seems to work.

Actions (v9.2)

Keyboard Maestro Actions.kmactions (1.6 KB)

Hey Ty,

It does – well done!

So does this:

Find:

(?s)\A\s*(.+?)\s*\z

Replace:

•\1•
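
A quick way to try this pattern outside Keyboard Maestro is the sketch below, which assumes Python as a stand-in engine (\Z instead of \z) and an invented two-line sample.

```python
import re

TRIM_BODY = re.compile(r"(?s)\A\s*(.+?)\s*\Z")

sample = "  first line\nsecond line  \n"  # hypothetical input

# The lazy group keeps everything between the leading and trailing whitespace,
# including the internal newline, because (?s) lets . match newlines.
result = TRIM_BODY.sub(r"•\1•", sample)
print(repr(result))  # '•first line\nsecond line•'
```

Because the whole string is consumed by a single match, there is nothing left at the end for a second, empty match.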

And this:

Find:

\A\s*|(?<=\S)\s*\z

Replace:

:sunglasses:
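
The lookbehind version can be checked the same way. This sketch assumes Python 3.7+ as a stand-in engine and assumes the replacement is a single bullet; the test strings are made up.

```python
import re

LOOKBEHIND = re.compile(r"\A\s*|(?<=\S)\s*\Z")

inputs = ["  xyz  \n", "xyz"]  # with and without surrounding whitespace
results = [LOOKBEHIND.sub("•", text) for text in inputs]

# After the trailing whitespace is consumed, the character before the end of
# the string is whitespace, so (?<=\S) rejects the extra empty match.
print(results)  # ['•xyz•', '•xyz•']
```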

-Chris

I like the lookbehind option, Chris. You beat me by two keystrokes on the find and two keystrokes on the replace. Hopefully, this will spur readers to investigate the neatoness of regex.

Ty

In the interest of @thoffman666's "investigate the neatoness of regex" :slightly_smiling_face:, is there some reason everyone's avoiding ^ and $, as in:

(?s)^\s*(.+?)\s*$

Fewer keystrokes, seems to work.

Hey Steve,

Not really, except that multiline has to be off.

\A … \z are explicitly start-of-string and end-of-string, and you can tell this at a glance if you know.

Keyboard Maestro defaults to multi-line off, but I frequently forget this and have to stub my toe before I remember.

Your pattern works fine, but for me to tell at a glance what it's doing I'd write it like this:

(?s-m)^\s*(.+?)\s*$

Even though this is not required in Keyboard Maestro.
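
To see why the multi-line setting matters, here is a small sketch that assumes Python as a stand-in engine. The flags are passed as arguments rather than written inline, and the two-line sample is invented.

```python
import re

text = "  one\ntwo  "  # hypothetical two-line input
pattern = r"^\s*(.+?)\s*$"

# Multi-line off: ^ and $ anchor to the whole string, like \A and \z.
single = re.sub(pattern, r"•\1•", text, flags=re.DOTALL)

# Multi-line on: ^ and $ also anchor at line breaks, so each line is wrapped.
multi = re.sub(pattern, r"•\1•", text, flags=re.DOTALL | re.MULTILINE)

print(repr(single))  # '•one\ntwo•'
print(repr(multi))   # '•one•\n•two•'
```

With \A and \z the result does not depend on that setting, which is the readability argument made above.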

Making regex more readable is a good practice.  :sunglasses:

-Chris

@ccstone Notwithstanding how jarring it is to see "regex" and "readable" in close proximity, point taken.

Steve

Don't forget the shock absorber – err adverb...  :sunglasses:

@ccstone I'm intrigued that the look-behind assertion in

\A\s*|(?<=\S)\s*\z

gets around the situation described by Peter (the additional empty match at the end of the string), because (?<=\S) doesn't really match anything. But I guess it does. So I was playing around with other ways to avoid the empty-string pattern. Here's a solution with a Find+Replace keystroke count of 17.

Find:

\A\s*|(.)\s*\z

Replace:

\1•

Haven't tried with Billie Holiday, Gene Krupa, or Duke Ellington, but works on your example. Also works with no trailing white space.

Steve

Nice.

That's much more efficient than capturing the whole body text, and from my (albeit limited) understanding of regex engines more efficient than the lookbehind.

I think I'd change the captured metacharacter just to make it read easier (for me).

\A\s*|(\S)\s*\z
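
Both the (.) and (\S) variants can be compared with the sketch below. It assumes Python 3.7+ as a stand-in engine and assumes that an unset capture group expands to an empty string in the replacement, as it does in Python 3.5 and later; the names and test strings are hypothetical.

```python
import re

DOT_VERSION = re.compile(r"\A\s*|(.)\s*\Z")
NONSPACE_VERSION = re.compile(r"\A\s*|(\S)\s*\Z")

def mark_ends(pattern: re.Pattern, text: str) -> str:
    # Group 1 is unset when the \A\s* branch matches, so \1 contributes
    # nothing there and only the bullet is emitted.
    return pattern.sub(r"\1•", text)

for text in ("  xyz  \n", "xyz"):
    print(mark_ends(DOT_VERSION, text), mark_ends(NONSPACE_VERSION, text))
```

Capturing one character means the second branch can never match the empty string, so the stray match at the end of the string cannot occur.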

-Chris

@ccstone But…but…but that's an extra keystroke! :open_mouth: :slightly_smiling_face:

And in a real-world case, wouldn't you use Search/Replace to simply trim the whitespace from both ends (replace \A\s*|\s*\z with null) and then "manually" add the bullets to the ends of the resulting string? Seems that combining the two modifications into a single Search/Replace operation is great intellectual exercise but perhaps not great procedural design.
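
The two-step approach might look like this sketch, which assumes Python 3.7+ as a stand-in for the Search/Replace action; the function names are hypothetical.

```python
import re

def trim(text: str) -> str:
    # The extra empty match at the end still happens, but replacing it with
    # nothing leaves no trace.
    return re.sub(r"\A\s*|\s*\Z", "", text)

def demarcate(text: str) -> str:
    # Add the visual markers separately, after trimming.
    return "•" + trim(text) + "•"

print(demarcate("  xyz  \n"))  # •xyz•
```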

I'm just adding the bullets for visual confirmation of the replace.

Keyboard Maestro trims text in Display Text in a Window actions, so you have to be careful when dealing with whitespace.

You can get around this by adding quotes or other characters to the text token you're displaying:

'%Variable%LocalOutput%'

But I'm testing in:

  • BBEdit with its native find/replace.
  • BBEdit with Perl.
  • Keyboard Maestro

So I'm looking for uniformity when testing all varieties of leading and trailing whitespace.

-Chris

@ccstone But (not trying to be difficult here), if what's being tested is uniformity of whitespace truncation in various implementations of Search/Replace (and what an interesting challenge!), all the more reason to use a separate mechanism (other than Search/Replace) to visually demarcate the result. No?

Steve

Actually, I wrote the regex I posted because search and replace seemed to directly address the specific problem: Search for a possible bit of leading white space, followed by ANYTHING (which can include internal whitespace), followed by a possible bit of trailing white space, then replace all of that with everything except the leading and trailing white space. In other words, \A\s*([\s\S]*?)\s*\z (by virtue of its capture group) means "choose absolutely everything, then discard only white space at either the very beginning or very end of the document, if any exists, and keep everything else".
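
As a sketch of that idea, assuming Python as a stand-in engine (\Z instead of \z) and a replacement consisting of just the captured group:

```python
import re

TRIM = re.compile(r"\A\s*([\s\S]*?)\s*\Z")

def trim(text: str) -> str:
    # One match spans the whole string; only the captured middle is kept.
    return TRIM.sub(r"\1", text)

print(repr(trim("  a b\nc  \n")))  # 'a b\nc'
print(repr(trim("   ")))           # ''
```

Since [\s\S]*? may match nothing, the pattern also handles input that is empty or entirely whitespace.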

In case it is not clear to all, especially @peternlewis, the real issue is that the KM RegEx Replace Action ALWAYS does a GLOBAL search and replace. Every other RegEx tool I have ever used allows for Search/Replace of ONLY the first match.

Request

@peternlewis, please provide an option in the Search/Replace Action to apply it to ONLY the first match.

Thanks.

That’s AN issue; it’s not THE REAL issue.

Another point well taken, @JMichaelTX, which I hadn't fully grokked until your comment. But in the context of @ccstone's original Search pattern \A\s*|\s*\z, I don't think your request would help. The pattern is of the form "X|Y", and it would only work as intended if the Y half were matched after the X half succeeded. That is, the pattern presumed a global search/replace. So in this particular case, the problem isn't that a global search/replace was being done, but that the global search for Y matches in 2 locations rather than 1, where there's some debate as to whether that second location is legit.
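
The difference can be illustrated with the sketch below, which assumes Python 3.7+ as a stand-in engine; count=1 stands in for a hypothetical first-match-only option, which Keyboard Maestro's action does not have according to this thread.

```python
import re

pattern = r"\A\s*|\s*\Z"   # the "X|Y" form discussed above
text = "  xyz  "           # hypothetical input

# First match only: the X half succeeds and the Y half is never tried.
first_only = re.sub(pattern, "•", text, count=1)

# Global: X matches once, then Y matches twice (whitespace, then empty).
global_sub = re.sub(pattern, "•", text)

print(repr(first_only))  # '•xyz  '
print(repr(global_sub))  # '•xyz••'
```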

Steve
