RegEx Beginning/End-of-Line Anchors Not Working?

For some reason, my KBM's regex engine seems to have trouble with the regex beginning-of-line and end-of-line anchors, ^ and $.

Noticed it when trying to convert plain text to HTML paragraphs:

Result:

If I run the same regex s&r in TextMate, it just works:

I have tried rephrasing the regex in KBM in order to avoid ^ and $.

When I do that, it works:

However, because it doesn't grab just the line, but the line + the newline character, the replacement will not be applied to the last line of the text. A problem which I wouldn't have if ^ and $ were working.

I've also seen examples here on the forum where ^ and $ are being used in KBM regex strings. So I wonder what could be the cause of them not working for me.

I'm on KBM 8.0.3 and Mac OS Sierra 10.12.6 (16G29).

I believe KM tends to default to single-line search, whereas TextMate (I'm guessing) defaults to multi-line, which would explain this behavior. Either way, explicitly searching with the multi-line flag (?m) in KM seems to resolve the issue:

KM Regex Test.kmmacros (2.9 KB)
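
For anyone without the macro at hand, the same behavior can be reproduced outside KM. This is only a rough sketch in Python, whose re module is not the ICU engine that KM uses, though it treats ^, $ and (?m) the same way for this input. The sample text and the <p>\1</p> replacement are assumptions, since the original search and replace strings were posted as screenshots.

```python
import re

# Hypothetical sample text: three lines, no trailing newline.
text = "First paragraph.\nSecond paragraph.\nLast paragraph."

# No flags: ^ and $ anchor to the whole string and . stops at a newline,
# so nothing matches and the text comes back unchanged.
no_flag = re.sub(r"^(.*)$", r"<p>\1</p>", text)

# Workaround without anchors: consume the newline instead.
# The last line has no newline after it, so it is skipped.
workaround = re.sub(r"(.*)\n", r"<p>\1</p>\n", text)

# With (?m), ^ and $ match at the start and end of every line.
multiline = re.sub(r"(?m)^(.*)$", r"<p>\1</p>", text)

print(multiline)
```

Only the (?m) version wraps all three lines; the newline-consuming workaround leaves the last line untouched, which is the problem described in the first post.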

Thanks, @gglick, that solved it!

To be clear, the default is not "single line" but the entire text string, which may have one or more lines. So, without the (?m) flag, ^ refers to the start of the string, and $ refers to the end of the string.

For more info, see
ICU Flag Options

Control the behavior of "^" and "$" in a pattern. By default these will only match at the start and end, respectively, of the input text. If this flag is set, "^" and "$" will also match at the start and end of each line within the input text.

Take a look at the first post where the variable contains multiple paragraphs. Shouldn’t the replacement have <p> as the first characters and </p> as the very last ones?

That would handle the entire string of the input text. But it doesn’t (just tested it for myself). The only way it handles the entire string is when you cut it back to a single paragraph (just one line).

I’m referring to paragraphs to indicate newlines and avoid confusion with any soft wrap in the variable definition.

I would have expected the entire string to have been surrounded by the paragraph tag in the first example if the entire string is processed.

I'm not clear on your test case. Please post the entire test case, including source string, RegEx you used, and KM Action that you used. Probably that's your test macro. :wink:

OK, but as I said, it's identical to the first post. I just changed the value of the variable because my Latin is rusty. Simulator, as we used to say.

With this macro the text is not wrapped in tags as it should be if the "entire string of text" is handled by ^(.*)$.

I won't repeat my earlier message, but the point is no substitution takes place.

Keyboard Maestro 8.0.3 “KM Regex Test” Macro

KM Regex Test.kmmacros (3.7 KB)

The problem is with your RegEx. By default the dot character . does NOT match end of line characters, so your RegEx is NOT matched.

You need this RegEx:
(?s)^(.*)$

The (?s) flag enables it to match end of line characters.

With this, the match is made, and results are as expected:

See Regular Expressions - ICU User Guide

If set, a "." in a pattern will match a line terminator in the input text. By default, it will not. Note that a carriage-return / line-feed pair in text behave as a single line terminator, and will match a single "." in a RE pattern.
Line terminators are \u000a, \u000b, \u000c, \u000d, \u0085, \u2028, \u2029 and the sequence \u000d \u000a.

Yep. So the “entire string of text” is not actually handled by ^(.*)$. The “.” prevents that.

No, without the flag (?m), the ^ and $ still refer to the entire string, the beginning and end, respectively. What happens (matches) in between will determine whether or not the RegEx has made a match.

For example, the Regex ^Some Text at the beginning$ would also fail if the text "Some Text at the beginning" was NOT actually at the beginning of the source string. In fact, it would have to be the entire string to match.
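
A quick way to see this is a small Python sketch. Python's re module is not ICU, so treat it as an illustration only; the second line of the test string is made up for the example.

```python
import re

pattern = r"^Some Text at the beginning$"

# The pattern matches only when the text is the entire string.
whole = re.search(pattern, "Some Text at the beginning")

# A second line after the text means $ is no longer at the end of the string.
with_more = re.search(pattern, "Some Text at the beginning\nA second line")

# With (?m) the anchors apply per line, so the first line matches again.
per_line = re.search("(?m)" + pattern, "Some Text at the beginning\nA second line")

print(whole is not None, with_more is not None, per_line is not None)
```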

I have found a great way to test and learn RegEx is to use https://regex101.com/

Given input:

Line 1
Line 2
Line 3

The matches for ^(.*)$ depend on the s and m Flag Options, both of which are off by default.

  • The s (DOTALL) flag: If set, a "." in a pattern will match a line terminator in the input text. By default, it will not. Note that a carriage-return / line-feed pair in text behave as a single line terminator, and will match a single "." in a RE pattern.
  • The m (MULTILINE) flag: Control the behavior of "^" and "$" in a pattern. By default these will only match at the start and end, respectively, of the input text. If this flag is set, "^" and "$" will also match at the start and end of each line within the input text.

So a search and replace on a variable containing that input, replacing each match with "xyz", results in:

  • ^(.*)$ - fails to match. ^ matches at the start of Line 1, $ matches at the end of Line 3, . does not match end of line characters.
  • (?m)^(.*)$ - returns "xyz%Return%xyz%Return%xyz%Return%". ^ matches at the start of each line, $ matches at the end of each line, . does not match end of line characters.
  • (?s)^(.*)$ - returns "xyz". ^ matches at the start of Line 1, $ matches at the end of Line 3, . matches everything.
  • (?sm)^(.*)$ - returns "xyz". ^ can match at the start of each line and $ at the end of each line, . matches everything. Because .* is greedy, the single match that starts at Line 1 runs to the end of the string.
  • (?sm)^(.*?)$ - returns "xyz%Return%xyz%Return%xyz%Return%". ^ matches at the start of each line, $ matches at the end of each line, . matches everything. Since .*? is not greedy, each match stops at the end of the current line, the first place where $ matches.
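
The cases above can be checked outside KM with a short Python sketch. Two assumptions apply: Python's re module stands in for ICU, and the input has no trailing newline, because engines can differ in how they treat a line terminator at the very end of the text. The last pattern uses both flags with a lazy quantifier.

```python
import re

# The three sample lines, without a trailing newline.
text = "Line 1\nLine 2\nLine 3"

patterns = [
    r"^(.*)$",        # no flags
    r"(?m)^(.*)$",    # MULTILINE
    r"(?s)^(.*)$",    # DOTALL
    r"(?sm)^(.*)$",   # both flags, greedy
    r"(?sm)^(.*?)$",  # both flags, lazy
]

# Replace every match with "xyz", one pattern at a time.
results = {p: re.sub(p, "xyz", text) for p in patterns}

for pattern, result in results.items():
    print(f"{pattern!r:20} -> {result!r}")
```

With no flags the text comes back unchanged, the (?m) and lazy (?sm) patterns replace each line separately, and the greedy (?s) and (?sm) patterns replace the whole text in one match.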

Keyboard Maestro Actions.kmactions (1.6 KB)

Peter, thanks for confirming my posts.

Not to beat this to death but the issue that bothered me was the implication that the entire string would be handled by the delimiters, which I felt was confusing at best. And required qualification.

I appreciate, having written them since 1976, that regexps are tricky little things.

Keyboard Maestro’s syntax of putting options before the regexp is a new wrinkle for me anyway. So I appreciate the clarification of the syntax for flag options at least. But I wonder if it wouldn’t be better (one day) to be explicit about the options with, oh something like checkboxes.

OK, I’ve beat it to death. Sorry.

Not sure what you mean by "delimiters". The scope of ^, $, and . is standard and long-standing across all RegEx engines I have seen or used.

Again, this is standard. From Specifying Modes Inside The Regular Expression

Sometimes, the tool or language does not provide the ability to specify matching options. The handy . . . Or, the regex flavor may support matching modes that aren't exposed as external flags.

In those situations, you can add the following mode modifiers to the start of the regex.

If you insert the modifier (?ism) in the middle of the regex then the modifier only applies to the part of the regex to the right of the modifier.

This is also clearly described in the article
Regular Expressions (KM Wiki)

Search Modifiers

The ICU calls these modifiers “flag options”.

The search modifier “Pattern to Use” shown below is placed at the very beginning of the Search/Find Regular Expression box.
For example:
(?m)^\s*\d+[\t]+

I think KM's method of handling flags was one of the first things I learned about using RegEx with KM, since many of the RegEx I need/use require either or both (?mi) (multiline and case insensitive).

IAC, it is hopefully clear to all now how to use RegEx flags with KM. :smile:

You can also use the (?s:xxx) method for options, so something like:

(?m:^)((?s:.)*)(?m:$)

The flag applies only to the parts within the (non-capturing) brackets.

This allows for explicit control, and is also useful with the i (case-insensitive) flag.
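
As a rough illustration, the scoped form also works in Python 3.6 or later, whose re module accepts the same (?flags:...) syntax. It is not the ICU engine, and the test strings here are made up.

```python
import re

text = "Line 1\nLine 2\nLine 3"

# Scoped form of the pattern above: each flag covers only its own group.
scoped = re.sub(r"(?m:^)((?s:.)*)(?m:$)", "xyz", text)

# Scoped case-insensitivity: only "line" ignores case, " 2" is matched as-is.
found = re.findall(r"(?i:line) 2", "LINE 2, line 2, Line 3")

print(scoped, found)
```

Here the dot matches line terminators only inside its own group, so the greedy match swallows the whole text and the result is a single "xyz".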

By “delimiters” I mean metacharacters that delimit the actual text, which is what ^ and $ do. Sorry if I wasn’t clear.

Whether some particular regexp syntax is standard or documented or peculiar to a particular implementation isn’t what I was getting at. Sorry if I wasn’t clear about that either.

Keyboard Maestro makes an attempt with its graphical actions to make it easy for someone with a problem to craft a solution without years of experience of deep dives in documentation.

In fact, you can see this with the regexp popup menu that doesn’t require you to know about the case flag (i) because there is a “case sensitive” and “ignoring case” option.

But, as we’ve seen in this thread, there are other flags (like multiline) that can frustrate Keyboard Maestro users. Even the default of a global substitution has confused people here.

I’m not arguing against modes inside the expressions (although give me a moment) but suggesting it might be worth thinking about more explicit visual controls.

Like checkboxes for options such as global substitution, ignoring case, multiline, etc., perhaps in the gear menu (although that’s a little hidden away).

I think that addition to the user interface would help people build regexps in Keyboard Maestro with less frustration.

That’s what the discussion on this thread suggested to me. A checkbox for multiline would have made the option obviously desirable and avoided the confusion of not knowing the default behavior.

After having thought about your suggestion for a bit, I have to agree.

So, instead of this:

We would have this:

with a popup menu something like the one from RegEx101.com:

Of course the above is just a functional mock-up, not a finished UI, but I hope it illustrates the point.

This would also make it more directly comparable with the screen/UI at RegEx101.com, a great place to test and develop RegEx. I think many other RegEx apps use the syntax of /<RegEx here>/<flags here>

So, what do you think @peternlewis, is this a reasonable, doable request?

That would get my vote if I had one <g>.

I like that the field gives a quick synopsis of what’s been set and that the popup gives a fuller explanation of the options, which would really help a lot of people. And the combination makes explicit what if anything (like global) the defaults are.

The problem with this is that regex tests are used all over Keyboard Maestro (probably a hundred different places). It would be a huge amount of extra UI clutter to include flags everywhere you can use regex, and it would be equally confusing to have the regex flags somewhere and not others.

OK, granted, it is a lot, but a hundred ??
I'm not seeing near that many with this KM Wiki search:
Search for "regular expression" [Keyboard Maestro Wiki]

Some of these are probably seldom-used things, like some of the conditions.
If you just did the main Actions to start with, and then do the others as you had time, I think that would still be helpful.

Sorry, Peter, but I don't buy the clutter claim.
There's no real clutter difference (to my eye) between these two:

To some degree, yes. But you already have one big difference:
Some show choices for case sensitivity, others don't.

I'm hoping with some clever ObjC class design you could sub-class for the various differences with just minor changes. But I'm obviously just guessing, since I don't have any insight into the KM design/code.

Just my 2¢. Clearly this is NOT an urgent issue/request. We've lived with KM like it is for several years, and can continue to do so for some time until you have time to make such a change.

Just had a thought: I bet collectively we (your users) could put our heads together and come up with a KM macro that lets the user build that RegEx Flag Options. How 'bout it guys, can we do it?
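
To make the idea concrete, here is a minimal sketch of the logic such a macro would need, written in Python rather than as KM actions. The function name with_flags and its parameters are hypothetical; it simply turns three yes/no choices into the inline flag prefix discussed in this thread.

```python
def with_flags(pattern, *, ignore_case=False, multiline=False, dot_all=False):
    """Prefix pattern with an inline flag group such as (?im), if any flag is set."""
    letters = ""
    if ignore_case:
        letters += "i"
    if multiline:
        letters += "m"
    if dot_all:
        letters += "s"
    # Leave the pattern untouched when no option was chosen.
    return f"(?{letters}){pattern}" if letters else pattern


print(with_flags(r"^(.*)$", multiline=True))
```

A macro version could ask for the same three choices with a prompt and paste the resulting string into the search field.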
