Regex Alternation Operator - Finding 2 Patterns With a Single Action

Hi,

I'm hoping someone can help me figure out why this regex doesn't find both matches in one action. It finds each pattern individually, but not both together as 3 separate capture groups (it works on regex101).

I'm guessing it has something to do with the regex flavor? Or is it because I'm only looking for the LAST match of each using a lookahead, so that, combined with the alternation, it only gives me the FIRST match and then stops?

Thanks!

Pattern 1 alone:

Pattern 2 alone:

Together using Alternation:

In terms of custom delimiters (here ", ") for Keyboard Maestro variable arrays:


Custom KM Variable Array Delimiters for CSV.kmmacros (3.3 KB)
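The same idea in plain JavaScript, as a sketch rather than the attached macro: splitting one progress line on the custom delimiter ", " gives an array in which the percentage and the ETA are simply two of the elements. The sample line is hypothetical, modelled on the Transferred: lines in this log.

```javascript
// Hypothetical progress line, modelled on the "Transferred:" lines in the log.
const line = "Transferred: 5 MiB / 10 MiB, 50%, 512 KiB/s, ETA 10s";

// Splitting on the custom delimiter ", " gives one element per field:
// ["Transferred: 5 MiB / 10 MiB", "50%", "512 KiB/s", "ETA 10s"]
const fields = line.split(", ");

// JavaScript arrays count from 0, so the percentage is element 1
// and the ETA field is element 3.
const percent = fields[1];
const eta = fields[3];

console.log(percent, eta); // logs: 50% ETA 10s
```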



@ComplexPoint Thank you for the idea, but I think the images I posted are a bit misleading.

If you check the regex101 link, you'll see that the body text I'm searching is a multi-line log file whose groups are not all separated by commas.

Essentially, the log file is read every 3 seconds during a transfer, and the bottom-most % and ETA are read to feed a progress bar. On top of that, I'm monitoring the log file for any errors that show up. I could easily use a separate Regex Search action for each pattern, but I'm confused as to why it doesn't work with the alternation.

It'll help if you export and upload the actual action, assuming you can't strip down the macro enough to post a demo of the borkage -- images are all well and good but don't show all the action options.

But it looks like the first branch of the alternation is matching, so the second branch is never evaluated. Try this demo, then try it again after replacing all the ETAs in the text with ETBs so that the first branch fails:

RegEx Demo.kmmacros (3.8 KB)
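A minimal JavaScript sketch of why alternation alone can't fill all three groups at once: a single (non-global) search returns one match, and that match comes from exactly one branch, so the other branch's capture groups are left empty (undefined in JavaScript). The pattern and log text are hypothetical stand-ins, not the regex from the original macro.

```javascript
// Hypothetical log: a progress line followed by an error line.
const log = [
  "Transferred: 5 MiB / 10 MiB, 50%, 512 KiB/s, ETA 10s",
  "2024/09/17 14:22:45 ERROR : 01.wav: Failed to copy",
].join("\n");

// Hypothetical pattern: branch 1 captures percent and ETA,
// branch 2 captures the error text.
const pattern = /(\d+%).*ETA (\S+)|(ERROR[^\n]*)/;

// Without the g flag, exec() returns only the leftmost match.
const m = pattern.exec(log);

// Branch 1 matched first, so group 3 (branch 2) never took part.
console.log(m[1], m[2], m[3]); // logs: 50% 10s undefined
```

Whichever branch matches earliest in the text wins; the other branch's groups only get filled on a later match, which a first-match-only search never reaches.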

I'm pretty clueless when it comes to RegEx, but I'm guessing the difference between KM's and regex101's behaviour is that regex101 allows global (all matches) searching, while KM only returns the first match (see the Search Modifiers section of the action's Wiki page).
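In JavaScript terms, that difference is the g flag: without it, match() returns only the first match (with its capture groups); with it, match() returns every full match. This is a hypothetical illustration of the concept in JavaScript, not of Keyboard Maestro's own options, and the sample log is made up.

```javascript
// Hypothetical log with two progress lines; only the last one is current.
const log = [
  "Transferred: 1 MiB / 10 MiB, 10%, 512 KiB/s, ETA 18s",
  "Transferred: 5 MiB / 10 MiB, 50%, 512 KiB/s, ETA 10s",
].join("\n");

// Without g: the first match only, including its capture groups.
const first = log.match(/ETA (\S+)/);
console.log(first[1]); // logs: 18s

// With g: every full match (capture groups are not included).
const all = log.match(/ETA (\S+)/g);
console.log(all); // logs: [ 'ETA 18s', 'ETA 10s' ]
```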

What the familiarity of grep etc. makes us forget is that:

  1. Regular expressions prove clumsy and unwieldy (time-consuming) with search, but better adapted to splitting, and
  2. once you reframe in terms of splits, you barely need regex at all.

We obtain the percentage and ETA values from that body text with:

Nested splits.kmmacros (6.6 KB)


Or from:

Nested splits (JS).kmmacros (3.9 KB)


JS source:
return kmvar.local_Source
    .split("\n")
    .flatMap(
        s => s.startsWith("Transferred:") && s.includes(" ETA ")
            ? (() => {
                const xs = s.split(", ");

                return [`${xs[1]}, ${xs[3]}`]
            })()
            : []
    )
    .join("\n");

A quick look reveals that your regex101 example uses the flags "gm". "g" is assumed by Keyboard Maestro, but "m" is not.

Did you try adding (?m) to the front of your KM search expression(s)?
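For reference, here is what the m flag changes, sketched in JavaScript (which sets it with /m after the pattern rather than with an inline (?m)): it makes ^ and $ match at every line break instead of only at the start and end of the whole string. It does not, by itself, make a single search return more than one match. The sample text is hypothetical.

```javascript
// Hypothetical two-line log; the error sits on the second line.
const log = [
  "Transferred: 5 MiB / 10 MiB, 50%, 512 KiB/s, ETA 10s",
  "ERROR : 01.wav: Failed to copy",
].join("\n");

// Without m, ^ only matches at the very start of the string, so this fails.
const withoutM = log.match(/^ERROR.*$/);

// With m, ^ and $ also match at line boundaries, so the second line is found.
const withM = log.match(/^ERROR.*$/m);

console.log(withoutM); // logs: null
console.log(withM[0]); // logs: ERROR : 01.wav: Failed to copy
```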

@DanThomas Thanks for the input. I believe the global modifier is also not assumed by KM, and I forgot about that. I've come across this issue before and remember this post, "Feature request: RegEx search global modifier" (#2 by JMichaelTX), which explains it. I did try (?m) and it didn't make a difference, but I'm having a bit more luck with the global modifier and the For Each action, though I haven't quite solved it yet.

Another option is to use an "Execute a JavaScript for Automation" action to do the regex.

I haven't looked in detail at what you're trying to do, but if you want help in that direction let me know.
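A minimal sketch of that route, assuming the log text is already available to the script as a string (how it gets there is left out). It walks every match with matchAll and keeps only the last one for each pattern, which is the "bottom-most" behaviour wanted here. The function names lastMatch and lastMatches, the patterns, and the sample log are all hypothetical.

```javascript
// Hypothetical helper: return the LAST match of `re` in `text`, or null.
// `re` must have the g flag, because matchAll() rejects non-global regexes.
function lastMatch(text, re) {
  let last = null;
  for (const m of text.matchAll(re)) {
    last = m;
  }
  return last;
}

// Hypothetical helper: most recent percentage, ETA and error in a log string.
function lastMatches(log) {
  const progress = lastMatch(log, /(\d+%),[^\n]*ETA (\S+)/g);
  const error = lastMatch(log, /ERROR[^\n]*/g);
  return {
    percent: progress ? progress[1] : "",
    eta: progress ? progress[2] : "",
    error: error ? error[0] : "",
  };
}

// Hypothetical sample log.
const sample = [
  "Transferred: 1 MiB / 10 MiB, 10%, 512 KiB/s, ETA 18s",
  "2024/09/17 14:22:45 ERROR : 01.wav: Failed to copy",
  "Transferred: 5 MiB / 10 MiB, 50%, 512 KiB/s, ETA 10s",
].join("\n");

console.log(lastMatches(sample));
```

Two separate searches sidestep the alternation problem entirely: each pattern keeps its own groups, and a missing error simply comes back as an empty string.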

@Nige_S Thanks for this - it reminded me that I needed the global modifier.

@ComplexPoint This is good to know; I'll have to use this more often. In this particular case I only want the bottom-most percentage and ETA, but also the bottom-most error.

That won't help here -- you want to consider the whole string, not go line-by-line. And there is no global modifier for the "Search by Regular Expression" action.

Do you actually need to do it this way? If the text on regex101 is representative, it would appear that you could use a simpler pattern, anchored on the end of the string and working backwards:

Ignore me, I've realised this wasn't doing what I thought. For whatever reason -- probably my ignorance! -- the pattern I thought would give a "less greedy" match between ETA and ERROR isn't doing so.

Back to the drawing board...
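For what it's worth, a lazy quantifier doesn't change where a match starts: the engine still returns the leftmost match, and a lazy .*? only shortens what comes after that start. A common way to reach the last occurrence in a single search is the opposite, a greedy [\s\S]* prefix that runs to the end of the text and backtracks to the final ETA. A JavaScript sketch with hypothetical sample text:

```javascript
// Hypothetical log with two progress lines.
const log = [
  "Transferred: 1 MiB / 10 MiB, 10%, 512 KiB/s, ETA 18s",
  "Transferred: 5 MiB / 10 MiB, 50%, 512 KiB/s, ETA 10s",
].join("\n");

// Lazy prefix: gives up characters as early as possible,
// so this still finds the FIRST ETA.
const lazy = /[\s\S]*?ETA (\S+)/.exec(log)[1];

// Greedy prefix: [\s\S] also matches newlines, so it runs to the end of the
// text and backtracks until "ETA " fits, which lands on the LAST ETA.
const greedy = /[\s\S]*ETA (\S+)/.exec(log)[1];

console.log(lazy, greedy); // logs: 18s 10s
```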

@Nige_S The problem is that there might not always be an ERROR, so I can't anchor the regex to the ERROR. I need the most up-to-date percentage and ETA, and the most recent error if there is one.

To get the effect of a global modifier with "Search by Reg Ex...", you need to use the "For Each" action with the "Substrings" ("matching in") collection instead. Everything 'works', but the macro gets unnecessarily complicated.

The action below will 'work', but the problem is that, because of the alternation, in 'Loop 1' it writes the FTP__PercentComplete and FTP__ETA capture groups to their variables, and in 'Loop 2' it writes the FTP__Error capture group. Since 'Loop 2' is the 'other match', its FTP__PercentComplete and FTP__ETA groups are empty, so writing the FTP__Error variable also 'deletes' the FTP__PercentComplete and FTP__ETA variables.

As a result, I need three "Set Variable" actions at the beginning to clear the variables, and then more at the end to append those variables on each loop.


Versus just using two of the "Search by Regex" actions:


Thank you, everyone, for your input!
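The For Each workaround described above can be sketched in JavaScript as follows (a variant, not the actual macro; the variable names, pattern, and sample text are hypothetical): clear the three values first, then, on each match of the alternation, overwrite only the values whose groups actually took part in that match. Skipping undefined groups is what stops the error match from blanking the percentage and ETA.

```javascript
// Hypothetical log: one progress line, one error line.
const log = [
  "Transferred: 5 MiB / 10 MiB, 50%, 512 KiB/s, ETA 10s",
  "2024/09/17 14:22:45 ERROR : 01.wav: Failed to copy",
].join("\n");

// Hypothetical alternation: branch 1 gives percent + ETA, branch 2 the error.
const pattern = /(\d+%),[^\n]*ETA (\S+)|(ERROR[^\n]*)/g;

// Clear the results first (like the "Set Variable" actions at the start).
let percentComplete = "";
let eta = "";
let error = "";

// Visit every match, like For Each over the matching substrings.
for (const m of log.matchAll(pattern)) {
  // Groups from the branch that did not match are undefined: skip them
  // instead of letting them overwrite values found on an earlier pass.
  if (m[1] !== undefined) percentComplete = m[1];
  if (m[2] !== undefined) eta = m[2];
  if (m[3] !== undefined) error = m[3];
}

console.log({ percentComplete, eta, error });
```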

Yeah, I should have realised that. I goofed anyway; the pattern wasn't actually matching what I thought it was. Sorry -- I should learn to keep quiet until I'm sure...

@Nige_S Ha, no worries at all. Thank you again for your input.

MRU %, ETA, and Error (if any)

Nested splits (Most recent %- ETA- and Error – if any).kmmacros (8.4 KB)


@ComplexPoint This is very cool - and it's almost so simple that your brain wants to make it harder to understand than it is, haha.

So this goes through each line: if the line contains "Transferred:" and "ETA", it sets the first two variables from the comma-separated fields, according to their position in the line. If the line doesn't contain "Transferred:" and "ETA" but does contain "ERROR", it sets the Error variable.
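A rough JavaScript sketch of the logic just described (not the attached macro itself; the function name mostRecent and the sample lines are hypothetical): walk the lines in order, let every progress line overwrite the percentage and ETA, and let every ERROR line overwrite the error, so whatever is left at the end is the most recent of each.

```javascript
// Hypothetical helper mirroring the line-by-line description above.
function mostRecent(log) {
  let percent = "";
  let eta = "";
  let error = "";
  for (const line of log.split("\n")) {
    if (line.startsWith("Transferred:") && line.includes(" ETA ")) {
      // Fields are separated by ", ": [amounts, percent, speed, ETA].
      const fields = line.split(", ");
      percent = fields[1];
      eta = fields[3];
    } else if (line.includes("ERROR")) {
      // Keep the whole line; later lines overwrite earlier ones.
      error = line;
    }
  }
  return { percent, eta, error };
}

// Hypothetical sample log.
const sample = [
  "Transferred: 1 MiB / 10 MiB, 10%, 512 KiB/s, ETA 18s",
  "2024/09/17 14:22:45 ERROR : 01.wav: Failed to copy",
  "Transferred: 5 MiB / 10 MiB, 50%, 512 KiB/s, ETA 10s",
].join("\n");

console.log(mostRecent(sample));
```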

What if I only want the "10s" part and NOT the ETA part?
Also, what if I only want the "ERROR : ..." and everything after this on this line?

I'm assuming I'll have to resort to regex for that?


Why? Still splits.

But I'm not sure that I've yet understood which part of any ERROR line you need.

"ERROR : ..." and everything after this on this line?

sounds like the whole line (which seems an unlikely pairing with "only want")

Tell me more? Which substring do you want? (Showing an example always and inevitably works better than a description, of course.)

Right, here's a better explanation.

For the ETA, I just need the 10s, or whatever comes after ETA.

For the error I only need whatever the error is. In this case it's:

ERROR : 01.wav: Failed to copy: Put mkParentDir failed: mkdir "__For Upload" failed: findItem: failed to make FTP connection to "ftp.com:21": tls: first record does not look like a TLS handshake

So for the error, everything on that line EXCEPT this:

2024/09/17 14:22:45
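Following the "still splits" suggestion, both pieces can be cut out without a regex. A sketch in JavaScript, assuming the ETA field is exactly "ETA " followed by the value, and that the timestamp is the first two space-separated tokens of the error line, as in the example above. The function names etaValue and errorWithoutTimestamp are hypothetical.

```javascript
// Hypothetical helper: "ETA 10s" → "10s" (keep what follows the first space).
function etaValue(etaField) {
  return etaField.split(" ")[1];
}

// Hypothetical helper: drop the leading "date time" tokens, keep the rest.
// split(" ") followed by join(" ") round-trips, so spacing inside the
// error message itself is preserved.
function errorWithoutTimestamp(line) {
  return line.split(" ").slice(2).join(" ");
}

const etaField = "ETA 10s";
const errorLine =
  '2024/09/17 14:22:45 ERROR : 01.wav: Failed to copy: Put mkParentDir failed: ' +
  'mkdir "__For Upload" failed: findItem: failed to make FTP connection to ' +
  '"ftp.com:21": tls: first record does not look like a TLS handshake';

console.log(etaValue(etaField)); // logs: 10s
console.log(errorWithoutTimestamp(errorLine));
```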

Back to the regex. In my last try I was smart enough to use \R to match any line ending -- and dumb enough to forget you can't use \R in a character set. D'oh!

So, looking at this in another way -- the problem is that you want the last matches in the text, but KM's "Search with Regular Expression" only returns the first. Solution -- reverse the line ordering in the text so you search for the first matches!

Reversed Log RegEx.kmmacros (5.8 KB)
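The reversal idea, sketched in JavaScript with two separate first-match searches rather than the single combined pattern in the attached macro (the sample log and both patterns are hypothetical): once the lines are reversed, the most recent progress line and the most recent error are the first matches, and a first match is all a non-global search returns.

```javascript
// Hypothetical log; the newest line is at the bottom.
const log = [
  "Transferred: 1 MiB / 10 MiB, 10%, 512 KiB/s, ETA 18s",
  "2024/09/17 14:22:45 ERROR : 01.wav: Failed to copy",
  "Transferred: 5 MiB / 10 MiB, 50%, 512 KiB/s, ETA 10s",
].join("\n");

// Reverse the line order so the newest line comes first.
const reversed = log.split("\n").reverse().join("\n");

// First-match-only searches (no g flag) now return the most recent values.
const progress = reversed.match(/(\d+%),[^\n]*ETA (\S+)/);
const error = reversed.match(/ERROR[^\n]*/);

console.log(progress[1], progress[2]); // logs: 50% 10s
console.log(error ? error[0] : "(no error)");
```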


It seems to work with the limited text sample available, both with and without the error message. Assuming you're tailing the log file so as not to grab too many lines this should be plenty fast enough. If you are going to loop the search then remember to blank the "error" variable (Local_FTP_Error in my macro) before each search.

It still might be better to split only the relevant lines from the log text and then search on those. You might be able to split on the INFO lines and take the last or last-but-one element, but you really need to watch the log file while transfers are progressing to see how best to do it.

@Nige_S I really like this idea, but the regex isn't catching the error. When you make (ERROR[^\n]*) optional with the ? right after it, the ERROR match disappears. I took a quick look but haven't solved it.
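We can't see the exact pattern in the attached macro, but one common cause of this symptom is worth sketching (hypothetical pattern and text, in JavaScript). When the optional group follows a lazy gap, the engine is already satisfied at the point where ERROR doesn't match, so the ? takes the "nothing" option and ERROR is never searched for further on. Moving the gap inside the optional group makes the engine try to reach ERROR before it settles for an empty group.

```javascript
// Hypothetical log: progress line, then an error line.
const log = [
  "Transferred: 5 MiB / 10 MiB, 50%, 512 KiB/s, ETA 10s",
  "2024/09/17 14:22:45 ERROR : 01.wav: Failed to copy",
].join("\n");

// Gap OUTSIDE the optional group: [\s\S]*? matches zero characters, ERROR is
// not at that position, and the ? accepts an empty group, so group 2 is lost.
const broken = /ETA (\S+)[\s\S]*?(ERROR[^\n]*)?/.exec(log);

// Gap INSIDE the optional group: the engine must try to reach ERROR before
// it falls back to matching the optional group as empty.
const fixed = /ETA (\S+)(?:[\s\S]*?(ERROR[^\n]*))?/.exec(log);

console.log(broken[2]); // logs: undefined
console.log(fixed[2]); // logs: ERROR : 01.wav: Failed to copy
```

If the text contains no ERROR line, the inner group fails as a whole and the optional wrapper matches nothing, so the ETA is still returned.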