Regex Alternation Operator - Finding 2 Patterns With Single Action

DanThomas · September 20, 2024, 3:36pm

Another option is to use an "Execute a JavaScript for Automation" action to do the regex.

I haven't looked in detail at what you're trying to do, but if you want help in that direction let me know.

nok · September 20, 2024, 3:46pm

@Nige_S Thanks for this - it reminded me that I needed the global modifier.

nok · September 20, 2024, 3:50pm

@ComplexPoint This is good to know, I'll have to use this more often. In this particular case I only want the bottom-most Percentage and ETA, but also the bottom-most Error.

Nige_S · September 20, 2024, 3:59pm

That won't help here -- you want to consider the whole string, not go line-by-line. And there is no global modifier for the "Search by Regular Expression" action.

~~Do you actually need to do it this way? If the text on regex.com is representative it would appear that you could do a more simple pattern, anchored on the end of the string and working backwards:~~

Ignore me, I've realised this wasn't doing what I thought. For whatever reason -- probably my ignorance! -- the pattern I thought would give a "less greedy" match between ETA and ERROR isn't doing so.

Back to the drawing board...

nok · September 20, 2024, 4:27pm

@Nige_S The problem is that there might not 'always' be an ERROR, so I can't anchor the regex to the ERROR. I need the most up to date Percentage and ETA, and the most recent Error if there is one.

To use the Global modifier in the "Search by Reg Ex..." you need to use the "For Each" action with the "Substrings" "matching in". Everything 'works' but the macro gets unnecessarily complicated.

The Action below will "work" but the problem is that using Alternation means that during the 'Loop 1' it will write the FTP__PercentComplete and FTP__ETA capture groups to their variables, and during 'Loop 2' it will write the FTP__Error capture group - but since 'Loop 2' is the 'other match' it 'deletes' the FTP__PercentComplete and FTP__ETA variables when it writes the FTP__Error variable.

As a result I need three "Set Variable" actions at the beginning to clear the variables, and then further ones at the end that append to said variables during each loop.

Vs just using 2 of the "Search by Regex" actions:
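For anyone following along outside Keyboard Maestro, here is a rough Python sketch of both variants. It is only an illustration: the log excerpt is made up from the fragments quoted in this thread, the patterns are simplified guesses, and Python's `re` module stands in for KM's ICU engine. It shows why each alternation match fills only one branch's groups, and how two separate searches avoid the bookkeeping.

```python
import re

# Hypothetical log excerpt; the layout is assumed, not copied from a real log.
LOG = """\
2024/09/17 14:22:40 INFO  : Transferred: 1 MiB / 10 MiB, 10%, 100 KiB/s, ETA 90s
2024/09/17 14:22:45 ERROR : 01.wav: Failed to copy
2024/09/17 14:22:50 INFO  : Transferred: 5 MiB / 10 MiB, 50%, 100 KiB/s, ETA 10s
"""

# One pattern with alternation: each match fills EITHER groups 1-2 OR group 3.
ALTERNATION = re.compile(r"(\d{1,3}%),.*?ETA\s(\S+)|(ERROR[^\n]*)")

naive = {}                                          # overwritten on every match
latest = {"percent": "", "eta": "", "error": ""}    # cleared once, then guarded
for m in ALTERNATION.finditer(LOG):
    percent, eta, error = m.groups()
    # Naive write-back: the branch that did not match wipes its variables.
    naive = {"percent": percent, "eta": eta, "error": error}
    # Guarded write-back: only update what this match actually captured.
    if percent is not None:
        latest["percent"], latest["eta"] = percent, eta
    if error is not None:
        latest["error"] = error

# Two independent searches: take the last hit of each, no clearing needed.
progress_hits = re.findall(r"(\d{1,3}%),.*?ETA\s(\S+)", LOG)
error_hits = re.findall(r"ERROR[^\n]*", LOG)
last_progress = progress_hits[-1] if progress_hits else ("", "")
last_error = error_hits[-1] if error_hits else ""
```

In this sample the final alternation match is a progress line, so the naive version ends up with no error at all, which is the "deleting" effect described above.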

Thank You everyone for your input!

Nige_S · September 20, 2024, 4:31pm

Yeah, I should have realised that. I goofed anyway, the pattern wasn't actually matching what I thought it was. Sorry -- I should learn to wait until I'm sure...

nok · September 20, 2024, 5:15pm

@Nige_S oh no worries at all - thank you again for your input.

ComplexPoint · September 20, 2024, 5:40pm

MRU %, ETA, and Error (if any)

Nested splits (Most recent %- ETA- and Error – if any).kmmacros (8.4 KB)

nok · September 20, 2024, 6:02pm

@ComplexPoint This is very cool - and it's almost so simple that your brain wants to make it harder to understand than it is, haha.

So this goes through each line - if the line contains Transferred: and ETA then sets the first 2 Variables based on their sequence number in the line, separated by commas. If the line does not contain Transferred or ETA, but contains ERROR it sets the Error variable.
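My reading of the macro, written out as a small Python sketch rather than KM actions. The sample lines and the field positions (second and fourth comma-separated items) are assumptions for illustration, not taken from the macro itself.

```python
# Hypothetical log excerpt; the layout is assumed from fragments in this thread.
LOG = """\
2024/09/17 14:22:40 INFO  : Transferred: 1 MiB / 10 MiB, 10%, 100 KiB/s, ETA 90s
2024/09/17 14:22:45 ERROR : 01.wav: Failed to copy
2024/09/17 14:22:50 INFO  : Transferred: 5 MiB / 10 MiB, 50%, 100 KiB/s, ETA 10s
"""

percent = eta = error = ""
for line in LOG.splitlines():
    if "Transferred:" in line and "ETA" in line:
        # Split the line on commas and pick fields by position (assumed order).
        fields = [field.strip() for field in line.split(",")]
        percent, eta = fields[1], fields[3]
    elif "ERROR" in line:
        # Keep the whole line; later lines overwrite earlier ones.
        error = line
```

Because later lines overwrite earlier ones, the variables end up holding the most recent values. With this layout the ETA field still carries the "ETA " label and the error still carries its timestamp.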

What if I only want the "10s" part and NOT the ETA part?
Also, what if I only want the "ERROR : ..." and everything after this on this line?

I'm assuming I'll have to resort to regex for that?

ComplexPoint · September 20, 2024, 7:09pm

Why ? Still splits.

But I'm not sure that I've yet understood which part of any ERROR line you need.

"ERROR : ..." and everything after this on this line?

sounds like the whole line (which seems an unlikely pairing with "only want")

Tell me more ? Which sub-string do you want ? (Showing an example always and inevitably works better than a description, of course)

nok · September 20, 2024, 7:59pm

Right, here's a better explanation.

For the ETA I just need the 10s or whatever comes after ETA

For the error I only need whatever the error is. In this case it's:

ERROR : 01.wav: Failed to copy: Put mkParentDir failed: mkdir "__For Upload" failed: findItem: failed to make FTP connection to "ftp.com:21": tls: first record does not look like a TLS handshake

So for the Error, everything on that line EXCEPT this

2024/09/17 14:22:45

Nige_S · September 20, 2024, 8:46pm

Back to the regex. In my last try I was smart enough to use \R to match any line ending -- and dumb enough to forget you couldn't use \R in a character set. D'oh!

So, looking at this in another way -- the problem is that you want the last matches in the text, but KM's "Search with Regular Expression" only returns the first. Solution -- reverse the line ordering in the text so you search for the first matches!

Reversed Log RegEx.kmmacros (5.8 KB)


It seems to work with the limited text sample available, both with and without the error message. Assuming you're tailing the log file so as not to grab too many lines this should be plenty fast enough. If you are going to loop the search then remember to blank the "error" variable (Local_FTP_Error in my macro) before each search.
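The reversal idea in a few lines of Python, purely as a sketch. The sample text is invented, and this version runs two separate first-match searches on the reversed text rather than the single pattern used in the macro.

```python
import re

# Hypothetical log excerpt with two errors and two progress lines.
LOG = """\
2024/09/17 14:22:40 INFO  : Transferred: 1 MiB / 10 MiB, 10%, 100 KiB/s, ETA 90s
2024/09/17 14:22:42 ERROR : 00.wav: Failed to copy
2024/09/17 14:22:45 ERROR : 01.wav: Failed to copy
2024/09/17 14:22:50 INFO  : Transferred: 5 MiB / 10 MiB, 50%, 100 KiB/s, ETA 10s
"""

# Reverse the line order so the newest line comes first.
reversed_log = "\n".join(reversed(LOG.splitlines()))

# A first-match search on the reversed text now returns the most recent values.
progress = re.search(r"(\d{1,3}%),.*?ETA\s(\S+)", reversed_log)
error = re.search(r"ERROR[^\n]*", reversed_log)

latest_percent, latest_eta = progress.groups() if progress else ("", "")
latest_error = error.group(0) if error else ""  # blank when no error is logged
```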

It still might be better to split only the relevant lines from the log text and then search on those. You might be able to split on the INFO lines and take the last or last-but-one element, but you really need to watch the log file while transfers are progressing to see how best to do it.

nok · September 21, 2024, 4:54am

@Nige_S I really like this idea but the regex is not catching the error. When you make (ERROR[^\n]*) optional with the ? right after, the ERROR match disappears. I took a quick look but haven't solved it.

Airy · September 21, 2024, 5:40am

I plugged the text and the regex into the site regex101.com and learned that with the "?" there are hundreds of matches (basically every spot in the file matches) but without the "?" you get only a small number of matches.

Here's the actual explanation why the "?" isn't working:

After pondering it, that makes perfect sense. By adding the "?" you are saying that the "ERROR" component is optional, and therefore every single position in the file is a match of length=0 (it also returns the correct strings.) But when you remove the "?", it only returns strings that contain "ERROR."
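A tiny Python check of that observation, assuming the ERROR group is tested on its own with global matching (as regex101 does by default). The test string is made up, and Python's `re` may count empty matches slightly differently from ICU or regex101, so the sketch only separates empty from non-empty matches.

```python
import re

TEXT = "abc\nERROR x\n"  # made-up sample

# With the "?", the group may match nothing, so every position yields a match.
with_optional = [m.group(0) for m in re.finditer(r"(ERROR[^\n]*)?", TEXT)]
# Without the "?", only text starting with ERROR can match.
without_optional = [m.group(0) for m in re.finditer(r"(ERROR[^\n]*)", TEXT)]

empty_matches = [s for s in with_optional if s == ""]
non_empty_matches = [s for s in with_optional if s != ""]
```

The non-empty matches are the same in both cases; the optional version just adds a zero-length match at the other positions.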

ComplexPoint · September 21, 2024, 6:25am

For prefix pruning, we can split at a given character index.

See the Keyboard Maestro action Get Substring of Text ... From:

Nested splits (Pruned data from Most recent %- ETA- and Error – if any).kmmacros (9.1 KB)
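The same pruning, sketched in Python for readers without the macro to hand. It assumes the timestamp prefix always has the width of the example above plus a single space, and that the ETA field looks like "ETA 10s"; both are guesses from the samples in this thread.

```python
# Error line assembled from the example quoted earlier in the thread.
ERROR_LINE = (
    '2024/09/17 14:22:45 ERROR : 01.wav: Failed to copy: Put mkParentDir failed: '
    'mkdir "__For Upload" failed: findItem: failed to make FTP connection to '
    '"ftp.com:21": tls: first record does not look like a TLS handshake'
)
ETA_FIELD = "ETA 10s"  # assumed shape of the comma-separated ETA field

# Width of the timestamp prefix plus its trailing space (assumed fixed).
TIMESTAMP_WIDTH = len("2024/09/17 14:22:45 ")

# Drop the prefix by character index, like "Get Substring of Text ... From".
error_text = ERROR_LINE[TIMESTAMP_WIDTH:]

# One more split on whitespace leaves just the value after the ETA label.
eta_value = ETA_FIELD.split()[-1]
```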

Nige_S · September 21, 2024, 6:42pm

I'm certainly proving my original point -- I'm clueless when it comes to RegEx! Interestingly the pattern does match if, and only if, the optional ERROR is the very first word in the reversed string. Perhaps that will give someone a clue as to what's going on.

I think reversing the string might still have legs, though. Once the string is reversed you should be able to extract all the current/latest transfer info you need by grabbing everything from the start of the string to ETA\s\d+.. It'll be easier to then process that chunk to get the bits you need.

It's actually "match zero or one times. Prefer one" (emphasis mine). See Regular Expressions | ICU Documentation. That's why I thought it would work -- I think it still could, in the hands of someone smarter than me!

Airy · September 21, 2024, 7:56pm

At least once per day I feel that my presence on this website is useless because people like you are 10x smarter than me. If I left this website you could easily fill in for me.

In this case, however, the fact that one repetition is "preferred" is irrelevant because you have a 0 to infinity asterisk inside the loop that the "?" is modifying. At the beginning of the file, before the first character, we already HAVE a match because inside your brackets is an asterisk wildcard that already matches zero characters. The outer "?" cannot trump the inner asterisk. Since every position in the file will match "X*", then adding a "?" cannot force the inner asterisk to match at least one.

When it comes to regex, I'm not a genius. I have to use regex101 like most other people to interpret a regular expression. That's what I think it's saying to me in this case.

Nige_S · September 21, 2024, 8:15pm

I don't think (again -- regex newb here!) we do. Without the "optional", (ERROR[^\n]*) matches "the string ERROR followed by zero or more non-linefeed characters".

I think my wrong-headedness was that the engine would look for the whole pattern with an optional ERROR at the front, but what it is doing is seeing that the first character in the string is not an E -- the optional instantly fails but the rest of the pattern has enough slack that it still picks up the other values.

This is reinforced by the fact that when the string does start with ERROR the regex works as intended (which is what was throwing me, my string started that way -- bad testing on my part).

It isn't clear, but in the regex.com explanation the "previous token" is everything between the preceding ( and ) -- which you picked up on in your earlier post.

Airy · September 21, 2024, 8:19pm

Your argument is convincing me now, but what I did was actually plug in the sample text with the sample regexes and the results matched my conclusion, even if my logic was wrong. So now I have to reconcile the results from regex101 with your sound argument.

EDIT: upon reflection, I still believe my own argument. I guess that makes me dumb, but I accept that conclusion.

Nige_S · September 22, 2024, 7:16pm

Not dumb -- you may well be right! But here's something that shows what I think is going on.

Simplified string that starts with ERROR, my suggested pattern, works -- there are three Groups captured: regex101: build, test, and debug regex

Same string but prefixed with 1, same pattern, fails -- you can see only two Groups are captured: regex101: build, test, and debug regex

And you'll get the same if the line starts with ERROR but you add a line before it.

Which is why I think that the engine looks at the first character of the string, and if it is an E it will carry on looking for R, R, etc. But if it isn't an E (or is, but not completing to ERROR) then the first optional fails and we're straight into (?:.|\n)*?, which will match everything from the start of the string but without capturing.

What I was hoping was that the engine would first try to match the entirety of

(ERROR[^\n]*)(?:.|\n)*?(\d{1,3}%),.*?ETA\s(\d\d.)

...and only if that failed would it try

(?:.|\n)*?(\d{1,3}%),.*?ETA\s(\d\d.)

A fundamental misunderstanding on my part -- one which I'm trying to explain both for my own benefit and so that I don't propagate such wrong-headedness here.
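The two cases can be reproduced in Python as a rough check. The optional form of the pattern (with "?" after the first group) is reconstructed from the discussion, the test strings are invented, and Python's `re` is used in place of ICU, though the leftmost-match behaviour shown here is common to backtracking engines.

```python
import re

# First group made optional, as discussed above (reconstructed, not copied).
PATTERN = re.compile(r"(ERROR[^\n]*)?(?:.|\n)*?(\d{1,3}%),.*?ETA\s(\d\d.)")

# Simplified, made-up string that starts with ERROR.
STARTS_WITH_ERROR = (
    "ERROR : something failed\n"
    "Transferred: 5 MiB / 10 MiB, 50%, 100 KiB/s, ETA 10s\n"
)
# Same string prefixed with "1", so position 0 is no longer an "E".
PREFIXED = "1" + STARTS_WITH_ERROR

# The engine accepts the first position where the WHOLE pattern can match.
# At position 0 of PREFIXED the optional group matches nothing, the lazy
# (?:.|\n)*? absorbs the ERROR line, and the overall match still succeeds.
groups_at_start = PATTERN.search(STARTS_WITH_ERROR).groups()
groups_prefixed = PATTERN.search(PREFIXED).groups()
```

So the first string yields three captured groups, while the prefixed one leaves the ERROR group unset and captures only the percentage and ETA.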
