Regex Alternation Operator - Finding 2 Patterns With Single Action

I plugged the text and the regex into regex101.com and learned that with the "?" there are hundreds of matches (basically every position in the file matches), but without the "?" you get only a small number of matches.

Here's the actual explanation of why the "?" isn't working:

After pondering it, that makes perfect sense. By adding the "?" you are saying that the "ERROR" component is optional, and therefore every single position in the file is a match of length 0 (it also returns the correct strings). But when you remove the "?", it only returns strings that contain "ERROR".
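To see this outside regex101, here is a minimal sketch in Python. Python's `re` module is standing in for the ICU engine that Keyboard Maestro uses, and the sample text is made up, so treat it as an illustration of the idea rather than a reproduction of the original file.

```python
import re

# Made-up sample log; the real file in this thread is not shown.
text = "10%, ETA 05s\nERROR: disk full\n20%, ETA 03s\n"

# With the "?" the whole group is optional, so an empty match
# succeeds at every position where ERROR does not start.
with_q = [m.group(0) for m in re.finditer(r"(ERROR[^\n]*)?", text)]

# Without the "?" only real ERROR lines match.
without_q = [m.group(0) for m in re.finditer(r"(ERROR[^\n]*)", text)]

print(len(with_q), "matches with '?', of which", with_q.count(""), "are empty")
print(without_q)
```

The list built with the "?" still contains the ERROR line, but it is buried among zero-length matches, which is the "hundreds of matches" effect described above.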

For prefix pruning, we can split at a given character index.

See the Keyboard Maestro action Get Substring of Text ... From:

Nested splits (Pruned data from Most recent %- ETA- and Error – if any).kmmacros (9.1 KB)
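For anyone who wants to try the idea outside Keyboard Maestro, a rough Python sketch of prefix pruning by character index follows. The helper name `prune_prefix` and the sample log are hypothetical, and this is not a description of what the attached macro does internally.

```python
def prune_prefix(text: str, index: int) -> str:
    """Drop everything before `index` and keep the rest."""
    return text[index:]

# Made-up log: we want only the most recent line.
log = "old line\n10%, ETA 05s\n20%, ETA 03s\n"

# Find the newline that precedes the last line (ignoring the trailing one),
# then cut just after it.
cut = log.rfind("\n", 0, len(log) - 1) + 1
latest = prune_prefix(log, cut)

print(repr(latest))
```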

I'm certainly proving my original point -- I'm clueless when it comes to RegEx! Interestingly the pattern does match if, and only if, the optional ERROR is the very first word in the reversed string. Perhaps that will give someone a clue as to what's going on.

I think reversing the string might still have legs, though. Once the string is reversed you should be able to extract all the current/latest transfer info you need by grabbing everything from the start of the string to ETA\s\d+.. It'll be easier to then process that chunk to get the bits you need.

It's actually "match zero or one times. Prefer one" (emphasis mine). See Regular Expressions | ICU Documentation. That's why I thought it would work -- I think it still could, in the hands of someone smarter than me!
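A small sketch of what "prefer one" means in practice, again using Python's `re` as a stand-in for ICU (an assumption on my part; the sample strings are invented). The preference only applies at the position the engine is currently trying: if the group can match there it will, otherwise zero repetitions is still a perfectly good match.

```python
import re

optional_error = re.compile(r"(ERROR[^\n]*)?")

# ERROR is at the current position: the "?" prefers one repetition.
at_start = optional_error.match("ERROR: disk full\n42%")

# ERROR is later in the string: at position 0 the group cannot match,
# so the "?" settles for zero repetitions and the match is empty.
not_at_start = optional_error.match("42%\nERROR: disk full")

print(repr(at_start.group(1)))      # 'ERROR: disk full'
print(repr(not_at_start.group(1)))  # None
print(repr(not_at_start.group(0)))  # ''
```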

At least once per day I feel that my presence on this website is useless because people like you are 10x smarter than me. If I left this website you could easily fill in for me.

In this case, however, the fact that one repetition is "preferred" is irrelevant because you have a zero-to-infinity asterisk inside the group that the "?" is modifying. At the beginning of the file, before the first character, we already HAVE a match because inside your parentheses is an asterisk wildcard that already matches zero characters. The outer "?" cannot trump the inner asterisk. Since every position in the file will match "X*", adding a "?" cannot force the inner asterisk to match at least one character.

When it comes to regex, I'm not a genius. I have to use regex101 like most other people to interpret a regular expression. That's what I think it's saying to me in this case.

I don't think (again -- regex newb here!) we do. Without the "optional", (ERROR[^\n]*) matches "the string ERROR followed by zero or more non-linefeed characters".

I think my wrong-headedness was that the engine would look for the whole pattern with an optional ERROR at the front, but what it is doing is seeing that the first character in the string is not an E -- the optional instantly fails but the rest of the pattern has enough slack that it still picks up the other values.

This is reinforced by the fact that when the string does start with ERROR the regex works as intended (which is what was throwing me, my string started that way -- bad testing on my part).

It isn't clear, but in the regex101.com explanation the "previous token" is everything between the preceding ( and ) -- which you picked up on in your earlier post.

Your argument is convincing me now, but what I did was actually plug in the sample text with the sample regexes, and the results matched my conclusion, even if my logic was wrong. So now I have to reconcile the results from regex101 with your sound argument.

EDIT: upon reflection, I still believe my own argument. I guess that makes me dumb, but I accept that conclusion.

Not dumb -- you may well be right! But here's something that shows what I think is going on.

Simplified string that starts with ERROR, my suggested pattern, works -- there are three Groups captured: regex101: build, test, and debug regex

Same string but prefixed with 1, same pattern, fails -- you can see only two Groups are captured: regex101: build, test, and debug regex

And you'll get the same if the line starts with ERROR but you add a line before it.

Which is why I think that the engine looks at the first character of the string, and if it is an E it will carry on looking for R, R, etc. But if it isn't an E (or is, but not completing to ERROR) then the first optional fails and we're straight into (?:.|\n)*?, which will match everything from the start of the string but without capturing.
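The same behaviour can be reproduced with a short Python sketch. The sample strings here are invented (the regex101 links hold the real ones), and Python's `re` is assumed to behave like ICU for this pattern.

```python
import re

# The suggested pattern, with the ERROR group made optional.
pattern = re.compile(r"(ERROR[^\n]*)?(?:.|\n)*?(\d{1,3}%),.*?ETA\s(\d\d.)")

starts_with_error = "ERROR: disk full\n42%, ETA 05s\n"
prefixed = "1" + starts_with_error

m1 = pattern.search(starts_with_error)
m2 = pattern.search(prefixed)

# String starts with ERROR: all three groups are captured.
print(m1.groups())  # ('ERROR: disk full', '42%', '05s')

# String starts with "1": the optional group matches zero times at
# position 0, and (?:.|\n)*? swallows the ERROR line without capturing it.
print(m2.groups())  # (None, '42%', '05s')
```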

What I was hoping was that the engine would first try to match the entirety of

(ERROR[^\n]*)(?:.|\n)*?(\d{1,3}%),.*?ETA\s(\d\d.)

...and only if that failed would it try

(?:.|\n)*?(\d{1,3}%),.*?ETA\s(\d\d.)

A fundamental misunderstanding on my part -- one which I'm trying to explain both for my own benefit and so that I don't propagate such wrong-headedness here.
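If the "try the full pattern first, then fall back" behaviour is what you want, one way to get it is to run the two patterns as two separate searches. This is only a sketch in Python, not something Keyboard Maestro does for you in a single action; the function name `extract` and the sample strings are hypothetical.

```python
import re

# The two patterns quoted above, tried in order.
WITH_ERROR = re.compile(r"(ERROR[^\n]*)(?:.|\n)*?(\d{1,3}%),.*?ETA\s(\d\d.)")
WITHOUT_ERROR = re.compile(r"(?:.|\n)*?(\d{1,3}%),.*?ETA\s(\d\d.)")

def extract(text):
    """Return (error, percent, eta); error is None if no ERROR line was found.

    Returns None if neither pattern matches.
    """
    m = WITH_ERROR.search(text)
    if m:
        return m.group(1), m.group(2), m.group(3)
    m = WITHOUT_ERROR.search(text)
    if m:
        return None, m.group(1), m.group(2)
    return None

print(extract("1ERROR: disk full\n42%, ETA 05s\n"))
print(extract("10%, ETA 09s\n"))
```

Because the first search is not anchored to the start of the string, it finds ERROR wherever it occurs, and the second pattern is only consulted when there is no ERROR at all.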