Please help me understand \R newline in RegEx

I have read the lengthy discussion of "The %Return% token" in the Wiki section here, and I'm pretty sure I understand the differences among \n, \r and \R. However, I have a macro that searches a clipboard and behaves in a way that I can't figure out, so perhaps someone can help explain.

Here is the step in the macro that works

And here is the relevant part of the clipboard, which I have pasted into regex101 so that it appears as "TEST STRING"

In regex101 I can use the regex /(\d+)\R (or replace the \R with either \r or \n) and get the result I want (the number following the /). However, in KM I can only use /(\d+) (as in the first screenshot). If I append \R or \r or \n in KM, the search fails. The fix is obviously to omit the return token, but I would like to understand why this happens so that in future, when I may need the return token, I'll know how to use it properly.

thanks

Dave

Hey Dave,

When asking questions such as this, please provide a downloadable test-case macro that demonstrates your issue as simply as possible.

It's always better to test than to guess.

\n == a linefeed
\r == a carriage return character
\R == a linebreak

Line Breaks

\R is a special escape that matches any line break, including Unicode line breaks. What makes it special is that it treats CRLF pairs as indivisible. If the match attempt of \R begins before a CRLF pair in the string, then a single \R matches the whole CRLF pair. \R will not backtrack to match only the CR in a CRLF pair. So while \R can match a lone CR or a lone LF, \R{2} or \R\R cannot match a single CRLF pair. The first \R matches the whole CRLF pair, leaving nothing for the second one to match.

Or at least, that is how \R should work. It works like that in JGsoft V2, Ruby 2.0 and later, Java 8, and PCRE 8.13 and later. Java 9 introduced a bug that allows \R\R to match a single CRLF pair. PCRE 7.0 through 8.12 had a bug that allows \R{2} to match a single CRLF pair. Perl has a different bug with the same result.

Note that \R only looks forward to match CRLF pairs. The regex \r\R can match a single CRLF pair. After \r has consumed the CR, the remaining lone LF is a valid line break for \R to match. This behavior is consistent across all flavors.
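To see those three cases side by side, here is a rough sketch in Python. Python's built-in `re` module has no `\R` escape, so the `R` pattern below is my own approximation of it (an assumption, not how any engine implements `\R`); the `matches` helper is likewise just for illustration. Keyboard Maestro's own engine is not involved here.

```python
import re

# Approximation of \R for Python's built-in re module, which has no \R escape.
# The negative lookahead refuses a lone-CR match when that CR starts a CRLF pair,
# mimicking the "indivisible CRLF" behaviour described in the quote.
R = r"(?:\r\n|(?!\r\n)[\n\x0b\x0c\r\x85\u2028\u2029])"


def matches(pattern: str, text: str) -> bool:
    """Return True if the pattern matches the whole of text."""
    return re.fullmatch(pattern, text) is not None


crlf = "\r\n"

one_break = matches(R, crlf)              # one \R consumes the whole CRLF pair
two_breaks = matches(R + R, crlf)         # nothing is left for a second \R
cr_then_break = matches(r"\r" + R, crlf)  # \r takes the CR, \R takes the lone LF
lone_cr = matches(R, "\r")
lone_lf = matches(R, "\n")

print(one_break, two_breaks, cr_then_break, lone_cr, lone_lf)
# prints: True False True True True
```

The lookahead is what stands in for "will not backtrack to match only the CR": without it, the second alternative would happily match the CR alone and `R + R` would match a single CRLF pair.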

Regex Tutorial - Non-Printable Characters

-Chris


Thanks Chris. I have just made a simple test macro to demonstrate what I mean for upload... and in so doing have figured out the problem... it wasn't that the RegEx was failing to find the newline character, it was that my clipboard didn't have one! I had missed a "trim" statement earlier in the calling macro.
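For anyone who lands here with the same symptom, this is roughly what was going on, sketched in Python rather than KM. The sample text is made up (the real clipboard isn't shown in this thread), and the line-break alternation is a simplified stand-in for \R, since Python's `re` module doesn't support that escape.

```python
import re

# Hypothetical clipboard text; the real clipboard contents are not shown in the thread.
raw = "Item 3/27\n"
trimmed = raw.strip()  # what an earlier "trim" step would leave behind

# Simplified line-break alternation standing in for \R (Python's re has no \R).
pattern = re.compile(r"/(\d+)(?:\r\n|\r|\n)")

with_break = pattern.search(raw)
without_break = pattern.search(trimmed)

# repr() makes an otherwise invisible trailing line break visible.
print(repr(raw), with_break.group(1) if with_break else None)
print(repr(trimmed), without_break.group(1) if without_break else None)
```

Printing the `repr()` of the text before blaming the regex is the quick check here: if there is no `\n` or `\r` at the end, a pattern that requires a line break has nothing to match.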

I should have heeded the advice from @jonathonl

... however I report here that my real cat is suitably unimpressed

Dave


No matter how smart and experienced we are, PEBKAC can never be vanquished entirely...

The best we can hope for is to reduce the frequency with which it bites us.

:sunglasses:
