Keyboard Maestro Regex Won't Allow Squeezing Blank Lines

scrutinizer · July 8, 2024, 5:33am

Squeezing – Deduping.

Both the source text and the pattern seem to be pretty straightforward. The source text is of the form (I've numbered the lines for easier understanding; they're not meaningful):

#0: A line of text
#1
#2: Another line of text
#3:
#4:
#5: Another line of text
#6:
#7: Yet another line of text
#8:
#9:
#10: Your boring line of text.
#11: 
#12: More of the same.
#13
#14: Will this never end?
#15
#14: Keeps giving.

I want to remove duplicate contiguous blank lines to make the formatting uniform, using the Search and Replace action.

My neat and terse regular expression is as follows:

Search for (?m)(?-s)(^\s*\n)\1 and
Replace with $1

No matter what combination of characters and flags I use, KM cannot find the pattern. I confirmed that the regex works on Regex101 and in the Mac app RegExRX by Mactechnologies.

Regex101:

RegExRX:

However, when I feed KM the same expression, I get either empty capture variables (i.e., undefined) or the entirety of the target text. The crux of the problem is that there's no way to tell what the KM regex engine treats as a line ending. Is it "$", or some mixture of "\r" and "\n"?
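
One way to take the guesswork out of the line-ending question is to count the endings in the text before it reaches the regex. The following is a minimal Python sketch, not something from the thread; the function name `count_line_endings` and the sample string are made up for illustration.

```python
import re
from collections import Counter

def count_line_endings(text: str) -> Counter:
    """Count CRLF, lone CR, and lone LF line endings in text."""
    names = {"\r\n": "CRLF", "\r": "CR", "\n": "LF"}
    # The alternation tries \r\n first, so a CRLF pair is counted once
    # rather than as a CR followed by an LF.
    return Counter(names[m.group()] for m in re.finditer(r"\r\n|\r|\n", text))

# Hypothetical sample mixing all three kinds of line ending.
sample = "one\r\ntwo\rthree\nfour\n"
print(count_line_endings(sample))
```

If the counts show anything other than plain LF, a pattern that hard-codes `\n` may fail to match even though it works on text pasted into a web tester.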

RegExRX utilizes PCRE 8.33, but the version of the app (1.8) I use on one of my older machines was released in 2013. Regex101 indicates PCRE2 and PCRE, which translates to PCRE 10.43 (2024) and PCRE 8.45 (2021). The EOL of Keyboard Maestro 6 was in 2015, two years later than RegExRX.

You'd think that KM would catch up. What PCRE iteration does it conform to?

Even if differences between PCRE versions do matter here, shouldn't backwards compatibility be a given? The stable core of the syntax covers the overwhelming majority of scenarios.

Most importantly, is this action viable for accomplishing the simple tasks I've shared?

griffman · July 8, 2024, 5:55am

I took a stupid simple approach, and it seems to work with your test data:

Download Macro(s): remove blanks.kmmacros (3.2 KB)

Macro notes

Macros are always disabled when imported into the Keyboard Maestro Editor.
- The user must ensure the macro is enabled.
- The user must also ensure the macro's parent macro-group is enabled.

System information

macOS 14.5
Keyboard Maestro v11.0.3

Basically, I just look for lines that begin with whitespace, and remove them when found. Running on your test data (without line numbers), it spits this out:

Note that this is a very simplistic solution and would probably not work for all combinations of possible things that may appear to be blank lines. But it worked with your sample data set.

-rob.

Airy · July 8, 2024, 7:44am

Did you try using the KM Filter action to make sure you have the correct type of CR's? That's the first thing that I would check.

Airy · July 8, 2024, 7:59am

@griffman is a much better programmer than me, but when you said "I want to remove duplicate contiguous blank lines" I interpreted that as only wanting duplicate blank lines removed, not single occurrences of blank lines. If I'm right, then there's an amazing way to solve this problem. You may not believe this, but there's a built-in shell command to do exactly what you want:

The above action gives you uniform, single-spaced lines, like this:

But if you wanted all blank lines removed, as griffman seems to think, then this may be the easiest way:

Of course, I've been making some mistakes the last few days, so maybe I'm mistaking what you want.

ComplexPoint · July 8, 2024, 8:01am

Regular expressions (as usual) seem to be contributing more problems than solutions here.

For each, skipping blank lines, appending two %LineFeed% to each line:

All lines separated by single blank lines.kmmacros (3.9 KB)
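
The macro's actions are not reproduced in this text, so here is a rough Python sketch of the approach as described: walk the lines, skip the blank ones, and append two linefeeds to each line that is kept. The function name is hypothetical, and treating whitespace-only lines as blank is an assumption.

```python
def separate_with_single_blank(text: str) -> str:
    """Drop blank lines, then follow every remaining line with two linefeeds."""
    kept = []
    # splitlines() splits on \n, \r, and \r\n (and a few rarer separators),
    # so the incoming line-ending style does not matter.
    for line in text.splitlines():
        if line.strip():          # assumption: whitespace-only counts as blank
            kept.append(line + "\n\n")
    return "".join(kept)

print(separate_with_single_blank("a\n\n\nb\r\n \nc"))
```

Note that this also inserts a blank line between lines that were adjacent in the input, which matches the macro's name but may or may not be what the original post wanted.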

Nige_S · July 8, 2024, 11:26am

Unless you've missed something out for the sake of brevity, it appears that ~~those options~~ the second option achieves nothing. ~~(?m) is pointless in that it changes the behaviour of ^ and $, but you aren't using anchors in your pattern.~~ (Apologies -- just spotted you are using ^, but it's inside your capture group so I didn't see it first time round.) (?-s) turns off "dot matches all characters including line breaks" -- but that's default behaviour anyway, plus you never use the . in your pattern.

There appears to be no difference between your pattern and the simpler (^\s*\n)\1 on regex101.

If you want "one, and only one, blank line between each line of text, nothing else changes" it would be much simpler to manipulate the linefeed/return characters:
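
The action that followed is not preserved here. One possible reading of the idea, sketched in Python under stated assumptions: collapse every run of line-break characters into exactly two linefeeds. Python's built-in `re` module has no `\R` token, so the alternation `\r\n|\r|\n` stands in for it; the function name is made up.

```python
import re

# Stand-in for \R+ : one or more consecutive line breaks of any common kind.
LINE_BREAK_RUN = re.compile(r"(?:\r\n|\r|\n)+")

def one_blank_line_between(text: str) -> str:
    """Replace each run of line breaks with two linefeeds (one blank line)."""
    return LINE_BREAK_RUN.sub("\n\n", text)

print(one_blank_line_between("a\r\n\r\n\r\nb\nc\r"))
```

This sketch treats a line containing only spaces as text, so such lines would need to be stripped first if they should count as blank.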

DanThomas · July 8, 2024, 11:39am

This.

And thank you for the \R token - I didn't know there was such a thing.

Nige_S · July 8, 2024, 11:41am

I keep forgetting about it, but every time I use [\r\n] in the Forum someone will step up to remind me. Their efforts are finally paying off!

griffman · July 8, 2024, 11:48am

I really need to read more :). Sorry for misinterpreting the request, glad others sorted me out!

-rob.

DanThomas · July 8, 2024, 11:51am

I make that kind of mistake all the time, and it's because I want to be helpful, but I don't want to spend all day reading things. So I skim over the request, missing something obvious.

That's what they made "D'OH!" for. ;p

scrutinizer · July 8, 2024, 3:29pm

You were correct. That's exactly what I wanted. I had never taken the time to look into cat. I ran my text through it and got the desired result.
Nevertheless, even if the capture-group approach is more complex, it should've worked.
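
Airy's exact command is not preserved in this text; the squeezing option of cat is `-s`, so that is presumably what was used. For readers who want to see the logic spelled out, here is a minimal Python sketch of the same squeezing behavior. The function name is hypothetical, and, like `cat -s`, it only treats completely empty lines as blank.

```python
def squeeze_blank_lines(text: str) -> str:
    """Collapse each run of empty lines into a single empty line."""
    out = []
    previous_blank = False
    # keepends=True preserves each line's own terminator in the output.
    for line in text.splitlines(keepends=True):
        blank = line.rstrip("\r\n") == ""
        if blank and previous_blank:
            continue              # second or later empty line in a run
        out.append(line)
        previous_blank = blank
    return "".join(out)

print(squeeze_blank_lines("a\n\n\n\nb\n\nc\n"))
```

Unlike the backreference pattern, this reduces a run of any length to one blank line, and a line holding only spaces is left alone.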

scrutinizer · July 8, 2024, 3:57pm

On Regex101, there's always little to no difference with valid expressions since its developer adopted an all-encompassing approach. Regex101 is the Rolls-Royce of pattern matching, and so is RegExRX, released in 2013. I have never had even a fraction of the issues on Regex101 that I often have in other applications, KM first and foremost.
Unfortunately, there was a difference in the output depending on the flags. Reading the documentation for ICU, which KM relies on, wasn't useful at all.

That failed too.

Nige_S · July 8, 2024, 4:27pm

Failed how, exactly?

It's not clear from your OP what the desired result is, nor what should happen with 3, 4, or more "blank" lines. And does a "blank" line include one with spaces, like your line 12, and if so is that a duplicate of a line without spaces?

DanThomas · July 8, 2024, 4:28pm

I guarantee this will work, although it's 2 actions, not one:

Assuming you want it delimited with "\n".

ComplexPoint · July 8, 2024, 4:29pm

To avoid wasting your own time and everybody else's – you always need to show:

Input,
expected corresponding output, and
your draft macro.

Nige_S · July 8, 2024, 4:57pm

Weirdly, your exact pattern is now working for me -- I swear it wasn't before...

Input:

1. A line of text
2. A second line

4. Another line of text


7. Another line of text

9. Yet another line of text



13. Your boring line of text.

15. More of the same.

17. Will this never end?

19. Keeps giving.

Expected output:

1. A line of text
2. A second line

4. Another line of text

7. Another line of text

9. Yet another line of text


13. Your boring line of text.

15. More of the same.

17. Will this never end?

19. Keeps giving.

A line should be removed between 4. and 7. -- 2 consecutive empty lines reduced to 1. And only one line should be removed between 9. and 13. -- 3 lines are reduced to 2 because 11 is removed as a duplicate of 10, but 12 remains because there's no blank line after it to match on.

I think that's the result you're aiming for.
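
The input and expected output above can be checked outside KM with a short Python sketch. Two caveats: Python's `re` is not the ICU engine that KM uses, so this only tests the logic of the pattern, and the `(?-s)` prefix is left out because, as far as I know, Python's `re` only accepts flag removal in the scoped `(?-s:...)` form (the pattern has no dot anyway). The function name is made up.

```python
import re

# The pattern from the original post, minus (?-s): a blank line captured
# as group 1, immediately followed by an identical copy of itself.
PATTERN = re.compile(r"(?m)(^\s*\n)\1")

def dedupe_blank_pairs(text: str) -> str:
    """Replace each doubled blank line with one copy (the $1 replacement)."""
    return PATTERN.sub(r"\1", text)

sample_lines = [
    "1. A line of text", "2. A second line", "",
    "4. Another line of text", "", "",
    "7. Another line of text", "",
    "9. Yet another line of text", "", "", "",
    "13. Your boring line of text.", "",
    "15. More of the same.", "",
    "17. Will this never end?", "",
    "19. Keeps giving.",
]
sample = "\n".join(sample_lines) + "\n"
result = dedupe_blank_pairs(sample)
print(result)
```

Because each match consumes a pair of blank lines, a run of three is left with two, as described above; a single pass is not a full squeeze.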

Macro:

Format lines.kmmacros (3.5 KB)
