Replacing Duplicates Using RegEx

Next up: Peter and his solution using Perl


:sunglasses:

You can do the job with a one-liner, but getting it to be fault-tolerant takes more work.

With this script I wanted to be sure to leave items in their original order.

There's a very slick way of removing duplicates using a hash, but it reorders items according to its own whims. That makes it useless unless paired with a sort routine (IMO).
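As a rough illustration of that trade-off, here is a minimal Python sketch (Python rather than Perl; the function names are hypothetical). Collecting the lines into a set is the one-step "slick" dedupe, but a set, like Perl's hash keys, does not keep the input order, so a companion sort is needed to put the unique lines back in order of first appearance.

```python
def dedupe_unordered(lines):
    # The "slick" way: a set drops duplicates, but its iteration
    # order is not the input order.
    return list(set(lines))


def dedupe_resorted(lines):
    # The companion sort: order the unique lines by where each one
    # first appears. lines.index is O(n) per call, fine for short lists.
    return sorted(set(lines), key=lines.index)


print(dedupe_resorted(["pear", "apple", "pear", "fig", "apple"]))
```

In Python specifically, dicts keep insertion order (3.7 and later), so `list(dict.fromkeys(lines))` gives the same order-preserving result without a separate sort.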

-Chris


Remove Duplicate Lines Using Perl v1.00.kmmacros (6.9 KB)


Hey Folks,

I figured out how to do this much more compactly.

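```perl
#!/usr/bin/env perl
use strict;
use warnings;

my %lines;   # line text => how many times it has been seen

while (<>) {
   # Capture the line without its trailing linefeed; blank lines don't match.
   # \S.* (rather than \S.+) also keeps one-character lines.
   # Lines that begin with whitespace are still skipped.
   if ( m!^(\S.*)! ) { print if not $lines{$1}++; }   # print first occurrence only
}
```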

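This does use a hash, but not in the way I referred to above: the hash only records which lines have already been seen, and lines are printed in the order they are read, so the original order is preserved.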

Blank lines are removed (as are lines that begin with whitespace), and it copes with a last line that has no terminating linefeed.
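For anyone more at home in Python, the same streaming filter might look like this. It is a sketch, not a drop-in replacement, and `unique_lines` is a hypothetical name. A set plays the role of `%lines`, and the duplicate key is the captured text without its linefeed, which is why a final line lacking one still matches an earlier copy.

```python
import re

# Same idea as the Perl pattern: a non-whitespace first character,
# captured without the trailing linefeed ("." does not match "\n").
LINE_RE = re.compile(r"^(\S.*)")


def unique_lines(stream):
    """Yield each non-blank line the first time it appears, in input order."""
    seen = set()  # only records which lines have been seen; order comes from the input
    for raw in stream:
        m = LINE_RE.match(raw)
        if not m:
            continue  # blank line, or one starting with whitespace: skip it
        key = m.group(1)  # text without "\n", so "pear" and "pear\n" compare equal
        if key not in seen:
            seen.add(key)
            yield raw  # print the line as read, linefeed and all
```

To use it as a filter, something like `sys.stdout.writelines(unique_lines(sys.stdin))` would read standard input and write the unique lines to standard output.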

-Chris


Great! But how on Earth would we use that in KM?

I wanted to take a stab at this myself, so here's my version. It's Keyboard Maestro only (no shell scripts or JavaScript) and uses a regex. It finds duplicates even if they aren't next to each other, retains the order of lines, and puts a blank line where each duplicate was found.

Basically, it loops through the source text one line at a time and checks (via a regex) whether that line has already been added to the results. If it has, it adds a blank line; if it hasn't, it adds the line.
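Outside Keyboard Maestro, that loop can be sketched roughly as below. This is a Python approximation of the macro's logic, not the macro itself, and `unique_lines_km_style` is a hypothetical name. The `re.escape` call is my own addition so that characters such as `.` or `(` in a line are matched literally; whether the macro does anything equivalent isn't stated here.

```python
import re


def unique_lines_km_style(source):
    """One line at a time: regex-check the results so far, then append."""
    results = []
    for line in source.splitlines():
        # (?m) lets ^ and $ match at every line boundary in the results;
        # re.escape stops regex metacharacters in the line from acting as syntax.
        pattern = "(?m)^" + re.escape(line) + "$"
        if re.search(pattern, "\n".join(results)):
            results.append("")  # duplicate: leave a blank line in its place
        else:
            results.append(line)
    return "\n".join(results)
```

Rebuilding and searching the results text on every pass makes this quadratic, which is fine for lists of the size shown in this thread.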

I think the only unusual bit of the regex is that it needs to treat each line in the results independently. The global flag (?m) at the beginning of the regex does this: it makes ^ and $ match at the start and end of each line, not just at the start and end of the whole text. See Regular Expressions [Keyboard Maestro Wiki] for more global flags; notably, changing the flag to (?mi) makes the regex case-insensitive as well.
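Here's a small check of what those flags do, written with Python's `re` module rather than Keyboard Maestro's regex engine. As far as I know KM uses ICU regular expressions, where the m and i flags have the same meaning; the sample results text is made up.

```python
import re

results = "apple\npear\napple pie"

# Without (?m), ^ and $ only match at the start and end of the whole text,
# so a line in the middle of the results is never found.
no_flag = re.search(r"^pear$", results)

# With (?m), ^ and $ also match at each line boundary.
multiline = re.search(r"(?m)^pear$", results)

# The anchors also rule out partial-line matches: ^apple$ finds the line
# "apple" but not "apple pie".
anchored_hits = re.findall(r"(?m)^apple$", results)

# Adding i, as in (?mi), makes the comparison case-insensitive too.
case_sensitive = re.search(r"(?m)^Pear$", results)
case_insensitive = re.search(r"(?mi)^Pear$", results)
```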

I added some entries to the test text given in the first message, to make sure partial-line matches weren't counted as duplicates and to test both the case-sensitive and case-insensitive regex.

Results


Macro:

Get Unique Lines.kmmacros (9.7 KB)


If you look at my previous post, you'll see exactly how.

-Chris

As a footnote to all this, a quick reminder that the meaning of "duplicate" can vary a bit between particular contexts and jobs:

  • Are two names duplicates if they vary only in case?
  • Are two dateTimes duplicates if they vary only by milliseconds?
  • Are 4 and 4.0 duplicates of each other?

That will vary a bit with the task in hand. Scripted solutions can make it easier to define and adjust slightly more flexible equivalences when they are needed.
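One way a script can express those looser equivalences is to compare a derived key instead of the raw value. The Python sketch below uses a hypothetical helper, `dedupe_by`, which keeps the first item for each key. The three keys (case-folded text, a dateTime with its fractional seconds dropped, a string parsed as a number) correspond to the three questions above, and the sample values are invented.

```python
from datetime import datetime


def dedupe_by(items, key):
    """Keep the first item for each distinct key(item), preserving order."""
    seen = set()
    kept = []
    for item in items:
        k = key(item)
        if k not in seen:
            seen.add(k)
            kept.append(item)
    return kept


# Names that differ only in case count as duplicates.
names = dedupe_by(["Chris", "chris", "Peter"], key=str.casefold)

# dateTimes that differ only in milliseconds count as duplicates.
stamps = [
    datetime(2024, 1, 1, 9, 30, 0, 120000),
    datetime(2024, 1, 1, 9, 30, 0, 870000),
    datetime(2024, 1, 1, 9, 30, 1),
]
times = dedupe_by(stamps, key=lambda d: d.replace(microsecond=0))

# "4" and "4.0" count as duplicates when compared as numbers.
numbers = dedupe_by(["4", "4.0", "5"], key=float)
```

Changing the equivalence then means changing only the key function, not the dedupe loop.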
