Match and Remove One Multi-Item List From Another Multi-Item List?

Say I have two variables, BigList and LittleList...

BigList
abc•lots-more-stuff-here
def•etc...
ghi•
j•
kl•
mnop•
q•
r•
st•
u•
v•
w•

Each line in BigList starts with just text, followed by a bullet, then a lot more text.

LittleList
kl
u
w

LittleList is just text, one entry per row. These examples are shorter than the real lists. Neither list has a fixed limit, but realistically BigList will hold 200 or so entries at most. LittleList will generally have far fewer than that, but could (in theory) have the same number.

I want to mark (by prepending or appending text) or remove each line in BigList that has a matching entry in LittleList. I have a working solution, but it's slow: it takes about a second to process my demo lists of 40 BigList entries and 6 LittleList entries.

Right now I iterate through BigList one row at a time. I converted LittleList into a regex, i.e. ^(kl|u|w)•, and run that search against each row of BigList. When a row matches, I prepend text; if it doesn't, I just copy the row as is.
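For comparison, the row-by-row approach described above can be sketched in Python. This is a hypothetical illustration using Python's `re` module rather than Keyboard Maestro actions; the function name and the `REMOVE ` marker are made up, and `re.escape` is an addition that keeps LittleList entries containing regex metacharacters from breaking the pattern.

```python
import re

BULLET = "\u2022"  # the "•" that follows each key in BigList

def mark_row_by_row(big_list, little_list, marker="REMOVE "):
    # Turn LittleList into one anchored alternation, e.g. ^(?:kl|u|w)•
    alternation = "|".join(re.escape(item) for item in little_list)
    pattern = re.compile("^(?:" + alternation + ")" + BULLET)
    out = []
    for row in big_list.split("\n"):
        # One search per row: prepend the marker on a match, else copy as is.
        out.append(marker + row if pattern.search(row) else row)
    return "\n".join(out)

print(mark_row_by_row("kl\u2022x\nklm\u2022y\nu\u2022z", ["kl", "u"]))
```

The bullet at the end of the pattern is what stops the entry "kl" from also matching a row that starts with "klm".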

But there's got to be a faster way...I think?

-rob.

You're doing this one row at a time?

Why not process the whole BigList at once?

Turn on multiline mode with (?m), so ^ and $ match at the start and end of every line, and do the whole search/replace in one pass.
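A minimal Python sketch of the one-pass idea, using Python's `re` module as a stand-in for Keyboard Maestro's ICU regex engine (both accept the inline `(?m)` flag). The function names and the `REMOVE ` marker are hypothetical, and escaping the LittleList entries with `re.escape` is an added assumption.

```python
import re

BULLET = "\u2022"  # the "•" that follows each key in BigList

def _key_pattern(little_list):
    # Build (?m)^(?:kl|u|w)• from LittleList; re.escape makes each entry literal.
    alternation = "|".join(re.escape(item) for item in little_list)
    return "(?m)^(?:" + alternation + ")" + BULLET

def remove_matches(big_list, little_list):
    # (?m) makes ^ match at the start of every line, so one re.sub call covers
    # the whole list. [^\n]* eats the rest of the row, \n? its line break.
    return re.sub(_key_pattern(little_list) + r"[^\n]*\n?", "", big_list)

def mark_matches(big_list, little_list, marker="REMOVE "):
    # Same pattern, but keep the matched key and put the marker in front of it.
    return re.sub(_key_pattern(little_list), lambda m: marker + m.group(0), big_list)

big = "abc\u2022one\nkl\u2022two\nklm\u2022three\nu\u2022four\nw\u2022five"
print(remove_matches(big, ["kl", "u", "w"]))
print(mark_matches(big, ["kl", "u", "w"]))
```

If the last row of BigList is removed, the line break before it stays, so the result ends with a trailing newline; strip it afterwards if that matters.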

-Chris

Can you share what you have so far?

Sure, here's the test I built before integrating it into my real macro. At the end, two windows open, one with the original list, one showing the list without the items to be removed.

removing from a list.kmmacros (10.3 KB)

-rob.

Hey Rob,

Have a look at this.

-Chris

Removing from a list v2.00.kmmacros (10.3 KB)

Brilliant, and I was being so dense! Though in my defense, the first thing I tried was putting a variable in the regex search line, and it failed, so I gave up.

But even then, I'd never seen (?m) before... that's just brilliant, and it will be a huge timesaver. Thanks so much, again!

Edit: Removed potentially offensive intro - sorry!

-rob.

Wait a minute, I just noticed something. You're not using Search Using Regular Expression, but Search and Replace with a regular expression specified... whoa. I'm not sure I grok how those differ.

-rob.

Hey Rob,

PCRE Regular Expression Syntax Summary

Search that page for “OPTION SETTING”, the section that lists inline option flags such as (?m) for multiline.

Search finds a match and lets you save the whole matched text, or its numbered capture groups, to one or more variables.

Search and Replace behaves very much like working with regex in BBEdit, although there are a few syntax differences between BBEdit's PCRE flavor of regex and KM's ICU (macOS) flavor.
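As a rough analogy in Python (not how Keyboard Maestro actually implements its actions), `re.search` plays the role of Search, handing back captures, while `re.sub` plays the role of Search and Replace, returning a rewritten copy of the text. The function names and the `kl: ` output format are made up for illustration.

```python
import re

BULLET = "\u2022"

def search_style(row):
    # Like a regex Search: find a match and hand back its captures,
    # which could then be stored in separate variables.
    m = re.search(r"^(\w+)" + BULLET + r"(.*)$", row)
    return (m.group(1), m.group(2)) if m else None

def replace_style(text):
    # Like Search and Replace: return a rewritten copy of the whole text,
    # reusing capture 1 in the replacement (here "kl•x" becomes "kl: x").
    return re.sub(r"(?m)^(\w+)" + BULLET, r"\1: ", text)

print(search_style("kl\u2022lots-more-stuff-here"))
print(replace_style("kl\u2022a\nu\u2022b"))
```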

Regular Expressions | ICU Documentation

BBEdit defaults to multiline on, whereas KM defaults to multiline off.
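Python's `re` module also defaults to multiline off, which makes the difference easy to see. A minimal sketch with made-up sample text:

```python
import re

text = "abc\u2022x\nkl\u2022y\nu\u2022z"
pattern = "^(?:kl|u)\u2022"

# Multiline off: ^ anchors only at the very start of the text, and line 1
# begins with "abc", so nothing is found.
off = re.findall(pattern, text)
# Multiline on, via the inline (?m) flag: ^ also matches after each line break.
on = re.findall("(?m)" + pattern, text)

print(off)
print(on)
```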

Make sense?

-Chris

Makes a ton of sense, and using that new magic :), I was able to make my HTML table build step dramatically faster. Before, I was parsing it row by row, and certain rows had to be treated differently. Now they're all handled by two separate Search and Replace actions using multiline regex.

Old time to process about 40 records: 0.270 seconds. New time: 0.080 seconds, about 3.4x faster! Thanks!

-rob.
