Advice please on when to use For Each Line or For Each Substring

I often use "For Each Line in NamedClipboard that matches a RegEx pattern" so that I can find multiple instances of the pattern and operate on them (copy, move to a variable, etc.), and it has been working well. However, while reading some posts on this forum today, I learned about an alternative, "For Each Substring that matches...", which seems to do the same thing. Is there any particular reason (processing speed, perhaps) to choose one over the other, and if so, under what conditions?
Also, I understand the options for matching case, but what are the "separated by" choices for? I cannot find them in the wiki.
Thanks for helping
Dave

It generally makes sense to do the "obvious". If you're processing line by line, start with "The lines in:" -- if nothing else, you'll thank yourself when you go back to look at the macro next year!

"The lines in:" also appears (on only a quick test) to be line-ending agnostic: you can even mix %Return% and %LineFeed% in a text block and still get every line back. That may not seem important, since you generally know the line endings of any text you use in a macro, but it is one less thing to worry about, and it can make a big difference in a subroutine that you might throw random text blocks at.
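To show why line-ending agnosticism matters, here is a rough Python analogue (not Keyboard Maestro itself, whose internals I'm only guessing at). The sample text is made up and mixes a bare CR, a bare LF and a CRLF. Splitting only on LF leaves carriage returns stuck inside the "lines", while Python's `str.splitlines()` treats all three endings the same way, which is the behaviour "The lines in:" appears to have.

```python
# Python analogue only: a text block with mixed line endings
# (bare CR, bare LF, and CRLF).
text = "first\rsecond\nthird\r\nfourth"

# Naive approach: split on LF only, so CRs remain inside the pieces.
naive = text.split("\n")

# Line-ending agnostic approach: splitlines() breaks on CR, LF and CRLF alike.
robust = text.splitlines()

print(naive)   # ['first\rsecond', 'third\r', 'fourth']
print(robust)  # ['first', 'second', 'third', 'fourth']
```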

A quick test suggests that if you are processing every line, the two methods are about the same. But the substring method is faster when some lines don't match, and the more non-matches you have, the faster it gets (which makes sense!). On my iMac, processing a 548-line variable in which only 1 in 4 lines match:

  • "For each line then regex": ~1,650,000 microseconds, i.e. 1.65 seconds
  • "Substrings": ~416,000 microseconds, i.e. 0.42 seconds

(Variation in run times, and rounding to "obvious" numbers, may make that look neater than it actually is. But, again, logic suggests that processing only 1/4 of the lines should take roughly 1/4 of the time.)
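If you want to see the two shapes of loop side by side, here is a minimal Python sketch, assuming made-up test data that mirrors the test above (548 lines, every fourth one containing a hypothetical `ID-nnnn` match). It is an analogue, not how Keyboard Maestro works internally, and Python's overheads differ, so the timings it prints won't match the iMac figures. What it does show is that both approaches find the same matches: one tests every line, the other only produces the substrings that match.

```python
import re
import time

# Hypothetical test data: 548 lines, every 4th line contains a match.
lines = [f"ID-{i:04d} match" if i % 4 == 0 else f"line {i} no hit"
         for i in range(548)]
text = "\n".join(lines)
pattern = re.compile(r"ID-\d{4}")

def per_line(text):
    """'For each line, then test the regex': every line is visited."""
    found = []
    for line in text.splitlines():
        m = pattern.search(line)   # first match on this line, if any
        if m:
            found.append(m.group())
    return found

def substrings(text):
    """'For each substring that matches': only the matches are produced."""
    return [m.group() for m in pattern.finditer(text)]

for fn in (per_line, substrings):
    start = time.perf_counter()
    result = fn(text)
    elapsed = time.perf_counter() - start
    print(f"{fn.__name__}: {len(result)} matches in "
          f"{elapsed * 1e6:.0f} microseconds")
```

Since each line here holds at most one match, both functions return identical lists. With several matches per line, `per_line` as written would only catch the first one on each line.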

But I wouldn't get hung up on that: the test was just dumping each match into a variable and then moving on. In the "real world", I would expect whatever you do after checking for a match to completely mask the tenths-of-a-second difference between the two methods. So I'd still go for what makes sense in context (am I processing lines or substrings?) to make the function of the macro more obvious when I look at it.

The "separated by" options are mentioned in the "Changed in 10.0.1" section of the manual, with a link to the Forum thread suggesting the feature. From a quick read: instead of treating your string as an array using custom delimiters and then iterating over the array by index, you can treat it as a Collection split on the "separated by" delimiter and iterate through it without having to manage the index yourself.
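The difference between those two approaches is easier to see in code. This is a Python analogue, not Keyboard Maestro's own mechanism; the `items_by_index` and `items_as_collection` helpers, the sample record and the `;` delimiter are all hypothetical. Both produce the same pieces, but the collection version has no index bookkeeping to get wrong.

```python
def items_by_index(text: str, sep: str) -> list[str]:
    """Array style: split on the delimiter, then walk the pieces by position."""
    parts = text.split(sep)
    result = []
    for i in range(len(parts)):   # we manage the index ourselves
        result.append(parts[i])
    return result

def items_as_collection(text: str, sep: str) -> list[str]:
    """Collection style: split on the delimiter and iterate directly."""
    result = []
    for item in text.split(sep):  # no index to manage
        result.append(item)
    return result

record = "apple;banana;cherry"   # hypothetical ';'-separated text
for item in items_as_collection(record, ";"):
    print(item)
```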


Thanks for the detailed reply, @Nige_S; it's very helpful. I will experiment with lines vs. substrings, but I am generally looking for one or two matches in 50-100 lines of text, so substrings may well be better for me.
Also, thanks for the link to the manual. I made the rookie mistake of searching the manual/wiki without the double quotes, even though I used them in my question here. That's another lesson I have re-learned many times. 🙂

I must admit, I rarely use the search box. Instead, I keep the single-page version of the manual open and use Safari's search on that!
