Regex in Clipboard search & replace v. v. slow or fails

NB: Updated info. Please go to my follow-up reply to this post.

I am just a bricoleur — perhaps I'm tripping over dried mortar.

General goal:
Remove from large amount of text (tens of thousands of characters) everything but what is contained between two different search terms.

Specific goal:
Copy source code of Web page, delete everything before a given marker, delete everything after a given marker.
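
For readers who want to see the goal in isolation, here is a minimal Python sketch (not part of the Keyboard Maestro macro) that keeps only the text between two markers using plain string operations, with no regex at all. The function name `between`, the sample page, and the markers are all hypothetical.

```python
def between(text, start_marker, end_marker):
    """Return the text strictly between the first start_marker and the
    next end_marker, or None if either marker is missing."""
    # partition() splits on the first occurrence; the middle element is
    # empty when the separator was not found.
    _, found_start, rest = text.partition(start_marker)
    if not found_start:
        return None
    middle, found_end, _ = rest.partition(end_marker)
    if not found_end:
        return None
    return middle


# Hypothetical page source and markers, for illustration only.
page = "<html>\n<p>BEGIN</p>\nwanted line 1\nwanted line 2\n<p>END</p>\n</html>"
print(between(page, "<p>BEGIN</p>", "<p>END</p>"))
print(between(page, "<p>MISSING</p>", "<p>END</p>"))
```

Because the markers are treated as literal text, line breaks in the source need no special handling, and a missing marker gives `None` right away instead of a long search.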

Software:
OS 10.10.4
KM 7.0
Safari 8.0.7

I set this up using several instances of the Action:

Search and Replace Clipboard Using Regular Expression (case sensitive)

The instances are:

Remove LF/CR: Find "\R"; replace "{null}"
Remove all text before search term 1: Find "(.*)(SearchTerm1)"; replace "{null}"
Remove all text after search term 2: Find "(.*)(SearchTerm2)"; replace "{null}"
Remove unwanted escape character: Find "{backslash}{backslash}"; replace "{null}"
Add line breaks: Find "SearchTerm3"; replace "\r"

I checked each regex using RegExRX (an excellent little program IME, although I am not qualified to judge). They all work. None took longer than 10 ms. to execute.

The first one (remove LF/CR) failed as a KM Action. I tried several variations, including Search and Replace Using String Matching with the search term "%Return%"; all failed. I finally copied and pasted the invisible character from the page source into the search field in the macro, with the macro set to "String Matching". This worked.

I can't get any of the rest to work. As far as I can tell, KM does not hang — it just takes minutes to finish the four Actions. The MenuBar icon shows one of the four "corner dot; I am working" icons. The corner dot does not move for at least several minutes. Activity Monitor sometimes shows KM using one whole core (100% of quad-core MBP).

I have tried both case-sensitive and not case-sensitive.

I tried — after reading in this forum about a problem with the clipboard loading too slowly in KM 7.0 — starting the macro with the Action:

Set Copy/Paste Delay for This Macro to 1 Seconds

It made no difference.

I'm stumped. Any wisdom you can shine my way will be greatly appreciated.

—Kirby.

The macro works :blush: . I just found that there is a typo in the source code of one of my test pages. Part of my SearchTerm1 is “Photograph”. This one page — and only this page, afaict — has “Photogaph”.

The macro takes between 2 and 3 minutes to process. Almost all of that time is spent on the Action to search and replace the text up to SearchTerm1 (using the regex “(.*)(SearchTerm1)”). How can I shorten that time? Again, in my testing using RegExRX, this operation is done in fewer than 10 milliseconds.

The macro seems to hang when the search term is not found. Advice on better grammar or error-handling?

The reason to remove the LF/CR/whatever-it-is is to allow the following actions to run. What is the better way to have a regex search text with LF/CR’s?
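
One common way to let a regex search across line breaks is the "dot matches newline" flag, written inline as `(?s)`. By default `.` does not match line terminators, which is presumably why the later actions needed the line breaks removed first. The sketch below is in Python rather than Keyboard Maestro; ICU regular expressions accept the same inline `(?s)` flag, but I have not tested this inside a KM action. The function name `extract` and the sample text are hypothetical.

```python
import re


def extract(text, start_marker, end_marker):
    """Return what lies between the two markers, or None if no match."""
    # re.escape() makes the markers literal text.
    # (?s) lets "." match line breaks; ".*?" is non-greedy, so the match
    # stops at the first end marker after the start marker.
    pattern = "(?s)" + re.escape(start_marker) + "(.*?)" + re.escape(end_marker)
    match = re.search(pattern, text)
    return match.group(1) if match else None


# Hypothetical source text with line breaks left in place.
source = "header\nSearchTerm1\nline one\nline two\nSearchTerm2\nfooter\n"
print(repr(extract(source, "SearchTerm1", "SearchTerm2")))
print(extract(source, "Photograph", "SearchTerm2"))  # marker absent
```

Capturing the wanted text in one pass replaces the separate "delete before" and "delete after" steps, and a missing marker shows up as an explicit `None` that the caller can check.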

—Kirby.

Perhaps if you could post the exact macros and the page and what you are copying, then I could duplicate it and point out the issue.

I don’t see anything immediately obvious, but a couple thoughts on where problems could lie:

Keyboard Maestro works with styled text clipboards whenever possible, including search and replace. If you don’t need to end up with styled text results, then it would be better to assign the clipboard to a variable and then search & replace on that - that avoids re-reading and re-writing the clipboard between actions, and makes it easier to see exactly what data the actions are operating on, and eliminates interactions with the clipboard as a source of problems.

If you add a Log action between each action, you could determine exactly which action is behaving abnormally.

It is possible to write pathologically bad regular expressions, ones that take exponential time. The ones you quote do not appear to be in that class, but even very straightforward expressions can behave exceptionally poorly.
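
To illustrate the kind of expression meant here, the following Python sketch times a textbook pathological pattern, `(a+)+b`, against input on which it must fail. This is not the regex from the macro, and Python's `re` engine is only a stand-in for ICU (both are backtracking engines, as far as I know, but their timings will differ). The function name `time_failed_match` is hypothetical.

```python
import re
import time

# Nested quantifiers: "(a+)+" can split a run of "a" in exponentially
# many ways, and every split is tried before the engine gives up.
PATHOLOGICAL = re.compile(r"(a+)+b")


def time_failed_match(n):
    """Time a search that must fail: n copies of 'a' and no 'b'."""
    subject = "a" * n
    start = time.perf_counter()
    result = PATHOLOGICAL.search(subject)
    elapsed = time.perf_counter() - start
    return result, elapsed


# Keep n small: each extra character roughly doubles the work.
for n in (10, 14, 18):
    result, elapsed = time_failed_match(n)
    print(f"n={n:2d}  matched={result is not None}  seconds={elapsed:.4f}")
```

The point of the demonstration is that the slow case is the failing one: when the text the pattern needs is absent, the engine may exhaust every alternative before reporting no match.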

Hey Kirby,

That is indeed outlandish and probably a syntax difference with ICU regular expressions.

But as Peter said it's really difficult to help without a real data-sample to test against.

I do this sort of thing all the time, although I use AppleScript and the Satimage.osax for my find and find/replace actions.

If you're parsing HTML you should also consider talking directly to Safari or Chrome and using XPath instead of RegEx.

An example can be found here.

-Chris

Hi. Revisiting this to, first, thank you, Peter, and also Chris, and second, provide some closure. I was able to re-jigger the macro to work in milliseconds instead of minutes. I was unable to trouble-shoot the very slow RegEx replace, from which I concluded that I had, unaided, managed to write a pathologically bad regular expression.

I was able, however, to replace it with a pair of RegEx replace Actions which did what I wanted, and at the expected speed.

Thanks again.

—Kirby.