How Do I Use Regular Expressions (Regex) to Search/Replace on Variable That Has Special Regex Escape Characters

Hi all:

Here's a problem I cannot figure out, and I'm hoping someone has solved it:

I have variables with text in them, and the text often contains what are regular expression escape characters, like "+" "=" "(" and ")".

If my text/variable doesn't have these characters, my macro's regular expression search/replace commands work great! If it does have those characters, my macro breaks.

I need to perform regular expression search/replace functions on these variables but without getting rid of those special characters as they are important text.

I found this thread, but the solution there does not work for me:

What is the best way to handle this? My brute-force thought is to first convert the text so the special characters become easy-to-identify tags: do a non-regex find/replace of "(" with TAGLEFTPAREN, do my regex operation, and then find/replace TAGLEFTPAREN back to "(".

This seems highly inelegant, so I'm hoping there is an easier way to ask KM to auto-magically do this.

Thanks for any help!

If you are searching or replacing characters/strings that are also regex meta characters, then you have to escape these characters in your regex (in order to make them literals). You don’t have to change the text you are searching in!
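To make that concrete, here is a minimal sketch in Python rather than Keyboard Maestro (KM uses ICU regex, but backslash escaping of these characters works the same way in both). The sample text is borrowed from later in this thread; the variable names are only for illustration.

```python
import re

# The text being searched is left exactly as it is.
text = "Froggy went a courtn he did ride, with a + sign by his side (crembone)."

# Unescaped: "+" acts as a quantifier and "(...)" as a group,
# so this pattern does not look for the literal characters.
unescaped = re.search(r"a + sign by his side (crembone)", text)

# Escaped: a backslash before each metacharacter makes it match itself.
escaped = re.search(r"a \+ sign by his side \(crembone\)", text)

print(unescaped)         # None
print(escaped.group(0))  # a + sign by his side (crembone)
```

Only the pattern changes; the text in the variable keeps its "+" and parentheses.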

The first answer in the thread you have linked to gives you some examples. If this doesn’t work for you —or if I misunderstood your question— then please post a sample of your text (that’s what is in the variable) and of the regex you are trying to use.


BTW, you have also posted the same question in the linked thread. I suggest you remove it from there; otherwise people may waste time answering the same question in different locations.

Sure, and thanks Tom.

So the variable Stanzas may have the following string in it:

==Moderator:== Replace below text with same but using code block:

99. Froggy went a courtn he did ride (crembone); Froggy went a courtn he did ride (crembone). 

100. Froggy went a courtn he did ride, with a + sign by his side (crembone).

My search of that string in regex might be:

[0-9].?[0-9].[^.]*.

[EDIT: HA. When I put in my full string, it doesn't show up correctly above. Here is the full string with "/" standing in for each backslash, so you can see what the website strips out.]

[0-9]/.?[0-9]/.[^/.]*/.


==Moderator's Edit==: Here's the first string put between backticks:
[0-9]*\.?[0-9]*\.[^\.]*\.


That should select stanza 99, and I set that to a variable called SelectionStanza and try to do things with it. But I also want to pop stanza 99 off the Stanzas variable by doing a search and replace with this as the regex:

\s*%Variable%SelectionStanza%\s*


==Moderator:== And the above string between backticks:
\s*%Variable%SelectionStanza%\s*


I replace it with nothing (an empty string), so it pops off stanza 99.

I hope that makes sense.

It seems to me that the problem results from you using the variable you’ve stored the string in directly to construct a new RegexPattern, and when that string contains characters that are interpreted as a RegexPattern character - this has unfortunate side-effects.

What about introducing a duplicate set of variables which you use for the construction of the RegexPattern when you need it? And for these RegExpSafe variables you do a search and replace where you make sure to escape the characters you need to escape?

So from the above sample you’d still store the Stanza in the “SelectionStanza” variable, but after doing so you’d also create a “SelectionStanzaRegExSafe” variable, with the same content but with the needed escape characters added.

Then in the example you gave above you’d use the “SelectionStanzaRegExSafe” variable.

Thanks herrvjan. I think part of the problem is I'm a regex reject! :grinning: I'm barely a novice, so I don't fully know all the ins and outs.

So I'm not sure I fully understand your meaning. But making a regex safe variable may work, but I'm not sure how to magically 'escape' it.

I did do my brute force method, and it does work! Shockingly, it seems to still work fast. I just used the regular search/replace to find bad characters and replace them with TAGLEFTPAREN (for example) and then just do the reverse process after I do the regex operations.

It works, and it seems to do what you suggest as a consequence, because now when the SelectionStanza is made, it never has a bad character in it. It seems to be much the same thing, just a difference as to when/where you do the swap.

However, I suspect you have a more magic 'escape' for the variables that I'm not grokking. I grok it enough to understand I'm not grokking it. :smiley:

Sorry for my daftness.

I'm not exactly particularly skilled at this regex myself - and have had some struggles the past week trying to grasp what type of regex KM uses and which functions are supported and such.

My thought was that you would do something similar to your brute-force, but to avoid the back-and-forth. So once you store down the text you want to keep (potentially including special characters that will break your macro in the next step) you could create a new variable where you copy the text you stored, but where you add a \ before each of the characters you're currently replacing in your brute-force via non-regex search and replace. \ is the "treat the following character as a literal" character in ICU-regex.

This should leave you with one variable that has the stored string and one with the stored string altered so that it can be used in regex.
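Here is a rough sketch of that two-variable idea, written in Python as a stand-in for the KM actions. The sample text and the stanza pattern come from earlier in the thread; `re.escape` plays the role of the "add a \ before each special character" step, and the snake_case names are just Python spellings of the variable names discussed above.

```python
import re

# Contents of the Stanzas variable (sample text from this thread).
stanzas = (
    "99. Froggy went a courtn he did ride (crembone); "
    "Froggy went a courtn he did ride (crembone). \n\n"
    "100. Froggy went a courtn he did ride, with a + sign by his side (crembone)."
)

# Select the first stanza with the pattern posted above.
selection_stanza = re.search(r"[0-9]*\.?[0-9]*\.[^\.]*\.", stanzas).group(0)

# Regex-safe copy: same text, with a backslash in front of each special character.
selection_stanza_regex_safe = re.escape(selection_stanza)

# Use the safe copy inside the larger pattern to pop the stanza off.
# The corpus itself is untouched; the escaped pattern matches the literal text.
pattern = r"\s*" + selection_stanza_regex_safe + r"\s*"
remaining = re.sub(pattern, "", stanzas, count=1)

print(selection_stanza)
print(remaining)
```

In this sketch the original `selection_stanza` stays available for display or further processing, and only the escaped copy is ever used to build a pattern.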

Not sure if I'm making sense or if this really helps you. :stuck_out_tongue:

So first of all, thank you, all of this helps and it's so awesome to have any help at all as a sounding board. This KM community/board is so awesome!

Anyway, so I was thinking about this, and I think only the brute force method will work. Here's why.

If the original corpus in the Stanzas variable still has the regex escape characters in it, and I change just the SelectionStanza to 'escape' all the regex characters in it, then the SelectionStanza will never match what is in the corpus of Stanzas to 'pop' it off.

So, you need to do the brute force mechanism.

Ideally, at some point, KM should have a special "search as literal" mode for variables with regex escape characters in it that can be searched as a literal, even when it is surrounded by other regex codes around it. So something where you can set a variable to a "search as literal" setting would do the trick, I think. So maybe it would look like this:

\s*%VariableLiteral%SelectionStanza%\s*
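To illustrate what such a mode might do, here is a small Python sketch. Note that `%VariableLiteral%` is the hypothetical token proposed above, not an existing Keyboard Maestro feature, and `expand_literal_tokens` is a made-up helper name; the sketch just swaps each token for an escaped copy of the variable's value before the pattern is used.

```python
import re

def expand_literal_tokens(pattern: str, variables: dict) -> str:
    """Replace each hypothetical %VariableLiteral%Name% token with the
    regex-escaped value of the variable called Name."""
    def substitute(match):
        # A function replacement is inserted as-is, so the backslashes
        # added by re.escape survive into the final pattern.
        return re.escape(variables[match.group(1)])
    return re.sub(r"%VariableLiteral%(\w+)%", substitute, pattern)

variables = {"SelectionStanza": "with a + sign (crembone)."}
expanded = expand_literal_tokens(r"\s*%VariableLiteral%SelectionStanza%\s*", variables)

# The regex codes around the token still work; the value is matched literally.
result = re.sub(expanded, "", "x  with a + sign (crembone).  y")
print(result)  # xy
```

The surrounding `\s*` parts keep their regex meaning while the variable's content is treated as plain text, which is the behavior the proposed setting describes.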

The website “parses out” more correctly if you format your strings as…

code block

…or as inline code.


Code blocks you create with three backticks (grave accents) in the line above and below:

```
<your string here>
```

And inline code by enclosing the string in simple backticks (no newline required):

text text text `<your string here>` text text text

@johnk, I have edited your above post to ADD the strings placed between backticks, as @Tom suggested.

The escape provided by @DanThomas in Filter Variable to escape regular expression special characters - Questions & Suggestions - Keyboard Maestro Discourse needs a small, but important, change, as noted by @Joel_Rendall:

So, using that, I found the method to work fine with your data.
Here's my example macro to illustrate this method:


Example Output:

image

MACRO:   Using Regex That Contains Meta Characters


#### DOWNLOAD:
<a class="attachment" href="/uploads/default/original/3X/c/8/c819a8a5868bbb6695022cbda27eaff3ac14a7c0.kmmacros">Using Regex That Contains Meta Characters.kmmacros</a> (4.4 KB)
**Note: This Macro was uploaded in a DISABLED state. You must enable before it can be triggered.**

---



![image|500x1174](upload://i0cpbFbUCYMvD3NkbfEXUpq9nUl.jpeg)
 ---

Questions?
