Help With Regex Finding Every Variant of Upper or Lowercase of Word Except the Correct One

Hi

I think my head is stuck on a regex.

I would like to find all the wrong versions of a company name.
This is the correct one:
foTEX

But some write FoTEX, Fotex or other combinations of upper and lowercase.

Is it possible to construct one regex to find all the wrong combinations?

You don't really need a regex for this:

fauxText.kmmacros (2.9 KB)

But where you would need a regex is if you have people like me typing "fotext" now and then:

fauxText.kmmacros (2.9 KB)

(Sorry for the aborted earlier message. I had just one action selected when I hit the Share button. Again! Maybe if I ask @peternlewis nicely, he will add a confirmation dialog, shown when someone tries to share a single action, asking whether that's really what they want to do. Anyway, this is exactly the approach @tiffle suggested.)

If you want to correct the wrong ones, this is what I would do:

Use a Search and Replace with regex action to:

  1. Search for the literal text fotex, case-insensitively
  2. Replace it with foTEX

Hope that helps.

Yes, I can see I do not need regex for this.
But I forgot to mention that this was not actually specific to Keyboard Maestro.
This forum has a lot of smart people when it comes to regex.

Adobe InDesign (layout software) has a feature called Grep Styles.
You define a regex, and InDesign automatically applies a text style to whatever it matches.
In this case I wanted all misspelled versions of the brand name to be highlighted in red.
And it would be easier with just one regex.

Untested elsewhere but I notice that (in Sublime Text at least) we can write things like:

.(?=fotex)(?-i)(?!(foTEX)(?i))

where:

  • (?=fotex)(?-i) is a non-consuming and case-insensitive match
  • and (?!(foTEX)(?i)) is a case-sensitive negative lookahead.

i.e. we are looking for any character ( . ) which:

  • is followed by some case-insensitive permutation of fotex
  • but is not followed by the particular (case-sensitive) foTEX variant

You could experiment with various things to get a highlight of the whole word, e.g.

.(?=fotex)(?-i)(?!(foTEX)(?i)).{5}

This is what I have now in InDesign:


And it works, but it would be cumbersome to build for brand names with more than 5 characters.

But I suspect there is something in your regex that InDesign does not parse:

But I will try to use it as a starting point and see if I can get it to work.
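For names of any length, the fixed .{5} can be avoided: a case-sensitive negative lookahead rejects the exact correct spelling, and a case-insensitive group then matches the whole word. Below is a minimal sketch in Python's re, assuming the name is delimited by word boundaries; misspelling_pattern is a hypothetical helper, and whether InDesign's GREP accepts the scoped (?i:...) group is untested.

```python
import re

def misspelling_pattern(correct):
    """Match whole words equal to `correct` ignoring case, except `correct` itself."""
    word = re.escape(correct)
    # (?!...) is a case-sensitive negative lookahead that rejects the exact spelling;
    # (?i:...) turns on case-insensitive matching for that group only (Python 3.6+).
    return re.compile(rf"\b(?!{word}\b)(?i:{word})\b")

text = "foTEX, Fotex and FOTEX; fotex is wrong, foTEX is right."
print(misspelling_pattern("foTEX").findall(text))  # ['Fotex', 'FOTEX', 'fotex']
```

Because the pattern is built from the correct spelling, the same helper works unchanged for a longer brand name.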

I don't have InDesign to hand here, but I suppose the first thing to check might be whether the regex engine considers O and Ø etc. to be the same thing.

Are you sure you cannot just brute-force it? There are only 32 permutations (2^5), and 31 of them are wrong. So just look for those 31 using a regex. And you might be able to drop a few that are really unlikely. "fOTEX"??

The pipe character (|) means OR:

(FOTEX)|(FOTEx)|(FOTeX)|(FOTex)|(FOtEX)|(FOtEx)|(FOteX)|(FOtex)|(FoTEX)|(FoTEx)|(FoTeX)|(FoTex)|(FotEX)|(FotEx)|(FoteX)|(Fotex)|(fOTEX)|(fOTEx)|(fOTeX)|(fOTex)|(fOtEX)|(fOtEx)|(fOteX)|(fOtex)|(foTEx)|(foTeX)|(foTex)|(fotEX)|(fotEx)|(foteX)|(fotex)

BBEdit has no trouble with this.

Does that match the Danish ø and Ø? (Or might you need 64 or 128 permutations to cover them?)

I was wondering about things like:

.(?=f[oøØ]tex)(?-i)(?!(foTEX)(?i)).{5}

Not as BBEdit sees it. Throwing in the Danish Ø brings it up to 64. That is still doable.

(FOTEX)|(FOTEx)|(FOTeX)|(FOTex)|(FOtEX)|(FOtEx)|(FOteX)|(FOtex)|(FoTEX)|(FoTEx)|(FoTeX)|(FoTex)|(FotEX)|(FotEx)|(FoteX)|(Fotex)|(FØTEX)|(FØTEx)|(FØTeX)|(FØTex)|(FØtEX)|(FØtEx)|(FØteX)|(FØtex)|(FøTEX)|(FøTEx)|(FøTeX)|(FøTex)|(FøtEX)|(FøtEx)|(FøteX)|(Føtex)|(fOTEX)|(fOTEx)|(fOTeX)|(fOTex)|(fOtEX)|(fOtEx)|(fOteX)|(fOtex)|(foTEX)|(foTEx)|(foTeX)|(foTex)|(fotEX)|(fotEx)|(foteX)|(fotex)|(fØTEX)|(fØTEx)|(fØTeX)|(fØTex)|(fØtEX)|(fØtEx)|(fØteX)|(fØtex)|(føTEX)|(føTEx)|(føTeX)|(føTex)|(føtEX)|(føtEx)|(føteX)|(føtex)

BBEdit handles this without difficulty. The original poster did not mention Danish. The list above includes the "correct" spelling (whichever one that is), so in practice that alternative would be removed.

I think I slept too little last night.
I forgot that I had changed the Danish character ø to o in my example.
But my test document still used the ø.

This is what I get with the one that works on some of the characters:

@rlivingston: Of course I can use the permutation solution. Still, if the brand were "føTEXBILKANETTO", there would be a lot more :)
Do you have a smart way to generate the permutations?
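One way to generate them is a Cartesian product over the allowed spellings of each character. This is a minimal Python sketch, not the program mentioned later in the thread; case_variants and its extra_letters option (for lookalikes such as ø/Ø) are hypothetical names.

```python
from itertools import product

def case_variants(word, extra_letters=None):
    """Every upper/lowercase spelling of `word`.

    extra_letters (hypothetical option) maps a lowercase letter to extra
    lookalikes, e.g. {"o": "ø"} to also try ø and Ø wherever o appears.
    """
    extra_letters = extra_letters or {}
    choices = []
    for ch in word:
        options = {ch.lower(), ch.upper()}
        for alt in extra_letters.get(ch.lower(), ""):
            options |= {alt.lower(), alt.upper()}
        choices.append(sorted(options))
    # product() picks one option per position, giving every combination.
    return ["".join(combo) for combo in product(*choices)]

wrong = [v for v in case_variants("foTEX") if v != "foTEX"]
print(len(wrong))  # 31
pattern = "|".join(f"({v})" for v in wrong)  # same shape as the hand-written alternation
print(len(case_variants("foTEX", {"o": "ø"})))  # 64
```

The generated alternation can be pasted into any engine that supports |, but its length doubles with every extra letter.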

Hi @JimmyHartington,
I've just had a look here and seen this screen grab

where it looks like InDesign offers some help when constructing regex for use in styles.

I know this doesn't answer your question, but using this facility may help you enter @ComplexPoint's regex successfully. Apologies if you've already tried this.

This is what I get

Presumably the workflow need would not be met by a normalising search-and-replace operation?

(case-insensitive search, case-specific replace)

I can write a little program to generate the permutations (in fact I did). But I did not realize that you needed a general solution. Since the number of permutations grows as a power of 2, it does not take a very long name to make the brute-force solution impractical.
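To put numbers on that growth: if position i of an n-character name can be written in k_i ways, the number of spellings is the product of the k_i, which is 2^n when each letter only has an upper and a lowercase form. The ø line assumes o may be any of o, O, ø, Ø, as in the 64-item list above.

```latex
\begin{aligned}
N &= \prod_{i=1}^{n} k_i, \qquad k_i = 2 \text{ for every letter} \;\Rightarrow\; N = 2^{n}\\
\text{fotex: } N &= 2^{5} = 32 \quad (31 \text{ wrong})\\
\text{fotex, o as o/O/ø/Ø: } N &= 2^{4} \cdot 4 = 64\\
\text{føTEXBILKANETTO: } N &= 2^{15} = 32768
\end{aligned}
```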

That obviously restricts things. A simple solution is to run through the text and:

  1. Change all the correct spellings, foTEX, case-sensitively, to a placeholder such as "qcorrectq".
  2. Search case-insensitively for fotex and make all the remaining matches red.
  3. Change the "qcorrectq" placeholders back to foTEX.

That is a three-step process, but it is simple.

You might be able to write a Keyboard Maestro script to do these three steps one after another in InDesign. :)
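The three placeholder steps can be sketched on plain text. This is a minimal Python sketch, assuming that wrapping a match in [[ ]] stands in for colouring it red in InDesign; mark_misspellings and its parameters are hypothetical, and the placeholder must never occur in the real text.

```python
import re

PLACEHOLDER = "qcorrectq"  # assumed not to occur anywhere in the real text

def mark_misspellings(text, correct="foTEX", start="[[", end="]]"):
    # Step 1: hide every correct spelling (literal, case-sensitive replace).
    hidden = text.replace(correct, PLACEHOLDER)
    # Step 2: wrap every remaining case-insensitive match; the markers stand in for red.
    marked = re.sub(re.escape(correct), lambda m: start + m.group(0) + end,
                    hidden, flags=re.IGNORECASE)
    # Step 3: put the correct spellings back.
    return marked.replace(PLACEHOLDER, correct)

print(mark_misspellings("foTEX vs Fotex vs FOTEX"))  # foTEX vs [[Fotex]] vs [[FOTEX]]
```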

In the end, a search and replace is going to be done anyway.
This is for a customer, to help them see all the places and ways the name is misspelled.
So it is more of a nice-to-have.

And then I took it as an opportunity to see if it was possible to do with one regex.

I just tested the second example I gave you (using the regex "fotext?") in the search and replace of the latest InDesign, and it worked fine. I did have to use the @ menu to set case insensitivity, so the regex became "fotext?(?-i)", but I'd let InDesign construct that for you.

By making it case-insensitive, one regex handles a lot of permutations (just drop my list into an InDesign document and you'll see they are all caught).

Oh, and to handle both kinds of o: "f[oø]text?(?-i)" would do it.

Here, in fact, is an InDesign Keyboard Maestro macro that automates the Grep search and replace for you:

Grep S&R for "foTEX" Macro (v9.2)

Grep S&R for "foTEX".kmmacros (5.2 KB)

Running it on:

fotext
fotex
Fotex
FoTex
føtex
FOTEX
FoTEX

returned:

foTEX
foTEX
foTEX
foTEX
foTEX
foTEX
foTEX
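The same normalising replace can be reproduced outside InDesign. In the PCRE-style syntax these tools share, (?i) switches case-insensitive matching on and (?-i) switches it off; Python's re only accepts such global flags at the very start of a pattern (anywhere else is an error since Python 3.11), so this minimal sketch passes re.IGNORECASE instead. FOTEX_VARIANTS and normalise are hypothetical names, and, like the InDesign pattern, the regex has no word boundaries.

```python
import re

# f, then o or ø, then "tex", then an optional "t"; IGNORECASE also lets O and Ø match.
FOTEX_VARIANTS = re.compile(r"f[oø]text?", re.IGNORECASE)

def normalise(text):
    """Replace every variant with the correct spelling, foTEX."""
    return FOTEX_VARIANTS.sub("foTEX", text)

sample = ["fotext", "fotex", "Fotex", "FoTex", "føtex", "FOTEX", "FoTEX"]
print([normalise(s) for s in sample])  # seven times 'foTEX'
```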

Future reader beware:

My use of the word "permutations" is inexact and does not match its definition in formal mathematics.

The problem posed by the original poster is not technically a permutations problem.

Thanks. Can certainly use the search and replace method from the macro.