String comparison macro

This macro compares a series of strings and tells you which one best matches the first string. The approach I invented is "the sum of the squares of the differences between the letter frequencies of the strings being compared." It's pretty cool. Of course, there are many ways to define what it "means" for two different strings to be approximately equal. This method suits me just fine. I couldn't find a decent algorithm for this task anywhere on the internet in any language, so I had to invent it. Feedback is welcome. Try it out on some examples:

Rain Spain Mainly Plain
Thwreen Two Three Twenty

As the second example above shows, it can be used as a sort of spell checker. This is important for me. I will use this macro often in conjunction with my OCR macro. Yup, I wrote a macro that performs OCR. I may upload that one also.
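For readers who want to see the scoring idea outside Keyboard Maestro, here is a minimal Python sketch of "sum of the squares of the differences of the letter frequencies." It is not the macro's actual implementation (which uses shell tools), and it rests on assumptions the post does not confirm: "frequency" is taken to mean a raw letter count, comparison is case-insensitive, and non-letters are ignored. The function names are hypothetical.

```python
from collections import Counter


def letter_counts(text):
    # Assumption: case-insensitive raw counts, letters only.
    return Counter(ch for ch in text.lower() if ch.isalpha())


def frequency_distance(a, b):
    # Sum of squared differences between the two strings' letter counts.
    # A Counter returns 0 for letters it has not seen.
    counts_a, counts_b = letter_counts(a), letter_counts(b)
    letters = set(counts_a) | set(counts_b)
    return sum((counts_a[ch] - counts_b[ch]) ** 2 for ch in letters)


def best_match(target, candidates):
    # Lowest score wins; on a tie, min() keeps the earliest candidate.
    return min(candidates, key=lambda c: frequency_distance(target, c))


print(best_match("Thwreen", ["Two", "Three", "Twenty"]))  # prints "Three"
```

Under these assumptions "Thwreen" scores 8 against "Two", 2 against "Three", and 5 against "Twenty". Note that a measure of this kind ignores letter order, so anagrams score 0, and "Rain" ties at 3 against both "Spain" and "Plain".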

This macro uses an interesting collection of actions, Unix commands, and language constructs: tr, sed, sort, awk, gsub, for, printf, and Search Variable with a calculation value. So there's a lot of educational value to this macro.

Keyboard Maestro Actions.kmactions (13 KB)

All this looks quite impressive. I'm trying to test the action but I don't know where to start. I've created a new macro and added an action to set the clipboard content to the variable ApproximatorParameters, but obviously, this isn't enough/correct. Could you please point me in the right direction?

I’m sorry if it wasn’t clear. What you need to do is use the “Execute a Macro” action to call this macro, and pass to it some parameters. The first parameter is the string you want to find among the second (and third…) parameters. The clipboard isn’t involved. Just use a single action: “Execute a Macro” and pass the parameters with that action. The way to pass parameters is to click on the cogwheel on the upper right of the action. I would love to send an image to illustrate, but I’m totally new at these forums and I haven’t figured out how to do that yet.

I reread your instructions and found that they were actually very clear. Thank you! I’ve recorded a short movie:

This comparison macro is very interesting: I think that it actually provides what we translators call Fuzzy Matching (for which the Levenshtein distance is normally used).
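For comparison, here is a minimal Python sketch of the Levenshtein distance mentioned above: the smallest number of single-character insertions, deletions, and substitutions needed to turn one string into another. It is the standard dynamic-programming formulation, not anything taken from the macro, and the function name is just illustrative.

```python
def levenshtein(a, b):
    # previous[j] holds the distance between the first i-1 characters of a
    # and the first j characters of b.
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]  # turning a[:i] into "" takes i deletions
        for j, char_b in enumerate(b, start=1):
            substitution_cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,                      # delete char_a
                current[j - 1] + 1,                   # insert char_b
                previous[j - 1] + substitution_cost,  # substitute or keep
            ))
        previous = current
    return previous[-1]


print(levenshtein("thwreen", "three"))  # prints 2
```

Unlike a measure based only on letter counts, this one is sensitive to letter order: "ab" and "ba" are 2 edits apart.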

So you use this macro (action) to cycle through a text and display all the words that don’t match a spelling list?

I’m glad you liked it. I don’t understand what Levenshtein distances are even though I did try reading about them last week. I invented this algorithm on my own because I need it to help make my OCR software work well. OCR frequently makes mistakes and by calling this routine against a known list of options I can figure out which option the OCR text probably refers to. I might upload the OCR macro but haven’t decided yet. It only works for me with my fonts, but it may serve as a good example of the magic that can be done with the wonderful Find Image action. As for this string comparison macro, anyone is free to modify it as they see fit. For example, they could modify it to separate the strings as separate lines in a variable rather than separate words on a single line. Time for me to pack it in for today.