SpellCorrector [word,...] Macro (v9.0.2) for enhancing OCR

The OCR action is wonderful, and I enjoy it a lot. But of course it makes mistakes; that's the nature of OCR. Fixing those mistakes is not easy, but this little macro makes it a lot easier than the methods I've been using until now (e.g., sed).

This macro assumes that the text you are trying to correct is in a global variable called "OCRtext". Of course you are free to change that name or modify this approach any way you want. The macro also assumes that the words being corrected are reasonably distinct from each other, and it works best when they are not short. Words of only 2 or 3 characters don't give it enough information to match reliably, so I recommend words of 5 or more characters, although that depends on the nature of the words you are using. If your words are very distinct, you might even be able to get 3- or 4-letter words to work.
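To illustrate why short words are risky, here is a minimal Python sketch (not the macro itself). It assumes "close" means "same length, exactly one character different", and the sample list of ordinary words is made up for the demonstration. A short target such as "cat" sits one character away from many legitimate words, all of which would be wrongly "corrected", while a long target such as "Wednesday" has no such neighbours in the sample.

```python
def differs_by_one(a: str, b: str) -> bool:
    """True if a and b have the same length and differ in exactly one position."""
    return len(a) == len(b) and sum(x != y for x, y in zip(a, b)) == 1


def false_positives(target: str, ordinary_words: list[str]) -> list[str]:
    """Ordinary words that a one-character-off matcher would rewrite to `target`."""
    return [w for w in ordinary_words if differs_by_one(w.lower(), target.lower())]


# Hypothetical sample of legitimate words that might appear in OCR output.
sample = ["can", "car", "cot", "cut", "eat", "Tuesday", "Thursday"]

print(false_positives("cat", sample))        # five legitimate words would be clobbered
print(false_positives("Wednesday", sample))  # none
```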

Let's assume you are reading an image into the OCR action and you send the output of that action into a variable called OCRtext. And let's say you were expecting that output to contain (among other words) the days of the week: Monday, Tuesday, etc. Here's the command you would issue (next image). Any words that were "close" to the words you indicate would be "fixed" to those corrected words. In this version, "close" means "one character off". I'm planning another version which works differently, but this version should be effective for many purposes.

If you create the action above, it would fix all "near misses". For example, if OCRtext contained the word "Wedmesday" (which could be a typical OCR error), it would replace it with "Wednesday". This version fixes any single-character OCR error. For many people, that should be adequate. For lack of a better title I'm calling it a "spell checker." It's correcting some text based on the "dictionary" of words that you pass it, so the name seems appropriate.
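For readers who want to see the idea outside Keyboard Maestro, here is a rough Python sketch of a one-character-off corrector. It is not the macro's actual implementation: the function names are hypothetical, and it assumes "one character off" covers a single substituted, missing, or extra character. When a token is close to more than one dictionary word, the sketch leaves it alone rather than guess.

```python
import re


def one_char_off(a: str, b: str) -> bool:
    """True if a and b differ by exactly one substitution, insertion, or deletion."""
    if a == b or abs(len(a) - len(b)) > 1:
        return False
    if len(a) == len(b):
        return sum(x != y for x, y in zip(a, b)) == 1
    if len(a) > len(b):
        a, b = b, a  # make a the shorter string
    i = 0
    while i < len(a) and a[i] == b[i]:
        i += 1
    # Skip the one extra character in b; the remainders must match.
    return a[i:] == b[i + 1:]


def correct_text(text: str, words: list[str]) -> str:
    """Replace each alphabetic token that is one character off a dictionary word."""
    def fix(match: re.Match) -> str:
        token = match.group(0)
        if token in words:
            return token
        candidates = [w for w in words if one_char_off(token.lower(), w.lower())]
        # Only correct when the match is unambiguous.
        return candidates[0] if len(candidates) == 1 else token

    return re.sub(r"[A-Za-z]+", fix, text)


days = ["Monday", "Wednesday", "Friday"]
print(correct_text("Meet on Wedmesday or Fridy.", days))
# prints: Meet on Wednesday or Friday.
```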

I don't need credit or attribution. Feel free to use it for anything. I donate it to the public domain. Hopefully it's fully debugged, but I do not guarantee that it is free of bugs. Use at your own risk.

I didn't measure how fast it is, but it feels speedy to me. Compared to the OCR that you perform, it's lightning fast.

SpellCorrector [word-...].kmmacros (11 KB)

Oops, I left a debug statement in there! I'll have to remove that and reinsert the new version without the debug. Sorry. EDIT: there, I think I edited the original post to remove the debug statement.

What is the spell correcting engine?

(BTW looks like a really useful macro - except I’m mostly picking up uncorrectable names.)

Thanks for the compliment. I was talking about the OCR action. I didn't say "spell correcting engine." So I'm not sure what you are asking.

Basically this macro is meant to help you, the programmer, to correct words that the OCR action failed to read correctly from an image. Do you use the OCR action? If so, you might find this handy to make corrections quickly and easily.

If you put words in the parameter line that you are expecting, then this macro will fix any "near misses" that the OCR action accidentally produced.

Are you using the OCR action on an image to create some text? And then did you use my action to try to make corrections to it?

Surely there is a way to take any given text string, from OCR or elsewhere, and pass it to a real spell-checking engine for correction. Anyone have any ideas?

Of course that's a great idea. And to be honest I hadn't thought of that. But even so, this tool could still be helpful to people who only need simple text corrections.

I think it’s good if there are PERSISTENT OCR’ing errors. I can see that happening with some of my stuff - as some names look a bit like real words.

In my scenario I’d want to load a set of words - data set (file name) qualifiers - and use that for a given input. The set of words would need to be readily updatable.

(If I didn’t have a ton of (interesting) projects right now I’d work on this.)

It wouldn't be that hard to modify my algorithm to take its set of words from a file rather than from a comma-separated parameter list.
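As a rough illustration of that idea (in Python, not Keyboard Maestro), the sketch below reads a word list from a text file and joins it into a comma-separated string. The file layout (one word per line, with blank lines and lines starting with "#" ignored) and both function names are assumptions made for this example; the exact parameter format the macro accepts may differ.

```python
from pathlib import Path


def load_words(path: str) -> list[str]:
    """Read one word per line, skipping blank lines and '#' comment lines."""
    words = []
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        word = line.strip()
        if word and not word.startswith("#"):
            words.append(word)
    return words


def to_parameter(words: list[str]) -> str:
    """Join the words into a comma-separated parameter string."""
    return ",".join(words)
```

Keeping the list in a plain text file means it can be updated without editing the macro.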

This is an interesting technique! Alas, it's not clear to me where the macro gets the 'correct spelling' from... Is macOS's Hunspell involved in some way, to define the correct spelling?

In this macro the parameter that you pass provides the correct spelling. My example following paragraph 3 in my original post demonstrates how to call this macro.