Apple Vision OCR instead of Tesseract

Hey everybody,

I recently started using OCR via KM and, like many people it seems, I've had problems with dark mode and with extracting text when an area contains elements other than text.

I was wondering whether it would be beneficial to use Apple's Vision framework for OCR instead of Tesseract. It is very efficient and is primarily designed to extract text from photos and videos of the "real world", which is a much more dynamic problem. I imagine it would handle the problems that currently exist with KM's OCR a good deal better.

One obvious problem is that Vision's text recognition is only available from macOS Catalina onward. A temporary solution would be to add an option to the preferences that lets you choose among the available OCR frameworks.

I'd love to know what the higher-ups think of this!

Believe me, I'd be very happy to switch to something native rather than Tesseract. I don't know what APIs Apple provides for this (they have a tendency to release features instead of APIs these days). And as you note, I ideally want a solution that works back to High Sierra. But if Apple's Vision framework provides useful APIs and better results, I will certainly look at using it.
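
For what it's worth, the relevant API is `VNRecognizeTextRequest` in the Vision framework (macOS 10.15 Catalina and later, so it would not reach back to High Sierra). Below is a minimal Swift sketch, assuming the screen area has already been captured as a `CGImage`. `recognizeText(in:)` and `joinRecognizedLines(_:)` are hypothetical helper names, and switching off language correction is an assumption aimed at short codes rather than prose:

```swift
import Foundation
#if canImport(Vision)
import CoreGraphics
import Vision
#endif

/// Joins per-line OCR results into one string, dropping missing or blank lines.
func joinRecognizedLines(_ lines: [String?]) -> String {
    lines.compactMap { $0 }
        .map { $0.trimmingCharacters(in: .whitespaces) }
        .filter { !$0.isEmpty }
        .joined(separator: "\n")
}

#if canImport(Vision)
/// Runs Apple's text recognition (VNRecognizeTextRequest) on an image.
@available(macOS 10.15, iOS 13.0, *)
func recognizeText(in image: CGImage) throws -> String {
    let request = VNRecognizeTextRequest()
    request.recognitionLevel = .accurate    // slower than .fast, but more accurate
    request.usesLanguageCorrection = false  // assumption: codes like "16 CC 40" should not be "corrected"
    let handler = VNImageRequestHandler(cgImage: image, options: [:])
    try handler.perform([request])          // synchronous; results are filled in on return
    let observations = (request.results as? [VNRecognizedTextObservation]) ?? []
    // Each observation is roughly one line of text; keep its best candidate.
    return joinRecognizedLines(observations.map { $0.topCandidates(1).first?.string })
}
#endif
```

Setting `recognitionLevel = .fast` trades accuracy for speed if the `.accurate` path turns out to be too slow inside a macro.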

Awesome! I've recently seen more apps popping up that utilise that framework, which is why I think it should be fairly accessible.
Not that I know anything substantial about the topic, though. :sweat_smile:

Any updates on this front?

Kind of. I've started using Vision OCR by calling it through Apple Shortcuts, executed from Terminal.
For that, I've written a little shortcut that takes its input from the clipboard and returns the OCR result to the clipboard. The workflow is:

  1. Write a macro that saves a screenshot of the display area to the clipboard
  2. Execute the shell command: shortcuts run "OCR Clipboard"

It's a bit of a workaround in place of native Vision support, but it works fairly reliably and quickly.
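
If you'd rather drive both steps from a small tool instead of KM actions, here is a rough Swift sketch. It assumes macOS 12 or later (for the `shortcuts` command-line tool), a shortcut named "OCR Clipboard" that reads the clipboard and writes its result back as described above, and screen coordinates that you supply. The helper names are hypothetical:

```swift
import Foundation
#if canImport(AppKit)
import AppKit
#endif

/// `screencapture` flags: -c = copy to clipboard, -x = no sound, -R = capture rect "x,y,w,h".
func screencaptureArguments(x: Int, y: Int, width: Int, height: Int) -> [String] {
    ["-c", "-x", "-R", "\(x),\(y),\(width),\(height)"]
}

/// Arguments for the `shortcuts` command-line tool to run a named shortcut.
func shortcutsArguments(name: String) -> [String] {
    ["run", name]
}

#if canImport(AppKit)
/// Runs a command-line tool and waits for it to finish.
func runTool(_ path: String, _ arguments: [String]) throws {
    let process = Process()
    process.executableURL = URL(fileURLWithPath: path)
    process.arguments = arguments
    try process.run()
    process.waitUntilExit()
}

/// Screenshots an area to the clipboard, runs the OCR shortcut, and reads the result back.
func ocrScreenArea(x: Int, y: Int, width: Int, height: Int) throws -> String? {
    try runTool("/usr/sbin/screencapture",
                screencaptureArguments(x: x, y: y, width: width, height: height))
    try runTool("/usr/bin/shortcuts", shortcutsArguments(name: "OCR Clipboard"))
    return NSPasteboard.general.string(forType: .string)
}
#endif
```

Because the shortcut communicates through the clipboard, this overwrites whatever was on the clipboard before.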

Best wishes,
Philipp

I'm finding it very frustrating that the OCR function is so reliably unreliable.

Green output: 16 CC 40
Red output: u(oH olen t=)

The built-in KM OCR function?

What happens when you use Shortcuts' OCR function?

Yes, the built-in one. I'm trying to grab every value to the left of each found image of "Keyboard Maestro", so the OCR has to be automated. As far as I'm aware, that's not possible with the native Apple one. I've tried executing it as an asynchronous submacro and automating the click-and-drag with KM, but no joy.

It just seems very odd that KM's OCR misreads that red value consistently, giving the same wrong output every time. Bizarre.

I don't see why it wouldn't work with native Apple OCR. I may not have mentioned it before, but I use macOS's native area screenshot function, triggered from KM. It stores the screenshot on the clipboard, where it can then be processed by the Apple shortcut. I had to resort to taking the screenshot with the native implementation because it was more responsive than the KM version, at least at the time I was figuring this out.

The problem with KM's OCR (Tesseract) seems to be that it only produces reasonable results when the text is very dark and the background very light.
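
A common workaround for the dark-mode case, assuming the observation above holds, is to invert the captured area before OCR whenever it is mostly dark. This is a generic preprocessing sketch on an 8-bit grayscale buffer, not something KM is known to do. The threshold of 128 is an arbitrary assumption, and coloured text (like the red value above) may need separate handling:

```swift
/// Returns the buffer inverted (255 - value) when its mean brightness is below
/// `darkThreshold`, i.e. when it looks like light text on a dark background.
/// Otherwise the buffer is returned unchanged.
func normalizeForOCR(_ pixels: [UInt8], darkThreshold: Int = 128) -> [UInt8] {
    guard !pixels.isEmpty else { return pixels }
    let total = pixels.reduce(0) { $0 + Int($1) }  // sum as Int to avoid UInt8 overflow
    let mean = total / pixels.count
    return mean < darkThreshold ? pixels.map { 255 - $0 } : pixels
}
```

Inverting keeps the relative contrast intact while putting the text into the dark-on-light form that Tesseract seems to prefer.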

Also, just as an aside: if this is Logic Pro, you might get good results using UI Browser to grab the values. UI Browser simplifies using AppleScript to interact with native Apple UIs. With it, you can grab the values without taking a screenshot, which might be fairly unstable because column widths can change. Instead, it finds the fields of the table by their ID or name. It is a bit more complicated to get into, though.

Ahhh... I totally forgot that I need to take a screenshot and then run the OCR on the clipboard, rather than do it all via a Shortcut. :man_facepalming:t2:

Works great! Thank you.

FWIW, it's not Logic Pro; it's UA MIDI Control, which has a tedious habit of disconnecting from KM's virtual port whenever I update the OS.

I do use UI Browser for Logic and it's great! My AS knowledge isn't up to the job of processing a table like this, but I've managed to get it working with native KM actions. If you're curious, this is what I'm doing:

For each found image of "Keyboard Maestro"

  • Click the found image to select a row
  • Screenshot an area to its left to grab the MIDI values
  • Use these values to send a new message from KM to be recaptured by UAM.
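
The "area to its left" step boils down to simple rectangle arithmetic. A hypothetical Swift sketch, assuming you have each found image's frame in screen coordinates; `ScreenRect`, `valueArea(leftOf:columnWidth:gap:)`, and the `columnWidth` and `gap` parameters are made up for illustration and would need measuring for the actual window:

```swift
/// A screen rectangle in points.
struct ScreenRect: Equatable {
    var x: Int, y: Int, width: Int, height: Int
}

/// Returns the area immediately to the left of `found`: `columnWidth` wide,
/// separated from it by `gap`, with the same vertical extent as `found`.
/// The area is clipped so it never extends past x = 0.
func valueArea(leftOf found: ScreenRect, columnWidth: Int, gap: Int = 4) -> ScreenRect {
    let right = max(0, found.x - gap)       // right edge of the capture area
    let left = max(0, right - columnWidth)  // clip at the left screen edge
    return ScreenRect(x: left, y: found.y, width: right - left, height: found.height)
}
```

Calling this once per found image gives one capture area per row, which is the rectangle the screenshot step would use.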

Incidentally, this seems like a good time for a small feature request. @peternlewis, I think it would be nice if all actions that have area coordinate fields had a checkbox for highlighting the defined area. This would avoid the need to manually copy coordinates across to the dedicated Highlight Location action for this purpose.

I also think it would be great if every action in KM had save-as-default/restore-default options in its gear menu. I know you can save favourites, but this seems like a more elegant method to me. (Example: I'd like the Highlight Location action to appear as a rectangle by default; it seems a bit much to save a favourite just for that.)

It's certainly a possibility, the biggest problem being that coordinates are often relative to the front window, which would not show the correct results when the editor window is at the front.

Actions have multiple built in variants, so that would be problematic.

I haven't had a coffee this morning yet, so maybe I'm missing something, but I'm not sure I understand how that's any different from how the "Highlight Location" action works currently.

I'm seeing it as similar to the "Display" button in a found image action. If I'm checking for images in the front window, I don't need the editor to be frontmost; I just use my run current macro hotkey while the desired window is in focus.

How so? Wouldn't it be the same as recalling a favourite, but without all the clutter?

Essentially Keyboard Maestro has built in "favorites" which make up the actions you can select, to which you can add your own. Sometimes there are multiple versions of an action included in the default actions. So if you were to add one of them and change the defaults and save it, that would essentially add a new favorite, which is no different to what you can do now.

I have added Apple Text Recognition as an OCR option for the next version (requires macOS Catalina or later).
