KM's OCR Actions vs Monterey's Live Text OCR... And The Winner Is

This was a great tip on using Apple's Live Text.
My KM macro is a bit smaller: I use the interactive option of the screencapture command-line tool.
I don't need the captured coordinates for anything afterwards.
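
For anyone who wants the same shell-level approach outside a KM action, here is a minimal Python sketch. It assumes macOS 12 or later, the shortcuts command-line tool, and a shortcut that extracts text from an image and outputs it. The shortcut name and both file paths are hypothetical placeholders.

```python
import subprocess
from pathlib import Path
from typing import List

# Hypothetical names: substitute your own shortcut and file locations.
SHORTCUT_NAME = "OCR Screenshot"
IMAGE_PATH = Path("/tmp/screencap.png")
TEXT_PATH = Path("/tmp/screencap.txt")


def capture_command(image: Path = IMAGE_PATH) -> List[str]:
    # -i: let the user drag a selection rectangle; -x: no capture sound.
    return ["screencapture", "-i", "-x", str(image)]


def ocr_command(image: Path = IMAGE_PATH, text: Path = TEXT_PATH,
                name: str = SHORTCUT_NAME) -> List[str]:
    # Run the shortcut with the image as input and write its output to a file.
    return ["shortcuts", "run", name, "--input-path", str(image),
            "--output-path", str(text)]


def capture_and_ocr() -> str:
    """Interactive capture followed by OCR; returns "" if nothing was captured."""
    IMAGE_PATH.unlink(missing_ok=True)  # never OCR a stale image
    TEXT_PATH.unlink(missing_ok=True)
    # The exit status is not relied on; whether the file exists decides.
    subprocess.run(capture_command(), check=False)
    if not IMAGE_PATH.exists():  # the user pressed Escape
        return ""
    subprocess.run(ocr_command(), check=True)
    return TEXT_PATH.read_text()
```

On a Mac with such a shortcut installed, print(capture_and_ocr()) would show the recognized text; the two command builders are kept separate so the paths only need changing in one place.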

I don't know how to "share shortcuts" from the Shortcuts app, so I can't do that. Until you get the shortcut working with the screenshot, any KM macro I send you will simply fail. Let me know when you have the shortcut working.

Getting started with Shortcuts: 1 Basics – The Eclectic Light Company

Search for “export”.

-Chris

That's a good tip, but my shortcut has my folder and username hard-coded into it, so it's of no use to anyone else without editing. I'll leave it to users to type in their own shortcut manually, with their own preferred folder name and location.

Stumbled onto a killer use case for this idea this morning: too many times in a video meeting, someone will briefly present a Google document or other website that I want to follow up on. In the old days I would either interrupt to ask them to share the link, or take a screenshot of the URL. Now I can just OCR the URL in a few seconds. Excellent!
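
Pulling the URL out of the OCR'd text can be automated as well. A minimal Python sketch, assuming the OCR output is plain text and the URL was not split across lines or broken by stray spaces (OCR can do both):

```python
import re
from typing import List

# Loose pattern: a scheme-prefixed URL or a bare "www." host, up to whitespace.
URL_PATTERN = re.compile(r"\b(?:https?://|www\.)[^\s<>]+", re.IGNORECASE)
# Sentence punctuation that OCR text often places right after a URL.
TRAILING_PUNCT = ".,;:!?)]}"


def extract_urls(ocr_text: str) -> List[str]:
    urls = []
    for match in URL_PATTERN.finditer(ocr_text):
        # Note: this also strips a legitimate closing parenthesis, a trade-off.
        url = match.group(0).rstrip(TRAILING_PUNCT)
        if url.lower().startswith("www."):
            url = "https://" + url  # make bare hosts openable
        urls.append(url)
    return urls
```

For example, extract_urls("See https://docs.google.com/document/d/abc123/edit, or www.example.com.") gives the Google Docs URL without its trailing comma, plus https://www.example.com.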

You could also have KM running in a loop checking the video for a URL, and if it sees one, it could automatically open the URL in a window. Or a notification with a link could auto-appear and you could click on the link, I guess.
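
A rough Python sketch of that loop idea. Here ocr_screen stands for any hypothetical function that captures the screen and returns OCR text, and each URL is reported only once so the same link isn't reopened on every pass. The two-second interval is an arbitrary choice.

```python
import re
import time
from typing import Callable, Iterable, Iterator

URL_PATTERN = re.compile(r"https?://[^\s<>]+")


def new_urls(snapshots: Iterable[str]) -> Iterator[str]:
    """Yield each URL the first time it appears in any OCR snapshot."""
    seen = set()
    for text in snapshots:
        for url in URL_PATTERN.findall(text):
            url = url.rstrip(".,;:)")  # drop trailing sentence punctuation
            if url not in seen:
                seen.add(url)
                yield url


def poll(ocr_screen: Callable[[], str], interval: float = 2.0) -> Iterator[str]:
    """Endless stream of OCR snapshots; ocr_screen is a hypothetical helper."""
    while True:
        yield ocr_screen()
        time.sleep(interval)
```

Each URL from new_urls(poll(ocr_screen)) could then be handed to the macOS open command, e.g. subprocess.run(["open", url]), or shown in a notification instead.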

I already do this for computer games on an M1 Mac with a utility (which I haven't shared on this website yet) that lets you take actions automatically, like clicking a button on the screen, if it detects words like "Do you really want to quit?"
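
That utility hasn't been shared, so the following is only a hypothetical Python illustration of the detection step. It normalizes case, punctuation, and line breaks before matching, because OCR rarely reproduces them exactly, then runs the action tied to the first phrase found. The click itself is left to a caller-supplied function.

```python
import re
from typing import Callable, Dict, Optional


def normalize(text: str) -> str:
    """Lower-case, replace punctuation with spaces, collapse whitespace."""
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(text.split())


def match_trigger(ocr_text: str,
                  triggers: Dict[str, Callable[[], None]]) -> Optional[str]:
    """Run the action for the first trigger phrase found; return that phrase."""
    screen = normalize(ocr_text)
    for phrase, action in triggers.items():
        if normalize(phrase) in screen:
            action()
            return phrase
    return None
```

With triggers = {"Do you really want to quit?": click_cancel}, where click_cancel is hypothetical, OCR text such as "DO YOU REALLY\nwant to quit ?" still matches.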

Dear @Sleepy,
Thank you indeed for this great hack! I also loved the prompt rectangle idea from @martin.
As you mentioned, Apple's OCR works extremely well. It helped me with a project I spent almost a whole day on, and the result is flawless.
My use-case scenario:
I log in to Citrix via the Workspace App and receive a passcode by SMS (visible in Messages.app). I couldn't find a way to read or copy the new SMS without manual input, but with the OCR trigger I can be less careful and select a much bigger portion of the screen; some regex filtering then extracts exactly the code I need.
Now the whole procedure of opening and switching windows and pasting the passcode into the input field is automated. The only part I do manually is taking the screenshot, but I believe that if I preset the window size of the Messages app and get the correct coordinates for where the new SMS appears, I can fully automate the screenshot too, and spend those annoying 2-3 minutes of my life sipping a coffee instead.
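
The regex filtering step might look like this Python sketch. The six-digit code length and the assumption that the newest message is the lowest one on screen (so its code is the last match) are guesses about this setup, not details from the post.

```python
import re
from typing import Optional


def extract_passcode(ocr_text: str, digits: int = 6) -> Optional[str]:
    # Exactly `digits` digits that are not part of a longer number: the
    # lookbehind and lookahead reject a digit right before or after the match.
    pattern = re.compile(rf"(?<!\d)\d{{{digits}}}(?!\d)")
    matches = pattern.findall(ocr_text)
    # Assumes the newest SMS is lowest on screen, so its code is the last match.
    return matches[-1] if matches else None
```

Selecting a generous screen area is then harmless: timestamps like 10:02 and longer numbers such as phone numbers are skipped, and only the most recent code is returned.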

EDIT: I now have the Messages app resized to the size I need, so I know where the SMS appears and was able to set fixed coordinates for the screen capture. Now it's all automated!
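
With the window resized to a fixed size, the SMS area sits at a fixed offset from the window's top-left corner, so the capture rectangle can be computed instead of drawn. A hypothetical Python sketch: the window origin, offsets, and size are placeholders you would measure yourself, and screencapture's -R option takes the rectangle as x,y,width,height.

```python
from typing import List, NamedTuple


class Rect(NamedTuple):
    x: int  # left edge, screen points
    y: int  # top edge, screen points
    w: int  # width
    h: int  # height


def sms_region(window: Rect, dx: int, dy: int, width: int, height: int) -> Rect:
    # Offsets are measured from the window's top-left corner.
    return Rect(window.x + dx, window.y + dy, width, height)


def region_capture_command(region: Rect, path: str = "/tmp/screencap.png") -> List[str]:
    # -R captures a fixed rectangle without user interaction; -x is silent.
    rect = f"{region.x},{region.y},{region.w},{region.h}"
    return ["screencapture", "-x", f"-R{rect}", path]
```

Because only the offsets are fixed, the capture still lands on the SMS if the window is moved, as long as its size stays the same.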

Thank you all for this great post and the comments!
Stan

Following the discussion above, I have exported the shortcut and the macro for those who are too lazy to reproduce everything or are stuck at some step. I hope it helps.
The OCR shortcut for the Shortcuts app (once imported, you only need to change the path, since it obviously points to my computer): link
The OCR grab macro (using Apple's supported languages: I have tried English, German, and Spanish, and the text recognition was flawless, including special characters; Cyrillic doesn't work):
OCR_Read_OSX.kmmacros (3.4 KB)
The macro doesn't have a trigger enabled for the time being, so you can choose your own. Please edit the path in step 3 (Write System Clipboard to File) to the same location you set in the Shortcuts app.

After importing everything and triggering the macro, the selection rectangle appears. Select the area you want to grab; the process runs and the OCR'd content is copied to the clipboard. All that's left is to paste it wherever you want.

Once again, many thanks to the amazing community here.
Stan

That's me! I'm definitely lazy, that's why I like to automate anything I can.

Thank you @stanivanov for sharing that and of course @Sleepy for coming up with this idea and sharing. And @Martin for the bit about Screen Capture Area - sorry if I missed anyone...

It works great, and I've adapted it for my own needs.

The path the .jpeg file is saved to needs to be changed in both the OS Shortcut and the Keyboard Maestro macro (just wanted to mention that for others following along).

Also, I like to use local variable names, so I changed the variable names in your example macro. It's a good practice to get into, since local variables don't persist after the macro has run. Just a tip to bear in mind :grinning:

The actual recognition of text is quite incredibly accurate - even with white text on a black background. Again well done @Sleepy for being the pioneer!! :clap: :clap: :clap:

Thanks. I measured Monterey's OCR speed once (only a single test, so your results may differ), and it was 30x faster. My claim that it's 30x more accurate is more subjective, based on inspecting some sample results. For my purposes, "false negatives" are more important to solve than "false positives": it matters more that the OCR doesn't misinterpret the words it sees than that it never comes up with an occasional false word. Even Monterey produces a spurious word from time to time, but that doesn't matter when it almost always reads the real words correctly. I hope the distinction I'm driving at is clear.
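
One way to make that accuracy comparison less subjective is to OCR a sample whose correct text you know and count the error types separately. This Python sketch uses difflib's word-level alignment; "misread" and "missed" correspond to the "false negatives" above and "spurious" to the "false positives". It illustrates the distinction and is not the method behind the 30x estimate.

```python
from difflib import SequenceMatcher
from typing import Dict


def classify_errors(reference: str, ocr: str) -> Dict[str, int]:
    """Word-level comparison of OCR output against a known-correct transcript."""
    ref, out = reference.split(), ocr.split()
    counts = {"misread": 0, "missed": 0, "spurious": 0}
    matcher = SequenceMatcher(a=ref, b=out, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        n_ref, n_out = i2 - i1, j2 - j1
        if tag == "replace":
            # Paired words were read as something else; any surplus on
            # either side counts as missed or spurious.
            counts["misread"] += min(n_ref, n_out)
            counts["missed"] += max(n_ref - n_out, 0)
            counts["spurious"] += max(n_out - n_ref, 0)
        elif tag == "delete":
            counts["missed"] += n_ref
        elif tag == "insert":
            counts["spurious"] += n_out
    return counts
```

For example, OCR output "the quick browm fox jumps over" against the reference "the quick brown fox jumps" counts one misread word and one spurious word.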

There is another fabulous new feature of Monterey that I'm working on integrating with KM, and when I'm confident that my technique will earn three claps, I will also release that. Perhaps others will beat me to it; that's okay.

And I just discovered something else very good: I thought I would have to repeat the setup on each of my Macs, but the Shortcuts app syncs via iCloud, so the OS Shortcut was already on my second Mac. Since I also have Keyboard Maestro syncing, everything was already in place and just worked on the second Mac.

The path I save the screenshot to is not the Desktop but the one below (my previous OCR macro saved there, and I think this path is the same on any Mac):

/tmp/screencap.png

Thanks @stanivanov. That was helpful.

I wonder whether anyone has managed to OCR entire PDFs using Live Text (ignoring any existing text layers in the PDF). I have been trying to do it with Shortcuts, but without success.

I have lots of experience with Monterey OCR. The main issue with what you're asking is that Monterey OCR returns text from the top down, regardless of horizontal position. So with a two-column source, you won't get very useful results. If it's something really simple, like pages in a book, I think it would be rather easy to get working.

It's occurred to me that really smart software could understand where on the page the chunks of text (or graphics) are, and do something with that.

Two-column is almost never "LRLRLRLR" and almost always is "LLLRRR".
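
If the OCR engine reports where each block of text sits, an LLLRRR reading order is easy to recover. The Shortcuts text extraction appears to return plain text only, so this assumes positions come from elsewhere. A Python sketch, assuming exactly two columns split at the page's horizontal midpoint and y coordinates that grow down the page. Apple's Vision framework reports normalized boxes with the origin at the bottom-left, so its y values would need flipping first.

```python
from typing import List, NamedTuple


class Block(NamedTuple):
    x: float   # left edge of the text block
    y: float   # top edge; larger values are further down the page
    text: str


def column_order(blocks: List[Block], page_width: float) -> List[str]:
    """Read a two-column page as LLLRRR instead of interleaving by height."""
    middle = page_width / 2
    left = sorted((b for b in blocks if b.x < middle), key=lambda b: b.y)
    right = sorted((b for b in blocks if b.x >= middle), key=lambda b: b.y)
    return [b.text for b in left + right]
```

A purely top-down ordering of the same blocks would give L1, R1, L2, R2, which is the LRLR interleaving described above.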

One problem, though, is that I think for a PDF the drawing orders describe the starting position of a piece of text, not its end.

I have this problem with some PDFs we generate that contain tables. Similarly, try selecting columns in tables in a web browser.

Most PDF documents have pages where the text occurs in specific rectangular blocks. If the blocks are in the exact same location (even if they alternate on odd/even pages) then it would be fairly easy to create a macro that takes a screenshot of the text block, sends that image to an OCR engine, saves the text, and then goes to the next page.
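
The page loop described above can be sketched in Python, with the two platform-specific steps passed in as functions: capture_and_ocr(rect) might wrap screencapture -R plus an OCR shortcut, and next_page() might press the right-arrow key. Both functions and the rectangles are hypothetical; the point is the odd/even alternation.

```python
from typing import Callable, List, Tuple

Rect = Tuple[int, int, int, int]  # x, y, width, height in screen points


def text_block_for(page: int, odd_rect: Rect, even_rect: Rect) -> Rect:
    """Odd pages use one text-block rectangle, even pages another."""
    return odd_rect if page % 2 == 1 else even_rect


def ocr_pages(pages: int, odd_rect: Rect, even_rect: Rect,
              capture_and_ocr: Callable[[Rect], str],
              next_page: Callable[[], None]) -> str:
    texts: List[str] = []
    for page in range(1, pages + 1):
        texts.append(capture_and_ocr(text_block_for(page, odd_rect, even_rect)))
        if page < pages:  # don't advance past the last page
            next_page()
    return "\n\n".join(texts)
```

If the blocks don't alternate, passing the same rectangle twice reduces this to a single fixed capture area.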

I've done this many times. In fact, I've done multiple 300-page books this way (actual physical paper books), long before KM even existed. Doing it on a PDF-formatted book would be a piece of cake.

There are even easier solutions if the number of pages is small, say under 50. You could have a macro that is triggered whenever the system clipboard changes. When it changes, the macro checks that the clipboard contains an image, performs OCR on that image, and appends the resulting text to a KM variable. That macro would probably contain only two actions. All the user would have to do is select each page manually with the mouse using the screen capture shortcut.
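
A Python sketch of that macro's logic, with the clipboard contents passed in as image bytes (or None when the clipboard holds something else) and a hypothetical ocr function. Ignoring non-image contents also means that copying the OCR text back to the clipboard won't trigger another OCR pass.

```python
from typing import Callable, Optional


class PageCollector:
    """Mimics a macro triggered on every clipboard change."""

    def __init__(self, ocr: Callable[[bytes], str]):
        self.ocr = ocr   # hypothetical: turns image bytes into text
        self.text = ""   # plays the role of the KM variable being appended to

    def on_clipboard_change(self, image: Optional[bytes]) -> None:
        if image is None:  # clipboard holds text or something else: ignore
            return
        page = self.ocr(image).strip()
        if page:  # skip captures where nothing was recognized
            self.text += page + "\n\n"
```

After the last page, the collected text sits in one place, ready to be pasted or saved.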

I can't seem to download the shortcut from this link. I've tried to copy it manually, but something is going wrong: I get it to save the screenshot in my designated folder, but then nothing happens.

You need to be a bit more specific about what happens at which step. Are you trying to import the whole process, the macro and the shortcut? The way I have it set up, the result ends up on the clipboard, i.e. pressing CMD+V (paste) in a text editor or similar gives you the OCR result. This is how it's intended to work (for my needs).

  1. I've now tried the shortcut link, and it works fine: once you click the link, it takes you to the shortcut, and you just click to add it to Shortcuts on your computer (provided you already have the Shortcuts app on your Mac).
  2. Here again are all the actions; maybe this helps?
    I need more info to be able to help you...

I get this error in Shortcuts. If I paste after running this macro, it just pastes the actual screenshot.

OK, I used Safari and got the shortcut imported properly. The warning no longer shows up, but after running the macro and pasting into, say, Notes, there's just a blank.

Good point about Safari; I hadn't thought about it.
I'm not a huge expert, which is why I just import the macros and only edit the path. In your case, I see you used a different variable, which you set in the first step:
Set Variable - LOCAL__Screen
This needs to be the same variable in the Execute Shell Script action, but there you used "Var".
That is, in the shell script action, change the "save to variable" setting to LOCAL__Screen; then in the last step the token will be %Variable%LOCAL__Screen%.
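
Mismatches like LOCAL__Screen versus Var are easy to miss by eye. A small hypothetical Python helper that lists the %Variable%Name% tokens in a text field and flags any that don't read the variable the shell script saved to (names are compared as exact strings):

```python
import re
from typing import List

# Keyboard Maestro's variable text token looks like %Variable%Name%.
TOKEN = re.compile(r"%Variable%([^%]+)%")


def referenced_variables(text: str) -> List[str]:
    """Names of all variables a text field reads via variable tokens."""
    return TOKEN.findall(text)


def mismatched_reads(saved_to: str, text: str) -> List[str]:
    """Tokens that read a variable other than the one the shell script saved to."""
    return [name for name in referenced_variables(text) if name != saved_to]
```

An empty result means the last step reads exactly what the shell script produced.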

Then it should work... Or just really copy/paste my setup :slight_smile: