KM's OCR Actions vs Monterey's Live Text OCR... And The Winner Is

It's occurred to me that really smart software could understand where on the page the chunks of text (or graphics) are, and do something with that.

Two-column text is almost never meant to be read "LRLRLRLR" and is almost always "LLLRRR".
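
A rough sketch of that "LLLRRR" idea, assuming each chunk of text comes with a bounding box. The `Block` type, its fields and the page width are made up for illustration; they are not something KM or any PDF library hands you. Each block is assigned to a column by its horizontal centre, then each column is read top to bottom:

```python
from dataclasses import dataclass


@dataclass
class Block:
    """A chunk of text with a hypothetical bounding box (page units)."""
    text: str
    x: float      # left edge
    y: float      # top edge, 0 = top of page
    width: float


def reading_order(blocks, page_width, columns=2):
    """Return blocks column by column (LLLRRR), top to bottom in each column."""
    col_width = page_width / columns

    def key(block):
        centre = block.x + block.width / 2
        # Clamp so a block touching the right margin stays in the last column.
        col = min(int(centre // col_width), columns - 1)
        return (col, block.y)

    return sorted(blocks, key=key)
```

This only handles clean, equal-width columns; headings that span both columns would need extra handling.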

One problem, though, is that I think for a PDF the drawing orders describe the starting position of a piece of text, not its end.

I have this problem with some PDFs we generate with tables in them. Similarly, try selecting columns in tables in a web browser.

Most PDF documents have pages where the text occurs in specific rectangular blocks. If the blocks are in the exact same location (even if they alternate on odd/even pages) then it would be fairly easy to create a macro that takes a screenshot of the text block, sends that image to an OCR engine, saves the text, and then goes to the next page.
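
To make the "same rectangle on every page" idea concrete, here is a minimal sketch that builds the macOS `screencapture` command for a fixed screen rectangle, alternating between odd and even pages. The rectangle coordinates, output folder and file naming are placeholders I made up; the OCR step and the "go to the next page" step are left out.

```python
import subprocess

# Hypothetical screen rectangles (x, y, width, height) of the text block.
ODD_PAGE_RECT = (120, 140, 600, 800)
EVEN_PAGE_RECT = (160, 140, 600, 800)


def rect_for_page(page):
    """Pick the rectangle for this page; odd and even pages alternate."""
    return ODD_PAGE_RECT if page % 2 == 1 else EVEN_PAGE_RECT


def capture_command(page, out_dir="/tmp"):
    """Build the screencapture call: -x silences the sound, -R sets the rect."""
    x, y, w, h = rect_for_page(page)
    return ["screencapture", "-x", f"-R{x},{y},{w},{h}",
            f"{out_dir}/page-{page:03d}.png"]


def capture_page(page, out_dir="/tmp"):
    """Actually take the screenshot (macOS only)."""
    subprocess.run(capture_command(page, out_dir), check=True)
```

A loop over the page numbers would call `capture_page`, pass the saved image to whatever OCR engine you use, and then advance the PDF viewer.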

I've done this many times. In fact I've done multiple 300-page books this way (actual physical/paper books), long before KM even existed. Doing it on a PDF-formatted book would be a piece of cake.

There are even easier solutions if the number of pages is small, say under 50. You could have a macro that simply gets triggered any time the system clipboard changes. When it changes, the macro will check that the clipboard contains an image, perform OCR on that image, and append the resulting text to a KM variable. That macro would probably contain only two actions. All the user would have to do is select each page manually with the mouse, using the screen capture shortcut.
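
The logic of that clipboard-triggered macro, sketched outside KM. This is not KM's API: the `ClipboardOCRCollector` class is hypothetical, the OCR engine is passed in as a plain function, and "the clipboard contains an image" is modelled simply as "the content is bytes".

```python
class ClipboardOCRCollector:
    """Accumulates OCR text each time the clipboard changes to an image."""

    def __init__(self, ocr):
        self.ocr = ocr    # any callable: image bytes -> recognised text
        self.text = ""    # stands in for the KM variable being appended to

    def on_clipboard_change(self, content):
        """Append OCR text if content is an image; return True if appended."""
        if not isinstance(content, (bytes, bytearray)):
            return False  # plain text or anything else: ignore it
        page_text = self.ocr(content).strip()
        if not page_text:
            return False  # nothing recognised on this capture
        self.text += page_text + "\n\n"
        return True
```

Each screen capture to the clipboard would call `on_clipboard_change` once, and `text` ends up holding all the pages in the order they were captured.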

I can't seem to download this link for the shortcut. I've tried to copy it manually but something is going wrong. I get it to save the screenshot in my designated folder, but then nothing.

You need to be a bit more specific as to what happens at which step... Are you trying to import the whole process, the macro and the shortcut? The way I have it, at the end the result is saved in the clipboard, i.e. just CMD+V (paste) in a text editor or so will give you the result of the OCR. This is how it's intended to work (per my needs).

  1. I've now tried the link to the shortcut and it works fine: once you click on the link it takes you to the shortcut and you just click "add to shortcuts" on your computer (provided you already have the Shortcuts app on your Mac).
  2. Here again are all the actions. Maybe this helps?

Need more info to be able to help you...

I get this error in Shortcuts. If I paste after running this macro, it just pastes the actual screenshot.

OK, I used Safari and got the shortcut imported properly. The warning is no longer showing up, but after running the macro and pasting into, say, Notes, it's just blank.

Good point about Safari.. I hadn't thought about it.
I'm not that much of an expert, which is why I just import the macros and edit only the path... In your case I see you used a different variable, which you set in the first step:
Set Variable - LOCAL__Screen
This needs to be the same variable in the Shell Script execution; there you used "Var"...
i.e. in the Shell Script action, change the "save to variable" setting to LOCAL__Screen, then in the last step it will be %Variable%LOCAL__Screen%

Then it should work... Or just really copy/paste my setup :slight_smile:

Yeah, I had it your way originally and it didn't work. Tried the LOCAL way and that didn't work either. Sucks, but thanks for the help.

Would love it if you could share your file. For some reason duplicating it manually is not working.

Thank you sir

As much as I prefer KM for everything, I ended up strictly using a Shortcut for this. I mapped it to cmd-shift-2 and it works well, though it can be a bit slow to launch sometimes (it might take 1-3 seconds). I actually forgot I did this as a Shortcut and spent a lot of time trying to find my KM macro for this before figuring out it wasn't a KM macro.

I wonder if "combining" this with a KM macro will speed it up. I'll have to give it a try.

Quite a good idea as well... I don't know how you got the first line with "Receive Any input from Quick Actions"... but I just skipped it...
I just used KM to trigger a shell script that runs the shortcut, and it does the same as the other macro, with the only difference that now there's a popup showing the output of the macro, which is weird, but for me a good result too..
About the 1-3 second delay.. using KM to trigger the Shortcut I see no delay at all, it's just instantaneous. Thanks for sharing your solution!
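
For anyone wondering what "a shell script that runs the shortcut" boils down to: macOS Monterey ships a `shortcuts` command-line tool, and the call is essentially `shortcuts run "<name>"`. A minimal sketch of building that call follows. The shortcut name "OCR Screenshot" used in the note afterwards is a placeholder, not the actual name from this thread, and the helper functions are my own.

```python
import subprocess


def shortcut_command(name, input_path=None, output_path=None):
    """Build the argument list for the macOS `shortcuts run` CLI."""
    cmd = ["shortcuts", "run", name]
    if input_path:
        cmd += ["--input-path", input_path]    # file passed to the shortcut
    if output_path:
        cmd += ["--output-path", output_path]  # where its result is written
    return cmd


def run_shortcut(name, **paths):
    """Run the shortcut (macOS 12+ only) and return whatever it prints."""
    result = subprocess.run(shortcut_command(name, **paths),
                            check=True, capture_output=True, text=True)
    return result.stdout
```

For example, `run_shortcut("OCR Screenshot")` would be the equivalent of putting `shortcuts run "OCR Screenshot"` into a KM Execute Shell Script action.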

If you click on the shortcut, then the "settings" button up top, and select "use as quick action", the "receive any..." appears. That's also how you can assign a keyboard shortcut to it. This way seems to work for me vs. the other one above; the only difference is you can't select the portion of the screen.

Under the interactive screenshot, click on 'show more' and the selection should be Custom. Thanks for the Quick Actions hint.
I can't get the shortcut to run consistently with a keyboard shortcut. As @edjusted mentioned, there's a delay, and for me, depending on which app I'm trying it in, the delay is indeed between 1 and 3 seconds.
All in all, I'm happy that I've learned something new, but I'll stick with KM + Shortcut. This way it works flawlessly.

Ah. I'm running on a new MBP and it's seamless for me.

Wow, that works even better. It seems convoluted but it works.

I created a KM macro that just runs the Shortcut and mapped it to a hotkey:

[Screenshot: Screen Shot 2021-12-09 at 12.51.29 PM]

On the Shortcut itself, I turned off the "Add keyboard shortcut".

No more delay!

I'm trying out the method that saves a screenshot and then runs Monterey's OCR and although I wouldn't say it's noticeably quicker, it is a lot more accurate. Fantastic!

Yes, I’ve found it can recognise just about any text, even Gothic fonts and some handwritten text. And it is able to ignore lots of other non-text objects in whatever image you are using.

I'm actually finding it quite a bit slower than KM's own OCR. Is that your experience too? Probably to do with having to write a file first. Shame there isn't a way to directly OCR a defined area within Shortcuts...

It takes between 1 and 3 seconds, I find. At a guess, I would say it's the processing of the image to get the text, rather than writing the temp file, that takes time.

BTW since Keyboard Maestro 10.1 (I think) you can use Keyboard Maestro's native Execute Shortcut Action to do this. You probably already know this. Instead of this:

You can do something like this:

I have this as well and find it invaluable. (Someone on Zoom is showing a Google doc and I can grab the URL from the image)

What really seals the deal for me, Neil, is using your long-press trick. I've got a function key mapped to "Copy" and if I hold it down a bit it becomes "Select a range and copy the OCR of what you find" :+1:
