Screen Capture Using OCR for Variable Conditions

Hans-Peter_Henkel · November 28, 2021, 7:23pm

Not sure if this tip from the "amateur base" helps someone but these days I figured out some things that helped me which I'd like to share.

This is related to the topic I started recently, OCR Screen Feature as “Location Relative To”? My goal was to place differently sized windows of an application at different locations and on different screens, depending on the status of a popup menu.

Using the action "Screen Capture Area" I copied the text from the current popup status as an image to a named clipboard. I then used "OCR Named Clipboard" and saved the result in the same clipboard.

Then I just used "If Then Else" to place and resize the window depending on the status.
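The branching step can be pictured as a lookup from the recognised text to a window frame. The following is only a minimal sketch in Python, not the actual macro: the status labels ("edit", "mix") and the frame values are hypothetical, since the thread does not name the popup's states or the target positions.

```python
from typing import Optional, Tuple

Frame = Tuple[int, int, int, int]  # (left, top, width, height)

# Hypothetical status labels and window frames; none of these values
# come from the original macro.
PLACEMENTS = {
    "edit": (0, 25, 1920, 1055),
    "mix": (1920, 25, 1280, 700),
}


def normalize(ocr_text: str) -> str:
    """Collapse whitespace and lowercase so minor OCR noise does not break matching."""
    return " ".join(ocr_text.split()).lower()


def placement_for(ocr_text: str) -> Optional[Frame]:
    """Return the frame for the first status label contained in the OCR text."""
    text = normalize(ocr_text)
    for label, frame in PLACEMENTS.items():
        if label in text:
            return frame
    return None  # unknown status: leave the window where it is
```

Matching on "contains" rather than "equals" is a deliberate choice here, because OCR output often carries stray whitespace or extra characters around the label.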

So far so good. During my first tests I had issues because the text was pretty small and the contrast (white on grey) was not ideal for OCR. I created a workaround that partly compensated for these issues. Before running OCR I used some actions to paste the screen capture into an image editor (in my case Snagit), let it resize the capture by a factor of about 5, copy the resized result back to the clipboard, and then run OCR. I also included the deletion of the temporary image in Snagit. Overall it wasn't too slow.
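To make the resizing step concrete, here is a small Python sketch of what scaling by an integer factor does to a capture. It uses plain nearest-neighbour repetition on a grid of pixel values, which is an assumption made for simplicity: Snagit's resize almost certainly uses a different interpolation method, and the function names are illustrative only.

```python
from typing import List, Tuple


def scaled_size(width: int, height: int, factor: int) -> Tuple[int, int]:
    """Size of the capture after enlarging both sides by the same factor."""
    return width * factor, height * factor


def upscale(pixels: List[List[int]], factor: int) -> List[List[int]]:
    """Nearest-neighbour upscale: repeat every pixel `factor` times in both directions."""
    if factor < 1:
        raise ValueError("factor must be at least 1")
    result = []
    for row in pixels:
        # Repeat each pixel horizontally ...
        wide_row = [value for value in row for _ in range(factor)]
        # ... then repeat the widened row vertically.
        result.extend(list(wide_row) for _ in range(factor))
    return result
```

With a factor of 5, every character in the capture covers 25 times as many pixels, which is the effect the workaround relies on.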

The first tests I ran on the 30" Apple cinema display with a resolution of 2560x1440. Yesterday I changed to a 4K display and now can confirm that this gives more reliable results without the need to resize the capture. I just stumbled over a related question today whilst searching for additional things. …

Anyway, the experiences of my investigation lead to questions for @peternlewis.

The action "Screen Capture" has some options beside "Area". What I am missing is something similar to window-related mouse actions.

Would it be possible to scan a specified "Area - i.e. relative to the front windows left corner"?

This would be more precise because the bigger the scanned area the bigger the chance that OCR finds more than one match. Also, the window has to be in a dedicated position. I just found the option to scan entire windows.

Would it be possible to include the option to resize a captured image by a specified factor before running the OCR action?

Apologies if I maybe missed something that is already possible. Thanks!

Sleepy · November 28, 2021, 7:33pm

I could easily be persuaded to support your first request. I would use that feature, probably a lot. So now we have two people asking for it, although it usually takes a lot more than that to get a new feature installed in KM. At present we use a lot of math to do the equivalent, which is annoying but works as a workaround.

As for your second request, do you think scaling an image larger would help with OCR? I don't think doing that would make any/much difference to the accuracy of KM's OCR, so a new KM feature to support that is highly unlikely. HOWEVER, I think I can help you with OCR accuracy. Did you know that Monterey has an amazing new OCR feature that works wonderfully, and that you can access it from KM? In my experience it works about 30x faster with 30x smaller error rate than the OCR built into KM. In my opinion this is what you should be doing here. There are a few threads on this topic on this website. Try searching for "Monterey OCR" on this website and you will probably find code that shows you how to do that. If you try this, you won't be needing to resize images at all.

Hans-Peter_Henkel · November 28, 2021, 8:27pm

Thanks a lot @Sleepy for your reply and the support on my first one.

Indeed, I read your thread about Monterey OCR support over the last few days. But at a quick glance it looked too complicated and overwhelming for me. As already mentioned in another thread, I personally like to keep things as simple as possible and, even more important, at a level that I am able to maintain myself. In my job role v. 2.0, after being employed for more than 20 years, I am now working as a freelancer, which basically means learning something new almost every day. But at some point the 24 hours in a day are not sufficient to learn everything that is possible. My request for 72 hours a day is still not approved. So I don't have enough time to learn scripting and coding, RegEx and whatever else. And this is the awesome thing about KM: it lets even guys like me, WITHOUT any scripting or coding knowledge at all, automate things and streamline their workflow.

Well, I also didn't have enough time to create a research project out of it. But I had more reliable results after resizing the captured images. So I'd say yes.

Sleepy · November 28, 2021, 8:33pm

Okay, but it's really not complicated at all. We just create a shortcut in macOS Monterey, then we call it as follows:

To me this is pretty simple and easy. Just three actions. Sure, one action would be simpler, but until KM provides a built-in action, it will take three actions, plus a shortcut.

Just three actions for a payoff that's 30x faster and 30x fewer errors. That's a huge payoff.

Hans-Peter_Henkel · November 28, 2021, 8:50pm

Thanks again!

Indeed it looks quite simple. Although I didn't recognize any slowness in my macro. Now that I don't have to resize the captured image I also have just three actions (beside the delay I mostly put at the beginning).

During my tests this afternoon it ran totally reliably without a single failure, and that is what I need. The popup is always in the same place. I just have to position the window at the correct location as well, which could have been avoided by using the window as the reference for the coordinates.

Thanks again. I'll keep your example in mind if I might experience the need to have a more precise OCR.

Sleepy · November 28, 2021, 9:03pm

I'm happy to help. My claims of "30x" of course are very dependent on what one is doing. My test to get that result was an entire 4K screen full of words. I guess that's not the most common scenario.

I find that Monterey OCR works well even when the screen is a full page display of Google Maps. Monterey OCR is really good at reading the words on something like this. Even the words that are diagonal are usually read accurately. Test to see if the built-in KM OCR can read diagonal or vertical words! (Maybe it can; I didn't test that.)

I've used Monterey OCR to help me find words on full screen maps like this. It's surprisingly fast and accurate!

Some people may respond, "Yes but no OCR reports on the location of the words that it finds." While technically true, I was able to get repeated calls to Monterey OCR to narrow down the location to a pretty accurate area. So it can actually tell you "where" the words are found on the screen.

Hans-Peter_Henkel · November 28, 2021, 9:11pm

Of course, in this scenario any speed difference makes totally sense.

That is awesome. I would like to gain such a result using just another three actions. That'd be great.

But I am really happy with what I was able to reach today. Now that I have this working I will probably implement it in other macros during time.

Thanks for the conversation and have a nice rest of your day. I am off for now.

Sleepy · November 28, 2021, 11:12pm

It took a lot more than 3 actions to achieve this! It was probably more than 100 actions. And my algorithm didn't always work (for reasons I won't bother to explain, but that have to do with binary search algorithms). I want to rewrite my algorithm to work 100% of the time; if I can achieve 100% I'll post it in a new thread.

Also, I should note that my most recent algorithm took a long time to calculate the position of a word on the screen, typically between 10 and 30 seconds. That might sound "unusable," but it's not as bad as it sounds because it REMEMBERS where it found the word, and if it finds the words in the same location next time, it takes only 1 second to confirm that. So for example, if the words I want to locate are "click here to continue", for a computer game, the location usually stays the same each time the words appear, and that means it won't take more than about a second to confirm the location of the words. So instead of having a KM action that says "click at 300,300" my macros now say "click on the words 'click here to continue' " and this means I've removed the hardcoded mouse values from my macro which makes my macro work even when the user changes screen resolution, resulting in a new address to click on!! It's really amazing.
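The idea of narrowing down a location with repeated OCR calls, and remembering it for next time, might be sketched like this in Python. This is not the macro described above (which was not posted) but an illustration under stated assumptions: `ocr` is a hypothetical callable that returns the text found in a screen rectangle, `cache` is a plain dictionary, and `min_size` is an arbitrary stopping threshold.

```python
from typing import Callable, Dict, Optional, Tuple

Rect = Tuple[int, int, int, int]  # (left, top, width, height)


def halves(rect: Rect) -> Tuple[Rect, Rect]:
    """Split a rectangle in two along its longer side."""
    left, top, width, height = rect
    if width >= height:
        first = width // 2
        return (left, top, first, height), (left + first, top, width - first, height)
    first = height // 2
    return (left, top, width, first), (left, top + first, width, height - first)


def locate(phrase: str, ocr: Callable[[Rect], str], screen: Rect,
           cache: Dict[str, Rect], min_size: int = 64) -> Optional[Rect]:
    """Return a rectangle containing `phrase`, or None if it is not on screen."""
    if min_size < 1:
        raise ValueError("min_size must be at least 1")
    # Fast path: one OCR call confirms the remembered location.
    cached = cache.get(phrase)
    if cached is not None and phrase in ocr(cached):
        return cached
    if phrase not in ocr(screen):
        return None
    region = screen
    while max(region[2], region[3]) > min_size:
        for half in halves(region):
            if phrase in ocr(half):
                region = half
                break
        else:
            # Neither half contains the whole phrase: it straddles the split,
            # so this simple search stops with a coarser rectangle.
            break
    cache[phrase] = region
    return region
```

The `else` branch shows one plausible weakness of a plain binary search: when the words straddle the dividing line, neither half reads them completely, so the search has to stop early (or use overlapping regions, which this sketch does not attempt).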

peternlewis · November 29, 2021, 1:51am

Yes, you can do this now with the OCR Screen action, Area, and WINDOW function.
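The arithmetic behind a window-relative area is just an offset from the window's top-left corner. Below is a minimal Python sketch of that calculation; it does not show Keyboard Maestro's field syntax (which is not spelled out in this thread), and the `Rect` type, the function name and the sample numbers are illustrative assumptions.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    left: int
    top: int
    width: int
    height: int


def area_in_window(window: Rect, dx: int, dy: int, width: int, height: int) -> Rect:
    """Translate an area given relative to the window's top-left corner
    into absolute screen coordinates."""
    if dx < 0 or dy < 0 or dx + width > window.width or dy + height > window.height:
        raise ValueError("area extends outside the window")
    return Rect(window.left + dx, window.top + dy, width, height)
```

Because the offsets are relative, the same `dx` and `dy` keep pointing at the same spot inside the window wherever the window happens to be on screen, so the window no longer has to be moved to a dedicated position first.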

You can do that now with the Resize Image action and IMAGE function.

Hans-Peter_Henkel · November 29, 2021, 10:02am

Thanks for sharing the details. It really sounds amazing but is way above my skills, and probably also way more than I'd ever need. Anyway, I'll keep my eye on that. THX!

Hans-Peter_Henkel · November 29, 2021, 10:03am

Thanks so much @peternlewis! That looks awesome. I'll check this as soon as possible and report my experiences. Really looking forward to do so.

Zabobon · November 29, 2021, 11:28am

I just wanted to second Sleepy's recommendation to use the OS Monterey Shortcut for OCR (if you have Monterey of course).

It's actually not that complicated and is a perfect blend of Keyboard Maestro and built-in System tools.

Sleepy worked out how to make a Monterey Shortcut Action to do the OCR work and then made "calling" that OS Shortcut Action a part of a Keyboard Maestro Macro. The bit in the Keyboard Maestro Macro that "calls" the shortcut looks like this (just a single Keyboard Maestro Action):

That single Action runs the OS Shortcut and then saves the OCR Text back to a Keyboard Maestro Variable. The syntax here is really interesting, simple and powerful:

shortcuts run OpticalCharacterRec

The "OpticalCharacterRec" is just what I called the OS Shortcut. Which means that with this simple instruction shortcuts run Keyboard Maestro can make use of any OS Shortcut...

Of course you have to have the OS Shortcut built and ready for Keyboard Maestro. Building this shortcut has been explained by Sleepy before. But at the bottom of this post I've uploaded my version of it - just has to be unzipped and double-clicked to add it to your Shortcuts Library.

The OS Shortcut looks like this:

This OS Shortcut is telling the Mac to read a stored image (screenshot.png) and OCR the text from that image.

The Keyboard Maestro Macro that calls this OS Shortcut is -

Getting a screenshot of an area (by you dragging the mouse).
Saving that screenshot to the file screenshot.png
Calling the OS Shortcut to do the heavy lifting
Setting the result of that OS Shortcut as a Keyboard Maestro Variable
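For readers who want to try the call outside Keyboard Maestro, the same command line can be assembled from a script. The Python sketch below builds the `shortcuts run` command shown earlier in this post; the helper names are made up for illustration, the optional `--input-path` and `--output-path` arguments are not used by the macro in this post, and it is an assumption that the shortcut's result arrives on standard output (as the Keyboard Maestro action saving it to a variable suggests). Running it requires macOS Monterey or later with the shortcut installed.

```python
import subprocess
from typing import List, Optional


def build_shortcuts_command(name: str,
                            input_path: Optional[str] = None,
                            output_path: Optional[str] = None) -> List[str]:
    """Assemble the argument list for the macOS `shortcuts run` command."""
    command = ["shortcuts", "run", name]
    if input_path is not None:
        command += ["--input-path", input_path]
    if output_path is not None:
        command += ["--output-path", output_path]
    return command


def run_ocr_shortcut(name: str = "OpticalCharacterRec") -> str:
    """Run the shortcut and return whatever it writes to standard output.

    Only works on macOS 12 or later with a shortcut of this name installed.
    """
    result = subprocess.run(build_shortcuts_command(name),
                            capture_output=True, text=True, check=True)
    return result.stdout.strip()
```

Passing the arguments as a list avoids shell quoting problems if a shortcut name contains spaces.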

Here's the Keyboard Maestro Macro:

OCR Using OS Shortcut.kmmacros (28.1 KB)

And here is the OS Monterey Shortcut that it calls:

OpticalCharacterRec.shortcut.zip (11.8 KB)

On first run of the Keyboard Maestro Macro, this might pop up:

EDIT - a few notes:

Once you get the text into a Variable in Keyboard Maestro of course you can do anything you want with it (the last two Actions in my Macro makes it the current clipboard in plain text so it can just be pasted and displays it in a window, but you don't have to do that).

I like to have the System "shutter" sound effect and the Play Sound Action plays that sound but it's not necessary or you could play any sound.

Very cool - as OS Shortcuts sync to all your Apple Devices through iCloud, you only have to set this up on a single Mac and it will then work on all your Macs if you are also syncing Keyboard Maestro.

Now you can see why Sleepy was awarded his for inventing this

Sleepy · November 29, 2021, 5:30pm

Thanks!

I see (uh, I mean, "it appears from your first image, but I may be confused") you changed it so that the screen snapshot occurred in the shortcut. I had tried that, but I found that it (the screenshot) was slower doing it that way (3 sec vs 1 sec) than taking the screenshot in KM. Also, doing the screenshot in KM gave me access to other things like specifying a rectangle, (which made the OCR even faster) turning off the sound effect, etc.

I guess I was confused when you said "That single action ..." I thought you were saying that the screen snapping occurred inside the shortcut, because of those words and the fact that up to that point you hadn't shown the screen snapshot code. So, I'm sorry. The first time I wrote a shortcut, the snapshot was indeed inside the shortcut and my KM code was a single action like yours. So that's my rationale for being dumb here today (also, I just woke up and I'm still sleepy.)

Hans-Peter_Henkel · November 29, 2021, 7:02pm

Thanks a lot @Zabobon. I really appreciate your help and know that you have pointed me in the right direction a bunch of times in the past. I will keep this in mind if I have a task that is worth the effort. Even though I am pretty familiar with Apple, so far I have never worked with Shortcuts, neither on iOS nor on macOS. When I had a quick look at it on iOS, it immediately triggered my wish that KM were available for iOS.

As already mentioned in one of my previous posts I was so happy with my results so far and now even more implementing the recommendations of Peter (which I will reply to separately in a minute) using the OCR Area related to the window position. Just this one action reads the very small area I decide (which makes it super fast) and converts it into a variable which I then use to target the further direction of my macro.

Currently I don't have the need to OCR a lot of text from the whole screen or a part of it. I just wanted to read the status of my popup which can be taken in a rectangle of 153x22. And this feature combined with some If Then Else actions will probably help me in a lot more situations in the future. Of course a small step for you Pro's but a real BIG ONE for me on the opposite site of related knowledge.

Thanks again!

Hans-Peter_Henkel · November 29, 2021, 7:08pm

Thanks again so much @peternlewis for this really helpful hint. It took some time to get the correct coordinates, for some unknown reason, but probably just because of my own mistakes. Now it is working great, and as written in my previous post I will definitely use this a lot in the future.
KM is awesome, thanks again!

Sleepy · November 29, 2021, 7:51pm

I recognize the wink, but there may be some approaches that you haven't considered yet. For example, you can run an iOS simulator on macOS, and I think there's one in Xcode. So using a simulator, I think you can use some of KM's actions to read and control an actual iOS app.

Furthermore, there's probably a way to take an actual iOS device that you own and physically connect it to your macOS computer (perhaps Xcode can also do this) and, rather than using a simulator, actually use your real iPhone connected to your Mac and, seeing the iPhone's screen on your Mac, use some of KM's features to see and control it. I would really appreciate it if one of the supreme beings on this website would tell us if this is possible. It should be possible. I think I recall that they did this once during an Apple keynote event. Of course, some of KM's features like triggers won't work with iOS's events.

I don't mind if this topic is split off into a separate thread.

Hans-Peter_Henkel · November 29, 2021, 10:51pm

Thanks, I read/heard about these new features.

Indeed the wink is the most important part for me. I don't like the way these shortcuts on iOS are implemented or designed. Maybe I am just too long used to KM and even just for that reason didn't have the motivation to give it a serious try.

In addition I don't really have a need for automation on iOS. I am one of the really old-fashioned guys. If I want to be productive I don't use mobile devices at all, except for everything that is great on a touch screen, like handwriting, drawing or virtual UIs such as mixing consoles, remote controls etc. In all these scenarios automation doesn't have a priority for me. And everything else I have to be productive with I like to do on my Mac.

I might have a use case where I like to try the new integration of iOS devices into Monterey. But my current schedule does probably not allow to get into it before the end of the year, if at all.
