OCR Screen Area Assistance

kcwhat · February 9, 2025, 10:54pm

Hello Everyone,
I need some help understanding the OCR Action as it relates to the WINDOW function.

For example, the IMDB Ferris Bueller's Day Off page.

Based off of the current position of my browser (center of my display), I use the following action:

As expected I get the following OCR'd text:

Ferris Bueller's Day Off
1986 • PG-13 • 1h 43m

However, if I move my browser window to the left, I get totally different results.

Now @peternlewis explained the following a few years ago, and I never understood it enough to get it to work - No matter what I do, I'll get a blank OCR:

Can someone explain how I can get the same result no matter where my browser is on the screen? I've tried entering the WINDOW (1, Right) + and - entries with coordinates above and get nothing.

Thanks much!
KC

Nige_S · February 9, 2025, 11:54pm

The action uses absolute coordinates, (0,0) being the top-left corner of your main screen. So you need to, somehow, get the absolute coordinates of the area of interest of the window, wherever that window might be.

So if the rectangle of that area of interest is:

100 pixels in from the left edge of the window
200 pixels down from the top edge of the window
800 pixels wide
125 pixels high
...and assuming the window is frontmost, you'd do:

→ WINDOW(1,Left) + 100
↓ WINDOW(1,Top) + 200
←→ 800
↑↓ 125

Whether you pick left edge or right will depend on how the web page reflows when resized -- you may even need to base from the centre for those that keep the middle the same and pad out both sides!
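To make the arithmetic concrete, here is a minimal Python sketch of the same calculation. It is not Keyboard Maestro code: `Rect`, `ocr_area`, and the `anchor` parameter are hypothetical names invented for illustration, and the window frames are made-up numbers. It only shows that an offset from a window edge plus the window's current position gives the absolute rectangle the action expects.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    """A rectangle in absolute screen coordinates, (0,0) at top-left of the main screen."""
    left: int
    top: int
    width: int
    height: int


def ocr_area(window: Rect, dx: int, dy: int, width: int, height: int,
             anchor: str = "left") -> Rect:
    """Convert a window-relative area into an absolute screen rectangle.

    dx is measured from the chosen anchor of the window:
      "left"   -> area starts dx pixels right of the window's left edge
      "right"  -> area starts dx pixels left of the window's right edge
      "centre" -> area starts dx pixels from the window's horizontal centre
                  (negative dx means left of centre)
    dy is always measured down from the window's top edge.
    """
    if anchor == "left":
        left = window.left + dx
    elif anchor == "right":
        left = window.left + window.width - dx
    elif anchor == "centre":
        left = window.left + window.width // 2 + dx
    else:
        raise ValueError(f"unknown anchor: {anchor}")
    return Rect(left, window.top + dy, width, height)


# Hypothetical window frames: the same browser window in two positions.
centred = Rect(left=300, top=50, width=1200, height=900)
moved_left = Rect(left=0, top=50, width=1200, height=900)

print(ocr_area(centred, 100, 200, 800, 125))     # Rect(left=400, top=250, width=800, height=125)
print(ocr_area(moved_left, 100, 200, 800, 125))  # Rect(left=100, top=250, width=800, height=125)
```

The absolute rectangle changes with the window, but the area inside the window stays the same, which is why the WINDOW-based expressions give the same OCR result wherever the browser sits.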

griffman · February 10, 2025, 5:03am

If what you're after is what you showed...

Ferris Bueller's Day Off
1986 • PG-13 • 1h 43m

...you can get it without OCR at all. Ideally, if you know JavaScript for Automation, you could get it that way. But lacking that, you can brute force the data out with some ugly regular expression work:

A really ugly OCR alternative.kmmacros (4.7 KB)

Step one saves the HTML page—all 20,000+ lines of it!!!—to an HTML file in the /tmp folder; the file is deleted at the end of the macro.

Step two is a really really ugly Regular Expression that splits out all the data bits you want into their own variables. Here's the full ugliness:

\s+\}\).*\<meta property=\"og:title\" content=\"(.*?) \((.*?)\) .*? ([0-9]+\.[0-9]+) \| (.*?)\"\/>\<meta property=\"og:description\" content=\"(.*?) \| (.*?)\"\/\>

The title has encoded HTML entities, so those are filtered out, then the results shown:

It's not the recommended solution, but it does work...until they change their page format even slightly. With JavaScript for Automation, you could probably read those fields more directly, making it more robust. But that's beyond my skills.
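For anyone who wants to experiment with the idea outside Keyboard Maestro, the capture-then-decode step can be sketched in a few lines of Python. This is only an illustration under assumptions: the `SAMPLE` string is invented to match the general shape the regex above implies and is not copied from the real page, the pattern is a simplified version covering only title and year, and `extract_title_year` is a hypothetical helper name.

```python
import html
import re

# Simplified pattern: only the title and the four-digit year from the og:title tag.
OG_TITLE = re.compile(
    r'<meta property="og:title" content="(?P<title>.*?) \((?P<year>\d{4})\)'
)


def extract_title_year(page_source: str):
    """Return (title, year) from the og:title meta tag, or None if it is absent."""
    match = OG_TITLE.search(page_source)
    if match is None:
        return None
    # The title arrives with HTML entities (e.g. &apos;), so decode them.
    return html.unescape(match.group("title")), match.group("year")


# Invented sample line; the real page source may differ.
SAMPLE = (
    '<meta property="og:title" '
    'content="Ferris Bueller&apos;s Day Off (1986) - 7.8 | Comedy"/>'
)

print(extract_title_year(SAMPLE))  # ("Ferris Bueller's Day Off", '1986')
```

Like the macro, this breaks as soon as the tag layout changes, so it carries the same caveat.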

I just wanted to offer up a non-OCR alternative.

-rob.

kcwhat · February 10, 2025, 11:59pm

Thank you Gentlemen,

@Nige_S - I was still getting nothing results-wise, using the OCR Area action, until I changed the language to Apple Text Recognition. I had it on Languages - English. I think @peternlewis indicated, a while ago, that Tesseract didn't like white on black text. As soon as I changed it, I got data. Lessons learned. Thanks for your explanation.

@griffman - Your brute force method was mean. I haven't a clue how to use JavaScript for Automation but I'll look deeper at your regex capture groups to see if I can add the Director, stars, writers and description as an exercise. Your setup worked very well for the information groups you pulled.

I made a macro months ago using @ComplexPoint's old XPath plugin. That works wonders.

This particular exercise was to understand how to effectively use the OCR Screen Area. I still have much to learn.

Thanks for your time,

KC

Airy · February 11, 2025, 12:37am

The "screen area" in the OCR action is no different from the screen area in several other actions. It's just an area identified by absolute coordinate values. I frequently use each of the following actions, which also take absolute area values.
