"Found image" action not working when search area is set to "the front window"

I'm curious. How do you do this?

Just want to add to this thread that the Chrome issues don't seem to be limited to the "found image" action.

I have a macro group that I've used for years, that is enabled when Chrome has a focused window title that is "Read later".

This macro group is suddenly broken; the hot key triggers I've always used for the macros in this macro group suddenly do nothing. I'm guessing that's because the macro group is no longer being enabled when it should be enabled.

I could be wrong, but the common thread between this issue and the original issue I posted about seems to be KM suddenly struggling, at times, to determine which Chrome window is the focused/front window. I'm again tagging @peternlewis just as a heads-up that this doesn't seem limited to the "found image" action.

Yes, I will share it. But after I convert it from the slow OCR to the new, fast OCR.

I combine two techniques. (1) I use a dictionary to save the rectangular location of a given phrase. I think my dictionary is named "TextLocation" and, if I recall, the data stored for each location is a rectangle such as "100,100,300,50". (2) When the user calls the macro, the dictionary is checked first. If the saved location still contains the text string (a single OCR check, which is very fast), we move the mouse to the centre of the rectangle immediately. If the saved location does not contain the word, the screen is bisected over and over until the exact location is found. Actually, if I always bisected in the same place, it wouldn't find words that cross a bisection line, so I have to bisect at random positions. That slows it down a bit, but it still works. I'm so happy that it works.
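Here is a minimal sketch of that "check the saved spot first, search only if needed" flow, written in Python rather than KM actions. The names `locate`, `region_contains` and `slow_search` are made up for illustration, and I'm assuming the rectangle order is left, top, width, height. The OCR check and the bisection search are passed in as functions, so nothing here pretends to be a real OCR call.

```python
from typing import Callable, Dict, Optional, Tuple

# Assumed rectangle order: left, top, width, height.
Rect = Tuple[int, int, int, int]


def locate(
    phrase: str,
    cache: Dict[str, Rect],
    region_contains: Callable[[Rect, str], bool],
    slow_search: Callable[[str], Optional[Rect]],
) -> Optional[Rect]:
    """Try the remembered rectangle first, then fall back to a full search."""
    saved = cache.get(phrase)
    if saved is not None and region_contains(saved, phrase):
        return saved  # fast path: one OCR check on a small area
    found = slow_search(phrase)  # slow path: e.g. repeated bisection
    if found is not None:
        cache[phrase] = found  # remember the new location for next time
    return found


def centre(rect: Rect) -> Tuple[int, int]:
    """Point to move the mouse to: the middle of the rectangle."""
    left, top, width, height = rect
    return (left + width // 2, top + height // 2)
```

With a saved rectangle of (100, 100, 300, 50), `centre` gives (250, 125), and the slow search is never called as long as the phrase is still where it was last time.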

When I update my algorithm to use the new Apple OCR, I expect my bisection algorithm to take only seconds; under the old OCR, I find it takes up to a minute. For many purposes, a minute is unacceptable, but most of my macros run when I'm not at my computer, so for me a minute is perfectly acceptable. The full search also only happens when the word has moved to a different location on the screen, so it rarely needs to run anyway.

The main difficulty this macro faced was the time it took to perform a full screen OCR, which seemed to be over 5 seconds. But the OCR got faster and faster as my bisection algorithm worked its way to smaller portions of the screen. However the new "Apple OCR" seems to take about 1 second to perform a full screen OCR, no matter how much text is on the screen. So this will be a major improvement.

I can't wait to get my hands on this. :fire::fire::fire:

Nice to hear that. But I basically described all my logic at a high level, and anyone could write the macro now, especially someone like you. :slight_smile:

Actually, I don't know the first thing about dictionaries, so that would be the first stumbling block.

In my opinion, the real skill was coming up with the whole idea, "Hey, why don't I bisect the screen randomly to locate a word on the screen, since the OCR programs don't return location data?"

Coming up with that IDEA was the real skill. Encoding it was actually kinda boring and frustrating.

It's simple. Here's a KM variable that stores numbers in a sequence:

image

A dictionary is exactly the same thing, but instead of using numbers to access the items, like Local[3] which fetches the value "7", a dictionary uses a string as the index, kinda like this:

MyDictionary["Click to Continue"] = somevalue

In my case, "somevalue" is a list of four numbers representing the area on the screen where "Click to Continue" was last found.
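If it helps, here is the same idea sketched in Python instead of KM. The list values are made up, and the left, top, width, height order of the rectangle is my assumption. Note that Python counts list positions from 0, so the third item is `numbers[2]`.

```python
# A list is indexed by position; a dictionary is indexed by a string key.
numbers = [3, 5, 7]        # illustrative values only
third_item = numbers[2]    # third position, because Python counts from 0

# Plays the role of the "TextLocation" dictionary: phrase -> rectangle text.
text_location = {}
text_location["Click to Continue"] = "100,100,300,50"


def parse_rect(value: str) -> tuple:
    """Turn "100,100,300,50" into four integers."""
    return tuple(int(part) for part in value.split(","))


rect = parse_rect(text_location["Click to Continue"])
```

Looking up a phrase that was never stored is the "not in the dictionary" case; in Python, `text_location.get("Some other phrase")` returns `None` for that.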

What if the target word is between regions?

If you mean "crossing the line between regions" then you didn't grasp what I meant when I said "random bisections." You see, if the full word isn't detected in either region, then the bisection line is redrawn in a different random location, and usually that results in the word being found on one side. I know that isn't obvious from my high level description, but maybe now it's clearer. If you are trying to write this code yourself, before I upload it, go for it! I'd be happy for you. I take great joy in writing code. Sometimes I delete all my KM macros and rewrite them from scratch just for the joy of making them better.
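For anyone who wants to try it, here is a rough Python sketch of random bisection. It is not my KM macro. The OCR call is replaced by a stand-in `contains` function that reports whether the whole word sits inside a region, and the names `narrow`, `split`, `fully_inside`, `max_misses` and `horizontal_bias` are all invented for this sketch. Stopping after a run of failed cuts is just one simple choice of stopping rule.

```python
import random


def fully_inside(inner, outer):
    """True if rectangle inner (left, top, width, height) lies within outer."""
    il, it, iw, ih = inner
    ol, ot, ow, oh = outer
    return il >= ol and it >= ot and il + iw <= ol + ow and it + ih <= ot + oh


def split(region, horizontal_line, rng):
    """Cut region in two at a random position; None if it is too thin to cut."""
    left, top, width, height = region
    if horizontal_line:
        if height < 2:
            return None
        cut = rng.randint(1, height - 1)
        return (left, top, width, cut), (left, top + cut, width, height - cut)
    if width < 2:
        return None
    cut = rng.randint(1, width - 1)
    return (left, top, cut, height), (left + cut, top, width - cut, height)


def narrow(region, contains, rng, max_misses=40, horizontal_bias=0.5):
    """Shrink region around the word; stop after max_misses failed cuts in a row."""
    misses = 0
    while misses < max_misses:
        halves = split(region, rng.random() < horizontal_bias, rng)
        if halves is None:
            misses += 1
            continue
        for half in halves:
            if contains(half):  # the whole word is on this side of the line
                region = half
                misses = 0
                break
        else:
            misses += 1  # the line crossed the word: redraw it elsewhere
    return region


# Simulated screen and word position (made-up numbers, no real OCR involved).
screen = (0, 0, 3840, 2160)
word = (1200, 700, 180, 24)
result = narrow(screen, lambda r: fully_inside(word, r), random.Random(0))
```

The region is only ever replaced by a half that still contains the whole word, so the result always encloses the word. Raising `horizontal_bias` above 0.5 makes horizontal cut lines more frequent.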

Ah, so you're gradually zeroing in on the word...? If I understand rightly, you might start by scanning the top and bottom halves of the screen. If the word is found in the top half, then you'd scan the left and right halves of that and so on..?

Or maybe you scan vertical overlapping strips...?

That's a really clever use of OCR.

Yes, I am gradually zeroing in on the word. During debugging, I draw rectangles on the screen so I can see how quickly it is narrowing in. At first, it narrows quite quickly. It's the last 10% that takes 90% of the time, as the smaller my candidate block gets, the more likely a bisection will fail.

Your idea of alternating between vertical and horizontal bisections is one of the ways to do it. However, since an English word is usually wider than it is high (a subtle point that took me days to realize), I think my most recent algorithm favours horizontal bisection lines, especially near the end.
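A quick back-of-the-envelope check of why horizontal lines are safer, assuming the cut position is uniformly random across the candidate block (that uniformity is my simplification, and the sizes below are made up):

```python
def cut_hits_word(block_extent: float, word_extent: float) -> float:
    """Approximate chance that a uniformly placed cut lands inside the word.

    block_extent: size of the candidate block along the axis the cut moves on.
    word_extent:  size of the word along that same axis.
    """
    return min(1.0, word_extent / block_extent)


# Hypothetical sizes: a 400x400 block containing a word 200 wide and 20 high.
vertical_line = cut_hits_word(400, 200)    # cut position varies along x
horizontal_line = cut_hits_word(400, 20)   # cut position varies along y
```

Under those assumptions a vertical line cuts through the word about half the time, while a horizontal line does so about one time in twenty. The same ratio explains the slow finish: once the block is only slightly wider than the word, say 220 against 200, roughly nine vertical cuts out of ten fail.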

No, I don't use multiple overlapping strips. That would add too much complexity.

Thanks for the compliment. As I said, I'm quite proud of this macro. I think it should be included in the default set of KM macros just for its usefulness and maybe its beauty.

I've just rewritten the entire thing from scratch (maybe two hours of work? I didn't time it.) I rewrote it from scratch because I wanted to replace the old OCR with new Apple OCR. Now technically that should have been just a few action changes, but I wanted to do a full rewrite because I don't want anyone to see sloppy looking code, and because I truly enjoy writing code, and because I felt I could do better than before.

As I predicted, it takes ~5 seconds to locate a word on my 4K screen. (It used to take up to a minute using the old OCR engine.) And I'm still on an M1 Mac Mini, so it should be faster on an M2 or M3, or even faster if you're on a smaller screen. (NOTE: the 5 second duration is only when the word isn't found where it was before, so depending on your situation, the macro may be virtually instantaneous.)

I tested the old OCR and new OCR, and found that on my Mac the old OCR takes 8 seconds (on average) to read a full screen of text, while the new Apple OCR takes 1.3 seconds. (Results vary from time to time, and results also vary by how much text is on the screen.) That's about 6 times the speed. But that's not the whole story, because Apple OCR is enormously more accurate. So Apple OCR is 6x the speed and over 10x the accuracy.

So when my algorithm runs, the Apple OCR takes 1.3 seconds to read the entire screen, but then as the screen gets subdivided, each further OCR takes a fraction of a second. Technically, I don't have to read the whole screen right from the start, as I can just start by doing a subdivision, but it just seems cleaner my way. I guess I could remove the 1.3 second check, which means my macro would run in 3.7 seconds.

Now I need to remove some debugging statements, add comments, add documentation, etc., before I upload it.

I'm also going to tinker with the termination logic. This isn't as easy as you think, because there's no perfect way for the program to know when it's finished narrowing its search. Basically, when it's not making any more progress, it can assume it's finished. There's also a mathematical way to determine when it's finished, which I think I may use instead.

Chrome does not provide Accessibility information unless it decides that the accessibility system is in use by the system.

You can try launching Chrome like this:

/Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome --force-renderer-accessibility=complete

And perhaps that will help.

Thanks, @peternlewis. I gave this a shot, but no luck; the issues I've been experiencing persist.

Happy to share some screencasts with you directly if it would be helpful.

Forgive me if this has been asked and answered before, but...

If we can use %WindowFrame%1% to perform a front window found image search, shouldn't Chrome's failure to provide Accessibility info be moot for this particular task?

I'm curious about this, too.

To me, the fact that this isn't just a "found image" problem suggests that the issue is KM/Chrome struggling to know what the front window is, whether in the context of a macro group that's only supposed to work in a certain window, or a "found image" action that's only supposed to search the front window.

I use KM for Chrome more than anything else, so this issue is currently making KM pretty useless for me.

OK, this is getting a bit too hard for me to follow on here.

Please email support@stairways.com with:

  • Keyboard Maestro version
  • macOS version
  • The simplest case you have where the macro does and doesn't work with Chrome.
  • Include the two macros.

Thanks.

Sounds good, Peter. E-mailed you yesterday. Thanks.
