KM11: Please help me use Apple's OCR (window titles? match regex?)

The option to use "Apple Text Recognition" OCR might be my reason to upgrade to KM11. However, I'm having a hard time getting this to work (better than the image matching I was using previously) and could use your help. I'm trying to automate logging in to a service from work, which happens in 3 different steps/windows (username, password, TOTP code) in a specific macOS application.

Currently I have these questions (though more might follow):

  1. How can I find out the exact window title of this Application, so that I can configure the OCR step to only OCR that specific window?

  2. The OCR result seems to be a list of strings separated by newlines. In the last two steps I want to match 3 strings (company name, user name, and either "Enter password" or "Enter code"). How can I put this in a regex without specifying all 6 permutations of those 3 strings?

We really need more info to be able to help.

  • What application?
  • Can you provide a sample of the data you’re working with?

Get back to us with that and we can brainstorm! :wink:

I’d rather not share the Application name, but if someone answers my generic question (how to discover window titles) I think/hope that I should be able to manage that part.

The question about the regex is also rather generic (I thought), but these are examples of the 3 strings that I want to match with 1 OCR result (check that all 3 strings are present in the newline separated list):

  1. Company
  2. first.last@company.com
  3. either “Enter password” or “Enter code”

Or not... Apparently this should do it, but the title of the window of that Application is not included in the output:

osascript -e 'tell application "System Events" to get the title of every window of every process'

Fair enough. The token %WindowName%All% should give you all window names, assuming Keyboard Maestro can see them.

But RegEx, by its very nature, is not generic. So to be able to give some examples, we really need to see some sample data.

I gave sample data above, but if you need them even more specific:

Set 1:

  1. Keyboard Maestro
  2. chris.thomerson@keyboardmaestro.com
  3. Enter password

Set 2:

  1. Keyboard Maestro
  2. chris.thomerson@keyboardmaestro.com
  3. Enter code

For set 1 I don't want to write a regex like this (and similar for set 2 with code instead of password):

((.|\n)*Keyboard Maestro(.|\n)*chris.thomerson@keyboardmaestro.com(.|\n)*Enter password(.|\n)*)|
((.|\n)*Keyboard Maestro(.|\n)*Enter password(.|\n)*chris.thomerson@keyboardmaestro.com(.|\n)*)|
((.|\n)*chris.thomerson@keyboardmaestro.com(.|\n)*Keyboard Maestro(.|\n)*Enter password(.|\n)*)|
((.|\n)*chris.thomerson@keyboardmaestro.com(.|\n)*Enter password(.|\n)*Keyboard Maestro(.|\n)*)|
((.|\n)*Enter password(.|\n)*Keyboard Maestro(.|\n)*chris.thomerson@keyboardmaestro.com(.|\n)*)|
((.|\n)*Enter password(.|\n)*chris.thomerson@keyboardmaestro.com(.|\n)*Keyboard Maestro(.|\n)*)

EDIT: Alternative thought: would using 3 separate OCR (simple) "contains" conditions (one per item in a set above) in 1 "Pause Until Conditions are Met" action be good enough? (Or far less efficient, because text has to be OCR'd 3 times?)

EDIT 2: Using 3 "contains" conditions is indeed a bit slow, but that does seem to work :tada:
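
For the regex route, one common way to avoid listing all six permutations is a chain of lookaheads, each of which checks for one string anywhere in the text. The sketch below uses Python's `re` module purely for illustration; the helper name `has_all_three` is made up, and whether the same pattern behaves identically in Keyboard Maestro's regex engine has not been verified in this thread.

```python
import re

# Each (?=...) lookahead asserts that one string occurs somewhere in the text
# without consuming any input, so the order of the OCR lines does not matter.
# (?s) lets "." match newlines; the dots in the e-mail address are escaped.
ALL_THREE = re.compile(
    r"(?s)\A"
    r"(?=.*Keyboard Maestro)"
    r"(?=.*chris\.thomerson@keyboardmaestro\.com)"
    r"(?=.*Enter (?:password|code))"
)


def has_all_three(ocr_text: str) -> bool:
    """Return True when all three strings occur in the OCR text, in any order."""
    return ALL_THREE.search(ocr_text) is not None


# Sample data taken from the sets above, plus a shuffled and an incomplete one.
set_1 = "Keyboard Maestro\nchris.thomerson@keyboardmaestro.com\nEnter password"
shuffled = "Enter code\nKeyboard Maestro\nchris.thomerson@keyboardmaestro.com"
missing = "Keyboard Maestro\nEnter password"

print(has_all_three(set_1), has_all_three(shuffled), has_all_three(missing))
```

The `(?:password|code)` alternation covers both sets with one pattern; use two separate patterns if the password and code steps need to be told apart.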

Thanks!

That indeed shows the window titles for most Applications, but not this particular one, as I already "feared".

I probably have to OCR the front window instead.

As for the part of your questions where you want to determine the text of the window using OCR, the following works for me: (you may have to change the number "24" to a different number, because this number depends on certain macOS settings such as your screen's resolution, and you haven't provided those details.)

image

As for the part of your questions where you want to determine if the OCR result doesn't match three conditions, you could do the following, which will cancel the macro if your OCR result doesn't match your three conditions. I think that's what you want to do, right?

Just perform an OCR of the front window, store the results in MyOCRresult, then do the following command:

image

In one of your subsequent posts you explained you wanted to "wait until three strings appeared in the window." You didn't mention that in your original question. I'm sure you can adapt my solution to meet that requirement, but if you can't, let me know.

Sorry, looks like I'm really bad at expressing myself (today?).

I want(ed) to use "window with title" (with a hardcoded window title) instead of "the front window" in an OCR condition (to make sure my credentials are only automatically entered in the correct Application), but the main window of this Application does not seem to have a title, so I'm still using "the front window" (which works).

No, I want to wait until a new window appears that contains all 3 strings, so (as I wrote in my EDITs above) I'm now using 3 OCR/contains conditions in 1 "pause until" action. This works, but I assume this is less efficient, because the window might be OCR'd 3 times instead of only once?

Since I'm not good at expressing myself, maybe this picture helps?

(%localUsername% is obtained using the 1Password CLI)

EDIT: I can add another (Application) condition in that block: "application X is at the front" (to ensure those credentials are entered in the correct application only)

Your method of using three OCR conditions in a single PAUSE UNTIL action is definitely three times slower than using my approach of doing a single OCR, saving the value into MyOCRresult, and then doing an evaluation similar to the one I showed. I would never do it your way; I would always save the result into a variable and then conduct three tests on the variable. E.g.,

image
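
In text form, the idea of OCR-ing once per pass and then running three cheap tests on the stored result could be sketched roughly as follows. This is Python rather than a Keyboard Maestro macro, and `read_ocr`, `contains_all`, and `wait_for_window` are hypothetical names; the OCR step is replaced by a fake function so the sketch runs anywhere.

```python
import time
from typing import Callable, Sequence


def contains_all(text: str, needles: Sequence[str]) -> bool:
    """Run every 'contains' test against the same, already-captured text."""
    return all(needle in text for needle in needles)


def wait_for_window(read_ocr: Callable[[], str], needles: Sequence[str],
                    timeout: float = 10.0, interval: float = 0.5) -> bool:
    """Poll until one OCR result contains all needles, or until the timeout."""
    deadline = time.monotonic() + timeout
    while True:
        text = read_ocr()  # the expensive step happens once per iteration
        if contains_all(text, needles):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Fake OCR source standing in for the real front-window OCR (an assumption).
frames = iter([
    "Loading...",
    "Keyboard Maestro\nchris.thomerson@keyboardmaestro.com\nEnter password",
])
calls = []


def fake_ocr() -> str:
    calls.append(1)
    return next(frames)


found = wait_for_window(
    fake_ocr,
    ["Keyboard Maestro", "chris.thomerson@keyboardmaestro.com", "Enter password"],
    timeout=5.0,
    interval=0.01,
)
print(found, len(calls))
```

Here the OCR function is called twice in total, once per polling pass, no matter how many strings are tested.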

Thanks!

I did not know the "Execute Actions Until Conditions Met" action.

This is indeed slightly faster.

If you know the order in which the items appear in the text, then you can use a single regex test, something like:

If OCR matches

(?s)First(.*)Second(.*)Third

The (?s) flag changes the behaviour of the . character from matching any character except line terminators to matching any character including line terminators.
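
The effect of the flag is easy to see outside Keyboard Maestro as well. The following is a small illustration in Python's `re` module, which also supports `(?s)`; the sample text and variable names are invented for the demonstration.

```python
import re

# Three items on separate lines, as in a newline-separated OCR result.
ocr_text = "First\nSecond\nThird"
pattern = "First.*Second.*Third"

# Without (?s), "." stops at each newline, so the items cannot be bridged.
without_flag = re.search(pattern, ocr_text)

# With (?s), "." also matches newlines, so the match spans all three lines.
with_flag = re.search("(?s)" + pattern, ocr_text)

print(without_flag is None, with_flag is not None)
```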

Is part of this whole conundrum that you only want the macro to work in certain applications? If so, can you put the macro into a macro group that is only enabled in the application in question?

Yes, I know the order (from a macro that OCRs the screen and displays the result). I was thinking of using all combinations in case the OCR result is not deterministic, but maybe I’m just complicating things and should try this single order first and see whether it (ever) breaks.

Thanks for the regex. That flag is something I did not know yet either.

Excellent idea! Will definitely do that.

The order of OCR results might change. I wouldn't rely on that.

A little bit disappointing (and Apple's to blame?): "Sign in" was consistently properly detected on the internal (Retina) screen of my MacBook Pro, but on my external (non-Retina) Dell monitor Apple always detects it as "Sian in"...

(I can fix that when reverting to regular expressions, but still a bit disappointing)
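
A minimal sketch of such a regex fix, assuming the only misreading is the "g" coming out as "a" as described above. It is written in Python for illustration, and the name `SIGN_IN` is made up; the character class simply accepts either letter in that position.

```python
import re

# Accept both the correct "Sign in" and the observed misreading "Sian in".
SIGN_IN = re.compile(r"Si[ga]n in")

retina_result = "Welcome\nSign in"
external_result = "Welcome\nSian in"
unrelated = "Welcome\nSing in the rain"

print(
    bool(SIGN_IN.search(retina_result)),
    bool(SIGN_IN.search(external_result)),
    bool(SIGN_IN.search(unrelated)),
)
```

A wider character class would tolerate more misreadings, at the cost of also matching more unintended text.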