Voice Control of Keyboard Maestro under macOS Monterey

I've been playing with the beta of macOS Monterey, and I want to describe an intriguing new feature I've been using that allows voice control of KM's macros. It opens up a new way to "trigger" macros. I actually had this working a few years ago, but the voice control (and the voice detection accuracy) of macOS was much weaker back then; with the new features in Monterey it's much easier. So it's time to post how this new macro works. I presume people will ask me to upload the macros, but this post is my "English description" of how it all gets put together.

Most of you won't be getting Monterey until September anyway, so there's no rush for me to upload my macros.

The first thing to understand is that there is a new application in Monterey called "Shortcuts". This is a pretty lame competitor to Keyboard Maestro, but it does a few things well, and one of those things is an action called "Dictate Text" that lets the user speak to the computer and converts the spoken words to text. (Apparently it supports many languages, but I tried it only with English.) I created a single-line shortcut called "DictateText" which looks as follows:

There are two settings to choose. The first is "Language," for which I chose English. The second is "Stop Listening"; for our purposes the best option is "After Short Pause". When this action runs, a pop-up window appears in the middle of the active screen, along with a beep, and then you can speak into your microphone. (I had to buy a mic because a Mac mini doesn't come with one.) It waits about ten seconds for you to start talking, then transcribes your words until you stop. At that point you get a second beep to tell you it has finished listening, and the text is returned to the application that called it. In our case the text will be returned to a KM action that looks like this:

Note that this is an "Execute Shell Script" action, and that it calls the "shortcuts" command, and that it saves the results to a KM variable called Speak, and that the action's description tells you to turn off the failure flags on the action (both notify and abort flags, please.)
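For readers who can't see the screenshot, here is a rough Python sketch of what that action does. It is not the author's actual shell script. It assumes the shortcut is named "DictateText" as above, that `shortcuts run` hands the transcribed text back on standard output (which is what saving the result to a KM variable implies), and that any failure should be swallowed quietly, mirroring the disabled failure flags. The `dictate` function and its `runner` parameter are made-up names for illustration.

```python
import subprocess


def dictate(shortcut_name="DictateText", runner=subprocess.run):
    """Run the shortcut and return the transcribed text, or "" on any failure."""
    try:
        result = runner(
            ["shortcuts", "run", shortcut_name],
            capture_output=True,
            text=True,
            timeout=60,  # assumption: give up if the shortcut never returns
        )
    except (OSError, subprocess.SubprocessError):
        # Missing command, timeout, etc.: treat it as "nothing was said".
        return ""
    if result.returncode != 0:
        return ""
    return result.stdout.strip()
```

Returning an empty string on failure plays the same role as turning off the notify and abort flags: a missed dictation simply produces nothing to act on.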

The next trick is to put the above action in an infinite loop in a KM macro. If it returns nothing, then nothing should be executed. If it returns the name of a macro, then the macro will be executed as the following Execute AppleScript action shows us: (again, disable the notification flags on this action)
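As a sketch of the loop described above (in Python rather than as KM actions, so treat it as an illustration only): the AppleScript line uses the Keyboard Maestro Engine's `do script` command to run a macro by name. The function names `applescript_for`, `run_macro` and `listen_loop`, and the `rounds` parameter (there only so the loop can be stopped in testing), are my own inventions and not part of the author's macro.

```python
import subprocess


def applescript_for(macro_name):
    """Build AppleScript asking the Keyboard Maestro Engine to run a macro by name."""
    # Escape backslashes and double quotes so the name stays one AppleScript string.
    escaped = macro_name.replace("\\", "\\\\").replace('"', '\\"')
    return f'tell application "Keyboard Maestro Engine" to do script "{escaped}"'


def run_macro(macro_name, runner=subprocess.run):
    """Try to run the macro; return False instead of raising if anything fails."""
    if not macro_name:
        return False
    try:
        result = runner(
            ["osascript", "-e", applescript_for(macro_name)],
            capture_output=True,
            text=True,
        )
    except OSError:
        return False
    return result.returncode == 0


def listen_loop(listen, handle=run_macro, rounds=None):
    """Call listen() repeatedly; pass each non-empty result to handle()."""
    count = 0
    while rounds is None or count < rounds:
        heard = listen()
        if heard:  # nothing heard means nothing is executed
            handle(heard)
        count += 1
```

With `rounds=None` the loop never ends, which matches the "infinite loop" in the macro; `listen` would be whatever returns the dictated text.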

So basically it will be listening for your voice commands non-stop. However, there's a one-second break between the ten-second listening periods, so roughly 10% of the time (about 1 second in every 11) you may need to repeat what you said. That's unfortunate, but there isn't much we can do about it.

I decided to add a little code so that it would only call the macro if the spoken words began with the word "Maestro" (which makes it act a little more like "Hey Siri"). The KM action needed to remove "Maestro" was very simple, but I won't include it here because it's not very important.
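The wake-word check is easy to picture in code. This is a minimal Python sketch, not the KM action the author used; `strip_wake_word` is a hypothetical name, and ignoring case and a trailing comma on the wake word is my own assumption about what dictation might produce.

```python
def strip_wake_word(spoken, wake_word="Maestro"):
    """Return the command after the wake word, or None if the phrase doesn't start with it."""
    words = spoken.strip().split()
    if not words:
        return None
    # Compare the first word without trailing punctuation and without regard to case.
    first = words[0].strip(",.!?").lower()
    if first != wake_word.lower():
        return None
    return " ".join(words[1:])
```

For example, `strip_wake_word("Maestro Weather")` gives `"Weather"`, while a phrase that doesn't start with "Maestro" gives `None` and would be ignored.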

KM macro names are most likely case sensitive, so I had to include this statement before the AppleScript action above:
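The statement itself isn't visible here, so the following is only a guess at the idea, written in Python. It assumes the macros are named in title case (like "Highlight Location" or "Explain Computer") and that dictation may tack punctuation onto the end of the phrase; `to_macro_name` is a hypothetical helper.

```python
def to_macro_name(command):
    """Normalize a spoken command to title case so it can match a macro name."""
    # Drop surrounding whitespace and any sentence-ending punctuation.
    cleaned = command.strip().rstrip(".!?")
    # Capitalize the first letter of each word and lowercase the rest.
    return " ".join(word.capitalize() for word in cleaned.split())
```

So "highlight location." would become "Highlight Location" before being handed to the AppleScript action.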

For my purposes it made a lot of sense to log all the sentences in a log file, which I did like this:
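Logging can be as simple as appending one line per sentence. Here's a small Python sketch of that idea; the timestamp format, the tab separator and the `log_sentence` name are my assumptions, not what the author's KM action does.

```python
from datetime import datetime
from pathlib import Path


def log_sentence(sentence, log_path):
    """Append one timestamped line for each sentence that was heard."""
    stamp = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
    # Open in append mode so earlier entries are kept.
    with Path(log_path).open("a", encoding="utf-8") as log:
        log.write(f"{stamp}\t{sentence}\n")
```

A log like this also makes it easy to see how often the dictation misheard a macro name.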

Then you have to decide what macros you want to write, and what their names should be. Here is a list of the macros I created in a folder called "Voice":

Highlight Location

Most of these macros are one-line macros which contain the single action that their name indicates. A few of them are a little more complicated than that. For example, my macro called "Weather" will fetch the weather for my city and read it aloud. My macro called "Joke" will find a random joke from an internet website and read it aloud. My "Explain Computer" macro will read to me most of the important details of my computer (which can be helpful when you have more than one computer.)

These ideas are still experimental and the overall experience isn't likely to replace "Hey Siri," although I remember a post on this website from a quadriplegic, and he might be very interested in this.

Apple really needs to provide options to the Dictate Text action to allow it to be quiet and invisible. Monterey is still in Beta, but I'm not hopeful that Apple will fix this. I did submit feedback to Apple on several of Shortcuts' deficiencies.

Keyboard Maestro could possibly hook into a Monterey API for the new Dictation feature which may be able to bypass the audiovisual cues. That's a more realistic possibility. I feel comfortable in predicting that there is an API for dictation that will have more flexibility than the Shortcuts app delivers to us.

I did find solutions to some of the problems I described above. For example, I found at least two different ways to "hide" the Dictation pop-up window. But these solutions are too complicated and expensive to merit mentioning in this post. This post is just a simple introduction to voice control in Keyboard Maestro under Monterey.


Thanks for this. It will be very interesting to see how KM can be integrated with these new Monterey features. I’ve recently been experimenting with ways to trigger KM macros and feed information from iOS to macOS using Shortcuts. It seems that there will be a lot more possibilities.

Even if there are no changes to KM directly when Monterey arrives, the changes to Monterey will make KM better. But it's likely that Peter will find new ways to take advantage of Monterey.

So, how do simple actions like a Mouse Click work using voice commands?

If the only thing you need to do is click the mouse, that's easy to do. Just create a macro called "Click" in a folder called "Voice" that contains this.

Then in my current implementation I would just say "Maestro Click" and it would click the mouse.

However, I suspect you also want to move the mouse by voice. I actually implemented that in my Mouse macro, which you can see the name of above, but I concluded that moving the mouse by voice is too slow to be useful. It was fun to see it working, but I can't see it being very helpful.

The mandatory audiovisual responses from the Dictate Text action in Shortcuts can't (easily) be hidden, so this whole system is a little wanting. However it still might be good for certain situations. And when Monterey arrives it might eliminate some of the mandatory audiovisual feedback.


Thanks Sleepy!