I have a .mov of an hour-long Keynote presentation playing on screen one (in English)
On screen two, I am doing a screen recording of clicking through the exact same presentation but with slides that have been translated into a different language.
The idea is to create a recorded version of the original English presentation but with slides that are in another target language, so we can combine the English audio with the translated slides, etc.
I have been paying a guy to click through the slides manually, in sync with the original English presentation, but it's mind-numbing work, and it's not working very well.
Is there a way for KM to monitor screen one and trigger a keystroke on screen two whenever the presentation changes, i.e., when a new bullet point appears or the slide advances to the next one? This seems like a perfect application for KM to make the computers jump through hoops, but I have no idea how to monitor an area of the screen for changes like this. I have used the 'on found image' condition a lot, but what do you guys think about this? Can this be done?
This sounds like a fabulous (even spectacular) place for OCR to save the day. I'm lovin' it.
In order to make this problem extremely easy to solve with OCR, you need to add slide numbers to each slide. And because Apple's OCR can't do OCR on single digits, you either need your slide numbers to start with "100" or you need to get each slide number modified to include the word "Slide", like "Slide 1", "Slide 2", etc. These things are possible, but since you may already know how to do that, I won't explain how. (If you can't add slide numbers, it's probably still possible, only 100% harder.)
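To illustrate the parsing side of that idea, here's a minimal Python sketch (my own illustration, not the macro discussed in this thread) of pulling a slide number out of whatever text an OCR engine returns, assuming the slides are labeled "Slide 1", "Slide 2", etc. so the engine has a full word to latch onto rather than a lone digit:

```python
import re

def extract_slide_number(ocr_text):
    """Return the slide number found in OCR output, or None.

    Assumes each slide carries a visible label like "Slide 12";
    matching on the word "Slide" avoids grabbing stray digits
    elsewhere on the slide.
    """
    match = re.search(r"\bSlide\s+(\d+)\b", ocr_text, re.IGNORECASE)
    return int(match.group(1)) if match else None

print(extract_slide_number("Agenda\nSlide 12"))   # 12
print(extract_slide_number("no label here"))      # None
```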
Here's what the working code could look like. Bear in mind, I didn't test this. But I write code like this so often I suspect it will work once the correct coordinates (and slide numbers) are added. What this macro will do is click the mouse on a specific location whenever the slide number changes. As far as I can tell, that's what you want. However if slide numbers don't change when bullets are added, we will have to make some modifications to get this to work. Everything is possible, with a little extra code.
That's the challenge: Bullets are items on one slide, so the slide number won't change until the next slide is opened. They also want to detect changes in presentation (the video opens a different presentation?), which might also cause this technique to fail.
Important edit: The thing on the other display is a movie of a presentation, so the user may not be able to have changes made to those presentations.
I was thinking of a macro that took a screenshot every second, then compared the newest shot to the n-1 shot, and clicked if they differed. But I sort of stopped thinking about it when I tried to figure out how I'd compare the screenshots :).
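For what it's worth, the comparison step isn't too bad if you do it outside KM. A hedged NumPy sketch of the idea (my own illustration, assuming the two screenshots arrive as same-sized uint8 arrays): count how many pixels differ by more than some tolerance, and call it a change only if enough of them do.

```python
import numpy as np

def frames_differ(prev, curr, tolerance=8, min_changed_fraction=0.001):
    """Return True if curr differs meaningfully from prev.

    prev/curr: uint8 arrays of identical shape.
    tolerance: per-pixel difference (0-255) to ignore, which absorbs
        antialiasing and compression noise.
    min_changed_fraction: fraction of pixels that must exceed the
        tolerance before we call it a real change.
    """
    diff = np.abs(prev.astype(np.int16) - curr.astype(np.int16))
    changed = np.count_nonzero(diff > tolerance)
    return changed / diff.size >= min_changed_fraction

# Example: a blank frame vs. one with a changed block
a = np.zeros((100, 100), dtype=np.uint8)
b = a.copy()
b[10:20, 10:20] = 255
print(frames_differ(a, a))  # False
print(frames_differ(a, b))  # True
```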
I've done that once! I can't remember how, but maybe I still have the macro. (I think my problem was when I wanted hockey scores in the News app to be texted to me if the image in the News app which reported the score changed.)
Well, I actually have a plan if that's true in his case. It shouldn't be hard to solve that.
Truth be told, I still don't understand the big picture here. It would be really helpful if I could see some sort of video clip showing what's happening.
I'll post a more graphic explanation tomorrow! Thanks for the help, guys. There has to be a way to make this work, but it's beyond me, so I'm very appreciative!
I see two possible solutions. At this point, I can't tell which of my solutions will have the least amount of "lag". However I can see that one of my solutions ("Solution A", using image comparison) has 100% reliability (but see WARNING below) and I'm not sure if the other one ("Solution B", using OCR) will have 100% reliability.
I have a suspicion that Solution A will have a little more "lag" than B, but I also suspect that Solution B may be less reliable (i.e., produce false detections of motion). So you may have to try both solutions. I have used both approaches in the past for problems similar to yours.
The OCR solution (Solution B) is fairly simple. In fact, it's basically the same as the code I gave you above, but with the "area" parameter replaced by one that reads your entire "left screen". There are also ways to improve the reliability of this code if it gives you too many false detections of motion. So I recommend that you create that macro and give it a try.
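Stripped of the screen-reading part, the polling logic boils down to "act only when the slide number changes." A minimal Python sketch of that state tracking (my own illustration, not the KM macro itself), where `None` means OCR found no number on that poll:

```python
class SlideChangeDetector:
    """Fires a callback whenever the observed slide number changes.

    The caller feeds in whatever slide number OCR reported on each
    poll; None (OCR found nothing) is ignored rather than treated
    as a change, which guards against momentary misreads.
    """
    def __init__(self, on_change):
        self.last = None
        self.on_change = on_change

    def observe(self, slide_number):
        if slide_number is None:
            return
        if self.last is not None and slide_number != self.last:
            self.on_change(slide_number)
        self.last = slide_number

# Simulate a few OCR polls
seen = []
det = SlideChangeDetector(seen.append)
for n in [1, 1, None, 2, 2, 3]:
    det.observe(n)
print(seen)  # [2, 3]
```

In the real macro, the callback would be the keystroke or click sent to the Keynote presentation on screen two.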
While you do that, I will look into my macro library for my code which solves the problem by image comparison.
WARNING: I see from your short clip that your transitions are all "instant". Both of my approaches are intended to work with instant transitions. If you have words or images that are animated between transitions, this MIGHT be a problem. There are ways to solve animation issues, but I really hope your short clip above is accurately indicating that you do not use any animations.
Okay, I have a new macro that will detect if there is ANY change on a given screen (except the mouse pointer, which is perfect.) It seems to work. But it's very dependent upon the application. For example, if you run this macro while the macOS GUI is visible, it will USUALLY detect a change, whether it's just the flashing text cursor, or the clock in the upper corner of the screen, etc. So you should not run this macro unless you have a full screen app running, like Keynote. And even then, any animation in Keynote will trigger a change detection.
Also, on my high-res 4K Mac, it takes about one second to detect the change. That can be fixed by temporarily switching to a low resolution setting prior to starting this macro. You can probably get it to run much faster if you do that. And since all you are doing is running a Keynote presentation, that should be a perfectly acceptable compromise. Low-res won't harm you.
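If you were doing the comparison in code instead, you could get much of the same speed-up by downsampling each captured frame before comparing. A small NumPy sketch of the idea (my illustration, not part of the macro), keeping every fourth pixel in each dimension:

```python
import numpy as np

def downsample(frame, factor=4):
    """Keep every `factor`-th pixel in each dimension.

    At factor=4 a frame shrinks to 1/16 the pixels, so a naive
    pixel comparison runs roughly 16x faster, at the cost of
    missing changes smaller than `factor` pixels.
    """
    return frame[::factor, ::factor]

frame = np.zeros((2160, 3840), dtype=np.uint8)  # a 4K-sized frame
small = downsample(frame)
print(small.shape)  # (540, 960)
```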
At this moment the macro simply speaks "1" if it detects a change, and "0" if it detects no change. This can be fixed later in order to add the functionality you want. For now, we just want to run in this way to see if it works at all, and to see how much lag there is on your Mac.
I recommend you give it a test. You should assume it will only work reliably in a full screen app like Keynote (when running a presentation.) Your goal is to determine if it is reliably detecting screen changes. You should also measure the time between each spoken word "1" or "0". That time will determine the "lag" that this macro will give you. You can decrease the lag by using a lower screen resolution.
I am curious about the accuracy of this macro in your Keynote app. It might be anywhere from 0% to 100%. In my case, I'm able to get it to work nearly 100% accurately but only when in Keynote. In some other apps, there are small pixels on the screen that change regularly.
I'll upload the macro in a moment... OK, here it is...
Wow, thank you! This makes sense, I will test it out today and get back to you when I find out if it works : ) Should I assume that the two screens need to have identical resolution to make this work?
No, only the left screen needs to be low-res. But if a short lag is tolerable, you may not even need to do that. It depends on your tolerance for lag and the speed of your Mac, neither of which I can determine from here.
I am eager to know how reliable it is. The algorithm will "seem" to fail if there are any pixels changing on your screen, even if you can't see them changing. But my algorithm is not failing, it's just "too good." Keynote might have a way to temporarily disable animations if this becomes a problem. So even if you run into problems, they MAY be addressable.
I still think that you should try my OCR macro. I suspect it will have lower lag. I'm not sure how many mistakes it will make. If you do try it, lowering the screen resolution will help there too.
P.S. I'm starting to give some thought to a third approach, entirely different, which should have virtually no lag and virtually no chance of errors. The above two macros didn't take me much time to write because I already solved those problems months or years ago, but this new approach is something I have never tried before, so it may take me weeks to figure out.
Yes, that's not easy to solve. My macro above does solve it, but it took me a long time, months ago, to get it to work. KM needs to have a new condition that compares two clipboards which contain images. And going one step further, that action should support the fuzz factor bar.
I don't usually tag @peternlewis but this time I will: can you add a condition that lets us compare the contents of two clipboards containing images? (That shouldn't be too hard.)
It's a pet peeve of mine when someone asks for something and then explains how little effort it is (e.g., "can you drive me to the airport, it won't take long"). It diminishes the requestee's time and value.
In this case, it would require adding a new condition, and consideration would have to be given to dealing with fuzz. It's probably relatively straightforward, but it's surprising how often "relatively straightforward" turns into something else. For example: does the match system extend to matching entire images? Should the fuzz levels be different? Does the match system actually work on images, or only on the screen? If it only works against the screen, then extending it to work against an image might be a considerable amount of work.
Almost certainly you would not want to require an exact match, since that will almost never happen given that any screen image containing text has some level of subpixel antialiasing.
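To make that concrete, a fuzz-tolerant image comparison might score two images by the fraction of pixels that fall outside a per-pixel tolerance. This is my own sketch of the idea in Python, not KM's implementation; the names and thresholds are illustrative:

```python
import numpy as np

def image_match(a, b, pixel_tolerance=10, fuzz=0.02):
    """True if images a and b match within a fuzz allowance.

    pixel_tolerance: per-pixel value difference treated as "equal",
        absorbing subpixel antialiasing noise.
    fuzz: fraction of pixels allowed to exceed the tolerance and
        still count as a match (KM's fuzz slider expresses a
        similar idea as a percentage).
    """
    if a.shape != b.shape:
        return False
    diff = np.abs(a.astype(np.int16) - b.astype(np.int16))
    mismatched = np.count_nonzero(diff > pixel_tolerance)
    return mismatched / diff.size <= fuzz

a = np.zeros((50, 50), dtype=np.uint8)
b = a + 5           # uniformly brighter, but within tolerance
c = a.copy()
c[:5, :] = 255      # 10% of pixels genuinely changed
print(image_match(a, b))  # True
print(image_match(a, c))  # False
```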
Your approach may indeed work for my case, although the DPI action adds an element of resolution-dependence, which means it won't work without modification for people with different DPI values.
Yes, and that's a good point, as the macro that I offered above may fail specifically for that reason. I definitely should investigate using your recommended approach.
Working on this still, in between other things I have to take care of. An interesting idea came from ChatGPT:
Here’s a Python script you can run on your computer to generate a log of when the slides or bullet points change in a PowerPoint presentation video. This script uses OpenCV and scikit-image to compare frames and logs timestamps when significant visual changes are detected.
import cv2
from skimage.metrics import structural_similarity as ssim
import pandas as pd
import os

# ==== CONFIGURATION ====
video_path = "YOUR_VIDEO_FILE.mp4"  # Replace with your filename
frame_check_interval_sec = 2        # Check every X seconds
change_threshold = 0.94             # SSIM threshold for detecting change
# ========================

def detect_slide_changes(video_path, interval_sec=2, threshold=0.94):
    video = cv2.VideoCapture(video_path)
    if not video.isOpened():
        raise IOError("Cannot open video file.")

    fps = video.get(cv2.CAP_PROP_FPS)
    total_frames = int(video.get(cv2.CAP_PROP_FRAME_COUNT))
    interval = int(fps * interval_sec)

    prev_frame = None
    changes = []

    for frame_idx in range(0, total_frames, interval):
        video.set(cv2.CAP_PROP_POS_FRAMES, frame_idx)
        ret, frame = video.read()
        if not ret:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        if prev_frame is not None:
            score, _ = ssim(prev_frame, gray, full=True)
            if score < threshold:
                timestamp = frame_idx / fps
                changes.append({"Timestamp (s)": round(timestamp, 2), "SSIM Score": round(score, 4)})
        prev_frame = gray

    video.release()
    return pd.DataFrame(changes)

if __name__ == "__main__":
    if not os.path.exists(video_path):
        print("Please update the script with your actual video filename.")
    else:
        df = detect_slide_changes(video_path, frame_check_interval_sec, change_threshold)
        output_file = "slide_change_log.csv"
        df.to_csv(output_file, index=False)
        print(f"Change log saved to: {output_file}")
Output
The script will create a slide_change_log.csv file with:
Timestamp (s): The estimated time of slide or bullet change.
SSIM Score: The similarity score compared to the previous frame (lower = more different).
I have not tried this yet, but will keep you posted...