Audio Waveform Analysis

Hey guys,

I’m a video editor and I’m looking for a way to automatically mark audio in my raw footage so that I can jump to the spots where dialogue is happening.

My idea is to zoom in on the relevant audio track and move the mouse across the sample plot at a fixed Y-position.

Whenever the colour at the current position is not yellow (the colour of the audio track when there’s no audio), a marker should be set.

After that event the mouse should keep moving until the colour is yellow again, and then the process should start over and repeat until the end of the track is reached (I’m still wondering how to handle this).

I’m quite new to KM and this is definitely overwhelming me. Maybe some of you guys could help.

Thanks in advance.

Best,

Simon

I think this is easily solvable. If you know (or can obtain) the screen coordinates of the waveform window, you could write a KM macro that uses the PIXEL function, moving from left to right and checking whether a pixel near the centerline of the waveform is black or gold. Once it's black, you know there's sound, and you wouldn't consider the sound to be "off" until you had at least ten(?) gold pixels in a row. Then you would move the mouse and click to make a marker when the pixel is gold again.
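A minimal sketch of that scan, written in Python rather than as KM actions so the logic is easy to follow. It assumes the pixels along the centerline have already been read and reduced to a list of booleans (True = dark waveform pixel). The function name `find_sound_regions` and the parameter `min_gap` are made up for illustration; `min_gap=10` mirrors the "ten(?) gold pixels in a row" guess above.

```python
from typing import List, Sequence, Tuple


def find_sound_regions(dark: Sequence[bool], min_gap: int = 10) -> List[Tuple[int, int]]:
    """Return (start, end) x-ranges that contain sound, with end exclusive.

    A region only ends once `min_gap` consecutive non-dark pixels
    have been seen, so short gold gaps inside a take are ignored.
    """
    regions: List[Tuple[int, int]] = []
    start = None  # x of the first dark pixel in the current region
    gap = 0       # consecutive non-dark pixels seen inside the region
    for x, is_dark in enumerate(dark):
        if is_dark:
            if start is None:
                start = x
            gap = 0
        elif start is not None:
            gap += 1
            if gap >= min_gap:
                # The region ended where the run of gold pixels began.
                regions.append((start, x - gap + 1))
                start = None
                gap = 0
    if start is not None:
        # The row ended while a region was still open.
        regions.append((start, len(dark) - gap))
    return regions
```

A four-pixel gold gap inside a take does not split the region, while a run of ten or more does.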

The last time I used the PIXEL function, it worked well, but it was a little slow. If you want to use the PIXEL function across every pixel in that image, it might take a while. You didn't say how fast you needed this macro to be, so I'm not sure if PIXEL will meet your needs. Tell me how fast you need this to work. I can think of a much faster method, but you would need to install ImageMagick and take statistical measurements of colours. That approach is fast, but it's a huge can of worms to deal with.

Hey Sleepy,
Thanks for your reply. That's very interesting. I don't need it to be super fast, since I would treat the routine like a render that runs outside of the actual work (at night or on another device).
So I would be interested in the PIXEL workflow. (I had a look at ImageMagick, but since I'm not a coder, it is probably too much for me.)
Have a good day!

I can probably draft some code for you, but the problem is that it won't work for you without modification because I don't think I can possibly know the exact colour values of the gold and dark grey pixels. I'd really have to see an unadulterated non-compressed image to be able to write accurate code.

That would be awesome, man. I'm just wondering how to increase the quality of the screen snapshot.
I think the colours in the attachment are true, but I'll give you the colour values to be safe.
The golden/yellow track colour has the values R:148; G:114; B:22.
The black/grey audio colour has the values R:37; G:37; B:37.
I'm not a total KM beginner so maybe I'm also able to modify certain code to adapt it to my needs.
Anyway, thanks a lot in advance. This would ease my editor's life big-time.
Cheers,
Simon
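With those two reference colours, a pixel can be classified by which one it is closer to. The sketch below is in Python for illustration only, not the KM macro itself. The RGB values are the ones reported above; the function name `classify`, the labels, and the tolerance of 40 are assumptions that would need tuning against the real screen.

```python
import math
from typing import Tuple

RGB = Tuple[int, int, int]

TRACK_GOLD: RGB = (148, 114, 22)    # empty track colour, as reported in the thread
WAVEFORM_DARK: RGB = (37, 37, 37)   # waveform colour, as reported in the thread


def classify(pixel: RGB, tolerance: float = 40.0) -> str:
    """Return 'silence', 'sound', or 'unknown' for one pixel.

    `tolerance` is a hypothetical cutoff: the maximum Euclidean distance
    in RGB space at which a pixel still counts as a reference colour.
    """
    d_gold = math.dist(pixel, TRACK_GOLD)
    d_dark = math.dist(pixel, WAVEFORM_DARK)
    if min(d_gold, d_dark) > tolerance:
        return "unknown"  # e.g. the background outside the waveform chart
    return "silence" if d_gold < d_dark else "sound"
```

Pixels that are only slightly off the reported values still fall into the right class. Anything far from both colours is reported as unknown rather than guessed.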

That helps a lot, that you were able to determine that. (Compressed images usually damage the RGB values, so thanks for fetching that data.) Right now my priority is getting a technician here to replace my broken thermostat. Without a thermostat I don't have a furnace, and in December in Canada getting a furnace is a fairly high priority.

Good luck with your furnace, and no rush with my request.
Thanks a lot in advance!

Sorry, I kinda forgot about your request. My furnace got repaired yesterday. There was a hot water valve that had been closed by the last technician. You'd think a half-decent computer programmer could solve a simple problem like that! But, how many computer programmers does it take to change a light bulb? None, that's a hardware problem.

I'm working on it now. I'm almost finished. But I have a concern. You gave me the pixel values for the gold and black areas. But when I look at the image you supplied above, there are other values besides those. This is called anti-aliasing. Are you SURE that you don't have that? If you do, I have to modify my code. Look at the colour values on the edge between the gold and black. Do the values change on the borders?

EDIT: I think it's working, even if you have anti-aliasing. Just one more question: what is the colour of the screen just to the right of the waveform chart? Right now I'm assuming that it's white in order to detect a stop condition. You can change that to a different colour which represents the end of the chart. My program is surprisingly simple. Just eight KM actions! Amazing! You will have to add actions for whatever you want to do when it hits a location where sound is detected. BUT if I've written a program which doesn't do what you want, I'm willing to try again. It's entirely possible I misinterpreted something.

You might want to change the hotkey. By a coincidence, it happens to be the same hotkey Safari uses to close the current window.

Audio Macro.kmmacros (8.2 KB)

When I added support for anti-aliasing, I introduced a small bug. It performs your actions once after the right edge of the waveform is reached. I suppose the simplest way to fix this is to add an IF statement in the loop that checks if the PIXEL is >0.98 and, if so, exits the loop. So that's an extra statement, if you need it. The reason I didn't need it is that I just manually aborted the macro as it approached the right side of the image.
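The fix described here amounts to testing for the right edge before acting on a pixel. Below is a rough Python sketch of that ordering, not the actual macro. The 0.98 threshold comes from the post above. The rest is assumed for illustration: brightness is defined as the largest channel divided by 255, the area right of the chart is white, and anything below 0.3 counts as waveform.

```python
from typing import List, Sequence, Tuple

RGB = Tuple[int, int, int]


def brightness(pixel: RGB) -> float:
    """Brightness in the range 0..1 (assumed: largest channel / 255)."""
    return max(pixel) / 255.0


def scan_until_edge(row: Sequence[RGB],
                    dark_below: float = 0.3,
                    edge_above: float = 0.98) -> List[int]:
    """Return x positions of dark pixels, stopping at the chart's right edge."""
    hits: List[int] = []
    for x, pixel in enumerate(row):
        b = brightness(pixel)
        if b > edge_above:
            break  # edge check comes first, so no action fires past the chart
        if b < dark_below:
            hits.append(x)  # the user's own actions would run here
    return hits
```

Because the edge check sits at the top of the loop body, a dark pixel that happens to lie beyond the white edge never triggers an action.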

Hey Sleepy,

Thanks so much for your effort! That’s very interesting and informative for me.

Unfortunately it doesn’t work the way it should. There are several issues I hadn’t thought about.

But I might have an idea for another workflow.

Is it okay if I send you three OCRs in order to clarify the challenges and current problems?

It's okay if the app doesn't work for you. Programming is fun. This program in many ways was typical of the kind of thing I use KM for every day.

You used the word "OCR" when I think you meant "screenshot", right? I don't see how OCR can help. Sure, you can send me some images. But bear in mind that when macOS takes images it creates compressed files which may introduce graphical inaccuracies like anti-aliasing.


Hey Sleepy,
I guess you misunderstood me. What I meant by a different workflow is that maybe you could adapt your code to that workflow, if you want.
Here are three screen recordings to help you understand the issues that are occurring right now.
You can find the WeTransfer link here: https://we.tl/t-OseDNlT2qX
What I forgot is that I need the green playhead to actually move through the material and set markers at the spots I want. I modified your code so that it not only moves the mouse but also clicks and holds in order to drag the green playhead. That worked, but it didn’t set the markers the way it should.

I assume the workflow with the mouse may be too inaccurate. (See 'Macroplay_statusquo.mov'.)

The most accurate way would be to actually move through the timeline frame by frame (hotkey '3'), do a colour analysis at every movement of the playhead, and then set a marker (hotkey 'F3') whenever new audio occurs. But I don’t know how (or whether) it is possible to determine the pixel colour at the position of the playhead, since it is not the mouse position. In addition, it is always a problem when the end of the currently shown waveform is reached, because the audio track only updates when the playhead has reached the end of the timeline. (See 'Workflow_alt.mov'.)

Another idea would be Workflow_alt02:

I modified the Avid settings so that the playhead and the mouse cursor stay in one position while the footage runs past. That would be at normal speed, but I’m not sure whether this is too fast for the marker macro. (See 'Workflow_alt02.mov'.)

Sorry for the watermark within the videos.
If you're keen to find a solution or have valuable advice for me, I would highly appreciate it, but I also understand if you're not. So thanks anyway and kind regards from Berlin,

Simon
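The frame-by-frame idea in this post ("set a marker whenever new audio occurs") boils down to detecting the transition from silence to sound between consecutive frames. Here is a small Python sketch of just that decision. It assumes one sound/no-sound sample per frame has already been obtained somehow, which is exactly the part the post says is unsolved. The name `onset_frames` is invented for illustration.

```python
from typing import List, Sequence


def onset_frames(has_sound: Sequence[bool]) -> List[int]:
    """Return the frame indices where sound starts after silence."""
    onsets: List[int] = []
    previous = False  # treat the position before the first frame as silent
    for frame, current in enumerate(has_sound):
        if current and not previous:
            onsets.append(frame)  # the marker hotkey (F3) would be pressed here
        previous = current
    return onsets
```

Only the first frame of each stretch of audio produces a marker, so a long take is marked once rather than on every frame.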

I downloaded your video clips. I understand better what you want now.

Yes, you are correct, I didn't know that you wanted to make these decisions occur while the program was playing back the audio. I'm not sure if KM can process the data that fast using the PIXEL() function. And I don't have your app so I can't test that type of code. In my opinion, you should definitely not be playing the audio while the macro is making its decisions. I don't see why you would want to. This should be done independently of playback because I'm not sure if KM can work fast enough to make the decisions needed when playback is occurring. Can you modify your workflow in some way to perform your tasks without playback? I see no reason at all to require the playhead and playback to occur. Can you explain why you think you need that? I'm happy to help, but it's physically impossible for me to write, test and debug code when it requires a third party app running in real time.

What I think you should be doing is coming up with a way that you can use your mouse and your eyes to solve this problem without requiring that the playback be running during the process. (This is what I was assuming when I wrote the first macro for you.) If you can do this, it should be possible to tweak my code to make it work. (It would probably require using the mouse to scroll across separate pages of the waveform.) If you can't do this, I would probably need a copy of your app to make this work in real time, and even then I'm not 100% sure I could make it work.

Hey man,
Thanks for your immediate reply.
The most accurate way would be to navigate the green playhead frame by frame through the timeline. But the problem is that you can't determine the position of the green playhead, which moves independently of the mouse cursor, right?
If this doesn't work, then your solution of navigating through the timeline with the mouse would serve best, but somehow it didn't work. I think it's still too fast for the PIXEL analysis.

Your idea of doing it frame by frame is reasonable to consider, and finding the location of the playhead isn't particularly difficult, (I've solved harder problems than that) but why make your solution 10x more complicated and much slower than it needs to be? Why not consider my other idea, which I stressed, which was to find a way you could do it manually without playing the sound, and then we could emulate that with KM. Especially since doing so would mean I could probably write the macro without having the app.

Well, this would mean dragging the mouse (and the attached playhead) along a horizontal line at a certain height through the timeline, adding a marker when there is audio and otherwise moving on. But this is basically the idea we had in the first place, and you wrote good code for it, but it just doesn't seem to work. Tomorrow I'll have another try with your code on my other machine.
Have a good evening/night/day (depending on your time zone) and thanks again for your effort!

Okay. But are you aware that I intentionally left a spot in my code to allow you to insert the mouse clicks when you need an action taken? I couldn't write that code because I didn't know what you wanted to do when the macro found the trigger condition. Is that what you meant by "it didn't work"? That would be because I didn't know what to insert at that point.

Hey man,
I used the last two days to think about a way to make it work without playback.
I guess it's probably in fact just a matter of tweaking your code, since the initial idea of moving the mouse should still work.
I tried to comprehend your code but always fail when it comes to the PIXEL colour segment.
My first question, as a matter of simplifying, is whether it is possible to just distinguish between black pixels (with a certain tolerance to include the anti-aliasing aspects you mentioned earlier) and everything else. The contrast between the (bright) audio track colour and the dark audio waveform should be clear enough, don't you think?
If this is possible, I would like the mouse to stop at that point, perform a left-click in order to bring the playhead to the mouse cursor location, and then perform the hotkey action 'F3', which sets a marker. After that the mouse should move maybe 100 pixels to the right (in order to avoid too many markers when the audio take is a little longer) and then carry on with the initial analysis. I still struggle with how to deal with the problem of scrolling through separate pages of waveform.
My dream scenario would be an automated process where KM marks hour-long footage, but this would essentially mean that KM somehow recognizes the end of the current waveform and then goes on to the next bit of the waveform. This isn't so easy, because when you just drag the mouse through the sequence, the waveform doesn't get dynamically renewed.
That's why I brought up the idea of playing back the material, in order to have that problem solved, because then the waveform gets automatically renewed when the playhead reaches the end of the shown timeline. There would always be an option to slow down the running material to avoid performance problems.
What do you think?
Have a nice evening,
Simon
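The mark-and-skip behaviour requested in this post can be sketched as follows, in Python for illustration rather than as KM actions. It assumes the centerline has already been reduced to a list of booleans (True = dark pixel). The function name `marker_positions` is hypothetical, and `skip=100` is the "maybe 100 pixels" figure from the post.

```python
from typing import List, Sequence


def marker_positions(dark: Sequence[bool], skip: int = 100) -> List[int]:
    """Return the x positions at which a marker would be set."""
    markers: List[int] = []
    x = 0
    while x < len(dark):
        if dark[x]:
            # In the real macro: left-click here to move the playhead,
            # then press F3 to set the marker.
            markers.append(x)
            x += skip  # jump ahead to avoid a cluster of markers in one take
        else:
            x += 1
    return markers
```

With this rule, a take longer than `skip` pixels still gets an extra marker every `skip` pixels. If only one marker per take is wanted, the scan would have to wait for a run of gold pixels before arming again.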

In my code, I simply examined the "brightness" of the "red" pixel. (EDIT: that was originally how I wrote it, but I see now that I changed it to work on the brightness of the overall pixel. Sorry.) I figured that was good enough. There might be other solutions, but if it works, I'm happy. Now let me read your other questions and edit this answer.

That's exactly why I inserted a comment in my code for you to insert your actions there.

You do NOT need to do that because my code already accounts for that. (It accounts for that because of the variable which is set to "50" and lets you change that value.)

Okay, that's a tricky point. You didn't mention that you needed multiple pages because you were imagining this would work on a live playing program. But it doesn't. So either you need to shrink/scale the length of the waveform and adjust the number "50" to accommodate that, (and using as large a screen as possible) or we have to deal with scrolling. I'm not intimidated by scrolling, but it's hard for me to debug that when I don't have your code.

This is a new tricky point. And I'm not sure if I understand it correctly. My algorithm was not designed to handle detecting when the playback is over. It wasn't designed to handle playback at all.

What I think is that we have to take a step back and reconsider the problem, not try to "fix" your specific solution. There's usually a "better way" to solve problems that users raise when the user starts us off by giving us their recommended solution.

You want to "jump ahead" to "dialog" in an app which records both video and sound. Some apps may have add-ons which allow you to do this. What's the name of your app? We should google whether this problem has already been solved. Maybe the app has hooks to help us, such as support for AppleScript. I like to "think outside the box" but I can't even "see the box" right now so it's hard to do that.

The scrolling issue that we are currently pondering is certainly solvable, but I'm not sure if I can write the code effectively without having the app myself. So this is why I'm probing other ideas right now.