Believe it or not, macOS has a feature called "Image Descriptions" which provides written or spoken descriptions of images. It is implemented using AI within macOS. For example, if an image contains a giraffe running in the desert, macOS can give you the words, "giraffe running in the desert." If the image of the giraffe also contained a blue button, it might even add, "... with a blue button."
You could try it out. I can't, because macOS currently lets you use this feature only if you are within the geographical borders of the USA. The feature is still geo-locked.
Sadly, I don't think it provides the location of the blue button. But if it can recognize that the page has a blue button, that might be a good first step.
IDEA #2 (mostly for pragmatists)
macOS also has an extensive feature called VoiceOver, which describes the elements on the screen in both spoken and written form. A KM macro could possibly grab this descriptive data, find the button, then click on it. I'm 90% sure this could work. But it might take 10 seconds to find the button. Is that a problem?
IDEA #3 (mostly for lazy people)
FindImage is very good for, well, finding images. Buttons can certainly be considered images. If the buttons have a consistent colour, texture and/or background, it could be possible to solve the problem this way. Do you have any example pages that you can share?
See what HTML differentiates that element from the others -- it'll usually be a <button>, for obvious reasons!
Write a macro that sends appropriate JavaScript, targeting that element.
Difficult to say more without seeing the web app, but that should be doable. You could even ask a non-integrated AI how to write the JS -- although you'll probably get a better answer from the gurus on this forum.
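To give a rough idea of the JavaScript approach, here is a minimal sketch. It assumes the target really is a <button> element, that its visible label is known, and that the macro can run JavaScript in the front browser page. The function name clickButtonByText and the "Submit" label are placeholders of mine, not anything from the actual web app.

```javascript
// Sketch: find a <button> by its visible label and click it.
// `doc` is anything that offers querySelectorAll; in a browser, pass `document`.
// clickButtonByText and "Submit" are hypothetical names used for illustration.
function clickButtonByText(doc, label) {
  const wanted = label.trim().toLowerCase();
  const buttons = Array.from(doc.querySelectorAll("button"));
  // Compare labels case-insensitively, ignoring surrounding whitespace.
  const match = buttons.find(
    (b) => (b.textContent || "").trim().toLowerCase() === wanted
  );
  if (!match) return false; // nothing matched; the macro can decide what to do
  match.click();
  return true;
}

// Only runs inside a web page, where `document` exists.
if (typeof document !== "undefined") {
  clickButtonByText(document, "Submit");
}
```

If the page draws its button with something other than <button> (an <input type="submit"> or a styled <div>, say), the selector would need to change to match. That is why it helps to look at the page's HTML first.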