Grabbing web page content with RegExp

I'm trying to grab some info from pages like this one (I'm tying it into an InDesign workflow so I can fetch the info easily from time to time):

I only want to extract the bit that says:
"Photo of Bobbie Hebb
UNSPECIFIED - CIRCA 1970: Photo of Bobbie Hebb (Photo by Michael Ochs Archives/Getty Images)"

Obviously this will change from page to page, so the RegExp has to reference the text around it. My macro uses Select All and Copy, then tries to run a RegExp on the copied content, but I haven't had any luck. I've tried using "\r" and "\n", but I think I must be overlooking something fundamental. As you can probably tell, I'm still troubleshooting this and using Display Text to debug.

On database-generated pages like these, the content might change but the underlying structure stays the same. In your browser you can "Inspect" the element you want and see if there's something (usually a class or id) you can use to get the data with JavaScript.

In this case the title is tagged with data-testid="title" and the caption with data-testid="caption". I've used two actions to make it easy to see what's going on:

Getty Caption Grabber.kmmacros (4.7 KB)


Set the hot key to whatever you want, and it should grab the info from the front window of your frontmost browser, assuming, of course, that your browser supports KM's "Execute a JavaScript in Front Browser" action.
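For anyone curious what the JavaScript side of this looks like, here is a minimal sketch (not the code in the attached macro) that reads the two data-testid elements and joins them with a linefeed. The function name grabGettyCaption is hypothetical, and it takes the document as a parameter only so it can be tried outside a browser; on the live page you would pass `document`.

```javascript
// Hypothetical helper: read the elements tagged data-testid="title" and
// data-testid="caption". `doc` is anything with a querySelector method;
// on the live page that is simply `document`.
function grabGettyCaption(doc) {
  // Trimmed text of the first element matching the selector,
  // or an empty string if the page has no such element.
  const textOf = (selector) => {
    const el = doc.querySelector(selector);
    return el ? el.textContent.trim() : "";
  };

  const title = textOf('[data-testid="title"]');
  const caption = textOf('[data-testid="caption"]');

  // Join with a linefeed to get the same two-line layout as in the question.
  return title + "\n" + caption;
}
```

In an "Execute a JavaScript in Front Browser" action you would end the script with `grabGettyCaption(document);`. That assumes the action hands back the value of the last expression, which is how these browser actions usually behave, but it is worth checking with a Display Text action first.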

If you do need to parse the text, try a "Search using Regular Expression" action instead. You should be safe getting the bit between "Editorial Images" and "Save" to capture both parts at once, then separating them later. The trick is to tell your regex that . should match linefeeds as well, which you do by putting the (?s) flag at the start of your pattern. Using \R, which matches any line break (\n, \r or \r\n), also saves you from guessing which one the browser used:

(?s)Editorial Images\R(.*?)\RSave\R


...giving you something like:

Getty Caption Grabber (text).kmmacros (4.9 KB)


...which you may have to tweak, as different browsers can present the copied text of a page in different ways.
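If you ever want to do the text-parsing version in JavaScript instead, note that JavaScript regexes have no \R and no inline (?s) flag. A minimal sketch, assuming the copied text looks roughly like the layout in the question: normalize line endings to \n first, then use the `s` (dotAll) flag in place of (?s). The function names are hypothetical, and the normalization steps are guesses at common browser differences, not a tested list.

```javascript
// Smooth over some differences in how browsers copy page text
// (assumed differences, not an exhaustive list).
function normalizeCopiedText(text) {
  return text
    .replace(/\r\n?/g, "\n")   // CRLF and lone CR -> LF (stands in for \R)
    .replace(/\u00a0/g, " ")   // non-breaking spaces -> ordinary spaces
    .replace(/[ \t]+$/gm, "")  // strip trailing spaces/tabs on each line
    .replace(/\n{2,}/g, "\n"); // collapse runs of blank lines
}

// Same idea as (?s)Editorial Images\R(.*?)\RSave\R; the `s` flag lets . match \n.
const BETWEEN = /Editorial Images\n(.*?)\nSave\n/s;

// Returns { title, caption }, or null if the markers aren't found.
// Assumes the first captured line is the title and the rest is the caption.
function extractTitleAndCaption(pageText) {
  const match = BETWEEN.exec(normalizeCopiedText(pageText));
  if (!match) return null;

  const [title, ...captionLines] = match[1].split("\n");
  return { title, caption: captionLines.join("\n") };
}
```

Normalizing first keeps the pattern itself simple: if another browser copies the page differently, you adjust normalizeCopiedText rather than the regex.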


Thanks so much for this. With the info you supplied I was able to create exactly what I needed, and I learned and understood a few more processes in KM along the way. The JavaScript technique in particular was completely new to me.