Extract page content when the JS path changes slightly

I've been using KM for a week now and I'm very satisfied with it; I will almost certainly buy a license when the trial period expires. Until now I've managed to solve every problem I encountered just by searching the forums, but that approach hasn't helped with the issue I'm currently struggling with.

To get to the point, I'm trying to create a macro that extracts names, dates and other miscellaneous information from any blog post from a particular website. For this, I trigger the action 'Execute a JavaScript in Front Browser' multiple times, each with the code

var divElem = document.querySelector('[JS path of the content to be extracted]');
divElem.innerText;

Unfortunately, the site in question uses different paths for each blog post, so that e.g. a path that successfully extracts the date from one post fails to extract it from all others. To take a concrete example, the JS path for the date in this post is

#node-1995 > header > div > span.published

but for this other post the path is instead

#node-1997 > header > div > span.published

So my question is whether there's a generic way of specifying a JS path that would capture the same type of content across all pages from this site. Apologies if the answer to this question is obvious, and thanks in advance for any help.

Most likely there is a solution, though perhaps not a single JS path; it will probably require some JavaScript. The answer is not obvious at all.

Just took a look at the actual web pages -- thanks for providing them.

Looks like you are in luck. This JavaScript works on both pages:

document.querySelector('span.published').innerText;
//-->"June 15, 2020"
//-->"May 08, 2020"

The key here is using something other than the ID attribute -- which is often dynamically determined. The tag and class were sufficient in this case.
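
If you want something a bit more defensive for the other fields, here is a minimal sketch of the same idea. The helper name `firstText` and the `DATE_SELECTORS` list are hypothetical, not part of KM or the site, and the second selector assumes every post's container ID starts with "node-", as in the two paths quoted above. It tries each selector in turn and returns an empty string instead of throwing when nothing matches:

```javascript
// Hypothetical helper: returns the trimmed text of the first element matched
// by any of the given selectors, or an empty string if none match.
function firstText(root, selectors) {
  for (const selector of selectors) {
    const elem = root.querySelector(selector);
    if (elem) {
      return (elem.innerText || elem.textContent || '').trim();
    }
  }
  return '';
}

// Selectors that avoid the per-post numeric ID.
// Assumption: every post container has an ID beginning with "node-".
const DATE_SELECTORS = [
  'span.published',
  '[id^="node-"] > header > div > span.published',
];

// In the browser, `document` is the root; elsewhere this yields ''.
const publishedDate =
  typeof document !== 'undefined' ? firstText(document, DATE_SELECTORS) : '';
publishedDate;
```

The `[id^="node-"]` form is a standard CSS "starts with" attribute selector, so it keeps the structure of the original path while ignoring the number that changes from post to post. Depending on your KM version, the last line may need to be `return publishedDate;` for the action to receive the result; I have not verified that here.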


Perfect—thank you so much.
