Web Scraping Multiple Items With XPath / QuerySelector

Hi All,

For the sake of anyone trying to do something similar in the future, I decided to write this up so the solution is easy to find without scrolling through the whole discussion.

The division of work is as follows:

  1. Create a JavaScript function to traverse all nodes and pick up the URLs I needed (jpg and mp4 links). The result is a linefeed-separated list. Currently I'm running this manually to keep the two tasks separate and to oversee the function, but it is possible to merge it into the KM macro later. I used QuerySelector to get all the elements needed, but to gather the specifics I had to test for the presence of child elements and so on. I developed the script "raw" in the Chrome console (no IDE on this Mac). There might be a smarter way :wink:

  2. The responsibility of the KM macro is to iterate over the collected links. For each link it opens a Chrome tab, waits for the content to download, presses Command-S, waits for the dialog with the Save button to appear, and saves.
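
To give an idea of what step 1 can look like, here is a minimal sketch, not my exact script. The function name `collectMediaUrls` and the selectors are assumptions made up for illustration; the real page will need its own selectors and child-element checks. It gathers candidate elements, keeps only URLs whose path ends in .jpg/.jpeg or .mp4, removes duplicates, and joins the result with linefeeds:

```javascript
// Hypothetical helper (name and selectors are assumptions, adapt to your page).
// Returns a linefeed-separated list of absolute .jpg/.jpeg/.mp4 URLs.
function collectMediaUrls(root) {
  const found = new Set(); // a Set drops duplicates and keeps insertion order

  // Candidate elements: images, videos, <source> children, and plain links.
  const nodes = root.querySelectorAll("img[src], video[src], source[src], a[href]");

  for (const node of nodes) {
    const raw = node.getAttribute("src") || node.getAttribute("href");
    if (!raw) continue;

    let url;
    try {
      // Resolve relative paths against the page's base URI.
      url = new URL(raw, root.baseURI);
    } catch (e) {
      continue; // skip values that cannot be parsed as a URL
    }

    // Test the path only, so query strings like ?x=1 do not get in the way.
    if (/\.(jpe?g|mp4)$/i.test(url.pathname)) {
      found.add(url.href);
    }
  }

  return Array.from(found).join("\n");
}

// In the browser console this prints the list for the current page.
if (typeof document !== "undefined") {
  console.log(collectMediaUrls(document));
}
```

Pasted into the Chrome console, this prints one URL per line. When run from a KM action instead, the string would have to be handed back to KM rather than logged; the threads listed under the resources cover that part.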

For those new to KM, I recommend a divide and conquer approach:

That is, not everything needs to be automated in one go. From a generic perspective the solution is essentially two parts: gathering the data in JavaScript and then processing it in a KM macro. Each part has its own set of challenges, so I recommend figuring out the data contract first (in this case the handover of a list of URLs, but it can be whatever is necessary) and working from there on each part.

Noteworthy resources:

Execute JavaScript in Browser Actions

Need Help with Using KM Variables in JavaScript

(This one is an extension of the previous link and shows how a function can be implemented and return a value.)

Lines In collection

(By using linefeeds as the separator, one avoids the need to split the string on an arbitrary separator in KM. YMMV; it depends on how complex your data is.)

A big thank you goes out to @ccstone and @Nige_S for their help!
