Web Scraping Multiple Items With XPath / QuerySelector

I've been reading some posts here about how to click a single link.

My case is that I have a query that will return multiple elements, and for each of those I'd like to do the following:

  • Find URL to thumbnail image in children of sibling element.
  • Copy URL, remove part of it.
  • Open URL in new browser tab.
  • Save returned image (now the original). CMD-S keystroke + Return.
  • Close tab. CMD-W keystroke
  • Process next XPath element.

My question is: is it possible to, say, execute a first query to get the list of all the "anchor" elements and place the result in a KM variable, and then, for each of those, execute a single XPath to read out the sibling URL data and process it?

I'm asking to figure out where to place the responsibilities: purely in one huge chunk of JavaScript, or can I place some of the logic at the KM level and loop there?

Thanks!

Continuing my search, I found this one, which has an example of how to return data from JavaScript. Seems promising for the task.

actions:Execute a JavaScript in Browser [Keyboard Maestro Wiki]

If I get this to work it'll shave off hours spent each week...

Noice!

Hey @christerdk,

It's very difficult to answer these sorts of questions without having a concrete example to work with and test...

That said – on first inspection I think it likely that you can scrape your main and child URLs with JavaScript and then offload the other tasks to Keyboard Maestro.
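A minimal sketch of that split, under loudly stated assumptions: the selector `a.item`, the "thumbnail lives in the anchor's next sibling" layout, and the `_thumb` suffix rule are all hypothetical, since the actual page was never posted. The script only gathers URLs; opening tabs and saving would stay on the Keyboard Maestro side.

```javascript
// Hypothetical rule: "photo_thumb.jpg" -> "photo.jpg".
// The real site's thumbnail naming is unknown; adjust the pattern.
function toOriginalUrl(thumbUrl) {
  return thumbUrl.replace(/_thumb(?=\.[a-z0-9]+(?:\?|$))/i, "");
}

// Collect one original URL per anchor. `root` is normally `document`.
// "a.item" and the sibling layout are assumptions for illustration.
function collectOriginalUrls(root) {
  const urls = [];
  for (const anchor of root.querySelectorAll("a.item")) {
    const sibling = anchor.nextElementSibling;
    const img = sibling ? sibling.querySelector("img") : null;
    if (img && img.src) {
      urls.push(toOriginalUrl(img.src));
    }
  }
  return urls;
}

// Only touch the DOM when running in a browser.
if (typeof document !== "undefined") {
  console.log(collectOriginalUrls(document).join("\n"));
}
```

Keeping the URL rewrite in its own small function makes it easy to try in the browser console before wiring anything into a macro.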

-Chris


Moderator's Note:

Post #3 – Post #6 were accidentally dumped into a private message. Unfortunately the Discourse software wouldn't allow me to move them verbatim, so I had to do it the hard way.

christerdk's messages are labelled as From: @christerdk


From: @christerdk


Hi!

Yep, I’ve been experimenting with some node traversing. There are some details that’ll make it a challenge: some items are images and some are video, and there may be a collection that needs to be “swiped”. I can do all that in JavaScript, so I guess I can return a string with a separator character, and then loop over that in KM?

Not so sure about the latter, not yet at least, because I don’t believe I saw a split function in the article that’ll handle an unknown list size?

I’ll follow up and post my findings, so that others can benefit from hearing about the strategy and division of work.


From: @christerdk


So far so good, I have everything I need in one string now.

I've been looking around in the documentation, and I'm not picking up a tactic for loading, splitting and iterating over a string with content "Value1;Value2;Value3".

Any hints you can give?


From: @christerdk


No need, I found this: Iterating Through Entries in a String

Been running it against some test data, looks good :+1:t2:

You can do it that way, but it's easier to join your URLs in JavaScript and then use:

Hold on, you didn’t … web scrape the private messages? :laughing:

Ok, per your suggestion, I’ll try using new line instead of the semicolon separator, as it’ll require less string manipulation in KM :+1:t2:
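For anyone following along, a small sketch of the handover format, assuming the URLs are already sitting in an array (the example.com values and the `toLineList` name are placeholders, not from my actual script). Joining with a linefeed yields one URL per line, and since a URL cannot contain a raw line break, the separator cannot collide with the data.

```javascript
// Placeholder data; in the real script these come from the page.
const urls = [
  "https://example.com/media/one.jpg",
  "https://example.com/media/two.mp4",
  "https://example.com/media/three.jpg",
];

// One URL per line: the receiving side can iterate over lines
// instead of splitting on a custom separator such as ";".
function toLineList(items) {
  return items
    .map((item) => item.trim())
    .filter((item) => item.length > 0)
    .join("\n");
}

const result = toLineList(urls);
console.log(result);
```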


Following up, I have it running now. But I'm experiencing some behavior I can't quite get my head around.

I am downloading a list of jpgs and mp4s in Chrome tabs, one at a time. In a previous, more naive version of my script, I paused for 5 seconds (jpgs) and 20 seconds (mp4s) to give the Save As dialog time to show up after a command-s keystroke. Not optimal, and the dialog has a tendency to take its time to show up with the videos. It worked, until there were some failed saves on the videos.

So after researching a little I replaced the wait time action with an If action that checks for a Save button to be enabled.

Some observations: The Save as dialog shows up (after quite a while), but then it seems as if KM just quickly jumps to the next item in the outer loop, opens up a new tab, not finishing the work in the save as dialog. It reminds me of threaded behavior, from where the if-based wait behavior is activated and onwards. A debugger Start statement inside the if wait is also ignored.

Should I restart something? :smiley:

(I have restarted the engine a few times, just to rule that out...)

That's odd...

Post what you're doing, so we can take a look at it.

If we're not testing, we're guessing...

True words.

As I was writing out comments in the macro, I realized that I should have put a simple Display Text action in the else section of the If action with the button condition. Turns out that there's no hit on the condition. So there's no relation to "threaded behavior" as far as I can see; please write it off as a bit of late-night tunnel vision.

Now, it's still unclear to me why the condition isn't met. Here's a screen shot of the save dialog, just to rule out any OS differences:

Screen Shot 2022-12-04 at 08.21.49

This is the condition:

I've tried "enabled" as part of the condition, too, same behavior. I've also checked for accidental whitespace in the condition.

For reference, I've added the macro, too. Careful, it's the work of a KM beginner :smiley:

Download images and videos (with test data).kmmacros (22.5 KB)


Just for reference, I'm running this macro on Monterey 12.0.1, with a nearly up-to-date version of Chrome.

Wrong action, I think. The "If..." action is immediate, the "Save" button doesn't exist (dialog not present yet), so "If..." goes straight to the "otherwise" section.

I'm guessing that what you actually want is a "Pause Until..." action, conditional on "the 'Save' button exists". That will hold your macro until the "Save" dialog is present, and then the following actions will set to work on the dialog.
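To illustrate the difference, here is a JavaScript analogy only; it is not how Keyboard Maestro implements either action, and every name in it (`makeSaveDialog`, `checkOnce`, `pauseUntil`) is made up for the example. The simulated dialog "appears" only after it has been checked a few times, standing in for the delay before the real dialog shows.

```javascript
// Simulated dialog: reports "Save button exists" only after it has
// been checked `appearsAfterChecks` times (a stand-in for elapsed time).
function makeSaveDialog(appearsAfterChecks) {
  let checks = 0;
  return () => {
    checks += 1;
    return checks > appearsAfterChecks;
  };
}

// "If" style: evaluate the condition once, right now.
function checkOnce(saveButtonExists) {
  return saveButtonExists();
}

// "Pause Until" style: keep re-checking until the condition holds,
// giving up after `maxChecks` attempts.
function pauseUntil(saveButtonExists, maxChecks) {
  for (let i = 0; i < maxChecks; i += 1) {
    if (saveButtonExists()) return true;
  }
  return false;
}

console.log(checkOnce(makeSaveDialog(3)));      // false: dialog not there yet
console.log(pauseUntil(makeSaveDialog(3), 10)); // true: found on a later check
```

The one-shot check fails simply because it runs before the dialog exists, which matches the macro dropping straight into its "otherwise" branch.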


Yes, you’re right, thank you!


Hi All,

For the sake of anyone trying to do something similar in the future, I decided to make this write up, so the solution is easily accessible without having to scroll through the whole discussion.

The division of work is as follows:

  1. Create a JavaScript function to traverse all nodes and pick up the URLs I needed (jpg and mp4 links). The result is a linefeed-separated list. Currently I'm running this manually, to keep the two tasks separate and keep an eye on the function, but it is possible to merge that into the KM macro later. I used QuerySelector to get all the elements needed, but to gather the specifics I had to test for the presence of child elements and so on. I developed the script "raw" in the console of Chrome (no IDE on this Mac). There might be a smarter way :wink:

  2. The responsibility of the KM macro is to iterate over the collected links and, for each of them, open a Chrome tab, wait for the content to load, press command-s, wait for the dialog with the Save button to be present, and save.
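As a rough sketch of the filtering part of step 1: this is not my actual script. It assumes absolute URLs and that the media type can be read from the file extension, and `extensionOf` and `selectMedia` are names invented for the example.

```javascript
// Return the lowercase file extension of a URL's last path segment,
// or "" if there is none. Assumes an absolute URL (new URL throws otherwise).
function extensionOf(url) {
  const path = new URL(url).pathname;
  const file = path.slice(path.lastIndexOf("/") + 1);
  const dot = file.lastIndexOf(".");
  return dot === -1 ? "" : file.slice(dot + 1).toLowerCase();
}

// Keep only jpg and mp4 links, preserving their original order.
function selectMedia(urls) {
  const wanted = new Set(["jpg", "mp4"]);
  return urls.filter((url) => wanted.has(extensionOf(url)));
}

// Placeholder input; real values would come from the page traversal.
const candidates = [
  "https://example.com/media/a.JPG",
  "https://example.com/page.html",
  "https://example.com/media/b.mp4?download=1",
  "https://example.com/media/",
];

console.log(selectMedia(candidates).join("\n"));
```

Reading the extension from the parsed path rather than the raw string means a query string such as `?download=1` does not get in the way.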

For those new to KM, I recommend a divide and conquer approach:

That is, not everything needs to be automated in one go. From a generic perspective the solution is essentially two parts: gathering the data in JavaScript and then processing it in a KM macro. Each has its own set of challenges, so I recommend figuring out the data contract first (in this case the handover of a list of URLs, but it can be whatever is necessary) and working from there on each part.

Noteworthy resources:

Execute JavaScript in Browser Actions

Need Help with Using KM Variables in JavaScript

(This one is an extension of the previous link and shows how a function can be implemented and return a value.)

Lines In collection

(By using linefeeds as a separator, one avoids the need for splitting the string by an arbitrary separator in KM. YMMV; it depends on how complex your data is.)

A big thank you goes out to @ccstone and @Nige_S for their help!
