Macro Request: Get webpage elements by class

I frequently go to a library results page that has two pieces of information I want:

  • The Listing Title: An <h1> with class="title"
  • The Call Number: A <td> with class="call"

I want to be able to automatically copy the content of these two elements to a couple of named clipboards. Is this possible? Can anyone crack this nut?

I think this could be done with the Execute a JavaScript in Google Chrome action. Alas, I don’t know much JS. Could anyone help me out here?

The XPath for your target may be something like this:

//h1[@class='title'] | //td[@class='call']
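
One caveat: `@class='title'` only matches when the class attribute is exactly `title`. If the page turns out to use several classes on one element (say `class="title main"`, which is just a guess on my part), a token-style test is the usual workaround. A minimal sketch, where `classTokenXPath` is a helper name I made up for illustration:

```javascript
// Hypothetical helper: build an XPath step that matches elements carrying
// a given class token, even when the class attribute lists several classes.
function classTokenXPath(strTag, strClass) {
  return "//" + strTag +
    "[contains(concat(' ', normalize-space(@class), ' '), ' " + strClass + " ')]";
}

// Union of the two targets, in the same shape as the pattern above
var strXPath = [
  classTokenXPath("h1", "title"),
  classTokenXPath("td", "call")
].join(" | ");
```

If the simple pattern already finds your elements, there is no need for this longer form.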

If you are running Yosemite, you can try:

  1. Pasting the code below into Script Editor
  2. Choosing 'JavaScript' (rather than AppleScript) from the top-left dropdown
  3. Clicking Run

(Make sure that you copy all of the code – you will need to scroll, I think)

function run() {
  "use strict";
  
  var strXPath = "//h1[@class='title'] | //td[@class='call']";

  // Harvest elements from Chrome by XPath pattern
  function pageXPathHarvest(strXPath) {
    var lstWins = appChrome.windows(),
      oWin = lstWins.length ? lstWins[0] : null;

    // JSON.stringify quotes the XPath safely, even if it contains double quotes
    return (oWin) ? appChrome.execute(oWin.activeTab, {
      javascript: "(" + xpathHarvest.toString() + ")(" + JSON.stringify(strXPath) + ")"
    }) : "No Chrome page open";
  }

  // Harvesting function to run in the browser context
  function xpathHarvest(strPath) {
    var r = document.evaluate(strPath, document, null, 0, null),
      lst = [],
      oNode;

    while (oNode = r.iterateNext()) {
      lst.push([oNode.className, oNode.textContent]);
    }
    return lst;
  }

  // MAIN
  var appChrome = Application("Google Chrome"),
    app = Application.currentApplication();

  app.includeStandardAdditions = true;

  // Gather any results as "className = textContent" lines
  var strResultPath = strXPath,
    lstElements = pageXPathHarvest(strResultPath),
    blnFound = lstElements.length,
    // A string here is a message such as "No Chrome page open"
    strResult = (typeof lstElements === 'string') ? lstElements :
    blnFound ?
    lstElements.reduce(function (strAccum, lstClassText) {
      return strAccum + lstClassText[0] + ' = ' + lstClassText[1] + '\n';
    }, '') :
    'No elements matching "' + strResultPath + '" found';

  // and return results as text
  return strResult;
}

If it seems to work for your data source, then this macro, which runs the script from an Execute Shell Script action, may be enough with a few tweaks:

Harvest title and call no. from Chrome.kmmacros (15.8 KB)


And thanks – I have learned something useful from you, because as you suggest, a built-in Execute JS in Chrome action makes it all much simpler:

Simpler Harvest from Chrome.kmmacros (14.8 KB)

(function (strPath) {
    var r = document.evaluate(strPath, document, null, 0, null),
      lst = [],
      oNode;

    while (oNode = r.iterateNext()) {
      lst.push(oNode.className + ' = ' + oNode.textContent);
    }
    return lst.join('\n');
})("//h1[@class='title'] | //td[@class='call']")
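
Since the original request was for two separate named clipboards, one possible variation is to return the two values under separate keys rather than as joined lines. This is only a sketch under assumptions: `harvestBySelector` and the `title` / `call` keys are names I invented, it uses CSS selectors instead of XPath, and how you split the JSON afterwards in Keyboard Maestro is up to you.

```javascript
// Hypothetical helper: look up one element per CSS selector and return
// the trimmed text of each, as a JSON string keyed by the given labels.
// `doc` is the page's document object (passed in so it can be stubbed).
function harvestBySelector(doc, dctSelectors) {
  var dctResult = {};

  Object.keys(dctSelectors).forEach(function (strKey) {
    var oNode = doc.querySelector(dctSelectors[strKey]);
    // Missing elements yield an empty string rather than an error
    dctResult[strKey] = oNode ? oNode.textContent.trim() : "";
  });
  return JSON.stringify(dctResult);
}

// In the browser, you would pass the page's own document:
// harvestBySelector(document, { title: "h1.title", call: "td.call" });
```

A simpler route, if JSON is more than you need, would be two copies of the action, one per selector, each saving its result to a different named clipboard.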

PS if you need the harvesting fixed or reshaped, it would be helpful to have a look at:

  • a sample page, or at least its HTML (`Chrome > Save Page As > webpage HTML only`)
  • a text sample of what a useful output would look like (especially if TitlesPerPage > 1)
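
To give an idea of the kind of reshaping I mean: if a page listed several titles, the [className, text] pairs that the first script collects could be grouped into one line per listing. A rough sketch only, assuming each title is followed by its own call number in the harvested order (which I can't confirm without seeing the page); `groupListings` is a made-up name:

```javascript
// Hypothetical reshaping step: turn a flat list of [className, text] pairs
// into one "title<TAB>call" line per listing.
function groupListings(lstPairs) {
  var lstRecords = [];

  lstPairs.forEach(function (lstPair) {
    var strClass = lstPair[0],
      strText = lstPair[1].trim();

    if (strClass === "title") {
      // A title starts a new record
      lstRecords.push({ title: strText, call: "" });
    } else if (strClass === "call" && lstRecords.length) {
      // A call number is attached to the most recent title
      lstRecords[lstRecords.length - 1].call = strText;
    }
  });

  return lstRecords.map(function (dct) {
    return dct.title + "\t" + dct.call;
  }).join("\n");
}
```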

This is fantastic! Thank you so much! I know KM tells you how much time you’ve saved, but I can only imagine how much this one particular macro will save me!

The script works perfectly.
