@WEB Extract & Process Links on Web Page Using HTML Class [Example]

example
web_browser
chrome
javascript

#1

MACRO:   @WEB Extract & Process Links on Web Page Using HTML Class [Example]


VER: 1.1      2016-12-26

DOWNLOAD:

@WEB Extract & Process Links on Web Page Using HTML Class [Example].kmmacros (45 KB)

SubMacros Used

This macro uses (but does not require) this SubMacro:

@LINK Process a Web Page Hyperlink @SubMacro.kmmacros (28.6 KB)

It is provided as an example of how you can use submacros with this macro.

Be sure to read the Macro Setup in the Release Notes section below.


Use Case

  • Make it easy, for most users and most use cases, to extract all hyperlinks in a list on a web page and then process each link.

  • None of this requires the user to understand or change the JavaScript.

  • Most often, the list of links will either sit inside one major HTML element with a unique Class, or each link will sit inside its own element, with all of those elements sharing the same Class.

  • You can easily find this HTML element, and its Class, by using the Inspector in either Chrome or Safari.

  • This method/macro won't work in all cases, but hopefully it will in most cases.

  • If it does not work for you, we can probably figure out a method that will. Just post the URL of the target page.



ReleaseNotes

Author.@JMichaelTX

PURPOSE:

  • Extract Web Page Links Using HTML Class, and Process Each Link

MACRO SETUP

Note that all Actions with the magenta color are designed to be changed by you.

In some cases, they MUST be changed to fit your specific requirements.

  1. Move Macro to Macro Group that limits trigger to apps you plan to use it with
    • (Note: This macro can be used ONLY with Google Chrome, but could be easily changed to use Safari, just by replacing the Chrome Actions with Safari Actions)
  2. Assign a Trigger
  3. Set the below Action "SET Source URL" to the URL of the Web Page that contains the list of links.
  4. Set the below Action "SET HTML Class" to the unique Class of the HTML Element that contains each, or all, of the list of links.
  5. ADD Actions at the bottom of the Macro to process each link as you desire.
  6. If your web page has a lot of links, it is best to first TEST on a similar page with just a few links.

HOW TO USE:

  1. Open Google Chrome Browser (any page)
  2. Trigger this Macro

WHAT IT DOES:

  1. Gets an HTML Collection of all Elements that have the specified Class Name
  2. Gets an HTML Collection of all Links (Anchor Tags) within the first Element of that collection
  3. Builds a TAB delimited list (array) of Link Text & URL from that collection
  4. Returns a TAB delimited String, with each link on a separate line
  5. FOR EACH link/line in that String:
    • Using RegEx, parses it into Title and URL
    • Process that Link

TAGS: @Links @Web @JavaScript @HTML

USER SETTINGS:

  • Any Action in magenta color is designed to be changed by end-user
  • This macro uses Google Chrome, but can be easily changed

ACTION COLOR CODES

  • To facilitate the reading, customizing, and maintenance of this macro,
    key Actions are colored as follows:

  • GREEN -- Key Comments designed to highlight main sections of macro

  • MAGENTA -- Actions designed to be customized by user

  • YELLOW -- Primary Actions (usually the main purpose of the macro)

  • ORANGE -- Actions that permanently destroy Variables or Clipboards

REQUIRES:
(1) Keyboard Maestro Ver 7.3+
(2) Yosemite (10.10.5)+



JavaScript


'use strict';

(function run() {  // function will auto-run when script is executed

  //~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  var ptyScriptName   = "Extract Link List Using Class Name";
  var ptyScriptVer    = "1.1";
  var ptyScriptDate   = "2016-12-26";
  var ptyScriptAuthor = "@JMichaelTX";
  //~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

  var returnResults = "TBD";

  //--- Get Class Name from KM of Major HTML Elements Which Contain the Link ---
  var classStr = document.kmvar.ELP__HTMLClass;

  //--- IF KM VAR "ELP__HTMLClass" IS EMPTY, RETURN ERROR ---
  if (!classStr) {
    returnResults = "[ERROR]\n\n"
      + "SCRIPT: " + ptyScriptName + " Ver: " + ptyScriptVer + "\n"
      + "Error Number: " + "C101" + "\n"
      + "Invalid HTML Class:" + ">>>" + classStr + "<<<"
      + "\n Must be set in KM Variable: 'ELP__HTMLClass'";
  }
  else {
    //--- Get Element Collection Using Class Name ---
    var majorElem = document.getElementsByClassName(classStr);

    //--- Get Links (Anchor Elements) Within the majorElem Collection ---
    var linkElem = majorElem[0].getElementsByTagName("a");

    //--- Build TAB Delimited List of Link Text and URL ---
    var linkList = [];
    var numElements = linkElem.length;

    for (var iElem = 0; iElem < numElements; iElem++) {
      linkList.push(linkElem[iElem].textContent + "\t" + linkElem[iElem].href);
    }

    //--- Merge Link List into String using LineFeed ---
    returnResults = linkList.join("\n");

  } // END of else

  //--- Return Results to KM ---
  return returnResults;

})();
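
The script above reads links only from the first element that has the class (majorElem[0]), and it throws if no element matches. The sketch below is one possible variation, not the author's code: it walks every matching element, also accepts a matching element that is itself a link, and returns an empty string when nothing is found. The function name collectLinks and the doc parameter are assumptions, introduced so the logic can be tried outside the macro.

```javascript
// Hypothetical variation (an assumption, not the macro's script).
// doc is any object offering getElementsByClassName, e.g. the page's document.
function collectLinks(doc, className) {
  var linkList = [];
  var elems = doc.getElementsByClassName(className);
  for (var i = 0; i < elems.length; i++) {
    var elem = elems[i];
    // If the class sits on the <a> tag itself, use that element directly;
    // otherwise look for anchors inside it.
    var isAnchor = String(elem.tagName).toUpperCase() === "A";
    var anchors = isAnchor ? [elem] : elem.getElementsByTagName("a");
    for (var j = 0; j < anchors.length; j++) {
      // Collapse whitespace so a title never contains a TAB or line break.
      var title = (anchors[j].textContent || "").replace(/\s+/g, " ").trim();
      // Fall back to the URL when the link has no text, so the title is never empty.
      linkList.push((title || anchors[j].href) + "\t" + anchors[j].href);
    }
  }
  return linkList.join("\n");
}

// In the browser this could be called as, for example:
//   collectLinks(document, document.kmvar.ELP__HTMLClass)
```

Returning an empty string leaves it to the macro to decide what to do when no links are found. Elements nested inside each other with the same class would produce duplicate lines, which this sketch does not filter out.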


#2

Hey @JMichaelTX Can you tell me what this means?

I'm new to Keyboard Maestro and I downloaded this macro, @WEB Extract & Process Links on Web Page Using HTML Class [Example]. When I run it, I get the error message below.

"Search regular expression failed to match (.+)\t(.+) Macro trying cancelled (while executing Parse Title & URL from TAB Delimited List)."

I'm running macOS High Sierra. I purchase clothes from an online wholesaler, LAShowroom.com. Ultimately I would like to build a macro that logs in to LAShowroom, goes to the "Orders" page, and clicks all of the links that say "Download Image." Clicking one of these links opens a new tab and automatically saves the image to my Downloads folder.

I've successfully built the macro to take me to the "Orders" page and click the first "Download Image" link using the built-in macro actions. But there are always multiple links with the same title.

I'm now trying to append your macro to mine to finish the job. The class is "image-download".


#3

Hey Dylon,

That's something you should probably post to the original thread.

(I've obviously moved your post.)

I just tried JM's macro and get a different error, but it seems evident the test case page has changed its code and broken the macro.

@JMichaelTX – care to weigh in?

-Chris


#4

Thanks Chris!


#5

Hi guys,

thanks for the excellent macro! I tried getting it to work on the following page

http://www.gesetze-im-internet.de/Teilliste_B.html

But was unfortunately not able to do so… I couldn't find an HTML class it works with.
My goal is to download / open all "PDF" links automatically.

Any help would be greatly appreciated.
Thanks a lot!
-Marc


#6

Hey Marc,

Extracting the links from Safari or Chrome is the easy part:

Safari or Chrome ⇢ Extract PDF Links from Frontmost Web Page.kmmacros (4.7 KB)

What's a bit more difficult is downloading the files.

You can't automatically (afaik) get either Safari or Chrome to download them – not without doing some UI-Scripting anyway.

But...

It's not too hard to download them with the curl command line utility. (I would do this in the Terminal, so I had feedback on the downloads.)

Or you could use Progressive Downloader. I believe it's still free, but you can buy it on the App Store ($2.99 U.S.) to support the developer, who's a good guy.

Or you could use Leech ($6.00 U.S.), which is my go-to downloader utility.

Both Progressive Downloader and Leech are AppleScriptable.

Oh, yeah. You could always just paste the PDF links into Safari's “Show Downloads” window.

Let me know what method interests you.

-Chris


#7

Dear Chris,

thank you very much, that's incredibly helpful. I just purchased Leech and it looks like the perfect tool for the job (saw your name in the release notes from 2016, too, so you really must be a long-time supporter! :))

I got one more question, since I'm not too familiar with AppleScript (yet). I saw that the command "download URLs" would be helpful for me to achieve what I would like to do.

What would the AppleScript then look like? Could you (once more) point me in the right direction?

Again, thank you very much for your help!

–Marc


#8

Hey Marc,

The basic code for Leech is pretty simple. The primary working code is only one line.

AppleScript Handler for Leech
----------------------------------------------------------------

# linkListStr  == A linefeed delimited text list of URLs to download.
# downloadPath == "" empty string to use default location
# downloadPath == Full folder path PLUS file NAME of first file ONLY.
# refererURL   == The URL of the page where the file URLs are found.

leech_Links(linkListStr, downloadPath, refererURL) -- call-to-handler

----------------------------------------------------------------
--» HANDLER
----------------------------------------------------------------
on leech_Links(linkListStr, downloadPath, refererURL)
   
   # Assign an empty string to "downloadPath" to download to Leech's default location.
   
   # The full "downloadPath" must contain the NAME of the first downloaded file!
   
   tell application "Leech"
      if not running then
         run
         delay 1
      end if
      activate
      
      download URLs linkListStr to POSIX path downloadPath with referrer refererURL
      
   end tell
end leech_Links
----------------------------------------------------------------

Drop Leech onto Script Editor.app to examine its sdef (scripting dictionary).

Here's a complete working AppleScript solution for downloading files from Safari:

Full AppleScript Code for Safari -- Leech Downloader
----------------------------------------------------------------
# Auth: Christopher Stone
# dCre: 2018/12/13 17:17
# dMod: 2018/12/15 02:33
# Appl: Leech, Safari
# Task: Extract HREF URLs from Safari with a Regular Expression and Download with Leech.
# Libs: None
# Osax: None
# Tags: @Applescript, @Script, @Leech, @Safari, @Extract, @HREF, @URLs, @Regular, @Expression, @RegEx, @Download
# Test: Only on macOS 10.12.6 with Leech 3.1.2
# Vers: 1.00
----------------------------------------------------------------

try
   
   ----------------------------------------------------------------
   # ••••• USER ••••• RegEx for Link Extraction •••••
   ----------------------------------------------------------------
   set linkListStr to extract_Href_Links_Safari("\\.pdf")
   ----------------------------------------------------------------
   
   set refererURL to safariURL()
   
   if linkListStr = "" then error "No links were found in front Safari Window!"
   
   set linkListStr to paragraphs of linkListStr
   set linkCount to length of linkListStr
   
   set areYouSure to display dialog "Your Regular Expression Found This Many Files: [ " & linkCount & " ]" & ¬
      linefeed & ¬
      linefeed & ¬
      "Are you SURE you want to download them?"
   
   if button returned of areYouSure ≠ "OK" then
      error "Bad Button!"
   end if
   
   set fileName to extractFilenameFromURL(item 1 of linkListStr)
   set {oldTIDS, AppleScript's text item delimiters} to {AppleScript's text item delimiters, linefeed}
   set linkListStr to linkListStr as text
   set AppleScript's text item delimiters to oldTIDS
   
   ----------------------------------------------------------------
   # ••••• USER ••••• Download Location •••••
   ----------------------------------------------------------------
   # Select ONLY one or the other of the download location options
   
   set downloadPath to "" -- Leave empty string to download to Leech's default location.
   
   # To download to a custom location uncomment this - MUST end with a file name.
   #     - Leech will create directories that don't already exist.
   
   # set downloadPath to "~/Downloads/Test_Leech_Download/" & fileName
   ----------------------------------------------------------------
   
   leech_Links(linkListStr, downloadPath, refererURL)
   
on error e number n
   set e to e & return & return & "Num: " & n
   if n ≠ -128 then
      try
         tell application (path to frontmost application as text) to set ddButton to button returned of ¬
            (display dialog e with title "ERROR!" buttons {"Copy Error Message", "Cancel", "OK"} ¬
               default button "OK" giving up after 30)
         if ddButton = "Copy Error Message" then set the clipboard to e
      end try
   end if
end try

----------------------------------------------------------------
--» HANDLERS
----------------------------------------------------------------
on doJavaScriptInSafari(jsCMD)
   try
      tell application "Safari" to do JavaScript jsCMD in front document
   on error e
      error "Error in handler doJavaScriptInSafari() of library NLb!" & return & return & e
   end try
end doJavaScriptInSafari
----------------------------------------------------------------
on extractFilenameFromURL(theURL)
   set {oldTIDS, AppleScript's text item delimiters} to {AppleScript's text item delimiters, "/"}
   set fileName to last text item of theURL
   set AppleScript's text item delimiters to oldTIDS
   return fileName
end extractFilenameFromURL
----------------------------------------------------------------
on extract_Href_Links_Safari(regexPatternStr)
   set regexPattern to regexPatternStr
   set jsCmdStr to "Array.from(document.links, x => x.href)
                  .filter(e => e.match(/" & regexPattern & "/i)).join('\\n');"
   set jsResult to doJavaScriptInSafari(jsCmdStr) of me
   return jsResult
end extract_Href_Links_Safari
----------------------------------------------------------------
on leech_Links(linkListStr, downloadPath, refererURL)
   
   # Assign an empty string to "downloadPath" to download to Leech's default location.
   
   # The full "downloadPath" must contain the NAME of the first downloaded file!
   
   tell application "Leech"
      if not running then
         run
         delay 1
      end if
      activate
      download URLs linkListStr to POSIX path downloadPath with referrer refererURL
   end tell
end leech_Links
----------------------------------------------------------------
on safariURL()
   tell application "Safari"
      try
         
         if front document exists then
            tell front document
               set _url to its URL
               try
                  _url
               on error
                  set _url to false
               end try
            end tell
         else
            set _url to false
         end if
         
         return _url
         
      on error
         error "Failure in safariURL() handler of Internet Library."
      end try
      
   end tell
end safariURL
----------------------------------------------------------------
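
The JavaScript that extract_Href_Links_Safari() sends to Safari is compact, so here is the same idea spelled out as standalone JavaScript, together with the file name step from extractFilenameFromURL(). This is an illustrative sketch only: the function names filterLinks and fileNameFromURL are hypothetical, and the input is a plain array of URL strings rather than document.links.

```javascript
// Hypothetical helpers mirroring the AppleScript handlers above.

// Keep only the URLs that match a case-insensitive pattern,
// joined with linefeeds like the AppleScript version.
function filterLinks(urls, patternStr) {
  var pattern = new RegExp(patternStr, "i");
  return urls.filter(function (url) { return pattern.test(url); }).join("\n");
}

// Return the text after the last "/" of a URL.
function fileNameFromURL(url) {
  var parts = url.split("/");
  return parts[parts.length - 1];
}

var urls = [
  "https://example.com/docs/a.pdf",
  "https://example.com/index.html",
  "https://example.com/docs/B.PDF"
];
console.log(filterLinks(urls, "\\.pdf"));
console.log(fileNameFromURL(urls[0]));

// In Safari or Chrome, the array could come from:
//   Array.from(document.links, function (x) { return x.href; })
```

The pattern \.pdf matches anywhere in the URL. Anchoring it as \.pdf$ would keep only URLs that end in .pdf, which may or may not be what a given page needs.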

Here's a complete working Keyboard Maestro macro that breaks things up into easier-to-understand bits.

Download HREF Links with Leech v1.00.kmmacros (13 KB)

-Chris


#9

Incredibly helpful, thank you so much!