Find Matches and Click Links Associated with Them

safari

#1

Hi, Dear All,

The scenario is like the following:

For example, on the website http://gametheorysociety.org, I want to search the page for “read more” and click the link associated with each match, one after another. How can I do this in KM?

Thank you very much in advance.


#2

Hey @Bowen,

The http://gametheorysociety.org site is weird…

Each time you advance to the next page, it accumulates “Read More” links.

So by the time you’ve reached page 11 you have some 52 links, instead of the 5 links displayed per page.

I don’t know enough HTML and JavaScript to know whether this can be worked around simply.

If you run this code in the Script Editor.app with the game theory site open in Safari, you’ll see a list of links.

----------------------------------------------------------------
# Auth: Christopher Stone
# dCre: 2018/01/02 20:00
# dMod: 2018/01/02 20:43
# Appl: Safari
# Task: Return ReadMore links from http://gametheorysociety.org
# Libs: None
# Osax: None
# Tags: @Applescript, @Script, @Return, @Scrape, @ReadMore, @Links
----------------------------------------------------------------

set jsCmdStr to "

function getReadMoreButtonLinks() {

   var classNameToFind = '_self pt-cv-readmore btn btn-success';
   var classNameObjectList = document.getElementsByClassName(classNameToFind);
   var hrefArray = new Array();
   
   for (var i=0; i < classNameObjectList.length; i++) {
      hrefArray.push(classNameObjectList[i].href);
   }
   
   return hrefArray;

}

getReadMoreButtonLinks();

"

tell application "Safari" to set linkList to do JavaScript jsCmdStr in front document

if linkList ≠ {} then
   # Do something
   return linkList
end if

----------------------------------------------------------------

Advance through the pages, running the script each time, and you’ll see the links accumulating.

I have a notion of how I’d handle this with AppleScript, but I’m not going to fool with it anymore today.
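
For what it is worth, one possible way around the accumulation is to remember the links already collected and keep only those that are new on each page. This is a minimal JavaScript sketch, not a tested fix for this site: the helper name newLinksOnly and the example.org URLs are made up for illustration.

```javascript
// Hypothetical helper (not part of the script above): given the links
// already collected and the list returned on the current page,
// return only the links that have not been seen before.
function newLinksOnly(seenLinks, currentLinks) {
    const seen = new Set(seenLinks);
    const fresh = [];
    for (const href of currentLinks) {
        if (!seen.has(href)) {
            seen.add(href); // also drops duplicates within currentLinks
            fresh.push(href);
        }
    }
    return fresh;
}

// Example with made-up URLs: page 2 still reports the page 1 links.
const page1Links = ['https://example.org/a', 'https://example.org/b'];
const page2Links = [
    'https://example.org/a',
    'https://example.org/b',
    'https://example.org/c'
];
console.log(newLinksOnly(page1Links, page2Links)); // only the /c link is new
```

The list returned for each page would be appended to the running list of seen links before moving on to the next page.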

-Chris


#3

Or for an ‘Execute JavaScript for Automation’ action

(Using Array.from and Array.map to side-step the slightly fiddlier and less declarative business of setting up iterations, and repeatedly pushing into specially declared new arrays)

(The fact that JS functions can be coerced to their source strings also helps.)

(() => {

    // f :: () -> [URL]
    const f = () =>
        Array.from(document.getElementsByClassName(
            '_self pt-cv-readmore btn btn-success'
        ))
        .map(x => x.href);

    // TEST ------------------------------------------------------------------
    const
        sa = Application('Safari'),
        ds = sa.documents,
        links = ds.length ? (
            sa.doJavaScript('(' + f + ')()', {
                in: ds.at(0)
            })
        ) : [];

    return links.length ? (
        links
    ) : 'No links found';
})();

#4

@ccstone @ComplexPoint, thank you very much for your great help. I am not very familiar with this kind of code. I have a crude and inefficient way to do this: search for the keywords, after which each match is surrounded by a white rectangle and the match at the cursor is highlighted in yellow. So I use pixel matching to do this.

The macro is attached. Do you have any ideas for improving it? Or can you show, step by step, how to use your code in KM? Thank you very much, and have a good day.

Keywords.kmmacros (26.6 KB)


#5

Assuming that you have the page open in the front Safari document, you should be able to get a list of links like this:

Harvest links.kmmacros (19.1 KB)

Array.from(document.getElementsByClassName(
    '_self pt-cv-readmore btn btn-success'
))
.map(x => x.href).join('\n')

And once you have the links in a variable, you can do things like this:
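
For instance, a minimal sketch (not the attached macro), assuming the harvested links arrive as a single newline-delimited string, as produced by the join('\n') above. The names parseLinkList and visitSequentially, the visit callback, and the example.org URLs are all hypothetical; visit stands in for whatever should happen to each link, such as opening it.

```javascript
// Split a newline-delimited string of links into a clean array,
// trimming whitespace and skipping blank lines.
function parseLinkList(text) {
    return text
        .split('\n')
        .map(line => line.trim())
        .filter(line => line.length > 0);
}

// Call `visit` once per link, in order. `visit` is a placeholder
// callback for the real per-link step (opening the page, for example).
function visitSequentially(links, visit) {
    links.forEach((href, index) => visit(href, index));
}

// Made-up input in the same shape as the harvested variable.
const harvested = 'https://example.org/post-1\nhttps://example.org/post-2\n';

visitSequentially(parseLinkList(harvested), (href, i) =>
    console.log((i + 1) + ': ' + href));
```

Keeping the parsing separate from the per-link step means the same list can be reused for different actions.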


#6

@ComplexPoint, thank you very much. It works for the game theory web page I posted, but when I tried to run it on another web page, it failed. For example: https://www.degruyter.com/view/j/fman.2017.9.issue-1/issue-files/fman.2017.9.issue-1.xml?rskey=rLLD9w&result=4.


#7

Web pages usually contain both links which we may want to click (references to papers, for example), and links which we probably don’t want to click (navigation, adverts, spam etc).

It would certainly be useful to have a macro which could tell the difference, but unfortunately each web page author uses different structures and identifiers to mark particular types of content.

On the game theory site, the links which interest you are identified by the class _self pt-cv-readmore btn btn-success but there is, alas, no way of predicting what identifiers will be used on other sites, and a macro which simply clicked everything would soon become a bit of a nuisance.
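
Where the visible text of the links is predictable (as with “Read More” here), matching on that text rather than on the class is one possible fallback. It is only a rough sketch and no more general than the class approach: it assumes the wanted links share a phrase, and the function name linksMatchingText and the sample data are made up for illustration.

```javascript
// Return the hrefs of all links whose visible text contains `phrase`
// (case-insensitive). `anchors` can be any array-like of objects with
// `textContent` and `href`; in a web page, document.links qualifies.
function linksMatchingText(anchors, phrase) {
    const needle = phrase.trim().toLowerCase();
    return Array.from(anchors)
        .filter(a => (a.textContent || '').toLowerCase().includes(needle))
        .map(a => a.href);
}

// Stand-in data so the sketch can be tried outside a browser.
const sampleAnchors = [
    { textContent: 'Read More', href: 'https://example.org/post-1' },
    { textContent: 'Home', href: 'https://example.org/' },
    { textContent: ' read more ', href: 'https://example.org/post-2' }
];
console.log(linksMatchingText(sampleAnchors, 'read more'));

// In the page itself the call would be:
//   linksMatchingText(document.links, 'read more')
```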


#8

OK, thank you for your help. I will try it later.


#9

If, on the other hand, the set of pages that interest you is fairly fixed and finite, then you might be able to assemble a table of the characteristic xpath or identifying class for links of interest on each page, and have the macro look up the details for each page in that table.

Particular solutions are feasible; it’s just generality that is more or less out of reach. The value lies in being specific.
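
A rough sketch of such a table, keyed on hostname. The only entry is the class already known for the game theory site; the names LINK_CLASS_BY_HOST and linkClassForHost are hypothetical, and entries for other sites would have to be worked out by inspecting each page.

```javascript
// Hypothetical lookup table: hostname -> class name(s) that identify
// the links of interest on that site.
const LINK_CLASS_BY_HOST = {
    'gametheorysociety.org': '_self pt-cv-readmore btn btn-success'
    // further sites go here once their markup has been inspected
};

// Return the identifying class for a hostname, or null if the site
// is not in the table. A leading "www." and letter case are ignored.
function linkClassForHost(hostname) {
    const host = hostname.toLowerCase().replace(/^www\./, '');
    return Object.prototype.hasOwnProperty.call(LINK_CLASS_BY_HOST, host)
        ? LINK_CLASS_BY_HOST[host]
        : null;
}

console.log(linkClassForHost('www.gametheorysociety.org'));

// In the page itself, the lookup could feed the earlier harvesting code:
//   const cls = linkClassForHost(location.hostname);
//   const links = cls
//       ? Array.from(document.getElementsByClassName(cls)).map(x => x.href)
//       : [];
```

Returning null for unknown sites lets the macro stop early instead of clicking links it cannot identify.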


#10

Thank you, and I would welcome any further guidance.