How Do I learn How to Web Scrape Web Pages?

I'm trying to advance from using click at image and click at coordinates to using the source code of the website. I'm sure there is information out there but my knowledge base is low enough that I don't know how to phrase the search.

Example LinkedIn page: https://www.linkedin.com/in/williamhgates/

I'm trying to do things like copy the name (Bill Gates), location (Seattle, Washington), or most recent experience title (Co-chair), and click buttons like Follow and Contact info.

I'm willing to teach myself a little, but does this require an intermediate knowledge of programming, or none at all?

Obviously I don't expect anyone to lay it all out for me, but I would appreciate it if someone could point me down the right path to teaching myself.

A forum search for “xpath” might give you some ideas.

Also this KM Wiki article.

I hope you don't mind that I have revised your topic title to better reflect the question you have asked.

FROM:
Can someone send me down the right path of teaching myself how to copy and click items on a website with the source code?

TO:
How Do I learn How to Web Scrape Web Pages?

This will greatly help you attract more experienced users to help solve your problem, and will help future readers find your question, and the solution.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This activity is often called "Web Scraping". A Google search for that term will provide you with lots of hits, which you can use to learn more about the subject.

Personally, I mostly use JavaScript and the querySelector function to extract information from a web page. If you don't know JavaScript, then you will want to start by learning it. I also use XPath, but it is much more difficult to learn and use, IMO.
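As a rough illustration of the querySelector approach, here is a minimal sketch. The helper name `textOf` and the selectors `h1` and `.location` are assumptions made up for the example, not LinkedIn's actual markup; you would replace them with whatever the page really uses.

```javascript
// Return the trimmed text of the first element matching `selector`,
// or null when nothing matches. `root` is normally `document`.
function textOf(root, selector) {
  const el = root.querySelector(selector);
  if (!el) return null;
  // innerText is the rendered text; textContent is used as a fallback.
  const text = el.innerText || el.textContent || '';
  return text.trim();
}

// Hypothetical selectors: replace them with ones found on the real page.
// The guard lets the same code load outside a browser without errors.
if (typeof document !== 'undefined') {
  console.log(textOf(document, 'h1'));
  console.log(textOf(document, '.location'));
}
```

Returning null for a missing element, rather than letting the script throw, makes it easier to tell "the selector is wrong" apart from "the element is empty".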

Once you know a bit of JavaScript, I recommend using the Google Chrome "Inspect" tool to show the HTML for the specific web page object of interest. It also provides a great JavaScript Console to use in developing and testing your JavaScript.
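When experimenting in the Console, it can help to list everything a selector matches before settling on one. The following is only a sketch: `listMatches` is a hypothetical helper name, and `h2` is just an example selector.

```javascript
// List the text of every element matching `selector`, prefixed with its
// index, so you can see which match is the one you want.
// `root` is normally `document`.
function listMatches(root, selector) {
  return Array.from(root.querySelectorAll(selector)).map(
    (el, i) => i + ': ' + (el.innerText || el.textContent || '').trim()
  );
}

// Example call for the Console; the selector is an assumption.
if (typeof document !== 'undefined') {
  console.log(listMatches(document, 'h2').join('\n'));
}
```

If the selector matches many elements, that is usually a sign it needs to be made more specific before it is used in a macro.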

You can also search this forum for the tag "webscrape" to see some examples.

I have to say that this is a very rewarding activity, and I use it almost daily. But, it is NOT something you can learn in 15 minutes. :wink: You will need to invest many hours before you start to become somewhat comfortable with it. Once you learn some of the terminology, then Google searches will often turn up hits at very useful places like StackOverflow.com.

Good luck, and as you learn, feel free to post new topics asking for help.

Thanks so much, this is super helpful! I might decide to teach myself, but we do have some friendly developers at my company who might be able to help and may have some free time, so I may explore that...

If someone were well experienced with JavaScript but not with Keyboard Maestro, what do you think would be a rough estimate for clicking 2 buttons and copying 6 items? (I have no idea if it's 30 minutes or weeks.)

Hey @rayk,

30 minutes is not a bad estimate for someone familiar with JavaScript and with using the Developer tools in a web browser.

But it depends on the complexity of the page.

Using your Bill Gates LinkedIn page, try running this in an Execute a JavaScript in Front Browser action:

document.querySelector('#ember45').innerText

Send the output to a window for inspection.

This will give you a glimpse of the power of querySelector.
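For the "click 2 buttons and copy 6 items" kind of task, a slightly fuller sketch might look like the following. An id such as `#ember45` looks auto-generated, so it may well differ between page loads; the sketch therefore tries several selectors in order. The function names and every selector other than `#ember45` are assumptions for illustration, not LinkedIn's real markup.

```javascript
// Try each selector in order and return the first matching element, or null.
function firstMatch(root, selectors) {
  for (const selector of selectors) {
    const el = root.querySelector(selector);
    if (el) return el;
  }
  return null;
}

// Copy the named fields and click the named buttons.
// `fields` and `buttons` map a label to a list of candidate selectors.
// Returns a JSON string describing what was copied and what was clicked.
function scrapeAndClick(root, fields, buttons) {
  const copied = {};
  for (const [name, selectors] of Object.entries(fields)) {
    const el = firstMatch(root, selectors);
    copied[name] = el ? (el.innerText || '').trim() : null;
  }
  const clicked = {};
  for (const [name, selectors] of Object.entries(buttons)) {
    const el = firstMatch(root, selectors);
    if (el) el.click();
    clicked[name] = Boolean(el);
  }
  return JSON.stringify({ copied, clicked });
}

// Hypothetical selectors, for illustration only.
if (typeof document !== 'undefined') {
  console.log(
    scrapeAndClick(
      document,
      { name: ['#ember45', 'h1'] },
      { follow: ['button[aria-label^="Follow"]'] }
    )
  );
}
```

Returning one JSON string keeps all the copied items together, so the output can be sent to a window for inspection in the same way as the one-liner above.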

-Chris
