Help web scraping 1 link with JavaScript

I was able to scrape all the text I needed from a website with a page text extraction script and regular expressions, but I will need JavaScript to scrape a link, and unfortunately I don't know JavaScript yet.

The site is LinkedIn's Sales Navigator, which is behind a paywall, so unfortunately I can't link to the page itself. I'll do my best to provide the necessary source HTML; please let me know if and where I can provide more context.

I'm trying to scrape the Google Maps link highlighted in the picture below.

Here is the raw HTML from the <Div class="top card"> section:

https://raw.githubusercontent.com/raykokay/Raykokay/master/Code

Please let me know if more or less code would be better, or if there is anything else I can do.

Is there some reason this RegEx won't work for you:
<a aria-describedby=(?s).+?data-control-name="topcard_headquarters"\h+href="(?-s)(.+?)"\h+rel=

The href is returned in Capture Group 1:
https://www.google.com/maps/place/101+Elliott+AVE+W+Suite+100+Seattle+WA+98119+United+States

For details, see: regex101: build, test, and debug regex

While I would have done all of the extraction using JavaScript, since you already have most of what you want using RegEx, I'd just continue.
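
If the page HTML is already in a string, the same capture can be sketched in JavaScript. JavaScript's regex flavor has no \h and no inline (?s) toggles, so this version uses \s and [^>]* instead. The sampleHtml string (including its aria-describedby and rel values) and the extractHqLink name are made up for illustration; the real anchor carries more attributes. Like the RegEx above, it assumes href comes after data-control-name.

```javascript
// Hypothetical sample; the attribute values other than href and
// data-control-name are placeholders, not taken from the real page.
const sampleHtml =
  '<a aria-describedby="hq-tip" data-control-name="topcard_headquarters" ' +
  'href="https://www.google.com/maps/place/101+Elliott+AVE+W+Suite+100+Seattle+WA+98119+United+States" ' +
  'rel="noopener">Headquarters</a>';

// Returns the href of the headquarters anchor, or null if it is not found.
// Assumes href appears after data-control-name inside the same <a> tag.
function extractHqLink(html) {
  const re = /<a\s[^>]*data-control-name="topcard_headquarters"[^>]*?\shref="([^"]*)"/;
  const match = re.exec(html);
  return match ? match[1] : null;
}

console.log(extractHqLink(sampleHtml));
```

Because the capture group is [^"]*, the closing quote of the attribute is never part of the result.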

Is there some reason this RegEx won't work for you:

Are there different definitions of page text? The script I was running to retrieve the page text didn't return any HTML; it only returned the text you can see at first glance on the page. This is the JavaScript:

document.body.parentNode.outerText

and the result would be something like this:

Mediatonic Games
Mediatonic is one of the UK's largest independent game developers with over 200 people acr.. See all
Computer Games - United Kingdom - 201-500 employees
170 employees
|
24 Decision Makers
Add Tag
Save
Website 
Headquarters

BTW, the Google Maps link I'm looking for is hyperlinked from the Headquarters icon in the image above.

I don't know/have the JavaScript to retrieve the HTML as in your example, but if I did, then yes, this would definitely work.

Yep. There is the HTML text and the plain text.
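
To make the distinction concrete: innerText (or outerText) gives only the rendered text, while outerHTML gives the markup, including attributes such as href. A minimal sketch, with hypothetical helper names, that takes the document as a parameter:

```javascript
// Plain text as rendered on the page (no tags, no attributes).
function getPageText(doc) {
  return doc.documentElement.innerText;
}

// HTML source of the current DOM, including tags and attributes.
function getPageHtml(doc) {
  return doc.documentElement.outerHTML;
}

// In a browser, pass the global document; outside a browser this is skipped.
if (typeof document !== 'undefined') {
  console.log(getPageText(document).length, getPageHtml(document).length);
}
```

A RegEx that looks for href= can only match against the second form.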

OK, then this JavaScript should do the trick:

hqLink = document.querySelector('a[data-control-name="topcard_headquarters"]').href;
hqLink;

Just put this in an Execute a JavaScript in Front Browser action.

This returns:
https://www.google.com/maps/place/101+Elliott+AVE+W+Suite+100+Seattle+WA+98119+United+States

Questions?

Thanks so much, that works great, and it seems a lot cleaner and faster. A follow-up question on adapting this to a similar but different situation:

<a data-control-name="topcard_employees" href="/sales/search/people/list/employees-for-account/5390798?_ntb=YmS1PAfQQQiLThkuuctcGA%3D%3D" id="ember2841" class="ember-view">              76,679 employees

The HTML is very similar except for topcard_employees. I'm struggling with how to format the script for the class="ember-view" portion. How would I do that? The goal is to retrieve the "76,679 employees" text.

In the JavaScript, change the querySelector statement to use data-control-name="topcard_employees":

employeesStr = document.querySelector('a[data-control-name="topcard_employees"]').innerText;
employeesStr;

I have NOT tested this, so there could be errors/typos.
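
The innerText of that link comes back with leading whitespace and the word "employees" attached. If only the number is wanted, a small post-processing step could look like the sketch below. The parseEmployeeCount name is hypothetical, and it assumes commas are used as thousands separators, as in the snippet above.

```javascript
// Turns "76,679 employees" (with any surrounding whitespace) into 76679.
// Returns null when the text contains no digits.
function parseEmployeeCount(text) {
  const match = /\d[\d,]*/.exec(text);
  return match ? Number(match[0].replace(/,/g, '')) : null;
}

console.log(parseEmployeeCount('              76,679 employees'));
```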

Thank you! I was able to scrape many more items with this format. Sorry to bombard you with questions, but what would the syntax be when there's a prefix?

I know that to get the phone number from this HTML:

<span class="ng-star-inserted">(111) 111-1111 (HQ)</span>

you would do this:

text = document.querySelector('span[class="ng-star-inserted"]').innerText;
text

But there's some sort of prefix (I don't know the exact terminology, but I mean the _ngcontent-c22), and unfortunately class="ng-star-inserted" is not the first attribute in the HTML. It looks like this:

<span _ngcontent-c22="" class="ng-star-inserted">(111) 1111-1111 (HQ)</span>

How would I change my script in this scenario?

Well, IF that is the first span element with that class, then this should work:

text = document.querySelector('span.ng-star-inserted').innerText;

However, if that is returning the wrong text, then there may be multiple spans.
So try this in the Chrome JavaScript Console with that page open:

spanElem = document.querySelectorAll('span.ng-star-inserted');

That will return ALL spans with that class. Then you can use the array index to inspect each, as in:

spanElem[0];     // first elem
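
Rather than checking each index by hand, the list can be filtered for the span whose text looks like a phone number. This is only a sketch: findPhoneText and PHONE_RE are hypothetical names, and the pattern is a loose guess based on the "(111) 111-1111 (HQ)" example.

```javascript
// Loose pattern for numbers like "(111) 111-1111"; an assumption, not a full validator.
const PHONE_RE = /\(\d{3}\)\s*\d{3,4}-\d{4}/;

// Returns the trimmed text of the first matching span, or null if none match.
function findPhoneText(root) {
  const spans = Array.from(root.querySelectorAll('span.ng-star-inserted'));
  const hit = spans.find((el) => PHONE_RE.test(el.innerText));
  return hit ? hit.innerText.trim() : null;
}

// In a browser, search the whole document; outside a browser this is skipped.
if (typeof document !== 'undefined') {
  console.log(findPhoneText(document));
}
```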

If none of that works, then you will need to show me a large block of HTML code around the target span.

None of them worked, unfortunately. I believe the problem might stem from the fact that this source is a Chrome extension that pops out a panel, rather than the actual website.

Here's the larger code (this page is different from the code I listed above):

https://raw.githubusercontent.com/ggg3ggg/non/master/example

None of the JavaScript commands I tried worked.

The first issue is figuring out how to retrieve anything at all, but here's a similar phone number example:

<span _ngcontent-c13="" class="phone-number">(111) 111-1000 ext. 5737</span>

I keep getting this error:

Uncaught TypeError: Cannot read property 'innerText' of null
at :85:84
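
That error means querySelector found no matching element and returned null, so reading .innerText on the result throws. A guarded version, sketched here with a hypothetical textOrEmpty helper, returns an empty string instead, which makes it easier to tell "the selector matched nothing" apart from a script failure:

```javascript
// Returns the trimmed innerText of the first match, or '' when nothing matches.
function textOrEmpty(root, selector) {
  const el = root.querySelector(selector);
  return el ? el.innerText.trim() : '';
}

// In a browser, query the global document; outside a browser this is skipped.
if (typeof document !== 'undefined') {
  console.log(textOrEmpty(document, 'span.phone-number'));
}
```

An empty result here still does not say why the element is missing; it only avoids the TypeError.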

Well, we can stop right there. Due to "security" reasons, Chrome prevents JavaScript injection in extensions and frames. I've got JavaScript to work with those in the Chrome console, but never from AppleScript or KM.

Thanks for this information. It is useful.