Copy data from a web page into variables so it can be pasted into an Excel document

Pasting collected data into a document is not a problem for me; the part I can't do is collecting the data from a website.

For example, on the web page "Lycée polyvalent Eugène Thomas | Ministère de l'Education Nationale de la Jeunesse et des Sports" I would need to copy:

the name of the school : LYCÉE POLYVALENT EUGÈNE THOMAS in a variable called : school
the address of the school : 100 Avenue Léo Lagrange - BP 63 in a variable called : adresse
the post code of the school : 59530 in a variable called : cp
the town of the school : QUESNOY in a variable called : ville
the phone number of the school : 0327205480 in a variable called : phone
the email of the school : ce.0590168m@ac-lille.fr in a variable called : mail

Sorry, I'm quite a novice and I don't know much about HTML, so I tried to do it with mouse moves and clicks. Obviously the data is not always the same length or in the same place for each school, so that doesn't work.
If it's possible to do this some other way, it would greatly improve my productivity. Thanks for your advice!

You can likely do this using RegEx and clipboard actions. Would it be possible for you to provide a link to the page you are trying to extract the info from so we can see the content and figure out a solution?

Yes, a few examples:

I don't know French so is Lycée - Public the name of the school you're looking to extract? Or would it be LYCÉE POLYVALENT EUGÈNE THOMAS?

EDIT: Never mind, saw it in your original post. Let me take a look at it for a bit.

Here is an example of how to extract the school address. You copy the entire contents of the page, the copied text is filtered to remove any styles (tables and such), and then the RegEx looks for whatever text appears after the string "Coordonnées :" and before the zip code.

extract school address.kmmacros (3.7 KB)
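
For anyone who can't open the macro, the idea can be sketched in plain JavaScript. This is only an illustration, not the expression used inside the macro: the sample text, the `extractAddress` name, and the assumption that the address line reads "Coordonnées : address - 5-digit post code town" are all hypothetical.

```javascript
// Hypothetical sample of the page text after styles have been removed.
// The real page layout may differ; adjust the pattern if it does.
const sampleAddressText = [
  "Annuaire LYCÉE POLYVALENT EUGÈNE THOMAS",
  "Coordonnées : 100 Avenue Léo Lagrange - BP 63 - 59530 QUESNOY",
  "Tél. : 0327205480",
].join("\n");

// Return the text that follows "Coordonnées :" and precedes " - <post code>",
// or null when the pattern is not found.
function extractAddress(text) {
  const match = text.match(/(?<=Coordonnées :\s).*?(?=\s-\s\d{5}\b)/);
  return match ? match[0] : null;
}

console.log(extractAddress(sampleAddressText));
```

The lookbehind `(?<=...)` and lookahead `(?=...)` only check what surrounds the match, so neither the label nor the post code ends up in the result.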

I'm in the middle of a work call right now, but if you want more info on RegEx, check out the following websites:

https://github.com/ziishaned/learn-regex
https://regex101.com/r/1paXsy/1

I'll keep working on the other parts as I have time, and post what I come up with as I go. Likely others will chime in too with even better ways since I'm still fairly new to RegEx.

Thank you very much for your help. I'm going to try to figure out what I can from that.

You're very welcome. Basically you can copy the entire page's contents, filter it to remove styles, then paste that into the regex101 editor and in the bar at the top experiment with RegEx expressions to see what works.

OK, thanks, I understand better now.

Here are a few more expressions that return the results you're looking for. Again, I'm pretty new at RegEx (and I don't know French at all), but these seem to work fine on the three pages you provided. If nothing else, they will show you a little more about how RegEx works, and using that editor you could no doubt verify and compile more expressions for the other pieces of information you need.

School name
(?<=Annuaire\s).*

Phone
(?<=Tél. :\s)\d{7,10}

Email
(?<=Email :\s).*
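
As a rough illustration of how these three expressions behave, here is a small JavaScript sketch that applies them to a made-up text sample. The sample layout and the names `contactPatterns` and `extractContactFields` are assumptions for the example, not part of the macro; the patterns themselves are the ones listed above.

```javascript
// Hypothetical plain-text sample; the real page text may be laid out differently.
const contactSample = [
  "Annuaire LYCÉE POLYVALENT EUGÈNE THOMAS",
  "Tél. : 0327205480",
  "Email : ce.0590168m@ac-lille.fr",
].join("\n");

// The three expressions from this post, keyed by the variable names
// used in the original question.
const contactPatterns = {
  school: /(?<=Annuaire\s).*/,
  phone: /(?<=Tél. :\s)\d{7,10}/,
  mail: /(?<=Email :\s).*/,
};

// Run every pattern against the text; a field with no match becomes null.
function extractContactFields(text) {
  const fields = {};
  for (const [name, pattern] of Object.entries(contactPatterns)) {
    const match = text.match(pattern);
    fields[name] = match ? match[0] : null;
  }
  return fields;
}

console.log(extractContactFields(contactSample));
```

Each lookbehind `(?<=...)` only checks that the label comes right before the match, so the label itself is left out of the result.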

Fantastic!!! Yes, it works! You can't imagine how many different things I learned from your tiny macro :slight_smile: Now I need to figure out how the RegEx code works and I'm good to go. I'm very excited now! THANK YOU SO MUCH!!!

Awesome! Glad that it helped. Keep us posted if you run into any more issues because there's a lot of people here who are very knowledgeable with RegEx and can help even more.

will do :smile:

So I read the RegEx basics you linked and figured out your expressions, and then I made two new ones for the data you didn't have time to help with:

(?<=\s\d{5}\s).* for the town

(?<=\ - )\d{5} for the post code
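
A quick way to check these two expressions outside Keyboard Maestro is a few lines of JavaScript. The address line below is a made-up sample that assumes the layout "address - post code town", and `extractPostCodeAndTown` is just an illustrative name.

```javascript
// Hypothetical address line, assuming "<address> - <post code> <town>".
const addressLine = "Coordonnées : 100 Avenue Léo Lagrange - BP 63 - 59530 QUESNOY";

// Post code: five digits right after " - ".
// Town: the rest of the line after a five-digit number surrounded by whitespace.
function extractPostCodeAndTown(text) {
  const cp = text.match(/(?<= - )\d{5}/);
  const ville = text.match(/(?<=\s\d{5}\s).*/);
  return {
    cp: cp ? cp[0] : null,
    ville: ville ? ville[0] : null,
  };
}

console.log(extractPostCodeAndTown(addressLine));
```

The " - " before "BP 63" is skipped because it is not followed by five digits, so only the real post code matches.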

So now I feel like I can pretty much make my own expressions and apply them to other websites. Thank you so much for helping me learn a bit of RegEx.

Hey there, fantastic! Sorry I wasn’t able to look into it more but at the same time you were able to use what I posted and then create your own from there so that's even better. Glad you got it going for you!

Another trick I learned last night is to use JavaScript to get the contents of the webpage and save its results to the clipboard or a variable.

document.body.textContent

This way you don't even have to have the webpage at the front which might be better for your workflow.
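
A minimal sketch of that idea, combined with the filtering step from earlier in the thread. `getPageText` and `findEmail` are illustrative names, and the stub object only stands in for the browser's `document` so the snippet can also be tried outside a browser. How the result gets into the clipboard or a variable depends on the action that runs the script.

```javascript
// Return the text of the page body, or an empty string if there is none.
// In a browser, pass the global `document`.
function getPageText(doc) {
  return doc && doc.body ? doc.body.textContent : "";
}

// Pull the email out of the page text with the expression posted earlier.
function findEmail(text) {
  const match = text.match(/(?<=Email :\s).*/);
  return match ? match[0].trim() : null;
}

// Stand-in for `document` with made-up content, for testing outside a browser.
const fakeDocument = {
  body: { textContent: "Tél. : 0327205480\nEmail : ce.0590168m@ac-lille.fr\n" },
};

const pageDoc = typeof document !== "undefined" ? document : fakeDocument;
console.log(findEmail(getPageText(pageDoc)));
```

Note that `textContent` keeps the line breaks of the HTML source rather than the ones you see on screen, so expressions that rely on `.*` stopping at the end of a visible line may need adjusting.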

And if you use the Get URL action, you don’t even have to have the page open in a browser! See action:Get a URL [Keyboard Maestro Wiki]

Wow, that's pretty neat. I can't think of any personal use cases where I would need to get a webpage's contents that I don't already have open, but I bet if I think about it enough I could come up with some haha.

I’ve been using this action for literally years. One of my use cases: I receive an email with a link to a web page that aggregates many other news items. I copy the link from the email and trigger a macro that gets the contents of that page into a variable and processes it, looking for news items of interest. When it finds something, it gets that item's page into a variable, processes it too, and then saves it into a DEVONthink database, and so on. All of this happens without ever loading anything into a browser, so I can continue using my Mac without interruption. The downside is that you can’t use JavaScript as you can in a browser, so my macros rely heavily on regex, shell scripting and AppleScript. One day I might get round to properly learning JavaScript, as it offers many powerful features.

Well pretty much all of that is way over my head :sweat_smile:
But I bet I could use JavaScript and RegEx to poll my thermostat webpage that I usually have open and minimized to get the current temperature setting to update my Stream Deck button.
Question though, suppose I have multiple Safari pages open and I need to specify which page to run the JavaScript in...what's the best way to do that?

Well, I am not the expert in this, but a quick scan of the JavaScript tokens in the KM wiki would seem to indicate that the JS will run in the active tab, so you’d need to activate whatever tab first before running your JS. I recommend you check this out for yourself, or better still - start a new topic with your question and see what the real JS experts say!

That depends on whether the page has dynamic content or not.

The Keyboard Maestro action is not a browser-in-a-box and can't run any scripts in requested content.

-Chris

The Keyboard Maestro action can only act on the active tab, but AppleScript can act on any tab in any window.

-Chris