Highlight and Copy Text Between Specific Words

I am viewing a webpage's source in Chrome and trying to automate the process of pressing CMD+F, searching for the word 'keywords', selecting the text that follows it, and copying it.

Here's an example line:

"Keywords":"beach, ocean, kayak, sand","title":

So I want to highlight and copy everything between "Keywords":" and ","title": so that I end up with 'beach, ocean, kayak, sand' in my clipboard.

Any idea if this is possible and if so - how to execute it?

Thanks!

I don’t know about highlighting the actual text (maybe you want to do this to copy it), but if you just need to get that info into the clipboard then that is easily doable using RegEx.

Basically it’s a matter of getting the webpage’s contents into the clipboard (can be done using a simple JavaScript and other ways too), and then using RegEx to search for the string between those two words and saving that string to the clipboard.
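
The RegEx step could look something like this in JavaScript. It is only a minimal sketch, assuming the page source is already available as a string; `escapeRegExp` and `extractBetween` are hypothetical helper names (not part of Keyboard Maestro or any library), and the sample string is just the example line from the question.

```javascript
// Hypothetical helper: escape characters that have a special meaning in a regex.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: return the text between the first occurrence of
// startMarker and the nearest following endMarker, or null if not found.
function extractBetween(source, startMarker, endMarker) {
  const pattern = new RegExp(
    escapeRegExp(startMarker) + '([\\s\\S]*?)' + escapeRegExp(endMarker)
  );
  const match = source.match(pattern);
  return match ? match[1] : null;
}

// Sample line from the question (an assumption about the real page source).
const sampleSource = '{"Keywords":"beach, ocean, kayak, sand","title":"Example"}';
const keywords = extractBetween(sampleSource, '"Keywords":"', '","title":');
console.log(keywords); // beach, ocean, kayak, sand
```

The lazy `*?` stops at the first end marker rather than the last, which matters when a page contains the same marker more than once.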

Any way you could post a link to the page you’re working with so we can provide some more help?

Edit: just saw that this is your first post! Welcome! This is a great community with lots of knowledgeable people who are always glad to help. If you haven’t had the opportunity to do so already, check out this article that talks about the best way to get the quickest answer to your questions.

My suggestion is that you don't try to emulate what you'd do by hand. Instead, download the source code of the page and search for what you want. Here's an example written with AppleScript (to get the URL of Chrome's front tab) and Python (to download the HTML of the page and search for the text you want).

Here's the macro itself:

Keywords.kmmacros (2.7 KB)

I chose Python to do the downloading and searching because that's what I'm most familiar with. Others in the forum may chime in with solutions using languages they prefer. There might even be a "pure Keyboard Maestro" solution.

Hey @Border,

Welcome to the forum!   :smile:

This task is pretty easy:

OR

JavaScript to get the full source from the browser:

document.body.parentNode.outerHTML

Save to a variable.

-Chris

Hey Drang,

I do like the Python. It's next on my list of languages to learn.

But...

urlopen is just a siphon like curl or wget – yes?

In that case you can't be certain of getting the fully rendered source of a web page that contains scripted elements.

So – it's generally better to get the source directly from your web browser if possible with a little JavaScript:

document.body.parentNode.outerHTML

Keyboard Maestro does have its own action for downloading the source of a web page:

KM_Wiki ⇢ Get a URL Action

Get URL Source.kmmacros (2.7 KB)

-Chris

No. Or maybe yes. urlopen follows the chain of redirects, so it's not like curl, but it is like curl -L. In either event, you're right that it will read the source before any JavaScript has acted on it, which could be a problem. We don't know without seeing the pages the OP is dealing with.

I'm glad to see the Get URL action. I knew about Open URL but missed Get URL.

Wow! Thank you all for the help and replies. Really looking forward to learning more about what Keyboard Maestro can do.

I would be using this on multiple sites that all use different words (keywords, tags, etc) so I was planning on modifying it for each site but the closest example would be from the page source of:

For that site I would be highlighting/copying everything on the page source between:
"tags":" and ","page_title":

@cdthomer @drdrang - The link shared above is what my original post is based off of.

Here is another way of doing it.

  1. Saves the entire contents of the webpage to a local variable, local_siteContents, using a simple JavaScript.
  2. Searches that local variable for everything between the keywords you specified and saves it to another local variable, local_keywords.
  3. Displays the local variable local_keywords in a window.

The macro itself is quite simple and will let you extract everything between the two keywords you specify in the regular expression in the search action. You may still need to do some formatting of the results, but this will get you started.

Assuming of course that the other (more elegant) options don't work for you. :sweat_smile:

EDIT: I realized you were referring to the HTML code itself, so I had incorrectly typed the syntax you provided in the RegEx. I adjusted the macro and my original reply. This will work for you, but the result is a long URL-encoded string that will still need cleaning up to be useful, since right now it returns the following:

%5D=business%2C+closeup%2C+computer%2C+delete%2C+enter%2C+equipment%2C+fingers%2C+hand%2C+information%2C+internet%2C+keyboard%2C+keys%2C+Laptop%2C+office%2C+press%2C+pressing%2C+return%2C+shift%2C+thetechcollectionii%2C+touch%2C+touching%2C+type%2C+typing%2C+work&cd%5B

Get webpage contents between keywords.kmmacros (3.3 KB)
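
The %2C and + sequences in that result are URL encoding (%2C is a comma, and + stands for a space in form-encoded data). The cleanup could be sketched in JavaScript like this, assuming the captured fragment always has the shape "prefix=value&suffix" as in the output above; `decodeTagList` is a hypothetical name and the sample is a shortened copy of that output.

```javascript
// Hypothetical helper: turn the URL-encoded fragment into a readable list.
function decodeTagList(fragment) {
  // Keep only the value between the first "=" and the next "&".
  const value = fragment.replace(/^[^=]*=/, '').replace(/&.*$/, '');
  // "+" stands for a space; decodeURIComponent handles %2C and friends.
  return decodeURIComponent(value.replace(/\+/g, ' '));
}

// Shortened version of the string returned by the macro.
const encoded = '%5D=business%2C+closeup%2C+computer&cd%5B';
console.log(decodeTagList(encoded)); // business, closeup, computer
```

Note that `decodeURIComponent` throws on a malformed percent sequence, so a truncated capture would need a try/catch around it.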

Hey @Border,

On the FILMPAC site a bit of JavaScript does the trick:

document.getElementById('pys-js-extra').outerHTML.match(/"tags":".+?","page_title":/).toString()

But...

I'm finding two sets of tag-page_title strings in the JSON of a JavaScript buried deep in the code.

Do you want only the first set of tags? Or both?

-Chris

Thank you for putting this together! Definitely learning a lot. I'll play around with it a bit as I would need the result to be in the format of "business, closeup, computer, delete, enter, etc."

Thank you! I only need the first set of tags to be saved to the system clipboard.
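
Since only the first set is needed, a capture group keeps the "tags" and "page_title" markers out of the result, and the global flag makes it easy to check how many sets a page contains. This is a sketch, assuming the script text contains "tags":"...","page_title": pairs like the FILMPAC page; `findTagSets` is a hypothetical name and the sample text is made up for illustration.

```javascript
// Hypothetical helper: collect every tag list that sits between
// "tags":" and ","page_title": in the given text.
function findTagSets(scriptText) {
  const pattern = /"tags":"(.+?)","page_title":/g;
  // matchAll yields one match per pair; m[1] is the captured tag list.
  return Array.from(scriptText.matchAll(pattern), (m) => m[1]);
}

// Invented sample with two tag sets, mimicking the situation described above.
const sample =
  '{"tags":"business, closeup","page_title":"A"},' +
  '{"tags":"keyboard, typing","page_title":"B"}';
const sets = findTagSets(sample);
console.log(sets.length); // 2
console.log(sets[0]);     // business, closeup
```

In the browser, `scriptText` could be `document.getElementById('pys-js-extra').outerHTML`, as in the one-liner above, and `sets[0]` would be the value to put on the clipboard.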

We're at the stage where @ComplexPoint will step in and tell us (correctly) we should be using a real parser instead of regexes to get what we want. How about this?

where the JavaScript is

eval(document.getElementById('pys-js-extra'))
pysOptions.staticEvents.facebook.init_event[0].params.tags

This is not a general solution, but @Border could adapt it to the various sites he needs to get tags from. And the target of Google Chrome could be changed to Front Browser.

Question – what is the point of line one?

You're not doing anything with it, and line two works fine in the console by itself.

Unfortunately – I'm not getting any returns at all from a Keyboard Maestro macro.

Are you?

As far as I can see I've exactly recreated your macro above...

-Chris

The point of Line 1 was that yesterday the macro wasn't working without that line. Or at least that's how I remember it; I wasn't taking notes. Today, it isn't working with or without Line 1, and Chrome's JavaScript console is telling me

Uncaught ReferenceError: pysOptions is not defined

every time I run the macro. Even though I can paste Line 2 into the console and it knows the definition of pysOptions perfectly well then.

Also, the script (with or without Line 1) works in Safari, although it takes a looong time to run. Even though pasting Line 2 into Safari's console returns the answer immediately.

The upshot is that I shouldn't poke my nose into JavaScript.

Got it. Thanks.

:sunglasses:

I feel that way sometimes, but then I remember all the stuff I get done by scripting Google Chrome and keep muddling along.

I spent some time yesterday researching the problem and didn't find a solution, but I'm sure it's possible to reference the script as you did with your eval expression and then yank stuff out of it.

@ComplexPoint – Rob – this should be a doddle for you. Would you mind helping out?

-Chris
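
One possible way to yank the tags out of that script without depending on the pysOptions global is to read the script element's text and parse it as JSON. This is a sketch under an assumption the thread does not confirm, namely that the script body is a single `var pysOptions = {...};` assignment whose right-hand side is valid JSON. `tagsFromScriptText` is a hypothetical name and the sample text is invented for illustration; only the property path comes from the macro above.

```javascript
// Hypothetical helper: pull the tags out of the script's text.
// Assumption: the text looks like `var pysOptions = {...};` with valid JSON inside.
function tagsFromScriptText(scriptText) {
  const start = scriptText.indexOf('{');
  const end = scriptText.lastIndexOf('}');
  if (start === -1 || end <= start) return null;

  let options;
  try {
    options = JSON.parse(scriptText.slice(start, end + 1));
  } catch (error) {
    return null; // The object literal was not valid JSON after all.
  }

  // Same path as pysOptions.staticEvents.facebook.init_event[0].params.tags,
  // but tolerant of missing levels.
  const event = options?.staticEvents?.facebook?.init_event?.[0];
  return event?.params?.tags ?? null;
}

// Invented sample mimicking the structure referenced in this thread.
const sampleScript =
  'var pysOptions = {"staticEvents":{"facebook":{"init_event":[' +
  '{"params":{"tags":"business, closeup","page_title":"Example"}}]}}};';
console.log(tagsFromScriptText(sampleScript)); // business, closeup
```

In the browser, the input would presumably be `document.getElementById('pys-js-extra').textContent`, assuming that element is the script itself.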