Getting HTML Content Within <Span> Tags - RegEx

Hi, I'm trying to extract two strings from the inner HTML, but I keep running into errors. Thanks in advance for your help :slight_smile:

  1. I want to get the text inside the first span tag that has no class name. In this example I want to return "Target". If it returned the whole line including the span tags, that would be fine too, since I can clean it up in Excel.

  2. I want to get the text inside the second span tag that has no class name. In this example I want to return "13 yrs 11 mos". As with the first one, returning the whole line including the span tags would be fine as well.

Below is the output from the innerHTML:

    <h3 class="t-16 t-black t-bold">
      <span class="visually-hidden">Company Name</span>
      <span>Target</span>
    </h3>
    <h4 class="t-14 t-black t-normal">
      <span class="visually-hidden">Total Duration</span>
      <span>13 yrs 11 mos</span>
    </h4>

Below is the regex I have tried but have been unable to get to work:

1. (?<=(<span>))(\w+)(?=(</pre>))
2. I haven't attempted this one yet...

Are you starting with a live, open web page, or with an HTML file you have obtained from some other source?
If you always start with a web page:

  • then the best method is to use the KM Execute a JavaScript in Front Browser action.
  • I routinely use the JavaScript document.querySelector() to extract data/text from the web page.
  • If you provide the URL for the example you posted, we can probably provide a solution.

Let us know the source you start with.
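
As a rough illustration of the document.querySelector() approach mentioned above, here is a minimal sketch based only on the HTML snippet in the question. The helper name textOfPlainSpan is hypothetical, and the selectors assume the first matching h3 and h4 on the page are the ones wanted; on the real page they would probably need to be narrowed. How the result is handed back to Keyboard Maestro depends on the action's settings, so that part is left out.

```javascript
// Hypothetical helper: returns the trimmed text of the first class-less
// <span> that is a direct child of the given parent tag, or "" if none.
// `root` is anything with a querySelector method (normally `document`).
function textOfPlainSpan(root, parentTag) {
  const el = root.querySelector(parentTag + " > span:not([class])");
  return el ? el.textContent.trim() : "";
}

// Only runs where a DOM is present, i.e. inside the browser page.
if (typeof document !== "undefined") {
  const company = textOfPlainSpan(document, "h3");   // "Target" for the snippet in the question
  const duration = textOfPlainSpan(document, "h4");  // "13 yrs 11 mos" for the snippet in the question
  console.log(company + "\t" + duration);
}
```

The :not([class]) part skips the visually-hidden label spans, so no clean-up in Excel should be needed afterwards.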

Hey Steven,

This is a bit more complex than you want it to be using Keyboard Maestro native actions – but the modality is not too bad once you get used to it.

-Chris


Search HTML Example v1.00.kmmacros (9.0 KB)
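
The contents of the attached macro are not shown in the thread. Separately from it, here is a minimal sketch of a regex route, written in JavaScript and run against the snippet from the question. Two things stand out in the pattern that was tried: it looks ahead for </pre> rather than </span>, and \w+ cannot match the spaces in "13 yrs 11 mos". The pattern below is an assumption about what is wanted, not the macro's method. It relies on the class-less tags being written exactly as <span> with no attributes.

```javascript
// HTML copied from the question.
const html = `
<h3 class="t-16 t-black t-bold">
  <span class="visually-hidden">Company Name</span>
  <span>Target</span>
</h3>
<h4 class="t-14 t-black t-normal">
  <span class="visually-hidden">Total Duration</span>
  <span>13 yrs 11 mos</span>
</h4>`;

// A literal "<span>" only matches spans without attributes, so the
// class="visually-hidden" spans are skipped. [^<]* captures everything
// up to the closing tag, including spaces.
const plainSpan = /<span>([^<]*)<\/span>/g;

// Collect capture group 1 of every match, in document order.
const values = Array.from(html.matchAll(plainSpan), (m) => m[1].trim());

console.log(values); // [ 'Target', '13 yrs 11 mos' ]
```

The pattern uses only basic syntax, so it will likely carry over to a Keyboard Maestro regex search with the capture group saved to a variable, though that has not been checked here.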


Cool!
I had a very similar question and found this right on the homepage, and it solved it :smiley: :100:
