How to Use RegEx to Extract URL and Link Text from HTML Anchor Code?

JMichaelTX · July 23, 2015, 5:25pm

Rob, I get your point, and I want to really thank you for all of your help.

I know you prefer a JavaScript solution, but I chose not to go that route for these reasons:

1. I needed a solution that would work with all browsers, including Firefox, AND with any RTF app/document. JavaScript would not work with RTF documents, and AFAIK I can't use JS from KM with FF.
2. The RegEx solution works for everything.
3. The RegEx solution is simpler to me.
4. I currently don't have a JavaScript environment set up to code, test, and debug JS.
5. Re-learning JavaScript:
   - It's been years since I last coded JS
   - And that was in Windows and IE
   - When I look at your JS code I don't have a clue how to use/modify it, for example to output the Page Title and URL to separate KM variables

But my biggest hurdle right now to JS is #4, especially testing and debugging.

So, I'd like to post some JS questions in this thread:
Learning & Using AppleScript & JavaScript for Automation (JXA)

Thanks again for everything.

ComplexPoint · July 23, 2015, 6:10pm

Of course, and what matters is the job not the tools.

( 2, 3 and 5 all make a huge amount of sense, and polishing regex skills is always good )

( 1 and 4 may not be huge problems, as it happens – as long as your system has Safari somewhere, you can delegate parsing tasks to it while working in any other browser, and textutil can reframe any RTF process as an HTML process. )

(On 4, Safari itself has an excellent JS debugger, and any code that you send to it from KM actions or AS code shows up in it. For app automation through JS, you would clearly have to get Yosemite+, but for browser and general JS, Safari and the command line JSC are already very rich scripting and debugging environments)

Always best, however, to use what already works quickly for you at the time.

Ars longa, vita brevis. Life is short and shavable yaks are many :-) Always better to learn another human language than another machine language …

JMichaelTX · July 23, 2015, 6:33pm

Maybe . . .
Seems to me that human languages are far more diverse, have poor rules, and are subject to dialects and idioms. That's why Texan is so hard to learn.

JMichaelTX · November 23, 2015, 3:03am

Rob, I really like this JXA function.
IMO, it's the best solution yet for parsing an HTML hyperlink.

But I'd like to return a JavaScript array:

arrLink[0] -- the MD text
arrLink[1] -- the oNode.text
arrLink[2] -- the oNode.href

I know how to make these mods in normal JavaScript, but the code that is sent to the Browser is confusing to me.

How can I change the return to be the above array?

Thanks.

ComplexPoint · November 23, 2015, 10:00am

The browser evaluates a JavaScript string built from the brief linkMD() function near the top of the script.
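As a rough illustration of what "a string built from a function" can mean, here is a minimal sketch of one common approach. It may differ from what the script itself does. The names shout() and buildCall() are hypothetical stand-ins, and eval() is used here only in place of the browser, which would normally receive the string.

```javascript
// Hypothetical stand-in for a function meant to run in the browser.
function shout(strText) {
    return strText.toUpperCase();
}

// Serialise a function and one argument into a single expression string:
// "(function source)(JSON-encoded argument)".
function buildCall(fn, arg) {
    return '(' + fn.toString() + ')(' + JSON.stringify(arg) + ')';
}

var strJS = buildCall(shout, 'hello');

// Assumption: eval() plays the part of the browser evaluating the string.
var result = eval(strJS);
```

Because the whole function source travels inside the string, any change to the function body, including its return value, is picked up automatically the next time the string is built.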

You can modify the return value of that function in any way you like. Here, for example, it has been edited to return an object with two properties (.txt and .ref), which you can then use to assemble the MD yourself:

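function linkMD(strLinkHTML) {
    var oDiv, oNode;

    // Let the browser parse the anchor HTML inside a detached <div>
    (oDiv = document.createElement('div')).innerHTML = strLinkHTML;

    // Return the link text and URL, or an empty object if nothing was parsed
    return (
        oNode = oDiv.firstChild
    ) ? {
        txt: oNode.text,
        ref: oNode.href
    } : {};
}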
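For the three-element array asked for earlier in the thread, one option is to leave linkMD() returning the object and convert it afterwards. The sketch below is only a suggestion: linkArray() is a hypothetical helper, not part of the original script, and it assumes "MD text" means a Markdown inline link of the form [text](url).

```javascript
// Hypothetical helper: converts the { txt, ref } object returned by
// linkMD() into [markdownLink, linkText, url].
function linkArray(oLink) {
    // Fall back to empty strings when linkMD() returned {}
    var strText = oLink.txt || '',
        strHref = oLink.ref || '';

    return [
        '[' + strText + '](' + strHref + ')', // [0] Markdown link
        strText,                              // [1] link text
        strHref                               // [2] URL
    ];
}

// Example input standing in for a value returned by linkMD()
var arrLink = linkArray({
    txt: 'Keyboard Maestro',
    ref: 'https://www.keyboardmaestro.com/'
});
```

Keeping the conversion separate means the function evaluated in the browser stays small, and the array layout can be changed without touching it.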

JMichaelTX · November 23, 2015, 11:07am

Thanks, Rob. That really helps. Not only does it give me the solution, but it also teaches me how to deal with similar cases in the future.

JMichaelTX · November 24, 2015, 2:39am

Rob, I'm trying to learn from you.

Was there a specific reason that you used Strict mode in this function?

Thanks.

ComplexPoint · November 24, 2015, 2:41am

I should really use it all the time. Sometimes I forget.

Strict mode is a restricted, safer subset of JS, and lets the engine pick up glitches like assigning to a variable name that was never declared.

PS - thanks for bringing that to mind - I’ll add it to my TextExpander snippet for JS modules
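A small sketch of the kind of glitch strict mode catches, using made-up names (strictTypo, errorNameOf, intCount) purely for illustration. Without the 'use strict' directive, the misspelled assignment would quietly create a global variable instead of failing.

```javascript
function strictTypo() {
    'use strict';
    var intCount = 0;
    intCuont = intCount + 1; // typo: intCuont was never declared
    return intCount;
}

// Runs a function and reports the name of any error it throws.
function errorNameOf(fn) {
    try {
        fn();
        return 'no error';
    } catch (e) {
        return e.name;
    }
}

var strOutcome = errorNameOf(strictTypo); // 'ReferenceError'
```

Placing the directive at the top of a function limits strict mode to that function, which is a low-risk way to adopt it in existing scripts.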
