Create a Markdown Link -- [URL Title](URL) -- for a URL in the Clipboard

(SEE UPDATES BELOW!)

This macro is invoked by a typed-string trigger. It takes a URL from the clipboard, extracts the page title, swaps the URL for a Markdown link on the clipboard, and pastes it. The macro solves a minor problem that nagged at me: I didn't want to have to open a hyperlink just to get a proper Markdown link for it. Now you can simply right-click a hyperlink and select "Copy Link":

Then, in your document, type your string trigger, and a Markdown link will output:

- [Mass Shootings Don’t Have to Be Inevitable - The New York Times](https://www.nytimes.com/2017/11/06/opinion/texas-guns-shooting-trump.html)
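If you're curious what's happening under the hood, here's a rough shell equivalent of the whole flow (a sketch only; the macro itself uses Keyboard Maestro actions and variables, with pbpaste/pbcopy standing in for the clipboard steps):

URL=$(pbpaste)                              # the copied link
TITLE=$(curl -Ls "$URL" | perl -0777 -ne 'print $1 if m{<title[^>]*>(.*?)</title>}is')
printf '[%s](%s)' "$TITLE" "$URL" | pbcopy  # Markdown link back onto the clipboard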

Here it is in action, where I'm copying URLs from hyperlinked article headlines and pasting the Markdown links into an email compose window in Postbox (I then use an add-on to render the HTML):

The original version of the macro didn't get all titles, but it got a lot of them. (It used a curl shell script that I found in an old post by Patrick Welker of RocketINK, but I can't find the post now, apologies! And thanks, Patrick!)

This is what the original macro looked like:

The UPDATED macro can be downloaded here; it now incorporates changes to the script suggested by JMichaelTX, and it seems to capture URL titles in all cases that I have tried:

Again, the updated, fully functional macro is available here!

Cheers!

It seemed to work OK for me. The problem is that it is very difficult to debug when it fails, since the most crucial piece of information, namely what the server returned to curl, is lost immediately.

So it may simply be curl failing (perhaps a network issue), and then there is no way to reproduce the problem.

A simple solution would be to store the result and then process it:

# Save the raw response so it can be inspected if the extraction fails:
curl -s "$KMVAR_URLvar" > /tmp/tmp.html
# Print the line containing <title>, then carve out the text between the tags:
awk '/<title>/' /tmp/tmp.html | cut -d '>' -f 2 | cut -d '<' -f 1

Then, if it fails, you can examine /tmp/tmp.html and see what curl returned. And if that looks OK, you can comment out the curl line, re-run the macro, and debug with consistent results, with no fear that the issue is simply a network failure.
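For example (the same extraction, with the fetch disabled so every run works from the cached copy):

# curl -s "$KMVAR_URLvar" > /tmp/tmp.html   # disabled: reuse the cached response
awk '/<title>/' /tmp/tmp.html | cut -d '>' -f 2 | cut -d '<' -f 1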

Hey Brian,

For future reference: it's good to provide an actual example you know for certain doesn't work.

Your shell script is very fragile.

It expects the title in the raw HTML to be on a line by itself with no other tags. (And this is not the case in the code of Washington Post pages.)
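To see the failure mode with a made-up one-line snippet (many real pages compress the whole head section onto one line like this):

# The title shares its line with other tags, so the '>' field arithmetic collapses:
echo '<head><title>Example Domain</title></head>' | awk '/<title>/' | cut -d '>' -f 2 | cut -d '<' -f 1
# Prints an empty line: field 2 is '<title', which has no text before its first '<'.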

This is much more robust:

# -L follows redirects, -s silences progress output, and the Opera user-agent
# keeps servers from rejecting curl outright:
curl -Ls --user-agent 'Opera/9.70 (Linux ppc64 ; U; en) Presto/2.2.1' --url "$KMVAR_URLvar" \
| perl -wlne 'if ( m!<title>(.+?)</title>!ims ) { print $1 }'

It allows for redirects and tells the server it's a browser (via a custom user-agent string). (Some servers will barf when they detect that curl is the user-agent.)

Finding the relevant text with Perl gives you better control than fooling with a daisy chain of piped commands.


Keyboard Maestro 8 has a Get URL action, so this kind of thing is now quite easy to do with native actions.

This method should be easier for non-techies to debug too.

Get Remote Web Page Title Using the Page URL.kmmacros (4.7 KB)

-Chris

Replace your current shell script with these two actions:

Regex:
<title>(.*)<\/title>

Works for me using this URL:

https://www.washingtonpost.com/world/national-security/white-house-implements-new-cuba-policy-restricting-travel-and-trade/2017/11/08/a5597dee-c49b-11e7-aae0-cb18a8c29c65_story.html?utm_term=.17178e94ca3f&wpisrc=al_news__alert-world--alert-national&wpmk=1
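If you want to sanity-check that regex from Terminal before building the actions, something close to it works like this (a sketch: $URL stands in for any article link, and perl's /s flag plus a non-greedy .*? stand in for how the Search action scans the page):

curl -Ls "$URL" | perl -0777 -ne 'print "$1\n" if m{<title>(.*?)</title>}s'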

As a good friend of the forum has informed me:

Change this:

curl -s "$KMVAR_URLvar"

To this:

curl -Ls "$KMVAR_URLvar"

The -s switch is silent — i.e., curl's progress information is NOT shown.

The -L switch is follow redirects — which is fairly vital when chasing web pages.

It is best to use -L unless you have a specific reason not to.
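A quick way to see what -L buys you (curl's -w '%{http_code}' reports the final HTTP status; substitute any redirecting link for $URL):

curl -s -o /dev/null -w '%{http_code}\n' "$URL"    # often 301/302: you stopped at the redirect
curl -Ls -o /dev/null -w '%{http_code}\n' "$URL"   # 200 once the redirect chain is followed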

Thanks all! This seems like a picayune action, but I really find it useful more often than you’d think, and I’d like it to work consistently!

To the suggestion that I include an example of how the script as I had it goes wrong, here's one. When I click on a Washington Post link and invoke the macro, I always get something like this:

- [%TITLEvar%](https://www.washingtonpost.com/politics/canary-in-the-coal-mine-republicans-fear-democratic-wins-mean-more-losses-to-come/2017/11/08/15130b64-c4b0-11e7-84bc-5e285c7f4512_story.html)

I will work with folks’ suggestions and try to get it right.

UPDATE: This version seems to work in all cases – download here.

P.S. Let me not forget to mention Brett Terpstra's Titler Service, which inspired this macro. The Titler Service takes a selected URL, grabs the page title, and replaces the URL with a Markdown-formatted link.

What I wanted to achieve with my macro was the ability to just copy a link from a hyperlink and get that titled Markdown link with as little effort as possible.

Your suggestions worked perfectly and have been incorporated into the macro, which now seems to work in all cases. Awesome! Thank you!

Hi Brian,

That version of the macro looks ancient. In general, using bash to get the title is tricky; Python would be much more reliable. That said, I use this along with some regular expressions (trim whitespace, trim return, …) to get the title:

#!/bin/bash

# -qO- fetches the page quietly to stdout; gawk uses '</title' as the record
# separator (RT holds the matched terminator), strips everything up to and
# including the opening <title...> tag, and prints the remaining title text.
/usr/local/bin/wget -qO- "$KMVAR_URL__URL" | /usr/local/bin/gawk -v IGNORECASE=1 -v RS='</title' 'RT{gsub(/.*<title[^>]*>/,"");print;exit}'

It works with your example and should cover most cases.

Cheers,
Patrick

Patrick Welker is who I got the curl script from, y’all! Thanks Patrick!

Hey Patrick,

That looks good, but as you probably know, neither wget nor gawk is installed by default on macOS.

They must be installed manually or via a package manager like MacPorts or Homebrew.
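If you'd rather stick to stock tools, a curl-plus-perl equivalent might look like this (a sketch, not a drop-in replacement for Patrick's macro):

curl -Ls "$KMVAR_URL__URL" | perl -0777 -ne 'print "$1\n" if m{<title[^>]*>(.*?)</title>}is'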

-Chris

Just wanted to update this discussion with a new finding that has driven me crazy (and also made me learn a lot of KM debugging tools).

Turns out there are web pages that put extra attributes in the title tag. For example, one MIT Technology Review article has this title:

 <title data-react-helmet="true">Stop talking about AI ethics. It’s time to talk about power. | MIT Technology Review</title>

and so it breaks the regex used to extract the title.

Solution: put a new capture group in the regex, not forgetting that the relevant capture group is now the second one:
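The exact pattern isn't shown above, but the shape of the fix is something like this (a reconstruction; group 1 soaks up the optional attributes, so the title lands in group 2):

<title(\s[^>]*)?>(.*)<\/title>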

This seems to be working for me for those pesky URLs from MIT Technology Review and also regular ones.

Hey Juan,

All you have to do is write the regex to deal more specifically with the title tag structure.

Here's one way:

Get URL as HTML and Extract Title 1.00.kmmacros (6.8 KB)

Here's another:

(?s)<title.+?>(.+?)</title>

The (?s) switch allows the dot metacharacter ('.') to match newlines, so the pattern can span lines.
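To try that pattern from Terminal (a sketch; -0777 makes perl slurp the whole page so the pattern really can span lines):

curl -Ls "$KMVAR_URLvar" | perl -0777 -ne 'print "$1\n" if m{(?s)<title.+?>(.+?)</title>}'

One caveat: the .+? between <title and > expects at least one character there (an attribute, as in the MIT Technology Review example); change it to .*? if you also want to match a bare <title> tag.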

-Chris