Extract HTML part from a formatted URL

If there is a formatted URL on the clipboard (HTML with an href attribute, etc.), is there a way to extract the underlying URL text, i.e. the http://... part, rather than the "pretty" text?

I thought Filter Clipboard with a URL Fragment might be the way, but that seems to be for something different. Thanks.

Do you, perhaps, mean extract the URL part from the HTML ?

( A URL contains no HTML )


It might be helpful to understand a little more of the workflow context.

You are copying a link from a browser ?

I'm not sure what the URL filter options do, other than the encode one, and there's not much help in the wiki. But a relatively simple regular expression can extract the URL:

url extraction.kmmacros (3.2 KB)

This is using a variable, but could easily be edited to refer to the clipboard instead of the variable.
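For anyone who can't open the attachment, here is a minimal sketch of the general regex idea in plain JavaScript. It is not the contents of the attached macro: the pattern and the name `hrefsFromHTML` are my own illustrative assumptions, and it only handles href values in double or single quotes.

```javascript
// hrefsFromHTML :: String -> [String]
// Hypothetical helper: collects the href value of every <a ...> tag
// in an HTML fragment (double- or single-quoted values only).
const hrefsFromHTML = html => {
    const rgx = /<a\b[^>]*?\bhref\s*=\s*(?:"([^"]*)"|'([^']*)')/gi;

    return Array.from(
        html.matchAll(rgx),
        // Group 1 holds a double-quoted value, group 2 a single-quoted one.
        m => m[1] !== undefined ? m[1] : m[2]
    );
};
```

Plain text with no anchor tag in it simply yields an empty list, which is what you would see if only the link label reached the variable.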

-rob.

Thanks guys for the replies. The problem is KM clipboard actions seem to only see the formatted text.
E.g. if I copy the link to the Forum at the bottom of the KM website it sees the text "Forum", not the underlying HTML / URL.

<a href="https://forum.keyboardmaestro.com/">Forum</a>

What I am trying to do is change the formatted text by adding some text from the underlying URL/HTML. E.g., turn the above into:

<a href="https://forum.keyboardmaestro.com/">Forum - forum.keyboardmaestro.com</a>
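Once the HTML source is available as a string, the rewrite itself could be sketched roughly as below. This is only an illustration under assumptions: `labelWithHost` is a hypothetical name, the href is assumed to be double-quoted, and the host is taken to be whatever sits between `://` and the next `/`, `?`, `#` or `:`. Getting the HTML off the clipboard and putting the result back are separate steps that this does not cover.

```javascript
// labelWithHost :: String -> String
// Hypothetical helper: appends the host name of each link's href
// to that link's visible label, leaving other attributes untouched.
const labelWithHost = html =>
    html.replace(
        /(<a\b[^>]*?\bhref\s*=\s*"([^"]*)"[^>]*>)([\s\S]*?)(<\/a>)/gi,
        (whole, openTag, url, label, closeTag) => {
            // Host = the text between "://" and the next / ? # or :
            const m = /^[a-z][a-z0-9+.-]*:\/\/([^\/?#:]+)/i.exec(url);

            return m === null
                ? whole // Relative or unusual URL: leave the link as is.
                : `${openTag}${label} - ${m[1]}${closeTag}`;
        }
    );

console.log(
    labelWithHost('<a href="https://forum.keyboardmaestro.com/">Forum</a>')
);
// → <a href="https://forum.keyboardmaestro.com/">Forum - forum.keyboardmaestro.com</a>
```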

Applications from which we copy typically place several different pasteboard items (different formats) in the system clipboard.

The receiving application decides which of these to read.

Keyboard Maestro will default to reading the UTF8 text component – in the case of a browser-copied link, just the label.

One way to extract any public.html pasteboard item that is in the clipboard, so that you can work with the HTML source, is to run an Execute JavaScript for Automation action like this (which you could precede with a Copy action to copy your browser selection).

HTML source from clipboard.kmmacros (4.7 KB)


Assumes Keyboard Maestro version 11.

JS source:
// (() => {
//     "use strict";

    ObjC.import("AppKit");

    const main = () =>
        either(
            x => x
        )(
            x => x
        )(
            clipOfTypeLR("public.html")
        );

    // ----------------------- JXA -----------------------

    // clipOfTypeLR :: String -> Either String String
    const clipOfTypeLR = utiOrBundleID => {
        const
            clip = ObjC.deepUnwrap(
                $.NSString.alloc.initWithDataEncoding(
                    $.NSPasteboard.generalPasteboard
                    .dataForType(utiOrBundleID),
                    $.NSUTF8StringEncoding
                )
            );

        return 0 < clip.length ? (
            Right(clip)
        ) : Left(
            "No clipboard content found " + (
                `for type '${utiOrBundleID}'`
            )
        );
    };


    // --------------------- GENERIC ---------------------

    // Left :: a -> Either a b
    const Left = x => ({
        type: "Either",
        Left: x
    });


    // Right :: b -> Either a b
    const Right = x => ({
        type: "Either",
        Right: x
    });


    // either :: (a -> c) -> (b -> c) -> Either a b -> c
    const either = fl =>
        // Application of the function fl to the
        // contents of any Left value in e, or
        // the application of fr to its Right value.
        fr => e => "Left" in e
            ? fl(e.Left)
            : fr(e.Right);

    return main();
// })();

Thanks @ComplexPoint. That, combined with a bit of regex work on the results, should get me what I need.

I had thought it was a simple question and never guessed it would need all of this. But thanks to you and this forum I have a way forward. :smile:


For reference, see also:

Which browser are you using? I copied the link to the forum at the bottom of the page in Safari, and then ran this macro:

It worked perfectly:

I tried with any number of other links, and it always returned the URL.

-rob.

Rob, I am using Chrome, and the Copy action in the right-click menu, so that the link includes the formatted text (Forum) as well as the URL. On KM V11.

Thanks ComplexPoint for the extra links. I'll have a look at those as well.

Exactly which option is that in the contextual pop-up? I tried with Copy Link Address, and my simple macro worked fine.

It failed with Copy Link to Highlight, but that includes a bunch of extra text, so I didn't expect it would work. Edit > Copy just copies text, so that doesn't apply, and I don't see any other copy link options in Chrome's contextual menu.

-rob.

That’s strange Rob. It is the “Copy” option (no other words) via a right click context menu on the link.

For me that one copies the link as it shows in the browser, allowing me to paste it in an email as nicely readable text that is still a link the reader can click on to open the link.

Depends, perhaps, on whether the email client is in rich text mode or plain text mode ?

( The choice of pasteboard item – public.utf8-plain-text vs public.html or public.rtf – will turn on that )


To see the set of (textually representable) pasteboard types in your clipboard at any given moment, you can use this: Clipboard Viewer


Or to simply see a list of the types, without a view of their contents:

List of pasteboard types currently in clipboard.kmmacros (2.1 KB)
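For readers without the macro to hand, a rough sketch of the same idea follows. It is not the source of the attached macro. It assumes a JXA context (osascript, or a Keyboard Maestro Execute JavaScript for Automation action) and reads the `types` property of the general pasteboard. `firstAvailable` is a hypothetical helper added to show how a script might pick between `public.html` and the plain-text item.

```javascript
// pasteboardTypes :: () -> [String]
// Under JXA this asks AppKit for the type names currently on the
// general pasteboard. Outside JXA (e.g. in Node) there is no ObjC
// bridge, so it just returns an empty list.
const pasteboardTypes = () => {
    if ("undefined" === typeof ObjC) {
        return [];
    }
    ObjC.import("AppKit");

    return ObjC.deepUnwrap(
        $.NSPasteboard.generalPasteboard.types
    ) || [];
};

// firstAvailable :: [String] -> [String] -> String
// Hypothetical helper: the first preferred type that is present,
// or "" if none of them is.
const firstAvailable = preferred => present =>
    preferred.find(t => present.includes(t)) || "";

// e.g. prefer the HTML source, fall back to the plain-text label.
const chosen = firstAvailable(
    ["public.html", "public.utf8-plain-text"]
)(pasteboardTypes());
```

In a Keyboard Maestro 11 action you would presumably finish with `return chosen;` (or return `pasteboardTypes().join("\n")` to get just the list).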


To remove any doubt, this is what I select in Chrome after right-clicking one of the other topics in this forum.

I can then paste it into my email etc. and it looks like the below. I have done a Cmd-K on the link so you can see what has been pasted as well.


Which here, FWIW, creates a clipboard of the following pattern (html + plain text pasteboard items):

public.html

<meta charset='utf-8'><a href="https://forum.keyboardmaestro.com/t/extract-html-part-from-a-formatted-url/34837" role="heading" aria-level="2" class="title raw-link raw-topic-link" data-topic-id="34837" style="background-color: rgb(255, 255, 255); color: var(--primary); text-decoration: none; cursor: pointer; padding: 15px 0px; word-break: break-word; outline: none; font-family: Arial, sans-serif; font-size: 17.2397px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-align: left; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; white-space: normal;">Extract HTML part from a formatted URL</a>

public.utf8-plain-text

Extract HTML part from a formatted URL

It seems it depends on where it's being pasted, as @ComplexPoint basically explains. If I use Copy and paste into a KM variable, then I just get the copied text. If I paste into a rich field in KM (or TextEdit or Mail, etc.) then I get the formatted link.

If I were doing this, I'd probably just use the Copy Link Address menu item instead of Copy, because then you just get the URL. But I don't know your full needs, so that may not work for you. (Or I'd use Safari, where it just seems to work with Copy :slight_smile: ).

-rob.