Create a Macro to Strip HTML Tags from Text

I regularly create text with embedded HTML, and I want to strip the HTML from a chunk of text in the clipboard while leaving the stripped text in the clipboard for repasting. So, I might copy this text:

If you've got a Sonos smart speaker system in your house

I'd like to then execute the macro and end up with:

If you've got a Sonos smart speaker system in your house

in the clipboard. Note that this strips both the opening anchor tag and its closing tag. I do this frequently, so a macro would be a real timesaver.

Thanks!
-- Dave

This action will convert the clipboard to plain text:

Appreciate the response. The problem is not removing styles; it is stripping tags. I use the convert to plain text action to remove styles such as bold or italics, but it does not strip the HTML tags from the text.

The problem with my original post is that the HTML got processed in the post. What I really want is to convert this text:

If you’ve got a [a href="http://www.sonos.com/system" target="_blank"]Sonos smart speaker system[/a] in your house

to this text:

If you’ve got a Sonos smart speaker system in your house

Note that I replaced the angle-brackets with square brackets so the forum software would ignore the HTML.

Hopefully this makes the question a bit clearer. Thanks for your help.

– Dave

I think a little JavaScript should do the trick.
Do a Google search for "JavaScript strip tags" and you will get several hits.

Here is one:
How can I strip the HTML from a string in JavaScript?

You can remove everything within angle brackets using a Regular Expression search & replace, something like:
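As a minimal sketch of that regular-expression idea, written in plain JavaScript: the helper name stripTags and the sample string are illustrative assumptions, not something posted in this thread. The pattern matches an opening angle bracket, any run of characters that are not a closing bracket, and then the closing bracket.

```javascript
// Hypothetical helper (not from the thread): delete anything that looks like a tag.
const stripTags = html => html.replace(/<[^>]*>/g, "");

// Sample text modelled on the example in the question.
const sample =
    'If you\'ve got a <a href="http://www.sonos.com/system" target="_blank">' +
    "Sonos smart speaker system</a> in your house";

console.log(stripTags(sample));
```

This is deliberately naive. It does not decode entities such as &amp;amp;, and a literal > inside an attribute value would end the match early, so it is only reasonable for simple, well-formed snippets.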

Perhaps better is to use textutil to convert the HTML to text:

pbpaste | textutil -format html -convert txt -stdin -stdout

Very Nice. Added to my KM examples list.

Very interesting. Any way to do that within a Keyboard Maestro macro?

Genius! Just the right solution for me. Thanks Peter!

Peter's solution using textutil is by far the best and simplest. That's what I would use.

Note that this does not work for non-English text.
Before: prépayée
After: prŽpayŽe

I mean the pbpaste | textutil command.

It probably does work if you have configured the LC_ALL environment variable, as described in the Execute a Shell Script action, by setting the ENV_LC_ALL variable to “en_US.UTF-8”.

When running it on prépayée, instead of prŽpayŽe, now I get prépayée.

textutil needs an input encoding set:

pbpaste | textutil -format html -convert txt -inputencoding UTF-8 -stdin -stdout

The output encoding defaults to UTF-8, but the input encoding “will be determined from its BOM”, and here there is no BOM.

Thank you for your time and for offering these suggestions. I updated the code and now I get pr�pay�e from prépayée.

That is your actual macro? I just use this:

Keyboard Maestro Actions.kmactions (925 B)

and the window displays the correct results (i.e., unchanged). LC_ALL and LANG are both set to en_US.UTF-8.

This is also on Mojave. It is possible that textutil has changed, though I doubt it.

I duplicated this exact same macro and got pr�pay�e again. I'm also on Mojave.

My output of locale
LANG="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_CTYPE="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_ALL=

With what macro?

This works fine for me:

Yes, still getting pr�pay�e. I'm happy to let this issue go and just use something else for text with special chars.
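For anyone puzzling over the two garbled results reported above, here is a small JavaScript sketch of one possible explanation. It assumes (the thread does not confirm this) that the clipboard text reached textutil as Mac OS Roman bytes, where é is the single byte 0x8E. That same byte is Ž in Windows-1252 and is not valid on its own in UTF-8. The names macRomanBytes, decodeAsWindows1252 and decodeAsUtf8 are made up for illustration.

```javascript
// Bytes for "prépayée" if é is stored as the single Mac OS Roman byte 0x8E.
// Assumption: this is what textutil received; the thread does not confirm it.
const macRomanBytes = Buffer.from([
    0x70, 0x72, 0x8e, 0x70, 0x61, 0x79, 0x8e, 0x65
]);

// Minimal stand-in for a Windows-1252 decoder. Only 0x8E needs special
// handling for this sample; the remaining bytes are plain ASCII.
const decodeAsWindows1252 = bytes =>
    Array.from(bytes, b =>
        b === 0x8e ? "\u017D" : String.fromCharCode(b)
    ).join("");

// Node's UTF-8 decoding replaces each invalid byte with U+FFFD.
const decodeAsUtf8 = bytes => bytes.toString("utf8");

console.log(decodeAsWindows1252(macRomanBytes)); // prŽpayŽe
console.log(decodeAsUtf8(macRomanBytes));        // pr�pay�e
```

If that guess is right, forcing -inputencoding UTF-8 cannot help, because the bytes are not UTF-8 to begin with. The fix would have to be on the pbpaste side of the pipe, which is consistent with the locale suggestion earlier in the thread.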

A JavaScript for Automation (JXA) footnote:

// plainTextFromHTML :: String -> String
const plainTextFromHTML = strHTML =>
    ObjC.unwrap(
        $.NSAttributedString.alloc
        .initWithHTMLDocumentAttributes(
            $(strHTML)
            .dataUsingEncoding($.NSUTF8StringEncoding),
            0
        ).string
    );

e.g.


Strip HTML tags and decode to plain text.kmmacros.zip (1.9 KB)

JS Source:
(() => {
    "use strict";

    const main = () =>
        plainTextFromHTML(
            Application("Keyboard Maestro Engine")
            .getvariable(
                "htmlTaggedAndEncodedSample"
            )
        );

    // plainTextFromHTML :: String -> String
    const plainTextFromHTML = strHTML =>
        ObjC.unwrap(
            $.NSAttributedString.alloc
            .initWithHTMLDocumentAttributes(
                $(strHTML)
                .dataUsingEncoding($.NSUTF8StringEncoding),
                0
            ).string
        );

    return main();
})();