Getting HTML and RTF strings from the clipboard

This has come up a few times recently, so here's a general summary note:

Once you have an HTML or RTF string, textutil does pretty good and easy HTML ⇄ RTF conversion.

Even if the clipboard contains HTML or RTF content, however, inspection through AppleScript

the clipboard as record

will reveal it to be in a hex-encoded format, rather than as an expanded and legible string of the kind required by textutil.

Unpacking it from AppleScript has traditionally required something like:

-- classType: e.g. «class RTF »
on pboardUnpacked(classType)
    return (do shell script "osascript -e 'the clipboard as " & classType & ¬
        "' | perl -ne 'print chr foreach unpack(\"C*\"," & ¬
             "pack(\"H*\",substr($_,11,-3)))' | pbcopy -Prefer txt; pbpaste")
end pboardUnpacked
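
For reference, the perl step above does little more than strip the «data RTF » wrapper and turn the hex digits back into characters. A minimal sketch of the same decoding in plain JavaScript, assuming the input is the text that osascript prints (for example «data RTF 7B5C727466317D») and that the bytes are UTF-8. The function name hexDataToString is hypothetical, not part of any library:

```javascript
// Decode the text that `osascript -e 'the clipboard as «class RTF »'` prints,
// e.g. «data RTF 7B5C727466317D», back into a legible string.
// Assumption: the payload bytes are UTF-8 (RTF itself is normally 7-bit ASCII).
function hexDataToString(strData) {
    // Four-character class code (e.g. 'RTF ' or 'HTML'), then the hex digits.
    var m = /«data (.{4})([0-9A-Fa-f]*)»/.exec(strData);

    // Not a «data …» wrapper, or an odd number of hex digits: give up.
    if (!m || m[2].length % 2 !== 0) return undefined;

    // Prefix each byte with '%' and let decodeURIComponent read it as UTF-8.
    // (It throws a URIError if the bytes are not valid UTF-8.)
    return decodeURIComponent(m[2].replace(/../g, '%$&'));
}
```

Note that the perl version maps each byte to one character rather than decoding UTF-8, so the two can differ for non-ASCII content.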

Alternatively, using the built-in .stringForType() method of NSPasteboard, we could write (in JavaScript for Automation):

ObjC.import('AppKit');

// Types: 'public.rtf', 'public.html' etc
function pboardUnpacked(strType) {
	return ObjC.unwrap(
		$.NSPasteboard.generalPasteboard.stringForType(
			strType
		)
	)
}

Remember that apps vary in what they put on the clipboard. Safari offers public.rtf but not public.html (it does put a hex-encoded WebArchive version in the clipboard), whereas after copying from Chrome, you will find public.html contents in the clipboard, but no public.rtf.
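
Because the available types differ from app to app, a script may want to check what is on offer before asking for a particular one. A minimal sketch, assuming a list of type names is already in hand. The function name preferredType and the preference order are illustrative assumptions, not a system rule:

```javascript
// Pick the richest text-like type from those an app has put on the clipboard.
// Assumption: HTML is preferred to RTF, and RTF to plain text.
function preferredType(availableTypes) {
    var preferred = ['public.html', 'public.rtf', 'public.utf8-plain-text'];

    for (var i = 0; i < preferred.length; i++) {
        if (availableTypes.indexOf(preferred[i]) !== -1) {
            return preferred[i];
        }
    }
    // None of the preferred types is present.
    return undefined;
}
```

In JavaScript for Automation the list could probably be obtained with ObjC.deepUnwrap($.NSPasteboard.generalPasteboard.types), and the result passed on to pboardUnpacked().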

Textutil, however, will convert string versions of HTML to RTF (and vice versa).

For example:

pbpaste | textutil -format html -convert rtf -stdin -stdout | pbcopy -Prefer rtf

UPDATE

For Safari WebArchive clipboard content (plist rather than text), again in JavaScript for Automation:

ObjC.import('AppKit');

// Types: 'com.apple.webarchive' etc
function pboardPlist(strType) {
  return ObjC.deepUnwrap(
    $.NSPasteboard.generalPasteboard.propertyListForType(
      strType
    )
  )
}

pboardPlist('com.apple.webarchive');

Thanks, Rob.
It’s very good to have this summarized in one place.
Very useful.

Help please? I can’t quite unpack what is being shown here. What I am trying to do is turn styled text on the clipboard into HTML text (not a data structure or encoded clipboard) such as could be put into an HTML page or used directly with textutil. I can’t tell if what’s on this page would help me do that or, if it could, how.

If you have copied some styled text into the clipboard (rtf), then you should be able to get the RTF text version, which could then be passed through textutil, with something like this:

(function () {
    'use strict';

    ObjC.import('AppKit');

    function pboardUnpacked(strType) {
        return ObjC.unwrap(
            $.NSPasteboard.generalPasteboard.stringForType(
                strType
            )
        );
    }


    var strRTF = pboardUnpacked('public.rtf');

    if (strRTF) {
        var a = Application.currentApplication(),
            sa = (a.includeStandardAdditions = true, a);

        sa.setTheClipboardTo(strRTF);
        return strRTF;
    }
})();

So to replace the rtf clipboard with HTML text, ready for pasting, perhaps something like this:

(function () {
    'use strict';

    ObjC.import('AppKit');

    function pboardUnpacked(strType) {
        return ObjC.unwrap(
            $.NSPasteboard.generalPasteboard.stringForType(
                strType
            )
        );
    }


    var strRTF = pboardUnpacked('public.rtf');

    if (strRTF) {
        var a = Application.currentApplication(),
            sa = (a.includeStandardAdditions = true, a);

        sa.setTheClipboardTo(strRTF);

        sa.doShellScript('pbpaste | textutil -format rtf -convert html -stdin -stdout | pbcopy -Prefer html');

        return sa.theClipboard();
    }
})();

Which on my system takes a copied TextEdit selection like:

to:



  
  
  
  
  
  
    p.p1 {margin: 0.0px 0.0px 0.0px 0.0px; font: 48.0px Helvetica}
    span.s1 {color: #e32400}
    span.s2 {font: 48.0px 'Ezra SIL'}
  


Big red text


Thanks for sharing the complete solution for this. Looks like a very useful tool.

Is there a way to modify this so that it saves the result to a Keyboard Maestro Variable instead and leaves the clipboard intact? I have a macro which currently writes the clipboard to a file and converts that with textutil to HTML because I need the clipboard to remain as it is for extra actions later. It would be cool to be able to do this with a script without having to create a file!

EDIT: Managed to get it to work by setting clipboard to Past clipboard (2) after running this script, and continuing! Still interested to know which parts to edit though.

Hey @quickreactor,

Just remove this part of @ComplexPoint’s script:

| pbcopy -Prefer html

Then save the output of the action to a variable.

-Chris
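
A minimal sketch of one way to avoid touching the clipboard at all, assuming the RTF string has already been read with pboardUnpacked('public.rtf'): quote it for the shell and pipe it straight into textutil, rather than going through pbpaste. The helper names shellQuoted and textutilCommand are hypothetical:

```javascript
// Wrap a string in single quotes for /bin/sh,
// escaping any single quotes it contains.
function shellQuoted(s) {
    return "'" + s.replace(/'/g, "'\\''") + "'";
}

// Build a command that feeds strSource to textutil on stdin and writes
// the converted text to stdout, without reading or writing the clipboard.
function textutilCommand(strSource, fromFormat, toFormat) {
    return "printf '%s' " + shellQuoted(strSource) +
        ' | textutil -format ' + fromFormat +
        ' -convert ' + toFormat + ' -stdin -stdout';
}
```

In the script above, the setTheClipboardTo and doShellScript lines could then be replaced by something like return sa.doShellScript(textutilCommand(strRTF, 'rtf', 'html')); so that the HTML becomes the result of the action. A very large selection might exceed the shell's argument-length limit, in which case a temporary file would be the safer route.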

See also this thread on the Script Debugger forum, for rtfFromHTML and htmlFromRTF functions (in JavaScript and AppleScript) which bypass the clipboard:

Thank you both, this will allow me to vastly simplify and streamline my macro!

Is this still working? Do I need something special to run the script?

Tell us more about the context?

  • Where is the input coming from?
  • What output do you want from it?
  • And where is that going?

I have a piece of software with German dictionaries called "DUDEN". The text inside the software seems to be RTF.

The idea is to turn the RTF text in the clipboard to HTML.

Basically just RTF to HTML inside the clipboard.

  1. The first thing is to check what pasteboard items Duden puts into the clipboard when you copy. (It may be that it includes a public.html pasteboard item, and we just need to make that available for pasting as plain text.) To inspect clipboard contents, you can use the Clipboard Viewer macro from the Macro Library on the Keyboard Maestro forum.

  2. What are you going to paste into? MS Word? Something else? (The details of how clipboard pasteboard items are selected, translated and used are up to the receiving application.)

I see you've used -Prefer html as a flag for pbcopy. Is that an undocumented option, or is it simply failing on html and falling back to txt?