RTF Clipboard Text and Mangled Accented Characters

[Edited for clarity]

Hi, I have a quite specific character encoding problem that I'm hoping I can solve with KM v9.2.

I am running a Windows XP virtual machine under macOS 11.2.3 using Parallels. I need to copy various bits of text from a program running in XP, including formatting such as italics and bold. I then need to be able to paste that text into programs in macOS. The program I'm copying from uses RTF clipboard formatting, so that's what I use.

I presume the Windows machine is using Windows-1252/ISO-8859-1 character encoding.

In XP, when I copy text to the clipboard, which is shared by the host and guest operating systems, I get character encoding errors on the macOS clipboard. For example, the clipboard (viewed using the KM clipboard switcher) looks like this:

Eine unterdrþckte Vorrede, unverüffentlicht.

The italics are correct, but the characters should look like this:

Eine unterdrückte Vorrede, unveröffentlicht.

Now, I can convert the wrong characters to correct characters with "Set Clipboard to Plain", for example, but then I lose the formatting (the italics).

I have tried using the "Execute shell script" option, with commands such as:

pbpaste | iconv -f ISO-8859-1 -t UTF8 | pbcopy

or the following to convert to html, in an attempt to retain the formatting:

pbpaste | textutil -convert html -stdin -stdout | pbcopy

But the first option drops the formatting and further mangles the characters, and the second option drops the formatting (no tags for the italics that are on the clipboard), though it does produce the correct characters.
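
One possible explanation for the extra mangling from the first command (an assumption, not something confirmed in this thread): if pbpaste already emits UTF-8, telling iconv that the input is ISO-8859-1 encodes each UTF-8 byte a second time. A small sketch of that effect, using a made-up function name and hex output so the bytes are visible:

```shell
#!/bin/sh
# Hypothetical demo: feed the UTF-8 bytes for "ü" (c3 bc) to iconv while
# claiming they are ISO-8859-1, then print the resulting bytes as hex.
double_encoding_demo() {
  printf '\303\274' | iconv -f ISO-8859-1 -t UTF-8 | od -An -tx1 | tr -d ' \n'
}

double_encoding_demo   # prints c383c2bc
```

The two input bytes come out as four (c3 83 c2 bc), which displays as "Ã¼" instead of "ü".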

I also tried unrtf to produce html output, in an attempt to retain the formatting:

pbpaste | unrtf --html | pbcopy

but that too drops the italics for some reason, and also drops the accented characters.

So, in short, is there a way to take the initial clipboard data from XP and convert the characters to correct characters while retaining the formatting?

After quite a lot of research into the structure of the clipboard and the RTF standard, I figured out that a number of problems were working against me, none of which could be addressed directly in KM (as far as I can make out). I am leaving this here not because the solution is KM-related, but in case the information helps someone else (though I admit the problem was pretty specific to my particular scenario).

One dumb problem was that I wasn't getting the data off the clipboard correctly. The dumbest and biggest problem, though, was that the copied RTF data didn't have a character set defined because, for whatever reason, the RTF standard doesn't require it. So I set it to ANSI code page 65001, i.e. UTF-8 (\ansicpg65001). That allowed most of the special characters in the copied data to be recognized under macOS, but there were still a few weird problems.

So I wrote a shell script to replace all characters above ASCII 127 with their RTF escape codes (maybe there's an easier way to do this than with sed, but if so I couldn't find it... and textutil mangled things), and set the RTF character set to plain ANSI. I also used AutoHotkey and the WinClip includes in Windows XP to get the RTF data onto the clipboard as plain text.

In short, while none of that can be directly addressed in KM, KM did let me automate the whole process. Thanks KM.
