Is it possible to extract annotations (black text on a yellow background) from selected text in an RTF file?

As far as I understand, RTF is not a proprietary format, which is why apps like DevonThink do not offer the option to extract annotations (unlike with PDFs).
That being said, annotations do have a black font over a yellow background, which is perhaps a clue to a way of extracting them.
Thanks in advance for your time and help.

Here's an untested idea:

  1. Open the RTF file as a plain text file so you can see the RTF codes
  2. Use RegEx to extract the text marked with annotation background color.
    • For example, if the RTF text is:
      <BG:yellow>The text that is the annotation</BG>
    • Then this would extract the text in the Capture Group:
      <BG:yellow>(.+?)<\/BG>

NOTE: I have no idea what the actual RTF codes would be. I just made those up.
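
To illustrate how the capture group would work, here is a small Python sketch that uses the same made-up tags. They are placeholders only, not real RTF codes, and the sample text is invented for the demo.

```python
import re

# Made-up markup from the post above; real RTF does not use these tags.
text = "Plain text. <BG:yellow>The text that is the annotation</BG> More plain text."

# (.+?) is a lazy capture group: it stops at the first closing </BG>,
# so several annotations on one line are returned separately.
annotations = re.findall(r"<BG:yellow>(.+?)</BG>", text)
print(annotations)
```

Once the real RTF codes are known, only the pattern should need to change.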

Good luck, and let us know how it goes.

Great idea! I will try it.

Just for the record, RTF is a proprietary format belonging to Microsoft, and PDF was released by Adobe as an open standard (ISO 32000-1) in 2008.

Thank you.

In DevonThink, if I convert RTF to txt, all I get is the same text with no highlights. Same with BBEdit.
If I convert to PDF, the yellow annotations are visible, but DevonThink does not recognize them or include them in its annotations listing.
What I can do in DevonThink is convert to HTML. Quickly, without taking any of your time or doing any research, does a solution come to mind? If not, it's all good.
Thank you and happy new year!

What does the RTF look like? Do you have a few lines of an actual file you are testing?
(I'm available for the next 30 minutes, otherwise next week.)

Do not convert the RTF to text -- that produces pure plain text.
Open the actual RTF file in a plain text editor like BBEdit, or read it with the KM Read File action into a variable.
That will expose the RTF codes that we are looking for.

EDIT: Example of RTF file opened in BBEdit:

{\rtf1\ansi\ansicpg1252\cocoartf1671\cocoasubrtf600
{\fonttbl\f0\fnil\fcharset0 HelveticaNeue;}
{\colortbl;\red255\green255\blue255;\red255\green255\blue51;}
{\*\expandedcolortbl;;\cssrgb\c99946\c98636\c25320;}
\margl1440\margr1440\vieww14920\viewh10800\viewkind0
\pard\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\pardirnatural\partightenfactor0

\f0\fs32 \cf0 This is an example of a RTF file opened as text -- NOT converted.  This line has no formatting.\
Here is \cb2 text formatted with the background in YELLOW\cb1  and now more unformatted text.}

NOTE: I could not open the file with a .rtf extension in BBEdit. I had to rename the file to add a .txt extension.
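
A minimal Python sketch of this approach, run against a shortened copy of the sample above. It looks up the yellow entry in the \colortbl group to find its index, then captures the text between that \cbN and the next \cb control word. The function names find_color_index and extract_highlights are made up for illustration, and the sketch assumes the highlighted run contains no other control words or nested groups, which real files may not honor.

```python
import re

# Shortened copy of the sample RTF shown above.
SAMPLE = r"""{\rtf1\ansi
{\colortbl;\red255\green255\blue255;\red255\green255\blue51;}
\f0\fs32 \cf0 This line has no formatting.\
Here is \cb2 text formatted with the background in YELLOW\cb1  and now more unformatted text.}"""


def find_color_index(rtf, rgb):
    # Return the 1-based position of the (r, g, b) color in the color table,
    # which is the number used after the cb control word, or None.
    table = re.search(r"\{\\colortbl;(.*?)\}", rtf, re.DOTALL)
    if not table:
        return None
    for i, entry in enumerate(table.group(1).split(";"), start=1):
        m = re.search(r"\\red(\d+)\\green(\d+)\\blue(\d+)", entry)
        if m and tuple(int(v) for v in m.groups()) == rgb:
            return i
    return None


def extract_highlights(rtf, index):
    # Capture text from the cb<index> control word up to the next cb control
    # word, a closing brace, or the end of the text (assumption: no other
    # control words occur inside the highlighted run).
    pattern = re.compile(
        r"\\cb" + str(index) + r"(?!\d) ?(.*?)(?=\\cb\d|\}|$)", re.DOTALL
    )
    return [m.group(1).strip() for m in pattern.finditer(rtf)]


yellow = find_color_index(SAMPLE, (255, 255, 51))
print(yellow)                               # 2
print(extract_highlights(SAMPLE, yellow))   # ['text formatted with the background in YELLOW']
```

For a real file, the text would be read from disk first. The RGB values for "yellow" may differ between apps, so check the \colortbl line of your own file.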

If for some reason opening the RTF file does not work, then you might use the HTML conversion you mentioned, assuming that the text highlights in the RTF are preserved in the HTML file.
Here again, you will want to open the HTML file with BBEdit, or with the KM Read File action.
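
For the HTML route, a rough Python sketch might look like the following. It assumes the converter writes highlights as span elements with an inline background-color style. I have not checked what DevonThink actually produces, and it may use CSS classes instead, so the sample HTML here is invented. The class name HighlightCollector is also made up.

```python
from html.parser import HTMLParser


class HighlightCollector(HTMLParser):
    # Collects the text inside <span> elements that carry an inline
    # background-color style (any color, not only yellow).
    def __init__(self):
        super().__init__()
        self.depth = 0        # > 0 while inside a highlighted span
        self.current = []     # text pieces of the span being read
        self.highlights = []  # finished highlight strings

    def handle_starttag(self, tag, attrs):
        if tag != "span":
            return
        style = dict(attrs).get("style") or ""
        # Nested spans inside a highlight are counted so the matching
        # closing tag can be found.
        if self.depth or "background-color" in style.replace(" ", ""):
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "span" and self.depth:
            self.depth -= 1
            if self.depth == 0:
                self.highlights.append("".join(self.current).strip())
                self.current = []

    def handle_data(self, data):
        if self.depth:
            self.current.append(data)


# Invented sample; inspect the real export to see how highlights are encoded.
html = '<p>Here is <span style="background-color: #ffff33">text in YELLOW</span> and more.</p>'
collector = HighlightCollector()
collector.feed(html)
print(collector.highlights)
```

If the export turns out to use class names rather than inline styles, the check in handle_starttag would need to look at the class attribute instead.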

Happy New Year!

Thanks very much, and happy new year to you.

@ronald -- If you're still having trouble, let us know how far you got. I'm pretty good at deciphering RTF.

Once, many years ago (1999, 2000), before there were tools for such things, I created a template in Word, where you could fill in the appropriate info (in this case, extended text references to API functions) and then run the RTF file through a BASH script using mostly SED, and get a new RTF file that collapsed the text fields into a summary table. It totally depended on everything in the original file being tagged with styles properly and strictly. Manually applied styles could break it. The script then found the appropriate bits of text as tagged by the style markers and restructured the data to be surrounded by a new set of RTF style markers, and the resulting table could be opened in Word for printing.

Haven't needed anything like that for a long time. But I can say from experience, if you can keep the way the text is tagged consistent, then you ought to be able to find it and extract the data you need.

Thanks very much. I no longer need it. Very kind of you.