Removing Embedded Links from RTF Document

I have been trying to remove embedded links from RTF documents using regex without making any headway. I've tried using (<a href[\s\S]?>[\s\S]?)|(\b(http|https)://.*[^ alt]\b), but that requires access to the RTF source. The text editor I must use does not expose the RTF source code.

Has anyone solved the problem of removing embedded links and the corresponding text from RTF files? As an example, I want to remove the embedded links and their text "Wayback Machine", "Page 3:" and "Page 3" from the pasted text below.


Perhaps easier to:

  1. Temporarily convert to HTML (e.g. with the textutil command in the shell)
  2. Strip <a> links out of the HTML
  3. Convert from the stripped HTML back to RTF (perhaps again with textutil)
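
A rough sketch of those three steps in Python, in case it helps. It assumes macOS's textutil is on the PATH and that its HTML output is UTF-8. The names strip_links and remove_links_from_rtf, the regex, and the file paths are my own for illustration, not part of any library. The regex is a simple approach that does not handle nested or malformed <a> tags.

```python
import re
import subprocess
import tempfile
from pathlib import Path

# Matches an <a ...>...</a> element, case-insensitively and across newlines.
ANCHOR_RE = re.compile(r"<a\b[^>]*>(.*?)</a\s*>", re.IGNORECASE | re.DOTALL)


def strip_links(html, keep_text=False):
    """Remove <a> elements; drop their text too unless keep_text is True."""
    return ANCHOR_RE.sub(r"\1" if keep_text else "", html)


def remove_links_from_rtf(src, dst):
    """RTF -> HTML -> strip links -> RTF, using textutil (macOS only)."""
    with tempfile.TemporaryDirectory() as tmp:
        html_path = Path(tmp) / "converted.html"
        # Step 1: convert the RTF file to HTML.
        subprocess.run(
            ["textutil", "-convert", "html", "-output", str(html_path), str(src)],
            check=True,
        )
        # Step 2: strip the links (and their text) out of the HTML.
        cleaned = strip_links(html_path.read_text(encoding="utf-8"))
        html_path.write_text(cleaned, encoding="utf-8")
        # Step 3: convert the stripped HTML back to RTF.
        subprocess.run(
            ["textutil", "-convert", "rtf", "-output", str(dst), str(html_path)],
            check=True,
        )
```

Calling remove_links_from_rtf("in.rtf", "out.rtf") would then write a link-free copy. Pass keep_text=True to strip_links if you only want to drop the link and keep the visible text.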

They've banned you from using TextEdit?!?

If they haven't, you can "easily" get at the raw RTF by clicking "Options" in the "Open" dialog and ticking "Ignore Rich Text commands". You might be able to do something with that, e.g. Select All -> Copy -> regex process the clipboard in KM -> Paste.
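
For what it's worth, here is a sketch of what processing the raw RTF could look like. It assumes the links are stored as {\field ...} groups (a \fldinst with HYPERLINK plus a \fldrslt holding the visible text), which is what I'd expect from TextEdit but haven't checked against your file. Instead of a regex it counts braces, because the groups nest. The function name remove_field_groups is made up for this example, and note that it removes every field group, not just hyperlinks.

```python
def remove_field_groups(rtf):
    """Drop every {\\field ...} group, including the link's visible text."""
    out = []
    i, n = 0, len(rtf)
    while i < n:
        # Copy an escaped character (e.g. \{ or \\) as-is so it is never
        # mistaken for the start of a group.
        if rtf[i] == "\\":
            out.append(rtf[i:i + 2])
            i += 2
            continue
        # A field group starts with "{\field" followed by a non-letter.
        if rtf.startswith("{\\field", i) and not rtf[i + 7:i + 8].isalpha():
            depth, j = 0, i
            while j < n:
                ch = rtf[j]
                if ch == "\\":
                    j += 2  # skip the backslash and the character it escapes
                    continue
                if ch == "{":
                    depth += 1
                elif ch == "}":
                    depth -= 1
                    if depth == 0:
                        break
                j += 1
            if j < n:
                i = j + 1  # jump past the matching closing brace
                continue
            # Unbalanced braces: fall through and copy the text unchanged.
        out.append(rtf[i])
        i += 1
    return "".join(out)
```

If you want to keep the link text and only drop the URL, you would need to pull out the contents of the \fldrslt group rather than deleting the whole field.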

That's a bit long-winded, though. I've just had a try in TextEdit and it appears that "Select All" then "Edit"->"Edit Link..." and then click the "Remove Link" button will strip all links from the selected text.

Call me naive, but I never knew you could use TextEdit to view RTF code. Thanks for the new knowledge and the tip about "Remove Link".

I only found out the other week ;) -- and yes, while trying to do a macro for someone here...