Text soap (clean text) macro does not work. Going bananas

Looks like @gglick has given you a very good solution.

Out of interest in showing/learning RegEx, here's another solution. You will find that there are as many RegEx solutions as there are programmers. There is almost always a trade-off between precision and comprehensiveness of the solution.

My solution is more comprehensive, but could make unwanted matches if the pattern is found in the text you want to keep.

Basically, it deletes all characters on a line up to and including `]: ` (there's a space at the end).

So this will match any strings at the start of a line like "highlight [page 1]:" or "Underline[133]:", or even stuff like "just some text and then ]:".

RegEx to Search for:
`(?m)^[^\]\n]+\]:[ \t]+`

This assumes that this pattern will NOT occur anywhere in the text you want to keep. This solution should be more flexible/comprehensive in that the text to be deleted can contain any characters other than "]" and newline ("\n"). It also allows multiple spaces and/or tabs after the "]:".
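If you want to try the pattern outside Keyboard Maestro, here is a minimal Python sketch. It assumes Python's `re` module treats this particular pattern (a negated character class, the `^` anchor and the `(?m)` flag) the same way Keyboard Maestro's regex engine does; `strip_pdf_prefixes` is just an illustrative name. The last line also shows the trade-off mentioned above: ordinary text that happens to contain `]:` loses its start too.

```python
import re

# From the start of any line ((?m)^): one or more characters that are neither "]"
# nor a newline, then the literal "]:", then one or more spaces or tabs.
# Everything matched is deleted.
PDF_PREFIX = re.compile(r"(?m)^[^\]\n]+\]:[ \t]+")


def strip_pdf_prefixes(text: str) -> str:
    """Remove a 'highlight [page N]: ' style prefix from every line."""
    return PDF_PREFIX.sub("", text)


sample = (
    "underline [page 11]: True Launch Bar\n"
    "highlight [page 11]: create a virtual folder, click on the \u201cBrowse\n"
    "highlight [page 12]: True Launch Bar supports all icon sizes.\n"
)
print(strip_pdf_prefixes(sample))

# The trade-off: text you want to keep is also matched if it contains "]:".
print(strip_pdf_prefixes("See note [3]: this sentence loses its start"))
```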

For RegEx Details and Explanation, see:

Here's my test macro:

##Example Results

##Macro Library: RegEx to Remove PDF Info at Beginning of Line


####DOWNLOAD:
<a class="attachment" href="/uploads/default/original/2X/b/ba4e00b337dadd7f0e426d7639ecfc6d2910d9f5.kmmacros">RegEx to Remove PDF Info at Beginning of LIne.kmmacros</a> (3.6 KB)
**Note: This Macro was uploaded in a DISABLED state. You must enable it before it can be triggered.**

---


<img src="/uploads/default/original/2X/4/450c1a89db6152efdcbc329788dd61aa4da07a98.png" width="459" height="1031">

The expression works perfectly. Thank you very much for the message and the macro.

Most of the text I will scrub will be in Scrivener or Nisus Writer Pro, or perhaps Pages.

One issue is that I use the underline annotations mostly for headers/titles, and the highlights to extract important snippets of text.

After processing with the Text Soap macro, the start of each sentence (for lack of a better word) is deleted and the blank lines are removed (I added a step that replaces `(?m)^\n` with nothing).
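The blank-line step can be checked the same way. A minimal Python sketch, assuming Python's `re` handles `(?m)^\n` like Keyboard Maestro does: with `(?m)`, `^` matches at the start of every line, so `^\n` only matches a newline that begins a line (an empty line), and replacing it with nothing deletes that line. Lines containing only spaces or tabs are not matched. `remove_blank_lines` is an illustrative name.

```python
import re


def remove_blank_lines(text: str) -> str:
    # (?m) lets ^ match right after every newline, so "^\n" is an empty line;
    # deleting that newline removes the line.
    return re.sub(r"(?m)^\n", "", text)


cleaned = remove_blank_lines(
    "True Launch Bar\n\ncreate a virtual folder\n\n\nTrue Launch Bar supports all icon sizes.\n"
)
print(cleaned)
```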

At the end, I have to manually make the text look nicer, so that I end up with a readable summary.

In an ideal world, most of it could be automatic, such as:

  • all lines containing the word underline are titles or headers, and I would make them bold
  • all lines containing the word highlight are short sentences with important information, but they do not need to each occupy one line. They could follow each other with a ‘.’ in between.

Example:
raw text before processing
〜〜〜〜〜〜〜〜〜〜〜〜〜〜〜〜〜〜〜〜〜〜〜〜〜
underline [page 11]: True Launch Bar
highlight [page 11]: create a virtual folder, click on the “Browse
highlight [page 12]: True Launch Bar supports all icon sizes.
〜〜〜〜〜〜〜〜〜〜〜〜〜〜〜〜〜〜〜〜〜〜〜〜〜

Result with my current macro:
〜〜〜〜〜〜〜〜〜〜〜〜〜〜〜〜〜〜〜〜〜〜〜〜〜
True Launch Bar
create a virtual folder, click on the “Browse
True Launch Bar supports all icon sizes.
〜〜〜〜〜〜〜〜〜〜〜〜〜〜〜〜〜〜〜〜〜〜〜〜〜

Ideal end result after processing by KBM (start of sentences deleted, the underline sentence in bold, and after “Browse” (the end of the first highlight line) a ‘.’ and a space added and the newline deleted, so that the second sentence moves up and follows the first):
〜〜〜〜〜〜〜〜〜〜〜〜〜〜〜〜〜〜〜〜〜〜〜〜〜
True Launch Bar
create a virtual folder, click on the “Browse. True Launch Bar supports all icon sizes.
〜〜〜〜〜〜〜〜〜〜〜〜〜〜〜〜〜〜〜〜〜〜〜〜〜

I imagine that, if this is possible, I would have to change the search from plain text to Markdown, which is no problem (although plain text works in Pages, Nisus and Scrivener).

I don’t want to create a lot of work. I was wondering how complicated such an endeavour would be.

Thank you in advance for your time and help.

The answer is "not terribly," but you do have to know how to go about it. Try this new sample macro and see if it works:

Text Soap 2.0.kmmacros (8.3 KB)
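For anyone who wants to see the logic outside Keyboard Maestro, here is a rough Python sketch of the rules described in the question. It is not a translation of the Text Soap 2.0 macro (its actions aren't reproduced here); `summarize` and `ANNOTATION` are hypothetical names, and it assumes every annotation line starts with the word `underline` or `highlight` followed by `...]: `. Titles are wrapped in Markdown `**` for bold, and consecutive highlights are joined into one paragraph, adding a ‘.’ to any snippet that doesn't already end in `.`, `!` or `?`.

```python
import re

# Hypothetical pattern for one annotation line: the annotation type (group 1),
# anything up to "]:", spaces/tabs, then the annotated text to keep (group 2).
ANNOTATION = re.compile(r"^(underline|highlight)[^\]\n]*\]:[ \t]+(.*)$", re.IGNORECASE)


def summarize(raw: str) -> str:
    out: list[str] = []
    pending: list[str] = []  # consecutive highlight snippets, joined into one paragraph

    def flush() -> None:
        if pending:
            # Add "." to snippets without end punctuation, then join with single spaces.
            out.append(" ".join(s if s.endswith((".", "!", "?")) else s + "." for s in pending))
            pending.clear()

    for line in raw.splitlines():
        if not line.strip():
            continue  # drop blank lines
        m = ANNOTATION.match(line)
        if m is None:
            flush()
            out.append(line)  # unrecognised lines pass through unchanged
        elif m.group(1).lower() == "underline":
            flush()
            out.append(f"**{m.group(2).strip()}**")  # title/header in Markdown bold
        else:
            pending.append(m.group(2).strip())
    flush()
    return "\n".join(out)


sample = (
    "underline [page 11]: True Launch Bar\n"
    "highlight [page 11]: create a virtual folder, click on the \u201cBrowse\n"
    "highlight [page 12]: True Launch Bar supports all icon sizes.\n"
)
print(summarize(sample))
```

A new underline line ends the current run of highlights, so each title is followed by its own paragraph of joined snippets.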


Thank you very much, and so sorry for taking your time.
All is fine. Bold is replaced by asterisks on each side of the word, which is good enough.
Thanks again very much.
