With reference to the "Kindle: Your Notes and Highlights" page, I'd like to paste all the highlights for a given book into a text file, but with any blank lines stripped out, and all the location markers (e.g., "Yellow highlight | Location: 1,531") deleted. Could someone suggest a macro for this? Thanks!
That page isn't on the open Web but I presume that you will be copying text from the Kindle app on your Mac and that you are fine with that part.
Are we dealing with plain text only? I will assume that we are.
To remove empty lines, you could use a Search and Replace action set to use regular expressions. Search for the following.
(?m)^(?:[\t ]*(?:\r?\n|\r))+
[Edited 2024-06-23 to add the multiline flag (?m), which is needed for KM.]
Leave the "replace" field empty.
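For anyone who wants to try that pattern outside Keyboard Maestro, here is a minimal Python sketch. It assumes Python's `re` module behaves closely enough to KM's regex engine for this pattern; the function name `strip_blank_lines` is made up for the illustration.

```python
import re

# Pattern quoted above: one or more consecutive lines that are empty or hold
# only tabs/spaces, each ending in LF or CRLF (the pattern also allows a lone CR).
BLANK_LINES = re.compile(r"(?m)^(?:[\t ]*(?:\r?\n|\r))+")


def strip_blank_lines(text: str) -> str:
    """Delete blank lines, like a search-and-replace with an empty replacement."""
    return BLANK_LINES.sub("", text)


sample = "first\n\n   \nsecond\n\t\nthird\n"
print(repr(strip_blank_lines(sample)))  # 'first\nsecond\nthird\n'
```

Lines that contain text are left alone, including any leading spaces they have, because the pattern only matches when nothing but tabs or spaces sits between the start of a line and its line ending.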
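To delete the location markers, use a second Search and Replace action, again with the "replace" field empty. This time, search for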
\w+\shighlight\s\|\sLocation:\s[0-9.,]+
Try putting a macro together and let us know where it goes wrong – because, in context, that regex may need some tweaking.
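As a quick way to test the marker pattern before building the macro, here is a small Python sketch. It is only an approximation of what the KM action would do, and `strip_location_markers` is a hypothetical helper name.

```python
import re

# Pattern quoted above: a word, " highlight | Location: ", then digits,
# commas, or periods.
LOCATION_MARKER = re.compile(r"\w+\shighlight\s\|\sLocation:\s[0-9.,]+")


def strip_location_markers(text: str) -> str:
    """Delete every location marker, leaving the rest of the text untouched."""
    return LOCATION_MARKER.sub("", text)


sample = "Yellow highlight | Location: 1,531\nSome highlighted passage.\n"
result = strip_location_markers(sample)
print(repr(result))  # '\nSome highlighted passage.\n'
```

Deleting a marker leaves an empty line behind, so it probably makes sense to run this step first and the blank-line removal second. Note also that the pattern requires the word "highlight", so a marker line worded differently would pass through unchanged.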
This macro works, though with a very limited test set, as I only have three entries (two notes and a highlight) on my Kindle page. It only works when Safari is frontmost, and is set to trigger on Shift-Control-Option-K.
How to use: Select the text on the Kindle page, starting with the first note reference.
Then invoke the macro. It will copy the text and put it in a variable, then use regex to delete the non-note text and remove the blank lines. It then displays the results, but you could easily do whatever you like with it at that point.
Download Macro(s): Kindle note processing.kmmacros (32 KB)
Macro notes
- Macros are always disabled when imported into the Keyboard Maestro Editor.
- The user must ensure the macro is enabled.
- The user must also ensure the macro's parent macro-group is enabled.
System information
- macOS 14.5
- Keyboard Maestro v11.0.3
The regex will need adjusting if there are other note types whose ID lines don't contain "Location: [a digit]". It will also fail if you have notes whose text contains that string, but hopefully that's not the case :).
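To make that caveat concrete, here is a rough Python sketch of the same idea. It is not the regex from the macro, just an assumed equivalent: drop every line that contains "Location: " followed by a digit, and drop lines that are blank. The name `keep_note_text` is invented for the example.

```python
import re

# Assumption: an ID line is any line containing "Location: " plus a digit.
ID_LINE = re.compile(r"Location: \d")


def keep_note_text(page_text: str) -> list:
    """Return the non-blank lines that do not look like ID lines."""
    kept = []
    for line in page_text.splitlines():
        if not line.strip():
            continue  # skip empty and whitespace-only lines
        if ID_LINE.search(line):
            continue  # skip anything that looks like an ID line
        kept.append(line.strip())
    return kept


page = (
    "Yellow highlight | Location: 1,662\n"
    "First passage.\n"
    "\n"
    "Yellow highlight | Location: 1,667\n"
    "A passage that mentions Location: 5 in its own text.\n"
)
print(keep_note_text(page))  # ['First passage.']
```

The second passage is lost because its own text contains the trigger string, which is exactly the failure case described above.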
Note that there's probably a much better way to do this using JavaScript to extract the values using IDs or classes, but I'm nowhere near good enough with JavaScript to take that approach.
-rob.
Thank you for your suggestion! This works pretty well, although there is still a blank line between each highlight, and I'd prefer the text simply be copied to the clipboard, rather than appearing in a box, if possible?
There aren't any blank lines when I run it on my sample notes, so I don't have any idea what you're seeing. However, I can fix it if you're willing to send me the contents of that page, copied as shown above (either paste them here as text, not a screenshot, or message me directly).
The text box is for a demonstration, as the data is saved in a variable. You can do whatever you like with the variable: Save it to a text file, copy it to the clipboard, whatever. The macro I provided does 99% of the work, so all you need to do is figure out what you want to do with the data. Hint: there's a Set Clipboard to Variable action you can use to do what you want.
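For comparison, if the cleaned text were being handled in a script rather than in a KM variable, the two destinations mentioned above might look like the following Python sketch. The function names and the file name are hypothetical, and the clipboard helper assumes macOS, where the `pbcopy` command-line tool is available.

```python
import subprocess
from pathlib import Path


def save_highlights(highlights: str, destination: Path) -> Path:
    """Write the cleaned highlights to a UTF-8 text file and return its path."""
    destination.write_text(highlights, encoding="utf-8")
    return destination


def copy_to_clipboard(highlights: str) -> None:
    """Send the text to the macOS clipboard via pbcopy (macOS only)."""
    subprocess.run(["pbcopy"], input=highlights.encode("utf-8"), check=True)


if __name__ == "__main__":
    # Hypothetical file name, written to the current directory.
    save_highlights("First passage.\nSecond passage.\n", Path("highlights.txt"))
```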
If you send me the text of your Kindle notes page, I can look at the code I'm using to trim it down and figure out why you still have blank lines.
-rob.
Okay – I've managed to copy the text to the clipboard using "Set Clipboard to Variable" as you suggested.
Here's an example of the line breaks / blank spaces I'm getting:
La frontière séparant les deux pays ressemble à toutes celles d’Amérique latine avec leurs cohortes de changeurs au noir, de douaniers véreux et de trafiquants en tout genre.
Tête inclinée, regard suspendu, âme silencieuse et rêveuse, je m’évade avec les magnifiques condors.
Le sud-est de l’Équateur est prosaïque ; ici s’étend la terre de la classe paysanne, nourrie par le travail des champs.
Each blank line seems to comprise 17 spaces.
I need to see the actual text you copy from the notes page, not the processed version. The code I'm using to strip out some of the content is clearly missing something, but I can't see what it is from the processed version. Just select and copy the text on your notes page there like you were going to run the macro, then paste here.
When pasting here, type three backticks (`) in a row, press Return, paste the text, press Return, then type three more backticks and another Return. That will put it in a code block that the forum software won't modify. It'll look like this:
Your text
will be here
etc.
thanks;
-rob.
Got it. Here you are:
Yellow highlight | Location: 1,662
La frontière séparant les deux pays ressemble à toutes celles d’Amérique latine avec leurs cohortes de changeurs au noir, de douaniers véreux et de trafiquants en tout genre.
Yellow highlight | Location: 1,667
Tête inclinée, regard suspendu, âme silencieuse et rêveuse, je m’évade avec les magnifiques condors.
Yellow highlight | Location: 1,671
Le sud-est de l’Équateur est prosaïque ; ici s’étend la terre de la classe paysanne, nourrie par le travail des champs.
Yellow highlight | Location: 1,677
La feuille de coca devint universelle et conquit le monde au point de donner naissance au produit le plus vendu de l’histoire : le Coca-Cola. De la coca ajoutée à de l’extrait de noix de cola.
Thanks, that's what I needed to see. There are spaces on lines that are blank (and you have two blank lines where my text only has one; I expect that must somehow be coming from whatever browser you use versus my use of Safari?). In any event, it's a pretty easy fix.
In the action named "Remove blank lines - very handy regex," change the "for" search to this:
(?m)^\s+\n
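That will find blank lines that contain spaces, as well as runs of consecutive blank lines (a single, completely empty line on its own is not matched, since \s+ needs at least one whitespace character before the \n); testing with your data, here's what I got for output: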
La frontière séparant les deux pays ressemble à toutes celles d’Amérique latine avec leurs cohortes de changeurs au noir, de douaniers véreux et de trafiquants en tout genre.
Tête inclinée, regard suspendu, âme silencieuse et rêveuse, je m’évade avec les magnifiques condors.
Le sud-est de l’Équateur est prosaïque ; ici s’étend la terre de la classe paysanne, nourrie par le travail des champs.
La feuille de coca devint universelle et conquit le monde au point de donner naissance au produit le plus vendu de l’histoire : le Coca-Cola. De la coca ajoutée à de l’extrait de noix de cola.
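Here is a small Python sketch of how that amended pattern behaves on input shaped like the data above: two blank lines of 17 spaces between entries. It assumes Python's `re` treats this pattern the same way KM does, and the sample strings are placeholders rather than the real highlights.

```python
import re

# Amended pattern from the post: whitespace from the start of a line,
# followed by a newline.
SPACEY_BLANKS = re.compile(r"(?m)^\s+\n")

pad = " " * 17  # each "blank" line in the pasted data held 17 spaces
sample = f"first highlight\n{pad}\n{pad}\nsecond highlight\n"

cleaned = SPACEY_BLANKS.sub("", sample)
print(repr(cleaned))  # 'first highlight\nsecond highlight\n'

# A lone, truly empty line is left in place: \s+ must consume at least one
# whitespace character before the final \n can match.
lone_blank = SPACEY_BLANKS.sub("", "a\n\nb\n")
print(repr(lone_blank))  # 'a\n\nb\n'
```

Because `\s` also matches newlines, one match can swallow several whitespace-only lines at once, which is why both padded lines disappear in a single substitution.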
-rob.
It works great with that amendment. Thanks again!
I have just updated my suggestion at the top of this thread since I had omitted the multiline flag that KM needs. So it now reads:
(?m)^(?:[\t ]*(?:\r?\n|\r))+
It's a recipe that I found in a few places on the Web. It is of course overkill here, but I use it as standard since I can just rely upon it doing the job. It handles spaces and tabs, as well as the different OS conventions for indicating a new line.[1]
[1] "Difference between CR LF, LF and CR line break types" on Stack Overflow.
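To illustrate the point about line-ending conventions, here is a Python sketch that removes blank lines from text using LF, CRLF, or CR line endings. It deliberately swaps the regex for `str.splitlines()`, because Python's `^` only recognises `\n` as a line boundary, so the regex alone would miss CR-only text there. The function name is hypothetical, and the output is normalised to LF.

```python
def strip_blank_lines_any_eol(text: str) -> str:
    """Drop empty and whitespace-only lines, whatever the line-ending style."""
    # str.splitlines() splits on LF, CRLF, and CR (among other separators).
    kept = [line for line in text.splitlines() if line.strip()]
    return "\n".join(kept)


for eol in ("\n", "\r\n", "\r"):
    # Build the same four lines with each line-ending convention.
    sample = eol.join(["one", "", " \t", "two"])
    print(repr(strip_blank_lines_any_eol(sample)))  # 'one\ntwo' each time
```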