Run OCR on Several PDF or JPG Files Selected in Finder and Save in Markdown

Hi,

How should I create a macro that runs OCR on several PDF or JPG files selected in Finder and saves the result in Markdown format? An additional question: how can one copy a table from a PDF or JPEG and save it as a Markdown table?

Thank you,

Can you share what you've got so far, so we can see how you are approaching it?

Here you go:

OCR PDF and:or JPG.kmmacros (2.6 KB)

I think the only thing you are doing wrong is not putting the % symbol on each side of the Variable in this Action. (The % symbol on each side tells Keyboard Maestro that this is a Variable rather than a file called File_Path.)

As @Zabobon has pointed out your error, you might also want to make sure the MD file is saved in the same folder as your PDF/JPG, like this:

[Screenshot: KM 0 2021-04-15_10-27-44]

Hope that helps.
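
For anyone scripting this outside Keyboard Maestro, the "same folder as the source" idea can be sketched in Python. This is only an illustration, not the macro from the screenshot; the function names `markdown_path_for` and `save_markdown` are made up for the example.

```python
from pathlib import Path


def markdown_path_for(source: str) -> Path:
    """Return a .md path next to the source file, e.g. scan.pdf -> scan.md."""
    return Path(source).with_suffix(".md")


def save_markdown(source: str, text: str) -> Path:
    """Write the extracted text beside the source file and return the new path."""
    target = markdown_path_for(source)
    target.write_text(text, encoding="utf-8")
    return target
```

One thing to watch: scan.pdf and scan.jpg in the same folder would both map to scan.md, so the second file processed would overwrite the first.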

Hi @tiffle, I always get confused as to whether I should indicate a variable just with % signs on each side or with %Variable% in front as well. Just using the % sign on each side seems to work (but are there times it wouldn't)?

You know @Zabobon, in the mists of time I think I used to leave out the %Variable% and I ended up spending too much time trying to debug a faulty macro only to realise adding that %Variable% back in cured the problem - so now I always include it! But, to be honest, that was so long ago I’m not even sure my memory of the event is accurate. I do know for a fact that including it always works, so that’s what I do now. YMMV😗

PS - it also ensures I don’t get confused with KM tokens!

Ah, I have found the answer in the Wiki. I should have checked before asking, but it is reassuring to know I'm not the only one who finds this confusing ☺️

You can also use a short form of just %Variable Name% to include variables as long as the variable exists and has a value and there is no corresponding text token, although generally it is better and clearer to use the longer form %Variable%Variable Name%.

So, it seems problems can occur with the short form if there is a built-in token with the same name, which makes sense. For example, if I made my own Variable called SystemClipboard, I could still use it as an independent Variable as long as I wrote it as %Variable%SystemClipboard%.

Good to know👍

Appreciate your help @tiffle and @Zabobon.

Here is the macro that includes your suggestions.

Unfortunately while it works with image files (jpeg, png), it does not work at all with PDF files.

Also, while I used the .md extension, it only saves a plain text file.

The questions I have are:

  1. How to add OCR for PDFs (readable pdfs, not image pdfs)?

  2. Is it possible to convert the resulting text to Markdown?

  3. Is it possible to convert tables from JPG and PDF files to Markdown tables?


OCR PDF and:or JPG.kmmacros (3.2 KB)

First off, you do not need that first Set Variable action - it serves no useful purpose so just delete it.

  1. To extract text from a readable pdf, refer to this thread from elsewhere in this forum:
    Copy Text From Multiple PDF Into a New Numbers Spreadsheet
    There you will see the solution to your requirement.
  2. Obviously sticking “.md” as an extension to a file will not automagically convert it to markdown, it will just open in your default markdown editor. I don’t know how you would do that conversion.
  3. Markdown is simply a text file with special tags in it that control the way the file is eventually displayed with formatting. To convert anything to markdown, you need to obtain a text file with the original layout preserved - and then you can convert the layout to markdown tags. Your macro gives you the ability to obtain text from images (jpg and pdf) which you then need to feed into a markdown converter; there are online ones, but KM has no actions built-in to support that. I can’t help with that but as a start, search the KM forum as I know lots of other users employ markdown and so someone may already have addressed this task.
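
As a rough illustration of "convert the layout to markdown tags" for the table case, here is a minimal Python sketch. It assumes the extracted text kept its column layout, with columns separated by two or more spaces (or a tab), and that the first non-empty line is the header row. OCR output is often messier than that, so treat it as a starting point; `text_table_to_markdown` is a name invented for this example.

```python
import re


def text_table_to_markdown(text: str) -> str:
    """Convert space-aligned rows of text into a Markdown pipe table."""
    # Split each non-empty line into cells on runs of 2+ whitespace or a tab.
    rows = [re.split(r"\s{2,}|\t", line.strip())
            for line in text.splitlines() if line.strip()]
    if not rows:
        return ""
    width = max(len(row) for row in rows)
    # Pad short rows so every row has the same number of cells.
    rows = [row + [""] * (width - len(row)) for row in rows]

    def fmt(cells):
        # Escape literal pipes so they do not break the table.
        return "| " + " | ".join(c.replace("|", "\\|") for c in cells) + " |"

    lines = [fmt(rows[0]), fmt(["---"] * width)]
    lines += [fmt(row) for row in rows[1:]]
    return "\n".join(lines)
```

Cells that contain single spaces (such as "Green apple") stay intact, but a cell with a double space inside it would be split in two.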

Good luck.

Hey Folks,

In my opinion:

Never use the shorthand notation of %VariableName%.

Always use the full notation of %Variable%VariableName%.

If you don't, it will be impossible to distinguish between Keyboard Maestro text tokens and your variables, and you'll end up regretting it down the line somewhere.

-Chris

KM Wiki ⇢ OCR Image action.

“The OCR Image action allows you to extract the text from an ==image== using OCR (Optical Character Recognition) (specifically using the Tesseract OCR library).”

Keyboard Maestro's OCR does not support styled text.

PDFs are not supported.

-Chris
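
Since the OCR Image action only covers images, one possible workaround outside Keyboard Maestro is to choose a command-line tool by file type. The sketch below assumes Poppler's `pdftotext` and the Tesseract CLI are installed and on the PATH (for example via Homebrew); it is not the solution from the linked thread, and the function names are hypothetical.

```python
import subprocess
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def extraction_command(path: str) -> list:
    """Pick the command that prints a file's text to standard output."""
    suffix = Path(path).suffix.lower()
    if suffix == ".pdf":
        # Readable PDF: pdftotext reads the embedded text layer (no OCR);
        # -layout keeps the column layout and "-" sends output to stdout.
        return ["pdftotext", "-layout", path, "-"]
    if suffix in IMAGE_SUFFIXES:
        # Image: tesseract runs OCR; "stdout" as the output base prints the text.
        return ["tesseract", path, "stdout"]
    raise ValueError(f"Unsupported file type: {path}")


def extract_text(path: str) -> str:
    """Run the chosen tool and return its output (requires the tools installed)."""
    result = subprocess.run(extraction_command(path),
                            capture_output=True, text=True, check=True)
    return result.stdout
```

Keeping the command selection separate from the call that runs it makes it easy to check which tool would be used before anything is executed.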

Thank you for your help!

Thanks @ccstone and @tiffle.
I've just searched through my macros and changed any of my own Variables where I had used them in an Action with just a single % on each side of their name. They now all have %Variable% in front.

Where I got confused before was seeing %SystemClipboard% (for example) in a Macro without %Variable% in front of its name. But now I get it. Both Tokens and Variables use the %% to identify themselves. Both will work with just %% but including the text %Variable% confirms that what follows is definitely a Variable rather than a Token.

So, bottom line - just put %Variable% in front of any of my own Variables that I want to make use of in an Action. I was mostly doing this before but not knowing the reason why 🙂
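
To make the short-form versus long-form rule concrete, here is a small Python sketch that rewrites %Name% to %Variable%Name% in a piece of text, but only for names you list as your own variables. It works on plain strings and does not read or modify Keyboard Maestro macros; `use_long_form` is a hypothetical helper.

```python
import re


def use_long_form(text: str, variable_names: list) -> str:
    """Rewrite %Name% as %Variable%Name% for the given variable names only."""
    if not variable_names:
        return text
    names = "|".join(re.escape(name) for name in variable_names)
    # The lookbehind skips names that already follow "%Variable".
    pattern = re.compile(r"(?<!%Variable)%(" + names + r")%")
    return pattern.sub(r"%Variable%\1%", text)
```

Anything not in the list, such as the built-in %SystemClipboard% token, is left untouched.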

Hey @Zabobon,

This will ease the pain of transforming a variable name into a variable token.

Macro: Create Variable Token from Selected Text

-Chris
