Batch scanning

Scanning and then converting the scans to searchable PDFs often means a long wait while the conversion runs. I want to scan to a folder with no OCR and then, later, while I'm doing other things, have the computer feed the scanned pages into ABBYY FineReader for ScanSnap and turn each document into a searchable PDF.
So far I am working with one file at a time. As each document becomes searchable, I want to move it to another folder, but the macro just keeps making the same searchable PDF over and over, and it leaves both that PDF and the original in the folder instead of moving them.
What I have so far is along these lines:

For Each Item in a Collection Execute Actions (Files in a Directory)
   Execute the following actions:
      Open ABBYY
      Move or Rename File
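As a rough illustration (plain Python, not Keyboard Maestro), here is a minimal sketch of what that loop needs to do, assuming the goal is to process each file exactly once: take a snapshot of the folder before looping, act on the current item in each pass rather than on a fixed path, and move that original out of the source folder as soon as it is done. `batch_process` and `make_searchable` are hypothetical names; `make_searchable` stands in for the "Open ABBYY" step.

```python
import shutil
from pathlib import Path
from typing import Callable


def batch_process(src_dir: Path, done_dir: Path,
                  make_searchable: Callable[[Path], None]) -> list[Path]:
    """Run make_searchable once per PDF in src_dir, then move each original
    into done_dir so it is not picked up again on a later pass."""
    done_dir.mkdir(parents=True, exist_ok=True)
    # Snapshot the folder first, so files created during the loop
    # (for example OCR output) are not added to the work list.
    pending = sorted(
        p for p in src_dir.iterdir()
        if p.is_file() and p.suffix.lower() == ".pdf"
    )
    moved = []
    for item in pending:
        # Hypothetical OCR step: it must receive THIS item, not a fixed path.
        make_searchable(item)
        target = done_dir / item.name
        shutil.move(str(item), str(target))
        moved.append(target)
    return moved
```

Passing the OCR step in as a function keeps the loop logic (one pass per file, then move) separate from however the OCR app is actually launched.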

What do I need to do to make these actions run to completion for each file?

Thanks!

You might consider this workflow:

  1. Scan your documents to an “Inbox” folder.
  2. Have a folder-watch script/macro that FIRST moves each file to an “OCR” folder.
  3. The script then knows the new path, so it can trigger the ABBYY app with that path.
  4. If you want to know when the ABBYY app is done (and it can’t tell you directly), have another script/macro check the file’s modification date and look for one later than the time the file was put in the “OCR” folder.
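A minimal Python sketch of the move-first and modification-date ideas above, assuming the OCR app rewrites the file in place (so its modification time changes when it finishes). `claim_for_ocr`, `ocr_finished` and `wait_for_ocr` are hypothetical names, and the call that actually launches ABBYY is left out because it depends on how you trigger the app.

```python
import shutil
import time
from pathlib import Path


def claim_for_ocr(scan: Path, ocr_dir: Path) -> tuple[Path, float]:
    """Move a freshly scanned file into the OCR folder FIRST.

    Returns the new path (to hand to the OCR app) and the time of the move.
    """
    ocr_dir.mkdir(parents=True, exist_ok=True)
    target = ocr_dir / scan.name
    moved_at = time.time()
    # shutil.move keeps the file's modification time, so the moved file
    # still carries its original, earlier mtime.
    shutil.move(str(scan), str(target))
    return target, moved_at


def ocr_finished(path: Path, moved_at: float) -> bool:
    """Heuristic: the file counts as processed once its modification
    time is later than the moment it was moved into the OCR folder."""
    return path.stat().st_mtime > moved_at


def wait_for_ocr(path: Path, moved_at: float,
                 poll_s: float = 2.0, timeout_s: float = 600.0) -> bool:
    """Poll until ocr_finished() is true or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if ocr_finished(path, moved_at):
            return True
        time.sleep(poll_s)
    return False
```

This is only a heuristic: if the app writes its output to a different file instead of rewriting the original, you would watch that output file instead.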

Just some ideas.


I have downloaded the trial of FineReader. It seems to be quite scriptable via AppleScript.

Scripting FineReader directly has the advantage that you don’t have to rely on things like checking the modification date of files, since AppleScript can tell us whether FineReader is still busy.

So, here is a first draft:

OCR with FineReader.kmmacros (2.8 KB)

The green actions are where you set your paths: the first for your source directory, the second for the destination directory (for the OCR’ed documents).

Caveats (for this draft version):

  • I can’t figure out how to set the OCR languages. Maybe somebody can help?
  • The script only works when there is more than one file in the source directory. Probably something completely stupid, but I’m already very tired, so if somebody has the solution, don’t hesitate to…

And, as usual, I’ve built in some more bugs, in order to prevent this topic from ending prematurely.

The important thing for the moment: in this form, does the macro work for you? If yes, we can go into the details.


It is working really well; thank you very much for this macro.

Did you (or anybody else) ever find a way to choose the OCR languages and to make it run with only one PDF in the folder?

Okay, it is quite simple. In the Finder part of the script, coerce "theFiles" to a list. Otherwise, when there is only one file in the folder, "theFiles" is a single value (text) rather than a list, and "repeat with" no longer works.

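tell application "Finder"
   -- "as list" keeps theFiles a list even when the folder holds only one file,
   -- so the later "repeat with" loop still works
   set theFiles to items of entire contents of folder srcDir as list
end tell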
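A similar pitfall exists outside AppleScript. In Python, for example, looping over a single path string visits its characters rather than one file, so a "one or many" result is best normalized to a list before the loop. A small sketch of that normalization; `as_path_list` is a hypothetical helper name, not part of the macro:

```python
import os
from typing import Iterable, Union

PathValue = Union[str, os.PathLike]


def as_path_list(value: Union[PathValue, Iterable[PathValue], None]) -> list[str]:
    """Normalize a result that may be nothing, one path, or many paths
    into a list of path strings that is always safe to loop over."""
    if value is None:
        return []
    # A bare string would otherwise be iterated character by character.
    if isinstance(value, (str, os.PathLike)):
        return [os.fspath(value)]
    return [os.fspath(v) for v in value]
```

With this in place, `for p in as_path_list(result):` runs once for a single file, once per file for many, and not at all for an empty folder.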