This was a really good read on GitHub about automating OCR on the Mac: https://github.com/dannguyen/abbyy-finereader-ocr-senate
He mentions a couple of tools. One is Poppler (http://brewformulas.org/Poppler), which includes pdftotext, a useful way to extract the text from PDFs that have already been OCR’d. The other is ABBYY FineReader, which is what they used since Tesseract doesn’t handle tabular data. ABBYY’s Mac version is pricey and is always a step or two behind the Windows version, but I’m interested in learning more API programming and feel that their cloud OCR service (ocrsdk.com) could be a really good automation candidate.
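For the already OCR’d case, here is a rough sketch of how pdftotext could be scripted from Python. It assumes Poppler is installed and pdftotext is on the PATH; the function names build_pdftotext_cmd and extract_text are my own, not something from the linked repo. The only pdftotext features it relies on are the -layout option and passing "-" as the output file to write to stdout.

```python
import shutil
import subprocess


def build_pdftotext_cmd(pdf_path, layout=True):
    """Build the argument list for Poppler's pdftotext (hypothetical helper)."""
    cmd = ["pdftotext"]
    if layout:
        # -layout asks pdftotext to keep the physical page layout,
        # which tends to help with column-like text.
        cmd.append("-layout")
    # "-" as the output file tells pdftotext to write to stdout.
    cmd += [str(pdf_path), "-"]
    return cmd


def extract_text(pdf_path, layout=True):
    """Return the existing text layer of a PDF as a string.

    This does no OCR itself; it only reads text that is already in the PDF.
    """
    if shutil.which("pdftotext") is None:
        raise RuntimeError("pdftotext not found; install Poppler first")
    result = subprocess.run(
        build_pdftotext_cmd(pdf_path, layout),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if pdftotext fails
    )
    return result.stdout
```

Calling extract_text("report.pdf") (a made-up file name) should hand back the text layer as one string. A scanned PDF with no text layer would come back mostly empty, and that is where an OCR step like ABBYY would have to come in.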