Finding PDF Invoices, Combining them with Purchase Orders and Check Copies


I'm fairly new here, but I was wondering if anyone could help out/point me in the right direction.

I have a fairly large project and need to find thousands of invoice copies scattered around in a Dropbox account and file them with their check copy and purchase order copy.

I have Excel spreadsheets of the check numbers, invoice numbers, and PO numbers, but I'm not sure where to begin.

The issue is that some of the invoices are separate files and others are all merged into one PDF, so I would need to get the content from all of the PDFs, search it for the invoices, and then match those to the check copy they belong to.

Any help would be awesome!

Thank you all!

Hey Roger,

Start by seeing if Spotlight can find your invoice numbers.

Also – tell us how you have your checks and POs organized.


Hey ccstone - thanks for reaching out.

So, I have one folder with all 3000 check copies and another folder with the 2200 POs.

I am able to find some in Finder - they're also all in DocuSign.

The problem is that there are a number of single files with 100+ invoices each, all for different vendors etc., that need to be split and then merged into the check copy they belong to.

I appreciate any help - I'm pretty new at this, but am in a major time crunch to get this completed.

I have a ledger of everything in Excel as well.
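For the matching step, here is a rough sketch of how a ledger could drive the search. It assumes the ledger has been exported from Excel to CSV and that the text of each PDF page has already been extracted by some other tool. The column name `invoice` and the function names are made up for illustration.

```python
import csv
import io
import re


def load_invoice_numbers(csv_text, column="invoice"):
    """Read invoice numbers from CSV text (the column name is hypothetical)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row[column].strip() for row in reader if row[column].strip()]


def find_invoices(page_texts, invoice_numbers):
    """Map each invoice number to the (file, page) pairs whose text mentions it.

    page_texts: dict of (file_name, page_index) -> extracted text.
    """
    hits = {number: [] for number in invoice_numbers}
    for location, text in sorted(page_texts.items()):
        for number in invoice_numbers:
            # Require non-alphanumeric boundaries so "1001" does not match "21001".
            pattern = r"(?<![A-Za-z0-9])" + re.escape(number) + r"(?![A-Za-z0-9])"
            if re.search(pattern, text):
                hits[number].append(location)
    return hits
```

Invoice numbers with an empty hit list are the ones that still need OCR or a manual look.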

It sounds like the first step is to explode the merged invoice files into a one invoice per file structure. Is there a naming scheme you need to stick to?

Also, not all PDFs are equal. Are these PDFs containing text, or scanned documents?

Merged how?
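As a sketch of the "explode" step: once each page of a merged file has been tagged with the invoice number found on it, consecutive pages can be grouped into page ranges, one per invoice. How the per-page numbers are detected is not shown, and treating a page with no number as a continuation of the previous invoice is an assumption that won't hold for every file.

```python
def split_plan(page_invoice_ids):
    """Group consecutive pages into (invoice_id, first_page, last_page) runs.

    page_invoice_ids: one entry per page; None means no number was found
    on that page, so it is treated as a continuation of the previous invoice.
    """
    runs = []
    for index, invoice_id in enumerate(page_invoice_ids):
        if runs and (invoice_id is None or invoice_id == runs[-1][0]):
            # Extend the current run to include this page.
            runs[-1] = (runs[-1][0], runs[-1][1], index)
        else:
            # A new invoice number starts a new run.
            runs.append((invoice_id, index, index))
    return runs
```

Each run could then be written out as its own file, named after the invoice number or whatever naming scheme is required.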

Hey Nige_S,

The final file would be the check copy with the invoices listed in the remittance, followed by the invoice(s) and PO(s) attached. That's what I meant by merge.

Some PDFs are scanned and others have text; some are handwritten and hardly legible. It's kind of a mishmash of everything.

The POs and the check copies are all uniform, in the same format: digital copies downloaded from our software. They are all named and in one location. It's the invoices that are the major problem.
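The ordering described above (check copy first, then its invoices, then its POs) can be sketched as a plan built from ledger rows. The row layout and the `check-`, `invoice-`, and `po-` file names are hypothetical; the real files would use whatever names they already have.

```python
def build_packets(ledger_rows):
    """Return {check_number: ordered list of file names} for each final PDF.

    ledger_rows: iterable of (check_number, invoice_number, po_number) tuples.
    An empty invoice or PO value is skipped; duplicates are listed once.
    """
    grouped = {}
    for check, invoice, po in ledger_rows:
        entry = grouped.setdefault(check, {"invoices": [], "pos": []})
        if invoice and invoice not in entry["invoices"]:
            entry["invoices"].append(invoice)
        if po and po not in entry["pos"]:
            entry["pos"].append(po)
    return {
        check: [f"check-{check}.pdf"]
        + [f"invoice-{n}.pdf" for n in entry["invoices"]]
        + [f"po-{n}.pdf" for n in entry["pos"]]
        for check, entry in grouped.items()
    }
```

The resulting lists only describe the order; the actual merging would be done by whichever PDF tool ends up being scripted.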

Do you have any software that can OCR non-text PDFs?

I have Adobe Pro, Wondershare, ABBYY FineReader, and a few others.

If you don't want to do everything by hand you need one that's AppleScriptable.

Which one would you suggest?

I don't have scripting experience with any of the apps you've listed.

The only app I have recent experience scripting is PDFpenPro (now Nitro PDF Pro).

It looks like someone at Nitro has half a brain – they've improved the AppleScript dictionary and made batch OCR easier.

I'm presently inquiring as to what they want to charge me for an upgrade from PDFpenPro 11 and looking at the demo of v13.3.1.

Thank you so much, Christopher - let me try this right now and I'll get right back to you.

Can any of your apps batch OCR PDFs?

I'm not going to guarantee the quality of Nitro's OCR, because I haven't used the product since they bought it from Smile.

I did just run a test with the current version and wasn't fully satisfied with its ability to detect whether a PDF needed to be OCR'd.

Try running this macro on a couple or three of your non-OCR'd files:

If you get what I expect I can probably write you a macro to do the testing.

Chris - this is great! It seems to be working just fine.

That macro only finds text in text capable PDFs.

Have you found any that don't contain text yet?

The pop-up window should be entirely blank or only have a very few garbled characters.
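That "blank or a few garbled characters" check could be automated along these lines. This is only a sketch: it assumes the text has already been pulled out of the PDF by some other tool, and the thresholds are arbitrary starting points, not tuned values.

```python
def needs_ocr(extracted_text, min_chars=20, min_readable_ratio=0.7):
    """Guess whether a PDF lacks a usable text layer.

    Returns True when the extracted text is (nearly) empty or is mostly
    characters that do not look like ordinary invoice text.
    """
    stripped = "".join(extracted_text.split())  # drop all whitespace
    if len(stripped) < min_chars:
        return True  # blank or nearly blank
    readable = sum(ch.isalnum() or ch in ".,:;-/#$%&()'\"" for ch in stripped)
    return readable / len(stripped) < min_readable_ratio
```

Files flagged this way would go to batch OCR; the rest can be searched as they are.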

Let me keep trying here and I'll get right back to you - so, like a handwritten one, correct?

Okay so yes, I've found some that it does not work on.

What does it produce?

Ideally nothing...

Yes, that's correct - absolutely nothing.

The text box is empty.

Okay, how are you going to find all the PDFs?

They're scattered all over Dropbox – yes?

I'd probably create a smart-search in the Finder – or more likely HoudahSpot, since I own a copy.
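If a script is preferred over a saved search, a recursive walk of the synced Dropbox folder is one alternative. This is a minimal sketch using only the Python standard library; the root folder passed in is whatever the local Dropbox path happens to be, and `find_pdfs` is a made-up name.

```python
from pathlib import Path


def find_pdfs(root):
    """Return every PDF file under root, sorted by path.

    The extension is matched case-insensitively so "scan.PDF" is included.
    """
    return sorted(
        path
        for path in Path(root).rglob("*")
        if path.is_file() and path.suffix.lower() == ".pdf"
    )
```

The returned list can then be fed to whatever does the text extraction or OCR check.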