I'm attempting to build upon macro support provided by Chris to read the contents of a PDF file and create a travel itinerary calendar event. Once I'm able to capture all the relevant travel information, I'll then work on creating the calendar event. However, I'm having difficulties with the pdftotext shell script: instead of finding the searched text using regex, it seems to return the entire file, and it only does this after I run PDFpenPro to OCR the initial PDF received from my administrative assistant. Chris has helped make me dangerous with a little information, but I can't seem to find answers on the Internet to work through this. Can someone explain why I need to OCR the file before using the pdftotext script, and why it puts the entire file into my search result variable ("Itinerary_LookupResult")?
Hey @KM_Panther,
Something very screwy is going on there if PDFpen is getting launched…
Post the macro file in addition to the graphic, and if you can, send me the PDF file to test with.
-Chris
Chris,
Sorry, I didn't mean to imply that PDFpenPro is getting launched
inadvertently! No, I have to use PDFpenPro to OCR the file in order for
the script to find the searched text. Otherwise, the search result is
always blank. So my first question was why do I have to OCR the file to get
any results and my second question is why does the entire file end up in
the search result instead of my searched text?
See the attached files.
Create Travel Itinerary Calendar Event.kmmacros (8.48 KB)
Ah, I see.
You didn't provide the “Delta.pdf” file, so I have to conjecture. (You might have to zip it for the forum to allow it to upload.)
This suggests the PDF is an image rather than editable text. Perhaps someone is scanning a schedule to PDF rather than acquiring an original text PDF from Delta?
If you open the PDF in PDFpen, you can't select any of the text in the document, right?
The pdftotext command-line tool only works with PDFs that contain extractable text.
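One quick way to check whether a PDF has a text layer before searching it is to see whether pdftotext emits anything other than whitespace. The sketch below assumes pdftotext is installed and on the PATH; `has_text` is a hypothetical helper name, and Delta.pdf is simply the file name mentioned in this thread.

```shell
# Hypothetical helper: succeeds (exit 0) if stdin contains at least one
# non-whitespace character, fails (exit 1) otherwise.
has_text() {
  grep -q '[^[:space:]]'
}

# Usage (assumes pdftotext is installed; "-" sends the text to stdout):
#   if pdftotext Delta.pdf - | has_text; then
#     echo "text layer found"
#   else
#     echo "no extractable text: OCR the file first"
#   fi
```

A scanned, image-only PDF would be expected to take the second branch until it has been run through OCR.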
Send me the Delta.pdf file and the OCR'd output, and I'll see if I can figure that out.
-Chris
Chris,
Ahhh, that’s it! You’re absolutely correct about the PDF file from Delta
as I know the administrators use the copier/scanner/fax machine to email
files throughout the company!!!
Forum,
I’ve used "grep - Unix, Linux Command" as a reference, but do others have suggestions for better understanding how to use this command for file access within their scripts or macros?
You’ll have to be more specific about what you’re accessing and what you want to do with it.
–Chris
Chris,
Thanks for responding. The best reference I’ve obtained to date is the link I supplied. At this point, I’m simply asking if there are recommended reference materials to read or utilize (with examples) to better understand the options that can be used in the grep command?
I see. Perhaps I misread your original query.
That links to a nice table for the grep man page, but it is likely a different version than the one you have on your system, so some of the options might not work.
Type man grep in Terminal.app for a local reference to your version.
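As a small illustration of how grep options change what comes back, the sketch below compares plain grep, which prints each whole line containing a match, with -o, which prints only the matched text. The sample text and the flight-number pattern are made up for illustration and are not taken from the actual itinerary. Both -o and -E are available in the BSD grep that ships with macOS and in GNU grep.

```shell
# Made-up sample standing in for pdftotext output (not the real Delta.pdf).
sample='Confirmation: ABC123 Flight: DL 1234 Departs: ATL 08:15
Seat: 14C'

# Plain grep prints every whole line that contains a match.
whole_line=$(printf '%s\n' "$sample" | grep -E 'DL [0-9]+')

# -o prints only the part of the line that matched the pattern.
match_only=$(printf '%s\n' "$sample" | grep -oE 'DL [0-9]+')

printf 'whole line: %s\n' "$whole_line"
printf 'match only: %s\n' "$match_only"
```

If extracted text happened to arrive as one very long line, plain grep would return all of it. That may be one reason a search result holds more than the searched text, though this is only a guess since the PDF in question has not been examined here.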
Here are a couple of links:
This is for GNU grep. You'll have to install MacPorts or HomeBrew to install it, but there's good information in the PDF.
Several good examples.
I haven't had my hands on this one, but I've had pretty good luck with O'Reilly books.
Now, are you really using the grep command-line tool? Or are you wanting information on more general regular expression usage?
-Chris