Search / Copy / Paste Text From PDFs? Life Changing

Hey Kurt,

Zip it.  :wink:

-Chris

Hey Kurt,

Oh, heck. Let’s do this the easy way (for me).

Please also download and install the Satimage.osax AppleScript Extension.

http://www.satimage.fr/software/en/downloads/downloads_companion_osaxen.html

It adds regular expressions to AppleScript (amongst other things) and will make this task much easier.

-Chris

Kurt Kessler - resume.pdf.zip (257.0 KB)

Thanks! I installed the Satimage software. Also, I OCR'd the PDF using PDFpenPro and figured out how to use Hazel to OCR all the files in the folder...

Hey Kurt,

Don't do that.  :wink:

The command-line tool will be more accurate – provided these are real PDFs and not images saved as PDF.

-Chris
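
One rough way to tell a real PDF from an image saved as PDF is to check whether pdftotext returns any visible characters at all. The sketch below assumes pdftotext is installed and on the PATH. The function names count_nonspace and has_text_layer are made up for illustration, and a scan that already carries an OCR text layer would pass this check too.

```shell
# Count the non-whitespace characters arriving on stdin.
# Form feeds (pdftotext's page breaks) count as whitespace here.
count_nonspace() {
    tr -d '[:space:]' | wc -c | tr -d ' '
}

# Succeed (exit 0) when pdftotext extracts any visible text from the PDF in $1.
# If pdftotext is missing or fails, the count is 0 and the check fails.
has_text_layer() {
    n=$(pdftotext -q -layout "$1" - 2>/dev/null | count_nonspace)
    [ "${n:-0}" -gt 0 ]
}
```

Something like `has_text_layer "/path/to/some.pdf" && echo "has text"` would then report the files that pdftotext can read directly.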

ok. Got it.

Hey Kurt,

Okay, let’s start out simple.

Make sure you have at least one PDF in the “Sample pdfs” folder on the desktop, and run this script from Script Editor.app.

set sourceFolder to alias ((path to desktop as text) & "Sample pdfs")
tell application "Finder"
   set thePdfFile to first file of sourceFolder as alias
end tell
set thePdfFile to quoted form of (POSIX path of thePdfFile)

set shCMD to "
export PATH=/opt/local/bin:/opt/local/sbin:/usr/local/bin:$PATH;
pdftotext -layout " & thePdfFile & " -
"
do shell script shCMD

You should end up with the text of the resume.

-Chris
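
For comparison, roughly the same steps can be run straight from Terminal. This is only a sketch and not the script above: first_pdf and extract_first_pdf are hypothetical helper names, and unlike the Finder call it only considers files ending in .pdf or .PDF instead of whichever file happens to come first in the folder.

```shell
# Print the path of the first *.pdf (or *.PDF) in the folder given as $1.
# Fails when the folder is missing or holds no PDF.
first_pdf() {
    for f in "$1"/*.pdf "$1"/*.PDF; do
        [ -f "$f" ] && { printf '%s\n' "$f"; return 0; }
    done
    return 1
}

# Write the text of that PDF to stdout, using the same PATH additions
# and the same pdftotext options as the AppleScript.
extract_first_pdf() {
    pdf=$(first_pdf "$1") || { echo "no PDF found in $1" >&2; return 1; }
    (
        PATH=/opt/local/bin:/opt/local/sbin:/usr/local/bin:$PATH
        export PATH
        pdftotext -layout "$pdf" -
    )
}
```

Running `extract_first_pdf "$HOME/Desktop/Sample pdfs"` should print the same text the script returns, provided pdftotext is in one of those locations.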

hmmm…

I have a pdf in the folder. I get the error message

error “File alias /Users/KurtKessler/Desktop/Sample pdfs of «script» wasn’t found.” number -43

My bad…this script works fine! It returns the text.

Hey Kurt,

Okay, now run this one.

I’m using your resume as a model for this, so drop any others into a temp folder for this test.

-------------------------------------------------------------------------------------------
# dNam: Kurt Kessler → KM Forum → Working
# dCre: 2016/07/29 13:12 
# dMod: 2016/07/29 14:02
-------------------------------------------------------------------------------------------

set sourceFolder to alias ((path to desktop as text) & "Sample pdfs")
tell application "Finder"
   set thePdfFile to first file of sourceFolder as alias
end tell
set thePdfFile to quoted form of (POSIX path of thePdfFile)

set shCMD to "
export PATH=/opt/local/bin:/opt/local/sbin:/usr/local/bin:$PATH;
pdftotext -layout " & thePdfFile & " -
"
set pdfText to do shell script shCMD
set educationText to fndUsing("(?m)^(Education.*\\s.*)(?=(^\\w|\\Z))", "\\1", pdfText, false, true) of me

-------------------------------------------------------------------------------------------
--» HANDLERS
-------------------------------------------------------------------------------------------
on cng(_find, _replace, _data)
   change _find into _replace in _data with regexp without case sensitive
end cng
-------------------------------------------------------------------------------------------
on fnd(_find, _data, _all, strRslt)
   try
      find text _find in _data all occurrences _all string result strRslt with regexp without case sensitive
   on error
      return false
   end try
end fnd
-------------------------------------------------------------------------------------------
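on fndUsing(_find, _capture, _data, _all, strRslt)
   try
      -- Return the text selected by _capture (for example "\\1" for the first group).
      return find text _find in _data using _capture all occurrences _all ¬
         string result strRslt with regexp without case sensitive
   on error
      -- Any error from find text is reported to the caller as false.
      return false
   end try
end fndUsing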
-------------------------------------------------------------------------------------------

NOTE that this is only a demonstration. I expect the various resume formats will not be uniform and will require more clever parsing.

-Chris
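
The pattern above appears to be aiming for the “Education” heading plus what follows it, stopping before the next line that begins with a word character or at the end of the text. The same idea can be sketched as a line filter in awk. This is not a translation of the regular expression and may select different text on a given resume; the function name education_section is hypothetical, and it assumes section headings start in the first column while their contents are indented.

```shell
# Read extracted resume text on stdin and print the block that starts at a
# line beginning with "Education" (any case) and ends just before the next
# line that begins with a letter, digit or underscore.
education_section() {
    awk '
        tolower($0) ~ /^education/ { grab = 1; print; next }
        grab && /^[A-Za-z0-9_]/    { exit }
        grab                       { print }
    '
}
```

It could be fed from the extraction step, for example `pdftotext -layout "/path/to/some.pdf" - | education_section`.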

I copied the script into Script Editor and it appears to error on this line…

set educationText to fndUsing("(?m)^(Education.\s.)(?=(^\w|\Z))", “`”, pdfText, false, true) of me

It doesn’t like the “’”

Expected “"” but found unknown token.

Hell. That’s a bug in the Discourse forum software.

I’ll post a script file for you in a sec.

-ccs

Hey Kurt,

Okay – download this script file – and give it a try:

Kurt Kessler → KM Forum → Working.scpt.zip (10.2 KB)

-Chris

Yep. Works fine. Returns a “false”

It shouldn't.

Are you running it on your resume?

-Chris

Yes I am.

The exact same one you posted for me?

-Chris

Yes it is.

Weird… That shouldn’t be possible.

Okay, zip it and email it to me at this address: kmf@thestoneforge.com

-Chris

Done! Thanks!