Can anyone explain why the pdftotext conversion handles the word "confined" the way it does (see both the PDF and the converted files)? The same result occurs in any field where the word "confined" is entered! I don't believe this is a Keyboard Maestro issue, but I often use the pdftotext command for processing files and I'm hoping others do as well. I know the "raw" option is a hack that isn't recommended, but it works best for this file and application.
Keyboard Maestro 8.2 “PDF2Text Test Macro With Word "Confined"” Macro
I’m not familiar with textutil, but on first examination it appears that it would only be useful for processing my text file after running pdftotext on the PDF, and by then the text already has the problem. Remember, I’m starting with a PDF. Am I missing something?
I used PDFPenPro to enter the information into the fields, so that’s likely the source of the issue based upon your investigation. It’s just odd that I’ve only encountered this issue with the word “confined”. Perhaps I can make an inquiry to Smile.
I’ve noticed in my PDF conversions that “fi” and certain other letter combinations (known in typesetting as “ligatures”) often convert weirdly. It would be interesting to see if you get the same result converting a PDF with a word like “confirmation” or “first” or some other word with an “fi” in it.
Again, recognizing that this is not a KM problem, thanks for your input. I tested converting a PDF using “First Last” as the employee name and “Training For Resolution Confirmation” as the trip purpose. First did not trigger an issue, but Confirmation did. Seeing that, I then tried “Training For Conflicted Confirmation” for the purpose, which yielded problems with both Conflicted and Confirmation! I’ve emailed Smile technical support for PDFPenPro in the hope that they can provide some understanding.
Yes, “fl” would be another ligature. “First” probably worked because the upper-cased F followed by an “i” isn’t a ligature. Based on your testing, I would be surprised if lower-cased “first” didn’t generate the issue as well.
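To make the reasoning above concrete, here is a rough Python sketch that lists the ligature-prone letter sequences in a string. The function name and the list of sequences are my own assumptions for illustration; which sequences a given font actually sets as ligatures varies. The match is case-sensitive, which is why “First” passes and “Confirmation” does not.

```python
import re

# Lowercase sequences commonly set as a single ligature glyph.
# Longest first, so "ffi" is reported instead of "ff" or "fi".
LIGATURE_SEQUENCES = ("ffi", "ffl", "ff", "fi", "fl")
_PATTERN = re.compile("|".join(LIGATURE_SEQUENCES))


def ligature_candidates(text):
    """Return the ligature-prone sequences found in text, in order."""
    return _PATTERN.findall(text)


print(ligature_candidates("First Last"))
# []
print(ligature_candidates("Training For Conflicted Confirmation"))
# ['fl', 'fi']
```

This only flags words worth testing; it says nothing about what a particular PDF actually contains.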
PDFs often use ligatures for several letter combinations: ‘fi’, ‘ff’, ‘ffi’ and, in some fonts, ‘fl’.
You may be able to set your word processor to recognise and render ligatures. Alternatively, you should be able to set PDFPenPro not to use ligatures.
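If the converted text ends up containing the Unicode ligature characters themselves (for example U+FB01 for “fi”), one possible workaround is to expand them after conversion. This is only a sketch under that assumption: the thread doesn’t show exactly what pdftotext emitted, and if the letters are dropped or replaced by something else this won’t help. The function name is mine.

```python
# Unicode "Alphabetic Presentation Forms" ligatures mapped to plain letters.
LIGATURES = {
    "\ufb00": "ff",
    "\ufb01": "fi",
    "\ufb02": "fl",
    "\ufb03": "ffi",
    "\ufb04": "ffl",
}
_TABLE = str.maketrans(LIGATURES)


def expand_ligatures(text):
    """Replace single-glyph ligature characters with their letters."""
    return text.translate(_TABLE)


print(expand_ligatures("con\ufb01ned"))
# confined
```

An explicit table is used here rather than a general Unicode normalization so that nothing else in the text gets changed.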
As Chris noted, while the -enc[ode] option handles the ligatures, it doesn’t produce text output as usable as what the -raw option gives. The output looks very much like what one gets with the -layout option. Smile technical support has responded acknowledging the ligature issue, and as a solution they may provide a way to set PDFPenPro not to use ligatures.
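For anyone who wants to experiment with the options mentioned here, this is a minimal Python sketch that builds a pdftotext command line with -raw and -enc together and captures the result. The function names and the file name form.pdf are placeholders of mine, and I haven’t verified that combining the two options gives usable output for this particular file.

```python
import subprocess


def build_pdftotext_cmd(pdf_path, raw=True):
    """Build a pdftotext command that writes UTF-8 text to stdout."""
    cmd = ["pdftotext"]
    if raw:
        cmd.append("-raw")  # keep content-stream order, as in the macro
    # "-" as the output file sends the text to stdout.
    cmd += ["-enc", "UTF-8", pdf_path, "-"]
    return cmd


def pdf_to_text(pdf_path, raw=True):
    """Run pdftotext (must be installed) and return the text."""
    result = subprocess.run(
        build_pdftotext_cmd(pdf_path, raw=raw),
        capture_output=True,
        check=True,
    )
    return result.stdout.decode("utf-8", errors="replace")


# Hypothetical usage: text = pdf_to_text("form.pdf")
print(build_pdftotext_cmd("form.pdf"))
```

Keeping the command construction separate from the call makes it easy to compare runs with and without -raw.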
PDFPenPro technical support found that using a different font, specifically Arial instead of Helvetica (where the issue was observed), avoids the ligature issue when using pdftotext to convert the PDF to text.