How can I fix text encoding errors with the 'Read File to Variable' action?

Hi,

I am getting errors with the action = Read File to Variable

In my case, starting from a blank document, I manually created a simple Microsoft Word file with just a few lines of text and saved it as test.docx

Also, I manually created a similar document with the Apple Pages app (test.pages) and had the same problem

Same with test.doc file

I'm doing nothing unique as far as text encoding = just basic Apple defaults.

Then in KM, I created a simple action:

Read File to Variable

Read file ~/Desktop/test/test.docx
or
Read file ~/Desktop/test/test.doc
or
Read file ~/Desktop/test/test.pages

To variable MyContent

And I got the error for each attempt:

Read File action failed to read text file with error Error Domain=NSCocoaErrorDomain Code=264 "The file “test.docx” couldn’t be opened because the text encoding of its contents can’t be determined." UserInfo={NSFilePath=/Users/xxx/Desktop/test/test.docx} in macro “msword1” (while executing Read File to Variable “MyContent”).

BTW: The read file action has no problem with xxx.txt files - only .docx and .doc and .pages documents

My bigger goal is to be able to read hundreds of previously created Microsoft Word files and do additional actions after reading each file.

--> Q: is there some way to get around this text encoding error issue?

I imagine if I had to, I could create another KM macro to loop and open and save all as plain text - but that sounds like a cumbersome solution.

Here is an associated thread:

Thanks for any help - Dave

From the KM wiki page for the Read a File action:

The Read a File action allows you to take the contents of a text or image file

That suggests to me that the file you’re trying to read isn’t actually a text file.
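One quick way to check this from Terminal, as a rough sketch: a .docx file is a ZIP container of XML parts, so its first two bytes are the ZIP signature PK rather than readable text. The function name looks_like_zip is made up for illustration, and older .doc files use a different binary container, so they will not match this particular test.

```shell
# Succeeds when the file starts with the ZIP signature "PK",
# as .docx files do; plain-text files normally will not.
looks_like_zip() {
  [ "$(head -c 2 "$1")" = "PK" ]
}

# Example (path taken from the question above):
#   looks_like_zip ~/Desktop/test/test.docx && echo "not a plain-text file"
```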


Convert the Word docs to text using the built-in textutil utility:
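A minimal sketch of that conversion, wrapped in a function so it can be reused. The function name doc_to_txt is illustrative only; textutil ships with macOS and, with -convert txt, writes a .txt file next to the original.

```shell
# Convert one Word document to plain text.
# textutil writes the result beside the source: test.docx -> test.txt
doc_to_txt() {
  textutil -convert txt "$1"
}

# Example, using the file from the question:
#   doc_to_txt ~/Desktop/test/test.docx
```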


Thanks so much, this is very helpful. I was able to use this manually in the Terminal for now.

I cd'd to the directory
then
textutil -convert txt *

and txt copies were made for ALL files (using the wildcard *)

  • wow this is great!
  • so simple and powerful!
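Because * matches every file in the folder, a slightly more selective variant may be safer when the folder also holds other kinds of files. This is only a sketch: the function name convert_word_docs is hypothetical, and the loop is written for sh or bash (zsh reports an error for a pattern with no matches unless configured otherwise).

```shell
# Convert only the .doc and .docx files in a folder (default: current folder).
convert_word_docs() {
  dir="${1:-.}"
  for f in "$dir"/*.docx "$dir"/*.doc; do
    [ -e "$f" ] || continue      # pattern matched nothing; skip it
    textutil -convert txt "$f"   # writes name.txt next to the original
  done
}

# Example:
#   convert_word_docs ~/Desktop/test
```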

@mrpasini,

Can you expound on this? Either I don’t follow or I’m doing it incorrectly, in Keyboard Maestro. I’m getting the encoding error, concerning reading a Word doc file, as the OP indicated.

I think I’m botching the textutil -convert txt KMVAR syntax within the Execute Shell Script. I’d like to save the Word Doc output to a system clipboard or a variable. An example would be helpful.

Thanks much!

KC

You can specify the encoding of the output in the textutil command with -encoding [something]; otherwise macOS's native encoding (UTF-8) is used for the output.

If that isn't helping, you could try iconv, but I've never found that necessary.
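Note that iconv works on plain-text files, so it would apply to a text file that already exists rather than to the Word document itself. A small sketch, assuming purely for illustration that the text is in ISO-8859-1; the function name is made up:

```shell
# Re-encode a plain-text file from ISO-8859-1 to UTF-8 on standard output.
# The source encoding here is an assumption; substitute the real one.
latin1_to_utf8() {
  iconv -f ISO-8859-1 -t UTF-8 "$1"
}

# Example:
#   latin1_to_utf8 old.txt > new.txt
```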

Regarding the output, you can direct it from the popup (ignore results in my example) in the Execute a Shell Script action to your preference. But if you are working with large files, you'll hit a snag. Large files saved as a variable consume too much of the environment. Best to write them to a file, perhaps in /tmp if you need to do further processing.
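As a hedged sketch of what the body of an Execute a Shell Script action might look like: Keyboard Maestro exposes its variables to shell scripts as environment variables with a KMVAR_ prefix, and textutil's -stdout option prints the converted text instead of writing a file, so the action's results popup can save it to a variable or the clipboard. The variable name WordFile and both function names are assumptions made for this example.

```shell
# Print the text of a Word document on standard output, so that
# Keyboard Maestro can capture it (save results to variable/clipboard).
word_to_stdout() {
  textutil -convert txt -stdout "$1"
}

# For large documents, write to a file instead (for example under /tmp).
word_to_file() {
  textutil -convert txt -output "$2" "$1"
}

# "WordFile" is a hypothetical Keyboard Maestro variable holding the
# full path of the document; it appears in the shell as $KMVAR_WordFile.
if [ -n "${KMVAR_WordFile:-}" ]; then
  word_to_stdout "$KMVAR_WordFile"
fi
```

With the action set to save its results to a variable, the converted text ends up in that variable; for big files, calling word_to_file with a path under /tmp and reading that file afterwards avoids holding everything in a variable.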


Thank you for your response. I still don’t understand, so I’ll have to read this several times. Maybe after a few months, it will make more sense. In the meantime, do you have a simple macro that successfully reads a small Microsoft Word file and converts the text to a variable, using the textutil command above? If so, can you post it? I’m sorry to bug you on this. Someone asked me and I couldn’t help them. Thank you for the solid.

KC

Here's what I've been using for years. Select a file in the Finder and run this macro. Change the trigger first (I run it from a palette so 'V' works for me but not too bright as a standalone trigger).

You'll get the Word file you selected converted to a text file.

If you have pdftotext installed, this macro also converts a PDF to text.

If you have a problem with this macro, show me the original Word file or one like it that exhibits the problem. I've never had an encoding problem with this.

Convert to Text Macro (v11.0.2)

Convert to Text.kmmacros (5.5 KB)
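The macro itself is in the attachment; the following is only a rough shell sketch of the same idea, not the macro's actual contents. It assumes pdftotext can be found on the PATH (inside Keyboard Maestro a full path may be needed), and the function name to_text is made up.

```shell
# Convert a Word or PDF file to a .txt file with the same base name.
to_text() {
  src="$1"
  out="${src%.*}.txt"            # report.docx -> report.txt
  case "$src" in
    *.doc|*.docx) textutil -convert txt -output "$out" "$src" ;;
    *.pdf)        pdftotext "$src" "$out" ;;
    *)            echo "unsupported file type: $src" >&2; return 1 ;;
  esac
}

# Example:
#   to_text ~/Desktop/test/test.docx
```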


Thank you @mrpasini!

As an addition to the textutil and pdftotext incantations shown by @mrpasini, it may also be worth experimenting, for docx -> markdown, with pandoc

Assuming, for example, that you have installed pandoc through Homebrew, and that it is installed (check in Terminal.app with which pandoc) at a path like: /opt/homebrew/bin/pandoc

you can write things like:

 /opt/homebrew/bin/pandoc -f docx -t markdown Sample.docx -o converted.txt

(where -o specifies the path of the output file)
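For a whole folder of Word files, the same command could be wrapped in a loop. A sketch only: the function name and the PANDOC variable are made up for this example, the default path is the Homebrew location assumed above, and the loop is written for sh or bash.

```shell
# Path to pandoc; override if `which pandoc` reports a different location.
PANDOC="${PANDOC:-/opt/homebrew/bin/pandoc}"

# Convert every .docx in a folder to Markdown, saved as name.md beside it.
docx_folder_to_markdown() {
  dir="${1:-.}"
  for f in "$dir"/*.docx; do
    [ -e "$f" ] || continue      # pattern matched nothing; skip it
    "$PANDOC" -f docx -t markdown "$f" -o "${f%.docx}.md"
  done
}

# Example:
#   docx_folder_to_markdown ~/Desktop/test
```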
