Rename A Document File in the Finder With Word Count Appended

Hey Guys,

I’m a writer with a lot of documents (text, Word, Markdown, PDF, etc.), and I’m often plagued by the question of which draft has fewer or more words, especially a few years down the line when I’m looking for something specific or want to compare similar drafts at a glance.

I realize this is not something many will request, but I was thinking: is it possible to create a Keyboard Maestro macro that, in the Finder, gets the word count of a document and renames the file with the word count appended?

Something like this: draft 6 (1234 words)

I’m not sure if this is even possible but I thought I’d run the idea past you all who are far more knowledgeable than me.

Appreciate your thoughts and help in advance :blush::metal:

Hey Shande,

Possible but not easy – particularly since you're dealing with multiple file types that are not all text based.

"Word" is a bit relative, so what you think is a word and what the parser thinks is a word are not always the same thing.
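To make that concrete, here is a small Python sketch that counts the same sentence two ways. The sample sentence and both function names are made up for illustration; neither rule is "the" correct definition of a word.

```python
import re


def count_whitespace_tokens(text: str) -> int:
    """Count runs of non-whitespace characters (roughly what `wc -w` does)."""
    return len(text.split())


def count_alnum_runs(text: str) -> int:
    """Count runs of ASCII letters and digits, so punctuation splits words."""
    return len(re.findall(r"[A-Za-z0-9]+", text))


# Hypothetical sample text, chosen to contain hyphens, a dash and a decimal.
sample = "A state-of-the-art draft -- it's 3.5 pages."

print(count_whitespace_tokens(sample))  # 7
print(count_alnum_runs(sample))         # 11
```

Same sentence, 7 "words" by one rule and 11 by the other, which is why the number should only be compared between counts made with the same tool.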

Nevertheless you should be able to do this in a consistent fashion once you have the right tools.

It's probably better to wait until Keyboard Maestro 10 comes out to work on this seriously. ( Don't ask when. :)

Tools will include:

  • The shell.
    • Direct word count for text files (if other methods are not available).
    • Xpdf's PDF-to-text tools.
    • Probably Pandoc.
    • Perhaps Python 3.

It'll be a pretty substantial task.

-Chris

I'll (foolishly) take a stab at this.

The word count is wildly inaccurate (it just counts strings of characters that are not white space as words) but it is consistent. So it will be precisely as inaccurate with draft 7 as draft 6. I think that meets your specs.

This version does rename the file so test it on copies of real work. Only. Compare with the word count you get in your main application. I only tested it on text files (readme.txt) not on, say, Word files (where "wildly inaccurate" would be an understatement).
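For anyone who would rather read the idea than open the macro, here is a minimal Python sketch of the same approach: count whitespace-separated strings, then append the count to the file name. It is not the macro's actual code, and the function names are assumptions made for this example.

```python
from pathlib import Path


def count_words(text: str) -> int:
    """Crude count: every run of non-whitespace characters is a 'word'."""
    return len(text.split())


def name_with_count(path: Path, words: int) -> Path:
    """Build e.g. 'draft 6 (1234 words).txt' next to the original file."""
    return path.with_name(f"{path.stem} ({words} words){path.suffix}")


def rename_with_count(path: Path) -> Path:
    """Rename the file on disk; refuses to overwrite an existing file."""
    text = path.read_text(encoding="utf-8", errors="replace")
    target = name_with_count(path, count_words(text))
    if target.exists():
        raise FileExistsError(target)
    return path.rename(target)
```

Only rename_with_count touches the disk, so name_with_count can be tried safely first to preview the new name.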

But if you are trying to do this with Word files (or any proprietary document format), you can adjust the word count after it's totaled:

$words = $words/5;

Which cuts the count down to a fifth of what it was. Just keep adjusting until the approximation is closer to the actual count and it should work for other files in the set.

As @ccstone said, this isn't easy because accurately counting words for different file types isn't trivial. We do isolate the file extension here so we could apply a different correction based on that extension but I'm guessing you don't jump around from Word to Scrivener to InDesign to BBEdit.
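A per-extension correction could look something like this Python sketch. The divisors in the table are placeholders, not measured values (the 5 for .docx just echoes the example above), and the names are invented for illustration.

```python
from pathlib import Path

# Hypothetical divisors: tune each one against the count your editor reports.
CORRECTION = {
    ".txt": 1.0,
    ".md": 1.0,
    ".docx": 5.0,
}


def corrected_count(path: Path, raw_count: int) -> int:
    """Divide the raw count by the divisor for this extension (default 1)."""
    divisor = CORRECTION.get(path.suffix.lower(), 1.0)
    return round(raw_count / divisor)
```

An extension that is not in the table falls through with its raw count unchanged.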

Alternately, you could just report the file size in bytes, which would be accurate but not a word count. That would show you what you're looking for, too. Replace the three while lines with:

my $words = -s $old_name;

Bigger number of course. And not "words."
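For comparison, the Python counterpart of that one-liner is os.path.getsize. This is only a sketch; it runs against a throwaway temporary file so no real document is involved.

```python
import os
import tempfile


def size_in_bytes(path: str) -> int:
    """File size in bytes, the Python counterpart of Perl's -s file test."""
    return os.path.getsize(path)


# Demonstrate on a throwaway file so no real document is touched.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as handle:
    handle.write("twelve bytes")
    demo_path = handle.name

print(size_in_bytes(demo_path))  # 12
os.remove(demo_path)
```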

Rename with Approximate Word Count Macro (v9.2)

Rename with Approximate Word Count.kmmacros (3.6 KB)

Just a thought: since MS Word can open all the mentioned file formats, why not use it for the counting? For Markdown files you’ll need to delete the Markdown syntax via Find and Replace. Word can be controlled via VBA or AppleScript.

See here for a workaround regarding the counting of punctuation marks:

http://books.gigatux.nl/mirror/applescript/0596008503/applescripttmm-chp-4-sect-7.html

Edit: I just realised that you didn't mention that you have access to MS Word, so probably my advice is of no use to you.

@mrpasini's stab at the task might be sufficient for @shanden's needs.

I was thinking more about NLP (Natural Language Processing) with Python, because the NLTK is free and very mature. I haven't even looked at Python in ages though.

As for Microsoft Word files – if they're in .doc or .docx format then macOS has a built-in tool for converting them to other formats.

And the shell has the wc executable for getting a word count. It's not sophisticated, but it's a mature, robust, and fast Unix tool.


textutil


textutil <path/to/file> -stdout -convert txt -encoding UTF-8 | wc -w
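If you'd rather drive this from Python than from a shell pipeline, a rough sketch follows. It assumes macOS (textutil is not available elsewhere), the function names are made up for this example, and Python's split() may disagree with wc -w by a word or two on unusual whitespace.

```python
import subprocess


def textutil_command(path: str) -> list:
    """Argument list that asks textutil to write plain UTF-8 text to stdout."""
    return ["textutil", "-convert", "txt", "-stdout", "-encoding", "UTF-8", path]


def count_document_words(path: str) -> int:
    """Convert the document with textutil, then count whitespace-separated words."""
    result = subprocess.run(textutil_command(path), capture_output=True, check=True)
    text = result.stdout.decode("utf-8", errors="replace")
    return len(text.split())
```

Calling count_document_words on a .doc or .docx path should return a figure close to the wc -w one; check=True makes a failed conversion raise instead of silently reporting zero words.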


Data Collected with Word 16's Word Count Tool:


test_01.doc					442 (shell wc -w)

Pages						1
Words						440
Characters (no spaces)		2,260
Characters (with spaces)	2,720
Paragraphs					21
Lines						47

Code Signing Guide.docx		9,826 (shell wc -w)

Pages						43
Words						9,316
Characters (no spaces)		48,444
Characters (with spaces)	56,809
Paragraphs					931
Lines						1,356

The test_01.doc document is a simple document with styled text and an embedded URL in it.

The Code Signing Guide.docx document is very complex with tables and images.
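To put numbers on the gap, this short Python sketch computes how far the shell count overshoots Word's count, using only the figures above. The function name is made up for illustration.

```python
# (shell wc -w count, Word 16 count) taken from the data above.
counts = {
    "test_01.doc": (442, 440),
    "Code Signing Guide.docx": (9826, 9316),
}


def percent_over(shell_count: int, word_count: int) -> float:
    """How far the shell count exceeds Word's count, as a percentage."""
    return 100 * (shell_count - word_count) / word_count


for name, (shell_count, word_count) in counts.items():
    print(f"{name}: {percent_over(shell_count, word_count):.1f}% over")
```

That works out to roughly 0.5% over for the simple document and 5.5% over for the complex one.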


I suppose I should have said that doing this job right would be pretty complicated.

Doing it well enough for @shanden might not be so tough.

-Chris

In fact, though, assuming the version with the most words is the latest may work for @shanden, but it would seem wiser to look for and report the date the file was last modified (a can of worms I'll leave closed).

I also find it significant that he moves around between Markdown, Word, text and PDF (and maybe not always for one piece). I would assume the comparison would be between similarly formatted files (two Markdown files) rather than between different formats (you don't write in PDF; by definition it would be the latest, I would think).

I'd suggest just comparing file times in the Finder itself for that matter when the question arises, rather than fiddling with file names.
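If the Finder view isn't handy, the same comparison can be scripted. This is only a sketch in Python with made-up function names; it reads modification times and leaves the file names alone.

```python
import os
import time


def modified_label(path: str) -> str:
    """Last-modified time of a file as a readable local timestamp."""
    return time.strftime("%Y-%m-%d %H:%M", time.localtime(os.path.getmtime(path)))


def newest_first(paths, mtime=os.path.getmtime):
    """Sort paths so the most recently modified file comes first."""
    return sorted(paths, key=mtime, reverse=True)
```

The mtime parameter exists only so the ordering can be tried with made-up timestamps instead of real files.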

Thank you all for your input and apologies for the delay in responding -- work deadlines and family drama unfortunately got in the way and I honestly forgot all about this thread.

I will take a closer look at all the suggestions offered.

Many thanks! :slight_smile: