I’m a writer and I have a lot of documents (text, Word, Markdown, PDF, etc.), and I’m often left wondering which draft has more or fewer words, especially a few years down the line when I’m looking for something specific or want to compare similar drafts at a glance.
I realize this is not something many will request, but I was wondering: is it possible to create a Keyboard Maestro macro that, in the Finder, gets the word count of a document and renames the file with the word count appended?
Something like this: draft 6 (1234 words)
I’m not sure if this is even possible, but I thought I’d run the idea past you all, since you’re far more knowledgeable than I am.
The word count is wildly inaccurate (it simply counts every run of non-whitespace characters as a word), but it is consistent. So it will be precisely as inaccurate with draft 7 as with draft 6. I think that meets your specs.
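The macro itself isn't reproduced here, but the counting rule it uses (every run of non-whitespace characters is one "word") is easy to sketch in Python for anyone who wants to check it outside Keyboard Maestro. The function names are hypothetical, and reading the file as UTF-8 is an assumption that only makes sense for plain-text formats.

```python
import re

def crude_word_count(text: str) -> int:
    # Every run of non-whitespace characters counts as one "word",
    # so "--" or "six," count too: crude, but consistent between drafts.
    return len(re.findall(r"\S+", text))

def crude_word_count_of_file(path: str) -> int:
    # errors="replace" stops binary formats (.docx, .pdf) from raising,
    # but their counts will be meaningless.
    with open(path, encoding="utf-8", errors="replace") as f:
        return crude_word_count(f.read())
```

For example, `crude_word_count("Draft six,  revised -- again.")` returns 5, because the `--` is counted as a word.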
This version does rename the file, so test it on copies of your real work. Only copies. Compare the result with the word count you get in your main application. I only tested it on plain-text files (readme.txt), not on, say, Word files (where "wildly inaccurate" would be an understatement).
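The rename step can be sketched in Python under the same caveat (try it on copies). The helper names are hypothetical; the suffix format follows the `draft 6 (1234 words)` example from the question, and stripping a count left by an earlier run, so that re-running doesn't stack counts, is an added assumption about what you'd want.

```python
import re
from pathlib import Path

# Matches a count appended by an earlier run, e.g. " (1200 words)".
OLD_COUNT = re.compile(r" \(\d+ words\)$")

def name_with_count(filename: str, words: int) -> str:
    """'draft 6.txt', 1234 -> 'draft 6 (1234 words).txt'"""
    p = Path(filename)
    stem = OLD_COUNT.sub("", p.stem)  # drop any stale count first
    return f"{stem} ({words} words){p.suffix}"

def rename_with_count(path: str, words: int) -> Path:
    """Rename the file in place and return its new path."""
    src = Path(path)
    dst = src.with_name(name_with_count(src.name, words))
    if dst != src and dst.exists():
        # Refuse to overwrite another draft that already has this name.
        raise FileExistsError(f"{dst} already exists")
    return src.rename(dst)
```

Keeping the extension after the count means the Finder still opens the file in the right application.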
But if you are trying to do this with Word files (or any proprietary document format), you can adjust the word count after it's totaled:
$words = $words/5;
Which cuts the count down to a fifth of what it was. Just keep adjusting the divisor until the approximation is close to the actual count, and it should then work for the other files in that format.
As @ccstone said, this isn't easy because accurately counting words for different file types isn't trivial. We do isolate the file extension here so we could apply a different correction based on that extension but I'm guessing you don't jump around from Word to Scrivener to InDesign to BBEdit.
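Building on the idea of isolating the extension, a per-extension correction might look like the Python sketch below. Only the divide-by-5 for Word comes from this thread, and even that is a starting guess; the other divisors and all the names are hypothetical placeholders to tune against the count your main application reports.

```python
from pathlib import Path

# Hypothetical divisors per extension. Only the 5 for Word comes from the
# thread; calibrate each one against a file whose real count you know.
DIVISORS = {".docx": 5.0, ".doc": 5.0, ".txt": 1.0, ".md": 1.0}

def corrected_count(raw_count: int, filename: str) -> int:
    # Unknown extensions fall back to no correction.
    divisor = DIVISORS.get(Path(filename).suffix.lower(), 1.0)
    return round(raw_count / divisor)

def calibrate(raw_count: int, true_count: int) -> float:
    """Divisor that maps a raw count onto your application's count."""
    return raw_count / true_count
```

So if the crude count of a Word draft is 6000 and Word itself reports 1200, `calibrate(6000, 1200)` gives 5.0, which is the divisor to put in the table.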
Alternatively, you could just report the file size in bytes, which would be accurate but not a word count. That would show you what you're looking for, too. Replace the three while lines with:
Just a thought: since MS Word can open all the file formats mentioned, why not use it for the counting? For Markdown files you'll first need to delete the Markdown syntax via Find and Replace. Word can be controlled via VBA or AppleScript.
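Stripping Markdown syntax before counting can also be done outside Word. The Python sketch below is a rough, regex-based approximation, not a real Markdown parser; the function names are hypothetical, and it only handles links, images, headings, blockquotes, list markers and emphasis/code characters.

```python
import re

def strip_markdown(text: str) -> str:
    """Roughly remove common Markdown syntax so it isn't counted as words."""
    # [link text](url) and ![alt text](url) -> just the text
    text = re.sub(r"!?\[([^\]]*)\]\([^)]*\)", r"\1", text)
    # Line-leading heading marks, blockquote marks, and list bullets/numbers
    text = re.sub(r"^[ \t]{0,3}(?:#{1,6}|>+|[-*+]|\d+[.)])[ \t]+", "",
                  text, flags=re.MULTILINE)
    # Emphasis, inline-code and strikethrough characters
    text = re.sub(r"[*_`~]+", "", text)
    return text

def markdown_word_count(text: str) -> int:
    return len(strip_markdown(text).split())
```

On `"# Draft 6\n\nSome *bold* and [a link](http://example.com).\n\n- item one\n"` a plain whitespace split gives 11 "words", while `markdown_word_count` gives 9, because the `#`, the `-` bullet and the URL are no longer counted.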
See here for a workaround regarding the counting of punctuation marks:
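Separately from the linked workaround, one simple way to keep stray punctuation (a lone dash, spaced-out ellipsis dots) from inflating the count is to count only tokens that contain at least one letter or digit. A minimal Python sketch, with a hypothetical function name:

```python
def count_words_ignoring_punctuation(text: str) -> int:
    # A token counts only if it has at least one letter or digit,
    # so a lone "–" or "..." is skipped but "fine!" still counts.
    return sum(1 for token in text.split()
               if any(ch.isalnum() for ch in token))
```

For instance, `"Well – I said so . . . fine!"` splits into 9 tokens but counts as 5 words.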
@mrpasini's stab at the task might be sufficient for @shanden's needs.
I was thinking more about NLP (Natural Language Processing) with Python, because NLTK (the Natural Language Toolkit) is free and very mature. I haven't even looked at Python in ages, though.
As for Microsoft Word files: if they're in .doc or .docx format, macOS has a built-in tool for converting them to other formats.
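On macOS that built-in converter is `textutil` (for example, `textutil -convert txt draft.docx` writes a plain-text `draft.txt` next to the original). If you'd rather stay in a script, a .docx file is a ZIP archive whose body text lives in `word/document.xml`, so a rough extractor needs only the Python standard library. This is a minimal sketch with hypothetical function names: it handles .docx only (not the older binary .doc), and it ignores tabs, line breaks, headers, footers and footnotes.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used inside word/document.xml
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"

def docx_text(source) -> str:
    """Plain text of a .docx (path or binary file object), one line per paragraph."""
    with zipfile.ZipFile(source) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    paragraphs = []
    for para in root.iter(W + "p"):
        # Word may split one word across several runs, so join the
        # w:t pieces of a paragraph without adding spaces.
        paragraphs.append("".join(t.text or "" for t in para.iter(W + "t")))
    return "\n".join(paragraphs)

def docx_word_count(source) -> int:
    return len(docx_text(source).split())
```

Because it reads the document's own text rather than the raw file bytes, this avoids the "wildly inaccurate" problem of counting a .docx as if it were plain text.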
And the shell has the wc command (wc -w) for getting a word count. It's not sophisticated, but it's a mature, robust, and fast Unix tool.
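To use `wc` from a script rather than typing it in the shell, a Python sketch can pipe text through `wc -w`, which counts whitespace-separated words in the same crude way as the macro. The helper name is hypothetical, and it assumes a Unix-like system (such as macOS) with `wc` on the PATH.

```python
import subprocess

def wc_word_count(text: str) -> int:
    """Pipe text through the Unix `wc -w` command and parse its output."""
    result = subprocess.run(
        ["wc", "-w"], input=text.encode("utf-8"),
        capture_output=True, check=True,
    )
    # macOS pads the number with spaces and GNU wc doesn't; split() handles both.
    return int(result.stdout.split()[0])
```

Combined with a plain-text conversion first, this gives the same number you'd get from `wc -w` on the converted file.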
In fact, though, assuming the draft with the most words is the latest version may work for @shaden, but it would seem wiser to look for and report the date the file was last modified (which can of worms I'll leave closed).
I also find it significant that he moves around between Markdown, Word, text and PDF (and maybe not always for one piece). I would assume the comparison would be between similarly formatted files (two Markdown files, say) rather than between different formats (you don't write in PDF, so by definition a PDF would be the latest version, I would think).
For that matter, I'd suggest just comparing file modification dates in the Finder itself when the question arises, rather than fiddling with file names.
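If you do want the modification dates reported by a script instead of reading them in the Finder, a minimal Python sketch could list matching drafts newest first. The function name and the glob pattern are hypothetical; it relies only on the file system's last-modified time, which copying or syncing tools may change.

```python
from datetime import datetime
from pathlib import Path

def drafts_by_mtime(folder: str, pattern: str = "*") -> list:
    """Return (name, last-modified time) pairs for matching files, newest first."""
    files = [p for p in Path(folder).glob(pattern) if p.is_file()]
    files.sort(key=lambda p: p.stat().st_mtime, reverse=True)
    return [(p.name, datetime.fromtimestamp(p.stat().st_mtime)) for p in files]
```

For example, `drafts_by_mtime("~/Writing/Novel".replace("~", str(Path.home())), "draft *.md")` would list the Markdown drafts in that (hypothetical) folder with the most recently edited one first.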
Thank you all for your input, and apologies for the delay in responding: work deadlines and family drama unfortunately got in the way, and I honestly forgot all about this thread.
I will take a closer look at all the suggestions offered.