Filtering large amounts of text in the clipboard

This macro causes the KM engine to crash when the word/line count gets large: around 51,000 lines (227,000 words).

Is there a better way to do this? I surmise it has something to do with the fact I'm using the system clipboard.

Keyboard Maestro 8.2.4 “Word counter” Macro

Word counter.kmmacros (2.6 KB)

I have not tested this, but assuming your data is in files, you might be better off using an Execute a Shell Script action.
Here's a link to the bash commands you could use:
How to Count the Number of lines, Words, and, Characters in a Text File from Terminal?
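
The standard tool for this kind of count is `wc`. Here is a minimal sketch, assuming the text has already been saved to a file. The function name `count_file` and the example path are hypothetical and not part of the original macro:

```shell
# Print the line, word, and character counts of the file given as $1,
# separated by spaces. Reading via "<" keeps the file name out of wc's
# output, and tr strips the padding that some wc versions add.
count_file() {
    lines=$(wc -l < "$1" | tr -d '[:space:]')
    words=$(wc -w < "$1" | tr -d '[:space:]')
    chars=$(wc -m < "$1" | tr -d '[:space:]')
    printf '%s %s %s\n' "$lines" "$words" "$chars"
}

# Usage (hypothetical path):
#   count_file "$HOME/Desktop/big-text.txt"
```

Each count could also be run as its own one-line script if you want the results in separate variables.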

Thank you for the help. I tried implementing the shell command but still hit an error when dealing with large amounts of text:

Task failed with status -1 Macro “Word counter” cancelled (while executing Execute Shell Script).

Is there just a limitation as to how much data can be fed through KM?

Word counter.kmmacros (12.1 KB)

Try the shell commands in Terminal to find out.

My guess is that when KM tries to write the contents of the clipboard (in my case 12 MB of text) to a file, some kind of timeout occurs. But that's my layman's diagnosis.

Good call. The commands worked in Terminal, so the limitation appears to be in KM. I'm not sure whether it's in how I designed the macro or whether KM just can't process a data set of that size.

Any idea what that error means?

Are you using the same exact shell commands in a KM Execute Shell Script as you are in Terminal, including using files, NOT the clipboard?

Please post your KM Execute Shell Script action.

This action is instantiated three times so I save the number of words, characters, and lines to different variables.

Execute a Shell Script.kmactions (782 Bytes)

Hey Paul,

You CAN'T use “With input from” and a file path IN the action itself – they are mutually exclusive.

Change your actions to look like this, and your macro should work.

image

-Chris
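
The screenshot is not preserved here, so the following is only an illustration of the distinction, not the actual action settings. It shows how `wc` behaves in the two modes: with no file argument it reads stdin, and with a file argument it ignores whatever is piped in. The temporary file and sample strings are made up for the demo:

```shell
# A scratch file holding three words (hypothetical sample data).
tmp=$(mktemp)
printf 'one two three\n' > "$tmp"

# Form A: no file argument, so wc counts the two words arriving on stdin.
from_stdin=$(printf 'alpha beta\n' | wc -w | tr -d '[:space:]')

# Form B: a file argument is given, so wc counts the file's three words
# and never reads the piped text at all.
from_file=$(printf 'alpha beta\n' | wc -w "$tmp" | awk '{print $1}')

echo "stdin form: $from_stdin"   # prints "stdin form: 2"
echo "file form:  $from_file"    # prints "file form:  3"
rm -f "$tmp"
```

With a couple of words the ignored input is harmless. With megabytes of clipboard text, nothing ever consumes it, which matters for large inputs.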

That worked! Thank you for your help.

The macro did work, though, on word counts under a certain size, so I wonder what mechanism allows that.

Yes, the Word Count filter seems inexplicably slow. I looked at the code, and it's actually very trivial:

  • Replace all single apostrophes with nothing
  • Replace all sequences of word characters with “x”
  • Replace all non-word characters with nothing
  • Return the length of the string.

But that second step is inexplicably slow. I guess the endless mutation of the mutable string is resulting in the text being copied entirely for each word.

I will look into why it is so slow and resolve it, at least for the word count case. I'll also see if I can resolve it for general regex search & replace, but that may be impossible, as it looks like the system API is just really poorly implemented in this case.
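
For anyone curious, the four steps listed above can be mirrored as a shell pipeline. This is a sketch of the same idea, not Keyboard Maestro's actual code. It assumes "word characters" means letters, digits, and underscore, and the function name `count_words` is made up:

```shell
# Count words on stdin by following the four steps:
#   1. delete apostrophes, so "don't" stays one word
#   2. collapse each run of word characters into a single "x"
#   3. delete everything that is not an "x"
#   4. the number of characters left is the word count
count_words() {
    sed -E -e "s/'//g" -e 's/[[:alnum:]_]+/x/g' \
        | tr -cd 'x' \
        | wc -c \
        | tr -d '[:space:]'
}

printf "don't stop me now\n" | count_words   # prints 4
```

Because `sed` and `tr` stream the text through rather than rewriting one big string in place, this approach does not have the repeated-copy problem guessed at above.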

Did you use the same very large files in your test using Terminal?

Yes. There was no difference between the data I plugged into Terminal directly and the data I plugged into the KM macro. Whether it was a text file or from the system clipboard the result would be the same.

@peternlewis, putting the built-in KM Word Count filter aside for the moment, why would the KM Execute Shell Script fail (time-out) when the same commands work in Terminal?

Because the script never reads the stdin that Keyboard Maestro provides. So the input just sits there, and then either the script will time out without ever reading the input, or you will get a broken pipe because the input cannot be sent and the stdin pipe breaks when the script terminates.

The buffer size of the input pipe affects the behaviour.
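
This is general Unix pipe behaviour and can be reproduced outside Keyboard Maestro. Below is a rough sketch, assuming a pipe buffer somewhere between 1 KB and 10 MB (common systems use tens of kilobytes). The function name `write_to_idle_reader` is hypothetical, and `sleep 1` stands in for a script that never reads its stdin:

```shell
# Write $1 zero bytes into a pipe whose reader sleeps for one second and
# never reads, then print the writer's exit status.
write_to_idle_reader() {
    status_file=$(mktemp)
    # The writer's status goes to a file because the pipeline's own exit
    # status only reflects the last command (sleep).
    { head -c "$1" /dev/zero; echo "$?" > "$status_file"; } 2>/dev/null | sleep 1
    cat "$status_file"
    rm -f "$status_file"
}

# Small input: fits in the pipe buffer, so the writer finishes cleanly
# even though nothing ever reads the data.
small=$(write_to_idle_reader 1000)

# Large input: the buffer fills, the writer blocks, and when the reader
# exits the pipe breaks (typically status 141, i.e. 128 + SIGPIPE, or
# another non-zero status).
large=$(write_to_idle_reader 10000000)

echo "small write: exit status $small"
echo "large write: exit status $large"
```

That would be consistent with the macro working on smaller texts and failing only once the clipboard contents outgrew the buffer.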

This should be resolved in the next version. The action is approximately 100 times faster, on the order of 5,000,000 words per second on my Mac (my Mac is fairly fast of course).
