Check for multiple occurrences of a word in the clipboard

CafeTran_Training · October 7, 2017, 10:14am

Following this idea here, I'm trying to find the fastest way to check for multiple occurrences of a word in the clipboard. In fact, you can reduce this task to finding a second occurrence of any word. This will already make a careful checking of the clipboard content (e..g. a translated sentence) necessary.

Clipboard contains one of these three example lines:

Macro gives either a warning when a word is found several times in (a sentence on) the clipboard or even displays a message/sticky note indicating the word in bold.

How about this approach?

Save the clipboard content in oldClip.
Determine the first word, via a regular expression looking for word boundaries or white spaces. (Or is there a concept of a word in Keyboard Maestro that I can use?) Assign the word to variable findWord
Calculate the length of variable findWord as lengthWord.
Replace all occurrences of findWord in the clipboard with nothing.
Calculate the length of the clipboard: if the length of the clipboard is shorter than oldClip minus lengthWord, then several occurrences were found.
Display a warning.
Determine the second word and check for multiple occurrence.
Etc.

I'm grateful for all suggestions!

peternlewis · October 7, 2017, 11:58am

Here is a possible starting point.

I'm not sure how it handles case, and it bases words just around \w+ which is pretty basic (for example isn’t is two words). But it's perhaps a way to get started.

Keyboard Maestro Actions.kmactions (5.4 KB)

gglick · October 7, 2017, 12:45pm

Here's another possible method I happened to come up with independently of @peternlewis's macro (though I'm glad to see there are some close similarities; perhaps a sign that great minds really do think alike? ):

Count Multiple Word Occurrences in Clipboard.kmmacros (6.0 KB)

(my regex is a tad more complex, perhaps unnecessarily so, but it does handle words with apostrophes like "isn't" "can't" and so on, so at least it has that going for it )

The route I decided to go with is to list all words that occur twice or more and exactly how many times they occur. When run on the example text given, it produces this:

The obvious downside with this approach is that it also shows words that must necessarily be repeated often within a given text (like "the" "it" "this" and so on) but at least it should suffice for finding multiply occurring words for now. If displaying common words like the examples just given too often becomes a problem, it can be modified fairly easily to include a blacklist of sorts so it doesn't include specified common words.

Tom · October 7, 2017, 12:52pm

Here a third approach:

Output:

[test] Find Lines with Repeating Words [v1.1].kmmacros (6.7 KB)

Edit:

Please download the macro again (-> v1.1). I fixed a bug (variable was not reset) and set the minimum word length to 3 chars. (You can adjust this in the regex.)

CafeTran_Training · October 8, 2017, 7:22am

Thank you all for your kind answers! I've decided to work with Peter's solution.

I'm trying to expand the macro with two features:

Ignore words on a list with words to ignore
Surround all duplicates on the clipboard with a string to mark them (or even better: make them bold on the clipboard) and then display the original sentence with the duplicates marked.

I've now defined one word ('to') to ignore, but I'd like to list these words in a file (each word on a line). I've tried and tried but cannot get it working. I'd really appreciate a pointer here, where and how to insert the command(s) to check for words in a file like words_to_ignore.txt.

Test line: Don't forget to check each and every segment to check for duplicates.

Result of the macro:

(Is there a way to make the word 'check' bold on the clipboard?)

Warn at multiple occurrences of words.kmmacros (10.1 KB)

CafeTran_Training · October 8, 2017, 7:46pm

Yes, there is, it's actually quite simpel, see the last actions of the macro:

Result:

Now the only question remains: how to read the words to ignore in this duplicates test from a file?

Updated macro:

Warn at multiple occurrences of words.kmmacros (11.5 KB)

Tom · October 8, 2017, 10:31pm

Just a note on the character class in your regex:

[A-zÄÖÜäöüßéèï’']

Unless you are aiming to exclude certain letters you could simply use the unicode Letter class: [\p{L}’']

This would also include some missing west european letters like É È î Î ê ç Ç ñ Ñ etc.

See Unicode categories on regular-expressions.info

JMichaelTX · October 8, 2017, 11:44pm

That's quite clever, and has many other use cases:

First, apply HTML formatting tags

and then Convert HTML to Rich Text

CafeTran_Training · October 9, 2017, 6:36pm

I can see how relevant this setting to 3 chars is! Thanks for that. (I got the 'I' recognised in words like 'insert'.)

Tom · October 9, 2017, 7:15pm

This shouldn’t have anything to to do with the min 3 chars setting. Parts of words should be excluded by the word boundary (\b).

I’m not at my Mac ATM to check it.