Check the Spelling of Text on the Clipboard

Fine!

If you don’t want to change the LANG setting of KM globally, you can also prepend it directly in the shell script action. The command would then look like this:

export LANG=en_US.UTF-8
aspell list | sort

Well, since I'm being high maintenance at this point - do you know of any way to eliminate capitalized words from the result? The last thing I need to do is get rid of proper names so that they're not included.

Sure? This would also eliminate all words at the beginning of a sentence, no?

The proper way would probably be to give Aspell a list with exclusions (blacklist). I don’t know Aspell but I’m pretty sure this is possible, and, it should also be possible to find a pre-made list with common proper names.

Or: try to find a different dict list that contains more proper names than the default list. (I don’t know what Aspell is using as default list.)

Something like this.

Okay - this looks good, I'll tinker around and see what I can come up with. If I can get it to work, I'll post my macro here for future reference! :slight_smile:

In another forum, someone suggested:

    LC_ALL=C grep -v '[A-Z]'

which seems to work well!

Hmm, doesn’t work for me.

With this sample text…

mispeledword parent question skepticism

Note: The spaces are all no-break spaces (U+00A0)!

… I get the following results with the respective setup:


aspell list | sort

mispeledwordÂ
parentÂ
questionÂ

-> Wrong


LC_ALL=C grep -v '[A-Z]'
aspell list | sort

mispeledword parent question skepticism

-> Wrong


LC_ALL=C
aspell list | sort

mispeledwordÂ
parentÂ
questionÂ

-> Wrong


export LC_ALL=C
aspell list | sort

mispeledwordÂ
parentÂ
questionÂ

-> Wrong


export LC_ALL=en_US.UTF-8
aspell list | sort

mispeledword

-> Correct


export LANG=en_US.UTF-8
aspell list | sort

mispeledword

-> Correct


export LANG=fr_FR.UTF-8
aspell list | sort

mispeledword
skepticism

-> Correct



Here is a little test macro:

[Test] Aspell with non-ASCII chars and different language settings.kmmacros (7.5 KB)

When testing, make sure you do not have any ENV_LANG or ENV_LC_ALL variables stored in Keyboard Maestro!

To test, first run the action that sets the variable to the sample text (right-click on the action > Try), then right-click and choose Try on each of the test actions in turn.

Interesting - I ran it as two separate actions - the first one:

aspell list | sort

Then:

LC_ALL=C grep -v '[A-Z]'

It got rid of all of the capitalized words from the first action, but it didn't detect misspelled words in the body of the text, only if they were on their own line... I'm going to try this with the ones you've provided that yield correct results and see if it will grab the misspelled words from the body of the text...

Okay, so here's what worked for me:

Spell check on Named Clipboard without Capitalized Words.kmmacros (3.3 KB)

I basically used one clipboard to catch all of the misspelled words, then filtered just those words with a second "final" clipboard to eliminate the capitalized words.

As said, by doing that you will ignore all misspelled words at the beginning of sentences. Unless I’m misunderstanding something.

Try this:


Spell check on Named Clipboard without Capitalized Words [mod].kmmacros (2.9 KB)

For grepping the capitalized words you don’t need to set the LC_ALL — unless I’m misunderstanding your purposes. Just grep -v '[A-Z]' does the same.

Besides that, I doubt that setting LC_ALL without export works at all; in my previous test it didn’t. And if it does work, you are probably overriding your stored ENV_LANG variable (set to en_US.UTF-8), which seems to be necessary if you want to avoid “â” or similar artifacts when your source text is not plain ASCII.
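Whether an assignment reaches a child process such as aspell or grep depends on how it is written. The following is only a sketch of the three forms seen in this thread. It uses a made-up variable, DEMO_SETTING (an assumption for illustration, not something Aspell or Keyboard Maestro reads), so that it does not disturb any real locale settings:

```shell
#!/bin/sh
# DEMO_SETTING is a hypothetical variable used only for illustration.

# Print the value a child process sees for DEMO_SETTING.
show_child_view() {
    sh -c 'printf "%s\n" "${DEMO_SETTING:-unset}"'
}

unset DEMO_SETTING

# 1. Plain assignment on its own line: stays local to this shell.
DEMO_SETTING=C
plain=$(show_child_view)

# 2. Prefix assignment: passed to that one command only.
unset DEMO_SETTING
prefix=$(DEMO_SETTING=C sh -c 'printf "%s\n" "${DEMO_SETTING:-unset}"')

# 3. export: visible to every child process started afterwards.
export DEMO_SETTING=C
exported=$(show_child_view)

printf 'plain:  %s\nprefix: %s\nexport: %s\n' "$plain" "$prefix" "$exported"
```

One caveat: if a variable is already exported when the script starts, which is often the case for LANG, a plain assignment changes the exported value too. That would be consistent with the concern about overriding a stored setting.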

Hey Scott,

Of course not – grep operates on a per-line basis. The -v switch inverts what's returned, so your upper-case items are eliminated line-by-line.

It's aspell that detects the misspelled words and then returns them as a list of one word per line when used with the list switch.
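A small sketch of that per-line behaviour, using made-up sample input ("Nietzsche" is just an example proper name) and a helper name, drop_capitalized, chosen here for illustration:

```shell
#!/bin/sh
# Drop every input line that contains an ASCII capital letter.
drop_capitalized() {
    LC_ALL=C grep -v '[A-Z]'
}

# One word per line (the shape of `aspell list` output):
# only the line "Nietzsche" is removed.
printf '%s\n' Nietzsche mispeledword telos | drop_capitalized

# Running prose: the first line contains a capital letter, so the
# whole line is removed, misspelled word included.
printf '%s\n' 'The mispeledword is here.' 'mispeledword' | drop_capitalized
```

This is why the filter is only useful after aspell has already reduced the text to one word per line.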

Please explain why you are removing ALL words with capital letters in them.

Also – why are you fooling around with multiple clipboards when it makes more sense to use variables?

Tom is right as far as I can see in everything he says about this.

I was going to post another macro with an available exclusion list, but especially due to the possibility of UTF-8 characters this job is getting rather complex for the shell.

My inclination is to move things into Perl, but I'm too tired to fool with that tonight.

-Chris

I ended up getting rid of the LC_ALL and started making use of the variables, with only the original text going to a clipboard - variables are easier to work with.

For my purposes, I'm not too concerned with capitalized words that begin a sentence because generally they are not misspelled (due to repetition of the writer). The reason I wanted to get rid of capitalized words was to eliminate proper nouns from the result list. Names, especially last names, show up on the list, but aren't necessarily misspelled.

Just to take this a step further - and something that @Tom referred to was making a "whitelist" of industry specific words. For example, I work in philosophy and words like "telos" or "eudaemonia" are spelled correctly but show up as results from Aspell. If anyone has a way to, say, create a .txt file that could be referenced where I could just add words on their own line as I come across them - that would be fantastic! Or, perhaps, there's a way to add words to the dictionary?

I'm getting "ve" as a result any time "I've" shows up - so it would be nice to add "ve" to the safe list and just have a way to refine this a bit.

That said, this has gone above and beyond my expectations - so thanks again, everyone!!
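One possible way to get the plain .txt "safe list" described above, without touching Aspell's own dictionaries, is to filter Aspell's output against the file with grep. This is only a sketch: the helper name filter_allowed and the temporary allowlist file are assumptions made for illustration, and the printf line stands in for the real output of aspell list | sort.

```shell
#!/bin/sh
# filter_allowed FILE: read candidate words on stdin (one per line) and
# print only those that are NOT listed in FILE (also one word per line).
filter_allowed() {
    # -v invert the match, -x match whole lines only,
    # -F treat patterns as fixed strings, -f read patterns from FILE
    grep -vxFf "$1"
}

# Hypothetical allowlist; in practice this would be a permanent .txt
# file that you append words to as you come across them.
allowlist=$(mktemp)
printf '%s\n' telos eudaemonia ve > "$allowlist"

# Stand-in for the output of `aspell list | sort`.
printf '%s\n' eudaemonia mispeledword telos ve | filter_allowed "$allowlist"

rm -f "$allowlist"
```

Because of -x, a listed word such as "ve" removes only the exact line "ve" and does not affect longer words that merely contain it.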

Hi Scott,

Read through Aspell’s man page. It describes at least one option for using custom word lists.

To open the man page type man aspell in the Terminal, followed by the Return key.

The language setting (LC_ALL or LANG) has nothing to do with whether you use variables or clipboards.

Without a proper UTF-8 setting you definitely will see those “weird” characters, as you already have experienced. Unless you are always using ASCII-only texts. But keep in mind, even such trivial and common things like no-break spaces are enough to mess up your output, if you don’t use UTF-8 as language setting.
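To see why a no-break space is enough to cause trouble, it can help to look at its bytes. In UTF-8, U+00A0 is the two-byte sequence c2 a0, and a program that assumes a single-byte encoding such as Latin-1 reads the byte c2 as the letter "Â". That is presumably where the stray "Â" in the results above comes from. A minimal sketch (the helper name hex_bytes is made up for this example):

```shell
#!/bin/sh
# Print the bytes of stdin as space-separated hex pairs.
hex_bytes() {
    od -An -tx1 | tr -s ' \n' ' ' | sed 's/^ //; s/ $//'
}

# "a", then NO-BREAK SPACE (U+00A0) encoded as UTF-8, then "b".
printf 'a\302\240b' | hex_bytes
```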

You can also set language and encoding directly as an Aspell parameter (see the man page), but I suggest saving it as the KM variable ENV_LANG. This way it is out of the way, and you will probably need UTF-8 for other scripts/macros as well.

This aspell tool is really nice. I like this interface to correct or add words:

One can check a file with this command line:

aspell -c 1.txt -l de

If one would like to check and modify the content of the clipboard, is there a simpler way than writing the clipboard content to a file, running the command line above from KM, and afterwards reading the content of the modified file back to the clipboard?


Hey Hans,

No. Not as far as I can tell.

There might be a fiddly way of using redirects in Bash 4, but that's a lot more trouble than simply using a temp file.

#!/usr/bin/env bash
# Write the Clipboard to a temp-file.
# Spell-check with `aspell`.
# Write the spell-checked temp-file back to the clipboard.

tempFile=$(mktemp)
pbpaste > "$tempFile"
aspell -c "$tempFile"
# Redirect the file into pbcopy directly; no need for cat.
pbcopy < "$tempFile"
# Clean up the temp-file.
rm -f "$tempFile"

In order to get aspell's interactive UI, you have to run this script from Terminal.app.

-Chris