Cyrillic text handling in KM variables

UncleWalrus · November 2, 2020, 3:38pm

I do linguistics and work extensively with text in non-Latin scripts, chiefly Cyrillic. I'm hoping that other users can shed light on what KM is doing internally in terms of character set encoding, because it doesn't seem to handle Cyrillic well.

I have a sqlite3 database with Russian terms, encoded in UTF-8. My KM action takes a word off the macOS clipboard, puts it into a KM variable and launches a Perl script to query the db and return a numeric parameter from it. Although the action correctly generates the query based on the KM variable, the Russian term in it appears to be in an encoding that the sqlite3 engine doesn't recognize.

For example, this is a proper query:

SELECT rank FROM corpus WHERE word LIKE 'удовлетворительный'

but it returns no rows because KM is doing something to the encoding of the Russian term that I can't seem to account for. I suspect this is an encoding issue in the KM variable because simply re-typing the Russian term inside the single quotes and executing the query directly in sqlite3 works as expected. In any case, here's the script:

#!/usr/bin/perl

use DBI;
use DBD::SQLite;
use feature 'unicode_strings';

my $db_path = "/Users/alan/Documents/dev/RussianNationalCorpus";
my $dbh = DBI->connect("dbi:SQLite:dbname=$db_path","","");
my $query_word = $ENV{KMVAR_ru_word};
my $query = "SELECT rank,word,lemma FROM corpus WHERE word = '$query_word'";
my $sth = $dbh->prepare($query);
my $rv = $sth->execute();

my $rank;
while(my @row = $sth->fetchrow_array()) {
	$rank = $row[0];
}
print $rank; print "\n";
$dbh->disconnect();

Playing around with encoding the variable into UTF-8 inside the script did not make any difference. Thinking it's something strange about how Perl is handling the encoding, I rewrote the script in Python. Same thing. Some queries work, others do not. Any query with the letter "й" never works. I've solved the problem by just piping the macOS pasteboard to another tool, bypassing (I think) KM's variable and clipboard handling. This always works:

#!/usr/local/bin/zsh

pbpaste | xargs /Users/alan/Desktop/GetRNCRank

So, in summary, it seems that KM is doing something to the encoding of non-Latin characters behind the scenes. Anyone else encounter this? Workarounds?
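
One thing worth checking, offered as a guess rather than anything established in this thread: "й" can be written in Unicode either as a single precomposed code point (U+0439) or as "и" followed by a combining breve (U+0438 U+0306). The two look identical on screen but do not compare equal byte for byte, which would fit the pattern of only some words failing. A minimal Python sketch for inspecting this is below. The helper names `code_points` and `same_after_nfc` are made up for illustration, and it is only an assumption that the database stores the precomposed form.

```python
import unicodedata


def code_points(text):
    """List each character of `text` as a U+XXXX code point string."""
    return [f"U+{ord(ch):04X}" for ch in text]


def same_after_nfc(a, b):
    """Compare two strings after normalizing both to NFC (precomposed form)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


precomposed = "\u0439"       # й as a single code point
decomposed = "\u0438\u0306"  # и followed by COMBINING BREVE

print(code_points(precomposed))
print(code_points(decomposed))
print(precomposed == decomposed)                # raw comparison
print(same_after_nfc(precomposed, decomposed))  # comparison after normalization
```

Running `code_points` on the value the script actually receives would show which form it arrives in. If it turns out to be the decomposed form, normalizing to NFC before building the query may be enough.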

peternlewis · November 2, 2020, 3:41pm

See:

https://wiki.keyboardmaestro.com/action/Execute_a_Shell_Script#UTF-8_and_Non-ASCII_Characters

If you are dealing with non-ASCII characters, you probably want to set the LC_ALL environment variable to UTF8, which you can do by setting the Keyboard Maestro variable ENV_LC_ALL to “en_US.UTF-8”.

By default (v9.0+), if you have not set these environment variables they will be set to UTF-8 for you.
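
As a rough illustration of the script side of this advice, the Python sketch below reads the Keyboard Maestro variable as raw bytes and decodes it as UTF-8 explicitly, so the result does not depend on whichever locale the script inherits. It also prints `LC_ALL` so you can see what the script was actually given. The helper names `read_km_variable` and `lookup_rank`, the in-memory table, and the rank value are all hypothetical stand-ins; only the `KMVAR_ru_word` variable and the `corpus` columns come from the post above.

```python
import os
import sqlite3


def read_km_variable(name, environ=None):
    """Read the Keyboard Maestro variable `name` from the environment as UTF-8 text."""
    # os.environb exposes the raw bytes of the environment on Unix-like systems.
    environ = os.environb if environ is None else environ
    raw = environ.get(b"KMVAR_" + name.encode("ascii"), b"")
    # Decode explicitly instead of relying on the locale-derived default.
    return raw.decode("utf-8")


def lookup_rank(conn, word):
    """Return the rank stored for `word`, or None if no row matches."""
    # A bound parameter avoids interpolating the word into the SQL string.
    row = conn.execute("SELECT rank FROM corpus WHERE word = ?", (word,)).fetchone()
    return row[0] if row else None


if __name__ == "__main__":
    # Hypothetical in-memory stand-in for the real corpus database.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE corpus (rank INTEGER, word TEXT)")
    conn.execute("INSERT INTO corpus VALUES (?, ?)", (1, "удовлетворительный"))
    print("LC_ALL =", os.environ.get("LC_ALL"))
    print(lookup_rank(conn, read_km_variable("ru_word")))
```

If `LC_ALL` prints as `None` or as something other than a UTF-8 locale, setting `ENV_LC_ALL` in Keyboard Maestro as described above is the first thing to try.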
