KM shell and diacritic variable names / values

It works quite well, but how can I "help" echo display what is expected?

KM shell and diacritic variable names : values.kmmacros (27.4 KB)

Thanks,
--Alain

Hey Alain,

Please post the actual macro, so people don’t have to recreate it to test.

-Chris

You are right: Done :wink:

I suspect this is really a bash issue (or an OS X issue) with little or nothing to do with Keyboard Maestro.

Keyboard Maestro sets the environment variables via a dictionary of high-level, full Unicode strings. What happens after that is largely outside Keyboard Maestro’s control.

But for example:

% KMVAR_mg__test_appele=hello
% KMVAR_mg__test_appelé=hello
sh: KMVAR_mg__test_appelé=hello: command not found

I’ve searched around but cannot find much that is definitive on the behaviour of non-ASCII environment variable names, I’m afraid.

Thanks Peter for your testing.

I am not a shell expert :wink: but reading the bash man page:

  • From the Definitions section in the manual page of bash:

name A word consisting only of alphanumeric characters and underscores, and beginning with an alphabetic character or an underscore. Also referred to as an identifier.

  • From the Parameters section in the manual page of bash:

A parameter is an entity that stores values. It can be a name, a number, or one of the special characters listed below under Special Parameters. A variable is a parameter denoted by a name.
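The definition of a name quoted above can be turned into a quick check, which shows why a variable name ending in é is rejected. This is only a sketch for bash: `is_shell_name` is a hypothetical helper, not something bash or Keyboard Maestro provides, and it forces the C locale so that the bracket ranges match ASCII letters only.

```shell
#!/bin/bash

# Hypothetical helper: succeeds only if $1 is a valid bash "name"
# (ASCII letters, digits and underscores, not starting with a digit).
is_shell_name() {
    # Force the C locale inside the function so that ranges such as A-Z
    # cannot match accented letters.
    local LC_ALL=C
    case "$1" in
        '' | [!A-Za-z_]* | *[!A-Za-z0-9_]*) return 1 ;;
        *) return 0 ;;
    esac
}

for candidate in 'KMVAR_mg__test_appele' 'KMVAR_mg__test_appelé'; do
    if is_shell_name "$candidate"; then
        echo "$candidate: valid name"
    else
        echo "$candidate: not a valid name"
    fi
done
```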

and from the tcsh man page:

Native Language System support (+)
The shell is eight bit clean (if so compiled; see the version shell variable) and thus supports character sets needing this capability. NLS support differs depending on whether or not the shell was compiled to use the system’s NLS (again, see version). In either case, 7-bit ASCII is the default character code (e.g., the classification of which characters are printable) and sorting, and changing the LANG or LC_CTYPE environment variables causes a check for possible changes in these respects.

When using the system’s NLS, the setlocale(3) function is called to determine appropriate character code/classification and sorting (e.g., a ’en_CA.UTF-8’ would yield “UTF-8” as a character code). This function typically examines the LANG and LC_CTYPE environment variables; refer to the system documentation for further details. When not using the system’s NLS, the shell simulates it by assuming that the ISO 8859-1 character set is used whenever either of the LANG and LC_CTYPE variables are set, regardless of their values. Sorting is not affected for the simulated NLS.

Is there some hope of this working under tcsh?

–Alain

My first thought would be to ensure that your UTF8 language preferences are exported to the Keyboard Maestro instance of the shell:

#!/bin/bash
LANGSTATE="$(defaults read -g AppleLocale).UTF-8"
if [[ "$LC_CTYPE" != *"UTF-8"* ]]; then export LC_ALL="$LANGSTATE" ; fi

but perhaps that’s not enough.

:arrow_down:

Not enough indeed :wink: but thank you for your help.
--Alain

tcsh gets closer but still seems to strip the diacritics from the variables when accessing them. For example:

#!/bin/tcsh

setenv LANG en_US.UTF-8
setenv LC_CTYPE en_US.UTF-8
setenv LC_ALL en_US.UTF-8
setenv LANGSTATE en_US.UTF-8

env | grep KMVAR_mg__test_appelé
echo $KMVAR_mg__test_appelé

displays this:

KMVAR_mg__test_appelé=BlÉÈŒÀâöabla NOP 
KMVAR_mg__test_appele: Undefined variable.

Note how the accent is removed from the variable name in the error.

It’s probably possible somehow with tcsh, but I can’t see how at the moment. Might require reading through the source code.

I suppose you could always do an end-run around the environment variables. This works:

#!/bin/tcsh

set v=`osascript -e 'tell app "Keyboard Maestro Engine" to get value of variable "mg__test appelé"'`
echo $v
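Another possible end-run, sketched here for bash rather than tcsh: the value is still in the process environment even though the shell cannot name it as a variable, so an external tool such as `printenv` can look it up by its exact name. `km_env_get` is a hypothetical helper name, and the `env` line only simulates an entry like the ones Keyboard Maestro sets; this is untested inside a Keyboard Maestro macro. The name has to match byte for byte, so a precomposed é and an e followed by a combining accent count as different names.

```shell
#!/bin/bash

# Hypothetical helper: print the value of the environment variable whose
# name is exactly $1. printenv compares raw bytes, so the name does not
# have to be a valid shell identifier.
km_env_get() {
    printenv "$1"
}

# Simulate an environment entry with an accented name. The name and value
# here are made up for the demonstration.
demo_value=$(env 'KMVAR_demo_appelé=hello' printenv 'KMVAR_demo_appelé')
echo "$demo_value"
```

In a macro's script the call would look like `v=$(km_env_get 'KMVAR_mg__test_appelé')`, assuming the shell passes the entry on to its child processes.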

Here the very same code gives me:
KMVAR_mg__test_appelÃ: Undefined variable.

And:

env | egrep KMVAR_mg__test_appel
echo $KMVAR_mg__test_appelé

leads to

KMVAR_mg__test_appelé=BlÉÈŒÀâöabla NOP
KMVAR_mg__test_appelÃ: Undefined variable.

This is very basic too (hopefully) :wink:

--Alain

Yep. So as far as I can see, Keyboard Maestro puts them into the environment in the only possible way, and then the various shells simply do not support them from there. Your (@alain) results look like your shell is set to ISO-8859 (or some other 8-bit character set) instead of UTF-8, which is why you are seeing the bogus results from env | grep.

What do I have to do to test switching to UTF-8?

The setenv stuff should do something:

setenv LANG en_US.UTF-8
setenv LC_CTYPE en_US.UTF-8
setenv LC_ALL en_US.UTF-8
setenv LANGSTATE en_US.UTF-8

I don’t really know which ones are required, what they do, or whether there are others that are also important, or some other setting.
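As far as the POSIX locale rules go, the character set is taken from `LC_ALL` if it is set and non-empty, otherwise from `LC_CTYPE`, otherwise from `LANG`. A minimal sketch for checking which value wins in a bash or zsh script follows; `effective_ctype` is a hypothetical helper, and `LANGSTATE` is not a standard locale variable, so it is left out.

```shell
#!/bin/bash

# Hypothetical helper: print the locale name that decides the character set,
# following the precedence LC_ALL > LC_CTYPE > LANG (default "C").
effective_ctype() {
    echo "${LC_ALL:-${LC_CTYPE:-${LANG:-C}}}"
}

effective_ctype

# The system's own view: prints the codeset in use, such as UTF-8.
if command -v locale >/dev/null 2>&1; then
    locale charmap
fi
```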

@JMichaelTX please see this post

I read it. Looks like an unresolved issue.

If you have text you'd like to add to the Wiki concerning this, please let us know. Otherwise, it is out of my skill/KB.

That issue has absolutely nothing to do with AppleScript -- it's a Shell Script problem.

-Chris

@JMichaelTX, @ccstone Please don't mind, it's just a misunderstanding (maybe I wasn't explicit enough): I was just pointing out to @JMichaelTX that I had already encountered difficulties with KM variables containing diacritical characters, due to the shell interpretation through the natural interface in reaction at

Hoping to have been more clear :wink:

Cheers,
-Alain


There's monkey business when involving the shell with Unicode characters.

Accented Varaible Names -- 2-macros -- Test Group Macros.kmmacros (9.8 KB)

Note that the macro with the variable name ending in an accented character passes that character to the output...

-Chris


@ccstone Thank you Christopher for these careful and clear tests.

1- For the few KM :wink: users who want to follow the discussion without doing these tests themselves here are their family portraits:


variable-name-without-accented-character--result


variable-name-with-accented-character--result

2- AppleScript appendix question: what is the instruction for?

set AppleScript's text item delimiters to ""
(characters of l) as text

since deleting it (apparently) produces the same result?

-alain


Folks who are new to shell scripting and who might be looking here for help/advice/guidance should note that the above syntax is for tcsh.

It is more common to use bash or, nowadays, zsh, since zsh is now the default shell on macOS.

In those, you assign variables differently:

export LC_ALL=en_US.UTF-8
export LC_CTYPE=en_US.UTF-8
export LANG=en_US.UTF-8
export LANGSTATE=en_US.UTF-8

There are also cases where you might need to use:

LC_ALL=C

if you have a tool that is being fed UTF-8 but can't handle UTF-8, that might help.
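A sketch of that last case: prefixing a single command with `LC_ALL=C` changes the locale for that command only, so the rest of the script keeps its UTF-8 settings. The sample string and the choice of `tr` are illustrative assumptions; here `tr` works byte by byte and deletes every non-ASCII byte.

```shell
#!/bin/bash

# "café" with the é written as its two UTF-8 bytes (octal 303 251).
sample=$(printf 'caf\303\251')

# LC_ALL=C applies to tr alone; it then treats the input as single bytes
# and removes everything in the range 0x80-0xFF.
ascii_only=$(printf '%s' "$sample" | LC_ALL=C tr -d '\200-\377')
echo "$ascii_only"
```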
