How to Reverse Characters in a Variable

Weird. I’ve run it in Script Editor now and still get the same (wrong) result:

[screenshot]

I think you see what I meant with “detached”: The ̛̣ went to the “u” (formerly on the “o”) and the ́̂ went before the “e” (formerly on the “e”).

Have you tried my macro from above? The script actions I have marked orange and red, are they also producing correct results on your system?

No, your very first version produces this (on my system!):

ÅÃÇÃeëƒ u£ÃõÃoõÃuR

I do see what you mean, and it's odd that it's happening for you and not for me. Here are my system details: AppleScript version 2.7, system version 10.13.6, although I don't think that's especially pertinent to what's going on.

As some diacritics are character modifiers that operate upon a preceding character, it almost seems as if the reversal caused the diacritics to be removed for the text transformation, then reapplied afterwards when the base characters were in reverse order. Whether that would be the AppleScript engine processing the characters wrongly, or the text views of Script Editor drawing them wrongly, I couldn't guess. It could even be the fonts being used in Script Editor, some of which could lack full Unicode support. You could try Script Debugger, though I suspect the result will be the same as it uses the same AppleScript engine, the same compiler, the same fonts, ...

Yes, there we match. And that result is perfectly expected from what I know about Keyboard Maestro and the pros and cons of the various different methods of reading its variable data.

I have not as yet, but I will do so and report back my findings.

But I did try it in a shell, which can be fussy with non-Roman characters, but it seems happy with them in this instance:

In the shell I’m also getting the same as with Script Editor or KM:

[screenshot]

(I appended a space to the string for better visibility.)

I’m using DejaVu in Script Editor (certainly one of the most Unicode-capable fonts) and SF Mono in the Terminal.

If it’s a font problem you should see it already with the input string, for example:

[screenshot]

Yes, Script Debugger gives the same.


I’m on macOS 10.14.1, AppleScript version 2.7.

This is the shell env:

TERM_PROGRAM=Apple_Terminal
SHELL=/bin/bash
TERM=xterm-256color
CLICOLOR=1
TMPDIR=/var/folders/wn/28w_v3513m50gcc9qtvg3bfh0000gn/T/
PERL5LIB=/Users/tom/perl5/lib/perl5
Apple_PubSub_Socket_Render=/private/tmp/com.apple.launchd.h6zvb2xn2V/Render
TERM_PROGRAM_VERSION=421.1
PERL_MB_OPT=--install_base "/Users/tom/perl5"
TERM_SESSION_ID=0645EEDB-5FE2-470F-BE4C-4E601362C25E
LC_ALL=en_US.UTF-8
USER=tom
SSH_AUTH_SOCK=/private/tmp/com.apple.launchd.yB3a2jkO6w/Listeners
PATH=/Users/tom/perl5/bin:/Users/tom/bin:/usr/local/bin:/usr/bin:/bin:/usr/local/sbin:/usr/sbin:/sbin:/Users/tom/ConTeXt/Beta/tex/texmf-osx-64/bin:/Applications/VMware Fusion.app/Contents/Public:/opt/X11/bin
PWD=/Users/tom
EDITOR=/usr/local/bin/bbedit
LANG=en_US.UTF-8
XPC_FLAGS=0x0
XPC_SERVICE_NAME=0
SHLVL=1
HOME=/Users/tom
PERL_LOCAL_LIB_ROOT=/Users/tom/perl5
LOGNAME=tom
VISUAL=/usr/local/bin/bbedit
LC_CTYPE=UTF-8
DISPLAY=/private/tmp/com.apple.launchd.zmpTruViEy/org.macosforge.xquartz:0
PERL_MM_OPT=INSTALL_BASE=/Users/tom/perl5
_=/usr/bin/env

One thing that changes the behavior in the Terminal is the “wide” setting for East Asian in the preferences. But selecting it makes it worse.

[screenshot]

I think I've got to the bottom of why it's happening. I'll update you a bit later when I have some extra time. But, the long and short of it is that the id property isn't well suited for string reversals, and it seems my earlier hunch about the way modifying diacritical characters are added onto a base character was a pretty good guess.

Yes, that’s what I thought initially. (Comparable to Perl’s reverse, which is usable here only if the string is split by grapheme clusters: \X, script “1)” from here.)

But then you showed me that this is not the case on your computer

Hey Guys,

I wasn't having any problems when reversing this string on my macOS 10.12.6 system:

“Rượu đế”

But when I started testing @CJK's ID method with the longer “Rượu đế | résumé | Käse | 123” test case I ended up with decomposition:

“321 | esäK | émusér | ́̂eđ ựơuR”

-Chris

Chris, in case you have downloaded my macro with the script collection, can you confirm my results on your system, or do you get different results for some of the other scripts too? (With the long and the short test case.)

I would really like to know if those different results are limited to the AppleScripts, or if it is something more general…

Hey @Tom,

As far as I can see I get the same results as you when running through your test macro.

I wasn't able to test your Perl script that uses the GCString module, though. It's not on my system, and I'm not seeing it when searching MacPorts, so I don't want to mess with trying to install it at the moment.

-Chris

That's interesting. What gave you the inkling that the id property wouldn't work as well?

I investigated this disparity a bit more and found a partial explanation. The characters @ccstone and I were using in the tests that succeeded on our systems were different to the characters you were using that failed on your system. Here's your test string, taken from the testing macro you shared:

① Rượu đế

And here's the test string that I copied-and-pasted from your post that originally cited these erroneous results:

You can't really blame me for thinking that they are identical. However, when I saw that other methods in the collection were throwing out errors of the same nature, while I was still getting a positive result in Script Editor, it occurred to me that the issue stemmed from the nature of the input. So, I ran this command:

use framework "Foundation"

① set a to current application's NSString's stringWithString:"Rượu đế"
② set b to current application's NSString's stringWithString:"Rượu đế"

a's isEqualToString:b --> false

In fact, thanks to the Menlo font, you can see a demonstrable visual difference between the two strings, most noticeably in the topmost diacritics above the e:

  • Rượu đế | Rượu đế • seemingly the same
  • Rượu đế | Rượu đế • noticeably variant as soon as the code block encloses them

Here's the output of an AppleScript acting on these two variants:

② id of "Rượu đế" --> {82, 432, 7907, 117, 32, 273, 7871}
① id of "Rượu đế" --> {82, 117, 795, 111, 795, 803, 117, 32, 273, 101, 770, 769}

As you can see, ① is composed of 12 separate Unicode code points, 5 of which are modifying characters (combining diacritics), and these render as a string 7 characters long; ② has precisely 7 individual code points, all of which are precomposed entities.
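The two counts can also be checked outside AppleScript. Here is a rough sketch using Python's standard unicodedata module; the variable names are mine, and both strings are built from the code point lists quoted above, so the result does not depend on how this page stores the characters.

```python
import unicodedata

# Code point lists copied from the two `id of` results above.
decomposed = "".join(map(chr, [82, 117, 795, 111, 795, 803, 117, 32, 273, 101, 770, 769]))
precomposed = "".join(map(chr, [82, 432, 7907, 117, 32, 273, 7871]))

# Number of code points in each variant.
print(len(decomposed), len(precomposed))  # 12 7

# A plain comparison sees two different strings.
print(decomposed == precomposed)  # False

# After normalising to the composed form (NFC) they compare equal.
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True

# Count the combining marks in the decomposed variant.
marks = sum(1 for ch in decomposed if unicodedata.combining(ch))
print(marks)  # 5
```

The NFC step is the "normalisation" mentioned in this thread: it folds each base letter and its combining marks into the precomposed code point where one exists.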


@Tom, I'm guessing you originally composed or obtained that Vietnamese string from a source that made use of combining diacriticals, and this became the test case for your macro. But, when it was transcribed to Keyboard Maestro from where I copied the text, the multi-character entities were replaced by their precomposed counterparts, which I think is a process called normalisation. Thus, when I came to test my handler, it appeared to pass with flying colours.

The undesirable outcomes of my handler, and of a couple of the others, are a result of exploding the word not just into characters, but decomposing the characters into their individual entities. If one simply recombines them, there's nothing to see that's out of the ordinary:

① character id (id of "Rượu đế") --> "Rượu đế"

But when one applies a reverse transform:

① character id (reverse of id of "Rượu đế") --> "́̂eđ ựơuR"

each of the diacritical marks has been displaced to the left, which now makes sense: the 7 characters were decomposed fully into 12 entities, so when recombined into a string, ended up modifying the character that was previously to the other side of it, resulting in new character compositions. (When the reverse transform is applied again—being a symmetrical transformation—the original string is restored.) Obviously, when operating on the precomposed version of the string, this issue never arises.
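The displacement described here is easy to reproduce in Python. A minimal sketch, assuming the decomposed variant is equivalent to the NFD form of the precomposed string (which matches the code point lists posted earlier); the names are illustrative only.

```python
import unicodedata

precomposed = "".join(map(chr, [82, 432, 7907, 117, 32, 273, 7871]))
decomposed = unicodedata.normalize("NFD", precomposed)  # 12 code points

# Reversing code point by code point puts every combining mark in front of
# the base letter it belonged to, so it attaches to a different letter.
naive = decomposed[::-1]
print([ord(ch) for ch in naive])

# The very first code point is now a combining mark with no base before it.
print(unicodedata.combining(naive[0]) != 0)  # True

# Reversing again restores the original, as the transform is symmetrical.
print(naive[::-1] == decomposed)  # True

# The precomposed variant has no separate marks, so the same naive
# reversal gives the expected result.
print([ord(ch) for ch in precomposed[::-1]])  # [7871, 273, 32, 117, 7907, 432, 82]
```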

Why the id property would fully decompose a string whereas characters of... would not is peculiar. But it's very good to now be aware that this happens.


Rather simple: The Perl scripts I experimented with have taught me that this string seems to be reversible only when treating it as a sequence of grapheme clusters, not as a sequence of “characters” or code points. (See the Perl scripts using the regex with the \X class and the Unicode::GCString module, “1)” and “2)” in the post.)
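For comparison, the same idea can be sketched in Python with only the standard library. This is a simplified approximation, not full Unicode grapheme cluster segmentation: it only glues combining marks onto the preceding base character, which is enough for the test strings in this thread but not for things like emoji sequences. The function name reverse_by_clusters is made up for this example.

```python
import unicodedata

def reverse_by_clusters(text):
    """Reverse text while keeping combining marks attached to their base character."""
    clusters = []
    for ch in text:
        if clusters and unicodedata.combining(ch):
            # A combining mark belongs to the cluster that precedes it.
            clusters[-1] += ch
        else:
            # Anything else starts a new cluster.
            clusters.append(ch)
    return "".join(reversed(clusters))

# Use the decomposed (NFD) form of the long test case as the hard input.
sample = unicodedata.normalize("NFD", "Rượu đế | résumé | Käse | 123")
print(reverse_by_clusters(sample))
```

For a complete implementation, the third-party regex module for Python supports \X, much like the Perl regex in script “1)”.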

Since the JavaScripts in the thread also showed that “issue”, I was not really surprised to see the “character id” AppleScript doing the same. (The surprise for me was rather that @ccstone’s “reverse of characters” AppleScript worked correctly.)

Thanks for investing the time and finding out that we worked with different source strings. I didn’t consider that the forum website might change the composition of the characters. Definitely my fault, especially since the original post with that string explicitly mentioned that it is necessary to use <pre> tags instead of <code> tags in order to not manipulate the characters. But somehow I had overlooked that warning. Sorry.

PS:

I have added a warning and a copy-safe version of the string to my post above.


Thanks for taking the time for testing, Chris. Fortunately @CJK could solve the mystery now.

PS:

Perl modules are usually installed from the CPAN repository via the cpan client (comes with any Perl installation), or the cpanm client (separate install, but IMO the better way). So, MacPorts shouldn’t be involved here. (Except for installing the cpanm client, if you wish so.)

Thanks for this. And now I know what a grapheme cluster is. This one, seemingly-innocuous task request by a user has ended up teaching me a lot of good stuff about text.

Yes, me too :sweat_smile: (:nauseated_face:)

My learning odyssey started when I had the idea to post a nice and elegant Perl one-liner with reverse, but soon noticed that Perl’s reverse out of the box wasn’t even capable of reversing “Käse” correctly.

Then I found out that “Käse” reverses correctly if Perl just gets the input in an adequate form (via binmode STDIN, ':utf8';), the same problem as with your initial AppleScript.

For a short moment I was content, but then I came across the Perlmonks thread with that nasty lovely Vietnamese test string, where UTF-8 stdin wasn’t enough: the string has to be split by \X before reversing it. (\X matches grapheme clusters, instead of characters like the ordinary dot does.)

Voilà, grapheme cluster. Now we know you :face_with_monocle:


Hey @Tom,

Not so when you use MacPorts to install Perl. They recommend you install modules via MacPorts as well.

-Chris

Lots of learning in the solutions proposed above :+1:
I tried this Python 3.6.5 one-liner in an "Execute Shell Script" KM action:

/usr/local/bin/python -c "n='$KMVAR_TxtInput';d=list(n);d.reverse();print(''.join(d))"

It works fine with non-Unicode inputs, but crashes with the following error with Unicode input:

File "<string>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character '\u0308' in position 8: ordinal not in range(128)
Macro “Reverse a text string” cancelled (while executing Execute Shell Script).

However, the exact same script works perfectly in Terminal with either ASCII or Unicode inputs.
How can I fix this?
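One side note on the one-liner itself: because $KMVAR_TxtInput is pasted into the Python source by the shell, an input containing a quote character would break it. A small sketch of reading the variable from the environment instead, assuming the variable is called TxtInput as above; the function name is made up. It still reverses code point by code point, so it has the same limitation with combining marks, and it does not address the encoding error by itself.

```python
import os

def reversed_km_variable(name="KMVAR_TxtInput"):
    """Return the named environment variable reversed code point by code point."""
    # Reading from the environment avoids interpolating the text into source code,
    # so quotes and backslashes in the input cannot break the script.
    text = os.environ.get(name, "")
    return text[::-1]

print(reversed_km_variable())
```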

Tried your script in KM and with the test case from above, with Python 3.7.1 and with 2.7.15.

The output is wrong in both cases, but it does not crash here.

Python 3:

[screenshot]

[screenshot]

[test] Reverse with Python.kmmacros (2.8 KB)

@Tom Thanks for testing it on your system

I'm giving up trying Python with Unicode characters in KM, because the simple act of printing them seems to hit an encoding barrier in the "Execute a Shell Script" KM action

#!/usr/local/bin/python
output = "Rượu đế | résumé | Käse | 123"
print(output)

Results in:
Traceback (most recent call last):
File "/var/folders/9c/7mk3_3vn6h1gy0pr78nj28kc0000gn/T/Keyboard-Maestro-Script-4C47A7F1-7471-4BD0-BE08-D1A6EA221917", line 3, in <module>
print ("R\u01b0\u1ee3u \u0111\u1ebf | r\xe9sum\xe9 | K\xe4se | 123")
UnicodeEncodeError: 'ascii' codec can't encode characters in position 1-2: ordinal not in range(128)
Note: same error whether output sent to KM variable or to system clipboard or to a window display or set to ignore results. Also same error if code executed from a script file

However, bypassing KM output and sending the output directly to the system clipboard works fine:

#!/usr/local/bin/python
output = "Rượu đế | résumé | Käse | 123"
import subprocess
process = subprocess.Popen('pbcopy', env={'LANG': 'en_US.UTF-8'}, stdin=subprocess.PIPE)
process.communicate(output.encode('utf-8'))
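A possible workaround for the print error, as a sketch only: I am assuming the cause is that the shell script action runs without a UTF-8 locale (no LANG or LC_ALL as in the Terminal environment), so Python 3.6 picks ASCII for stdout. Writing UTF-8 bytes directly to the underlying byte stream sidesteps that choice. The helper name write_utf8 is hypothetical.

```python
import sys

def write_utf8(text, stream=None):
    """Write text plus a newline as UTF-8 bytes, ignoring the stream's own encoding."""
    stream = sys.stdout if stream is None else stream
    raw = getattr(stream, "buffer", None)
    if raw is None:
        # Text-only stream without a byte layer (e.g. io.StringIO): nothing to bypass.
        stream.write(text + "\n")
        return
    stream.flush()  # keep ordering with anything already written as text
    raw.write((text + "\n").encode("utf-8"))
    raw.flush()

write_utf8("Rượu đế | résumé | Käse | 123")
```

An alternative that should have the same effect is to set the PYTHONIOENCODING environment variable to utf-8 (or to export a UTF-8 LC_ALL) in the action before calling python; I have not verified either inside Keyboard Maestro.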

BTW, my interest in Python is because I can hack it better than Perl or JS. AppleScript interest has faded after struggling with inconsistent syntax and implementations over the decades. Maybe I should learn JXA

This doesn’t work for me either.

However, this — note: python3— does work:

[screenshot]
[screenshot]

I’m not into Python at all, but can it be that it is a python install, path, symlink or environment issue?

Are you sure your python link in /usr/local/bin is pointing to a python3, and not to a python2?
(There seems to be a significant difference in Unicode handling between python2 and python3.)


I’m seeing a plethora of “pythons” in my /usr/local/bin: python, python2, python2.7, python3, python3.7; seems quite a mess. But I have no clue of the good python install practices, I use it just to run scripts, when needed.

Yes, I'm sure. This code verifies that:

#!/usr/local/bin/python
import sys  
print (sys.version)

Result:
3.6.5 |Anaconda custom (64-bit)| (default, Apr 26 2018, 08:42:37)
[GCC 4.2.1 Compatible Clang 4.0.1 (tags/RELEASE_401/final)]
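The version alone does not show which encoding Python picked for its output. A small diagnostic sketch that could be run both in Terminal and in the shell script action to compare the two environments; the function name io_report is made up, and which values differ is something to be checked, not something I can predict.

```python
import locale
import os
import sys

def io_report():
    """Collect the settings that usually decide how print() encodes text."""
    return {
        "python": sys.version.split()[0],
        "stdout_encoding": getattr(sys.stdout, "encoding", None),
        "preferred_encoding": locale.getpreferredencoding(False),
        "LANG": os.environ.get("LANG"),
        "LC_ALL": os.environ.get("LC_ALL"),
        "LC_CTYPE": os.environ.get("LC_CTYPE"),
        "PYTHONIOENCODING": os.environ.get("PYTHONIOENCODING"),
    }

for key, value in io_report().items():
    # repr() keeps the output ASCII-safe even if stdout is limited to ASCII.
    print(key, "=", repr(value))
```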

:rofl: Yes, always messy. Probably because a new install is better than messing with the dependencies, so all versions are left in there.

The same scripts run fine in the default bash shell, so I'm guessing the KM action might be getting into the mix.

So, what is your result with /usr/local/bin/python3 (instead of python) from within KM? Does this work?