How to Reverse Characters in a Variable

Hey @CJK,

Characters, words, and paragraphs are completely separate from AppleScript's text item delimiters.

-Chris

Yes, I know. But this wouldn't be:

Hey @CJK,

Good point.

Even so – esoterica doesn't really have an advantage over meat and potatoes in this case.

I haven't seen an AppleScriptObjC method for doing this. Have you?

-Chris

----------------------------------------------------------------

set dataStr to "Now is the time for all good men to come to the aid of their country."
set newStr to reverseStr(dataStr)

----------------------------------------------------------------
--» HANDLERS
----------------------------------------------------------------
on reverseStr(dataStr)
   set {oldTIDS, AppleScript's text item delimiters} to {AppleScript's text item delimiters, ""}
   set revStr to (reverse of (characters of dataStr)) as text
   set AppleScript's text item delimiters to oldTIDS
   return revStr
end reverseStr
----------------------------------------------------------------

Totally agree, and I endorse your solution over mine, both for simplicity and, as @Tom stated, for the fact that you were wise enough to use getvariable, which handles diacritics with ease. I wasn't trying to be a smart ass with my 5-course dinner of an AppleScript. It just coincided with the moment I happened to be writing a handler that did that particular text transformation. I actually believe simplicity is generally better for several reasons, unless there's a specific reason to employ more obscure methods. I remember deleting a post of mine off this forum because I went back to read the AppleScript I had supplied and, although it worked, I was disgusted with myself for how obnoxiously opaque the whole thing was for no good reason.

How about:

set dataStr to "Now is the time for all good men to come to the aid of their country."
set newStr to reverseStr(dataStr)

on reverseStr(dataStr)
    character id (reverse of dataStr's id)
end reverseStr

Not a builtin method, but I've no doubt you already thought of this:

use framework "Foundation"

set str to "Now is the time for all good men to come to the aid of their country."
set NSstr to current application's NSArray's arrayWithArray:(characters of str)
NSstr's reverseObjectEnumerator()'s allObjects() as list as text

Hey @CJK,

Hmm... That method is fully Unicode-compliant, and it's actually a bit faster than decomposing the characters and using AppleScript's text item delimiters.

Okay, you've got a winner.  :smile:

I knew Shane's BridgePlus library had such a method, so I was afraid that Objective-C lacked one.

(Bad Apple!)

-Chris


On my machine, it isn’t.

It works correctly with the “simple” kind of strings like résumé, but fails with the Rượu đế test string from above. (The diacritics ́̂ and ̛̣ get detached from the letters they belong to.)

(The AppleScriptObjC method works correctly with all.)



For ease of testing, here is a macro with all the solutions posted so far:

[test] Ways to Reverse String.kmmacros (18.5 KB)

  1. Open the macro in KM Editor.
  2. Use “Try Action” from the contextual menu on the first action to set the variable.
  3. Open KM Editor > Preferences > Variables and select the “INPUTVALREVERSE” variable. Leave that window open and visible on screen.
  4. Now apply “Try Action” on a script action you want to test and watch the output in the variables window.

As said, for display I’m using the variables window, because the font (Menlo) used in the KM Shell Script Results window cannot handle all characters correctly.

The colors of the actions reflect the grade of Unicode compliance of the script:

  • red: fails completely
  • orange: OK with résumé | Käse, fails with Rượu đế
  • green: OK with all

[Screenshot: Untitled-pty-fs8]

The green actions should produce this result:

[Screenshot: 07-pty-fs8]


Thanks for sharing, Tom. This is a very useful tool.

From my perspective, I find Chris' (@ccstone) AppleScript to be the best for all of my use cases:

It is fast and seems to work with all Unicode characters. We shouldn't be afraid to use AppleScript Text Item Delimiters (TID); we just need to learn how to use them properly, always setting and restoring them.
I find that encapsulating TID in a handler, like Chris did, easily removes any issue with using TID.

BTW, it is very disappointing that ASObjC does not offer a native reverse() function for a string. Since using it also requires set/restore of TID, I see no advantage in using it.

Assuming I know what it means for a diacritic to become detached, then I don't see this happening in my testing. As far as I can tell, the handler I gave to @ccstone handles the string you supplied correctly:

I agree with @JMichaelTX, in that the fact one still needs to be mindful of the text item delimiters does take away from both the appeal and the simplicity of implementing it. There's also the overhead of using ASObjC methods, but if the rest of the script is already using them, that's less of a concern. However, I wonder whether it brings with it any actual performance benefits. My gut feeling is that, particularly for more commonly-sized strings that most scripts are likely to handle, vanilla AppleScript is going to be faster than ASObjC.

@ccstone said my alternative method was a tad faster, which makes sense just from the number of operations each method is performing. But I'll wait to hear back from @Tom with advice on how to reproduce the error he's seeing, as I'm not getting the same result.

@Tom, just to clarify, are you still utilising the macro code I posted in the first instance ? That one did indeed have an issue with those special characters, although that was down to my whimsical choice to use system attributes to read the value of a Keyboard Maestro variable, rather than tell app "Keyboard Maestro Engine"....

I really like this communal debugging and swapping of ideas and different methodologies. I wish there were more of this somewhere. The Latenight Forum is a good place for educational tips, but doesn't have this feel of collaboration that occurs more often in this forum.

On a parting note, if you wish to circumvent the text item delimiters, you can opt to coerce, for example, the ASObjC's Cocoa object to a linked list instead of a list, i.e.:

NSstr's reverseObjectEnumerator()'s allObjects() as linked list as text

That way, the subsequent coercion to text concatenates the list items without using a delimiter. This method applies equally to lists created from exploding a string into its characters:

(reverse of characters of dataStr) as linked list as text

Weird. I’ve run it in Script Editor now and still get the same (wrong) result:

[Screenshot: 33-pty-fs8]

I think you see what I meant with “detached”: The ̛̣ went to the “u” (formerly on the “o”) and the ́̂ went before the “e” (formerly on the “e”).

Have you tried my macro from above? The script actions I have marked orange and red, are they also producing correct results on your system?

No, your very first version produces this (on my system!):

ÅÃÇÃeëƒ u£ÃõÃoõÃuR

I do see what you mean, and it's odd that it's happening for you and not for me. Here are my system details: AppleScript version 2.7, system version 10.13.6, although I don't think that's especially pertinent to what's going on. As some diacritics are character modifiers that operate upon a preceding character, it almost seems as if the diacritics were removed for the text transformation, then reapplied afterwards when the base characters were in reverse order. Whether that would be the AppleScript engine processing the characters wrongly, or the text views of Script Editor drawing them wrongly, I couldn't guess. It could even be the fonts being used in Script Editor, some of which could lack full Unicode support. You could try Script Debugger, though I suspect the result will be the same as it uses the same AppleScript engine, the same compiler, the same fonts, ...

Yes, there we match. And that result is perfectly expected from what I know about Keyboard Maestro and the pros and cons of the various different methods of reading its variable data.

I have not as yet, but I will do so and report back my findings.

But I did try it in a shell, which can be fussy with non-Roman characters, but it seems happy with them in this instance:

In the shell I’m also getting the same as with Script Editor or KM:

[Screenshot: 26-pty-fs8]

(I appended a space to the string for better visibility.)

I’m using DejaVu in Script Editor (certainly one of the most Unicode-capable fonts) and SF Mono in the Terminal.

If it’s a font problem you should see it already with the input string, for example:

[Screenshot: 57-pty-fs8]

Yes, Script Debugger gives the same.


I’m on macOS 10.14.1, AppleScript version 2.7.

This is the shell env:

TERM_PROGRAM=Apple_Terminal
SHELL=/bin/bash
TERM=xterm-256color
CLICOLOR=1
TMPDIR=/var/folders/wn/28w_v3513m50gcc9qtvg3bfh0000gn/T/
PERL5LIB=/Users/tom/perl5/lib/perl5
Apple_PubSub_Socket_Render=/private/tmp/com.apple.launchd.h6zvb2xn2V/Render
TERM_PROGRAM_VERSION=421.1
PERL_MB_OPT=--install_base "/Users/tom/perl5"
TERM_SESSION_ID=0645EEDB-5FE2-470F-BE4C-4E601362C25E
LC_ALL=en_US.UTF-8
USER=tom
SSH_AUTH_SOCK=/private/tmp/com.apple.launchd.yB3a2jkO6w/Listeners
PATH=/Users/tom/perl5/bin:/Users/tom/bin:/usr/local/bin:/usr/bin:/bin:/usr/local/sbin:/usr/sbin:/sbin:/Users/tom/ConTeXt/Beta/tex/texmf-osx-64/bin:/Applications/VMware Fusion.app/Contents/Public:/opt/X11/bin
PWD=/Users/tom
EDITOR=/usr/local/bin/bbedit
LANG=en_US.UTF-8
XPC_FLAGS=0x0
XPC_SERVICE_NAME=0
SHLVL=1
HOME=/Users/tom
PERL_LOCAL_LIB_ROOT=/Users/tom/perl5
LOGNAME=tom
VISUAL=/usr/local/bin/bbedit
LC_CTYPE=UTF-8
DISPLAY=/private/tmp/com.apple.launchd.zmpTruViEy/org.macosforge.xquartz:0
PERL_MM_OPT=INSTALL_BASE=/Users/tom/perl5
_=/usr/bin/env

One thing that changes the behavior in the Terminal is the “wide” setting for East Asian in the preferences. But selecting it makes it worse.

[Screenshot: 45-pty-fs8]

I think I've got to the bottom of why it's happening. I'll update you a bit later when I have some extra time. But, the long and short of it is that the id property isn't well suited for string reversals, and it seems my earlier hunch about the way modifying diacritical characters are added onto a base character was a pretty good guess.

Yes, that’s what I thought initially. (Comparable to Perl’s reverse, which is usable here only if the string is split by grapheme clusters: \X, script “1)” from here.)

But then you showed me that this is not the case on your computer.

Hey Guys,

I wasn't having any problems when reversing this string on my macOS 10.12.6 system:

“Rượu đế”

But when I started testing @CJK's ID method with the longer “Rượu đế | résumé | Käse | 123” test case I ended up with decomposition:

“321 | esäK | émusér | ́̂eđ ựơuR”

-Chris

Chris, in case you have downloaded my macro with the script collection, can you confirm my results on your system, or do you get different results for some of the other scripts too? (With the long and the short test case.)

I would really like to know if those different results are limited to the AppleScripts, or if it is something more general…

Hey @Tom,

As far as I can see I get the same results as you when running through your test macro.

Although I wasn't able to test your Perl script that uses the GCString module. It's not on my system, and I'm not seeing it when searching MacPorts, so I'm not wanting to mess with trying to install it at the moment.

-Chris

That's interesting. What gave you the inkling/feeling that the id property wouldn't work as well ?

I investigated this disparity a bit more and found a partial explanation. The characters @ccstone and I were using in the tests that succeeded on our systems were different to the characters you were using that failed on your system. Here's your test string, taken from the testing macro you shared:

① Rượu đế

And here's the test string that I copied-and-pasted from your post that originally cited these erroneous results:

② Rượu đế

You can't really blame me for thinking that they are identical. However, when I saw that other methods in the collection were throwing out errors of the same nature, while I was still getting a positive result in Script Editor, it occurred to me that the issue stemmed from the nature of the input. So, I ran this command:

use framework "Foundation"

① set a to current application's NSString's stringWithString:"Rượu đế"
② set b to current application's NSString's stringWithString:"Rượu đế"

a's isEqualToString:b --> false

In fact, thanks to the Menlo font, you can see a demonstrable visual difference between the two strings, most noticeably in the topmost diacritics above the e:

  • Rượu đế | Rượu đế • seemingly the same
  • Rượu đế | Rượu đế • noticeably variant as soon as the code block encloses them

Here's the output of an AppleScript acting on these two variants:

② id of "Rượu đế" --> {82, 432, 7907, 117, 32, 273, 7871}
① id of "Rượu đế" --> {82, 117, 795, 111, 795, 803, 117, 32, 273, 101, 770, 769}

As you can see, ① is composed of 12 separate Unicode code points, 5 of which are modifying characters (combining diacritics), and together they render as a 7-character string; ② has precisely 7 individual characters, all of which are precomposed entities.
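
A minimal sketch of the comparison above, assuming only the two code point lists already quoted in this thread. Building the strings with character id sidesteps any normalisation that copy-and-paste might apply. The variable names are purely illustrative and do not come from the original posts.

```applescript
-- Rebuild both variants from the code point lists quoted above.
-- (Variable names are illustrative assumptions.)
set decomposed to character id {82, 117, 795, 111, 795, 803, 117, 32, 273, 101, 770, 769} -- variant ①
set precomposed to character id {82, 432, 7907, 117, 32, 273, 7871} -- variant ②

-- Number of Unicode code points in each variant
set decomposedCodePoints to count of (id of decomposed)
set precomposedCodePoints to count of (id of precomposed)

-- Number of characters as AppleScript itself counts them
set decomposedCharacters to count of (characters of decomposed)
set precomposedCharacters to count of (characters of precomposed)

return {decomposedCodePoints, decomposedCharacters, precomposedCodePoints, precomposedCharacters}
```

If the observations reported in this thread hold on your system, variant ① should show more code points than characters, while variant ② should show the same number of each.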


@Tom, I'm guessing you originally composed or obtained that Vietnamese string from a source that made use of combining diacriticals, and this became the test case for your macro. But when it was transcribed to the Keyboard Maestro forum, from where I copied the text, the multi-character entities were replaced by their precomposed counterparts, which I think is a process called normalisation. Thus, when I came to test my handler, it appeared to pass with flying colours.

The undesirable outcomes of my handler, and of a couple of the others, result from exploding the word not just into characters, but decomposing the characters into their individual entities. If one simply recombines them, there's nothing out of the ordinary to see:

① character id (id of "Rượu đế") --> "Rượu đế"

But when one applies a reverse transform:

① character id (reverse of id of "Rượu đế") --> "́̂eđ ựơuR"

each of the diacritical marks has been displaced to the left, which now makes sense: the 7 characters were decomposed fully into 12 entities, so when recombined into a string, ended up modifying the character that was previously to the other side of it, resulting in new character compositions. (When the reverse transform is applied again—being a symmetrical transformation—the original string is restored.) Obviously, when operating on the precomposed version of the string, this issue never arises.

Why the id property would fully decompose a string whereas characters of... would not is peculiar. But it's very good to now be aware that this happens.
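
To reproduce the difference without relying on pasted text, here is a hedged sketch that rebuilds the decomposed string ① from its code points and reverses it both ways. reverseByCodePoint is a hypothetical name given here to the id-based approach; reverseStr is the text item delimiters handler posted earlier in the thread.

```applescript
-- Variant ① rebuilt from the code points listed above
set decomposed to character id {82, 117, 795, 111, 795, 803, 117, 32, 273, 101, 770, 769}

set byCodePoint to reverseByCodePoint(decomposed) -- reverses individual code points
set byCharacter to reverseStr(decomposed) -- reverses whole characters
set roundTrip to reverseByCodePoint(byCodePoint) -- reversing twice restores the code point order

return {byCodePoint, byCharacter, (id of roundTrip) is (id of decomposed)}

-- Hypothetical name for the id-based approach from this thread
on reverseByCodePoint(dataStr)
	return character id (reverse of (id of dataStr))
end reverseByCodePoint

-- The text item delimiters handler posted earlier in the thread
on reverseStr(dataStr)
	set {oldTIDS, AppleScript's text item delimiters} to {AppleScript's text item delimiters, ""}
	set revStr to (reverse of (characters of dataStr)) as text
	set AppleScript's text item delimiters to oldTIDS
	return revStr
end reverseStr
```

Going by the results posted above, the first item is where the combining marks should end up on the wrong base letters, whereas the second keeps each mark with its letter.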


Rather simple: The Perl scripts I experimented with have taught me that this string seems to be reversible only when treating it as a sequence of grapheme clusters, not as a sequence of “characters” or code points. (See the Perl scripts using the regex with the \X class and the Unicode::GCString module, “1)” and “2)” in the post.)

Since the JavaScripts in the thread also showed that “issue”, I was not really surprised to see the “character id” AppleScript doing the same. (The surprise, for me, was rather that @ccstone’s “reverse of characters” AppleScript worked correctly.)

Thanks for investing the time and finding out that we worked with different source strings. I didn’t consider that the forum website might change the composition of the characters. Definitely my fault, especially since the original post with that string explicitly mentioned that it is necessary to use <pre> tags instead of <code> tags in order to not manipulate the characters. But somehow I had overlooked that warning. Sorry.

PS:

I have added a warning and a copy-safe version of the string to my post above.


Thanks for taking the time for testing, Chris. Fortunately @CJK could solve the mystery now.

PS:

Perl modules are usually installed from the CPAN repository via the cpan client (comes with any Perl installation), or the cpanm client (separate install, but IMO the better way). So, MacPorts shouldn’t be involved here. (Except for installing the cpanm client, if you wish so.)

Thanks for this. And now I know what a grapheme cluster is. This one, seemingly-innocuous task request by a user has ended up teaching me a lot of good stuff about text.