Using RegEx to sort text in a variable?

Hi guys, I’m brand spanking new to RegEx and if anyone is willing I could use a hand doing the seemingly simple task of sorting text (where the source is separated by a carriage return), ascending. I’ve been attempting this all morning and my newbiness points to asking for help. I don’t need it to be fancy at this point, but in the following sample data, ideally text that starts with a number would be at the top, and then text that has trailing numbers would sort by the leading text first, and then those trailing numbers:

clip 03.jpg
clip.jpg
03 clip.jpg
Water 06.wav
Water 01.wav

So ideally the result of this example would be:

03 clip.jpg
clip.jpg
clip 03.jpg
Water 01.wav
Water 06.wav

But honestly, I’ll take any help as a stepping stone to learning RegEx.

Wrong tool. Regexp doesn’t sort. But the command line tool “sort” with the -f option does (or you can use BBEdit’s sort to do it). In Keyboard Maestro use the Execute A Shell Script action to “sort -f input.txt -o output.txt” with the -f ignoring case.
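A minimal sketch of that suggestion, for trying it outside Keyboard Maestro. The temporary directory, the file names, and the `LC_ALL=C` setting are assumptions added here so the result is reproducible; they are not part of the original suggestion.

```shell
# Sketch: case-insensitive sort of the sample lines with sort -f.
# workdir, input.txt and output.txt are illustrative names only.
workdir=$(mktemp -d)

printf '%s\n' 'clip 03.jpg' 'clip.jpg' '03 clip.jpg' \
  'Water 06.wav' 'Water 01.wav' > "$workdir/input.txt"

# -f folds lowercase to uppercase for comparison; -o names the output file.
# LC_ALL=C (an assumption) forces plain byte-order comparison.
LC_ALL=C sort -f -o "$workdir/output.txt" "$workdir/input.txt"

sorted=$(cat "$workdir/output.txt")
printf '%s\n' "$sorted"
rm -rf "$workdir"
```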

At first glance this seems very easy. Here is a macro based on @mrpasini's idea:

[example] Sort Lines (Simple).kmmacros (1.9 KB)

But the output is not what you expected (positions 2 and 3).

Obviously the problem is that you are seeing the strings as file names (without extension), while the sort program is seeing each line as one string.

The straightforward way would be to delete the extensions and the dots, but we can't do that, because there are different extensions (wav and jpg).

So, I propose to temporarily replace the dot by a whitespace character (for example a tab), sort the lines, and then change back the tabs to dots:

[example] Sort Lines (With Workaround).kmmacros (3.6 KB)

The output now corresponds to your manual sorting.

There are probably more elegant ways, but this workaround is the best that occurs to me at the moment…
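For readers without the macro at hand, the same workaround can be sketched as a plain shell pipeline. This is not the content of the attached macro; it assumes that no input line already contains a tab, and `LC_ALL=C` is added here only to make the byte ordering reproducible.

```shell
# Sketch: swap '.' for a tab, sort ignoring case, then swap the tabs back.
# Assumes no line contains a literal tab to begin with.
sorted=$(printf '%s\n' 'clip 03.jpg' 'clip.jpg' '03 clip.jpg' \
    'Water 06.wav' 'Water 01.wav' |
  tr '.' '\t' |
  LC_ALL=C sort -f |
  tr '\t' '.')
printf '%s\n' "$sorted"
```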

That is a very elegant way, in that it is very simple, very easy to understand, and will work in all normal cases.

I think I have found a more proper way to sort it:

[example] Sort Lines (Perl).kmmacros (2.0 KB)

This looks more complicated, but what the script does is basically just this:

  1. It creates a list from the input, where each line is a list element (e.g. "clip.jpg").
  2. It converts each list element into an "array", where the first element is the complete list element itself ("clip.jpg") and the second element is only the filename part before the dot ("clip"), which is the part that is actually relevant for sorting.
    • The second element is created with the regex ([^.]+)
  3. Now the list is being sorted according to the lowercase variants of the second element of each array ("clip", "clip 03", etc.).
  4. After sorting, the second part of each list element is stripped, as it is no longer needed.
  5. Finally the sorted list is converted back into individual lines again.

For a better explanation of the basic principle see this Wiki article.
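The steps above follow the decorate-sort-undecorate idea. As a rough shell analogue (not the Perl script in the attached macro), the key can be written in front of each line, sorted on, and cut off again. The `tab` helper variable and `LC_ALL=C` are assumptions of this sketch, and it presumes the input contains no tabs.

```shell
# Sketch of decorate-sort-undecorate in shell (not the attached Perl macro).
tab=$(printf '\t')

sorted=$(printf '%s\n' 'clip 03.jpg' 'clip.jpg' '03 clip.jpg' \
    'Water 06.wav' 'Water 01.wav' |
  # Decorate: prefix each line with its sort key (text before the first dot).
  awk -F. '{ print $1 "\t" $0 }' |
  # Sort on the key only, ignoring case.
  LC_ALL=C sort -f -t "$tab" -k1,1 |
  # Undecorate: drop the key and keep the original line.
  cut -f2-)
printf '%s\n' "$sorted"
```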

Thanks everyone for all the input. Both of these solutions work great and will give me great examples to learn from. Thanks!!!

Great solution @Tom! :+1:

Bravo, Tom!

I should really have tested my proposed solution. As you pointed out, the period before the file extension makes it tricky. Periods (46) are farther down the ASCII list than a Space (32) so spaces take precedence in an ASCII sort.
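Those byte values are easy to check from a shell. The sketch below also prints the value of a tab, which sorts ahead of the space and so explains why the tab workaround earlier in the thread puts "clip.jpg" before "clip 03.jpg". The variable names are made up for the illustration.

```shell
# Sketch: print the byte values that decide the ordering.
# A leading quote in a printf argument yields the character's numeric value.
space_code=$(printf '%d' "' ")
dot_code=$(printf '%d' "'.")
tab_code=$(printf '%d' "'$(printf '\t')")
printf 'space=%s period=%s tab=%s\n' "$space_code" "$dot_code" "$tab_code"
```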

For my punishment, I submit this macro to sort a selection of lines in a text editor. I didn't resort to the Schwartzian Transform, though. I just told old Unix sort to ignore case and use the period as a delimiter, sorting on the first field (delimited by the period) and then, as a tiebreaker, on the first and second fields together (-k1,2).

echo "$KMVAR_kmVar" | sort -f -t'.' -k1,1 -k1,2

There are some cases where you'd want to look past the first field (just add 'Water 06.jpg' to that list, for example). The ST solution fails there but this one handles that.
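To try that case outside Keyboard Maestro, here is a small sketch. `KMVAR_kmVar` is set by hand as a stand-in for the value the macro would supply, and `LC_ALL=C` is an added assumption for reproducible byte ordering.

```shell
# KMVAR_kmVar stands in for the variable Keyboard Maestro would supply.
KMVAR_kmVar=$(printf '%s\n' 'clip 03.jpg' 'clip.jpg' '03 clip.jpg' \
  'Water 06.wav' 'Water 01.wav' 'Water 06.jpg')

# Same command as above; the second key breaks the tie between the
# two "Water 06" lines by looking at the extension as well.
sorted=$(echo "$KMVAR_kmVar" | LC_ALL=C sort -f -t'.' -k1,1 -k1,2)
printf '%s\n' "$sorted"
```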

Sometimes it seems like every list needs custom code to sort. But I'm going to try using this macro generally for a while to see when it breaks. I'm sure it will. :persevere:

Anyway, I learned something following you on this topic and wanted to say thanks.

Sort Lines [test].kmmacros (5.6 KB)

Hey @mrpasini, that is a nice one and I think it deserves the award "Most elegant solution" :slight_smile: in the sense that it gets the job done without adding complexity. Also using the extension as a second key is a nice option, if desired.

That being said, the ST has one advantage: since it uses a regular expression to determine the sort key (the part in between the slashes), it is more flexible than counting key delimiter positions.

This also lets you handle cases with a varying number of "fields". Let's say you have this:

03.longclip.jpg
clip 03.jpg
clip.jpg
03 clip.jpg
03.jpg
Water 06.wav
Water 01.wav
clip.03.jpg
03.clip.jpg

where you want to treat everything up to the last dot as the filename root.

This can be easily achieved by letting the match run greedily across dots, that is, changing the regex from ([^.]+) to (.+)\. so that the capture extends up to the last dot.

Sorted output:

03.jpg
03 clip.jpg
03.clip.jpg
03.longclip.jpg
clip.jpg
clip 03.jpg
clip.03.jpg
Water 01.wav
Water 06.wav

I tried to duplicate this with the sort program but I couldn't find a way to make it count the positions from the end of the string. Something like -k-2,-2 doesn't work.
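One possible way around this, as a sketch rather than a tested macro: let awk compute the root (everything before the last dot) and hand it to sort as an extra leading field, then cut it off again. The `tab` variable and `LC_ALL=C` are assumptions of this example, and it presumes the input contains no tabs.

```shell
tab=$(printf '\t')

sorted=$(printf '%s\n' '03.longclip.jpg' 'clip 03.jpg' 'clip.jpg' \
    '03 clip.jpg' '03.jpg' 'Water 06.wav' 'Water 01.wav' \
    'clip.03.jpg' '03.clip.jpg' |
  # Decorate: key = the line with its last ".extension" removed.
  awk '{ key = $0; sub(/\.[^.]*$/, "", key); print key "\t" $0 }' |
  # Sort on the key only, ignoring case, then drop the key again.
  LC_ALL=C sort -f -t "$tab" -k1,1 |
  cut -f2-)
printf '%s\n' "$sorted"
```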

Agreed, Perl is more powerful. Unix sort needs help when the number of fields varies.

I thought it would be nice to have a sort macro that lets you experiment with the regexp pattern in the Schwartzian Transform. So here it is.

The default pattern is your original ([^.]+) just to have a guide to what's expected. No error checking on the regexp, though.

Sort Lines With Pattern.kmmacros (6.7 KB)

This is not a Perl thingy. Any mature language should be able to accomplish that. The ST Wiki article gives some hints, I haven't tried all that, though.

Anyways, you receive my personal Thanks and Kisses for posting this line:

my $pattern =  $ENV{KMVAR_Pattern};

For some obscure reasons I always used to get KM variables into Perl via osascript […] getvariable […]

Your variant (via env) is much cleaner, and I have no clue why it didn't occur to me earlier :wink:
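The same environment lookup works from a plain shell script or from awk. In this sketch `KMVAR_Pattern` is set by hand as a stand-in for what Keyboard Maestro would export. Note one difference from the Perl version: awk's `match` reports the whole match, not a capture group.

```shell
# Stand-in for the variable Keyboard Maestro would export to the script.
KMVAR_Pattern='([^.]+)'
export KMVAR_Pattern

# Shell: read it like any other environment variable.
pattern=$KMVAR_Pattern

# awk: ENVIRON plays the role of Perl's %ENV. Print the first match per line.
key=$(printf '%s\n' 'clip 03.jpg' |
  awk '{ if (match($0, ENVIRON["KMVAR_Pattern"])) print substr($0, RSTART, RLENGTH) }')
printf 'pattern=%s key=%s\n' "$pattern" "$key"
```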

Thanks for sharing. That looks very useful.

Here are a few comments on your macro:

Unfortunately, not all apps handle the "Copy" menu item appropriately; some leave it enabled even if nothing is selected, so you may want to use another method to determine whether something is selected.

The KM Action "Copy" will fail if nothing is selected, so you could test for success of that action.

I often use this method:

Macro Library: COPY with Selection Test [Sub-Macro]

Download: COPY with Selection Test [Sub-Macro].kmmacros (6.3 KB)
Note: This macro was uploaded in a DISABLED state. You must enable it before it can be triggered.

Release Notes

  1. Call this Sub-Macro in an Execute Macro Action.
  2. Then use an IF/THEN Action to test the KM Variable "CBS__Clipboard_Changed", which is 1 if the Clipboard has changed.


I guess the problem is beyond me, but for testing different regexen just use something like this:

[example] Sort Lines With Pattern.kmmacros (2.7 KB)

This will

  • not alter your system clipboard
  • not need a Prompt
  • run when you just hit the Try button or ⇧⌘T

Right, I didn’t mean ST was exclusively Perl, just that when we kick things up to a full-blown language, you’ve got more horses pulling the wagon.

I got the $ENV trick from something Peter Lewis wrote long ago. Can’t remember where I saw it, though.

Yeah, but you have to launch the Keyboard Maestro Editor and navigate to the macro, which takes too long for me to remember what I was going to do. :slight_smile:

If you’re going to work with the Editor, you don’t really need the variable. But if you run the macro with the prompt, you can avoid the Editor.

I recall reading about that problem with the Edit menu from a while ago. And I told myself if I ever ran into it, I’d use your solution, preferring the simpler menu check (you know, as long as it works :slight_smile:).

But thanks. I just implemented it in a test macro so it’s handy when I need it. And it isn’t noticeably slower, either. But then I was able to turn off the pause entirely.

Thanks again.

If you are referring to the Pause after the ⌘C, then that can be dangerous.
Often the app needs some time AFTER the Copy command to complete the operations. Without the pause you could end up without anything on the Clipboard.

I just updated my original macro here:
MACRO: COPY with Selection Test [Sub-Macro]

NOTE: This version uses the KM Copy Action (rather than using ⌘C)

  • It is faster if there is a selection (no pause needed)
  • Has a timeout of 2 seconds (which you can change in the Action gear menu) if there is no selection.

In hindsight it's pretty obvious: just run /usr/bin/env from within KM and you'll see all your "KMVAR" variables. Just had to make use of it :wink:
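A small sketch of that check, narrowed to the relevant variables. `KMVAR_Demo` is a made-up stand-in so the example also prints something outside Keyboard Maestro; inside an Execute A Shell Script action the variables would already be set.

```shell
# KMVAR_Demo is a made-up stand-in for a real Keyboard Maestro variable.
KMVAR_Demo='hello'
export KMVAR_Demo

# List every environment variable whose name starts with KMVAR_.
listing=$(/usr/bin/env | grep '^KMVAR_')
printf '%s\n' "$listing"
```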

It is in the KM Wiki:
Using Keyboard Maestro Variables in a Shell Script action (KM Wiki)