Using RegEx to sort text in a variable?

I think I have found a more proper way to sort it:

[example] Sort Lines (Perl).kmmacros (2.0 KB)

This looks more complicated, but what the script does is basically just this:

  1. It creates a list from the input, where each line is a list element (e.g. "clip.jpg").
  2. It converts each list element into an "array", where the first element is the complete list element itself ("clip.jpg") and the second element is only the filename part before the dot ("clip"), which is the part that is actually relevant for sorting.
    • The second element is created with the regex ([^.]+)
  3. The list is then sorted according to the lowercase variants of the second element of each array ("clip", "clip 03", etc.).
  4. After sorting, the second part of each list element is stripped, as it is no longer needed.
  5. Finally the sorted list is converted back into individual lines again.
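
The five steps above can be sketched in JavaScript roughly like this. It only illustrates the principle and is not the Perl script from the macro; the name `sortByStem` and the sample file names are made up for the example.

```javascript
// Schwartzian transform: decorate, sort, undecorate.
// sortByStem is a hypothetical name; the macro itself uses Perl.
const sortByStem = text =>
    text.split('\n')                          // 1. one list element per line
        .map(line => {
            const m = line.match(/([^.]+)/);  // 2. key = part before the first dot
            return [line, (m ? m[1] : line).toLowerCase()];
        })
        // 3. sort on the lowercase key only
        .sort(([, a], [, b]) => (a < b ? -1 : a > b ? 1 : 0))
        .map(([line]) => line)                // 4. drop the key again
        .join('\n');                          // 5. back to individual lines

console.log(sortByStem('clip 03.jpg\nClip.jpg\nclip 02.jpg'));
// Clip.jpg
// clip 02.jpg
// clip 03.jpg
```

The key is computed once per line in the `map` step, so the comparison function never has to run the regex again.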

For a better explanation of the basic principle, see the Wiki article on the Schwartzian transform.


Thanks everyone for all the input. Both of these solutions work great and will give me great examples to learn from. Thanks!!!

Great solution @Tom! :+1:

Bravo, Tom!

I should really have tested my proposed solution. As you pointed out, the period before the file extension makes it tricky. Periods (46) are farther down the ASCII list than a Space (32) so spaces take precedence in an ASCII sort.
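
To see the effect for yourself, here is a small JavaScript check (just an illustration with made-up sample names, not part of any macro). A plain sort compares character codes, so the names containing a space land before the bare `clip.jpg`:

```javascript
// Character codes decide a plain (ASCII-style) sort: space is 32, period is 46.
const spaceCode = ' '.charCodeAt(0);
const periodCode = '.'.charCodeAt(0);

// The default Array sort compares strings by UTF-16 code unit,
// which matches ASCII order for these names.
const plainSorted = ['clip.jpg', 'clip 03.jpg', 'clip 02.jpg'].sort();

console.log(spaceCode, periodCode); // 32 46
console.log(plainSorted.join('\n'));
// clip 02.jpg
// clip 03.jpg
// clip.jpg
```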

For my punishment, I submit this macro to sort a selection of lines in a text editor. I didn't resort to the Schwartzian Transform, though. I just told old Unix sort to ignore case and use the period as a delimiter, sorting on the first field (delimited by the period) and then on the next field.

echo "$KMVAR_kmVar" | sort -f -t'.' -k1,1 -k1,2

There are some cases where you'd want to look past the first field (just add 'Water 06.jpg' to that list, for example). The ST solution fails there but this one handles that.
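
For comparison, the two-key idea can be sketched in JavaScript as below. This is only a rough analogue, not a re-implementation of `sort` (it ignores locale and sort's last-resort whole-line comparison), and `compareFields` is a hypothetical helper name.

```javascript
// Rough analogue of a two-key sort: first field, then second field,
// both split on the period and compared case-insensitively.
// compareFields is a hypothetical helper, not part of any macro.
const cmp = (a, b) => (a < b ? -1 : a > b ? 1 : 0);

const compareFields = (x, y) => {
    const [x1, x2 = ''] = x.toLowerCase().split('.');
    const [y1, y2 = ''] = y.toLowerCase().split('.');
    // Fall through to the second field only when the first fields tie.
    return cmp(x1, y1) || cmp(x2, y2);
};

const twoKeySorted = ['Water 06.wav', 'clip.jpg', 'Water 06.jpg', 'clip 03.jpg']
    .sort(compareFields);

console.log(twoKeySorted.join('\n'));
// clip.jpg
// clip 03.jpg
// Water 06.jpg
// Water 06.wav
```

With only the first field as the key, the two "Water 06" lines would tie; the second key is what orders `.jpg` before `.wav`.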

Sometimes it seems like every list needs custom code to sort. But I'm going to try using this macro generally for a while to see when it breaks. I'm sure it will. :persevere:

Anyway, I learned something following you on this topic and wanted to say thanks.

Sort Lines [test].kmmacros (5.6 KB)


Hey @mrpasini, that is a nice one and I think it deserves the award "Most elegant solution" :slight_smile: in the sense that it gets the job done without adding complexity. Also using the extension as a second key is a nice option, if desired.

That being said, the ST has one advantage: since it uses a regular expression to determine the sort key (the part in-between the slashes), it is more flexible than counting the key delimiter positions.

This also lets you handle cases with, for example, a varying number of "fields". Let's say you have this:

03.longclip.jpg
clip 03.jpg
clip.jpg
03 clip.jpg
03.jpg
Water 06.wav
Water 01.wav
clip.03.jpg
03.clip.jpg

where you want to treat everything up to the last dot as the filename root.

This can be easily achieved by making the regex greedy, that is, changing it from ([^.]+) to (.+)\.

Sorted output:

03.jpg
03 clip.jpg
03.clip.jpg
03.longclip.jpg
clip.jpg
clip 03.jpg
clip.03.jpg
Water 01.wav
Water 06.wav
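
Here is the same idea as a JavaScript sketch, in case you want to try the greedy pattern outside of Perl. The names `stemKey` and `sortByGreedyStem` are made up for the example; the input is the list from above.

```javascript
// Key = everything up to the LAST dot, because (.+) is greedy.
const stemKey = line => {
    const m = line.match(/(.+)\./);
    return (m ? m[1] : line).toLowerCase();
};

// Decorate each name with its key, sort on the key, then drop the key.
const sortByGreedyStem = names =>
    names
        .map(n => [n, stemKey(n)])
        .sort(([, a], [, b]) => (a < b ? -1 : a > b ? 1 : 0))
        .map(([n]) => n);

const sample = [
    '03.longclip.jpg', 'clip 03.jpg', 'clip.jpg', '03 clip.jpg', '03.jpg',
    'Water 06.wav', 'Water 01.wav', 'clip.03.jpg', '03.clip.jpg'
];

console.log(sortByGreedyStem(sample).join('\n'));
```

This should print the sorted output shown above.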

I tried to duplicate this with the sort program but I couldn't find a way to make it count the positions from the end of the string. Something like -k-2,-2 doesn't work.


Agreed, Perl is more powerful. Unix sort needs help when the number of fields varies.

I thought it would be nice to have a sort macro that lets you experiment with the regexp pattern in the Schwartzian Transform. So here it is.

The default pattern is your original ([^.]+) just to have a guide to what's expected. No error checking on the regexp, though.
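
If you wanted to add that error checking, one way would be to try compiling the pattern and fall back to the default when it fails. A minimal sketch of the idea in JavaScript (the macro itself is Perl; `compilePattern` and `DEFAULT_PATTERN` are hypothetical names):

```javascript
// compilePattern is a hypothetical helper: it returns a RegExp built from
// user input, or one built from the fallback if the input does not compile.
const DEFAULT_PATTERN = '([^.]+)';

const compilePattern = (source, fallback = DEFAULT_PATTERN) => {
    try {
        return new RegExp(source);
    } catch (e) {
        // new RegExp throws a SyntaxError on an invalid pattern
        return new RegExp(fallback);
    }
};

console.log(compilePattern('(.+)\\.').source); // a valid pattern is kept
console.log(compilePattern('([^.]+').source);  // unbalanced parenthesis, so the default is used
```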

Sort Lines With Pattern.kmmacros (6.7 KB)

This is not a Perl thingy. Any mature language should be able to accomplish that. The ST Wiki article gives some hints, though I haven't tried all of them.

Anyways, you receive my personal Thanks and Kisses for posting this line:

my $pattern =  $ENV{KMVAR_Pattern};

For some obscure reasons I always used to get KM variables into Perl via osascript […] getvariable […]

Your variant (via env) is much cleaner, and I have no clue why it didn't occur to me earlier :wink:
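
The same environment lookup would work from other languages too. For instance, assuming you run a script with Node.js from a shell script action (that setup is my assumption, not something from the macro), it might look like this; `getKMVar` is a hypothetical helper name:

```javascript
// Read a Keyboard Maestro variable from the environment, like the Perl line does.
// getKMVar is a hypothetical helper; env defaults to the process environment
// and can be replaced by a plain object for testing.
const getKMVar = (name, env = process.env) => env['KMVAR_' + name];

// Fall back to the default pattern when the variable is not set.
const pattern = getKMVar('Pattern') || '([^.]+)';
console.log(pattern);
```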

Thanks for sharing. That looks very useful.

Here are a few comments on your macro:

Unfortunately, not all apps handle the "Copy" menu item appropriately; some leave it enabled even if nothing is selected, so you may want to use another method to determine whether something is selected.

The KM Action "Copy" will fail if nothing is selected, so you could test for success of that action.

I often use this method:

## Macro Library: COPY with Selection Test [Sub-Macro]


#### DOWNLOAD:
COPY with Selection Test [Sub-Macro].kmmacros (6.3 KB)
**Note: This Macro was uploaded in a DISABLED state. You must enable before it can be triggered.**

---

### Release Notes

1. Call this Sub-Macro in an Execute Macro Action
2. Then use an IF/THEN Action to test for KM Variable "CBS__Clipboard_Changed"
      = 1 IF the Clipboard has changed

---



I guess the problem is beyond me, but for testing different regexen just use something like this:

[example] Sort Lines With Pattern.kmmacros (2.7 KB)

This version

  • does not alter your system clipboard
  • does not need a Prompt
  • runs when you hit the Try button or press ⇧⌘T

Right, I didn’t mean ST was exclusively Perl, just that when we kick things up to a full-blown language, you’ve got more horses pulling the wagon.

I got the $ENV trick from something Peter Lewis wrote long ago. Can’t remember where I saw it, though.

Yeah, but you have to launch the Keyboard Maestro Editor and navigate to the macro, which takes too long for me to remember what I was going to do. :slight_smile:

If you’re going to work with the Editor, you don’t really need the variable. But if you run the macro with the prompt, you can avoid the Editor.

I recall reading about that problem with the Edit menu from a while ago. And I told myself if I ever ran into it, I’d use your solution, preferring the simpler menu check (you know, as long as it works :slight_smile:).

But thanks. I just implemented it in a test macro so it’s handy when I need it. And it isn’t noticeably slower, either. But then I was able to turn off the pause entirely.

Thanks again.

If you are referring to the Pause after the ⌘C, then turning it off can be dangerous.
Often the app needs some time AFTER the Copy command to complete the operations. Without the pause you could end up without anything on the Clipboard.

I just updated my original macro here:
MACRO: COPY with Selection Test [Sub-Macro]

NOTE: This version uses the KM Copy Action (rather than using ⌘C)

  • It is faster if there is a selection (no pause needed)
  • Has a timeout of 2 seconds (which you can change in the Action gear menu) if there is no selection.

In hindsight it's pretty obvious: just run /usr/bin/env from within KM and you'll see all your "KMVAR" variables. Just had to make use of it :wink:

It is in the KM Wiki:
Using Keyboard Maestro Variables in a Shell Script action (KM Wiki)

Or, in an 'Execute JavaScript for Automation' action:

(using regexes here, since you expressed curiosity about that, though I would probably reach more quickly for split functions myself)

Note: The higher-order mappendComparing function supports n-ary sorts by deriving a comparator function from a list of property-getting functions.

Sorted filenames.kmmacros (19.9 KB)

Full source of the Execute JS4A action
(ES6 JavaScript, so Sierra onwards – for [Yosemite onwards] ES5 JS, paste into the Babel JS repl at https://babeljs.io/repl/)

(() => {
    'use strict';

    // GENERIC FUNCTIONS -----------------------------------------------------

        // lines :: String -> [String]
        const lines = s => s.split(/[\r\n]/);

        // mappendComparing :: [(a -> b)] -> (a -> a -> Ordering)
        const mappendComparing = fs => (x, y) =>
            fs.reduce((ord, f) => (ord !== 0) ? (
                ord
            ) : (() => {
                const
                    a = f(x),
                    b = f(y);
                return a < b ? -1 : a > b ? 1 : 0
            })(), 0);

        // sortBy :: (a -> a -> Ordering) -> [a] -> [a]
        const sortBy = (f, xs) =>
            xs.slice()
            .sort(f);

        // toLower :: Text -> Text
        const toLower = s => s.toLowerCase();

        // unlines :: [String] -> String
        const unlines = xs => xs.join('\n');


        // COMPARABLE FILE NAME PARTS --------------------------------------------

        const stemPreNum = s => toLower((s.match(/^[^\d\.]+/) || [''])[0]);

        const stemNum = s => parseInt((s.match(/^\D+(\d+)\./) || ['', '0'])[1], 10);

        const suffix = s => (s.match(/\.(\w+)$/) || ['', ''])[1];

        // TEST ------------------------------------------------------------------
        return unlines(
            sortBy(
                mappendComparing(
                    [stemPreNum, stemNum, suffix]
                ),
                lines(
                    Application('Keyboard Maestro Engine')
                    .getvariable('fileNames')
                )
            )
        );
    })();

@ComplexPoint, thanks for sharing a JavaScript for Automation (JXA) solution.

It works well for the dataset you used (same as the OP).
You mentioned using split functions. It would be interesting to see that solution for comparison.

Is there a way to handle the dataset posted by @Tom?

Or again, extending it to Tom's data sample, and proceeding for the sake of variety by splits (my patience for regex-fiddling shortens with advancing years :slight_smile: )

Sorted filenames by splits.kmmacros (23.2 KB)

JavaScript ES6 source (Paste into Babel JS REPL at https://babeljs.io/repl/ to get pre-Sierra ES5 JavaScript source)

(() => {
    'use strict';

    // GENERIC FUNCTIONS -----------------------------------------------------

        // (++) :: [a] -> [a] -> [a]
        const append = (xs, ys) => xs.concat(ys);

        // concat :: [[a]] -> [a] | [String] -> String
        const concat = xs =>
            xs.length > 0 ? (() => {
                const unit = typeof xs[0] === 'string' ? '' : [];
                return unit.concat.apply(unit, xs);
            })() : [];

        // elem :: Eq a => a -> [a] -> Bool
        const elem = (x, xs) => xs.indexOf(x) !== -1;

        // id :: a -> a
        const id = x => x;

        // init :: [a] -> [a]
        const init = xs => xs.length > 0 ? xs.slice(0, -1) : [];

        // intercalate :: String -> [a] -> String
        const intercalate = (s, xs) => xs.join(s);

        // isDigit :: Char -> Bool
        const isDigit = c => {
            const n = ord(c);
            return n >= 48 && n <= 57;
        };

        // last :: [a] -> a
        const last = xs => xs.length ? xs.slice(-1)[0] : undefined;

        // length :: [a] -> Int
        const length = xs => xs.length;

        // lines :: String -> [String]
        const lines = s => s.split(/[\r\n]/);

        // map :: (a -> b) -> [a] -> [b]
        const map = (f, xs) => xs.map(f);

        // mappendComparing :: [(a -> b)] -> (a -> a -> Ordering)
        const mappendComparing = fs => (x, y) =>
            fs.reduce((ord, f) => (ord !== 0) ? (
                ord
            ) : (() => {
                const
                    a = f(x),
                    b = f(y);
                return a < b ? -1 : a > b ? 1 : 0
            })(), 0);

        // ord :: Char -> Int
        const ord = c => c.codePointAt(0);

        // show :: Int -> a -> Indented String
        // show :: a -> String
        const show = (...x) =>
            JSON.stringify.apply(
                null, x.length > 1 ? [x[1], null, x[0]] : x
            );

        // sortBy :: (a -> a -> Ordering) -> [a] -> [a]
        const sortBy = (f, xs) =>
            xs.slice()
            .sort(f);

        // Splitting not on a delimiter, but whenever the relationship between
        // two consecutive items matches a supplied predicate function

        // splitBy :: (a -> a -> Bool) -> [a] -> [[a]]
        const splitBy = (f, ys) => {
            const
                bool = typeof ys === 'string',
                xs = bool ? ys.split('') : ys;
            return (xs.length < 2) ? [xs] : (() => {
                const
                    h = xs[0],
                    lstParts = xs.slice(1)
                    .reduce(([acc, active, prev], x) =>
                        f(prev, x) ? (
                            [acc.concat([active]), [x], x]
                        ) : [acc, active.concat(x), x], [
                            [],
                            [h],
                            h
                        ]);
                return map(
                    (bool ? concat : id),
                    lstParts[0].concat([lstParts[1]])
                );
            })();
        };

        // splitOn :: a -> [a] -> [[a]]
        // splitOn :: String -> String -> [String]
        const splitOn = (needle, haystack) =>
            typeof haystack === 'string' ? (
                haystack.split(needle)
            ) : (function sp_(ndl, hay) {
                // Index of the first item equal to the needle (-1 if none)
                const i = hay.indexOf(ndl);
                return i === -1 ? (
                    [hay]
                ) : append(
                    [hay.slice(0, i)],
                    sp_(ndl, hay.slice(i + 1))
                );
            })(needle, haystack);

        // toLower :: Text -> Text
        const toLower = s => s.toLowerCase();

        // unlines :: [String] -> String
        const unlines = xs => xs.join('\n');


        // COMPARABLE FILE NAME PARTS --------------------------------------------

        // suffixSplit :: String -> (String, String)
        const suffixSplit = s =>
            elem('.', s) ? (() => {
                const xs = splitOn('.', s);
                return [intercalate('.', init(xs)), last(xs)];
            })() : [s, ''];

        // stemNumSplit :: String -> (String, String)
        const stemNumSplit = s => {
            const tpl = splitBy(
                (a, b) => isDigit(b) && !isDigit(a),
                suffixSplit(s)[0]
            );
            return length(tpl) > 1 ? tpl : append(tpl, ["0"]);
        };

        const stemPreNum = s => toLower(stemNumSplit(s)[0]);
        const stemNum = s => parseInt(stemNumSplit(s)[1], 10) || 0;
        const suffix = s => suffixSplit(s)[1];

        // TEST ------------------------------------------------------------------
        return unlines(
            sortBy(
                mappendComparing(
                    [stemPreNum, stemNum, suffix]
                ),
                lines(
                    Application('Keyboard Maestro Engine')
                    .getvariable('fileNames')
                )
            )
        );
    })();

I understand that, if you are not a native speaker of it, English is one of the hardest languages in the world to learn/understand. LOL I always thought the Asian languages had to be the hardest, but maybe not . . .

English seems pretty easy to me, but some would say I have a long way to go. :wink:

The thing about RegEx is that, like many things, the more you use it, the easier it becomes, and the more you love it.

I am far, far from being proficient in RegEx, but I now love it and find many opportunities to use it, almost every day. Whereas when I started using it, I thought it was the most frustrating language I'd ever encountered.

To each his own. :smile:

BTW, I also have the "advancing years" syndrome. :wink: