Using RegEx to sort text in a variable?

Hey @mrpasini, that is a nice one and I think it deserves the award "Most elegant solution" :slight_smile: in the sense that it gets the job done without adding complexity. Also using the extension as a second key is a nice option, if desired.

That being said, the ST has one advantage: Since it is using a regular expression to determine the search key (the part in-between the slashes) it is more flexible than counting the key delimiter positions.

This also lets you handle cases with a varying number of "fields", for example. Let's say you have this:

03.longclip.jpg
clip 03.jpg
clip.jpg
03 clip.jpg
03.jpg
Water 06.wav
Water 01.wav
clip.03.jpg
03.clip.jpg

where you want to treat everything up to the last dot as the filename root.

This can be easily achieved by letting the capture run greedily across dots, that is, changing it from ([^.]+) to (.+)\. so that the key extends up to the last dot.

Sorted output:

03.jpg
03 clip.jpg
03.clip.jpg
03.longclip.jpg
clip.jpg
clip 03.jpg
clip.03.jpg
Water 01.wav
Water 06.wav
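
For readers who prefer JavaScript, here is a minimal sketch of the same decorate-sort-undecorate idea. It is not the Perl code from the macro: the function name `sortByPattern` is made up for illustration, and I am assuming a case-insensitive comparison of the keys, since that is what the order above suggests.

```javascript
// Schwartzian Transform sketch: decorate each line with its sort key,
// sort on the key, then strip the key again.
// Assumption: keys are compared case-insensitively.
const sortByPattern = (lines, pattern) =>
    lines
        .map(line => {
            const m = line.match(pattern);
            // Fall back to the whole line when the pattern does not match.
            return [(m ? m[1] : line).toLowerCase(), line];
        })
        .sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0))
        .map(([, line]) => line);

const sample = [
    '03.longclip.jpg', 'clip 03.jpg', 'clip.jpg', '03 clip.jpg', '03.jpg',
    'Water 06.wav', 'Water 01.wav', 'clip.03.jpg', '03.clip.jpg'
];

// Greedy key: everything up to the last dot.
const sorted = sortByPattern(sample, /^(.+)\./);
console.log(sorted.join('\n'));
```

With the sample list from above, this should reproduce the sorted output shown. Swapping in `/([^.]+)/` gives the original behaviour, where the key stops at the first dot.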

I tried to duplicate this with the sort program but I couldn't find a way to make it count the positions from the end of the string. Something like -k-2,-2 doesn't work.

Agreed, Perl is more powerful. Unix sort needs help when the number of fields varies.

I thought it would be nice to have a sort macro that lets you experiment with the regexp pattern in the Schwartzian Transform. So here it is.

The default pattern is your original ([^.]+) just to have a guide to what's expected. No error checking on the regexp, though.
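
If you wanted to add that error checking, the idea could look roughly like this. This is a JavaScript sketch, not part of the macro (which is Perl), and `compilePattern` is a hypothetical helper name: it tries to compile the user-supplied pattern and reports a failure instead of letting the script die.

```javascript
// Compile a user-supplied pattern, reporting failure instead of throwing.
// compilePattern is a hypothetical helper, not part of the posted macro.
const compilePattern = source => {
    try {
        return { ok: true, regex: new RegExp(source) };
    } catch (e) {
        // new RegExp throws a SyntaxError for malformed patterns.
        return { ok: false, error: e.message };
    }
};

const good = compilePattern('([^.]+)');
const bad = compilePattern('([^.]+');   // unbalanced parenthesis

console.log(good.ok, bad.ok);
```

A macro could then show the error message in a prompt and ask for the pattern again, rather than sorting with a broken key.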

Sort Lines With Pattern.kmmacros (6.7 KB)

This is not a Perl thingy. Any mature language should be able to accomplish that. The ST Wiki article gives some hints, I haven't tried all that, though.

Anyways, you receive my personal Thanks and Kisses for posting this line:

my $pattern =  $ENV{KMVAR_Pattern};

For some obscure reason I always used to get KM variables into Perl via osascript […] getvariable […]

Your variant (via env) is much cleaner, and I have no clue why it didn't occur to me earlier :wink:
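
The same environment lookup works from any language started by an Execute Shell Script action. As a sketch, assuming the script is run with Node.js (my assumption, the macro itself uses Perl) and using a made-up helper name `kmVar`:

```javascript
// Keyboard Maestro exports its variables to shell script actions as
// KMVAR_<name>, with spaces in the variable name replaced by underscores.
// kmVar is a hypothetical helper; env defaults to the real environment.
const kmVar = (name, env = process.env) =>
    env['KMVAR_' + name.replace(/ /g, '_')];

// Fall back to the default pattern when the variable is not set.
const pattern = kmVar('Pattern') || '([^.]+)';
console.log(pattern);
```

Passing the environment as a parameter is only there to make the helper easy to try outside Keyboard Maestro.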

Thanks for sharing. That looks very useful.

Here are a few comments on your macro:

Unfortunately, not all apps handle the "Copy" menu item appropriately; some leave it enabled even if nothing is selected, so you may want to use another method to determine if something is selected.

The KM Action "Copy" will fail if nothing is selected, so you could test for success of that action.

I often use this method:

##Macro Library   COPY with Selection Test [Sub-Macro]


####DOWNLOAD:
<a class="attachment" href="/uploads/default/original/2X/f/f31e210b0732821c0fa13a869eb0eb0285346066.kmmacros">COPY with Selection Test [Sub-Macro].kmmacros</a> (6.3 KB)
**Note: This Macro was uploaded in a DISABLED state. You must enable before it can be triggered.**

---

###ReleaseNotes

1. Call this Sub-Macro in an Execute Macro Action
2. Then use an IF/THEN Action to test for KM Variable "CBS__Clipboard_Changed"
      = 1 IF the Clipboard has changed

---


<img src="/uploads/default/original/2X/e/e143f08c6c0dec941e48b7a7c667f95290b9095c.png" width="595" height="953">

I guess the problem is beyond me, but for testing different regexen just use something like this:

[example] Sort Lines With Pattern.kmmacros (2.7 KB)

This version:

  • does not alter your system clipboard
  • does not need a Prompt
  • runs when you hit the Try button or press ⇧⌘T

Right, didn’t mean ST was exclusively Perl, just that when we kick things up to a full-blown language, you’ve got more horses pulling the wagon.

I got the $ENV trick from something Peter Lewis wrote long ago. Can’t remember where I saw it, though.

Yeah, but you have to launch Keyboard Maestro Editor and navigate to the macro, which takes too long for me to remember what I was going to do. :slight_smile:

If you’re going to work with the Editor, you don’t really need the variable. But if you run the macro with the prompt, you can avoid the Editor.

I recall reading about that problem with the Edit menu from a while ago. And I told myself if I ever ran into it, I’d use your solution, preferring the simpler menu check (you know, as long as it works :slight_smile:).

But thanks. I just implemented it in a test macro so it’s handy when I need it. And it isn’t noticeably slower, either. But then I was able to turn off the pause entirely.

Thanks again.

If you are referring to the Pause after the ⌘C, then that can be dangerous.
Often the app needs some time AFTER the Copy command to complete the operations. Without the pause you could end up without anything on the Clipboard.

I just updated my original macro here:
MACRO: COPY with Selection Test [Sub-Macro]

NOTE: This version uses the KM Copy Action (rather than using ⌘C)

  • It is faster if there is a selection (no pause needed)
  • Has a timeout of 2 seconds (which you can change in the Action gear menu) if there is no selection.

In hindsight it's pretty obvious: just run /usr/bin/env from within KM and you'll see all your "KMVAR" variables. Just had to make use of it :wink:

It is in the KM Wiki:
Using Keyboard Maestro Variables in a Shell Script action (KM Wiki)

Or, in an 'Execute JavaScript for Automation' action:

(using regexes here, since you expressed curiosity about that, though I would probably reach more quickly for split functions myself)

Note: The higher-order mappendComparing function supports n-ary sorts by deriving a comparator function from a list of property-getting functions.

Sorted filenames.kmmacros (19.9 KB)

Full source of the Execute JS4A action
(ES6 JavaScript, so Sierra onwards – for [Yosemite onwards] ES5 JS, paste into the Babel JS repl at https://babeljs.io/repl/)

(() => {
    'use strict';

    // GENERIC FUNCTIONS -----------------------------------------------------

        // lines :: String -> [String]
        const lines = s => s.split(/[\r\n]/);

        // mappendComparing :: [(a -> b)] -> (a -> a -> Ordering)
        const mappendComparing = fs => (x, y) =>
            fs.reduce((ord, f) => (ord !== 0) ? (
                ord
            ) : (() => {
                const
                    a = f(x),
                    b = f(y);
                return a < b ? -1 : a > b ? 1 : 0
            })(), 0);

        // sortBy :: (a -> a -> Ordering) -> [a] -> [a]
        const sortBy = (f, xs) =>
            xs.slice()
            .sort(f);

        // toLower :: Text -> Text
        const toLower = s => s.toLowerCase();

        // unlines :: [String] -> String
        const unlines = xs => xs.join('\n');


        // COMPARABLE FILE NAME PARTS --------------------------------------------

        const stemPreNum = s => toLower((s.match(/^[^\d\.]+/) || [''])[0]);

        const stemNum = s => parseInt((s.match(/^\D+(\d+)\./) || ['', '0'])[1], 10);

        const suffix = s => (s.match(/\.(\w+)$/) || ['', ''])[1];

        // TEST ------------------------------------------------------------------
        return unlines(
            sortBy(
                mappendComparing(
                    [stemPreNum, stemNum, suffix]
                ),
                lines(
                    Application('Keyboard Maestro Engine')
                    .getvariable('fileNames')
                )
            )
        );
    })();
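
To see the comparator composition on its own, outside Keyboard Maestro, here is a small sketch. The `mappendComparing` body is a compact restatement of the one above; the `files` records and their field names are invented purely for illustration.

```javascript
// Build one comparator from a list of key functions: the first key
// that distinguishes x from y decides, later keys only break ties.
const mappendComparing = fs => (x, y) =>
    fs.reduce((ord, f) => {
        if (ord !== 0) return ord;
        const a = f(x), b = f(y);
        return a < b ? -1 : a > b ? 1 : 0;
    }, 0);

// Hypothetical records, not taken from the thread's data.
const files = [
    { stem: 'water', num: 6, ext: 'wav' },
    { stem: 'clip', num: 3, ext: 'jpg' },
    { stem: 'water', num: 1, ext: 'wav' },
    { stem: 'clip', num: 3, ext: 'gif' }
];

const byStemNumExt = mappendComparing([f => f.stem, f => f.num, f => f.ext]);
const ordered = files.slice().sort(byStemNumExt)
    .map(f => `${f.stem} ${f.num}.${f.ext}`);

console.log(ordered.join('\n'));
```

Here the two `clip 3` records tie on the first two keys, so the extension decides their order.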

@ComplexPoint, thanks for sharing a JavaScript for Automation (JXA) solution.

It works well for the dataset you used (same as the OP).
You mentioned using split functions. It would be interesting to see that solution for comparison.

Is there a way to handle the dataset posted by @Tom?

Or again, extending it to Tom's data sample, and proceeding for the sake of variety by splits (my patience for regex-fiddling shortens with advancing years :slight_smile: )

Sorted filenames by splits.kmmacros (23.2 KB)

JavaScript ES6 source (Paste into Babel JS REPL at https://babeljs.io/repl/ to get pre-Sierra ES5 JavaScript source)

(() => {
    'use strict';

    // GENERIC FUNCTIONS -----------------------------------------------------

        // (++) :: [a] -> [a] -> [a]
        const append = (xs, ys) => xs.concat(ys);

        // concat :: [[a]] -> [a] | [String] -> String
        const concat = xs =>
            xs.length > 0 ? (() => {
                const unit = typeof xs[0] === 'string' ? '' : [];
                return unit.concat.apply(unit, xs);
            })() : [];

        // elem :: Eq a => a -> [a] -> Bool
        const elem = (x, xs) => xs.indexOf(x) !== -1;

        // id :: a -> a
        const id = x => x;

        // init :: [a] -> [a]
        const init = xs => xs.length > 0 ? xs.slice(0, -1) : [];

        // intercalate :: String -> [a] -> String
        const intercalate = (s, xs) => xs.join(s);

        // isDigit :: Char -> Bool
        const isDigit = c => {
            const n = ord(c);
            return n >= 48 && n <= 57;
        };

        // last :: [a] -> a
        const last = xs => xs.length ? xs.slice(-1)[0] : undefined;

        // length :: [a] -> Int
        const length = xs => xs.length;

        // lines :: String -> [String]
        const lines = s => s.split(/[\r\n]/);

        // map :: (a -> b) -> [a] -> [b]
        const map = (f, xs) => xs.map(f);

        // mappendComparing :: [(a -> b)] -> (a -> a -> Ordering)
        const mappendComparing = fs => (x, y) =>
            fs.reduce((ord, f) => (ord !== 0) ? (
                ord
            ) : (() => {
                const
                    a = f(x),
                    b = f(y);
                return a < b ? -1 : a > b ? 1 : 0
            })(), 0);

        // ord :: Char -> Int
        const ord = c => c.codePointAt(0);

        // show :: Int -> a -> Indented String
        // show :: a -> String
        const show = (...x) =>
            JSON.stringify.apply(
                null, x.length > 1 ? [x[1], null, x[0]] : x
            );

        // sortBy :: (a -> a -> Ordering) -> [a] -> [a]
        const sortBy = (f, xs) =>
            xs.slice()
            .sort(f);

        // Splitting not on a delimiter, but whenever the relationship between
        // two consecutive items matches a supplied predicate function

        // splitBy :: (a -> a -> Bool) -> [a] -> [[a]]
        const splitBy = (f, ys) => {
            const
                bool = typeof ys === 'string',
                xs = bool ? ys.split('') : ys;
            return (xs.length < 2) ? [xs] : (() => {
                const
                    h = xs[0],
                    lstParts = xs.slice(1)
                    .reduce(([acc, active, prev], x) =>
                        f(prev, x) ? (
                            [acc.concat([active]), [x], x]
                        ) : [acc, active.concat(x), x], [
                            [],
                            [h],
                            h
                        ]);
                return map(
                    (bool ? concat : id),
                    lstParts[0].concat([lstParts[1]])
                );
            })();
        };

        // splitOn :: a -> [a] -> [[a]]
        // splitOn :: String -> String -> [String]
        const splitOn = (needle, haystack) =>
            typeof haystack === 'string' ? (
                haystack.split(needle)
            ) : (function sp_(ndl, hay) {
                // Array case, using built-ins only (the findIndex, take
                // and drop helpers are not defined in this script)
                const i = hay.indexOf(ndl);
                return i === -1 ? (
                    [hay]
                ) : append(
                    [hay.slice(0, i)],
                    sp_(ndl, hay.slice(i + 1))
                );
            })(needle, haystack);

        // toLower :: Text -> Text
        const toLower = s => s.toLowerCase();

        // unlines :: [String] -> String
        const unlines = xs => xs.join('\n');


        // COMPARABLE FILE NAME PARTS --------------------------------------------

        // suffixSplit :: String -> (String, String)
        const suffixSplit = s =>
            elem('.', s) ? (() => {
                const xs = splitOn('.', s);
                return [intercalate('.', init(xs)), last(xs)];
            })() : [s, ''];

        // stemNumSplit :: String -> (String, String)
        const stemNumSplit = s => {
            const tpl = splitBy(
                (a, b) => isDigit(b) && !isDigit(a),
                suffixSplit(s)[0]
            );
            return length(tpl) > 1 ? tpl : append(tpl, ["0"]);
        };

        const stemPreNum = s => toLower(stemNumSplit(s)[0]);
        const stemNum = s => parseInt(stemNumSplit(s)[1], 10) || 0;
        const suffix = s => suffixSplit(s)[1];

        // TEST ------------------------------------------------------------------
        return unlines(
            sortBy(
                mappendComparing(
                    [stemPreNum, stemNum, suffix]
                ),
                lines(
                    Application('Keyboard Maestro Engine')
                    .getvariable('fileNames')
                )
            )
        );
    })();
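
The predicate-based splitting is the less familiar part, so here is a stripped-down sketch of just that idea for strings. `splitWhen` is a simplified stand-in that I am introducing for illustration, not the `splitBy` from the macro: it starts a new chunk wherever the predicate holds between two neighbouring characters.

```javascript
// splitWhen: start a new chunk wherever pred(previous, current) is true.
// A simplified, string-only stand-in for the splitBy function above.
const splitWhen = (pred, s) =>
    [...s].reduce((chunks, c, i, cs) => {
        if (i === 0 || pred(cs[i - 1], c)) {
            chunks.push(c);                   // open a new chunk
        } else {
            chunks[chunks.length - 1] += c;   // extend the current chunk
        }
        return chunks;
    }, []);

const isDigit = c => c >= '0' && c <= '9';

// Split where a digit follows a non-digit, as stemNumSplit does.
const digitStarts = (a, b) => isDigit(b) && !isDigit(a);

const parts = ['clip 03', 'clip.03', '03.longclip', 'Water 06']
    .map(s => splitWhen(digitStarts, s));

console.log(JSON.stringify(parts));
```

A stem such as `03.longclip` stays in one piece because no digit follows a non-digit in it, which is why the macro then falls back to a numeric part of 0.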

I understand that, if you are not a native speaker of it, English is one of the hardest languages in the world to learn/understand. LOL I always thought the Asian languages had to be the hardest, but maybe not . . .

English seems pretty easy to me, but some would say I have a long way to go. :wink:

The thing about RegEx is that, like many things, the more you use it, the easier it becomes, and the more you love it.

I am far, far from being proficient in RegEx, but I now love it and find many opportunities to use it, almost every day. Whereas when I started using it, I thought it was the most frustrating language I'd ever encountered.

To each his own. :smile:

BTW, I also have the "advancing years" syndrome. :wink:

Regexes deserve their place in the toolbox – they’re useful things, and it does make sense to slightly over-use them in the first few years, just to get a bit more experience.

After 20+ years though, I think the novelty wears off. They are easy enough to write, but they can make code rather slower to read, more fiddly to maintain, and perhaps just a little more ugly.

More generally, the need to code-switch (whether shelling out, calling ObjC methods, or going for a Regular Expression solution) is always likely to express some kind of gap, or lack of expressivity, in the main idiom that one is writing in.

PS there’s a stronger view expressed here:

Perhaps it is a novelty for you, but for me, and I think many others, it is used because it is a powerful tool.

Actually, they are hard to write, at least to do so correctly.

That's just your opinion. I could say the same thing about code others write, but I won't.

I thought you knew that JavaScript has a built-in RegEx engine. There are no "gaps". The RegEx functions are powerful, flexible, and fairly easy to use.