Using RegEx to sort text in a variable?

ComplexPoint · September 15, 2017, 10:37pm

Or, in an 'Execute JavaScript for Automation' action:

(using regexes here, since you expressed curiosity about that, though I would probably reach more quickly for split functions myself)

Note: The higher-order mappendComparing function supports n-ary sorts by deriving a comparator function from a list of property-getting functions.
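
As a rough illustration of that note, here is a minimal sketch of such a comparator used on its own. The records and their field names (stem, num, ext) are invented for the example; only mappendComparing follows the definition in the source further down, rewritten slightly for brevity.

```javascript
// mappendComparing :: [(a -> b)] -> (a -> a -> Ordering)
// Tries each key function in turn; the first non-zero comparison wins.
const mappendComparing = fs => (x, y) =>
    fs.reduce((ord, f) => {
        if (ord !== 0) return ord;
        const a = f(x), b = f(y);
        return a < b ? -1 : a > b ? 1 : 0;
    }, 0);

// Hypothetical records, invented for this sketch.
const files = [
    { stem: 'water', num: 6, ext: 'wav' },
    { stem: 'clip', num: 3, ext: 'png' },
    { stem: 'water', num: 1, ext: 'wav' },
    { stem: 'clip', num: 3, ext: 'jpg' }
];

// Sort by stem, then by number, then by extension.
const sorted = files.slice().sort(
    mappendComparing([f => f.stem, f => f.num, f => f.ext])
);

console.log(sorted.map(f => `${f.stem} ${f.num}.${f.ext}`).join('\n'));
```

Each later key is consulted only when all earlier keys compare equal, which is why the two 'clip 3' records end up ordered by extension.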

Sorted filenames.kmmacros (19.9 KB)

Full source of the Execute JS4A action
(ES6 JavaScript, so Sierra onwards – for ES5 JS [Yosemite onwards], paste into the Babel JS REPL at https://babeljs.io/repl/)

(() => {
    'use strict';

    // GENERIC FUNCTIONS -----------------------------------------------------

        // lines :: String -> [String]
        const lines = s => s.split(/\r\n|\r|\n/);

        // mappendComparing :: [(a -> b)] -> (a -> a -> Ordering)
        const mappendComparing = fs => (x, y) =>
            fs.reduce((ord, f) => (ord !== 0) ? (
                ord
            ) : (() => {
                const
                    a = f(x),
                    b = f(y);
                return a < b ? -1 : a > b ? 1 : 0
            })(), 0);

        // sortBy :: (a -> a -> Ordering) -> [a] -> [a]
        const sortBy = (f, xs) =>
            xs.slice()
            .sort(f);

        // toLower :: Text -> Text
        const toLower = s => s.toLowerCase();

        // unlines :: [String] -> String
        const unlines = xs => xs.join('\n');


        // COMPARABLE FILE NAME PARTS --------------------------------------------

        const stemPreNum = s => toLower((s.match(/^[^\d\.]+/) || [''])[0]);

        const stemNum = s => parseInt((s.match(/^\D+(\d+)\./) || ['', '0'])[1], 10);

        const suffix = s => (s.match(/\.(\w+)$/) || ['', ''])[1];

        // TEST ------------------------------------------------------------------
        return unlines(
            sortBy(
                mappendComparing(
                    [stemPreNum, stemNum, suffix]
                ),
                lines(
                    Application('Keyboard Maestro Engine')
                    .getvariable('fileNames')
                )
            )
        );
    })();
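
To see what the three key functions extract, the sketch below runs them on a few sample names chosen purely for illustration (they are an assumption, not the original dataset). It runs in any JavaScript engine and needs no Keyboard Maestro variable.

```javascript
const toLower = s => s.toLowerCase();

// The same three key functions as in the action above.
const stemPreNum = s => toLower((s.match(/^[^\d\.]+/) || [''])[0]);
const stemNum = s => parseInt((s.match(/^\D+(\d+)\./) || ['', '0'])[1], 10);
const suffix = s => (s.match(/\.(\w+)$/) || ['', ''])[1];

// Hypothetical helper for this sketch: all three keys as one array.
const keys = s => [stemPreNum(s), stemNum(s), suffix(s)];

// Sample names, assumed for this illustration only.
['Water 06.wav', 'clip.03.jpg', '03.jpg'].forEach(
    name => console.log(name, '->', JSON.stringify(keys(name)))
);
// Water 06.wav -> ["water ",6,"wav"]
// clip.03.jpg -> ["clip",3,"jpg"]
// 03.jpg -> ["",0,"jpg"]
```

The last case shows a limit of these patterns: both stem regexes expect at least one non-digit character at the start of the name, so a name that begins with digits falls back to an empty stem and a number key of 0.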

JMichaelTX · September 15, 2017, 11:49pm

@ComplexPoint, thanks for sharing a JavaScript for Automation (JXA) solution.

It works well for the dataset you used (same as the OP).
You mentioned using split functions. It would be interesting to see that solution for comparison.

Is there a way to handle the dataset posted by @Tom?

Tom:

That being said, the ST has one advantage: Since it is using a regular expression to determine the search key (the part in-between the slashes) it is more flexible than counting the key delimiter positions.

This allows you to handle also cases – for example – with a varying number of "fields". Let's say you have this:
03.longclip.jpg
clip 03.jpg
clip.jpg
03 clip.jpg
03.jpg
Water 06.wav
Water 01.wav
clip.03.jpg
03.clip.jpg
where you want to treat everything up to the last dot as the filename root.

This can be easily achieved by making the regex greedy, that is, changing it from ([^.]+) to (.+)\.

Sorted output:
03.jpg
03 clip.jpg
03.clip.jpg
03.longclip.jpg
clip.jpg
clip 03.jpg
clip.03.jpg
Water 01.wav
Water 06.wav
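
The difference between the two patterns can be checked in JavaScript, the language used elsewhere in this thread. This is only a sketch: the anchored patterns, the helper names and the case-insensitive comparison are assumptions made for the illustration, not Tom's actual sort setup.

```javascript
const names = [
    '03.longclip.jpg', 'clip 03.jpg', 'clip.jpg', '03 clip.jpg', '03.jpg',
    'Water 06.wav', 'Water 01.wav', 'clip.03.jpg', '03.clip.jpg'
];

// ([^.]+) stops at the first dot; (.+)\. is greedy and runs to the last dot.
const upToFirstDot = s => (s.match(/^([^.]+)/) || ['', ''])[1];
const upToLastDot = s => (s.match(/^(.+)\./) || ['', s])[1];

console.log(upToFirstDot('03.longclip.jpg')); // 03
console.log(upToLastDot('03.longclip.jpg'));  // 03.longclip

// Sort on the lower-cased root (an assumed comparison rule for this sketch).
const root = s => upToLastDot(s).toLowerCase();
const sorted = names.slice().sort((a, b) => {
    const x = root(a), y = root(b);
    return x < y ? -1 : x > y ? 1 : 0;
});

console.log(sorted.join('\n'));
```

With these assumptions the result comes out in the same order as the sorted output listed above.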

ComplexPoint · September 16, 2017, 9:05am

Or again, extending it to Tom's data sample, and proceeding for the sake of variety by splits (my patience for regex-fiddling shortens with advancing years)

Sorted filenames by splits.kmmacros (23.2 KB)

JavaScript ES6 source (Paste into Babel JS REPL at https://babeljs.io/repl/ to get pre-Sierra ES5 JavaScript source)

(() => {
    'use strict';

    // GENERIC FUNCTIONS -----------------------------------------------------

        // (++) :: [a] -> [a] -> [a]
        const append = (xs, ys) => xs.concat(ys);

        // concat :: [[a]] -> [a] | [String] -> String
        const concat = xs =>
            xs.length > 0 ? (() => {
                const unit = typeof xs[0] === 'string' ? '' : [];
                return unit.concat.apply(unit, xs);
            })() : [];

        // elem :: Eq a => a -> [a] -> Bool
        const elem = (x, xs) => xs.indexOf(x) !== -1;

        // id :: a -> a
        const id = x => x;

        // init :: [a] -> [a]
        const init = xs => xs.length > 0 ? xs.slice(0, -1) : [];

        // intercalate :: String -> [a] -> String
        const intercalate = (s, xs) => xs.join(s);

        // isDigit :: Char -> Bool
        const isDigit = c => {
            const n = ord(c);
            return n >= 48 && n <= 57;
        };

        // last :: [a] -> a
        const last = xs => xs.length ? xs.slice(-1)[0] : undefined;

        // length :: [a] -> Int
        const length = xs => xs.length;

        // lines :: String -> [String]
        const lines = s => s.split(/\r\n|\r|\n/);

        // map :: (a -> b) -> [a] -> [b]
        const map = (f, xs) => xs.map(f);

        // mappendComparing :: [(a -> b)] -> (a -> a -> Ordering)
        const mappendComparing = fs => (x, y) =>
            fs.reduce((ord, f) => (ord !== 0) ? (
                ord
            ) : (() => {
                const
                    a = f(x),
                    b = f(y);
                return a < b ? -1 : a > b ? 1 : 0
            })(), 0);

        // ord :: Char -> Int
        const ord = c => c.codePointAt(0);

        // show :: Int -> a -> Indented String
        // show :: a -> String
        const show = (...x) =>
            JSON.stringify.apply(
                null, x.length > 1 ? [x[1], null, x[0]] : x
            );

        // sortBy :: (a -> a -> Ordering) -> [a] -> [a]
        const sortBy = (f, xs) =>
            xs.slice()
            .sort(f);

        // Splitting not on a delimiter, but whenever the relationship between
        // two consecutive items matches a supplied predicate function

        // splitBy :: (a -> a -> Bool) -> [a] -> [[a]]
        const splitBy = (f, ys) => {
            const
                bool = typeof ys === 'string',
                xs = bool ? ys.split('') : ys;
            return (xs.length < 2) ? [xs] : (() => {
                const
                    h = xs[0],
                    lstParts = xs.slice(1)
                    .reduce(([acc, active, prev], x) =>
                        f(prev, x) ? (
                            [acc.concat([active]), [x], x]
                        ) : [acc, active.concat(x), x], [
                            [],
                            [h],
                            h
                        ]);
                return map(
                    (bool ? concat : id),
                    lstParts[0].concat([lstParts[1]])
                );
            })();
        };

        // splitOn :: a -> [a] -> [[a]]
        // splitOn :: String -> String -> [String]
        const splitOn = (needle, haystack) =>
            typeof haystack === 'string' ? (
                haystack.split(needle)
            ) : (function sp_(ndl, hay) {
                // Array.indexOf returns -1 when the needle is absent.
                const i = hay.indexOf(ndl);
                return i === -1 ? (
                    [hay]
                ) : append(
                    [hay.slice(0, i)],
                    sp_(ndl, hay.slice(i + 1))
                );
            })(needle, haystack);

        // toLower :: Text -> Text
        const toLower = s => s.toLowerCase();

        // unlines :: [String] -> String
        const unlines = xs => xs.join('\n');


        // COMPARABLE FILE NAME PARTS --------------------------------------------

        // suffixSplit :: String -> (String, String)
        const suffixSplit = s =>
            elem('.', s) ? (() => {
                const xs = splitOn('.', s);
                return [intercalate('.', init(xs)), last(xs)];
            })() : [s, ''];

        // stemNumSplit :: String -> (String, String)
        const stemNumSplit = s => {
            const tpl = splitBy(
                (a, b) => isDigit(b) && !isDigit(a),
                suffixSplit(s)[0]
            );
            return length(tpl) > 1 ? tpl : append(tpl, ["0"]);
        };

        const stemPreNum = s => toLower(stemNumSplit(s)[0]);
        const stemNum = s => parseInt(stemNumSplit(s)[1], 10) || 0;
        const suffix = s => suffixSplit(s)[1];

        // TEST ------------------------------------------------------------------
        return unlines(
            sortBy(
                mappendComparing(
                    [stemPreNum, stemNum, suffix]
                ),
                lines(
                    Application('Keyboard Maestro Engine')
                    .getvariable('fileNames')
                )
            )
        );
    })();
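
For comparison with the regex version, here is a condensed, standalone sketch of the same three sort keys derived by splitting. The helper names splitAtDigitStart and keys are invented for this illustration, and the code is a simplification rather than the macro's actual source.

```javascript
const isDigit = c => c >= '0' && c <= '9';

// Hypothetical helper: start a new part wherever a digit follows a non-digit.
const splitAtDigitStart = s =>
    s.split('').reduce((parts, c, i) =>
        i > 0 && isDigit(c) && !isDigit(s[i - 1]) ? (
            parts.concat(c)
        ) : parts.slice(0, -1).concat(parts[parts.length - 1] + c),
        ['']);

// Hypothetical helper: [lower-cased stem, number, suffix] for one file name.
const keys = name => {
    const i = name.lastIndexOf('.');
    const stem = i === -1 ? name : name.slice(0, i);
    const suffix = i === -1 ? '' : name.slice(i + 1);
    const parts = splitAtDigitStart(stem);
    // parseInt(undefined, 10) is NaN, so a missing number part becomes 0.
    return [parts[0].toLowerCase(), parseInt(parts[1], 10) || 0, suffix];
};

['clip 03.jpg', '03.longclip.jpg', 'Water 06.wav'].forEach(
    name => console.log(name, '->', JSON.stringify(keys(name)))
);
// clip 03.jpg -> ["clip ",3,"jpg"]
// 03.longclip.jpg -> ["03.longclip",0,"jpg"]
// Water 06.wav -> ["water ",6,"wav"]
```

A stem in which no digit follows a non-digit is left unsplit, so its number key defaults to 0 and the whole stem, leading digits included, serves as the first sort key.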

JMichaelTX · September 17, 2017, 12:53am

I understand that, if you are not a native speaker of it, English is one of the hardest languages in the world to learn/understand. LOL I always thought the Asian languages had to be the hardest, but maybe not . . .

English seems pretty easy to me, but some would say I have a long way to go.

The thing about RegEx is that, like many things, the more you use it, the easier it becomes, and the more you love it.

I am far, far from being proficient in RegEx, but I now love it and find many opportunities to use it, almost every day. Whereas when I started using it, I thought it was the most frustrating language I'd ever encountered.

To each his own.

BTW, I also have the "advancing years" syndrome.

ComplexPoint · September 17, 2017, 5:02am

Regexes deserve their place in the toolbox – they’re useful things, and it does make sense to slightly over-use them in the first few years, just to get a bit more experience.

After 20+ years though, I think the novelty wears off. They are easy enough to write, but they can make code rather slower to read, more fiddly to maintain, and perhaps just a little more ugly.

More generally, the need to code-switch (whether shelling out, calling ObjC methods, or going for a Regular Expression solution) is always likely to express some kind of gap, or lack of expressivity, in the main idiom that one is writing in.

ComplexPoint · September 17, 2017, 5:22am

PS there’s a stronger view expressed here:

JMichaelTX · September 17, 2017, 6:28am

Perhaps it is a novelty for you, but for me, and I think many others, it is used because it is a powerful tool.

Actually, they are hard to write, at least to do so correctly.

That's just your opinion. I could say the same thing about code others write, but I won't.

I thought you knew that JavaScript has a built-in RegEx engine. There are no "gaps". The RegEx functions are powerful, flexible, and fairly easy to use.
