Need macro to sort tab-del on length of first column

ALYB · June 28, 2024, 4:12pm

I need a macro to sort the clipboard's content of tab-delimited lines with two columns by the length of the items of the first column.

So if the clipboard contains:

The macro should put this sorted result on the clipboard:

Thank you for your help!
example_files.zip (1.2 KB)

ComplexPoint · June 28, 2024, 4:41pm

One unambitious approach (putting aside, for the moment, any question of secondary and perhaps tertiary sort orders)

(Assumes, for the .toReversed() method, a recent macOS. Otherwise, you could use a function like:

// reverse :: [a] -> [a]
const reverse = xs =>
    xs.slice(0).reverse();

otherwise, we could look more directly at secondary and tertiary sort orders)

Sorted by decreasing count of words in first tab-delim col.kmmacros (14 KB)

return (() => {
    "use strict";

    const main = () =>
        sortOn(
            s => words(s.split("\t")[0]).length
        )(
            lines(kmvar.local_Source)
        )
        .toReversed()
        .join("\n");


    // --------------------- GENERIC ---------------------

    // comparing :: Ord a => (b -> a) -> b -> b -> Ordering
    const comparing = f =>
    // The ordering of f(x) and f(y) as a value
    // drawn from {-1, 0, 1}, representing {LT, EQ, GT}.
        x => y => {
            const
                a = f(x),
                b = f(y);

            return a < b
                ? -1
                : a > b
                    ? 1
                    : 0;
        };

    // lines :: String -> [String]
    const lines = s =>
    // A list of strings derived from a single string
    // which is delimited by \n or by \r\n or \r.
        0 < s.length
            ? s.split(/\r\n|\n|\r/u)
            : [];


    // sortBy :: (a -> a -> Ordering) -> [a] -> [a]
    const sortBy = f =>
    // A copy of xs sorted by the comparator function f.
        xs => xs.slice()
        .sort((a, b) => f(a)(b));


    // sortOn :: Ord b => (a -> b) -> [a] -> [a]
    const sortOn = f =>
    // Equivalent to sortBy(comparing(f)), but with f(x)
    // evaluated only once for each x in xs.
    // ('Schwartzian' decorate-sort-undecorate).
        xs => sortBy(
            comparing(x => x[0])
        )(
            xs.map(x => [f(x), x])
        )
        .map(x => x[1]);


    // words :: String -> [String]
    const words = s =>
    // List of space-delimited sub-strings.
    // Leading and trailing space ignored.
        s.split(/\s+/u).filter(Boolean);


    // --------------------- LOGGING ---------------------


    // showLog :: a -> IO ()
    const showLog = (...args) =>
    // eslint-disable-next-line no-console
        console.log(
            args
            .map(JSON.stringify)
            .join(" -> ")
        );

    // sj :: a -> String
    const sj = (...args) =>
    // Abbreviation of showJSON for quick testing.
    // Default indent size is two, which can be
    // overridden by any integer supplied as the
    // first argument of more than one.
        JSON.stringify.apply(
            null,
            1 < args.length && !isNaN(args[0])
                ? [args[1], null, args[0]]
                : [args[0], null, 2]
        );

    return main();
})();

griffman · June 28, 2024, 4:58pm

I figured this was a perfect exercise for JavaScript, and @ComplexPoint showed just that. Here's an alternative approach that uses brute-force string manipulation :). I count the characters in the first segment of each line, prepend that count to the line, sort the list on the number, and then remove the prepended counts.
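
For readers who would rather see the prefix, sort, strip idea as code, here is a minimal JavaScript sketch. It is not the contents of the macro: the function name `sortByFirstFieldLength` is hypothetical, and the longest-first order is an assumption borrowed from the other replies in this thread.

```javascript
// Sketch only: prefix each line with a count, sort on it, strip it again.
// Assumes lines are separated by "\n" and columns by "\t".
const sortByFirstFieldLength = text =>
    text
        .split("\n")
        // Prepend the character count of the first field, then a tab.
        .map(line => `${line.split("\t")[0].length}\t${line}`)
        // Sort on the numeric prefix, longest first (assumed order).
        // parseInt reads the leading digits and stops at the tab.
        .sort((a, b) => parseInt(b, 10) - parseInt(a, 10))
        // Strip the prefix: everything up to and including the first tab.
        .map(line => line.slice(line.indexOf("\t") + 1))
        .join("\n");

console.log(sortByFirstFieldLength("ab\tx\nabcd\ty\nabc\tz"));
```

Note that `.length` counts UTF-16 code units, so characters outside the Basic Multilingual Plane (many emoji, for example) count as two.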

Download Macro(s): Sort list by first field length.kmmacros (6.4 KB)

Macro notes

Macros are always disabled when imported into the Keyboard Maestro Editor.
- The user must ensure the macro is enabled.
- The user must also ensure the macro's parent macro-group is enabled.

System information

macOS 14.5
Keyboard Maestro v11.0.3

But don't use my method: it's much less efficient than the JavaScript solution above. I mainly wrote it to see whether what I thought would work would, in fact, work :).

-rob.

Airy · June 28, 2024, 6:23pm

I have another approach. I love coming up with alternate approaches (without using ChatGPT), especially using the shell.

awk -F'\t' '{print length($1) "\t" $0}' | sort -k1,1 -n -r | awk -F'\t' '{print $2 "\t" $3}'

Nige_S · June 28, 2024, 6:25pm

It's shell time!

This will take your input text and sort it by length of the first column (longest first), sub-sorted alphabetically (A-Za-z).

Length then Alpha Sort.kmmacros (3.6 KB)
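
The shell command inside the macro is not reproduced in this thread, so the following is only a sketch of the same two-level ordering in JavaScript. The function name `lengthThenAlpha` is hypothetical, and the secondary sort is assumed to be a plain code-unit comparison, which puts A-Z ahead of a-z.

```javascript
// Sketch only: longest first column first, ties broken alphabetically.
// Assumes lines are separated by "\n" and columns by "\t".
const lengthThenAlpha = text =>
    text
        .split("\n")
        // Keep the first column alongside each line as the sort key.
        .map(line => ({ line, key: line.split("\t")[0] }))
        .sort((a, b) =>
            // Primary: longer first column sorts earlier.
            b.key.length - a.key.length
            // Secondary: code-unit order, so "B" sorts before "a".
            || (a.key < b.key ? -1 : a.key > b.key ? 1 : 0)
        )
        .map(x => x.line)
        .join("\n");

console.log(lengthThenAlpha("bb\t1\nBa\t2\nccc\t3\naa\t4"));
```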

Airy · June 28, 2024, 6:55pm

Your solution has 84 characters, mine has 85. You beat me by one.

Nige_S · June 28, 2024, 8:05pm

And I'm stealing your usage of $0 -- I completely forgot about that -- so that's another 3 fewer!

Nige_S · June 28, 2024, 9:11pm

I don't know why I even thought to try this, but:

Mind. Blown.

So everything but the sort as a "native" KM action:

Length then Alpha Sort 2.kmmacros (4.5 KB)

Airy · June 28, 2024, 9:29pm

Migrating 50% of the work to native KM actions is very good, but 100% would be better. I decided that a one line shell script would be "simpler" than even a 100% KM solution, which is why I wrote my solution using a shell.

Nige_S · June 28, 2024, 9:44pm

More to the point, way more people here have at least some familiarity with KM's regular expressions than they do with awk. So that method of prepending character counts to a line might be more generally useful.

And @peternlewis has given us the ability to do a calculation using one of the capture groups and use the result in the replacement -- how cool is that?

Airy · June 28, 2024, 11:05pm

I still haven't fully wrapped my head around how KM manages regular expressions. As you've shown, it seems to be re-evaluated on a line-by-line basis. I'm not entirely sure how KM implements that.

griffman · June 28, 2024, 11:48pm

If I'm understanding your question right, it's the (?m) flag that tells the regex engine to process each line separately. Without that, it treats it as a blob.
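
The effect of the multi-line flag on the `^` anchor can be seen in a small JavaScript sketch. This uses JavaScript's regex engine rather than Keyboard Maestro's, on the assumption that the anchor behaviour of the flag is the same; the variable names and sample text are made up for illustration.

```javascript
// Two tab-delimited lines in one string.
const sampleText = "one\ttwo\nthree\tfour";

// Without the m flag, ^ matches only at the very start of the string.
const anchoredOnce = sampleText.match(/^[^\t]+/g);

// With the m flag, ^ also matches immediately after each line break.
const anchoredPerLine = sampleText.match(/^[^\t]+/gm);

console.log(anchoredOnce);    // [ 'one' ]
console.log(anchoredPerLine); // [ 'one', 'three' ]
```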

-rob.

Nige_S · June 28, 2024, 11:49pm

Is that different to other PCRE engines? The global /g switch you might expect to see is hidden in the action's settings as "All Matches" (the default), sure, but everything else bar the %Calculate% looks pretty standard.

The (?m) switch tells the engine to let the ^ and $ anchors match at the start and end of each line, and /g is set, so this is just "with every occurrence that matches line-start, one or more non-tab characters, then anything else to line-end; replace with...".

For the replacement, $1 is the first capture group, $0 is the whole match (so the entire line) and the only KM-specific thing is using that first capture group in a function to get the character count, using the result in our replacement string.
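
As a rough JavaScript analogue of that search and replace (not the KM action itself, whose exact pattern is not shown in this thread), a replacer function can stand in for the calculation. The name `prependCounts` is hypothetical, and excluding `\n` from the character class is an added precaution so that a line with no tab cannot make the capture run on into the next line.

```javascript
// Sketch only: prepend the character count of the first column to each line.
const prependCounts = text =>
    text.replace(
        // Line start, one or more non-tab characters (captured),
        // then anything else up to the end of the line.
        /^([^\t\n]+).*$/gm,
        // `match` plays the role of $0, `first` the role of $1.
        (match, first) => `${first.length}\t${match}`
    );

console.log(prependCounts("ab\tx\nabcd\ty"));
```

The result has one extra leading column per line, which is what the subsequent sort step keys on.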

ComplexPoint · June 29, 2024, 5:09am

Or with a primary descending sort on the word count of the left hand expression, and a secondary case-insensitive a-z sort on the LHE:

Sorted by decreasing LHE word-count- then a-z.kmmacros (14 KB)

return (() => {
    "use strict";

    // Primary sort :: DESC word count in LHE
    // Secondary sort :: Case-insensitive a-z in LHE

    // (Left hand expression)

    // main :: IO ()
    const main = () =>
        sortBy(
            mappendComparing(
                // Descending word count on left,
                flip(comparing(x => x.n))
            )(
                // and secondary case-insensitive a-z.
                comparing(x => x.lower)
            )
        )(
            lines(kmvar.local_Source)

            // Decorated list.
            .map(s => {
                const a = s.split("\t")[0];

                return {
                    s,
                    lower: toLower(a),
                    n: words(a).length
                };
            })
        )

        // undecorated list.
        .map(x => x.s)
        .join("\n");


    // --------------------- GENERIC ---------------------

    // comparing :: Ord a => (b -> a) -> b -> b -> Ordering
    const comparing = f =>
    // The ordering of f(x) and f(y) as a value
    // drawn from {-1, 0, 1}, representing {LT, EQ, GT}.
        x => y => {
            const
                a = f(x),
                b = f(y);

            return a < b
                ? -1
                : a > b
                    ? 1
                    : 0;
        };


    // flip :: (a -> b -> c) -> b -> a -> c
    const flip = op =>
        // The binary function op with
        // its arguments reversed.
        1 !== op.length
            ? (a, b) => op(b, a)
            : (a => b => op(b)(a));


    // lines :: String -> [String]
    const lines = s =>
        // A list of strings derived from a single string
        // which is delimited by \n or by \r\n or \r.
        0 < s.length
            ? s.split(/\r\n|\n|\r/u)
            : [];


    // mappendComparing (<>) :: (a -> a -> Ordering) ->
    // (a -> a -> Ordering) -> (a -> a -> Ordering)
    const mappendComparing = cmp =>
        cmp1 => a => b => {
            const x = cmp(a)(b);

            return 0 !== x
                ? x
                : cmp1(a)(b);
        };


    // sortBy :: (a -> a -> Ordering) -> [a] -> [a]
    const sortBy = f =>
        // A copy of xs sorted by the comparator function f.
        xs => xs.slice()
        .sort((a, b) => f(a)(b));


    // toLower :: String -> String
    const toLower = s =>
        // Lower-case version of string.
        s.toLocaleLowerCase();


    // words :: String -> [String]
    const words = s =>
        // List of space-delimited sub-strings.
        // Leading and trailing space ignored.
        s.split(/\s+/u).filter(Boolean);

    // MAIN ---
    return main();
})();

ALYB · July 16, 2024, 10:58am

This is amazing. Just one line ...

I'm trying to understand this. You first define the tab character as a separator. Then you add the length of column 1 of every line to a new column 0, then you sort on that column (which then has become column 1). And then you run awk again to just print the columns 2 and 3?

Nige_S · July 16, 2024, 11:17am

For clarity -- "...$0 refers to the entire line." (taken from man awk).

So that section is "print the length of column 1, then a tab character, then the entirety of the line under consideration".
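
To make the column shift concrete, here is a small JavaScript sketch of what happens to a single line as it passes through the first and last stages of the pipeline. The sample line and variable names are invented for illustration.

```javascript
// One tab-delimited input line with two columns.
const inputLine = "abcd\tsecond";

// First awk stage: length of column 1, a tab, then the whole line ($0).
const decorated = `${inputLine.split("\t")[0].length}\t${inputLine}`;

// The decorated line now has three fields: the count, then the two
// original columns, which have moved to positions 2 and 3 (1-based).
const fields = decorated.split("\t");

// Last awk stage: print fields 2 and 3, which restores the original line.
const restored = `${fields[1]}\t${fields[2]}`;

console.log(decorated);
console.log(restored === inputLine);
```

Because the last stage prints only fields 2 and 3, an input with more than two columns would lose its extra columns.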

Airy · July 16, 2024, 3:25pm

I think you understand it, taking Nige's comment into consideration.

Yes, shell scripts are amazingly powerful. I have a new macro that uses shell scripts to help us, but it's so big I need to take some time to test it before I upload it.
