Break List into Groups [Example]

JMichaelTX · May 10, 2018, 7:15pm

MACRO: Break List into Groups [Example]

~~~ VER: 1.1 2018-05-12 ~~~

2018-05-12 15:50 GMT-0500

Added option to output each Group to a File

DOWNLOAD:

Break List into Groups [Example].kmmacros (23 KB)
Note: This Macro was uploaded in a DISABLED state. You must enable it before it can be triggered.

This macro was built in response to this request:
Find and cut lines from list to new list based on name

Example Results

Release Notes

Author: @JMichaelTX

PURPOSE:

Separate a List into Variables Based on String at Beginning of Line
- Provide option to Output to Files
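The "Output to Files" option can be illustrated outside Keyboard Maestro with the minimal Node.js sketch below. It is not the macro's own set of Actions: it assumes the groups have already been computed, and the per-group file name (`<GroupName>.txt` in a temporary folder) is a hypothetical choice made only for illustration.

```javascript
// Sketch of "output each Group to a File" in Node.js (not the macro's KM Actions).
// Assumptions: one file per group, named <GroupName>.txt (hypothetical naming),
// and group names are safe to use as file names (no "/" etc.).
const fs = require('fs');
const os = require('os');
const path = require('path');

// writeGroupsToFiles :: FilePath -> {String: [String]} -> [FilePath]
// Writes each group's lines to its own file and returns the paths written.
const writeGroupsToFiles = (outDir, groups) =>
  Object.entries(groups).map(([name, lines]) => {
    const filePath = path.join(outDir, `${name}.txt`);
    fs.writeFileSync(filePath, lines.join('\n') + '\n', 'utf8');
    return filePath;
  });

// Example: write two groups into a fresh temporary folder.
const outDir = fs.mkdtempSync(path.join(os.tmpdir(), 'groups-'));
const written = writeGroupsToFiles(outDir, {
  Brandon: ['Brandon,20194048,20369664', 'Brandon,20745984,20980736'],
  Christi: ['Christi,19946752,20194048']
});
console.log(written);
```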

REQUIRES:

KM 8.2+

But it could be written for KM 7.3.1+.
It is KM 8 specific only because some of the Actions changed in KM 8 to make things simpler; equivalent Actions are available in KM 7.3.1.

macOS 10.11.6 (El Capitan)

KM 8 requires Yosemite or later, so this macro will probably run on Yosemite, but I make no guarantees.

NOTICE: This macro/script is just an Example

It has had very limited testing.
You need to test further before using in a production environment.
It does not have extensive error checking/handling.
It may not be complete. It is provided as an example to show you one approach to solving a problem.

How To Use

Enable the Action you wish to use to set the Source Data:
- Set Variable (default, and enabled)
- Copy (disabled)
  - If you use this, then first select the text to be used as Source
- Read file (disabled)
Trigger this macro.

It will then sort the data so that lines that begin with the same string are grouped together.
A RegEx is performed to extract the Groups into separate Variables, named as follows: Local_List<N>
- where <N> is the sequential integer based on Group position in the sorted Source List.
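In plain JavaScript, the two steps above (sort, then split into `Local_List<N>` groups) might look roughly like the sketch below. It assumes, as the example data in this thread does, that the grouping string is everything before the first comma; the object keys merely mimic the macro's variable names and are not real KM variables.

```javascript
// Sketch: sort the source lines, then split them into Local_List<N> groups.
// Assumption: the group key is the text before the first comma on each line.

// groupKey :: String -> String
const groupKey = line => line.split(',')[0];

// breakListIntoGroups :: String -> {String: String}
const breakListIntoGroups = sourceText => {
  const sorted = sourceText
    .split(/\r\n|\r|\n/)
    .filter(line => line.length > 0) // ignore blank lines
    .sort();
  const result = {};
  let n = 0;
  let previousKey;
  for (const line of sorted) {
    const key = groupKey(line);
    if (key !== previousKey) {
      n += 1; // a new group starts: Local_List1, Local_List2, ...
      result[`Local_List${n}`] = [];
      previousKey = key;
    }
    result[`Local_List${n}`].push(line);
  }
  // Join each group back into multi-line text, as a KM variable would hold it.
  for (const name of Object.keys(result)) {
    result[name] = result[name].join('\n');
  }
  return result;
};

const lists = breakListIntoGroups(
  'Christi,19946752,20194048\nBrandon,20194048,20369664\n' +
  'Christi,20369664,20520192\nJoe-010,28729383,28733432'
);
console.log(lists);
```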

MACRO SETUP

Carefully review the Release Notes and the Macro Actions
- Make sure you understand what the Macro will do.
- You are responsible for running the Macro, not me.

Assign a Trigger to this macro.
Move this macro to a Macro Group that is only Active when you need this Macro.
ENABLE this Macro.

REVIEW/CHANGE THE FOLLOWING MACRO ACTIONS:
(all shown in the magenta color)
- Enable ONE of the first 3 Actions to choose your method of setting the Source Data.
- Be sure to DISABLE the other two Actions.
- IF you choose Read File, enter the full POSIX path to the file
- IF you choose Set SourceList to Text, enter the list in the Action's text box
- Prompt User for Output Options
  - Change defaults as desired

TAGS: @List @Variables @RegEx

USER SETTINGS:

Any Action in magenta color is designed to be changed by the end user.

ACTION COLOR CODES

To facilitate the reading, customizing, and maintenance of this macro,
key Actions are colored as follows:
GREEN -- Key Comments designed to highlight main sections of macro
MAGENTA -- Actions designed to be customized by user
YELLOW -- Primary Actions (usually the main purpose of the macro)
ORANGE -- Actions that permanently destroy Variables or Clipboards,
OR IF/THEN and PAUSE Actions

USE AT YOUR OWN RISK

While I have given this limited testing, and to the best of my knowledge will do no harm, I cannot guarantee it.
If you have any doubts or questions:
- Ask first
- Turn on the KM Debugger from the KM Status Menu, and step through the macro, making sure you understand what it is doing with each Action.

RegEx Details

(?sm)^([^,]+?),.+?(?:(\n(?!\1))|\Z)

For a detailed explanation, see:

Uses a Negative Lookahead based on the string found in the first Capture Group (the text before the first comma).
It matches all lines until it reaches a line that starts with something different from the Capture Group of the group's first line.
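For readers who want to experiment with the pattern outside KM, here is a sketch of the same RegEx translated to JavaScript syntax. JavaScript has no inline `(?sm)` flags and no `\Z`, so the sketch uses the `/s` and `/m` flags and `(?![\s\S])` for "end of input". One deliberate deviation: the lookahead checks `\1,` instead of `\1`, so that a name which is a prefix of another (say `Joe` vs `Joe-010`) does not merge the two groups. As with the macro, the input must already be sorted.

```javascript
// Sketch: the macro's grouping RegEx, translated to JavaScript syntax.
// Assumptions: /s and /m flags replace (?sm); (?![\s\S]) replaces \Z;
// the lookahead tests "\1," (not just "\1") to avoid prefix collisions.
const groupRx = /^([^,]+?),.+?(?:\n(?!\1,)|(?![\s\S]))/gms;

// breakIntoGroups :: String -> [{name: String, lines: [String]}]
// Input must already be sorted so that each group's lines are contiguous.
const breakIntoGroups = sortedText =>
  [...sortedText.matchAll(groupRx)].map(m => ({
    name: m[1],                                     // first Capture Group
    lines: m[0].replace(/\n$/, '').split('\n')      // drop trailing newline
  }));

const source = [
  'Christi,19946752,20194048',
  'Brandon,20194048,20369664',
  'Christi,20369664,20520192',
  'Joe-010,28729383,28733432',
  'Brandon,20745984,20980736'
].sort().join('\n');

const groups = breakIntoGroups(source);
groups.forEach((g, i) => console.log(`Local_List${i + 1} (${g.name}):`, g.lines));
```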

JMichaelTX · May 12, 2018, 8:53pm

Just posted an update to my OP.

JMichaelTX · May 17, 2018, 1:59am

I have just posted a version of this macro that is specifically focused on @BillytheHicks' requirements here:

Primary Changes:

No longer sort the source data -- it MUST already be ordered with all lines of a group together.
Changed most Local variables to Global variables to permit use in other macros.
For Even-Numbered Groups, extract the Group Name, TimeCode In, TimeCode Out
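Because this version no longer sorts, it relies on every group's lines already being adjacent. One way to check that precondition before running the macro is sketched below. `findSplitGroups` is a hypothetical helper, not part of the posted macro, and it again assumes the group name is the text before the first comma.

```javascript
// Sketch: verify that each group's lines are contiguous (no re-sorting needed).
// Hypothetical helper; assumes the group name is the text before the first comma.

// findSplitGroups :: [String] -> [String]
// Returns the names of groups whose lines appear in more than one run.
const findSplitGroups = lines => {
  const closed = new Set(); // groups whose run has already ended
  const split = new Set();  // groups that reappeared after their run ended
  let current;
  for (const line of lines) {
    const name = line.split(',')[0];
    if (name !== current) {
      if (closed.has(name)) split.add(name);
      if (current !== undefined) closed.add(current);
      current = name;
    }
  }
  return [...split];
};

const ordered = ['Christi,19946752,20194048', 'Christi,20369664,20520192', 'Brandon,20194048,20369664'];
const unordered = ['Christi,19946752,20194048', 'Brandon,20194048,20369664', 'Christi,20369664,20520192'];
console.log(findSplitGroups(ordered));   // returns [] (safe to process)
console.log(findSplitGroups(unordered)); // returns ['Christi'] (needs sorting first)
```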

ComplexPoint · May 17, 2018, 9:32am

As a footnote, another way of specifying the grouping criterion would be to write something analogous to the following (in an Execute Script action):

AppleScript:

-- How do we define equality for grouping ?

-- groupEq :: String -> String -> Bool
on groupEq(a, b)
	item 1 of splitOn(",", a) = item 1 of splitOn(",", b)
end groupEq

or

JavaScript:

// groupEq :: String -> String -> Bool
const groupEq = (a, b) =>
	splitOn(',', a)[0] === splitOn(',', b)[0];

and then use a generic and reusable groupBy function, which in JS might look like:

// Typical usage: groupBy(on(eq, f), xs)
// groupBy :: (a -> a -> Bool) -> [a] -> [[a]]
const groupBy = (f, xs) => {
    const dct = xs.slice(1)
        .reduce((a, x) => {
            const h = a.active.length > 0 ? a.active[0] : undefined;
            return h !== undefined && f(h, x) ? {
                active: a.active.concat([x]),
                sofar: a.sofar
            } : {
                active: [x],
                sofar: a.sofar.concat([a.active])
            };
        }, {
            active: xs.length > 0 ? [xs[0]] : [],
            sofar: []
        });
    return dct.sofar.concat(dct.active.length > 0 ? [dct.active] : []);
};

and in AS:

-- Typical usage: groupBy(on(eq, f), xs)
-- groupBy :: (a -> a -> Bool) -> [a] -> [[a]]
on groupBy(f, xs)
	set mf to mReturn(f)
	
	script enGroup
		on |λ|(a, x)
			if length of (active of a) > 0 then
				set h to item 1 of active of a
			else
				set h to missing value
			end if
			
			if h is not missing value and mf's |λ|(h, x) then
				{active:(active of a) & {x}, sofar:sofar of a}
			else
				{active:{x}, sofar:(sofar of a) & {active of a}}
			end if
		end |λ|
	end script
	
	if length of xs > 0 then
		set dct to foldl(enGroup, {active:{item 1 of xs}, sofar:{}}, tail(xs))
		if length of (active of dct) > 0 then
			sofar of dct & {active of dct}
		else
			sofar of dct
		end if
	else
		{}
	end if
end groupBy
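Whichever language is used, `groupBy` only merges *adjacent* items that satisfy the equality test, which is why the full examples sort the lines first. The small check below illustrates this. It repeats the JS `groupBy` above (same logic, condensed layout) and the `groupEq` comparator so that it runs on its own; `splitOn` here is a minimal stand-in built on `String.prototype.split`.

```javascript
// Self-contained copy of groupBy (same logic as above), plus groupEq.
// splitOn is a minimal stand-in: splitOn(',', 'a,b') -> ['a', 'b'].
const splitOn = (needle, haystack) => haystack.split(needle);

// groupEq :: String -> String -> Bool
const groupEq = (a, b) => splitOn(',', a)[0] === splitOn(',', b)[0];

// groupBy :: (a -> a -> Bool) -> [a] -> [[a]]
const groupBy = (f, xs) => {
  const dct = xs.slice(1).reduce((a, x) => {
    const h = a.active.length > 0 ? a.active[0] : undefined;
    return h !== undefined && f(h, x)
      ? { active: a.active.concat([x]), sofar: a.sofar }
      : { active: [x], sofar: a.sofar.concat([a.active]) };
  }, { active: xs.length > 0 ? [xs[0]] : [], sofar: [] });
  return dct.sofar.concat(dct.active.length > 0 ? [dct.active] : []);
};

const xs = ['Christi,1,2', 'Brandon,3,4', 'Christi,5,6'];

// Unsorted: the two Christi lines are not adjacent, so they stay apart.
const unsortedGroups = groupBy(groupEq, xs);
// Sorted first: equal keys become adjacent and are grouped together.
const sortedGroups = groupBy(groupEq, xs.slice().sort());
console.log(unsortedGroups.length, sortedGroups.length); // prints: 3 2
```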

A full AS example below (JS is a bit briefer, but works the same way):

use AppleScript version "2.4"
use framework "Foundation"
use scripting additions

-- How do we define equality for grouping ?

-- groupEq :: String -> String -> Bool
on groupEq(a, b)
	item 1 of splitOn(",", a) = item 1 of splitOn(",", b)
end groupEq

on run
    set strList to "Christi,19946752,20194048\nBrandon,20194048,20369664\nChristi,20369664,20520192\nChristi,20520192,20631296\nBrandon,20745984,20980736\nJoe-010,28729383,28733432\nBrandon,22341211,22443280\nJoe-010,23488449,23499482\nChristi,20123984,20124432"
    
    groupBy(groupEq, sort(|lines|(strList)))
    
    --> {{"Brandon,20194048,20369664", "Brandon,20745984,20980736", "Brandon,22341211,22443280"}, {"Christi,19946752,20194048", "Christi,20123984,20124432", "Christi,20369664,20520192", "Christi,20520192,20631296"}, {"Joe-010,23488449,23499482", "Joe-010,28729383,28733432"}}
    
end run

-- REUSABLE GENERIC FUNCTIONS ------------------------------------------------------------

-- foldl :: (a -> b -> a) -> a -> [b] -> a
on foldl(f, startValue, xs)
    tell mReturn(f)
        set v to startValue
        set lng to length of xs
        repeat with i from 1 to lng
            set v to |λ|(v, item i of xs, i, xs)
        end repeat
        return v
    end tell
end foldl

-- Typical usage: groupBy(on(eq, f), xs)
-- groupBy :: (a -> a -> Bool) -> [a] -> [[a]]
on groupBy(f, xs)
    set mf to mReturn(f)
    
    script enGroup
        on |λ|(a, x)
            if length of (active of a) > 0 then
                set h to item 1 of active of a
            else
                set h to missing value
            end if
            
            if h is not missing value and mf's |λ|(h, x) then
                {active:(active of a) & {x}, sofar:sofar of a}
            else
                {active:{x}, sofar:(sofar of a) & {active of a}}
            end if
        end |λ|
    end script
    
    if length of xs > 0 then
        set dct to foldl(enGroup, {active:{item 1 of xs}, sofar:{}}, tail(xs))
        if length of (active of dct) > 0 then
            sofar of dct & {active of dct}
        else
            sofar of dct
        end if
    else
        {}
    end if
end groupBy

-- Lift 2nd class handler function into 1st class script wrapper
-- mReturn :: First-class m => (a -> b) -> m (a -> b)
on mReturn(f)
    if class of f is script then
        f
    else
        script
            property |λ| : f
        end script
    end if
end mReturn

-- lines :: String -> [String]
on |lines|(xs)
    paragraphs of xs
end |lines|

-- map :: (a -> b) -> [a] -> [b]
on map(f, xs)
    tell mReturn(f)
        set lng to length of xs
        set lst to {}
        repeat with i from 1 to lng
            set end of lst to |λ|(item i of xs, i, xs)
        end repeat
        return lst
    end tell
end map

-- sort :: Ord a => [a] -> [a]
on sort(xs)
    ((current application's NSArray's arrayWithArray:xs)'s ¬
        sortedArrayUsingSelector:"compare:") as list
end sort

-- splitOn :: String -> String -> [String]
on splitOn(strDelim, strMain)
    set {dlm, my text item delimiters} to {my text item delimiters, strDelim}
    set xs to text items of strMain
    set my text item delimiters to dlm
    return xs
end splitOn

-- tail :: [a] -> [a]
on tail(xs)
    if xs = {} then
        missing value
    else
        rest of xs
    end if
end tail

ComplexPoint · May 17, 2018, 2:02pm

Full JavaScript version:

(() => {
    'use strict';

    const main = () => {
        const strText = 'Christi,19946752,20194048\nBrandon,20194048,20369664\nChristi,20369664,20520192\nChristi,20520192,20631296\nBrandon,20745984,20980736\nJoe-010,28729383,28733432\nBrandon,22341211,22443280\nJoe-010,23488449,23499482\nChristi,20123984,20124432'

        return groupBy(
            (a, b) => splitOn(',', a)[0] === splitOn(',', b)[0],
            sort(lines(strText))
        );
    };
    
    // --> [["Brandon,20194048,20369664", "Brandon,20745984,20980736", "Brandon,22341211,22443280"], ["Christi,19946752,20194048", "Christi,20123984,20124432", "Christi,20369664,20520192", "Christi,20520192,20631296"], ["Joe-010,23488449,23499482", "Joe-010,28729383,28733432"]]
    

    // REUSABLE GENERIC FUNCTIONS -------------------------

    // Typical usage: groupBy(on(eq, f), xs)
    // groupBy :: (a -> a -> Bool) -> [a] -> [[a]]
    const groupBy = (f, xs) => {
        const dct = xs.slice(1)
            .reduce((a, x) => {
                const h = a.active.length > 0 ? a.active[0] : undefined;
                return h !== undefined && f(h, x) ? {
                    active: a.active.concat([x]),
                    sofar: a.sofar
                } : {
                    active: [x],
                    sofar: a.sofar.concat([a.active])
                };
            }, {
                active: xs.length > 0 ? [xs[0]] : [],
                sofar: []
            });
        return dct.sofar.concat(dct.active.length > 0 ? [dct.active] : []);
    };

    // lines :: String -> [String]
    const lines = s => s.split(/\r\n|\r|\n/); // LF, CR or CRLF line endings

    // sort :: Ord a => [a] -> [a]
    const sort = xs => xs.slice()
        .sort((a, b) => a < b ? -1 : (a > b ? 1 : 0));

    // splitOn :: String -> String -> [String]
    const splitOn = (needle, haystack) =>
        haystack.split(needle)

    // MAIN ---
    return main();
})();

or, equivalently:

(() => {
    'use strict';

    const main = () => {
        const strText = 'Christi,19946752,20194048\nBrandon,20194048,20369664\nChristi,20369664,20520192\nChristi,20520192,20631296\nBrandon,20745984,20980736\nJoe-010,28729383,28733432\nBrandon,22341211,22443280\nJoe-010,23488449,23499482\nChristi,20123984,20124432'

        return groupBy(
            on(eq, compose(fst, splitOn(','))),
            sort(lines(strText))
        );
    };

    // --> [["Brandon,20194048,20369664", "Brandon,20745984,20980736", "Brandon,22341211,22443280"], ["Christi,19946752,20194048", "Christi,20123984,20124432", "Christi,20369664,20520192", "Christi,20520192,20631296"], ["Joe-010,23488449,23499482", "Joe-010,28729383,28733432"]]


    // REUSABLE GENERIC FUNCTIONS -------------------------

    // compose :: (b -> c) -> (a -> b) -> a -> c
    const compose = (f, g) => x => f(g(x));

    // eq (==) :: Eq a => a -> a -> Bool
    const eq = (a, b) => {
        const t = typeof a;
        return t !== typeof b ? (
            false
        ) : t !== 'object' ? (
            a === b
        ) : (() => {
            const aks = Object.keys(a);
            return aks.length !== Object.keys(b).length ? (
                false
            ) : aks.every(k => eq(a[k], b[k]));
        })();
    };

    // fst :: (a, b) -> a
    const fst = tpl => tpl[0];

    // Typical usage: groupBy(on(eq, f), xs)
    // groupBy :: (a -> a -> Bool) -> [a] -> [[a]]
    const groupBy = (f, xs) => {
        const dct = xs.slice(1)
            .reduce((a, x) => {
                const h = a.active.length > 0 ? a.active[0] : undefined;
                return h !== undefined && f(h, x) ? {
                    active: a.active.concat([x]),
                    sofar: a.sofar
                } : {
                    active: [x],
                    sofar: a.sofar.concat([a.active])
                };
            }, {
                active: xs.length > 0 ? [xs[0]] : [],
                sofar: []
            });
        return dct.sofar.concat(dct.active.length > 0 ? [dct.active] : []);
    };

    // lines :: String -> [String]
    const lines = s => s.split(/\r\n|\r|\n/); // LF, CR or CRLF line endings

    // e.g. sortBy(on(compare,length), xs)
    // on :: (b -> b -> c) -> (a -> b) -> a -> a -> c
    const on = (f, g) => (a, b) => f(g(a), g(b));

    // sort :: Ord a => [a] -> [a]
    const sort = xs => xs.slice()
        .sort((a, b) => a < b ? -1 : (a > b ? 1 : 0));

    // splitOn :: String -> String -> [String]
    const splitOn = needle => haystack =>
        haystack.split(needle);

    // MAIN ---
    return main();
})();

ComplexPoint · May 17, 2018, 2:26pm

Finally, grouping can be quite expensive for very long lists, and it can be sped up a little by using a sortOn pattern (decorate -> sort -> groupBy -> undecorate), so that the value extraction function (in this case compose(fst, splitOn(',')) or x => splitOn(',', x)[0]) is applied only once to each item in the list:

(() => {
    'use strict';

    const main = () => {
        const strText = 'Christi,19946752,20194048\nBrandon,20194048,20369664\nChristi,20369664,20520192\nChristi,20520192,20631296\nBrandon,20745984,20980736\nJoe-010,28729383,28733432\nBrandon,22341211,22443280\nJoe-010,23488449,23499482\nChristi,20123984,20124432'

        return groupSortOn(
            compose(fst, splitOn(',')),
            lines(strText)
        );
    };

    // --> [["Brandon,20194048,20369664", "Brandon,20745984,20980736", "Brandon,22341211,22443280"], ["Christi,19946752,20194048", "Christi,20123984,20124432", "Christi,20369664,20520192", "Christi,20520192,20631296"], ["Joe-010,23488449,23499482", "Joe-010,28729383,28733432"]]


    // REUSABLE GENERIC FUNCTIONS -------------------------

    // Tuple (,) :: a -> b -> (a, b)
    const Tuple = (a, b) => ({
        type: 'Tuple',
        '0': a,
        '1': b,
        length: 2
    });

    // compare :: a -> a -> Ordering
    const compare = (a, b) => a < b ? -1 : (a > b ? 1 : 0);

    // compose :: (b -> c) -> (a -> b) -> a -> c
    const compose = (f, g) => x => f(g(x));

    // eq (==) :: Eq a => a -> a -> Bool
    const eq = (a, b) => {
        const t = typeof a;
        return t !== typeof b ? (
            false
        ) : t !== 'object' ? (
            a === b
        ) : (() => {
            const aks = Object.keys(a);
            return aks.length !== Object.keys(b).length ? (
                false
            ) : aks.every(k => eq(a[k], b[k]));
        })();
    };

    // flatten :: NestedList a -> [a]
    const flatten = t =>
        Array.isArray(t) ? (
            [].concat.apply([], t.map(flatten))
        ) : t;

    // fst :: (a, b) -> a
    const fst = tpl => tpl[0];

    // Typical usage: groupBy(on(eq, f), xs)
    // groupBy :: (a -> a -> Bool) -> [a] -> [[a]]
    const groupBy = (f, xs) => {
        const dct = xs.slice(1)
            .reduce((a, x) => {
                const h = a.active.length > 0 ? a.active[0] : undefined;
                return h !== undefined && f(h, x) ? {
                    active: a.active.concat([x]),
                    sofar: a.sofar
                } : {
                    active: [x],
                    sofar: a.sofar.concat([a.active])
                };
            }, {
                active: xs.length > 0 ? [xs[0]] : [],
                sofar: []
            });
        return dct.sofar.concat(dct.active.length > 0 ? [dct.active] : []);
    };

    // Sort and group a list by comparing the results of a key function
    // applied to each element. groupSortOn f is equivalent to
    // groupBy eq $ sortBy (comparing f),
    // but has the performance advantage of only evaluating f once for each
    // element in the input list.
    // This is a decorate-(group.sort)-undecorate pattern, as in the
    // so-called 'Schwartzian transform'.
    // Groups are arranged from lowest to highest.
    // groupSortOn :: Ord b => (a -> b) -> [a] -> [[a]]
    // groupSortOn :: Ord b => [((a -> b), Bool)]  -> [a] -> [[a]]
    const groupSortOn = (f, xs) => {
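        // Key functions and matching 'ascending' Bools derived from argument f,
        // which may be a single key function, or a list of key functions, each
        // optionally followed by a Bool (true: ascending, false: descending)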
        const fsbs = unzip(
                flatten([f])
                .reduceRight((a, x) =>
                    typeof x === 'boolean' ? {
                        asc: x,
                        fbs: a.fbs
                    } : {
                        asc: true,
                        fbs: [
                            [x, a.asc]
                        ].concat(a.fbs)
                    }, {
                        asc: true,
                        fbs: []
                    })
                .fbs
            ),
            [fs, bs] = [fsbs[0], fsbs[1]],
            iLast = fs.length;
        // decorate-sort-group-undecorate
        return groupBy(
                (p, q) => p[0] === q[0],
                sortBy(
                    mappendComparing(
                        // functions that access pre-calculated values by position
                        // in the decorated ('Schwartzian') version of xs
                        zip(fs.map((_, i) => x => x[i]), bs)
                    ), xs.map( // xs decorated with precalculated key function values
                        x => fs.reduceRight(
                            (a, g) => [g(x)].concat(a), [
                                x
                            ]
                        )
                    )
                )
            )
            .map(gp => gp.map(x => x[iLast])); // undecorated version of data, post sort
    };

    // lines :: String -> [String]
    const lines = s => s.split(/\r\n|\r|\n/); // LF, CR or CRLF line endings

    // mappendComparing :: [((a -> b), Bool)] -> (a -> a -> Ordering)
    const mappendComparing = fboolPairs =>
        (x, y) => fboolPairs.reduce(
            (ordr, fb) => {
                const f = fb[0];
                return ordr !== 0 ? (
                    ordr
                ) : fb[1] ? (
                    compare(f(x), f(y))
                ) : compare(f(y), f(x));
            }, 0
        );

    // e.g. sortBy(on(compare,length), xs)
    // on :: (b -> b -> c) -> (a -> b) -> a -> a -> c
    const on = (f, g) => (a, b) => f(g(a), g(b));

    // sortBy :: (a -> a -> Ordering) -> [a] -> [a]
    const sortBy = (f, xs) =>
        xs.slice()
        .sort(f);

    // splitOn :: String -> String -> [String]
    const splitOn = needle => haystack =>
        haystack.split(needle);

    // unzip :: [(a,b)] -> ([a],[b])
    const unzip = xys =>
        xys.reduce(
            (a, x) => Tuple.apply(null, [0, 1].map(
                i => a[i].concat(x[i])
            )),
            Tuple([], [])
        );

    // zip :: [a] -> [b] -> [(a, b)]
    const zip = (xs, ys) =>
        xs.slice(0, Math.min(xs.length, ys.length))
        .map((x, i) => Tuple(x, ys[i]));

    // MAIN ---
    return main();
})();
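For the common case of a single ascending key, the same decorate-sort-group-undecorate idea can be written much more compactly. The sketch below is a simplified illustration, not a replacement for `groupSortOn` above: it supports only one key function, `groupSortOnSimple` is a hypothetical name, and it relies on `Array.prototype.sort` being stable (guaranteed since ES2019) so that lines within a group keep their input order.

```javascript
// Simplified decorate-sort-group-undecorate for ONE ascending key function.
// groupSortOnSimple :: Ord b => (a -> b) -> [a] -> [[a]]
const groupSortOnSimple = (f, xs) => {
  // Decorate: compute each key exactly once.
  const decorated = xs.map(x => [f(x), x]);
  // Sort on the precomputed key only (stable, so ties keep input order).
  decorated.sort((p, q) => (p[0] < q[0] ? -1 : p[0] > q[0] ? 1 : 0));
  // Group adjacent pairs with equal keys, then undecorate.
  const groups = [];
  for (const [k, x] of decorated) {
    const last = groups[groups.length - 1];
    if (last !== undefined && last.key === k) {
      last.items.push(x);
    } else {
      groups.push({ key: k, items: [x] });
    }
  }
  return groups.map(g => g.items);
};

// Count key-function calls to confirm it runs once per item.
let keyCalls = 0;
const firstField = s => { keyCalls += 1; return s.split(',')[0]; };
const result = groupSortOnSimple(firstField, [
  'Christi,19946752,20194048',
  'Brandon,20194048,20369664',
  'Christi,20369664,20520192',
  'Joe-010,28729383,28733432'
]);
console.log(result, keyCalls); // keyCalls is 4: one call per input line
```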