How to convert this glossary to a sensible format?

I found this automotive glossary on the internet, but, alas, it has a very impractical format:


I'd like to convert this glossary to a usable format in which all source-term components are resolved and the source and target terms are separated by a tab character:

What would be the best approach to do this?
test-files.zip (1.8 KB)

There seem to be two issues:

  1. preliminary unwrapping of overflowed lines (...bagage-afdekplaat), then
  2. filling column gaps from values in the preceding line.
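Step 1 can be sketched on its own as a left fold over the lines. This assumes, as the macro further down also does, that every genuine entry contains the separator " - ", and it further assumes that any other non-blank line is overflow belonging to the entry above it. The helper name `unwrapOverflow` and the sample lines are hypothetical. Because each overflow line is appended to the most recent entry, an entry that spills over several lines is rejoined as well.

```javascript
// Hypothetical helper: rejoin overflowed lines with their entry.
// ASSUMPTION: every genuine entry contains the separator " - ",
// and any other non-blank line continues the entry above it.
const unwrapOverflow = lines =>
    lines.reduce((entries, line) => {
        const text = line.trim();
        if (0 === text.length) {
            return entries;                 // skip blank lines
        }
        if (/ - /u.test(line) || 0 === entries.length) {
            return [...entries, text];      // start a new entry
        }
        // Overflow: append to the most recent entry.
        return [
            ...entries.slice(0, -1),
            `${entries[entries.length - 1]} ${text}`
        ];
    }, []);

// Hypothetical sample: the first entry spills onto a second line.
console.log(unwrapOverflow([
    "remschijf - geventileerd - ventilated",
    "brake disc",
    "motor - engine"
]));
```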

Pattern two is a "map-accumulation": an existing set of lines is mapped to a corresponding new set, and each step of that mapping is affected by a continuously accumulated memory of the preceding line.
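In JavaScript, a map-accumulation can be sketched as a `reduce` that threads an accumulator alongside the growing output list. The version below is a simplified, hypothetical illustration (plain arrays rather than Tuple objects, and an uncurried signature). It assumes the rows are already split into columns, with "" marking a gap; its memory is simply the previous filled row, which differs in detail from the accumulator used in the macro below.

```javascript
// Generic map-accumulation: f takes (accumulator, item) and
// returns [newAccumulator, mappedItem]. The result is
// [finalAccumulator, mappedItems].
const mapAccumL = (f, acc0, xs) =>
    xs.reduce(
        ([acc, ys], x) => {
            const [acc1, y] = f(acc, x);
            return [acc1, [...ys, y]];
        },
        [acc0, []]
    );

// Hypothetical gap filler. ASSUMPTION: each row is an array of
// column strings, and "" marks a gap to be filled from the same
// column of the previous (already filled) row.
const fillGaps = rows =>
    mapAccumL(
        (previous, row) => {
            const filled = row.map(
                (cell, i) => ("" === cell ? (previous[i] ?? "") : cell)
            );
            // The filled row is both the new memory and the output.
            return [filled, filled];
        },
        [],
        rows
    )[1];

// Hypothetical sample rows: [head term, qualifier, target term].
console.log(fillGaps([
    ["remschijf", "geventileerd", "ventilated brake disc"],
    ["", "massief", "solid brake disc"]
]));
```

Because the memory is the previous *filled* row, a gap that repeats over several consecutive lines keeps inheriting the last explicit value.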

It could probably be done more "natively" (which apparently means without scripting actions), though splitting on more than one character is not very directly supported in Keyboard Maestro at this time. Using a JS action instead, I might sketch something like this:

Gaps filled from preceding line.kmmacros (5.6 KB)


return (() => {
    "use strict";

    // Rob Trew @2024
    // Ver 0.2

    // Gaps in current line filled from previous line.
    const main = () => {
        const
            xs = overflowUnwrapped(
                lines(kmvar.local_Source)
            )
            .map(
                x => x.split(/\s*-\s+/u)
            );

        return mapAccumL(previousLine => columns => {
            const
                // Any gaps filled by terms from
                // preceding line.
                filled = columns.map(
                    (x, i) => x
                        ? x
                        : previousLine[i]
                );

            return Tuple(
                init(filled.flatMap(
                    s => s
                        ? s.split(/\s+/u)
                        : []
                ))
            )(
                `${init(filled).join(" ")}\t${last(filled)}`
            );
        })([])(xs)[1]
        .join("\n");
    };


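    const overflowUnwrapped = xs => {
        // Entry lines contain " - ". A following line without
        // " - " is overflow, and is appended to its entry.
        // Other lines without " - " are dropped.
        const rgx = / - /u;

        // The final line is paired with "", so that zip
        // (which stops at the shorter list) does not drop it.
        return zip(xs)([...xs.slice(1), ""])
        .flatMap(
            ([a, b]) => !rgx.test(a)
                ? []
                : (0 < b.trim().length && !rgx.test(b))
                    ? [`${a.trim()} ${b.trim()}`]
                    : [a]
        );
    };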


    // --------------------- GENERIC ---------------------

    // Tuple (,) :: a -> b -> (a, b)
    const Tuple = a =>
    // A pair of values, possibly of
    // different types.
        b => ({
            type: "Tuple",
            "0": a,
            "1": b,
            length: 2,
            *[Symbol.iterator]() {
                for (const k in this) {
                    if (!isNaN(k)) {
                        yield this[k];
                    }
                }
            }
        });


    // init :: [a] -> [a]
    const init = xs =>
    // All elements of a list except the last.
        0 < xs.length
            ? xs.slice(0, -1)
            : null;


    // last :: [a] -> a
    const last = xs =>
    // The last item of a list.
        0 < xs.length
            ? xs.slice(-1)[0]
            : null;


    // lines :: String -> [String]
    const lines = s =>
    // A list of strings derived from a single string
    // which is delimited by \n or by \r\n or \r.
        0 < s.length
            ? s.split(/\r\n|\n|\r/u)
            : [];


    // mapAccumL :: (acc -> x -> (acc, y)) -> acc ->
    // [x] -> (acc, [y])
    const mapAccumL = f =>
    // A tuple of an accumulation and a list
    // obtained by a combined map and fold,
    // with accumulation from left to right.
        acc => xs => [...xs].reduce(
            ([a, bs], x) => second(
                v => [...bs, v]
            )(
                f(a)(x)
            ),
            [acc, []]
        );


    // second :: (a -> b) -> ((c, a) -> (c, b))
    const second = f =>
        // A function over a simple value lifted
        // to a function over a tuple.
        // f (a, b) -> (a, f(b))
        xy => Tuple(
            xy[0]
        )(
            f(xy[1])
        );


    // zip :: [a] -> [b] -> [(a, b)]
    const zip = xs =>
    // The paired members of xs and ys, up to
    // the length of the shorter of the two lists.
        ys => Array.from({
            length: Math.min(xs.length, ys.length)
        }, (_, i) => Tuple(xs[i])(ys[i]));


    // MAIN
    return main();
})();
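Two of the string operations inside `main` can be tried in isolation: splitting a line into columns on the multi-character delimiter (the same `/\s*-\s+/u` pattern used above) and formatting the filled columns as source, tab, target. The helper names `columns` and `toTabLine` and the sample lines are hypothetical, and this is a sketch to run in, for example, Node.js rather than a drop-in part of the macro. Unlike the macro's plain `join`, `toTabLine` skips empty source components.

```javascript
// Hypothetical helpers isolating two steps that main() performs.

// 1. Split a line into columns on a multi-character delimiter:
//    optional whitespace, a hyphen, then at least one whitespace
//    character. A hyphen inside a word (no space after it) is
//    not treated as a delimiter.
const columns = line => line.split(/\s*-\s+/u);

// 2. Format filled columns as "source<TAB>target": every column
//    except the last is a source-term component (joined with a
//    space, empty ones skipped); the last column is the target.
const toTabLine = cols => {
    const source = cols.slice(0, -1).filter(Boolean).join(" ");
    const target = 0 < cols.length ? cols[cols.length - 1] : "";
    return `${source}\t${target}`;
};

console.log(columns("bagage-afdekplaat - luggage cover"));
// → [ 'bagage-afdekplaat', 'luggage cover' ]
console.log(columns("- massief - solid brake disc"));
// → [ '', 'massief', 'solid brake disc' ]
console.log(JSON.stringify(
    toTabLine(["remschijf", "geventileerd", "ventilated brake disc"])
));
// → "remschijf geventileerd\tventilated brake disc"
```

Because `\s+` demands whitespace after the hyphen, a hyphenated compound such as bagage-afdekplaat stays intact, while a leading "- " yields an empty first column: exactly the kind of gap the map-accumulation step then fills.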

Terrific, Rob. Many thanks! I'll post the converted glossary to the translator community; they can correct the remaining errors manually.
