Regex for Search Variable to find closest line with all-caps to another variable

I can’t figure out how to get the regex to do what I want for a Search Variable action. I have a long multi-line text variable where sometimes the letters in a line are in all-caps, and each line may have characters other than letters and some lines may be empty. I am trying to write a regex to find another variable (that is not all-caps and may or may not match the exact case) within the variable, as well as the lines above it up to the closest line that is in all-caps.

For instance, suppose variable A is:

ABCD1.
baseball bat
carla jones
r2d2
Samantha Who
jersey-mike

CAT
a

Rita Diaz-T’wine
31.2 men
FREE
carlton
oak Tree
Lake Titicaca
LALA
francis
Session 9 Days
oak Tree
beetle mania

Now, suppose variable B is:

oak tree

I am trying to write a regex to find:

FREE
carlton
oak Tree

The only thing I can consistently know is variable B. I tried case sensitively writing:

\n([^a-z][^a-z]*?\n[\s\S]*?(?i)%Variable%b%)

However, this finds the first instance of an all-caps line and includes everything from it to the variable found, so it includes all lines from "ABCD1." to the first "oak Tree".

Next I tried adding [\s\S]* to the front of the regex, but that skipped to the end and found the lines from "LALA" to the second instance of "oak Tree".

Is there any way to properly accomplish what I am trying to do?

All you need is to:

  1. read the text to a list of lines
  2. split to a list of lists on any line for which an isAllCaps test returns true
  3. harvest from the beginning of each sub-list formed by a split.

Introducing regular expressions here just adds a further problem.

Much easier in any scripting language (Python, JS etc etc) which provides (or supports definition of) splitting functions.

Another approach would be to use a grouping function. The Python itertools module, for example provides itertools.groupby, and below we define something for ourselves using JavaScript.

You can make use of Keyboard Maestro's very convenient and powerful %JSONValue% token if you return the result in a JSON-stringified form:

local_Harvest.FREE[2]  :: "oak Tree"

{
  "CAT": [
    "a",
    "Rita Diaz-T’wine",
    "31.2 men"
  ],
  "FREE": [
    "carlton",
    "oak Tree",
    "Lake Titicaca"
  ],
  "LALA": [
    "francis",
    "Session 9 Days",
    "oak Tree",
    "beetle mania"
  ]
}

Group lines on the value they return to an isAllCaps test.kmmacros (6.6 KB)


JS source:
return (() => {
    "use strict";

    // main IO ()
    const main = () => {
        const
            groups = groupOn(isAllCaps)(
                lines(kmvar.local_Source).flatMap(s => {
                    const token = s.trim();

                    return 0 < token.length
                        ? [token]
                        : [];
                })
            );

        return JSON.stringify(
            zip(groups)(groups.slice(1))
            .reduce(
                (dict, [x, y]) => isAllCaps(x[0])
                    ? Object.assign(
                        dict,
                        {[x[0]]: y}
                    )
                    : dict,
                {}
            ),
            null, 2
        );
    };


    // --------------------- GENERIC ---------------------

    const isAllCaps = s =>
        [...s].every(isUpper);


    // lines :: String -> [String]
    const lines = s =>
    // A list of strings derived from a single string
    // which is delimited by \n or by \r\n or \r.
        0 < s.length
            ? s.split(/\r\n|\n|\r/u)
            : [];


    // isUpper :: Char -> Bool
    const isUpper = c =>
    // True if c is an upper case character.
        (/[A-Z]/u).test(c);


    // groupBy :: (a -> a -> Bool) -> [a] -> [[a]]
    const groupBy = eqOp =>
    // A list of lists, each containing only elements
    // equal under the given equality operator, such
    // that the concatenation of these lists is xs.
        xs => 0 < xs.length
            ? (() => {
                const [h, ...t] = xs;
                const [groups, g] = t.reduce(
                    ([gs, a], x) => eqOp(a[0])(x)
                        ? [gs, [...a, x]]
                        : [[...gs, a], [x]],
                    [[], [h]]
                );

                return [...groups, g];
            })()
            : [];


    // groupOn :: (a -> b) -> [a] -> [[a]]
    const groupOn = f =>
        // A list of lists, each containing only elements
        // which return equal values for f,
        // such that the concatenation of these lists is xs.
        xs => 0 < xs.length
            ? groupBy(a => b => a[0] === b[0])(
                xs.map(x => [f(x), x])
            )
            .map(gp => gp.map(ab => ab[1]))
            : [];


    // zip :: [a] -> [b] -> [(a, b)]
    const zip = xs =>
        // The paired members of xs and ys, up to
        // the length of the shorter of the two lists.
        ys => Array.from({
            length: Math.min(xs.length, ys.length)
        }, (_, i) => [xs[i], ys[i]]);

  
    return main();
})();

Thanks for the help! Unfortunately, I do not know any scripting languages except the basic regex used for Keyboard Maestro, so all of that looks very complex to me and it may take me a long time to understand it. I can try to incorporate the macro you provided, and if it works I suppose it doesn't matter if I understand what's going on. The only catch is that I couldn't alter anything in it because I wouldn't know how, so if there are any problems I couldn't troubleshoot effectively, and if what I need changes at all I wouldn't know how to update it. Still, I'm going to try it out.

Since there have been no other suggestions, and based on what you said, it seems the basic regex can't do what I want simply. Over the last day I did think of a few more complicated ways to accomplish this with regex. One option is to search Variable A for Variable B ignoring case and save everything from the beginning of Variable A to the first instance of Variable B to another variable (e.g. Variable C):

^[\s\S]*?%Variable%B%

Then, I can search Variable C case sensitively using the regex I provided that skips to the last possible instance of an all-caps line before the last instance of Variable B, which now wouldn't matter because Variable C would have only the one instance of Variable B:

[\s\S]*\n([^a-z][^a-z]*?\n[\s\S]*?(?i)%Variable%B%)

If I need to find multiple instances in the same text, I can then search Variable A ignoring case for the first instance of Variable B and save all the text after that first instance back to Variable A, thus erasing the first lines that included the pertinent text I'd already found:

%Variable%B%.*?\n([\s\S]*?)$

Then I can repeat the process until there are no more instances of Variable B left in Variable A.

This isn't an ideal solution but I did want to prove to myself I could think of a way to do this with regex.

@sursur Could you tell us about the end goal ? What are you trying to accomplish ?

I don't know if going into the details will change much in the way of being able to help me, but basically I'm going to be given 5-10 long lists at a time. Each list consists of companies (in all caps), each followed by the people who work for them. Sometimes the people's names are all lowercase and sometimes the first letters are capitalised. Each time, I need to find whether a person (Variable B) is in the lists and, if so, the company they work for. The same person's name may be listed for multiple companies, and this may be needed in the future, but for now I mostly need to find the first instance of their name in each list.

I've considered two separate approaches to going through the 5-10 lists each time. One is to save them altogether to the same variable and then search for all the instances of the person's name and then figure out a way to discern the first instance (if any instances) of the person's name in each list. This would be more seamless but unfortunately Keyboard Maestro/my computer cannot handle such a large variable too well, so the programme kept freezing when doing the Search Variable action. Thus, I've split every list to its own variable and am searching them one at a time for each group of lists I do. Even doing this, the variables are still large and the programme runs slowly, but at least it doesn't freeze. I'm also finding that when searching these large variables that if I search for more than one thing at a time in the Search Variable action such as:

(%Variable%B%)[\s\S]*?(%Variable%B%)

it slows down even more or is more likely to freeze, so I've been trying to find just one thing at a time. With these constraints, this is the best I've been able to come up with. I mainly posted because I thought that if regex can match from the first all-caps line, past other all-caps lines, to the first instance of Variable B, and can also match from the closest all-caps line to the last instance of Variable B, then there was likely an easy way to match from the closest all-caps line to the first instance of Variable B that I was somehow missing... but I guess not lol.

Here's a regexp that identifies the three lines in your example source that start with an all-caps line and end two lines later:

^([A-Z0-9.])+\n.+?\n(.+?)\n

It captures the first line and the last line (but you could capture anything you want), so you can assign each line to a variable and evaluate them. You'd ask if the second variable matched your %Variable%B%, for example, and if it did, you'd know the person worked for the first match.

If your data is too large for a variable, you can use the regexp on a file. In that case, I do something in Perl, slurping the file rather than reading it line by line.

But here's a macro to get you started:

Regex Rescue Macro (v11.0.2)

Regex Rescue.kmmacros (6.4 KB)

Thanks! There's a lot to unpack there.

I'm not sure the first code (^([A-Z0-9.])+\n.+?\n(.+?)\n) would work for what I need. It didn't work when I tried it. If I'm reading it correctly, the ^ would need to be dropped because with it, it would only match the very first line of the variable to the company or else none at all, right? Wouldn't it need to start with (?:^|\n) instead?

The ([A-Z0-9.])+\n is not making sense to me exactly either. The + outside the parentheses would make it only capture the first character of the line, right? So wouldn't it need to be inside the parentheses? Also, I should have been clearer, but the company lines could have almost any character in them except lower-case letters. So there may be numbers, periods, spaces, commas, ampersands, dashes, etc., and there will never be a lower-case letter, but there should always be at least one capital letter. This is why I went with [^a-z] but if there's a reason why what you posted is better please let me know because I still have a lot to learn I'm sure. I do see that [^a-z\n] now might be safer. From yours I see that + on only one is better and simpler than the two with *? on the second one that I had.

The person's name will not always be exactly two lines after the company name. It may be the very next line after it (which will be the case most often), or the line after that, or multiple lines after. Most of the time the companies only have 1-3 people listed under each, but some have many more. The most I've seen so far is about 50 people(/lines) under one company, and the person I'm looking for could be anywhere in that group. Some lines are empty and a few lines have info other than people or companies, although both of these instances are rare. When I'm searching, I prefer to capture everything from the company name to the person so I can double-check nothing is amiss in the intervening lines (such as the correct company was skipped over somehow) and that the find is correct before proceeding.

My data even before this has often been too large for a variable. I've found creative ways around it, but it's often a problem for me. I'm going to have to look into this regex on a file thing. I know nothing about Perl or slurping right now, but I want to look into it along with the Python/JavaScript/JSON scripting that user ComplexPoint provided. I've honestly been meaning to branch out past Keyboard Maestro regex to learn more related scripting/coding things for a long while now but I never seem to have the time to really dig in in any meaningful or lasting way.

Looking at your macro, first I notice you use a very similar hotkey as I usually use, lol. Mine is usually option-left arrow. The macro looks mostly good but I'm still not sure about the regex in the search variable. It didn't work the first time, so I switched the first part ^([A-Z]+?)\n with (?:^|\n)([^a-z\n]+)\n. Then, \n(.+?)\n only finds one line, which means the person must be on the second line past the company name or it won't find it. If I change the . to [\s\S] so that it can find multiple lines, then it has the exact same problem my original regex does of finding the very first all-caps line and everything past it (including other all-caps lines too) up to the person's name.

Also, why do you use the flag (?m)? I've never used it before but I see it's for multiline, but the regex seems to do the same thing whether or not I include it.

That's fatal, sorry.

The sample macro shows you how to identify a multiline pattern (which is what the (?m) is for) and capture elements of it in different groups. But if there's no pattern, it won't work. So, for example, it does find the default name (if you just run what I gave you without changing anything) but it won't find anyone else.

My suggestion would be to read the file line by line (I'd do it in Perl), saving any line that matches your criteria for a company name (although "no lowercase" is not much of a criterion) while looking line by line for your employee name. When you find a match for that name, the last saved company name is the one you want.

That scheme doesn't care if there's a name on the second or third line or if there are 50 names ahead of it.
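For anyone who would rather see that save-the-last-company scheme in JavaScript than Perl, here is a minimal sketch. It is not the code in the attached macro. The names `isCompanyLine` and `firstCompanyFor` are made up for illustration, and the company test (at least one capital letter, no lower-case letters) is an assumption taken from the description earlier in the thread.

```javascript
// Hypothetical helper: a company line has at least one capital
// letter and no lower-case letters (an assumption from the thread).
const isCompanyLine = line =>
    (/[A-Z]/u).test(line) && !(/[a-z]/u).test(line);

// Hypothetical helper: walk the lines once, remembering the most
// recent company line, and stop at the first line that matches
// the person's name when case is ignored.
const firstCompanyFor = (text, person) => {
    const wanted = person.trim().toLowerCase();
    let company = null;

    for (const rawLine of text.split(/\r\n|\n|\r/u)) {
        const line = rawLine.trim();

        if (isCompanyLine(line)) {
            company = line;
        } else if (line.toLowerCase() === wanted) {
            return company;
        }
    }

    // The name was not found anywhere in the text.
    return null;
};

// A shortened version of the sample data from the first post.
const sample = [
    "ABCD1.", "carla jones", "", "CAT", "a", "31.2 men",
    "FREE", "carlton", "oak Tree", "LALA", "oak Tree"
].join("\n");

const companyForOakTree = firstCompanyFor(sample, "oak tree");
const companyForCarla = firstCompanyFor(sample, "carla jones");
```

Because only one company name is held at a time, the number of lines between the company and the person makes no difference.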

Here's a test of the sample data that returns the company names. But I doubt it will work for all the variations you have.

Companies Macro (v11.0.2)

Companies.kmmacros (5.0 KB)

And here's an example of the save-the-name-for-a-match approach:

Companies Macro (v11.0.2)

Companies.kmmacros (6.3 KB)

That finds oak Tree and carla jones but there are other issues (why isn't FREE seen as a company name?) to work out.

I'm still not understanding this. Won't a regex/search find a multiline pattern even without the (?m) flag?
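A small JavaScript check may help separate the two ideas here. As far as I know, the multiline flag only changes where ^ and $ are allowed to match, and a pattern that contains \n can span lines without it. JavaScript writes the flag after the closing slash rather than inline as (?m), and I am assuming Keyboard Maestro's regex engine treats the flag the same way.

```javascript
const text = "carlton\nFREE\noak Tree";

// A pattern containing \n matches across lines with no flag at all.
const spansLines = (/FREE\noak Tree/u).test(text);

// Without the m flag, ^ and $ only match at the very start and end
// of the whole string, so a line in the middle cannot be anchored.
const anchoredWithoutFlag = (/^FREE$/u).test(text);

// With the m flag, ^ and $ also match just after and just before
// each line break, so the middle line can be anchored.
const anchoredWithFlag = (/^FREE$/mu).test(text);
```

Here `spansLines` and `anchoredWithFlag` are true, and `anchoredWithoutFlag` is false.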

I will look into that. I do remember when learning regex that I learned some Perl. I think regex is based on Perl? I can't remember. I just vaguely recall thinking I should understand Perl better than others, but I haven't looked at anything non-regex since I first learned it all many years ago.

Thanks for these. You're giving me ideas on how to adjust the macro to make it better.

I have finally thought of a regex that would find exactly what I want, but it's extremely clunky, requires knowing the maximum possible number of lines, and would probably cause Keyboard Maestro to freeze or at best work very slowly. Here's the idea:

\n([^a-z]*?[A-Z][^a-z]*?)\n((?:\n|[^A-Z\n]+?\n|[^\n]*?[a-z][^\n]*?\n)?(?:\n|[^A-Z\n]+?\n|[^\n]*?[a-z][^\n]*?\n)?)?(?i)%Variable%B%

This would first find a line with at least one upper-case letter in it and no lower-case letters in it, the company name.

Next, it would find lines that either (a) are empty, (b) have characters but no upper-case letters such as a line of all digits, or (c) have at least one lower-case letter. Or, it would just skip this step if there are no lines in between.

Finally, it would find the line with the Variable B, the person's name I'm looking for.

The problem is that I must include a possibility for every line in between the company and person's name. This would require me to know the maximum possible number of lines in between, which currently is about 50 but could be more. Even if I knew the maximum were 50, the regex would be ridiculous because the example I just used only allowed for up to two lines in between. This part is for one possible line:

(?:\n|[^A-Z\n]+?\n|[^\n]*?[a-z][^\n]*?\n)?

I wrote it twice above, but if I were to use it knowing I could have up to 50 lines in between, I'd have to write that 50 times in the regex... which would probably make Keyboard Maestro freeze. However, I have just proven it is theoretically possible to write a regex for what I want.
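One way to avoid writing the per-line group 50 times is to write it once and put a quantifier on it. The sketch below tries that idea in JavaScript rather than in Keyboard Maestro, so treat it as an experiment and not a tested macro. It swaps the three-way alternation for a negative lookahead meaning "this line is not a company line", so that each line can only be matched one way. The helper names (`escapeRegex`, `anyCase`, `companyToPerson`) are made up for illustration, and because I am not relying on an inline (?i) in JavaScript, the name is expanded letter by letter into [oO][aA]... form.

```javascript
// Hypothetical helper: escape regex metacharacters in literal text.
const escapeRegex = s => s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");

// Hypothetical helper: turn "oak" into "[oO][aA][kK]" so that only
// the name is matched case-insensitively.
const anyCase = s => [...s].map(c => {
    const lower = c.toLowerCase();
    const upper = c.toUpperCase();

    return lower !== upper && lower.length === 1 && upper.length === 1
        ? `[${lower}${upper}]`
        : escapeRegex(c);
}).join("");

// Assumption: a company line has at least one capital letter
// and no lower-case letters.
const COMPANY = "[^a-z\\n]*[A-Z][^a-z\\n]*";

const companyToPerson = person => new RegExp(
    // Group 1: the company line (^ matches at any line start).
    `^(${COMPANY})\\n` +
    // Group 2: any number of whole lines that are NOT company lines.
    `((?:(?!${COMPANY}$).*\\n)*?)` +
    // Group 3: a line consisting of the person's name, in any case.
    `(${anyCase(person)})$`,
    "m"
);

const source = [
    "ABCD1.", "baseball bat", "carla jones", "r2d2", "Samantha Who",
    "jersey-mike", "", "CAT", "a", "", "Rita Diaz-T’wine", "31.2 men",
    "FREE", "carlton", "oak Tree", "Lake Titicaca",
    "LALA", "francis", "Session 9 Days", "oak Tree", "beetle mania"
].join("\n");

const found = source.match(companyToPerson("oak tree"));
const wholeMatch = found ? found[0] : null;
```

On the sample data, `wholeMatch` holds the three lines FREE, carlton and oak Tree. Since the pattern starts with ^ in multiline mode instead of \n, a company on the very first line of the text can also be matched.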

You're welcome. I think you'd be wise to spend your time improving the last macro I gave you. It's not only efficient but readable.

Your long regexp will fail on the first company name. I didn't analyze it beyond that. I think I can be forgiven.

Probably fair to say that Regular Expressions constitute a write-only language :slight_smile:

I suppose it's possible that there are people who are born with an innate knowledge of Kleene algebra and regular expressions, but on the whole I think it's more likely that they have to put time into learning them.

The rewards of putting that same time into learning any Turing-complete ( and readable ) scripting language are much greater, and the time wasted in subsequent debugging very much less ...

A tiny subset of JavaScript or Python will get you much further (and will let you write things that are much more readable)


Why would it fail?

I had to google what a write-only language is, but I think you're right. If you think that example is bad, you should see some of the regexes I've made over the years. Practically monstrous!

To be fair to me though, I don't think the example I've written here is that bad. Over the years I've had to google a lot to find info on regexes and I've found some very long, very complex regexes written by others and posted to the internet to help others, that others in their respective forums were regarding as good and easy to understand. I think I may just have a tough crowd here lol.

I started with Keyboard Maestro as a complete coding/scripting/programming novice. I only started learning regex because it was basically the langue du jour of Keyboard Maestro. I had always meant to start learning some scripting language after I got familiar with KM and regex since KM also had the power to let you use scripting language on certain actions.

I remember first thinking I'd learn Applescript since it made sense - I use Apple products, KM is made for Apple, etc. But then I remember somehow finding out that very few people use it, even those using Apple products. Regardless, I got too busy and never even fully mastered regex (although I'm starting to realise that almost no one masters any of this coding stuff, they just get better and better at googling what they need), so I never got around to picking a scripting language and starting, and just made everything work with regex if at all possible.

I just feel so overwhelmed on where to start with learning a coding language. Which one should I choose to start with? Which one will best work with KM? Where do I go to start learning?

That's very understandable, but if you've got that far with regular expressions, whose limitations quickly multiply complexity, you will get much further, more easily, with a better-equipped language which comes provided with fully-charged batteries.

  • JavaScript or Python would be two obvious choices, and both would work well. The specific fit might vary with the type of work you do.
  • JavaScript can be used on the web, as well as on macOS and iOS/iPadOS (AppleScript doesn't work on phones or iPads)
  • Keyboard Maestro has a built-in Execute JavaScript for Automation action, which gives you access to the same osascript libraries for Mac Scripting as AppleScript.

  • Since Keyboard Maestro 11, access to KM variables through Execute JavaScript for Automation actions has become easier than through Execute AppleScript actions.

  • Keyboard Maestro also provides very good support for JSON, which, as you know, is a textual representation of structured JavaScript data (See the %JSONValue% token, for example)

The first 8 chapters of Eloquent JavaScript are relevant,

and the MDN pages are always good for looking things up.

Note that for Mac scripting you don't need the Web libraries for JS (i.e. you don't need to go near the DOM or Document Object Model web page interface).

You really just need to understand the basic types of data:

  • String
  • Number
  • Array (lists of Strings or Numbers etc)
  • Dictionaries of Key:Value pairs, which JS calls Objects

And most of what you do will be served well by Arrays, and by their built-in bag of tricks:

  • Array.map
  • Array.filter
  • Array.reduce
  • Array.flatMap

(which spare you from having to mess with "loops" and mutable variables)
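As a rough illustration of those four methods, here is a small sketch using a made-up list of names (the constant names are just for this example):

```javascript
const names = ["carlton", "oak Tree", "", "Lake Titicaca"];

// map: one output item for each input item.
const shouted = names.map(s => s.toUpperCase());

// filter: keep only the items that pass a test.
const nonEmpty = names.filter(s => s.length > 0);

// reduce: fold the whole list into a single summary value,
// here the total number of characters, starting from 0.
const totalLength = names.reduce((n, s) => n + s.length, 0);

// flatMap: each item yields a list of its own, and those
// lists are joined end to end into one flat list.
const words = nonEmpty.flatMap(s => s.split(" "));
```

None of these change `names` itself. Each one returns a new value, which is what makes it possible to do without loops and mutable variables.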


There are also people here ( @unlocked2412 comes to mind), who have learned JavaScript specifically for macOS scripting, and who might have thoughts on:

  • What particular subset of JS makes things easiest (you certainly don't need it all), and
  • what approach and materials work best for experimenting with it.

And of course JavaScript String values come ready-equipped with a bag of tricks too.

If we give a constant name to your source String above:

const source = `ABCD1.
                baseball bat
                carla jones
                r2d2
                Samantha Who
                jersey-mike

                CAT
                a

                Rita Diaz-T’wine
                31.2 men
                FREE
                carlton
                oak Tree
                Lake Titicaca
                LALA
                francis
                Session 9 Days
                oak Tree
                beetle mania`;

and use String.split to get a list (Array) of lines:

const separateLines = source.split("\n");

then separateLines will look like this:

[
  "ABCD1.",
  "                baseball bat",
  "                carla jones",
  "                r2d2",
  "                Samantha Who",
  "                jersey-mike",
  "",
  "                CAT",
  "                a",
  "",
  "                Rita Diaz-T’wine",
  "                31.2 men",
  "                FREE",
  "                carlton",
  "                oak Tree",
  "                Lake Titicaca",
  "                LALA",
  "                francis",
  "                Session 9 Days",
  "                oak Tree",
  "                beetle mania"
]

but we probably don't want that leading whitespace, so let's give a name to a version of the list in which each line is trimmed.

We can define our trimmed copy (or 'map') like this:

const trimmedLines = separateLines.map(
    line => line.trim()
);

and trimmedLines looks like this:

[
  "ABCD1.",
  "baseball bat",
  "carla jones",
  "r2d2",
  "Samantha Who",
  "jersey-mike",
  "",
  "CAT",
  "a",
  "",
  "Rita Diaz-T’wine",
  "31.2 men",
  "FREE",
  "carlton",
  "oak Tree",
  "Lake Titicaca",
  "LALA",
  "francis",
  "Session 9 Days",
  "oak Tree",
  "beetle mania"
]

We can also make use of your existing knowledge of regular expressions.

We're going to need to define a test for whether a string is all upper case.

Let's start with a test which returns true if a single character is upper case, and otherwise returns false.

const isUpper = c =>
    // True if c is an upper case character.
    (/[A-Z]/u).test(c);

and now let's use that test as a Lego brick to build another test which returns true if and only if all of the characters in a string are upper case:

const isAllUpper = someString =>
    [...someString].every(isUpper);

Where [...someString] breaks the string down into a list (Array) of individual characters, and JS Arrays have a built-in .every(test) function which returns true if and only if every item in the list passes the test.
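A few quick checks of those two tests may be useful. The two definitions are repeated so that the snippet runs on its own, and the sample strings are taken from the data above.

```javascript
const isUpper = c =>
    // True if c is an upper case character.
    (/[A-Z]/u).test(c);

const isAllUpper = someString =>
    [...someString].every(isUpper);

// The spread syntax gives one list item per character.
const chars = [..."CAT"];

const cat = isAllUpper("CAT");
const mixedCase = isAllUpper("oak Tree");

// A digit or a full stop is not in A-Z, so this line fails the test.
const withDigit = isAllUpper("ABCD1.");

// .every() on an empty list has nothing to fail, so it returns true.
const emptyString = isAllUpper("");
```

Here `cat` and `emptyString` are true, while `mixedCase` and `withDigit` are false. An empty line passing the test is the reason a separate length check is needed wherever empty lines should not count as upper case.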

Now we can try out JavaScript Array.filter, applying our isAllUpper test.

We can generally define the sub-list of those lines which are all upper case as follows:

const onlyLinesWhichAreNonEmptyAndUpperCase = someLines =>
    someLines.filter(
        line => line.length > 0 && isAllUpper(line)
    );

Let's give a name to the filtered version of our trimmed list:

const justUpperCaseLines = onlyLinesWhichAreNonEmptyAndUpperCase(
    trimmedLines
);

It looks like this:

[
  "CAT",
  "FREE",
  "LALA"
]
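Note that ABCD1. is missing from that list, because isAllUpper insists that every character is in A-Z. If headings can contain digits, spaces or punctuation, as the company names described earlier in the thread can, one possible looser test is "at least one capital letter and no lower-case letters". The name `isHeadingLine` below is made up for this sketch.

```javascript
// Hypothetical alternative test: at least one capital letter,
// no lower-case letters, anything else allowed.
const isHeadingLine = line =>
    (/[A-Z]/u).test(line) && !(/[a-z]/u).test(line);

const candidateLines = [
    "ABCD1.", "baseball bat", "", "CAT", "31.2 men", "R&D CO., 2"
];

const headingLines = candidateLines.filter(isHeadingLine);
```

With this test, `headingLines` contains "ABCD1.", "CAT" and "R&D CO., 2". The empty line and "31.2 men" are rejected without a separate length check, because neither passes both halves of the test.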

If we generally wanted a Key:Value dictionary (JS Object) in which some chosen words are the keys, and the value of each key is, for the moment, an empty list, we could define the pattern as something like:

const dictionaryFromWords = chosenWords =>
    Object.fromEntries(
        chosenWords.map(
            word => [word, []]
        )
    );

An initially empty dictionary based on our upper case words could be defined as:

const emptyDictionary = dictionaryFromWords(
    justUpperCaseLines
);

and emptyDictionary looks like this:

{
  "CAT": [],
  "FREE": [],
  "LALA": []
}

How would we define a filled dictionary in terms of

  1. A dictionary in which the keys are the empty headings, and
  2. our ordered list of all the terms

We could define it in terms of Array.reduce, which takes a list, and a starter value, and returns a summary value to which every item in the list has contributed.

Here, our start value is a pair of things:

  1. The dictionary with headers but no entries
  2. The current header (initially we don't have one, so just an empty string "")

How does each line (term) contribute to building the summary value ?

  1. If the line is one of the keys in the dictionary, it is adopted as the current heading, under which a few following items can be added
  2. if the line/term is not itself a dictionary header, and we do have a non-empty current header, then that term is added to the current header's list.

const filledDictionary = emptyDictionary =>
    termList => termList.reduce(
        ([dictionary, heading], term) =>
            term in dictionary
                ? [dictionary, term]
                : heading.length > 0
                    ? [
                        Object.assign(
                            dictionary,
                            {
                                [heading]: dictionary[heading]
                                .concat(term)
                            }
                        ),
                        heading
                    ]
                    : [dictionary, heading],
        [emptyDictionary, ""]
    )[0];

So we can now define a filled dictionary and give it a constant name:

const headingsWithFollowingTerms = filledDictionary(
    emptyDictionary
)(
    trimmedLines
);

and headingsWithFollowingTerms turns out to look like this:

{
  "CAT": [
    "a",
    "",
    "Rita Diaz-T’wine",
    "31.2 men"
  ],
  "FREE": [
    "carlton",
    "oak Tree",
    "Lake Titicaca"
  ],
  "LALA": [
    "francis",
    "Session 9 Days",
    "oak Tree",
    "beetle mania"
  ]
}

We can refer to items in JavaScript lists (Arrays) with a zero-based numeric (integer) index.

so for example:

headingsWithFollowingTerms.FREE[1]

is "oak Tree"
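To get from a dictionary like this back to the original question (which headings list a given name, ignoring case), one possible sketch uses Object.entries with .filter and .some. The function name `headingsListing` is made up for illustration, and the dictionary is typed in by hand here so that the snippet runs on its own.

```javascript
// Hypothetical helper: the headings whose lists contain the
// given name, compared without regard to case.
const headingsListing = (dict, name) => {
    const wanted = name.trim().toLowerCase();

    return Object.entries(dict)
        .filter(
            ([, terms]) => terms.some(
                term => term.toLowerCase() === wanted
            )
        )
        .map(([heading]) => heading);
};

const headings = {
    "CAT": ["a", "", "Rita Diaz-T’wine", "31.2 men"],
    "FREE": ["carlton", "oak Tree", "Lake Titicaca"],
    "LALA": ["francis", "Session 9 Days", "oak Tree", "beetle mania"]
};

const allMatches = headingsListing(headings, "oak tree");
const firstMatch = allMatches[0];
```

Here `allMatches` is ["FREE", "LALA"] and `firstMatch` is "FREE", so the same function covers both the first-instance case and the every-instance case.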


Lines Gathered under UPPER CASE Headings.kmmacros (6.3 KB)


KM-independent testing version of JS source:
(() => {
    "use strict";

    const kmvar = {"local_Source": `ABCD1.
baseball bat
carla jones
r2d2
Samantha Who
jersey-mike

CAT
a

Rita Diaz-T’wine
31.2 men
FREE
carlton
oak Tree
Lake Titicaca
LALA
francis
Session 9 Days
oak Tree
beetle mania`};

    // MAIN
    const main = () => {

        const separateLines = kmvar.local_Source.split("\n");

        const trimmedLines = separateLines.map(
            line => line.trim()
        );

        const justUpperCaseLines = onlyLinesWhichAreNonEmptyAndUpperCase(
            trimmedLines
        );

        const emptyDictionary = dictionaryFromWords(
            justUpperCaseLines
        );

        const headingsWithFollowingTerms = filledDictionary(
            emptyDictionary
        )(
            trimmedLines
        );

        return headingsWithFollowingTerms;
    };

    // ------- TERMS FOLLOWING UPPER CASE HEADINGS -------

    const dictionaryFromWords = chosenWords =>
        Object.fromEntries(
            chosenWords.map(
                word => [word, []]
            )
        );


    const filledDictionary = emptyDict =>
        termList => termList.reduce(
            ([dictionary, currentHeading], term) =>
                term in dictionary
                    ? [dictionary, term]
                    : currentHeading.length > 0
                        ? [
                            Object.assign(
                                dictionary,
                                {
                                    [currentHeading]: dictionary[
                                    currentHeading
                                    ]
                                    .concat(term)
                                }
                            ),
                            currentHeading
                        ]
                        : [dictionary, currentHeading],
            [emptyDict, ""]
        )[0];


    const onlyLinesWhichAreNonEmptyAndUpperCase = someLines =>
        someLines.filter(
            line => line.length > 0 && isAllUpper(line)
        );

    // --------------------- GENERIC ---------------------

    // isUpper :: Char -> Bool
    const isUpper = c =>
    // True if c is an upper case character.
        (/[A-Z]/u).test(c);

    // const isAllUpper :: String -> Bool
    const isAllUpper = s =>
        [...s].every(isUpper);

    return JSON.stringify(
        main(),
        null,
        2
    );
})();



To get a sense of the .reduce() function (or 'method') which JS Arrays come equipped with, it might be helpful to experiment with reducing a list of numbers to their sum total.

const sum = numberList => numberList.reduce(
    // Adding each list item to the sub-total,
    (subTotal, listItem) =>
        subTotal + listItem,
    // and starting the subTotal count at zero.
    0
);

// [1, 2, 3, 4, 5] -> 15
const total = sum([1, 2, 3, 4, 5]);

Or to separate out and name the parts a fraction more:

(() => {
    "use strict";

    const sum = numberList => numberList.reduce(
        // Adding each list item to the sub-total,
        add,
        // and starting the subTotal count at zero.
        0
    );

    // add :: (Num, Num) -> Num
    const add = (a, b) => a + b;

    // [1, 2, 3, 4, 5] -> 15
    const total = sum([1, 2, 3, 4, 5]);

    return total;
})();

A commented version in which reducing your text to a dictionary of lines under headings is all done with .reduce :

    // dictionaryOfLinesUnderALLCAPSheadings ::
    // String -> Dict
    const dictionaryOfLinesUnderALLCAPSheadings = s =>
        s.split("\n")
        .reduce(
            dictionaryAndCurrentHeadingUpdatedByLine,
            [{}, ""]
        )[0];

    // dictionaryAndCurrentHeadingUpdatedByLine ::
    // (Dictionary, String) -> String -> (Dictionary, String)
    const dictionaryAndCurrentHeadingUpdatedByLine = (
        [dict, currentHeading], line
    ) => {
        // Any leading or trailing space
        // on a line discarded.
        const term = line.trim();

        // Is this a new upper-case heading ?
        return isAllUpper(term) && term.length > 0
        // If so, for the first item in our
        // [dict, currentHeading] pair,
        // it becomes a new entry in the
        // dictionary, with an initially empty
        // list of following lines
        // For the 2nd item in the [dict, currentHeading]
        // pair, it takes the place of currentHeading.
            ? [
                Object.assign(dict, {[term]: []}),
                term
            ]
        // Otherwise, (not a new heading),
        // Do we already have a non-empty current heading ?
            : currentHeading.length > 0
            // If so (we have a non-empty current heading),
            // we replace the relevant entry in the dictionary
            // with a slightly longer one:
            // Same heading, but an additional item added
            // (concatenated, to the list of lines under that
            // heading)
                ? [
                    Object.assign(
                        dict, {
                            [currentHeading]: dict[
                            currentHeading
                            ]
                            .concat(term)
                        }
                    ),
                    currentHeading
                ]
            // Otherwise (no heading yet) our
            // [dict, currentHeading] pair is returned
            // unchanged by this line.
                : [dict, currentHeading];
    };

First, a source version which you can test in Script Editor (set the language selector at top left to JavaScript rather than AppleScript):

(() => {
    "use strict";

    const kmvar = {"local_Source": `ABCD1.
baseball bat
carla jones
r2d2
Samantha Who
jersey-mike

CAT
a

Rita Diaz-T’wine
31.2 men
FREE
carlton
oak Tree
Lake Titicaca
LALA
francis
Session 9 Days
oak Tree
beetle mania`};

    // MAIN
    const main = () => {
        const
            headingsDictionary = dictionaryOfLinesUnderALLCAPSheadings(
                kmvar.local_Source
            );

        return headingsDictionary;
    };

    // dictionaryOfLinesUnderALLCAPSheadings ::
    // String -> Dict
    const dictionaryOfLinesUnderALLCAPSheadings = s =>
        s.split("\n")
        .reduce(
            dictionaryAndCurrentHeadingUpdatedByLine,
            [{}, ""]
        )[0];

    // dictionaryAndCurrentHeadingUpdatedByLine ::
    // (Dictionary, String) -> String -> (Dictionary, String)
    const dictionaryAndCurrentHeadingUpdatedByLine = (
        [dict, currentHeading], line
    ) => {
        // Any leading or trailing space
        // on a line is discarded.
        const term = line.trim();

        // Is this a new upper-case heading ?
        return isAllUpper(term) && term.length > 0
        // If so, for the first item in our
        // [dict, currentHeading] pair,
        // it becomes a new entry in the
        // dictionary, with an initially empty
        // list of following lines
        // For the 2nd item in the [dict, currentHeading]
        // pair, it takes the place of currentHeading.
            ? [
                Object.assign(dict, {[term]: []}),
                term
            ]
        // Otherwise, (not a new heading),
        // Do we already have a non-empty current heading ?
            : currentHeading.length > 0
            // If so (we have a non-empty current heading),
            // we replace the relevant entry in the dictionary
            // with a slightly longer one:
            // Same heading, but an additional item added
            // (concatenated, to the list of lines under that
            // heading)
                ? [
                    Object.assign(
                        dict, {
                            [currentHeading]: dict[
                            currentHeading
                            ]
                            .concat(term)
                        }
                    ),
                    currentHeading
                ]
            // Otherwise (no heading yet) our
            // [dict, currentHeading] pair is returned
            // unchanged by this line.
                : [dict, currentHeading];
    };


    // --------------------- GENERIC ---------------------

    // isUpper :: Char -> Bool
    const isUpper = c =>
    // True if c is an upper case character.
        (/[A-Z]/u).test(c);


    // isAllUpper :: String -> Bool
    const isAllUpper = s =>
        [...s].every(isUpper);

    // MAIN ---
    return main();
})();
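
One thing to watch for in the source above: because isUpper tests each character against [A-Z], a line such as ABCD1. does not count as a heading, since the digit and the full stop both fail the test. If lines like that should count, a looser test is one possibility. The sketch below assumes that 'all-caps' means at least one upper-case letter and no lower-case letters, with everything else ignored. The name isAllCapsLine is illustrative, and is not part of the macro.

```javascript
// isAllCapsLine :: String -> Bool
const isAllCapsLine = s =>
    // True if s contains at least one upper-case letter
    // and no lower-case letters. Digits, punctuation and
    // spaces are ignored, and an empty line is not a heading.
    (/[A-Z]/u).test(s) && !(/[a-z]/u).test(s);

// A shortened sample, for illustration only.
const sample = "ABCD1.\nbaseball bat\n\nCAT\na\n31.2 men\nFREE\ncarlton";

// -> ["ABCD1.", "CAT", "FREE"]
const headings = sample.split("\n").filter(isAllCapsLine);
```

Swapping a test like this in for isAllUpper would also make the term.length > 0 check redundant (though harmless), because an empty line contains no upper-case letter.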

Then, a copy for running in Keyboard Maestro:

ALL BY REDUCE (String -> Dictionary) Lines Gathered under UPPER CASE Headings.kmmacros (6.9 KB)
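
Once a dictionary of that shape exists (heading -> list of lines), the original question, finding the heading nearest above a search term plus the lines down to that term, becomes a lookup. What follows is only a sketch under some assumptions: the match is on a whole line, ignoring case, and the first heading in document order which has a match wins. The name linesFromHeadingToTerm and the hand-written sampleDict are illustrative.

```javascript
// linesFromHeadingToTerm :: Dict -> String -> [String]
const linesFromHeadingToTerm = (dict, searchTerm) => {
    const needle = searchTerm.trim().toLowerCase();

    // Whole-line match, ignoring case.
    const isMatch = x => x.toLowerCase() === needle;

    // The first (heading, lines) entry, in insertion order,
    // in which some line matches the search term.
    const entry = Object.entries(dict).find(
        ([, lines]) => lines.some(isMatch)
    );

    if (undefined === entry) {
        // No heading has a matching line.
        return [];
    }

    const [heading, lines] = entry;

    // The heading, then its lines up to and including the match.
    return [heading, ...lines.slice(0, 1 + lines.findIndex(isMatch))];
};

// A hand-written dictionary of the shape described above.
const sampleDict = {
    "CAT": ["a", "", "Rita Diaz-T’wine", "31.2 men"],
    "FREE": ["carlton", "oak Tree", "Lake Titicaca"],
    "LALA": ["francis", "Session 9 Days", "oak Tree", "beetle mania"]
};

// -> ["FREE", "carlton", "oak Tree"]
const found = linesFromHeadingToTerm(sampleDict, "oak tree");
```

Joining the result with "\n" would give text suitable for returning to a Keyboard Maestro variable.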

You've convinced me. I'm going to start with Javascript and try to learn the most pertinent parts that you mention. Thanks!


Thank you, Rob, for the mention. I was on a trip, so I couldn't reply sooner.

Adding to your excellent suggestions, I would recommend learning the functional programming paradigm from a book like:

Programming in Haskell - 2nd Edition

You can play around with Haskell and Keyboard Maestro using my plug-in:

Execute a Haskell Script with Arguments - Plug In Actions - Keyboard Maestro Discourse

Haskell helps to develop a clear mindset about programming in general. In terms of actual use within macOS and iOS apps, JavaScript is a great option. People from Omni have been developing an amazing JavaScript API for their products, which works on both macOS and iOS. Other apps have taken a similar route.

Rob's functional JS library:

RobTrew/prelude-jxa: Generic functions for macOS and iOS scripting in Javascript – function names as in Hoogle

provides a rich set of composable functions.

You will find that you can get really far just by composing a few functions (and the result is much clearer and more maintainable).
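
As a rough illustration of what composing means in JavaScript, here is a small sketch. The compose helper is defined locally for the example, and is not claimed to match the definition in any particular library. The other names (linesOf, nonEmpty, itemCount) are likewise just illustrative.

```javascript
// compose :: right-to-left composition of
// single-argument functions.
const compose = (...fs) =>
    x => fs.reduceRight((a, f) => f(a), x);

// linesOf :: String -> [String]
const linesOf = s => s.split("\n");

// nonEmpty :: [String] -> [String]
const nonEmpty = xs => xs.filter(x => 0 < x.trim().length);

// itemCount :: [a] -> Int
const itemCount = xs => xs.length;

// nonEmptyLineCount :: String -> Int
// Built entirely from the three small functions above:
// split into lines, drop the empty ones, count the rest.
const nonEmptyLineCount = compose(itemCount, nonEmpty, linesOf);

// "CAT\na\n\nRita" -> 3
const lineCount = nonEmptyLineCount("CAT\na\n\nRita");
```

Each small function can be tested on its own, and the composed pipeline reads as a list of steps.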
