How Do I Get List of RegEx Capture Group of Multiple Matches?

### How Do I Get List of RegEx Capture Group of Multiple Matches?
(EDIT: Changed "Match Group" to "Capture Group", to clarify.)

#### Final Macro: MACRO: Get List of RegEx Capture Group of Multiple Matches


I need help fixing the macro below.

I've got a text selection that contains one or more matches.
I need a list of the Capture Group for each match.
I know how to use the For Each action with a match, but I need the Capture Group.

Extract an unknown number of Capture Groups from text on clipboard.

Each match is a JavaScript function.

This pattern works at Regex101.com:
/^\s*?(function[ ]?\w*\(.*\)).*[\n\r]/gm

• finds the matches
• finds the correct Capture Group for each match.

The result would look something like this:

function helloJS(pMsg)
function initProgressBar(psTitle, psMsg, piMax, piCurrent)
function closeProgressBar()

#### The macro below is NOT working correctly

  • It does NOT return any Capture groups
  • That is because the For Each action uses matches, not Capture groups
  • For each match, I need the Capture group returned
  • I have a group identified in the pattern
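The difference between a whole match and its capture group can be seen in plain JavaScript. The sketch below is only an illustration: the sample text is made up, and it runs in any JavaScript engine rather than inside a Keyboard Maestro action.

```javascript
// Sample input (an assumption for this illustration, not taken from the macro).
var sample = [
    '  function helloJS(pMsg) {',
    '  function closeProgressBar() {',
    ''
].join('\n');

var pattern = /^\s*?(function[ ]?\w*\(.*\)).*[\n\r]/gm;

// With the g flag, String.prototype.match returns only the whole matches,
// including leading whitespace and the trailing line break.
var wholeMatches = sample.match(pattern);

// RegExp.prototype.exec exposes the capture groups of each match in turn:
// index 0 is the whole match, index 1 is the first capture group.
var firstGroups = [];
var m;
while ((m = pattern.exec(sample)) !== null) {
    firstGroups.push(m[1]);
}
```

Here `wholeMatches` holds the full lines, while `firstGroups` holds just the `function name(args)` part that the group in the pattern identifies.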

[JS] Get List of JavaScript Functions in Selection.kmmacros (8.3 KB)

I could not figure out how to do this using KM Actions, so I wrote this JXA script that does the job:

function run() {
    'use strict';

    var app = Application.currentApplication();
    app.includeStandardAdditions = true;

    var myString = app.theClipboard();

    //console.log(myString)

    var myRegEx = /^\s*?(function[ ]?\w*\(.*\)).*[\n\r]/gm;

    // Get an array containing the first capturing group for every match
    var matches = getMatches(myString, myRegEx, 1);

    // Log results
    //console.log(matches.length + ' matches found: ' + JSON.stringify(matches))
    //console.log(matches);

    //--- CREATE STRING FROM ARRAY JOINED WITH NEW LINE ---
    var matchesStr = matches.join("\n");

    return matchesStr;
    //~~~~~~~~~~~~~~~~~~~~~~~~~~~ END OF MAIN SCRIPT ~~~~~~~~~~~~~~~~~~

    function getMatches(string, regex, groupIndex) {

        groupIndex || (groupIndex = 1); // default to the first capturing group
        var matches = [];
        var match;
        while (match = regex.exec(string)) {
            matches.push(match[groupIndex]);
        }
        return matches;
    }   // END function getMatches

}   // END function run

You use the For Each action to find each match, and then you use a Search Variable action to find the capture group within the match.

So basically:

  • For Each variable Match matching “(?m)^\s*?(function[ ]?\w*\(.*\)).*[\n\r]”
    • Search Variable Match for “(?m)^\s*?(function[ ]?\w*\(.*\)).*[\n\r]”, returning the capture group into a variable.
    • Use the captured variable
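For readers more at home in script than in KM actions, the same two-step idea can be sketched in JavaScript. This is an analogy only, not how Keyboard Maestro implements the actions; the function name `captureGroupsTwoPass` is hypothetical.

```javascript
// Hypothetical helper mirroring the two actions described above.
function captureGroupsTwoPass(text) {
    // Step 1 (like For Each): a global regex lists every whole match.
    var eachMatch = /^\s*?(function[ ]?\w*\(.*\)).*[\n\r]/gm;
    // Step 2 (like Search Variable): the same pattern without the g flag,
    // applied to one whole match at a time to read its capture group.
    var withinMatch = /^\s*?(function[ ]?\w*\(.*\)).*[\n\r]/m;

    var whole = text.match(eachMatch) || [];
    return whole.map(function (s) {
        var m = withinMatch.exec(s);
        return m ? m[1] : '';
    });
}

// Sample text made up for this illustration.
var sampleText = 'function helloJS(pMsg) {\n}\n' +
    'function initProgressBar(psTitle, psMsg, piMax, piCurrent) {\n}\n' +
    'function closeProgressBar() {\n}\n';
var twoPassResult = captureGroupsTwoPass(sampleText);
```

Searching each whole match a second time is redundant work, but it keeps each step simple, which is the appeal of the action-based version.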

FWIW, if you ever do need to get matches within a JavaScript for Automation action, you can simplify a little by nesting the exec calls in the while() test, where they will return a value which reduces to a boolean.

e.g. sth like:

function run() {
    'use strict';

    var a = Application.currentApplication(),
        sa = (a.includeStandardAdditions = true, a),
        strText = sa.theClipboard();

    var rgx = /function\s*(\w+\s*\(.*\))/gm,
        lstMatches = [],
        lstMatch;

    while (lstMatch = rgx.exec(strText)) {
        lstMatches.push(lstMatch[1]);
    }

    return lstMatches;
}
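One caveat applies to both loops above: `exec` only advances through the string when the regex has the `g` flag. Without it, every call starts again at index 0 and the `while` loop never ends. A more defensive variant might look like the sketch below. The name `getGroups` and its defaults are assumptions for illustration, not part of either script.

```javascript
// Hypothetical defensive variant of the exec loop.
function getGroups(text, regex, groupIndex) {
    // Rebuild the regex with the g flag so the loop always terminates and
    // the caller's regex (and its lastIndex) is left untouched.
    var flags = 'g' + (regex.ignoreCase ? 'i' : '') + (regex.multiline ? 'm' : '');
    var rgx = new RegExp(regex.source, flags);
    // An explicit undefined check lets callers pass 0 for the whole match.
    var idx = (groupIndex === undefined) ? 1 : groupIndex;
    var out = [];
    var m;

    while ((m = rgx.exec(text)) !== null) {
        // An empty match does not advance lastIndex, so step past it by hand.
        if (m[0] === '') {
            rgx.lastIndex += 1;
        }
        out.push(m[idx]);
    }
    return out;
}

// Sample text made up for this illustration; note the missing g flag.
var demoText = 'function a(x) {\nfunction b() {\n';
var demoGroups = getGroups(demoText, /function\s*(\w+\s*\(.*\))/);
var demoWhole = getGroups(demoText, /function\s*(\w+\s*\(.*\))/, 0);
```

The `groupIndex || (groupIndex = 1)` idiom in the earlier script treats 0 as "not supplied", so it cannot return the whole match; the explicit check here avoids that.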

Rob, thanks for sharing your optimization of the get matches code.

Trying to learn from your example, what do you see as the advantages over the code I posted?

Looking again, I think it’s essentially the same – I probably read it a bit quickly – and yours is absolutely fine.

If one wanted to generalise a bit more, a useful goal is always to frame things as composable (nestable) expressions rather than statements (which have an effect but not a value). Moving from a string and regex to a list of matches is essentially an example (in functional terms) of the unfoldr pattern.

(See, for example http://hackage.haskell.org/package/base-4.8.2.0/docs/Data-List.html#v:unfoldr )

An iterative implementation of unfoldr in ES5 JS (recursive implementations will make more sense in ES6) might look something like:

    // (b -> Maybe (a, b)) -> b -> [a]
    function unfoldr(mf, v) {
        var lst = [],
            a = v,
            m;

        while ((m = mf(a)) && m.valid) {
            lst.push(m.value);
            a = m.new;
        }
        return lst;
    }

If you had unfoldr in your library, and had the habit of framing things in terms of fold and unfold expressions, you could write a composable expression in the form:

unfoldr(fMatches(strClip, [1]), /function\s*(\w+\s*\(.*\))/gm)

Where the [1] means that the return value is built from index 1 of the matches.

The fMatches() function might look a bit puzzling - it’s a function that returns a ‘monadic’ function:

    // String -> [Int] -> (Regex -> [[String]])
    function fMatches(s, lstIndices) {
        return function (rgx) {
            var m = rgx.exec(s),
                blnMatch = Boolean(m);

            return {
                valid: blnMatch,
                value: blnMatch ? lstIndices.map(function (i) {
                    return m[i];
                }) : [],
                new: blnMatch ? rgx : undefined
            };
        };
    }
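To see the two pieces working together, here is a self-contained sketch that restates both functions from this post and applies them to a made-up sample string (the sample and the variable names `strSample`, `lstGroups` and `lstFlat` are assumptions for illustration). It runs in any JavaScript engine, with no clipboard involved.

```javascript
// unfoldr and fMatches, restated from above so the example runs on its own.
function unfoldr(mf, v) {
    var lst = [], a = v, m;
    while ((m = mf(a)) && m.valid) {
        lst.push(m.value);
        a = m.new;
    }
    return lst;
}

function fMatches(s, lstIndices) {
    return function (rgx) {
        var m = rgx.exec(s),
            blnMatch = Boolean(m);
        return {
            valid: blnMatch,
            value: blnMatch ? lstIndices.map(function (i) {
                return m[i];
            }) : [],
            new: blnMatch ? rgx : undefined
        };
    };
}

// Sample text standing in for the clipboard contents.
var strSample = 'function helloJS(pMsg) {\n}\nfunction closeProgressBar() {\n}\n';

// Each element is itself a list, one entry per requested group index.
var lstGroups = unfoldr(fMatches(strSample, [1]), /function\s*(\w+\s*\(.*\))/gm);

// Flatten to a plain list of strings when only one index was requested.
var lstFlat = lstGroups.map(function (xs) {
    return xs[0];
});
```

Note that the "state" threaded through the unfold is the regex itself: its `lastIndex` advances on every `exec` call, which is what moves the unfold from one match to the next.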

But by now, I am sure that your clean and simple while loop is looking much more appealing than further generalisation : - )

Thanks, Peter. That was the (obvious) Action I was missing.

Here is my final macro:

Thanks.

I've added that concept/method to my advanced JavaScript course, which is some distance off. :smile:

Since almost all of my KM macros, AppleScripts, and JXA scripts are mostly for automating simple things, and not processing of large batches of data, I'm not too concerned with highly optimizing my code. If I notice a serious performance issue, then I'll look at optimization. I do generally put reusable code into functions.

To be honest, I find my code more user/coder friendly. The main code is in a function, and the arguments to that function are clear and obvious, making it very easy in the future to use that function for other cases:

var matches = getMatches(myString, myRegEx, 1);
function getMatches(string, regex, groupIndex)

Absolutely, and functional construction is more likely to save programmer time than machine run-time.

For small day-to-day scripts though, Array.map() and Array.reduce() (and, of course, .filter() and .sort()) are all very useful, and amply repay a bit of experimentation :-)
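As a small taste of that kind of experimentation, the sketch below applies each of those array methods to the list of signatures from the top of this thread. The variable names are made up for illustration.

```javascript
// The expected result list from the original question.
var signatures = [
    'function helloJS(pMsg)',
    'function initProgressBar(psTitle, psMsg, piMax, piCurrent)',
    'function closeProgressBar()'
];

// map: turn each signature into a bare function name.
var names = signatures.map(function (s) {
    return s.replace(/^function\s*/, '').replace(/\(.*$/, '');
});

// filter: keep only the signatures with an empty parameter list.
var noArgs = signatures.filter(function (s) {
    return /\(\s*\)$/.test(s);
});

// reduce: total number of parameters across all signatures.
var paramCount = signatures.reduce(function (n, s) {
    var inner = s.slice(s.indexOf('(') + 1, s.lastIndexOf(')')).trim();
    return n + (inner ? inner.split(',').length : 0);
}, 0);

// sort: alphabetical order, on a copy so that names itself is unchanged.
var sortedNames = names.slice().sort();
```

Each call returns a value rather than mutating shared state (apart from `sort`, hence the `slice()` copy), so the steps can be chained or nested freely.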
