Find and Replace with regex doesn't find every occurrence

I've created a macro that takes the selected text and adds line breaks based on a delimiter string, but for some text it stops matching occurrences partway through the string, and I'm not sure why.

It works like this:

  1. Prompt for a couple of variables:
    • local__Break After {string} - This is used in a lookbehind; line breaks are inserted after it.
    • local__Replace {string} - This is the string I want replaced with the new line, usually just a space.
    • local__Use Regex {boolean} - If this isn't checked, regex metacharacters in Break After and Replace are escaped.
  2. Call a subroutine to escape Regex unless Use Regex is true.
  3. If Break After is defined, search for (?<=%Variable%local__Break After%)%Variable%local__Replace%; otherwise search for %Variable%local__Replace%. In either case, replace with %LineFeed%.
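
For readers who don't want to open the macro, the three steps above can be sketched in JavaScript roughly as follows. This is only an illustration, not the macro's actual implementation: the names escapeRegex and breakOn are hypothetical, and it assumes a JavaScript engine that supports lookbehind (Node, or a recent JavaScriptCore).

```javascript
// Hypothetical helpers that mirror the macro's steps; not the macro itself.

// Step 2: escape regex metacharacters so the text is matched literally.
const escapeRegex = s => s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");

// Step 3: build the search pattern and replace every match with a line feed.
//   breakAfter - text that must precede the match (may be empty)
//   replace    - text to swap for the line feed
//   useRegex   - when false, both inputs are escaped first
const breakOn = (text, breakAfter, replace, useRegex = false) => {
    const after = useRegex ? breakAfter : escapeRegex(breakAfter);
    const target = useRegex ? replace : escapeRegex(replace);
    const pattern = after ? `(?<=${after})${target}` : target;

    return text.replace(new RegExp(pattern, "g"), "\n");
};

console.log(breakOn('a:"x" OR b:"y" OR c:"z"', "OR", " "));
```

With Break After set to OR and Replace set to a space, only the spaces that directly follow OR become line breaks.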

Here's the text that's giving me trouble:

(in:inbox OR in:later) AND (title__contains:"vs code" OR title__contains:"vscode" OR title__contains:"visual studio code" OR url__contains:"vs code" OR url__contains:"vscode" OR url__contains:"visual studio code" OR author__contains:"vs code" OR author__contains:"vscode" OR author__contains:"visual studio code" OR tag:"Library / Apps / Microsoft / VS Code")

If I do a find and replace in BBEdit using the pattern (?<=OR) and replacing it with \n I get this:

(in:inbox OR
in:later) AND (title__contains:"vs code" OR
title__contains:"vscode" OR
title__contains:"visual studio code" OR
url__contains:"vs code" OR
url__contains:"vscode" OR
url__contains:"visual studio code" OR
author__contains:"vs code" OR
author__contains:"vscode" OR
author__contains:"visual studio code" OR
tag:"Library / Apps / Microsoft / VS Code")

Exactly what I expect from Keyboard Maestro.

If I run my macro and define Break After as OR and Replace as (space) I get this:

(in:inbox OR
in:later) AND (title__contains:"vs code" OR
title__contains:"vscode" OR
title__contains:"visual studio code" OR
url__contains:"vs code" OR url__contains:"vscode" OR url__contains:"visual studio code" OR author__contains:"vs code" OR author__contains:"vscode" OR author__contains:"visual studio code" OR tag:"Library / Apps / Microsoft / VS Code")

If I run it again I get:

(in:inbox OR
in:later) AND (title__contains:"vs code" OR
title__contains:"vscode" OR
title__contains:"visual studio code" OR
url__contains:"vs code" OR
url__contains:"vscode" OR
url__contains:"visual studio code" OR author__contains:"vs code" OR author__contains:"vscode" OR author__contains:"visual studio code" OR tag:"Library / Apps / Microsoft / VS Code")

I can continue running it and it will insert one more line break each time I do until it's matched every occurrence.

Using an alert with the same string I've got for the search pattern, I get Regex = "(?<=OR) ". I also validated that "(?<=OR) " matches every occurrence in Expressions, which also uses ICU.
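
As a cross-check outside Keyboard Maestro, the same pattern can be counted against the sample text in JavaScript. This is only a sanity check under the assumption that the engine supports lookbehind; JavaScript's regex engine is not ICU, so it says nothing about where the problem lies.

```javascript
// The sample text from above, split across lines only for readability.
const text =
    '(in:inbox OR in:later) AND (title__contains:"vs code" OR ' +
    'title__contains:"vscode" OR title__contains:"visual studio code" OR ' +
    'url__contains:"vs code" OR url__contains:"vscode" OR ' +
    'url__contains:"visual studio code" OR author__contains:"vs code" OR ' +
    'author__contains:"vscode" OR author__contains:"visual studio code" OR ' +
    'tag:"Library / Apps / Microsoft / VS Code")';

// Count every space that directly follows "OR".
const matchCount = [...text.matchAll(/(?<=OR) /g)].length;

// Replace them all and count the resulting lines.
const lineCount = text.replace(/(?<=OR) /g, "\n").split("\n").length;

console.log(matchCount); // 10
console.log(lineCount);  // 11
```

Ten matches and eleven lines agree with the BBEdit result shown earlier.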

I feel like I must be missing something stupid here, but I'm not sure what. Any ideas what I’m doing wrong?

Here's Break On.kmmacros (17.8 KB) and { Subroutine } Escape Regex.kmmacros (6.2 KB)

This is a bit on the edge of my KM regex knowledge, but try adding (?m) at the front of your regex expressions, to make sure you're looking for matches across multiple lines.

-rob.

Have you set the option using the gear menu in the S/R action to replace all the matches? By default it does only the first I believe.

cf action:Search and Replace [Keyboard Maestro Wiki]

"Using the action (gear) menu, you can select (v10.0+) to replace all, or only the first or last match."

@griffman I don't think multi-line is needed for this input, since it's a single line to begin with, but that's a good suggestion for making it more adaptable. I've gone ahead and added it, but it didn't change the behavior here.

@tiffle I already had it set to All Matches.

What seems weird is that if I set Break After to : or Replace to " " (space), both work as expected, and both produce more matches than OR + " " (space). So it's not like it's running into some kind of limit on how many occurrences KM will match.

This is a bit of a long shot, but a while back there was a long discussion about problems with the use of variables in Search/Replace actions. I'm not going to repeat it here, but I will link to the solution I came up with (a KM subroutine macro), which also provides a link to the discussion I'm referring to.

My thinking is that you may be running into a similar problem so you might benefit from looking at the discussion/solution to see if it might help you...

Here's the link:

Oh, processing it through AppleScript is brilliant. I'll give that a try when I get back to my computer later. I imagine it should work just fine, or if this is some kind of ICU weirdness, I could process it through grep or with JavaScript.

Processing through AppleScript gives me exactly the same results as using Keyboard Maestro's native action. Since it uses the search from Keyboard Maestro Engine's AppleScript library, I'm guessing it's using the same logic.

I'm not sure of another tool that lets me do regex search and replacement using ICU to test this out, so I figured Swift and NSRegularExpression would be the closest I could get. It handles this case exactly as I expect, so it seems like either Keyboard Maestro or ICU is wonky. Given that Expressions matches every occurrence correctly (but doesn't do replacements), I'm leaning toward this being a KM issue.

Swift, unfortunately, is pretty slow when I try to execute the macro, so I ended up rewriting the AppleScript macro to use perl for the replacement. It's fast and gives the result I wanted.

I looked into this, and as near as I can figure it is a bug in Apple’s NSRegularExpression.

Basically, on the fourth match it is erroneously setting the NSMatchingHitEnd flag.

It does this reliably for this search, but weirdly it does not do it if I try to set up a test framework that just does this search at launch (then it works properly).

So it seems that at some point the regex system gets itself in to a state where this happens.

I can't simply ignore this flag, because then replacements for things that can match nothing (e.g. .*) will match an extra time at the end.

For the next version, I have added some code to ignore NSMatchingHitEnd unless the matched string is at the end of the search string.
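
The "extra match at the end" for patterns that can match nothing is easy to see in isolation. The sketch below uses JavaScript rather than ICU or NSRegularExpression, so it only illustrates the general behavior of an empty-capable pattern such as .* and is not a picture of Keyboard Maestro's internals.

```javascript
// .* first consumes the whole string, then matches the empty string
// once more at the very end of the input.
const found = [..."abc".matchAll(/.*/g)].map(m => [m.index, m[0]]);
console.log(found); // [ [ 0, 'abc' ], [ 3, '' ] ]

// A global replace therefore substitutes twice.
const replaced = "abc".replace(/.*/g, "X");
console.log(replaced); // XX
```

The second, empty match at the end of the string is the one that would show up as an unwanted extra replacement.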

Might also be worth experimenting with a JavaScript splitBy function.

Since Keyboard Maestro 11, KM variables have been directly available to Execute JavaScript for Automation actions:

  • we just prefix their names with kmvar. (replacing any spaces with _ )
  • and can optionally limit which variables are accessible in this way, through the chevron at the left. (by default they are all available)


So for example, roughly this kind of thing:

splitBy previous token.kmmacros (5.3 KB)


JS source:
return (() => {
    "use strict";

    const main = () =>
        splitBy(
            (prev, next) => prev === kmvar.local_Break_After
        )(
            words(kmvar.local_Source)
        )
        .map(unwords)
        .join(kmvar.local_Replace);

    // --------------------- GENERIC ---------------------

    // splitBy :: (a -> a -> Bool) -> [a] -> [[a]]
    // splitBy :: (String -> String -> Bool) ->
    // String -> [String]
    const splitBy = p =>
        // Splitting not on a delimiter, but wherever the
        // relationship between consecutive terms matches
        // a binary predicate.
        xs => (xs.length < 2)
            ? [xs]
            : (() => {
                const
                    bln = "string" === typeof xs,
                    ys = bln
                        ? xs.split("")
                        : xs,
                    h = ys[0],
                    parts = ys.slice(1)
                    .reduce(([acc, active, prev], x) =>
                        p(prev, x)
                            ? [acc.concat([active]), [x], x]
                            : [acc, active.concat(x), x], [
                        [],
                        [h],
                        h
                    ]);

                return (bln
                    ? ps => ps.map(cs => "".concat(...cs))
                    : x => x)(parts[0].concat([parts[1]]));
            })();


    // unwords :: [String] -> String
    const unwords = xs =>
        // A space-separated string derived
        // from a list of words.
        xs.join(" ");


    // words :: String -> [String]
    const words = s =>
        // List of space-delimited sub-strings.
        // Leading and trailing space ignored.
        s.split(/\s+/u).filter(Boolean);

    // MAIN ---
    return main();
})();

Interesting. This uses NSRegularExpression as well and doesn't run into the problem, maybe for the same reason that it works in a test framework?

import Foundation

enum ReplacementScope {
    case all, first, last
}

/**
 regexReplace 

 Replaces regex matches within a given string based on the specified options.

 - Parameters:
    - text: The text to search within.
    - pattern: The regex pattern to search for.
    - replacement: The pattern to replace matches with.
    - isCaseSensitive: Determines if the search should be case sensitive.
    - scope: The scope of the replacement operation (all, first, or last).

 - Returns: The modified string after applying the replacement, or the original string if an error occurs.
 */
func regexReplace(in text: String, searchFor pattern: String, replaceWith replacement: String, isCaseSensitive: Bool, replaceWhere scope: ReplacementScope) -> String {
    var options: NSRegularExpression.Options = []
    if !isCaseSensitive {
        options.insert(.caseInsensitive)
    }
    
    do {
        let regex = try NSRegularExpression(pattern: pattern, options: options)
        let range = NSRange(text.startIndex..., in: text)
        
        switch scope {
        case .all:
            return regex.stringByReplacingMatches(in: text, options: [], range: range, withTemplate: replacement)
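        case .first:
            // Limiting the range to a single match hides the surrounding text, so
            // lookarounds such as (?<=OR) and anchors need transparent, non-anchoring bounds.
            if let firstMatch = regex.firstMatch(in: text, options: [], range: range) {
                return regex.stringByReplacingMatches(in: text, options: [.withTransparentBounds, .withoutAnchoringBounds], range: firstMatch.range, withTemplate: replacement)
            }
        case .last:
            let matches = regex.matches(in: text, options: [], range: range)
            if let lastMatch = matches.last {
                return regex.stringByReplacingMatches(in: text, options: [.withTransparentBounds, .withoutAnchoringBounds], range: lastMatch.range, withTemplate: replacement)
            }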
        }
    } catch {
        print("Regex error: \(error)")
    }
    
    return text // Return the original text if there was an error or no matches
}

// Map Keyboard Maestro variables to something more readable.
let searchPattern = ProcessInfo.processInfo.environment["KMVAR_local__Search_Pattern"] ?? ""
let replacementPattern = ProcessInfo.processInfo.environment["KMVAR_local__Replacement_Pattern"] ?? ""
let searchText = ProcessInfo.processInfo.environment["KMVAR_local__Search_Text"] ?? ""

// Convert the caseSensitive string to a boolean. If the variable is not set, default to false.
let caseSensitiveString = ProcessInfo.processInfo.environment["KMVAR_local__Case_Sensitive"] ?? "false"
let caseSensitive = caseSensitiveString.lowercased() == "true"

// If local__Replacement Scope is undefined use all.
let replacementScopeString = ProcessInfo.processInfo.environment["KMVAR_local__Replacement_Scope"]?.lowercased() ?? "all"

var replacementScope: ReplacementScope

switch replacementScopeString {
case "first":
    replacementScope = .first
case "last":
    replacementScope = .last
default:
    replacementScope = .all
}

let processedText = regexReplace(in: searchText, searchFor: searchPattern, replaceWith: replacementPattern, isCaseSensitive: caseSensitive, replaceWhere: replacementScope)

print(processedText)

It's been a few years since I've written anything in Swift, so forgive the likely poor quality of that script.

Thanks for doing some testing and adding a fix. I'll look forward to the next update; I have a feeling it's a lot more robust than my AppleScript and perl solution.

Thanks, JavaScript would have been my next solution if I hadn't gotten the AppleScript working. I've just been doing a lot of JS recently so it was nice to get into a different mindspace for a while trying to figure this out.

Yeah, I set up a test at launch time, and it worked fine, but later on the same code doesn't work, so at that point, who knows. It's weird that such a bug is so reliable.