Regex Question

RogerW · July 9, 2022, 10:53am

I am trying to understand and learn more about Regex.

I have a string b: 23 Jul 1910, d: 11 Jul 2002, FSFTID and I want to pick out just the 1910

Regex statement (?<=, b: )(.*)(?=, d: ) and I get 23 Jul 1910 but I want only the 1910 and if the 1910 is missing I do not want the 2002.

Also can you recommend a good book or on-line tutorial to learn Regex?

Roger

cyoungers · July 9, 2022, 1:33pm

This post has lots of info about learning regex.
REGEX a simple example on how to use REGEX for validation in KM

Zabobon · July 9, 2022, 1:42pm

I'm sure there is a Regex expression to do this and someone will supply if for you.

In the meantime, I was wondering if you could extract the "1910" from a string that includes a birth date in the form you have quoted using just native Keyboard Maestro Actions.

If the logic is that the string will contain a birth date if it starts with "b:" then the year will be characters 11 to 14. If the string does not start with "b:" then presumably it doesn't contain a birth date. Anyway, this is the kind of logic you can use with Keyboard Maestro directly.

(Of course I might be making a wrong assumption about "b:" not being in the string when the birth date is missing as you did not give an example string when the birth year is missing. But the point is that native Keyboard Maestro Actions can achieve quite a lot and are easy to understand and edit.)

EXAMPLE Extract Birth Year from String.kmmacros (4.8 KB)

Click to Show Image

Nige_S · July 9, 2022, 2:46pm

Difficult without examples of "bad" strings, but...

Looking at your string you have some text, an optional 2-4 numbers you want to extract, then ", d" (which doesn't appear after the year of death).

So a simple (\d{2,4}), d should do the trick. Put it in a "Try" action and you can return whatever you want when the birth year is missing:

Search Year.kmmacros (2.6 KB)

But there are other patterns which are as good and may be better, depending on your "bad" strings' contents, eg (\d{2,4}), [^F] (2-4 digits then a comma then a space then anything but an "F") or b:[^:]+ (\d{2,4}) ("b:" then one or more non-colons, then a space and 2-4 digits), and that's without getting into complicated look-aheads etc.

ComplexPoint · July 9, 2022, 3:36pm

The first thing to grasp about Regular Expressions is that they seldom the provide the quickest or most solid approach.

Just splitting, for example, is often conceptually simpler. You can split any Keyboard Maestro variable into an 'array' of parts with any delimiter you like.

See How to use Custom Array Delimiter in manual:Variables [Keyboard Maestro Wiki]

For example here:

first split on commas,
and then on spaces,
using (at each stage) a (one-based) index to the part you want.

Splitting and sub-splitting a string.kmmacros (3.2 KB)

Result:

Nige_S · July 9, 2022, 5:00pm

I believe the saying is "I had a problem, which I solved with a Regular Expression. Now I have two problems."

But here we don't have enough information to decide which method is best -- we know the year might be missing (so your macro needs a check for that. What's the best way to "is a number?" in KM?) but we don't know what other formats, if any, the DoB may be presented in -- "b: Dec 23, 1910," would fail a comma-split but is fine with the regexs.

(Also, OP was specifically asking about regex and may have just given an example for learning purposes. Whilst an important lesson is "It's often better to use something other than a regex", it isn't the only lesson.)

RogerW · July 9, 2022, 5:10pm

Thanks

\d{4}, d This did the trick except it also selects the , d and gives me this 1990, d as a result where I only want 1990. How would I remove the , d. I know could write another line of code to remove them but I would like to only use one line of code.

Is there a and maybe && that could be used to have 2 Rexeg expressions in one line?

Roger

Nige_S · July 9, 2022, 5:16pm

I think you missed out the parentheses -- (\d{4}), d, which puts the four-digit year into the first capture group (the "1:" variable line in the action).

But take @ComplexPoint's comments to heart -- do you need a regex, or will string splitting work?

RogerW · July 9, 2022, 6:03pm

Yes I did and that fixed it.

Yes there are other ways of doing it using just KBM but I am trying to learn how to use Regex. By using Regex I can run it in an Apple Script, java script, java, python without much changes.

Thanks for your help.
Roger

Nige_S · July 9, 2022, 6:47pm

Not natively (in case you didn't already know that). There are things like the Satimage OSAX, but don't forget that as a KM user you have access to KME's search and replace functionality -- including regex.

set theString to "b: 23 Jul 1910, d: 11 Jul 2002, FSFTID"

tell application "Keyboard Maestro Engine"
	set theResult to search theString for ".*(\\d{4}), d.*" replace "$1" with regex
end tell

return theResult

This is my first time looking at this -- I couldn't quickly see a way to search and return the found pattern rather than search and replace everything with the found pattern, which is why the regex is different.

But @ComplexPoint's comments still stand -- if you've a relatively well-defined input it's often easier and more comprehensible to use a languages's text-split and array functions. Again in AppleScript:

set theString to "b: 23 Jul 1910, d: 11 Jul 2002, FSFTID"

set {oldTIDs, AppleScript's text item delimiters} to {AppleScript's text item delimiters, ","}
set theResult to word -1 of text item 1 of theString
set AppleScript's text item delimiters to oldTIDs

try
	(0 + theResult)
	return theResult
on error
	return "No birth year given"
end try

ComplexPoint · July 9, 2022, 7:58pm

Well ... AppleScript itself provides no regex engine, and elsewhere there are dialectal variations.

Splitting a string on a delimiter is easier to use and define (often built-in), and much more consistent its behaviour in all of those languages.

mrpasini · July 9, 2022, 9:26pm

Bravo! As @ComplexPoint points out, there are other ways to grab a substring, but I bless the day I discovered regular expressions. They are all about pattern matching and @Nige_S has eloquently matched your pattern efficiently (as you discovered).

It turns out people are not quite so good at pattern matching as they think they are. They like to embrace sea-to-shining-sea patterns when what anyone really wants is the simplest expression of the pattern. The most concise, as @Nige_S delivered.

So don't be deterred from learning about these magical incantations that can bend a string to your will.

anamorph · July 10, 2022, 6:57am

ComplexPoint · July 10, 2022, 8:49am

Splitting in AppleScript, might look, for example, like:

on run
    set haystack to "b: 23 Jul 1910, d: 11 Jul 2002, FSFTID"
    
    set beforeComma to item 1 of splitOn(",", haystack)
    
    item -1 of splitOn(space, beforeComma)
    
    --> "1910"
end run


------------------------- GENERIC ------------------------
-- splitOn :: String -> String -> [String]
on splitOn(pat, src)
    set {dlm, my text item delimiters} to ¬
        {my text item delimiters, pat}
    set xs to text items of src
    set my text item delimiters to dlm
    return xs
end splitOn

And to be fair, if you:

import the Foundation classes
use the foreign function interface to ObjC
escape certain characters in the Regular Expression string, and
adjust between zero-based ObjC indexes, and 1-based AppleScript indexes,

then you can also find your way towards a Regex route in AppleScript:


use framework "Foundation"


on run
    set haystack to "b: 23 Jul 1910, d: 11 Jul 2002, FSFTID"
    
    set regexNeedle to "\\d{4},"
    
    set matches to regexMatches(regexNeedle, haystack)
    if {} is matches then
        "pattern not found: " & regexNeedle
    else
        tell item 1 of matches
            set patternStart to its location
            set patternLength to its |length|
        end tell
        
        -- Zero-based ObjC matches, but 1-based AppleScript indexes
        text (1 + patternStart) thru ((patternStart - 1) + patternLength) of haystack
    end if
end run


-- regexMatches :: Regex String -> String -> [[String]]
on regexMatches(strRegex, strHay)
    set ca to current application
    -- NSNotFound handling and and High Sierra workaround due to @sl1974
    set NSNotFound to a reference to 9.22337203685477E+18 + 5807
    set oRgx to ca's NSRegularExpression's regularExpressionWithPattern:strRegex ¬
        options:((ca's NSRegularExpressionAnchorsMatchLines as integer)) ¬
        |error|:(missing value)
    set oString to ca's NSString's stringWithString:strHay
    
    script matchString
        on |λ|(m)
            script rangeMatched
                on |λ|(i)
                    tell (m's rangeAtIndex:i)
                        set intFrom to its location
                        if NSNotFound ≠ intFrom then
                            text (intFrom + 1) thru (intFrom + (its |length|)) of strHay
                        else
                            missing value
                        end if
                    end tell
                end |λ|
            end script
        end |λ|
    end script
    
    script asRange
        on |λ|(x)
            range() of x
        end |λ|
    end script
    map(asRange, (oRgx's matchesInString:oString ¬
        options:0 range:{location:0, |length|:oString's |length|()}) as list)
end regexMatches


-- mReturn :: First-class m => (a -> b) -> m (a -> b)
on mReturn(f)
    -- 2nd class handler function lifted into 1st class script wrapper. 
    if script is class of f then
        f
    else
        script
            property |λ| : f
        end script
    end if
end mReturn


-- map :: (a -> b) -> [a] -> [b]
on map(f, xs)
    -- The list obtained by applying f
    -- to each element of xs.
    tell mReturn(f)
        set lng to length of xs
        set lst to {}
        repeat with i from 1 to lng
            set end of lst to |λ|(item i of xs, i, xs)
        end repeat
        return lst
    end tell
end map

( Though by now, of course, the new problem of getting your Regex to work is already an order of magnitude larger than the original problem of finding a substring at some position between some delimiters )

In JavaScript and Python etc, a split function is built in.

tiffle · July 11, 2022, 8:14am

In addition to @anamorph's go-to recommendation, here are two more websites that I have bookmarked:

and

I used to refer to these a lot but now I build/test regex expressions using RegExRX app.

RogerW · July 12, 2022, 11:05am

tiffle

Is this a Regex Code generator? Can you give me some ideal on how you use this and how the Regex matches KM?

Roger

tiffle · July 12, 2022, 11:11am

No - it's a bit like regex101.com but is a native Mac app. Have a look at it on the Mac App store.

I used to use a regex generator when I was on Windows - it was called RegexMagic. The same developer also produced RegexBuddy.

Regex Question

Options