What's the best way to use regex to match a UUID like: DC383C5C-A0DD-43D7-845B-FE99056B4238

I'm trying to create a macro that uses regex to match a UUID in a string, and it seems that the regex is unfortunately quite long:

([0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})

Is there a better way? I was hoping that there was a regex character like "\w" that would match numeric, alphabetic and dashes.

Sometimes I need to match three UUIDs in a string, so I have to use strings more than three times the length of the above string.

Is there a way for me to define a "token" like "\UUID" so that I can concisely create search strings with multiple UUIDs, like:

abc \UUID def \UUID ghi \UUID

I did some more googling and found this, which seems to work. I think the \p{Pd} means "punctuation including dash". This should help me.

[\p{Pd}a-z0-9]{36}

Funnily enough, Web pages about the category don't use very accurate punctuation in their titles. Think of the category as "Punctuation: dash"... or, aptly, "Punctuation – dash".

Yes, it turns out to be a Unicode category just for dashes (well, actually dashes and hyphens). Unicode thrills!

Note that this isn't a strict match for a UUID: it accepts the letters g-z, which aren't hex digits, and it doesn't check for the usual 8-4-4-4-12 format, so any run of 36 lowercase letters, digits or dashes will match.
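To see how loose that pattern is, here's a quick sketch in Python rather than KM. One assumption to flag: Python's built-in re module doesn't understand \p{Pd}, so a plain ASCII hyphen stands in for the dash category. The helper name loose_match is made up for illustration.

```python
import re

# ASCII-only stand-in for [\p{Pd}a-z0-9]{36}: Python's re module has no
# \p{...} support, so a literal hyphen replaces \p{Pd} (an assumption).
LOOSE = re.compile(r"[-a-z0-9]{36}")


def loose_match(text):
    """Return True if the whole string satisfies the loose 36-character pattern."""
    return LOOSE.fullmatch(text) is not None


real_uuid = "dc383c5c-a0dd-43d7-845b-fe99056b4238"

print(loose_match(real_uuid))  # True, as hoped
print(loose_match("z" * 36))   # True, although this is clearly not a UUID
print(loose_match("-" * 36))   # True, 36 dashes pass as well
```

So the short pattern is fine when you already know the text contains UUIDs, but it won't tell a UUID apart from other 36-character runs.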

KM's regex engine does accept POSIX character classes, so you can use [[:xdigit:]] for the hex digits (0-9, a-f and A-F). No shorter, but perhaps more readable! So you could do something like:

\b([[:xdigit:]]{8}(-[[:xdigit:]]{4}){3}-[[:xdigit:]]{12})\b

...which would include the requirement for 8-4-4-4-12 formatting and "word breaks" at either end of the UUID.
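If you want to try the same idea outside KM, here's a rough Python equivalent. Python's re module doesn't support [[:xdigit:]], so the sketch spells the class out as [0-9A-Fa-f]. That substitution, and the names HEX, UUID_RE and find_uuids, are my own assumptions and not anything KM provides.

```python
import re

# Explicit hex class, standing in for [[:xdigit:]] (not supported by Python's re).
HEX = "[0-9A-Fa-f]"

# Same shape as the KM pattern: 8 hex digits, three "-XXXX" blocks, then "-" + 12,
# with a word boundary on each side. Doubled braces are literal braces in an f-string.
UUID_RE = re.compile(rf"\b({HEX}{{8}}(-{HEX}{{4}}){{3}}-{HEX}{{12}})\b")


def find_uuids(text):
    """Return every UUID-shaped substring found in text."""
    # group(1) is the whole UUID; group(2) only holds the last 4-digit block.
    return [m.group(1) for m in UUID_RE.finditer(text)]


print(find_uuids("id=DC383C5C-A0DD-43D7-845B-FE99056B4238;"))
# ['DC383C5C-A0DD-43D7-845B-FE99056B4238']
print(find_uuids("z" * 36))
# []
```

The word boundaries are what stop a UUID from matching when extra letters or digits are stuck to either end of it.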

I liked your trick using "{3}". I probably never would have thought of that myself. Did I mention that I learn something from you every day?

I'm still a regex noob -- most of what I've learnt has been from trying new things in this Forum, and seeing how others solve the same problem. The big thing, IMO, is practice at spotting patterns in the first place -- how to match/use them is then just a google away :wink:

This regex can certainly be improved. For example, we can make the "inside" group non-capturing to make it slightly more efficient and -- bonus! -- suppress the unnecessary KM variable field:

\b([[:xdigit:]]{8}(?:-[[:xdigit:]]{4}){3}-[[:xdigit:]]{12})\b
                   ^^
        make group non-capturing
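Here's a small Python sketch of what the "?:" changes. It uses [0-9A-Fa-f] in place of [[:xdigit:]] because Python's re module lacks POSIX classes, and the variable names are only illustrative. How KM shows the groups in its own UI isn't covered here.

```python
import re

HEX = "[0-9A-Fa-f]"  # stand-in for [[:xdigit:]]

# Inner group captures: two groups in total.
capturing = re.compile(rf"\b({HEX}{{8}}(-{HEX}{{4}}){{3}}-{HEX}{{12}})\b")
# Inner group marked (?:...): only the outer group captures.
non_capturing = re.compile(rf"\b({HEX}{{8}}(?:-{HEX}{{4}}){{3}}-{HEX}{{12}})\b")

text = "DC383C5C-A0DD-43D7-845B-FE99056B4238"

print(capturing.search(text).groups())
# ('DC383C5C-A0DD-43D7-845B-FE99056B4238', '-845B')
print(non_capturing.search(text).groups())
# ('DC383C5C-A0DD-43D7-845B-FE99056B4238',)
```

Note that the repeated capturing group only keeps its last repetition ("-845B"), which is rarely useful, so dropping it loses nothing.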

I'm sure others here can improve it still further.
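On the original wish for a reusable "\UUID" token: regex itself has no such thing as far as I know, but in a scripting language you can get close by keeping the UUID fragment in a string and splicing it into longer patterns. A minimal Python sketch, where UUID, pattern and extract_three are hypothetical names and [0-9A-Fa-f] again stands in for [[:xdigit:]]:

```python
import re

HEX = "[0-9A-Fa-f]"  # stand-in for [[:xdigit:]]

# Reusable fragment playing the role of the wished-for "\UUID" token.
# The inner group is non-capturing so it doesn't add extra groups when reused.
UUID = rf"{HEX}{{8}}(?:-{HEX}{{4}}){{3}}-{HEX}{{12}}"

# The search string from the question, with each "\UUID" replaced by the fragment.
pattern = re.compile(rf"abc ({UUID}) def ({UUID}) ghi ({UUID})")


def extract_three(text):
    """Return the three UUIDs as a tuple, or None if the text doesn't match."""
    m = pattern.search(text)
    return m.groups() if m else None


u1 = "DC383C5C-A0DD-43D7-845B-FE99056B4238"
u2 = "00000000-0000-0000-0000-000000000000"
u3 = "ffffffff-ffff-ffff-ffff-ffffffffffff"
print(extract_three(f"abc {u1} def {u2} ghi {u3}") == (u1, u2, u3))  # True
```

The long regex still exists, but it's written once and the search strings stay short and readable.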