RegEx Split by Capital Letters

Hi,

Does anyone have a regex to use with the “Search Variable” action that will split a string by capital letters into separate variables:

NoCoffeeTodayThanks

and save capture groups to Variables:

1: Variable1 --> No
2: Variable2 --> Coffee
3: Variable3 --> Today
4: Variable4 --> Thanks

Each string will have a different number of groups, but it should be between 2 and 6.

Any help is much appreciated!

Thanks!

Will there be any single-character words? For example:

NoCoffeeTodayIThink

Not that I can provide an answer, but to clarify for those who can.


Thanks for the nudge to clarify.

No, there will not be any single-character words. Though I’m not sure it should matter, since the regex will in theory find a capital, match up to the next capital, and so on.

Thanks!

This should get you started.
It is just an example.

It uses this RegEx:
([A-Z][a-z]*)

which is very constraining. You may need to adjust it if anything other than lowercase letters follows the uppercase letter.

Please test and let us know if it meets your needs.
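
For anyone who wants to try the pattern outside Keyboard Maestro first, here is a minimal Python sketch. It runs on Python's `re` module rather than KM's regex engine, so treat it as an approximation. The function name `split_capitals` and the second sample string are made up for illustration.

```python
import re

# Same pattern as above: one uppercase letter followed by
# zero or more lowercase letters.
WORD = re.compile(r"([A-Z][a-z]*)")

def split_capitals(text):
    """Return every capitalised word found in text, in order."""
    return WORD.findall(text)

print(split_capitals("NoCoffeeTodayThanks"))  # ['No', 'Coffee', 'Today', 'Thanks']

# Digits are neither [A-Z] nor [a-z], so they are silently skipped.
# This is the "very constraining" part:
print(split_capitals("Room42Today"))          # ['Room', 'Today']
```

Because each match is found independently, this works for any number of words.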


##Example Results


##Macro Library   @RegEx Parse String by Capital Letters @Example


####DOWNLOAD:
<a class="attachment" href="/uploads/default/original/2X/3/34a09fa01947df26baff0eb53d58521f69f345a6.kmmacros">@RegEx Parse String by Capital Letters @Example.kmmacros</a> (4.3 KB)

---

###ReleaseNotes

TBD

---

<img src="/uploads/default/original/2X/b/bfff5b004cdfeec30937180ee3f90ed8aa8afc85.png" width="588" height="1116">

Thanks for this! Works great. I didn’t even think about doing it this way - I was using the “Search Variable” action and the RegEx was tripping me up there.

Thanks again!

That’s clever.

@JMichaelTX provides an interesting way to do it, and it has the advantage of working with any number of words. Given your constraint of 2-6 words, you can also use this regex:

([A-Z][a-z]*)([A-Z][a-z]*)([A-Z][a-z]*)?([A-Z][a-z]*)?([A-Z][a-z]*)?([A-Z][a-z]*)?

Just add a ? after each capture group that is optional.
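
As a rough illustration of how the optional groups behave, here is a Python sketch (Python's `re`, not KM's engine). The name `capture_words` is hypothetical, and anchoring with `fullmatch` is my own assumption to reject strings with more than six words; the regex above is not anchored.

```python
import re

# Two mandatory groups followed by four optional ones, as in the regex above.
GROUP = r"([A-Z][a-z]*)"
PATTERN = re.compile(GROUP * 2 + (GROUP + "?") * 4)

def capture_words(text):
    """Return the captured words, or [] if text is not 2-6 capitalised words."""
    match = PATTERN.fullmatch(text)
    if match is None:
        return []
    # Optional groups that did not take part in the match come back
    # as None, so drop them.
    return [group for group in match.groups() if group is not None]

print(capture_words("NoCoffeeTodayThanks"))  # ['No', 'Coffee', 'Today', 'Thanks']
print(capture_words("NoCoffee"))             # ['No', 'Coffee']
print(capture_words("Coffee"))               # []
```

In Keyboard Maestro the unmatched optional groups would simply leave their variables empty, which is the equivalent of the `None` values filtered out here.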

Hey Folks,

I like the way JM approached this problem.

When you're conceptualizing regular expressions, sometimes it's good to simplify, and this is especially true for people new to them.

I've often found that changing text with 1 or more passes can leave me with data that's easier to process.

Here's an example:

RegEx ⇢ Make Word on Capital Letter.kmmacros (3.3 KB)

Here's how I'd go about it in AppleScriptObjC:

------------------------------------------------------------------------------
# Auth: Christopher Stone
# dCre: 2017/03/31 01:02
# dMod: 2017/03/31 01:02 
# Appl: AppleScriptObjC & Keyboard Maestro Engine
# Task: Split a String into Words with a RegEx and Enter them into KM Variables.
# Libs: None
# Osax: None
# Tags: @Applescript, @Script, @ASObjC, @Keyboard_Maestro_Engine, @Split, @String, @RegEx, @Enter, @Words, @Into, @Variables
------------------------------------------------------------------------------
use AppleScript version "2.4"
use framework "Foundation"
use scripting additions
------------------------------------------------------------------------------
# Set Keyboard Maestro Variable Example
------------------------------------------------------------------------------
tell application "Keyboard Maestro Engine"
   setvariable "testStr" to "NoCoffeeTodayThanks"
end tell
------------------------------------------------------------------------------
# Get Keyboard Maestro Variable Example
------------------------------------------------------------------------------
tell application "Keyboard Maestro Engine"
   set myAppleScriptVar to getvariable "testStr"
end tell
------------------------------------------------------------------------------

# Break string into paragraphs at lower-case/upper-case letter boundary:
set wordList to its cngStr:"(?-i)([a-z])([A-Z])" intoString:("$1" & linefeed & "$2") inString:myAppleScriptVar

# Ensure there is no vertical whitespace at top or bottom of string:
set wordList to its cngStr:"\\A\\s+|\\s+\\Z" intoString:"" inString:wordList

# Split the paragraphs into an AppleScript list-object:
set wordList to its splitstring:wordList withString:linefeed

------------------------------------------------------------------------------
# Create the New Variables in Keyboard Maestro
------------------------------------------------------------------------------
set baseKmVarName to "kmVar"

tell application "Keyboard Maestro Engine"
   repeat with wordNum from 1 to length of wordList
      set varName to baseKmVarName & wordNum
      setvariable varName to (item wordNum of wordList)
   end repeat
end tell

------------------------------------------------------------------------------
--» HANDLERS
------------------------------------------------------------------------------
on cngStr:findString intoString:replaceString inString:dataString -- courtesy of Shane Stanley
   set anNSString to current application's NSString's stringWithString:dataString
   set dataString to (anNSString's ¬
      stringByReplacingOccurrencesOfString:findString withString:replaceString ¬
         options:(current application's NSRegularExpressionSearch) range:{0, length of dataString}) as text
end cngStr:intoString:inString:
------------------------------------------------------------------------------
on splitstring:someText withString:mySeparator -- courtesy of Shane Stanley
   set theString to current application's NSString's stringWithString:someText
   set theList to theString's componentsSeparatedByString:mySeparator
   return theList as list
end splitstring:withString:

------------------------------------------------------------------------------
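
The same multi-pass idea (mark each boundary with a line feed, trim, then split) can be sketched in a few lines of Python. This is only an illustration of the approach in the script above, not a replacement for it; `split_in_passes` is a made-up name and nothing here talks to Keyboard Maestro.

```python
import re

def split_in_passes(text):
    """Split text into words using three simple passes."""
    # Pass 1: insert a line feed at every lowercase/uppercase boundary.
    marked = re.sub(r"([a-z])([A-Z])", r"\1\n\2", text)
    # Pass 2: make sure there is no whitespace at the top or bottom.
    trimmed = marked.strip()
    # Pass 3: split the lines into a list.
    return trimmed.split("\n")

print(split_in_passes("NoCoffeeTodayThanks"))  # ['No', 'Coffee', 'Today', 'Thanks']
```

Each pass is easy to check on its own, which is the point of simplifying this way.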

And here's why I won't give up the Satimage.osax:

------------------------------------------------------------------------------
# REQUIRES the Satimage.osax ⇢ http://tinyurl.com/satimage-osaxen
------------------------------------------------------------------------------

set myString to "NoCoffeeTodayThanks"
set wordList to splittext myString using "(?<=[a-z])(?=[A-Z])" with regexp

--> {"No", "Coffee", "Today", "Thanks"}

------------------------------------------------------------------------------

One line using positive-lookbehind and positive-lookahead assertions.

Poof!

Of course in this case I'm not allowing for the possibility of vertical whitespace, but that is also easily done using the Satimage.osax.

-Chris
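
For comparison, the lookbehind/lookahead split also works in Python's `re`, assuming Python 3.7 or later (earlier versions do not split on empty matches). This is a sketch only; `split_at_boundaries` is a hypothetical name.

```python
import re

# Zero-width boundary: a lowercase letter behind, an uppercase letter ahead.
BOUNDARY = re.compile(r"(?<=[a-z])(?=[A-Z])")

def split_at_boundaries(text):
    """Split text wherever a lowercase letter is followed by an uppercase one."""
    return BOUNDARY.split(text)

print(split_at_boundaries("NoCoffeeTodayThanks"))  # ['No', 'Coffee', 'Today', 'Thanks']

# A single-letter word has no lowercase letter before the next capital,
# so it stays attached to the following word:
print(split_at_boundaries("NoCoffeeTodayIThink"))  # ['No', 'Coffee', 'Today', 'IThink']
```

The second call shows why the earlier question about single-character words matters for this particular pattern.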



Hint for users of non-English alphabets


The character classes used in the examples above ([A-Z][a-z]) may not work if your input text contains characters like é, è, ü, ö etc. [1]

In that case, use the more portable Unicode classes. KM seems to support at least these three variants:

[\p{upper}][\p{lower}]

or shorter (Lu ‘Letter uppercase’, Ll ‘Letter lowercase’):

[\p{Lu}][\p{Ll}]

or with POSIX-brackets style:

[[:upper:]][[:lower:]]

(There are probably subtle differences, but any will work here.)

So, this would be an internationalized version of @peternlewis's example:

([\p{upper}][\p{lower}]*)([\p{upper}][\p{lower}]*)([\p{upper}][\p{lower}]*)?([\p{upper}][\p{lower}]*)?([\p{upper}][\p{lower}]*)?([\p{upper}][\p{lower}]*)?

(It should work for the other examples in this thread, too.)
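
To see the difference outside KM, here is a small Python sketch. One caveat: Python's standard `re` module does not support `\p{...}` classes, so the Unicode-aware version below swaps in `str.isupper()` instead of a regex. The function name and sample text are made up for illustration.

```python
import re

def split_on_uppercase(text):
    """Start a new word at every character that Unicode considers uppercase."""
    words = []
    for ch in text:
        if ch.isupper() or not words:
            words.append(ch)   # begin a new word
        else:
            words[-1] += ch    # extend the current word
    return words

sample = "CaféÜberÖl"

# The ASCII-only classes stop at é and never see Ü or Ö as capitals:
print(re.findall(r"[A-Z][a-z]*", sample))  # ['Caf']

# The Unicode-aware version keeps the accented letters:
print(split_on_uppercase(sample))          # ['Café', 'Über', 'Öl']
```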




[1] It may depend on the Locale setting of the computer. Not sure.


Definitely a valid point.

Also, you can select the Regular Expression Unicode Properties item in Keyboard Maestro’s Help menu. It takes you to the ICU Unicode Properties for Regular Expressions page that I created, which shows which characters are included in each character class. (It’s not live updated, so some may have changed since, but it was right at the time I made it.)

So \p{Lower} is listed there under “Characters matching \p{Lowercase}”.