RegEx question: how to search for white space repetitions (1 tab and 1 new line)

Just tried it on regex101.com and it gives me the following error.

Can you work with something like this, which captures spaces too?

(?<=Status)\s+.*(?=\sCustomerConnected)

Or perhaps add a capturing group that excludes the leading whitespace?

(?<=Status)\s+(.*)(?=\sCustomerConnected)

Those work, but I'm trying to avoid capturing the whitespace before and after the string; otherwise I would have to filter it out in another action, or it would lead to extra lines in the end result. The way I was doing it before was by including the words "Status" and "CustomerConnected" and then filtering them out in a subsequent action. I just want to simplify things and have one action to get the exact string I'm looking for.

You can just do

Status\s+(.*)\sCustomerConnected

and save the capture group into a variable. That way there's no need for an extra action.
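
For anyone comparing the two approaches, here is a minimal sketch in Python. Python is only a stand-in for illustration; the thread itself uses a macro action, and Python's re engine may differ in details from the one behind that action. The sample string is an assumption reconstructed from the text posted later in the thread, with a tab and a newline after "Status".

```python
import re

# Assumed sample: "Status" is followed by a tab and a newline, as in the thread.
text = "Name\tStatus\t\nName changed for security\tCustomerConnected\n"

# Lookaround version: the whole match still starts with the tab and newline.
lookaround = re.search(r"(?<=Status)\s+.*(?=\sCustomerConnected)", text)
whole_match = lookaround.group(0)

# Capture-group version: group 1 holds only the participant name.
captured = re.search(r"Status\s+(.*)\sCustomerConnected", text)
participant = captured.group(1)

print(repr(whole_match))
print(repr(participant))
```

The whole match of the lookaround pattern keeps the leading whitespace, while group 1 of the second pattern is already the clean name, which is why saving the capture group avoids the extra filtering step.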

Thanks! I should look more into capture groups, because this is actually just one of four pieces of information I need to extract from a page. Right now I have them set up the following way.

Ideally I would set it up to extract all that info in one shot, with a single action. But at least for now I have greatly streamlined it, going from literally 20 actions to only 4. So I'm making progress. I was just trying to understand why I was having trouble writing in a RegEx for multiple white spaces in the participant search. Thanks again for your help!

Hey Chris,

Yes!

Stay away from lookaheads and lookbehinds, unless you really need them. They add complexity and confusion – especially for neophyte users.

As you've discovered, lookbehind assertions must be of fixed length and cannot contain quantifiers.
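
As a rough illustration of that restriction, here is how it shows up in Python's re module. Python is used only as a stand-in here; the engine behind regex101 or the macro action may word its error differently and may differ in exactly what it allows inside a lookbehind.

```python
import re

# Variable-width lookbehind (note the \s+ inside it): rejected by Python's re.
try:
    re.compile(r"(?<=Status\s+).*(?=\sCustomerConnected)")
    variable_width_ok = True
except re.error:
    variable_width_ok = False

# Fixed-width lookbehind: accepted, because "Status" is always 6 characters.
fixed = re.compile(r"(?<=Status)\s+.*(?=\sCustomerConnected)")

print(variable_width_ok)
```

Moving the variable-length whitespace out of the lookbehind, as in the second pattern, is what makes it compile.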

Don't get overly fixated on creating the perfect RegEx – use two, three, or more if required for simplicity and readability.

Learn – improve your proficiency – but don't waste too much time.

You'll understand this better after you spend 10 hours working on the “perfect” regular expression only to discover it breaks too easily.

Then you spend 10 more hours trying to fix every possible breakage.

Then you realize you wasted your time, because a single regex can't do everything you need.

After all that you find that 4 regular expressions were all that were required, and the total dev time needed for them was 20 minutes...

:sunglasses:

After you do that a time or two (or three) you'll pay a little more attention to the Keep it Simple rule.

Have fun though! For me regular expressions are fun puzzles that make my work easier – as long as I don't get too precious about them.

-Chris

You just described what my wife has often accused me of...always wanting to refine something that already works :laughing:

But you’re both very right...sometimes it's best to leave something alone once it works well enough. Thanks again for your help with AppleScript and RegEx!

It's possible to do multiple capture groups in one action, provided the text follows the same format. You will need to provide the relevant sample text block for testing.

:sunglasses:

The former NASA engineer @JMichaelTX used to say “Better is the enemy of good enough...”

Since nothing is perfect one can always spend time in pursuit of better.

Since I'm a perfectionist by trade I have to make sure I don't go excessively far down that particular rabbit hole.

-Chris

I think I'll follow Chris' (@ccstone) advice and leave the macro itself alone since it works quite well. BUT, I also want to learn some more so I've included the sample text block for you to take a look at.

 Your Conference Details
Customer Name:	DHS U.S. Citizenship and Immigration Services
Site Name:	Houston (ZHN)
Language:	Spanish
Reference ID:	11111111
Call Duration:	00:00:01
 Participants
Name	Status	
Name changed for security	CustomerConnected

I changed the reference ID and customer's name for security reasons, but they are still in the same format as what the actual text would read.

I need to extract the following info:

DHS U.S. Citizenship and Immigration Services
This name can be just one word or several as you can see.

Houston (ZHN)
This name can also be just one word or several.

11111111
Obviously that's not the real number, but it is always an 8-digit string.

Name changed for security
This is usually two or three words, depending on whether it's the actual person's name or the name of their office.

So just for curiosity's sake at this point (because why screw up a perfectly good macro I managed to reduce from 20 actions to 4? :laughing:) would there be a way to parse this information with a single action, and set each set of data to a separate variable?

The page's format is the same every single time...well, 98% of the time. For the other 2% I have 4 separate macros that can extract just one single item in case one of those items is missing (sometimes the participant's name is missing for example).

Thanks!
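
For comparison, the "one simple pattern per field" setup described above could be sketched like this in Python. Everything here is an assumption for illustration: the pattern names, the `extract` helper, and the tab separators in the sample are not from the actual macros, and Python is only a stand-in for the macro actions.

```python
import re

# Sample text as posted, assuming a tab separates each label from its value.
sample = (
    " Your Conference Details\n"
    "Customer Name:\tDHS U.S. Citizenship and Immigration Services\n"
    "Site Name:\tHouston (ZHN)\n"
    "Language:\tSpanish\n"
    "Reference ID:\t11111111\n"
    "Call Duration:\t00:00:01\n"
    " Participants\n"
    "Name\tStatus\t\n"
    "Name changed for security\tCustomerConnected\n"
)

# Hypothetical field names; one small pattern per item.
patterns = {
    "customer": r"Customer Name:[ \t]*(.+)",
    "site": r"Site Name:[ \t]*(.+)",
    "reference_id": r"Reference ID:[ \t]*(\d{8})",
    "participant": r"Status\s+(.*)\sCustomerConnected",
}

def extract(text):
    """Return one value per field, or None when that field is missing."""
    results = {}
    for name, pattern in patterns.items():
        match = re.search(pattern, text)
        results[name] = match.group(1) if match else None
    return results

fields = extract(sample)
print(fields)
```

Because each pattern stands alone, a page with a missing participant name still yields the other three values, which is roughly what the separate fallback macros are for.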

This is the action based on your sample text:
Search pattern:

Name:\s(.*?)\sSite Name:\s(.*?)\sLanguage(?:.|\s)+ID:\s(\d{8})\sCall(?:.|\s)+Status\s+(.*)\sCustomerConnected
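
To try the pattern outside the action, here is a minimal sketch in Python. It assumes the sample uses a tab between each label and its value and a newline at the end of each line; if the copied text lost those characters, the pattern would not match. Python's re engine is a stand-in and may not behave identically to the engine used by the action.

```python
import re

# Sample text as posted, with tabs and newlines assumed as described above.
sample = (
    " Your Conference Details\n"
    "Customer Name:\tDHS U.S. Citizenship and Immigration Services\n"
    "Site Name:\tHouston (ZHN)\n"
    "Language:\tSpanish\n"
    "Reference ID:\t11111111\n"
    "Call Duration:\t00:00:01\n"
    " Participants\n"
    "Name\tStatus\t\n"
    "Name changed for security\tCustomerConnected\n"
)

# The search pattern from the post, unchanged.
pattern = (
    r"Name:\s(.*?)\sSite Name:\s(.*?)\sLanguage(?:.|\s)+ID:\s(\d{8})"
    r"\sCall(?:.|\s)+Status\s+(.*)\sCustomerConnected"
)

match = re.search(pattern, sample)
# Each capture group corresponds to one variable in the action.
groups = match.groups() if match else None
print(groups)
```

The `(?:.|\s)+` parts are what let the pattern skip across line breaks; the four groups come back in the order customer, site, reference ID, participant.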

aka spending 10 hours to save 10 minutes. :grin:
Guilty! :raised_hand:

IF

  • That 10 minutes is repeatedly saved over time.
  • Keeps me from pulling my hair out.
  • Improves the likelihood that I'll get work done on time.
  • Makes my work more accurate.

OR

  • Helps me to learn things that will facilitate the above.
  • Is a FUN pastime that takes the place of other leisure activities, and lets me learn something at the same time.

I'll gladly spend the 10 hours.

OTHERWISE

  • If I'm not getting paid for the work.
    • I better take a good long look at why I'm doing what I'm doing.

:sunglasses:

-Chris

I must have messed up the formatting when I posted it, because using the action you built doesn’t return me any results. I’ll take a closer look at it tomorrow though and see what I can figure out.

A good portion of my time spent building and tweaking macros is for these very reasons, either to learn and/or to just have fun haha.

Hey Chris,

How are you acquiring the data in the first place?

-ccs

I copy the contents of the page to the system clipboard and then filter it to remove styles. But I have several copies saved in text files on my computer to test things with, and I used one of those in a previous post. It must have gotten corrupted somehow; likely I deleted some whitespace character or the like.

How?

In what browser?

-ccs

Bring page to front, keystroke ⌘A, copy to clipboard, filter clipboard to remove styles to target source

Google Chrome.

Don't do that.

Run this in an Execute a JavaScript in Front Browser action:

document.body.textContent

Better yet – find the correct element(s) and get the correct text on the first try.

-ccs