RegEx Matching Large Blocks of Text

I (love|hate) RegEx, and have done for decades...however, I'm struggling to get KM to do what I want with a few blocks of text.

Goal: I receive emails which have a large amount of automated text inserted in them, followed by customised input from a web form; I want to automate removal of the automated text, yet retain the customised input for email response.

The only good news is that if the automated text is working right, it always ends with the same phrase, and that's what I'm trying to match on.

For example, pretend that this is what I receive:

Field 1 : Data
Field 2 : data
Some other data:  Data
Some more data = fubar
And yet more information = stuff

And here is more : http://data
Custom input=
Hi there, this is custom input.  It may be many paragraphs long, but, it is always until the end of the email itself.

My intention is this:

  • Select all of the text with command-a
  • Send it to the clipboard with command-x
  • Search the clipboard for "Custom input=" and perhaps create two variables: the data before "Custom input=" and the data after "Custom input="
  • Paste the second variable, ONLY the data AFTER the "Custom input=" back into the email
  • The data after "Custom input=" should also be properly quoted for email wrapping (ASCII, no HTML)
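The last step above, quoting the retained text for a plain-text reply, could look something like this in JavaScript. This is only a sketch: quoteForEmail is a made-up helper name, and the "> " prefix and the default 72-column width are assumptions rather than anything Keyboard Maestro or a mail client requires.

```javascript
// Hypothetical helper: wrap each paragraph line to a width and prefix it with "> ".
// Assumption: words longer than the available width (long URLs, say) are left unbroken.
function quoteForEmail(text, width = 72) {
    const prefix = "> ";
    const room = width - prefix.length;
    return text.split("\n").map(sourceLine => {
        const words = sourceLine.split(/\s+/).filter(w => w.length > 0);
        if (words.length === 0) return ">"; // blank lines become bare quote markers
        const lines = [];
        let current = "";
        for (const word of words) {
            if (current.length > 0 && current.length + 1 + word.length > room) {
                lines.push(current); // the line is full, start a new one
                current = word;
            } else {
                current = current.length > 0 ? current + " " + word : word;
            }
        }
        lines.push(current);
        return lines.map(l => prefix + l).join("\n");
    }).join("\n");
}

// A narrow width is used here only to make the wrapping visible.
const quoted = quoteForEmail(
    "Hi there, this is custom input.\n\nIt may be many paragraphs long.",
    20
);
console.log(quoted);
```

The blank line between paragraphs is kept as a lone ">" so the quoted block stays visually connected in the reply.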

However, despite many years of RegEx pain and agony, I'm not able to get KM to do this quite the way I want.

I know I'm missing something obvious. This should be easy.

I've been staring at this for too long, so any thoughts appreciated!

Not sure that a regular expression is needed here.

It might be simpler to split the text on Custom input= using an Execute JavaScript or Execute AppleScript action.

split email on phrase.kmmacros (18.6 KB)
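For anyone who would rather read the idea than open the macro, here is a minimal sketch of the split in plain JavaScript. It is not the contents of the attached macro: textAfter is a hypothetical helper name, the sample text is abbreviated from the question, and in a real macro the input would come from a Keyboard Maestro variable or the clipboard rather than a literal string.

```javascript
// Hypothetical helper: return the text that follows the first occurrence of a delimiter.
function textAfter(delimiter, text) {
    const i = text.indexOf(delimiter);
    // If the delimiter is missing, return the text unchanged rather than guessing.
    return i === -1 ? text : text.slice(i + delimiter.length);
}

// Abbreviated stand-in for the email body described in the question.
const sample = [
    "Field 1 : Data",
    "And here is more : http://data",
    "Custom input=",
    "Hi there, this is custom input.",
    "",
    "Second paragraph."
].join("\n");

// trim() drops the line break that follows the delimiter.
const customInput = textAfter("Custom input=", sample).trim();
console.log(customInput);
```

No pattern options are involved, so there is nothing to get wrong about how line breaks are matched.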

Essentially the same in AppleScript, if, perhaps, fractionally noisier.

(and AppleScript indices start, a little unusually, at 1 rather than 0)

tell application "Keyboard Maestro Engine"
    set wholeEmail to getvariable "wholeEmail"
    
    item 2 of my splitOn("Custom input=", wholeEmail)
end tell


------------------------- GENERIC ------------------------

-- splitOn :: String -> String -> [String]
on splitOn(needle, haystack)
    set {dlm, my text item delimiters} to ¬
        {my text item delimiters, needle}
    set xs to text items of haystack
    set my text item delimiters to dlm
    return xs
end splitOn

Certainly a split (in any language) is all you need, but to address your Regex pain: I suspect the problem isn't so much the regex itself as the options, such as whether the dot metacharacter matches newlines.

Those options are a little obscure with Keyboard Maestro's built-in commands but a long time ago a few of us built macros to set them. Mine is Regexp Options Macro.

I agree with @ComplexPoint about avoiding regexes and splitting on the delimiter string directly. If you want another example, here it is in Perl:

After delimiter.kmmacros (2.4 KB)

You may want to change the Cut to a Copy and put the output of the script onto the clipboard instead of immediately pasting, but the key is the Perl one-liner that extracts the text below the delimiter:

perl -e '$/="Custom input=\n";$top=<>;$bottom=<>;print $bottom;'

I encourage you to keep on using RegEx. The more you use it, the easier it becomes.
It is, IMO, one of the most powerful languages available.

Actually this is fairly easy as regular expressions go. The key is in setting the RegEx option (?s), which means that the dot . will also match newline characters.

RegEx Search for
(?s)(.+)(Custom input=.+)
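To see what that option changes, here is a small sketch in JavaScript. One assumption to flag: JavaScript does not accept a leading (?s) inside the pattern, so the equivalent s (dotAll) flag is used instead. The sample text is abbreviated, and the afterOnly pattern is my own variant for illustration, not the pattern above.

```javascript
// Abbreviated stand-in for the email body.
const email = "Field 1 : Data\nField 2 : data\nCustom input=\nHi there.\n\nMore text.";

// Without the s flag the dot stops at line breaks, so the pattern cannot span the email.
const withoutFlag = /(.+)(Custom input=.+)/.exec(email);

// With the s flag the dot also matches newlines, which is what (?s) does in the pattern above.
const withFlag = /(.+)(Custom input=.+)/s.exec(email);

// Variant (an assumption, not the pattern above): keep the delimiter out of the captured text.
const afterOnly = /Custom input=\s*(.+)/s.exec(email);

console.log(withoutFlag === null);
console.log(withFlag[2]);
console.log(afterOnly[1]);
```

Note that the second group of the original pattern still begins with "Custom input=", so a replace step or a variant like afterOnly is needed if only the text after the phrase is wanted.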

For RegEx details, see regex101: build, test, and debug regex

Let us know if you have any followup questions.

Wow, thank you ALL for some great suggestions here. Despite having used Apple since they existed, I never really got into AppleScript - but perhaps it is time. I didn't even think of just solving the problem with Perl, which I definitely could have done...sigh.

This RegEx tip did the trick, and is exactly what I needed. You were right, it was the (?s) that I needed, and despite having mucked about with Patterns.app (great app on macOS) and so on, I managed to completely overlook that.

Thanks again everyone!

Er ... well ... it's not Turing complete, so formally speaking, and in practice, regular expressions are actually one of the least powerful languages available : -)

Kleene algebra - Wikipedia

But they do, of course, have their uses, in small doses, and they can be fun to learn and experiment with, though they typically have to be helped out by more powerful languages, which leads to a bit of semantic confusion: One problem, but two languages, in the same code.

@ComplexPoint, you are, of course, welcome to your opinion.

In terms of any language that a KM user might want to learn to further automate Mac workflows, I will stand by my opinion that RegEx is one of the most powerful. And KM makes it very easy to use RegEx for both search (extraction) and replace, as well as in many other places (like Typed String triggers).

The trick, I think, is just not to push people into reaching too automatically for regular expressions, in contexts in which they add a layer of complexity, rather than removing one.

This looks like an example of that issue, which can demonstrably cost a lot of time and frustration.

Regular expressions (beyond the very useful – short and trivial – ones for line breaks, word breaks etc) need a lot of practice to build and maintain any kind of fluency, and that understandably creates a strongish appetite for wax-on wax-off exercises, and for practice material, either from own working problems, or from those reported by others.

Hence I think, a tendency to rush in a bit too fast and say:

"I know, I'll use regular expressions !"

Which, for the reasons well expressed in the Jamie Zawinski quote, we should probably hesitate to encourage unreflectingly, or excessively.

@ComplexPoint, you have made numerous posts in multiple topics raising your objections to use of RegEx. Again, you are welcome to your opinion.

I disagree with you. KM makes use of RegEx very easy, easier than using a scripting language. And there is lots of help available for RegEx, both for learning (tutorials) and for getting help on specific questions (stackoverflow.com and others).

Perhaps you would like to post a new topic to fully develop your opposition to RegEx, so that we don't continue to go off-topic in so many threads.

No. No objections. No opposition : -)

Regular expressions, particularly short regular expressions, sometimes reduce complexity rather than adding to it.

The longer our regex get, however, the more likely they are to be the wrong tool for the job. (Write only, and ill-equipped, by their nature, for recursive patterns like nested brackets and tags etc).

As I said:

The trick, I think, is just not to push people into reaching too automatically for regular expressions, in contexts in which they add a layer of complexity, rather than removing one.

Over-evangelizing their use has a pay-off for us – it generates lots of badly-needed practice material to hone our skills on as people start to make some degree of assumption that 'RegEx' will usually be the right solution, unless proved otherwise, and then bring various kinds of puzzlement to us.

It can, however, have a cost for the converted – frustration and wasted time, as we see at the top of this thread.

Regular expressions are most useful in moderation, at small scale, and in a limited set of contexts.

Over-promotion of them soon assumes the shape of a kind of Ponzi scheme, in which early entrants reap the benefit of practice material brought to them by puzzled others, but late-comers mainly reap wasted hours.

Regex are sometimes a good tool for the job.
Often, they are not.

Shorter regex are legible.
Longer regex are write-only.

Non-nested patterns can be expressed by regex.
Nested patterns can not.
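As a small illustration of the nesting point, here is a sketch in JavaScript that checks balanced round brackets with a depth counter, which is the kind of state a regular expression in the formal sense cannot keep. isBalanced is a hypothetical helper name, and it is worth hedging that some regex flavors add recursion extensions that go beyond the formal definition.

```javascript
// Hypothetical helper: check that round brackets are balanced by tracking nesting depth.
function isBalanced(text) {
    let depth = 0;
    for (const ch of text) {
        if (ch === "(") depth += 1;
        if (ch === ")") depth -= 1;
        // A closing bracket with nothing open can never be repaired later.
        if (depth < 0) return false;
    }
    // Every opened bracket must have been closed by the end.
    return depth === 0;
}

console.log(isBalanced("(a(b)c)"));
console.log(isBalanced("(()"));
```

The counter can grow to any depth, whereas a fixed pattern without recursion can only describe nesting up to the depth that was written into it.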

The problem with veering away from RegExes is that the user never builds the considerable experience that is required. (Only yesterday I tweeted that I was refactoring some crappy RegEx usage to use what I’d learnt about RegExes. To give one corroboration.)

@ComplexPoint, yet your many, many posts in opposition to Regex scream the opposite.

It is too bad that you find RegEx so difficult. Perhaps you have not properly tried learning how to use them.

Quite the opposite, I find RegEx useful and easy to use almost on a daily basis.

I find RegEx very useful in BBEdit to quickly cleanup and/or extract text I've copied from a multitude of sources, usually on a one-off basis. It is not something I'm trying to automate so much as to quickly extract the desired text for use elsewhere.

I find RegEx very useful in KM to automate routine processes including a variety of source text from sources like email, PDFs, and even some web pages. In some cases it is useful to combine use of a language like JavaScript with RegEx, for which JavaScript has excellent support.

For an example of using Regex with JavaScript, see grab a number from the HTML code of a product page.

The Many Uses of Regex -- RexEgg.com

Regex is the gift that keeps giving. Once you learn it, you discover it comes in handy in many places where you hadn't planned to use it. On this page, we'll first look at a number of contexts and programs where you may find regex. Then we'll have a quick look at some regex flavors you may run into. Finally, we'll study some examples of regex patterns in [many different] contexts.

@MartinPacker, well put.

RegEx definitely has a steep initial learning curve, which you really only overcome with repeated use. How long that takes depends on the individual and the frequency of use, but it is generally not a long period. It can vary from a few days to a few weeks, probably spending less than an hour a day.

Once you get over this initial learning hump, you will find many, many use cases where you can quickly develop a RegEx solution.

So yes, I do encourage all of those interested in RegEx to try developing a RegEx solution as often as a use case presents itself. See below for help on getting started.

I haven't calculated any specific statistics, but it seems to me that many, many requests here in the KM forum revolve around processing of text. Almost all of these can be solved using RegEx with a reasonable amount of effort/time. Of course, the more experienced you are as a RegEx developer, the easier and quicker it will be. BBEdit has several very powerful RegEx tools.

There is lots of help available for Regex:

As I have often mentioned, one of the most important RegEx resources is the Regex101.com web site.

  • A great place to develop and test your Regex
  • A great place to view the details of someone else's Regex

For example, from the RegEx that I developed as a solution to the problem posted by the OP of this topic:

RegEx Search for
(?s)(.+)(Custom input=.+)
see regex101: build, test, and debug regex

RegEx Getting Started

If you need more formal, guided instruction to learn RegEx, see

The Complete Regular Expressions Course with Exercises 2020

  • The full course costs $10-12, 89% off the regular price
  • But it has an extensive free preview so you can determine if you like the style or not.

Please feel free to ask any questions about RegEx, since it seems we have gone completely off-topic. ;-)

a numbing of the ability to judge when they will genuinely reduce work, and when they risk adding to it.

... and, of course, a tendency to strongly recommend large quantities of (regex|chilli|alcohol) to others : -)

I rest my case :-)

When you say “hours” I think for most people those hours are spent over months or years - as they gain real world experience with projects they develop (or ad hoc uses as you describe with your BBEdit example). This would be my case.

Some people learn in a concentrated week or two; I’m not sure many do. But, apart from the above paragraph, I would think a discussion of learning styles is taking this thread way off topic. (Blame me, as usual.) :-)

Rob,

You don't like regular expressions? Fine. Don't use them – but stop demonizing them on this forum.

Regular Expressions are a very useful tool used by hundreds of millions of people and processes every day to get real work done.

You want to provide alternatives? Fine. The more the merrier – but your editorializing is tiresome.

-Chris

I certainly use regular expressions, in moderation, where they seem likely to reduce complexity rather than adding to it, and you will find no demonization of them if you read my posts.

When I feel that a problem is more simply solved without them, as in this case, I will say so, and I will show an alternative.

You will notice that I was not the only poster to express that view, and show a non-regex alternative, in this thread.

There is a real problem in over-encouraging reliance on regular expressions:

  • it is expressed in the dysfunctionally large number of posts in this forum in which "Regex" is used in the title of the problem rather than the solution to a workflow issue.
  • and, as demonstrated by the problem brought to this thread and many other threads, jumping automatically to a Regex solution often ends up wasting time.

It's not responsible to gloss over that issue, and it's not a joke to encourage people into a regex as default panacea perception, which frequently and demonstrably causes real loss, of the real time, of real people.

Not to mention frustration.

You are equally (or more so) irresponsible to continually denigrate their usefulness.

Cut it out – and your rabid rationalizing.

No one on the forum thinks regex is a panacea. It's simply a useful tool that's readily accessible to Keyboard Maestro.

Take your frustration somewhere else.

-Chris

PS fun fact: as it happens I submitted an AppleScript solution to a Rosetta Code problem only this morning, and, unlike most of the posters there, used a regular expression at the heart of the solution.

In that particular case, using a regular expression reduced complexity.

In other cases, using them just adds to complexity.

[Find words with alternating vowels and consonants - Rosetta Code](http://rosettacode.org/wiki/Find_words_with_alternating_vowels_and_consonants#Functional)