MultiLine Regex101 working but not in KM

Sorry for the repeat post. I understand this has been addressed before, but I'm learning regex and can't figure it out.

I'm trying to extract the two reference numbers from the multi-line text shown below. KM extracts only a single one.

Screen Shot 2020-07-17 at 4.07.36 AM

Regex101.com usually provides the "global" option by default, which means it will show all matches that are found in the string.

KM Search shows only the FIRST match.

So, you have to do one of the following:

  • ADD another Capture Group to match the second occurrence.
  • Use a For Each action with a collection of "The substrings", like I just used in my post to another question:

image
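For readers who want to see the first-match versus all-matches difference outside KM, here is a minimal Python sketch. The sample text and the pattern are assumptions made up for illustration (the original screenshot text is not available), and Python's re module is not the same engine KM uses, so treat it only as an analogy.

```python
import re

# Hypothetical sample text; the real source text was only posted as an image.
text = "Payment Reference: ABC123 sent\nRefund Reference: XYZ789 issued\n"

# Hypothetical pattern, for illustration only.
pattern = re.compile(r"(?i)reference:\s*(\w+)")

# A single search stops at the first match.
first = pattern.search(text)
first_ref = first.group(1) if first else None

# Iterating over every match is comparable to regex101's "global" option,
# or to looping over all matching substrings.
all_refs = [m.group(1) for m in pattern.finditer(text)]

print(first_ref)
print(all_refs)
```

The single search returns only the first reference, while the loop collects both.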

In the future, please post your macro, or in the case of a RegEx request, post a real-world example of:

  1. Source text using Forum Code Block.
  2. Text after extraction/change
  3. The Regex pattern you have tried

If it is just a follow-on question, then post in the same topic.
If the subject of your question is materially different, then post in a new topic.

@JMichaelTX has answered your question, but I have to ask, what is your intention with (.|)*? Because that is a very strange construction, which matches exactly the same as .*, but is dangerously close to being a pathological regex since it can match an infinite number of nothings before matching the .. For example the text “hello” could match that regex as:

()()()()()(.)(.)()(.)()()()(.)()()()(.)()()()

And any number of () empty matches could be included in each position.

Unless I am missing something…?
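A quick way to check the "matches the same as .*" claim is to run both patterns over a few strings. This is a rough sketch in Python, whose re module happens to cope with the empty alternative; other engines may behave differently or more slowly, and the sample strings are arbitrary.

```python
import re

# Arbitrary sample strings, including an empty one and a two-line one.
samples = ["hello", "", "a\nb", "two words"]

# Matched text for the odd construction and for the plain one.
odd = [re.match(r"(.|)*", s).group(0) for s in samples]
plain = [re.match(r".*", s).group(0) for s in samples]

# Both lists hold the same matched text for every sample.
print(odd == plain)
```

The matched text is identical in each case; the only difference is the extra work the engine may do to get there.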

Hi @peternlewis, no intention. Like I said, as a regex n00b, this is the one that matched when I was doing trial and error.

OK, well, if you aren't trying to do anything more, .* is generally what you want, matching any sequence of zero or more of any character except line ending characters (there are ways to make . also match line ending characters, but by default it does not).
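To illustrate the default behaviour of . with line endings, here is a small Python sketch. It uses the inline (?s) option, which is one common way to let . match line endings; the sample text is made up, and whether KM's engine behaves identically in every detail is an assumption.

```python
import re

text = "first line\nsecond line"

# By default, . stops at the line ending.
default_match = re.match(r".*", text).group(0)

# With the (?s) option, . also matches line ending characters.
dotall_match = re.match(r"(?s).*", text).group(0)

print(repr(default_match))
print(repr(dotall_match))
```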

Here's one way to do it.

Extract Text Macro (v9.0.5)

Extract Text.kmmacros (4.0 KB)

My preference would be to use two Capture Groups with one Search using Regular Expression action, since you probably want to use each Reference# separately.

Assuming the Reference numbers are defined by a Regex Word, then this Regex should work:
(?i)reference.+?(\w+).*\R.+?reference.+?(\w+)

For details, see https://regex101.com/r/BDK5LZ/1/
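Here is a rough Python sketch of the same two-capture-group idea. Two things are assumptions: the sample text is invented (the real text was only available as an image), and Python's re module does not support \R, so a hypothetical LINE_BREAK group covering the common line endings stands in for it.

```python
import re

# Hypothetical two-line sample text.
text = "Payment Reference: AB12345 received\nRefund Reference: ZX98765 issued\n"

# Stand-in for \R, which Python's re module does not support.
# It covers only the common line endings (CRLF, LF, CR).
LINE_BREAK = r"(?:\r\n|\n|\r)"

pattern = re.compile(
    r"(?i)reference.+?(\w+).*" + LINE_BREAK + r".+?reference.+?(\w+)"
)

m = pattern.search(text)
refs = m.groups() if m else ()

print(refs)
```

Note that the pattern expects the second "reference" on the line immediately after the first, with at least one character before it on that line, because . does not cross line endings.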

If you want a broader definition of Reference number that must start with a RegEx word character, but then can be anything other than a SPACE, then this would work:
(?i)reference.+?(\w[^ ]+).*\R.+?reference.+?(\w[^ \n\r]+)
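To see how the broader definition differs, this Python sketch runs both patterns over reference numbers that contain punctuation. The sample text is invented, and LINE_BREAK is again a hypothetical stand-in for \R, which Python's re module does not support.

```python
import re

# Hypothetical sample text with punctuation inside the reference numbers.
text = "Payment Reference: AB-12345/7 received\nRefund Reference: ZX-98765/2 issued\n"

# Stand-in for \R (common line endings only).
LINE_BREAK = r"(?:\r\n|\n|\r)"

# Reference number defined as a run of regex word characters.
word_only = re.compile(
    r"(?i)reference.+?(\w+).*" + LINE_BREAK + r".+?reference.+?(\w+)"
)

# Reference number defined as a word character followed by non-space characters.
broader = re.compile(
    r"(?i)reference.+?(\w[^ ]+).*" + LINE_BREAK + r".+?reference.+?(\w[^ \n\r]+)"
)

word_refs = word_only.search(text).groups()
broad_refs = broader.search(text).groups()

print(word_refs)
print(broad_refs)
```

With this sample, the word-only pattern stops at the first hyphen, while the broader pattern keeps everything up to the next space.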

Below is just an example written in response to your request. You will need to use it as an example and/or change it to meet your workflow automation needs.

Example Output

image

NOTE: There may be minor errors in the source text, which was obtained by OCR of your image. In the future, please post your source text using a Forum Code Block.

Please let us know if it meets your needs.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

MACRO:   Extract Reference Numbers from Multiline Sring [Example]

-~~~ VER: 1.0    2020-07-17 ~~~
Requires: KM 8.2.4+   macOS 10.11 (El Capitan)+
(Macro was written & tested using KM 9.0+ on macOS 10.14.5 (Mojave))

DOWNLOAD Macro File:

Extract Reference Numbers from Multiline Sring [Example].kmmacros
Note: This Macro was uploaded in a DISABLED state. You must enable before it can be triggered.


ReleaseNotes

Macro Author: @JMichaelTX

This is just an example written in response to the below KM Forum Topic. You will need to use it as an example and/or change it to meet your workflow automation needs.

MACRO SETUP

  • Carefully review the Release Notes and the Macro Actions
    • Make sure you understand what the Macro will do.
    • You are responsible for running the Macro, not me.

I want to personally thank you all for answering my queries with such detailed replies. @JMichaelTX @peternlewis @thoffman666 There are so many things for me to learn.

So \R lets my regex continue onto the next line, but not across all lines.
@JMichaelTX, in your macro, is (?i) needed when the regex is already case insensitive?
@thoffman666, in your answer we capture the string in a group and replace it with the string and a new line. May I ask what the localextractedtext regex is doing?

The second regex search and replace is just to get rid of all the text after the last found Reference Number. You can see that as the difference between the outputs for Preprocessed Text and Extracted Text. There's no need to use two different variables for this. I did so just to make the example easier to understand.

Technically it is not needed, but I have just developed a habit of always providing the Regex Options at the beginning of the pattern -- it makes things explicitly clear. Some KM Actions let you choose case matching, but others don't.
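As a small illustration of why putting the option in the pattern makes things explicit, this Python sketch compares an inline (?i) with a case-insensitive setting supplied from outside the pattern. The sample text is made up, and Python's re.IGNORECASE flag is used here only as an analogy for an action-level case setting.

```python
import re

text = "REFERENCE: AB12345"

# Case-insensitivity stated inside the pattern itself.
inline = re.search(r"(?i)reference", text)

# Case-insensitivity supplied from outside the pattern.
flagged = re.search(r"reference", text, re.IGNORECASE)

# No option at all: the lowercase pattern does not match the uppercase text.
plain = re.search(r"reference", text)

print(inline is not None, flagged is not None, plain is not None)
```

With the inline option, the pattern behaves the same wherever it is pasted, regardless of the surrounding settings.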