Regex - how to match everything after a 5 digit number

I have documents referenced like this:

Original to ABC

AGENCY OF WHATEVER | 2024-12345
Short description of the document contents

I'm still learning RegEx so I was wondering if someone could help me:

  • Extract ABC (the first three letters after "Original to")
  • Extract the Agency (everything before | on that line)
  • The short description, always on a new line after the 2024-12345

So far, all I've managed to do successfully is to extract the four-digit number followed by a hyphen and a five-digit number:

\d{4}-\d{5}

and extract everything before the |

^[^|]+(?=\|)
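
For anyone following along, the two patterns above can be checked quickly in Python. This is only a sketch: the variable names are made up for illustration, Python's re module stands in for whatever tool is actually running the search, and it assumes the pipe inside the lookahead is meant to be escaped as \|.

```python
import re

# Sample text from the question (variable names are hypothetical).
header_line = "AGENCY OF WHATEVER | 2024-12345"
full_text = (
    "Original to ABC\n\n"
    + header_line
    + "\nShort description of the document contents"
)

# Four digits, a hyphen, then five digits.
doc_number = re.search(r"\d{4}-\d{5}", full_text).group()

# Everything before the pipe, tested against the header line alone.
agency = re.search(r"^[^|]+(?=\|)", header_line).group()

# On the full text, [^|] also matches newlines, so the match starts
# at the very beginning and runs across lines up to the pipe.
across_lines = re.search(r"^[^|]+(?=\|)", full_text).group()

print(repr(doc_number))
print(repr(agency))
print(repr(across_lines))
```

Note that the space before the pipe is part of the match, so the result usually wants trimming. The last search also shows why the pattern only behaves when it is fed the agency line by itself.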

Any help would be greatly appreciated! RegEx is so hard!

There are probably many ways to do it. I did it this way. Hopefully I interpreted your request accurately, but half the time people respond, "That's close, but I didn't want {x} to be included."

Thank you for the quick response too, you rock!

I was having trouble getting it to work, then I realized the text I had copied had extra lines. (Copying from email, it looked like it was spaced normally but then on the clipboard it had a bunch of extra lines.) So I tried to filter the clipboard with Remove Whitespace but that just wrecked the RegEx search. Also, it didn't extract the description if the description was more than one line. So I guess the question now is:

Is it possible to extract everything before the | on that line only?
And extract everything after the 1234-12345, no matter how long it is?

Thank you again, I'm learning a lot from you!

Original to xx
[empty line]
[empty line]
AGENCY OF WHATEVER | 1234-12345
[empty line]
Description line 1
Description line 2

Ok, so here's a new action:


The string you need is:

(?s)(Original to )([^\n]*)\n\n\n([^|]*)\|([^\n]*)\n(.*)

From the sound of your last email, I think you will be able to figure out which parts of the string correspond to which parts of the results. But if not, just ask.
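
To see which capture group lands where, here is a small Python sketch that runs the string above against the sample text from the previous post. Python's re module is only a stand-in for whatever tool is running the search, and the variable names are hypothetical.

```python
import re

# The pattern from the post above, unchanged.
pattern = r"(?s)(Original to )([^\n]*)\n\n\n([^|]*)\|([^\n]*)\n(.*)"

# Sample clipboard text: two empty lines after the first line,
# one empty line before the description.
sample = (
    "Original to xx\n"
    "\n"
    "\n"
    "AGENCY OF WHATEVER | 1234-12345\n"
    "\n"
    "Description line 1\n"
    "Description line 2"
)

match = re.search(pattern, sample)

recipient = match.group(2)            # text after "Original to "
agency = match.group(3).strip()       # everything before the pipe
doc_number = match.group(4).strip()   # rest of the agency line
description = match.group(5).strip()  # everything after that line

print(recipient, agency, doc_number, description, sep="\n")
```

The strip() calls are an addition made here: the raw groups keep the spaces around the pipe and the blank line in front of the description. Because the pattern hard-codes three newlines, it matches only when exactly two empty lines follow the first line.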

Things you may not know:

  • the opening four characters, (?s), tell the regex engine to let a dot match newlines as well. I needed to add this because you changed some things from your original question.
  • inside square brackets, a leading ^ means "not", so [^\n] means "any one character that is not a newline"
  • [something]* means 0 or more occurrences of any character from the 9-character string "something"
  • .* means as many characters as possible (a dot means any one character)
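
The points above can be tried out one at a time. A minimal Python sketch, with made-up sample strings and variable names:

```python
import re

two_lines = "line one\nline two"

# Without (?s), a dot stops at the first newline.
dot_default = re.match(r".*", two_lines).group()

# With (?s), a dot also matches newlines, so .* runs to the end.
dot_with_s = re.match(r"(?s).*", two_lines).group()

# [^\n]* means "zero or more characters that are not a newline".
not_newline = re.match(r"[^\n]*", two_lines).group()

# [something]* matches a run of characters drawn from the letters
# s, o, m, e, t, h, i, n, g; it stops at the first letter outside that set.
from_set = re.match(r"[something]*", "nightly").group()

print(repr(dot_default))
print(repr(dot_with_s))
print(repr(not_newline))
print(repr(from_set))
```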

I. Love. You. This is perfect!
Thank you so much, you really helped me out. I'm willing to send you stickers and postcards from Guam as a thank you, so please let me know where to mail them to.


You are welcome. I never give out my address, and I never accept payment for helping people. But thanks for making me feel good today. Chew some pugua for me.