Is it possible to extract the following string from a variable?

Hi,
Is it possible to get the text that follows ‘our ref’ in a variable? I want to save the text after ‘our ref’ as a KM variable.

For example, I have a variable named ‘content’ that contains the following text: some text Our ref: 1548797 …more text

I want to automatically locate the string after ‘Our ref:’, which is ‘1548797’, and save it as a KM variable, and maybe also grab up to the next 5 characters after 1548797 ends…

Thank you

Sure! Try this "Search Variable using Regular Expression" action:

Here's the text for the regular expression:

our ref: ?(\d+)

Which means:

our ref: Find the text "our ref:"
 ? Optionally followed by a space
() Capture the match inside the parens
\d+ Match at least one digit
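To try the pattern outside Keyboard Maestro, here is a minimal Python sketch using the sample text from the question. Python's `re` module is only a stand-in here; KM's own regex engine may differ in small details. Because `re` is case-sensitive by default, the sketch passes `re.IGNORECASE` so that "our ref:" also matches "Our ref:".

```python
import re

# Sample text from the question.
content = "some text Our ref: 1548797 …more text"

# IGNORECASE lets "our ref:" match "Our ref:".
match = re.search(r"our ref: ?(\d+)", content, re.IGNORECASE)

# group(1) is the text captured by the parentheses.
ref = match.group(1) if match else None
print(ref)  # 1548797
```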


Hi Dan,

Thank you very much, it works! But it doesn't handle a hyphen in the reference: for 1545-445 I get the result 1545445 instead. Sometimes the reference may also contain ‘/’ or letters. What's the best way to do this?

OK, we can fix that. How do we know when the reference number ends? For instance, will it always have a space after it?


OK, the reference ends where there are 2 or more spaces after it.

Can the reference number contain spaces inside it?

Yes, it can.

OK, use this as the search expression - everything inside the quotes, but don’t use the quotes:

"our ref: ?(.+)  "

NOTE: It has 2 trailing spaces at the end, after the closing “)”.
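A quick Python check of the two-trailing-spaces idea. The sample line is hypothetical (a ref containing a hyphen and a space, followed by two spaces and more text), and `re.IGNORECASE` is added because Python's `re` is case-sensitive by default.

```python
import re

# Hypothetical sample: the ref contains a hyphen and a space, and is
# followed by two spaces before the next piece of text on the line.
content = "Our ref: 1545-445 AB  Date: 1 May"

# Same pattern as above, including the two trailing spaces.
match = re.search(r"our ref: ?(.+)  ", content, re.IGNORECASE)
print(match.group(1))  # 1545-445 AB

# If the ref is not followed by two spaces, there is no match at all.
no_gap = re.search(r"our ref: ?(.+)  ", "Our ref: 1545-445", re.IGNORECASE)
print(no_gap)  # None
```

The second call shows the trade-off: the two spaces are what tell the pattern where the ref ends, so a ref at the very end of the text is not found.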

Dan, ScanSnap scans from KM, then saves the letter with OCR. KM then reads the letter without opening it and saves its text in a variable. We now need to grab the references. What would you say is the best way to handle them?
Generally the references are on the same line. Sometimes the gap can be more than 2 spaces...

Here are the non-OCR versions of the refs that we scan.

That helps me understand your context.

Can you provide the OCR text for those various images?

Yes, sure. I will send you a PM.


You might try this:

(?mi)our ref:\s*?([\/\-\w\d]+\s*[\/\-\w\d]*)\s*

This will match any ref# that contains any of these:

  • any letter or number
  • "/" or "-"
  • one gap of white space (of any length) between two groups of the above

The match must start with "our ref:" (case insensitive)
and end with zero or more white spaces and/or end of line
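Here is a small Python sketch that runs this pattern, copied unchanged, against a few lines. Most samples come from this thread; "our ref: A48 46408" is a hypothetical ref with a space in it, and `find_ref` is just an illustrative helper name. The last call shows one thing to watch for: `\s` also matches a line break, so when the ref is the last thing on its line the capture can run into the next line.

```python
import re

# The pattern from the post, copied as-is. Python's re accepts the
# leading (?mi) flags and the escaped "/" and "-" inside the brackets.
PATTERN = re.compile(r"(?mi)our ref:\s*?([\/\-\w\d]+\s*[\/\-\w\d]*)\s*")

def find_ref(text):
    """Return the captured ref, or None when the pattern does not match."""
    m = PATTERN.search(text)
    return m.group(1) if m else None

print(repr(find_ref("Our Ref: PPI/3876131")))        # 'PPI/3876131'
print(repr(find_ref("our ref: A48 46408")))          # 'A48 46408' (hypothetical)
print(repr(find_ref("Our reference: 3-471980407")))  # None: "ref" must be followed directly by ":"

# \s also matches a line break, so the capture can pick up the first
# word of the next line.
print(repr(find_ref("Our ref: 3935878\nDear Sir")))  # '3935878\nDear'
```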

Here is the detailed explanation from regex101.com:


If you know that the ref# will ALWAYS end in letters or digits ("\w"), followed by zero or more white space and then end of line, this should work:

(?mi)our ref:\s*?([ \/\-\w]+\w+)\s*
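A Python sketch of this second pattern, again copied unchanged. One quirk shows up when you run it: the lazy `\s*?` first tries matching nothing, and the character class also accepts a space, so the capture begins with the space after the colon. The sketch trims it with `strip()`. The "SAR Team/TG" line is a hypothetical variant of a ref from this thread.

```python
import re

# The second pattern from the post, copied as-is.
PATTERN = re.compile(r"(?mi)our ref:\s*?([ \/\-\w]+\w+)\s*")

def find_ref(text):
    m = PATTERN.search(text)
    if not m:
        return None
    # The capture can start with the space after the colon, because the
    # class [ \/\-\w] includes a space; strip() trims it.
    return m.group(1).strip()

# Raw capture, showing the leading space:
print(repr(PATTERN.search("Our Ref: PPI/3876131").group(1)))  # ' PPI/3876131'

# A line break is not in the class, so the capture stops at the end of the line.
print(find_ref("Our ref: 3-471980407  \nNext line"))  # 3-471980407
print(find_ref("Our ref: SAR Team/TG - 860384"))      # SAR Team/TG - 860384 (hypothetical)
```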


The "our ref: ?(.+)  " pattern will capture everything up to the last two spaces (including any earlier double spaces before them).

You want a non-greedy search:

"our ref: ?(.+?)  "

That will match the minimum required to complete the pattern as opposed to the normal .+ which will match as much as possible.
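Here is a side-by-side Python comparison of the greedy and non-greedy versions on a hypothetical line that has more than one double-space gap.

```python
import re

# Hypothetical line: the ref contains a single space, and there are
# two-space gaps in more than one place.
line = "Our ref: 1545 445  Date: 1 May  Page 1"

greedy = re.search(r"our ref: ?(.+)  ", line, re.IGNORECASE)
lazy = re.search(r"our ref: ?(.+?)  ", line, re.IGNORECASE)

print(greedy.group(1))  # 1545 445  Date: 1 May
print(lazy.group(1))    # 1545 445
```

The greedy `.+` backs off only as far as the last two-space gap, while `.+?` stops at the first one.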


Thank you for this. It's better that I learn these techniques myself so I don't keep asking the same questions.


Regular expressions (RegEx or RegExp) are extremely powerful, but have a steep initial learning curve that is often intimidating. But once you get over that initial hump, and keep writing new RegEx, it will become much easier.

You should also know that there are often many ways to write a RegEx to achieve the same results. Most of the time it does not matter, unless the source text is very long, and the RegEx is very complex.
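As a small example, the character class used in the "(?mi)our ref:" pattern earlier in the thread can be written more compactly. In Python's `re`, `\w` already covers every character `\d` matches, and "-" needs no escape when it is placed last inside the brackets. This sketch checks that both spellings find the same pieces in refs taken from this thread.

```python
import re

# Two spellings of the same character class.
verbose = re.compile(r"[\/\-\w\d]+")
compact = re.compile(r"[/\w-]+")

# Refs from this thread; both patterns find the same pieces.
samples = ["PPI/3876131", "3-471980317", "COM/334533/2016", "SAR Team/TG - 860384"]
for s in samples:
    print(s, verbose.findall(s) == compact.findall(s))
```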

I do all of my RegEx development at this free website:
www.regex101.com

You may also find this site helpful:
Regular Expressions Quick Start


Thank you I will look into this.

Hi DanThomas,

Here are some examples. I am having a lot of difficulty finding a one-size-fits-all approach. Do you have any suggestions for how I can approach this? Generally the ref is on the same line, and it is rare for anything else to be on that line, but it does happen. When it does, there is a large space before the next text begins.

Our reference; A4846408
Our reference: A4846408

O u r r e f e r e n c e : 3 - 4 7 1 9 8 0 3 1 7
Our reference : 3-471980317

O u r R e f : P P I / 3 8 7 6 1 3 1
Our Ref: PPI/3876131

Our reference: 3-471980407

Re: CPPI5506 Claim PPI/957188/1518413978

Our ref no: SAR Team/TG - 860384

O u r r e f e r e n c e : 3 - 4 7 1 9 8 0 4 1 3

O u r R e f U 6 0 1 8 9 0
Our Ref U601890

Our Ref; 3935878
Our Ref: 3935878
Ou r R e f ; 3 9 3 5 8 7 8
O u r R e f : 3 9 3 5 8 7 8

O u r r e f e r e n c e : 3 - 4 7 1 9 8 0 4 0 4

R e f e r e n c e C O M / 3 3 4 5 3 3 / 2 0 1 6
Reference COM/334533/2016

Something like this works, but it does not cover every case. Is it even possible to cover them all with one expression? Also, the dots only work if their number matches the length of the string; if I add more dots than there are characters, the result is blank. I guess using dots is not a good idea.

I’m sure other people will disagree with me, and I’ll give my reasons in a moment.

But I think the only sure and easy-to-maintain way is what you’re doing, except I’d use a “Switch” action. You can have all sorts of conditions, regexes, etc. And use an “otherwise” condition at the end, so you can display a message if none of the conditions match.

Let me know if you need help with it.

Now here’s my reasoning.

First off, while it might be possible to develop a few regexes to handle all the conditions, they’d probably be fairly complex. That means the first time they don’t match something, you have to figure out why, and either fix them (without breaking them) or craft another one. This might be OK if you’re a regex expert, but I still wouldn’t recommend it.

I’ve been developing professionally since 1979. I always want to overengineer everything, because it feels good to have some small, tight bit of code. But then it breaks. And you can’t figure out how to refactor it to handle this new issue. And you spend forever on it.

It’s so much easier to have a long switch statement. While it looks ungainly, it’s a snap to add a new condition. And trust me, when you’ve been using the macro for a while, and it suddenly breaks, you’ll be incredibly happy that it’s so simple to understand.

Mind you, I’m not saying you can’t use some regexes. Just don’t try to handle everything in one of them. It will take time at the start, but as I said, it’s time you’d spend (and then some) trying to figure it out later.
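To make the "Switch" idea concrete outside KM, here is a rough Python sketch: a list of small, separately maintained patterns tried in order, with a final return that plays the role of the "otherwise" branch. The five patterns, the case labels, and the `keep`/`unspace` helpers are hypothetical. They are built only from the sample lines posted above, so treat them as a starting point rather than a tested set. A line that matches no case, such as "Re: CPPI5506 Claim PPI/957188/1518413978", falls through so it can be flagged.

```python
import re

def keep(s):
    # Hypothetical helper: trim surrounding white space.
    return s.strip()

def unspace(s):
    # Hypothetical helper for letter-spaced OCR text: "3 - 4 7 1" -> "3-471".
    return s.replace(" ", "")

# Each case is (label, pattern, clean-up), tried in order like a Switch action.
CASES = [
    # "Our ref: PPI/3876131", "Our Ref; 3935878", "Our reference : 3-471980317"
    ("our ref/reference",
     re.compile(r"our\s+ref(?:erence)?\s*[:;]\s*(\S+)", re.I), keep),
    # "Our ref no: SAR Team/TG - 860384": the ref has spaces, so take the rest of the line
    ("our ref no",
     re.compile(r"our\s+ref\s+no\s*:\s*(.+?)\s*$", re.I | re.M), keep),
    # "Our Ref U601890": no separator after "Ref"
    ("our ref, no colon",
     re.compile(r"our\s+ref\s+(\w+)", re.I), keep),
    # "Reference COM/334533/2016" at the start of a line
    ("reference",
     re.compile(r"^reference\s+(\S+)", re.I | re.M), keep),
    # "O u r r e f e r e n c e : 3 - 4 7 1 9 8 0 3 1 7" (letter-spaced OCR)
    ("letter-spaced",
     re.compile(r"o ?u ?r ?r ?e ?f(?: ?e ?r ?e ?n ?c ?e)? ?[:;] ?(\S(?: \S)+)", re.I),
     unspace),
]

def find_ref(text):
    """Return (label, ref) for the first case that matches, else (None, None)."""
    for label, pattern, clean in CASES:
        m = pattern.search(text)
        if m:
            return label, clean(m.group(1))
    # The "otherwise" branch: nothing matched, so the caller can flag it.
    return None, None

for line in [
    "Our reference : 3-471980317",
    "Our ref no: SAR Team/TG - 860384",
    "Our Ref U601890",
    "Reference COM/334533/2016",
    "O u r r e f e r e n c e : 3 - 4 7 1 9 8 0 3 1 7",
    "Re: CPPI5506 Claim PPI/957188/1518413978",
]:
    print(f"{line!r} -> {find_ref(line)}")
```

Order matters: the "no colon" case would also match "Our ref no: ..." (capturing "no"), which is why it comes after the "our ref no" case. Adding a new format means adding one more small entry, not reworking a big pattern.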
