I'm trying to use the OCR function to find dollar amounts in scanned receipts. Using (?m)(\d+\.\d\d)+ only returns the first match, as shown in the screenshot. I've also included the text yielded from the OCR, which shows multiple matches that should be returned. I've tried (?s) and (?m) to no avail. I'm sure I have a syntax error somewhere, but I'm not smart enough to figure it out. Please advise. Thanks.
This RegEx works for me:
(?m)(\d+\.\d{2})$
assuming that the dollar amounts are always at the end of the line.
You need to use a For Each action with a Substrings In collection.
Example Output
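For anyone who wants to check the pattern outside the macro editor, here is a minimal Python sketch of the same idea. It is only an analogy: a single search stops at the first match, while iterating over all matches plays the role of the For Each action. The receipt text and variable names are made up for illustration.

```python
import re

# Hypothetical OCR text; the items and amounts are invented for this sketch.
ocr_text = """COFFEE 3.50
BAGEL 2.25
TAX 0.46
TOTAL 6.21"""

# (?m) makes $ match at the end of every line, not just the end of the text.
pattern = re.compile(r"(?m)(\d+\.\d{2})$")

# A single search returns only the first match.
first = pattern.search(ocr_text).group(1)

# Collecting every match is the analogue of looping over all substrings.
all_amounts = pattern.findall(ocr_text)

print(first)        # 3.50
print(all_amounts)  # ['3.50', '2.25', '0.46', '6.21']
```

So the regex itself was never the problem; a plain search simply stops after the first hit.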
That did it! Thanks for the help.
Is there a benefit to using \d{2} versus \d\d? I like yours better, but I'm not sure if there's a systematic reason it's better.
It is mostly style, but I prefer it because it is easier to read and change.
Plus you don’t have to count things, which SHOULDN’T be a problem with just two.
If your source totals sometimes include commas, try:
(?m)([\d,]+?\.\d{2})$
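To see why the comma version matters, here is a small Python sketch comparing the two patterns on a made-up receipt. The sample text and names are assumptions for illustration only. Without the comma in the character class, the match starts after the comma and silently drops the thousands.

```python
import re
from decimal import Decimal

# Hypothetical OCR lines; the amounts are invented for this sketch.
ocr_text = """RENT 1,234.56
FEE 12.00"""

plain = re.compile(r"(?m)(\d+\.\d{2})$")
with_commas = re.compile(r"(?m)([\d,]+?\.\d{2})$")

# The plain pattern cannot cross the comma, so it captures only "234.56".
plain_hits = plain.findall(ocr_text)

# The comma-aware pattern captures the whole amount.
comma_hits = with_commas.findall(ocr_text)

# Strip the commas before converting to a number.
values = [Decimal(hit.replace(",", "")) for hit in comma_hits]

print(plain_hits)  # ['234.56', '12.00']
print(comma_hits)  # ['1,234.56', '12.00']
```

Note that the first pattern does not fail on the comma-separated total; it returns a wrong amount, which is easy to miss.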
The \d{2} form means match exactly two times, while \d{2,} would match at least two times (not an issue here) and \d{2,4} would match at least two and no more than four times (the more common usage). So it's pretty versatile.
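The three quantifier forms can be checked quickly in Python. This is just a sketch; the helper name is hypothetical, and re.fullmatch is used so the whole string has to fit the pattern.

```python
import re

def fits(pattern: str, text: str) -> bool:
    """Return True if the entire text matches the pattern (hypothetical helper)."""
    return re.fullmatch(pattern, text) is not None

# {2}: exactly two digits
exactly_two = (fits(r"\d{2}", "12"), fits(r"\d{2}", "123"))

# {2,}: two or more digits
at_least_two = (fits(r"\d{2,}", "1"), fits(r"\d{2,}", "12345"))

# {2,4}: between two and four digits, inclusive
two_to_four = (fits(r"\d{2,4}", "123"), fits(r"\d{2,4}", "12345"))

print(exactly_two)   # (True, False)
print(at_least_two)  # (False, True)
print(two_to_four)   # (True, False)
```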
@mrpasini That's actually very helpful. I'm also scraping the receipts for dates which sometimes have the year as four digits and sometimes only two, so that will help solve that problem. Thanks.
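One caveat for the date case, sketched in Python under the assumption that the receipts use slash-separated dates like 3/14/24 or 03/14/2024 (the thread doesn't say which format, so treat the sample text as made up): \d{2,4} also accepts a three-digit year. If that matters, an alternation such as (?:\d{4}|\d{2}) allows only two or four digits.

```python
import re

# Hypothetical OCR text; the last line is a deliberately bad "date".
ocr_text = """Date: 03/14/2024
Date: 3/14/24
Ref: 1/2/345"""

# Loose: the year may be two, three, or four digits.
loose = re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b")

# Strict: the year must be exactly four or exactly two digits.
strict = re.compile(r"\b\d{1,2}/\d{1,2}/(?:\d{4}|\d{2})\b")

loose_hits = loose.findall(ocr_text)
strict_hits = strict.findall(ocr_text)

print(loose_hits)   # ['03/14/2024', '3/14/24', '1/2/345']
print(strict_hits)  # ['03/14/2024', '3/14/24']
```

With clean OCR output the looser \d{2,4} is probably fine; the stricter form just guards against stray digits.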