I'm trying to create a macro to transform my bank .csv file into one that is accepted by my finance software. I'm using below regex statement but it doesn't match everything.
It is hard to tell exactly what fields you are trying to extract, and how you want to change the CSV file for your finance software.
Have you considered opening the CSV file in a spreadsheet, like Excel?
Once opened there, it should be much easier to make the changes you want.
Having said that, if you want your RegEx .+ to include end of line characters, you can start the Regex with (?s)
This may or may not actually fix your problem.
Note that [^;] matches everything except ;, which includes line ending characters - this is important since it is often easy to think that it would match less than . (since it is excluding ;), but . is not “match any character” by default, it is “match any character except line ending characters”.
Thanks guys for the answers.
The reason I can't use excel is that I'm in Greece and the CSV file from my bank is in a weird encoding, so my macro does more than just match the lines.
However, peternlewis you're correct. This is mostly what I want to do.
eight columns separated by ";" but using your proposal I couldn't get anything to match on my example.
I'm not sure what the "^" and "$" characters are supposed to do on either side.
I know they state the start and end of the match but I don't understand exactly how they work.
If I remove the "$" at the end I get below match.
This includes the start of the 2nd match "Home / Supermarket" as part of the last group of the 1st match
Try combining @JMichaelTX's tip with @peternlewis's suggested regex minus the ^ and $ characters at the start and end. You should get something like this:
After I more closely examined your source text it appears to me that you don't really have a CSV (Comma Separated Values) text file.
If you have Excel, then you might try this: Use the Excel Text Import feature, with ";" as the delimiter. Just quickly scanning through your screenshots it seems that might work.
Make sure that there is a \n on the end of your data file though.
However, as I originally noted, parsing CSVs this way is asking for trouble. It will fail if your fields ever contain a semicolon (the CSV will quote the semicolon like:
"6232074351955; Rent / GD19162442342582"
But your regex will not process the quotes so it wont know it is not a field separator.
I think I see the problem with your data.
The two lines that have "TRANSFER" near the beginning that contain a quote mark (") and then the next line which also has a quote mark, do not fit the same pattern. I believe the next line is actually part of the "Comments" column.
Example:
Redestribution;ACCOUNT TRANSFER ;24/09/2019;"6232074351955
Rent / GD19162442342582";1000;EUR;2744,4;EUR
Finally I did make it work.
The most helpful thing was [^;]+ which also matches new lines.
I got very specific about dates and values and that finally matched everything.
I didn't have to match the header line anyway because I always apply a new header line for the fields I do use.
I will keep an eye on it for the next few times I do this to make sure nothing breaks but I think it will work.
This set of macros allowed me to transform all CSV files from 4 banks in about 7 seconds.
A process that used to take at least 10 minutes and was very tedious!
Thanks for sharing that. I just leaned something. I previously thought that I needed to set the option for "dot matches newline" in order for any other matches to include a newline. But clearly using the match any character that is NOT in the character class (Negating Character Class), as in [^;], includes newline characters.
Yes, I suspect this is a common misconception. The “s” flag affects only what a “.” means. People are frequently caught out when they go from using something like .* to [^;]* and suddenly find it is matching their whole file instead of part of one line.