Trying to parse csv file into variables

Good point. Indeed. Wrong files.

Archive 2.zip (30.6 KB)

Looks very difficult to solve downstream of that rather unconstrained and under-grouped data collection …

Really needs more rigour and ‘bracketing’ of the fields upstream (form design ?)

At the moment there are really no reserved field delimiters or record delimiters – it must be creating a huge amount of work for you. Fixing it upstream should make a big difference …

I’m not sure I understand fully. Especially by ‘reserved field delimiters or record delimiters’. But I do think I generally understand what you’re saying. The most obvious changes I can think to the form are to separate the address into separate fields, and to be clear in my instructions for certain things - like not to use a currency symbol and not to use a semicolon or blank lines.

It’s not causing me too much work because it’s just a small side project, not my day job or anything, in fact it’s all also a part of just learning KM and automation, while having a real life application, but there are obvious limit to what I can do. At the moment it takes about 5 minutes or so to manually adjust everything so that KM can read it. That’s not too bad for now, and KM reading it, means I’ve automated a lot of stuff, filling in webforms basically. Anyway, thanks for all your help, I’ve got a lot to think about. I’ll work on the form for a start.

1 Like

reserved field delimiters or record delimiters

Just that there needs to be:

  1. One unique character (or short string of characters) which marks the definite end of one record and the start of another (in CSV that's generally assumed to be some LineFeed / CarriageReturn combination, so any CR or LF inside the data fields will prevent interpretation. (Record delimiter). And,
  2. One unique character (or short string of characters) which marks the definite divider between fields. (Field delimiter).

(CSV formats often make some use of quoting to cope with fields in which a delimiter character needs to be used as data)

So in my example, there are many ways for the form to produce CR and LF’s, thereby starting a new record, and no way of preventing the user entering commas, which act as the definite divider. I get it. Yes, the quoting has happened and is a pain because it turns my entry from
Damian Evans
into
"Damian Evans"
which I don’t want either.
Also, sometimes in the long description people use quotation marks, so that gets messy also, I can’t use them as definite dividers. Semicolons seem to make the most sense.
It’s certainly a messy business. I’ll give it some good thinking. :wink:

Yes, the quoting has happened and is a pain

Well, the quoting may turn out to be your friend:

  1. CSV parsing software (like Python3 CSV) will treat it as 'syntax' and can drop it automatically from the output,
  2. The quoting can solve some of the problems of special characters like commas occurring within the data - CSV parsers can ignore such things if they occur inside a string that is between quotes.

Good luck !

1 Like

Hi @ComplexPoint
I was looking at your script again. While it worked with your example, I can't get it to work with any other csv file. They seem to be simple files, with common delimiters and only one record. Like this one:
Example.zip (1.1 KB)
Could you tell me why please? I'm leaning more towards having a neater form, then turning it into a json list, so I can retain some formating for the 'long description' section. ie. CR's and LF's. I'm also looking more into Reg Ex, so I should be able to parse a lot of it, maybe from with KM. (Check this out! (["'])(?:(?=(\?))\2.)*?\1/ig !!)

I'm not sure of the workflow yet, but I'm stuck at step one.
Thanks.

If that is literally the file that you have passed to the macro, then then problem is that it is too short for the Python sniffer process to be able to see the pattern of the comma-separated data.

  1. You need to give it at least a few lines
  2. In that macro, the size of the sample it tries to analyse is 1024 bytes (the sample file you show us is shorter than that, and seems to contain no examples of line delimiters)

The section of the Python code which tries to sniff out a consistent pattern in the data is the line which starts with the word dialect and specifies reading in 1024 bytes for analysis, before returning the file pointer back to the start of the file.

If the sample is too small, or insufficiently consistent for a clear pattern to be easily and automatically discerned, then it won’t be possible to derive a usable value for dialect.

Well that is interesting.

Because of my workflow, my data file only has one record, with multiple fields. I copied and pasted the one record many times and it still wouldn't read it (notworking2.csv) , but it did read my source file that I download from the server (cfdb7-2017-11-20), which is very interesting. I now have a json list of three records - a header record, my first line of data, and that copied (which would normally be different). Could you point the way to turning that data into variables in the same way my original macro did? I only need the 2nd line, but I could possibly map that to the header as variable names. I'm willing to do some work here, but perhaps you can suggest the best way forward? Thanks so much for all the help. It's amazing.Archive 3.zip (3.1 KB)

Various ways, I would probably write a short Execute JavaScript for Automation action in Sierra-onwards JavaScript (let me know if you are using an earlier macOS), and do something like the macro below:

(note that I have adjusted one line in the Python, to use print (json.dumps( [ row for row in reader ] ))

Bind KMvars to CSV data thru Python, JSON.kmmacros (20.3 KB)

1 Like

OK, that’s amazing. Now the data is going straight from being downloaded into KM variables. And with a watch folder action, I can make that even slicker. I had some data manipulation going on in excel, which I’ll now do in KM, like turning the date into %variable%day%, etc, and the time into 24hr, 12hr, hour, minute, AMPM, etc. Again, I have no doubt I can do that in KM too. And the biggest thing, it handles the LF and CR’s in the form. It didn’t like the quotation marks, but that’s ok, I can either work around that or just not allow them in the form. This probably saves me 20 minutes processing each form. I was saying it was taking me a few minutes, but that probably wasn’t realistic, more like what I wanted to bring it down to. Thanks so much. I don’t understand the javascript unfortunately, but I understand parts of it. Anyway, thanks so much, it’s incredible. Now I have some work to do to tie it all together. I hope I can contribute some way in the future.

2 Likes

@ComplexPoint
I've been using your python and javascripts above regularly. I generally take out the long descriptions that use punctuation, and enter them in 'manually' using KM macros. It's been working fairly well to allow automated entry of the data into various online forms.

However for some reason, this file here can't be read. I've gradually reduced it down, trying to elimate all possible reasons, extra comma's, CR etc, but it's very bare now and still not reading.

Any ideas on what is going wrong here?

Also, when I get the error, it is the notification style in the top right, which also means I don't see much of what the error is. Is there a way to change that also?

I've attached the full macro and the file that it can't read below.
Thanks.

180701 John copy.zip (1.1 KB)

If Then Else.kmactions (1.3 KB)

You seem to have seens a kmActions file rather than a kmMacro file.

Perhaps you had selected actions rather than just the macro when you exported ?

(All that has come through is this:

Indeed. Let's try this.

0.1 Bind KMvars to CSV data thru Python- JSON.kmmacros (28 KB)

The file read is working here with that macro and file - we get a result showing the mapping of data to fields, the only failure I am getting is in the file write (because I don’t have your dropbox path on my system)

ok here’s something odd. I downloaded zip file I had uploaded, and it also worked for me. I went back to the original I had uploaded and it didn’t work.

I then moved that downloaded file to the original folder and it didn’t work. Moved it back to the download folder, and it worked.

Would you mind trying it with this one? It won't work for me.

180701 John 2.zip (1.3 KB)

OK, but then I downloaded it and it did work. Why would that be? A .csv thing?

Perhaps line-endings ? CR vs LF (e.g. Windows vs Mac/Unix) ?

I'm beat. I used a csv converter program to change it to every possible combination (LF, CR, CRLF). Tried UTF-8 and UTF-16, tried Excel and Apple Cell format. Even tried a pipe | separator. Nothing worked.

I would be useful if I could read the error msg. The most I can get is this:

image

There has to be an easy way to read the rest of that right!? :wink:

EDIT:

While I'd still like to work out how to see the rest of that error message, I think I have narrowed down the issue with the macro to the path name containing invalid characters. In particular, single quotation marks. I'm sure there's more it doesn't like, but that seems to be the offender in this case. The issue was the folder name, not the file name - which took me a lot longer to hunt down. Thanks for the help, I hope I can now call this solved again!

Open the Keyboard Maestro Engine Log in the Console or BBEdit

If you don't have BBEdit you should. The demo is fully functional and will revert to “lite” mode after 30 days (if not purchased). BBEdit in lite-mode is still highly scriptable and quite powerful.

-Chris

1 Like