Extract two columns from a CSV file

Could someone please help me with a script or macro to extract the columns Dutch_Netherlands and German_Germany from the attached CSV file and write them to a new file:

German_Germany TAB Dutch_Netherlands

Very much appreciated!

Hans

de-nl.csv.zip (844 Bytes)

Obviously the file is comma-delimited. Does it ever contain quoted strings? In other words, does it ever contain commas that are NOT column delimiters?

OK, this is really crude, but I think it will work. Change the file paths in the first and last actions.

ExtractCSVColumns.kmmacros (2.8 KB)

Thank you Dan! I guess that the comma is always a separator. At least I hope so, since I’ve currently only this example at hand.

I get the correct result in the the Keyboard Maestro window, but the output file stays empty. Where is csvData set? Should I set it to the Clipboard content?

Oops sorry. Change “display results in a window” to “save results to variable”, and type “csvData” as the variable name.

Magnificent. Thank you. Is there a way to have the \t interpreted as a real TAB instead of the string \t?

Hey Hans,

Change echo to echo -e in the shell script.

-Chris

1 Like

Thanks, Chris. That’s what I get for not testing it fully. But then again, if I had, I wouldn’t have known the answer, so I guess it worked out just fine!

1 Like

if it did, how would you change the script?

No clue. I didn’t even know this was possible - I just stole it from a Google example. :open_mouth:

Hey Hans,

If the CSV source file is is consistently formatted this is what I would do using an Execute a Shell Script action:

#!/usr/bin/env bash

inputFile=~/'Downloads/de-nl.csv';
outputFile=~/'Downloads/New CSV File.csv';

awk 'BEGIN { FS = "," };{ print $12"\t"$16 }' "$inputFile" > "$outputFile";

OR

#!/usr/bin/env bash

inputFile=~/'Downloads/de-nl.csv';
outputFile=~/'Downloads/New CSV File.csv';

cut -d ',' -f 12,16 "$inputFile" | sed 's!,!	!'  > "$outputFile"

NOTE: Dan’s use of a test in his shell script to make certain the input file exists is a good idea.

Dan – restoring the IFS is not necessary when running from Keyboard Maestro, since the environment for the Execute a Shell Script action is completely transient. (Nevertheless it is good practice for normal shell scripting.)

That can get really complicated.

It seems to me Shane Stanley had a nice ASObjC method, but I can’t lay my hands on it. (Discussion either on the ASUL or MacScripter.net – I think.) I’ll ask him and see if he remembers.

-Chris

2 Likes

Hey, I just stole the example from Google somewhere. I didn’t even know that sort of thing was possible! (Living on the PC side of things has stunted my education.)

This is one of the many reasons I like helping people on this board - it’s a great way to learn things!

And yes, I suspected it didn’t need to be restored, but since it was in the code, I just left it there.

2 Likes

Thank you!

Hey Folks,

Just to make everyone a little crazier let's add AppleScriptObjC into the mix. This should handle complex quoted data in CSV format. (Although I haven't tested to see just how complex.)

The main script produces a list of lists (columns) for the CSV data file.

The part that's customized for Hans pulls columns 12 and 16 and collates them as he wanted.

(Test using the Script Editor.app and Han's data file above.)

I expect ASObjC has classier methods for joining lists than what I've done here, but it works.  :sunglasses:

-Chris


CSV to List of Lists.scptd.zip (18.1 KB)

--------------------------------------------------------------------------------
# Auth: Christopher Stone { Heavy Lifting by Shane Stanley }
# dCre: 2015/03/05 15:28 +1100
# dMod: 2016/07/25 13:34 -0500
# Appl: AppleScriptObjC
# Task: Extract CSV data from a file into a list of lists.
# Aojc: True
# Libs: None
# Osax: None
# Tags: @Applescript, @Script, @ASObjC, @Extract, @CSV, @Data
--------------------------------------------------------------------------------
use AppleScript version "2.3.1"
use framework "Foundation"
use scripting additions
--------------------------------------------------------------------------------

# Path to csv source file:
set csvFilePath to "~/Downloads/de-nl.csv"

--------------------------------------------------------------------------------

set anNSString to current application's NSString's stringWithString:csvFilePath
set csvFilePath to (current application's |NSURL|'s fileURLWithPath:(anNSString's stringByExpandingTildeInPath()))'s |path|() as text
set theString to read csvFilePath as «class utf8»
set listOutPut to its makeListsFromCSV:theString commaIs:","

--------------------------------------------------------------------------------
# Customization for Hans
--------------------------------------------------------------------------------
# Extract Columns 12 & 16 with a Tab delimiter.
--------------------------------------------------------------------------------

set _collate to {}

repeat with theItem in listOutPut
   set end of _collate to (item 12 of theItem) & tab & (item 16 of theItem)
end repeat

set AppleScript's text item delimiters to linefeed
set _collate to _collate as text

--------------------------------------------------------------------------------



--------------------------------------------------------------------------------
--» HANDLERS
--------------------------------------------------------------------------------
on makeListsFromCSV:theString commaIs:theComma
   set theRows to {}
   set newLineCharSet to current application's NSCharacterSet's newlineCharacterSet
   set importantCharSet to current application's NSMutableCharacterSet's characterSetWithCharactersInString:("\"" & theComma)
   importantCharSet's formUnionWithCharacterSet:newLineCharSet
   set theNSScanner to current application's NSScanner's scannerWithString:theString
   theNSScanner's setCharactersToBeSkipped:(missing value)
   repeat while (theNSScanner's isAtEnd() as integer = 0)
      set insideQuotes to false
      set finishedRow to false
      set theColumns to {}
      set currentColumn to ""
      repeat while not finishedRow
         set {theResult, tempString} to theNSScanner's scanUpToCharactersFromSet:importantCharSet intoString:(reference)
         if theResult as integer = 1 then set currentColumn to currentColumn & (tempString as text)
         if theNSScanner's isAtEnd() as integer = 1 then
            if currentColumn is not "" then set end of theColumns to currentColumn
            set finishedRow to true
         else
            set {theResult, tempString} to theNSScanner's scanCharactersFromSet:newLineCharSet intoString:(reference)
            if theResult as integer = 1 then
               if insideQuotes then
                  set currentColumn to currentColumn & (tempString as text)
               else
                  if currentColumn is not "" then set end of theColumns to currentColumn
                  set finishedRow to true
               end if
            else
               set theResult to theNSScanner's scanString:"\"" intoString:(missing value)
               if theResult as integer = 1 then
                  if insideQuotes then
                     set theResult to theNSScanner's scanString:"\"" intoString:(missing value)
                     if theResult as integer = 1 then
                        set currentColumn to currentColumn & "\""
                     else
                        set insideQuotes to not insideQuotes
                     end if
                  else
                     set insideQuotes to not insideQuotes
                  end if
               else
                  set theResult to theNSScanner's scanString:theComma intoString:(missing value)
                  if theResult as integer = 1 then
                     if insideQuotes then
                        set currentColumn to currentColumn & theComma
                     else
                        set end of theColumns to currentColumn
                        set currentColumn to ""
                        theNSScanner's scanCharactersFromSet:(current application's NSCharacterSet's whitespaceCharacterSet()) intoString:(missing value)
                     end if
                  end if
               end if
            end if
         end if
      end repeat
      
      if (count of theColumns) > 0 then set end of theRows to theColumns
      
   end repeat
   
   return theRows
   
end makeListsFromCSV:commaIs:
--------------------------------------------------------------------------------

To add some more fun:
perl -MText::CSV_XS -ne 'BEGIN{$csv=Text::CSV_XS->new()} if($csv->parse($_)){@f=$csv->fields();for $n (11,15) {print "$f[$n]\t"}; print "\n"}' ~/Downloads/de-nl.csv
(customized for Hans’ CSV)

Needs Perl with Text::CSV_XS. This is a slightly modified homage to this.

Jeez, that’s gnarly…  :sunglasses:

Unfortunately it’s only returning 4 of 5 rows (skipping the first row after the header-line), and I have no idea why. (I did have to change the line-endings of the file to Unix, but that didn’t solve the problem.)

If you have Macports you can install the Text::CSV_XS module like this from the Terminal:

sudo port install -v p5.22-Text-CSV_XS

Otherwise you’ll probably need to use CPAN.

-Chris

I’ve installed it with cpanminus, which in turn can easily be installed via Homebrew.

Unfortunately it’s only returning 4 of 5 rows (skipping the first row after the header-line), and I have no idea why. (I did have to change the line-endings of the file to Unix […]

It seems you have a different file?! Hans’s file from the download link above does have exactly 4 rows (rows 0 to 3, plus one header row).

And the script —at least for me— returns exactly 4 rows:

2-modus	2-Betrieb	
4-modus	4-Betrieb	
afdekkap	Abdeck, ,kappe	
afdekking	Abdeckung	

In my file the first row (after the header row) is the one with “2-modus” and “2-Betrieb”.

(The commas have been added by me to the source CSV, for testing ("Abdeck, ,kappe").)

And when I add a fifth row, the script returns exactly 5 rows. The same for 6 rows and so on. It also works with Hans’ original CRLF line endings.

Can it be that your Macports install gave you an outdated module, once again:wink:

Nope.

Perl module ports are supposed to use CPAN indirectly and be up-to-date, and I’ve verified the installed module is version 1.24.

I re-downloaded Han’s data file and tried ran the script again.

With column 0 added for reference I get:

0	2-modus	2-Betrieb	
1	4-modus	4-Betrieb	
2	afdekkap	Abdeckkappe	
3	afdekking	Abdeckung	

Very curious… As far as I know I didn’t change the original data file I was working with.

On further inspection it seems the data file is Windows (CRLF) with UTF8 and a BOM.

Now then – if I change the file to plain UTF8 and run the script I get this:

Entry_ID	Dutch_Netherlands	German_Germany	
0	2-modus	2-Betrieb	
1	4-modus	4-Betrieb	
2	afdekkap	Abdeckkappe	
3	afdekking	Abdeckung

Reading the docs I find the module is BOM-aware, so I reckon this is expected.

So… Things are working, but the mystery of why they weren’t yesterday is unsolved.

<shrug>

-Chris

Chris,

I have also something I don’t understand —with your ASOC script above.

I get constant editor crashes (Script Editor or Script Debugger, doesn’t matter).

The culprit is seems to be the this line:

set newLineCharSet to current application's NSCharacterSet's newlineCharacterSet()

However, this works fine, and gives the expected result:

set newLineCharSet to current application's NSCharacterSet's newlineCharacterSet

And this works also (at least for my file which only contains \n):

set newLineCharSet to current application's NSMutableCharacterSet's characterSetWithCharactersInString:"\n"

Hey Tom,

Hm… I’ve had no problems with SD5 running the code, but I was able to produce crashes in SD6 and the Script Editor.

Turning off debugging in SD5 produced some crashes.

Runs fine from an Execute an AppleScript action in Keyboard Maestro.

Curious.

I’ve reported the issue to Shane Stanley who wrote the makeListsFromCSV handler and to Mark Alldritt (SD-Dev).

Thanks for pointing this out.

-Chris