Match fields in two separate .txt files?

demirtas · February 19, 2016, 3:59pm

Can I use KM to match fields across two separate .txt files? Say I have a .txt file with 20 million records, and I want to match it against my other .txt file with 200,000 fields and return only the fields that are present (matched) in both files. Can I do this in KM, and if so, how?

ccstone · February 19, 2016, 10:05pm

Hey Ali,

Keyboard Maestro will probably choke on a 20-million-record text file.

It might be reasonable to do this with Perl, and you can probably run the Perl from Keyboard Maestro.

How large is the 20 million record file?

-Chris

peternlewis · February 20, 2016, 3:32am

Doing pretty much anything with 20,000,000 anythings in Keyboard Maestro is unlikely to work.

Keyboard Maestro maxes out at approximately 1,000 actions per second, so 20,000,000 records work out to about 20,000 seconds, or roughly 6 hours, for each action you run per record, at best.

This sort of task is easily done with Perl: read the small file line by line into a hash table, then read the big file line by line and print any lines that exist in the hash table.
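The hash-table approach described above can be sketched from the shell. This is a minimal sketch that swaps in awk for Perl (an awk array is a hash table) and assumes a "field" means a whole line. The file names small.txt and big.txt and the sample data are hypothetical stand-ins for the real files.

```shell
# Hypothetical sample data standing in for the real files.
workdir=$(mktemp -d)
printf '%s\n' apple cherry kiwi > "$workdir/small.txt"
printf '%s\n' apple banana cherry date > "$workdir/big.txt"

# NR==FNR is true only while awk reads the first file (the small one):
# store each of its lines as a key in the array "seen", then skip to the next line.
# For the second file (the big one), print each line that exists as a key in "seen".
matches=$(awk 'NR==FNR { seen[$0]; next } $0 in seen' "$workdir/small.txt" "$workdir/big.txt")

printf '%s\n' "$matches"
```

Only the small file is held in memory; the big file is streamed one line at a time. A line is printed each time it occurs in the big file, so repeats in the big file show up as repeats in the output.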

If neither file contains duplicates within itself, and if you have to do it just once, you can just open a new document in BBEdit, read in both files, sort the lines, and then use Process Duplicate Lines.

You'll get all the lines that appear twice (i.e. once in each file).

demirtas · February 20, 2016, 1:13pm

Hi Peter, thank you very much for your assistance. How can I create this macro? Which actions did you use?

Yes, I want to collate the result of all the matching fields.

demirtas · February 20, 2016, 1:13pm

It's 303 MB. Not to worry, I am running 12 GB of RAM and a 3.1 GHz quad-core Intel i5.

ccstone · February 20, 2016, 2:50pm

Hey Ali,

That's a function of BBEdit, NOT Keyboard Maestro.

I wasn't overly worried about the memory on your system.

Really large files can choke many tools regardless of the amount of system memory.

303 MB shouldn't be too much of a problem though.

Try this from the shell:

sort filePath1 filePath2 | uniq -d

Make sure to quote the file paths with single-quotes if they contain spaces.

See how long that takes to run.
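Here is a small sketch of that pipeline with quoted paths. It assumes a "field" is a whole line. The file names and sample data are hypothetical. Because uniq -d reports any line that appears more than once, a line repeated inside just one file would be reported too, so this variant runs sort -u on each file first.

```shell
# Hypothetical sample files; the paths contain spaces on purpose.
workdir=$(mktemp -d)
file1="$workdir/first list.txt"
file2="$workdir/second list.txt"
printf '%s\n' pear apple apple fig > "$file1"
printf '%s\n' fig grape pear > "$file2"

# Remove repeats within each file, so that any line seen twice
# in the merged stream must have come from both files.
sort -u "$file1" > "$workdir/a.sorted"
sort -u "$file2" > "$workdir/b.sorted"

# Merge-sort the two lists and keep only the lines that occur more than once.
common=$(sort "$workdir/a.sorted" "$workdir/b.sorted" | uniq -d)

printf '%s\n' "$common"
```

Without the sort -u step, apple would be listed as a match even though it only occurs in the first file. Prefixing the pipeline with time in bash or zsh is one way to see how long it takes.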

-Chris
