Split rows of text into batches

You can use the unix tool split to split a file into parts as you describe, something like this:

cd /Wherever/the/file/is
split -l 100 example example-

That will produce files like example-aa, example-ab, etc.

Alternatively, you can do something like this in Keyboard Maestro:

  • Set Variable File Number to text 001
  • Set Variable Line Number to 0
  • For Each variable Line in Lines In file “~/Desktop/example”
    • Append text “%Variable%Line%\n” to file “~/Desktop/example-%Variable%File Number%”
    • Set Variable Line Number to calculation Line Number + 1
    • If calculation Line Number >= 100
      • Set Variable File Number to calculation File Number + 1 format 000
      • Set Variable Line Number to 0

Something like that should do the trick.
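
For comparison, the same counter idea can be sketched as a shell script. This is only an illustration and not the macro itself: it swaps in awk, and the temporary folder, the generated 250-line sample file and the example-001 style names are assumptions picked to mirror the macro's "format 000" file numbers.

```shell
# Sketch only: split a file into numbered 100-line pieces with awk.
# workdir and the 250-line sample file are stand-ins for the real data.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
seq 1 250 > example

awk '
  (NR - 1) % 100 == 0 {          # lines 1, 101, 201, ... start a new file
    if (out != "") close(out)    # close the previous piece before moving on
    out = sprintf("example-%03d", ++n)
  }
  { print > out }                # write the current line to the current piece
' example

ls example-*                     # lists the numbered pieces
```

Unlike split, this gives numeric suffixes, which may be easier to loop over in order.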

ComplexPoint, unfortunately I do not know javascript.

JMichaelTX, are you saying that I cannot have a string array? My line by line data will actually have letters and numbers in it. e.g. GB234534500

peternlewis, yes I could split into separate files. Then I would need to cycle through each file in the folder and load in the lines to a variable, so I can then paste them into another app. Not sure how to call or run a unix tool split as I didn’t even know there was such a tool! Or just try your second approach. Thank you.

My programming is rusty and this is all proving a great challenge to me!

Hey Jon,

Select your file in the Finder and then run this script from the Script Editor.app.

You'll find in the result pane that the lines have been separated into a 1 dimensional array (an AppleScript list object). From there you have many options.

------------------------------------------------------------
tell application "Finder" to set finderSelectionList to selection as alias list
if length of finderSelectionList = 0 then error "No files were selected in the Finder!"
set theItem to item 1 of finderSelectionList

set fileText to read theItem as «class utf8»
set fileText to paragraphs of fileText
set fileText to reverse of fileText
repeat while item 1 of fileText = ""
  set fileText to rest of fileText
end repeat
set fileText to reverse of fileText

set splitTextList to {}
repeat with i from 1 to (length of fileText) by 100
  -- Each chunk is 100 items (i thru i + 99); the last chunk may be shorter.
  set lastIndex to i + 99
  if lastIndex > (length of fileText) then set lastIndex to length of fileText
  set end of splitTextList to items i thru lastIndex of fileText
end repeat

splitTextList
------------------------------------------------------------

The question is what are you really trying to do in what application(s).

It may be possible to streamline things if the apps you're using are scriptable.

-Chris

Oh, yeah.

Unix code like this:

cd /Wherever/the/file/is
split -l 100 example example-

Is run from an Execute a Shell Script action.

-Chris

Thank you for that code. I will see if I can work out how to run it.

By scriptable, do you mean if the app has its own macro language?

Hey Jon,

You end up with an AppleScript list object in the form of:

{100 lines, 100 lines, 100 lines, …, Any lines left over}

This is an object not text, so it will require further processing to get each block of lines out of the array.

You should tell me a bit more about what you're doing, so I can help.

No. I mean whether or not the app is AppleScriptable.

Even if they're not AppleScriptable it may be possible to use AppleScript and System Events to work with them.

Exactly what you're doing affects how best to extract and emplace each chunk of text.

I can think of several methods, but they depend upon what you're doing.

Probably the most efficient method would require you to install an AppleScript Extension (OSAX).

-Chris

Well, my script is to do this:

  1. Import csv into an app. Spit out filtered data as MyData.txt. - now working!

  2. Loop through MyData.txt, 100 rows at a time until last row encountered.

  3. With each of those 100 rows, paste it into a field on a web app (the extension I purchased). Click Analyse. Wait until all the data has been retrieved. Export data to file, appending each time. e.g. MyData.txt has 1000 rows. So exported file will have 1000 rows. I have to break it up into chunks of 100 else the extension slows to a crawl.

My sticking points involve the splitting up of MyData.txt, cycling through pasting into the field on the extension and then knowing when the extension has finished retrieving all the data, by scraping for a signal. Hope that made sense!

Hey Jon,

Now we’re talkin’…

Understanding your workflow makes it much easier to figure out how to help.

So. Do I understand correctly that you are iterating through your whole analysis process start to finish with each 100 rows?

You only have 1 field to paste this into? (This might make it possible to place the text with JavaScript rather than pasting it.)

Either way (JavaScript or pasting) the way I’d handle the text is with the Satimage.osax AppleScript Extension’s regular expressions.

I’d change my split script to use the Satimage.osax and make it faster.

I’d write the output to a file.

Then with each iteration I’d read 100 lines off the top and then delete them.

If there are always 1000 lines then it’s very simple.
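
A rough sketch of the "read 100 lines off the top, then delete them" idea, using plain shell tools (head, tail, mv) rather than the Satimage.osax. The temporary folder, the generated 250-line MyData.txt and the names chunk.txt and rest.txt are assumptions for illustration; the paste-and-wait step is only a placeholder comment.

```shell
# Sketch only: consume a file 100 lines at a time until it is empty.
workdir=$(mktemp -d)
queue="$workdir/MyData.txt"
seq 1 250 > "$queue"                             # stand-in for the real data

batch=0
while [ -s "$queue" ]; do                        # loop while the file is non-empty
  batch=$((batch + 1))
  head -n 100 "$queue" > "$workdir/chunk.txt"    # the next (up to) 100 lines
  tail -n +101 "$queue" > "$workdir/rest.txt"    # everything from line 101 on
  mv "$workdir/rest.txt" "$queue"                # drop the lines just taken
  # ...hand chunk.txt to the web app here and wait for it to finish...
  echo "batch $batch: $(wc -l < "$workdir/chunk.txt") lines"
done
```

This does not depend on the file having exactly 1000 lines; the last batch is simply whatever is left.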

The Satimage.osax installer installs only one file here:

/Library/ScriptingAdditions/Satimage.osax

http://www.satimage.fr/software/en/downloads/downloads_companion_osaxen.html

I’ve used it since 2003.

That easily takes care of the 100 lines bit.

So, the remaining challenge is how to determine if the processing is complete.

-Chris

To explain a bit more. MyData.txt may have say 1000 rows, or 15000, or whatever.

I could paste the 15000 rows in one go into my extension, but everything grinds to a halt and it becomes unusable. So that is why I am breaking it into chunks of 100.

It goes like this. Load up the extension page. Click a button on the bottom of the screen that pops up a box where I can paste in data. Paste in 100 lines. Click Analyse. Wait until the process finishes. Click another button that will export that data to a csv file. Repeat the above until all 15000 rows are done.

The speed of the script is not so important. Just simplicity in programming it! I will run the program and the data gathering from the Chrome extension will mean the whole process will take like 2 hours for just 3000 lines of data.

That is correct.

From the KM Wiki Keyboard Maestro Variables

Variable Arrays

Variables can contain an array of comma separated numbers, like the image size (123,456) or window frame (100,120,600,550). In a calculation field, you can refer to these using a normal (1-based) index notation, like Variable[2]. So you can use ClipboardImageSize[1] and ClipboardImageSize[2].

It is important to note that variable arrays can contain only numeric values. When you use the %Calculate% function to reference a variable array element, it will convert the element to a number. So, in effect, you can only use variable arrays to store/reference numbers.

To be more clear - Keyboard Maestro has no real support for arrays, but variables can contain a comma-separated list of numbers which basically exists for things like window frames, screen locations, mouse positions, etc. In calculations, you can access them as Variable[2] or Variable.x (for window rectangles or mouse locations or the like, which really just means Variable[2] anyway).

That said, you can have an array of strings by separating them with any character not used in the string (for example a linefeed or a comma or a bullet (•) or even a sequence of characters like “,KMSEP,”). You can use the For Each action to iterate over the strings, and you can use a regular expression to get the Nth string. See: How to access a text array variable.

Thanks for clearing that up. Now let me see, does KM have arrays, or not? :wink:
Sorry, for poking a bit of fun at the contradiction.

IMO, the best way to keep things clear is to definitely state that KM supports arrays only for a list of comma-delimited numbers.

If the user wants to treat a string as if it were a list, then each text segment (substring) in that string must be separated by a unique character not used in the substrings.

IMO, the easiest and clearest way to do that is by putting each substring on a separate line, and then using the For Each Action on "the lines in" either a KM variable or the clipboard.

For example:

If your string already has a separator other than new line (\n), then I would just do a simple replacement before using the For Each action:

So if the string is using "|" as the separator, replace it with a new line "\n"

You could use RegEx like in the link Peter provided, but, IMO, this is simpler and easier to write and understand.
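
Outside of KM, the same replace-then-iterate idea looks roughly like this in a shell script. It is a sketch under assumptions: the "|"-separated sample string is made up (only the first ID appears earlier in the thread), and tr, a while-read loop and sed stand in for the KM search-and-replace and For Each actions.

```shell
# Sketch only: treat a delimited string as a list by putting one item per line.
list='GB234534500|GB234534501|GB234534502'       # hypothetical sample data

lines=$(printf '%s\n' "$list" | tr '|' '\n')     # swap the separator for newlines

count=0
while IFS= read -r item; do                      # "for each line"
  count=$((count + 1))
  echo "item $count: $item"
done <<EOF
$lines
EOF

second=$(printf '%s\n' "$lines" | sed -n '2p')   # pick out the 2nd item
echo "second item: $second"
```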

But then there are many ways to skin a cat in KM. Each is free to choose what works best for him/herself. :wink:

I’ve decided to use:

cd /Wherever/the/file/is
split -l 100 example example-

It’s the simplest for me to understand.

However, in trying to do so, I have struggled to understand how paths work on OS X. In Windows, it’s C:\whatever. But how do I write the path on OS X? Is it ~/whatever?

The tilde is expanded to the user’s home path.

Probably worth opening up Terminal.app and experimenting a bit with the basics

ComplexPoint, your suggestion was a good one. I ended up watching a YouTube video on Terminal and now understand better the file structure of the Mac and also how to use Terminal.

I’ve been playing with Terminal and it’s pretty nifty! I did the split thing manually and it worked fine. But it splits the files into the same directory as the file being split. Is it easy to have it split the files into a subdirectory? e.g. Tmp

If it can do that, then I can cycle through the list of files without the source file getting in the way. :smile:

Is it easy to have it split the files into a subdirectory ?

This macro will show you a pdf version of the manual entry for split

(or any other Terminal command that you want to look at)

Generally, Bash commands send their output to the active directory by default, but also let you give a path to a different folder.
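
For split specifically, the output prefix can include a folder, so a sketch along these lines should put the pieces in a subdirectory. The temporary working folder and the generated 250-line sample file are assumptions for illustration; Tmp is the folder name from the question.

```shell
# Sketch only: split into a Tmp subfolder, then loop over the pieces.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
seq 1 250 > example                  # stand-in for the real file

mkdir -p Tmp                         # split will not create the folder itself
split -l 100 example Tmp/example-    # pieces land in Tmp/example-aa, -ab, ...

for f in Tmp/example-*; do           # the source file is not in this list
  echo "$f: $(wc -l < "$f") lines"
done
```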

StackOverflow is also a good resource:

ComplexPoint, some great links and advice there, thank you.

I notice in the Stackexchange example they gave a path /lots/of/little/files/here

Forgive my ignorance, but should it not be ~/lots/of/little/files/here

Or is it a case that / will force it to the root while ~/ forces it to the home?

Hey Jon,

It's not a matter of forcing; it's a matter of addressing.

The guy on Stackoverflow was just indicating some path goes here.

Note that any path with spaces in it must be quoted or escaped.

'/Users/me/test_directory/Some Folder with Spaces/'

OR

/Users/me/test_directory/'Some Folder with Spaces'/

NEVER:

'~/test_directory/Some Folder with Spaces/'

Putting a quote in front of the tilde prevents it from expanding properly

This works:

~/'test_directory/Some Folder with Spaces/'
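
A quick way to see the difference for yourself is to assign both forms to variables and print them. The variable names are just for illustration, and the folder does not need to exist for this to work.

```shell
# Sketch only: compare a fully quoted tilde with a tilde left outside the quotes.
quoted='~/test_directory/Some Folder with Spaces/'      # tilde stays a literal ~
unquoted=~/'test_directory/Some Folder with Spaces/'    # tilde expands to the home folder

echo "quoted:   $quoted"
echo "unquoted: $unquoted"
```

The first line printed still starts with a literal ~, while the second starts with your home folder path.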

-Chris

Thanks Chris. I have now got a script that rotates through the split files in a folder and shows the filename and list of contents on a popup so I can check it’s working ok. Now I am working on getting it to paste into the form correctly. I’ve used the Pause Until a specific button appears, then I tab 3 times to get to the field, but it’s proving inconsistent, hence a new post about how to click in a field. Slowly getting there!

And yes, “addressing” was the correct term. I couldn’t think of the right expression there.