Help Needed With a Macro That Splits a Text File

Not sure if I should be posting in a new topic or not, but my problem is similar to the OP in this post: Parsing a Single Text File into Multiple Files - #12 by Aachoo2

I was hoping to modify this solution to fit my own needs. Although I am a KBM newbie I was able to follow many of the steps in this macro. I get lost in the regex, though.

Like the OP, I have a text file that needs to be split into multiple files.
My splitting delimiter is @@@, but I could change it to anything.
I'd love it if the macro could do the following:

  1. Split the big file using the delimiter (might be nice to be able to customize this so more people could use the macro)
  2. If the first line of the split files is blank/whitespace, delete it
  3. Name the individual files using the first non-blank line
  4. Delete that line in the new file (so that each new file starts with the second non-blank line). In my example file below, these are the lines with tags (e.g. #cat #pet)
  5. Save the processed individual files (I'm not picky about this part. They could be saved in the same folder or a new folder like in the solution provided to the OP)

Here's what a representative file to split might look like:

cat litter
#cat #pet
This is a fantastic litter that doesn't clump or smell
It's available from fine pet products retailers everywhere
@@@
dog litter
#dog #pet
This product doesn't exist. 
@@@
bird litter
#bird #pet
This is basically a bunch of old newspapers used to line a birdcage

Thanks in advance for any help.

best,

Randy

P.S. I originally tried to do this in the new Mac Shortcuts app. Silly me.

The splitting of text and stripping of whitespace made me think of this problem as most easily done in a scripting language like Perl, Python, JavaScript, etc., so that's what I did. Others here may come up with a solution with only Keyboard Maestro native actions.

Split Files Into Notes.kmmacros (3.6 KB)

The macro loops through all the notes files you've selected in the Finder and, in its Execute Shell Script action, performs the steps you outlined. It:

  1. Splits the file contents on "@@@"
  2. For each section
    a. Strips any leading and trailing whitespace.
    b. Splits the section into lines.
    c. Sets the first line to the name of a new file.
    d. Joins the rest of the lines together, stripping any leading and trailing whitespace, to make the contents of the new file.
    e. Writes the new file to the same folder as the file it came from.

If you have sections that are named the same, the later section will overwrite the earlier one. I was too lazy to put in code to deal with that.
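For readers who don't want to open the macro, here is a minimal Python sketch of the steps listed above. It is not the script inside the macro: the function name `split_notes`, the `.txt` extension, and the use of `splitlines()` are all assumptions made for illustration. Like the macro as described, it lets a later section overwrite an earlier one with the same name.

```python
from pathlib import Path


def split_notes(text, out_dir, delimiter="@@@"):
    """Split text on delimiter and write one file per section.

    Hypothetical helper for illustration; returns the paths written.
    """
    out_dir = Path(out_dir)
    written = []
    for section in text.split(delimiter):
        # Strip leading/trailing whitespace, then break into lines.
        lines = section.strip().splitlines()
        if not lines:
            continue  # nothing but whitespace between delimiters
        # The first line names the new file (".txt" is an assumption).
        path = out_dir / f"{lines[0].strip()}.txt"
        # The remaining lines, trimmed, become the file contents.
        body = "\n".join(lines[1:]).strip() + "\n"
        path.write_text(body, encoding="utf-8")
        written.append(path)
    return written
```

A first line containing a slash would be treated as a path, so real data may need the name sanitized before writing.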


Thank you so much, good doctor!

Your solution worked perfectly -- and has inspired me to look further into Python so I can understand that part of the macro.

I'm very grateful.

--Randy


FWIW, a JS variant:

Split Files Into Notes (JS variant).kmmacros (7.7 KB)

JS source:
(() => {
    "use strict";

    const main = () => {
        const
            kme = Application("Keyboard Maestro Engine"),
            kmValue = k => kme.getvariable(k),
            fpFolder = kmValue("CurrentFolder");

        return kmValue("Filecontents").split("@@@")
            .flatMap(note => {
                const xs = lines(note.trim());

                return 0 < xs.length ? (
                    either(
                        msg => [msg]
                    )(
                        fp => [`-> ${tildeForm(fp)}`]
                    )(
                        writeFileLR(
                            combine(fpFolder)(
                                `${xs[0].trim()}.txt`
                            )
                        )(
                            `${unlines(xs.slice(1)).trim()}\n`
                        )
                    )
                ) : [];
            })
            .join("\n");
    };

    // --------------------- GENERIC ---------------------
    // https://github.com/RobTrew/prelude-jxa

    // Left :: a -> Either a b
    const Left = x => ({
        type: "Either",
        Left: x
    });


    // Right :: b -> Either a b
    const Right = x => ({
        type: "Either",
        Right: x
    });


    // combine (</>) :: FilePath -> FilePath -> FilePath
    const combine = fp =>
        // Two paths combined with a path separator.
        // Just the second path if that starts with
        // a path separator.
        fp1 => Boolean(fp) && Boolean(fp1) ? (
            "/" === fp1.slice(0, 1) ? (
                fp1
            ) : "/" === fp.slice(-1) ? (
                fp + fp1
            ) : `${fp}/${fp1}`
        ) : fp + fp1;


    // either :: (a -> c) -> (b -> c) -> Either a b -> c
    const either = fl =>
        // Application of the function fl to the
        // contents of any Left value in e, or
        // the application of fr to its Right value.
        fr => e => "Left" in e ? (
            fl(e.Left)
        ) : fr(e.Right);


    // lines :: String -> [String]
    const lines = s =>
        // A list of strings derived from a single
        // string delimited by newline and or CR.
        0 < s.length ? (
            s.split(/[\r\n]+/u)
        ) : [];


    // tildeForm :: FilePath -> FilePath
    const tildeForm = fp =>
        // A filepath abbreviated with a tilde
        // representing the $HOME path.
        ObjC.unwrap(
            $(fp).stringByAbbreviatingWithTildeInPath
        );


    // unlines :: [String] -> String
    const unlines = xs =>
        // A single string formed by the intercalation
        // of a list of strings with the newline character.
        xs.join("\n");


    // writeFileLR :: FilePath ->
    // String -> Either String IO FilePath
    const writeFileLR = fp =>
        s => {
            const
                e = $(),
                efp = $(fp)
                .stringByStandardizingPath;

            return $.NSString.alloc.initWithUTF8String(s)
                .writeToFileAtomicallyEncodingError(
                    efp, false,
                    $.NSUTF8StringEncoding, e
                ) ? (
                    Right(ObjC.unwrap(efp))
                ) : Left(ObjC.unwrap(
                    e.localizedDescription
                ));
        };

    return main();
})();

(A reminder of how much more Python has built in, amongst other things.)


Thank you, Complex! The Python script was actually throwing some errors on a few of my text files, but this JS version seems very solid.

I greatly appreciate all the help!

I took the challenge. Here is a functional macro with native KM actions that handles the given text in the OP.

Main actions involved:

  • Splits the source string on a custom delimiter.
  • Loops through the split text with a While action.
  • Trims the whitespace from each piece (the Filter action).
  • Uses RegEx to capture the first line and the rest of the lines.
  • Writes each piece to a file.
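The "first line and the rest" step can be sketched with a regular expression like the one below. This is only an illustration in Python: the pattern and the name `first_line_and_rest` are assumptions, not the expression used in the macro, which isn't shown in this post.

```python
import re

# Hypothetical pattern: group 1 is the first line, group 2 is everything
# after the first line break (\r\n, \r or \n). DOTALL lets "." span lines.
FIRST_AND_REST = re.compile(r"\A([^\r\n]*)(?:\r\n|\r|\n)?(.*)\Z", re.DOTALL)


def first_line_and_rest(section):
    """Return (first_line, remaining_text) for one trimmed section."""
    match = FIRST_AND_REST.match(section.strip())
    return match.group(1), match.group(2)


name, body = first_line_and_rest("\ndog litter\n#dog #pet\nThis product doesn't exist. \n")
print(name)  # dog litter
```

Every part of the pattern is optional, so it matches any input, including an empty section, which yields two empty strings.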

RegEx - Splits a Text File.kmmacros (5.4 KB)


PS: The macro creates a temp folder in my Downloads folder. You can change it to any other folder.

I’d be interested in seeing the errors and—if there are no privacy concerns—the files that led to them. Even if you can’t use it, I like learning how to make my scripts more bulletproof.


Hey Folks,

Me too, although I used a different method. 😎

(I do like how @martin did the job though.)

I added actions the user can enable, so the macro will operate on the selected file in the Finder.

To use this:

  • Disable or delete the red DUMMY DATA SOURCE action.
  • Enable the green For-Each action.

-Chris


Split a Text File In Segments According to a Delimiter String v1.00.kmmacros (14 KB)



Thanks very much for all the responses.

I'm using the native KM actions to help myself learn KM better.

This is a fantastic community.

I'm not exactly sure how to get the full text of the error.

image

The macro worked perfectly on my initial test file, but a slightly longer one (exported from FileMaker) caused the hiccup. It created 4 text files before jumping the rails. I wonder if it has something to do with carriage returns in the original file. When I open the file in BBEdit, I see carriage returns as upside-down question marks.

Data file:

fauntest.txt.zip (2.1 KB)

Thanks!

"something to do with carriage returns in the original file"

Perhaps it just needs the more general str.splitlines() in place of the narrower str.split('\n')?
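A quick illustration of the difference, using a made-up string with bare carriage returns (the sample text is an assumption, not the contents of the attached file):

```python
# Made-up sample using old-Mac line endings: bare carriage returns (\r).
cr_text = "cat litter\r#cat #pet\rThis is a fantastic litter"

# split("\n") finds no linefeeds, so the whole text comes back as one item.
narrow = cr_text.split("\n")

# splitlines() treats \r, \n and \r\n (among others) as line boundaries.
general = cr_text.splitlines()

print(len(narrow))  # 1
print(general)      # ['cat litter', '#cat #pet', 'This is a fantastic litter']
```

With only one "line" per section, a script that takes the first line as the file name would end up using the whole section as the name, which could plausibly produce the kind of failure described.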


Thanks for the response. @ComplexPoint is right. The file is encoded in the old Mac way with carriage returns (0x0D) instead of linefeeds (0x0A). I thought those days were over.

Also, the file is encoded as MacRoman instead of UTF-8, which may not be a problem on your computer, but it's throwing an error on mine. This error is coming from the Read File action in the step before the Python code. This seems like it's a KM 10 issue. Are you running KM 9? If so, I'll ask @peternlewis if there's a workaround for the encoding problem.
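One possible workaround, assuming the script reads the file itself instead of relying on the Read File action, is to try UTF-8 first and fall back to MacRoman. This is a sketch only: `read_text_lenient` is a hypothetical helper name, and nothing here reflects how Keyboard Maestro handles encodings internally.

```python
from pathlib import Path


def read_text_lenient(path, encodings=("utf-8", "mac_roman")):
    """Decode a file with the first encoding that works.

    Hypothetical helper. Order matters: strict UTF-8 rejects most
    MacRoman text containing accented characters, while MacRoman
    accepts almost any bytes, so it should be tried last.
    """
    raw = Path(path).read_bytes()
    for encoding in encodings:
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    raise ValueError(f"could not decode {path} with any of {encodings}")
```

The result can then be split with `str.splitlines()`, which copes with carriage-return line endings. Re-saving the file as UTF-8 in an editor such as BBEdit is the simpler fix when there are only a few files.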


Yes, I am running KM 10.0. I can change the encoding to UTF-8. I'll play with it some more.
Much appreciated.


That worked, Complex! No error. Thank you.

Here's the updated macro:

Split Files Into Notes PYTHON.kmmacros (3.9 KB)


This is a perfect example of the kind of unexpected conditions programmers face every day that cannot be solved without having their hands on real-world data examples.

-Chris