How to remove lines between pairs of text markers?

I have a very long movie script text file that includes annotations for each scene. The annotations are multi-line and are enclosed by ==== at the beginning and ++++ at the end.

I would appreciate it if you could tell me how to delete all the lines between ==== and ++++.

For example, it looks like this:

  1. A certain place
    Content of the scene.
    ====
    Annotation 1
    Annotation 2
    ++++

  2. Another place
    Another content of the scene.
    ====
    Annotation 3
    Annotation 4
    ++++

I would like these to be changed as follows:

  1. A certain place
    Content of the scene.

  2. Another place
    Another content of the scene.

Thank you.

I think this will work. Just put your text into the specified variable, below.

I must give you one small word of warning. If your final annotation also happens to be the last line of the file, without a newline after it, then this macro could fail to remove the last annotation. But I doubt that any normal script would have no newline after the last annotation. So I don't think you will ever see that happen.

What does (?s) mean?

It means that a dot will match a newline. For details:

But you are not using . anywhere in the regex.

Note that that regex will fail if + is used anywhere in an annotation.

What you want is probably:

====(?s:.*?)\+\+\+\+\R?

That will match:

  • Four =
  • (?s:…) turns on the s flag inside the group, so . matches any character, including line endings
  • .*? matches any number of those characters, but as few as it can get away with
  • Then four +
  • Then an optional line ending

If ==== is allowed in the text anywhere not at the start of the line, then you'd need to add restrictions for that as well.
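
For anyone who wants to try the idea outside Keyboard Maestro, here is a minimal JavaScript sketch. It is an adaptation, not the macro from this thread: JavaScript has no \R, and the inline (?s:…) group is not available in every engine, so [\s\S] and an explicit line-ending alternation stand in for them. The name stripAnnotations is made up for illustration.

```javascript
// Hypothetical helper; the pattern mirrors ====(?s:.*?)\+\+\+\+\R? from the post above.
// [\s\S]*? is a lazy "any character, including line endings".
// (?:\r\n|\r|\n)? is a rough stand-in for the optional \R.
const stripAnnotations = text =>
    text.replace(/====[\s\S]*?\+\+\+\+(?:\r\n|\r|\n)?/g, "");

const scriptSample = [
    "1. A",
    "    Content.",
    "    ====",
    "    Note 1",
    "    Note + 2",   // a + inside the annotation is fine
    "    ++++",
    "",
    "2. B",
    "    More.",
    "    ====",
    "    Note 3",
    "    ++++"        // last line, no trailing newline
].join("\n");

const stripped = stripAnnotations(scriptSample);
console.log(JSON.stringify(stripped));
```

The indentation in front of each ==== is left behind, because the pattern starts at the fence itself. Both the + inside an annotation and the missing final newline are handled.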

But [^x] means "any character but x" and that's functionally equivalent to a dot with one exception. So I assumed that the (?s) would be required.

An alternative instrument is:

  • a Keyboard Maestro For Each action applied to each line, one by one, with
  • the value of a local_InFence variable moving 0 ⇄ 1, that is: false ⇄ true

When (and only when) local_InFence is false, we append a line to the accumulating output.

  1. When we see ====, local_InFence becomes true, and
  2. when we see ++++, local_InFence becomes false
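
The same flag logic can be sketched as a plain loop, which is perhaps the closest textual match to a For Each action. This is only an illustration under the assumption that fence lines themselves should be dropped too; filterFenced is a hypothetical name, not something in the macro.

```javascript
// Hypothetical helper that mirrors the For Each description above.
const filterFenced = source => {
    const kept = [];
    let inFence = false;

    for (const line of source.split("\n")) {
        // An opening fence switches the flag on before the line is considered.
        if (line.includes("====")) {
            inFence = true;
        }
        // Only lines seen while the flag is off are appended to the output.
        if (!inFence) {
            kept.push(line);
        }
        // A closing fence switches the flag off after the line is considered,
        // so the ++++ line itself is dropped as well.
        if (line.includes("++++")) {
            inFence = false;
        }
    }
    return kept.join("\n");
};

const fencedSample =
    "1. A\n    Content.\n    ====\n    Note 1\n    ++++\n\n" +
    "2. B\n    More.\n    ====\n    Note 3\n    ++++";

const loopResult = filterFenced(fencedSample);
console.log(loopResult);
```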

For example:

Lines between fences filtered out.kmmacros (8.3 KB)


Or as a single script action:

Lines between fences filtered out (by JS .reduce).kmmacros (4.0 KB)

JS source:
return kmvar.local_Source.split("\n")
.reduce(
    // Updated accumulator: [still inside a fence?, lines kept so far]
    ([inFence, outputLines], lineText) => {
        const
            // "====" opens an annotation, "++++" closes it.
            [fenceOpening, fenceClosing] = [
                "====", "++++"
            ]
            .map(x => lineText.includes(x)),

            // Fence lines and everything between them are dropped.
            dropped = (inFence || fenceOpening);

        return [
            dropped && !fenceClosing,
            dropped
                ? outputLines
                : outputLines.concat(lineText)
        ];
    },

    // Initial state of accumulator
    [false, []]
)[1]
.join("\n");

Thank you very much! It really helped me.

Thank you! At the moment, Airy's solution works. If anything goes wrong, I'll try this.
Thanks again!

You are welcome. Peter made a decent point about a minor error in my method, but considering that my method is one short action, it's pretty easy to understand.

Some days I prefer solutions that use only KM actions, but other days I'm happy to solve a problem with Execute Shell Script if the solution is quite simple there.

Nope. [^x] means every character except x. The s flag makes no difference.

Without the flag, . means any single character except any line terminating characters (\u000a, \u000b, \u000c, \u000d, \u0085, \u2028, \u2029).

With the flag, . means any single character, including the line terminating characters (\u000a, \u000b, \u000c, \u000d, \u0085, \u2028, \u2029) and also including the pair \u000d \u000a (so it could actually match two characters, which is not something I knew!).

This means you can frequently avoid the (?s) flag by using [^x] in place of . where you know x will never be present.
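
A quick way to see the difference is a four-way check like the one below. It is a sketch in JavaScript, whose regex engine is not the one Keyboard Maestro's search actions use and whose list of line terminators is shorter than the one quoted above, but the s flag and negated classes behave the same way for a plain line feed.

```javascript
const twoLines = "a\nb";

// Without the s flag, a dot stops at the line feed.
const dotPlain = /a.b/.test(twoLines);          // false
// With the s flag, the dot matches the line feed too.
const dotAll = /a.b/s.test(twoLines);           // true
// A negated class matches the line feed with or without the flag.
const negatedPlain = /a[^x]b/.test(twoLines);   // true
const negatedDotAll = /a[^x]b/s.test(twoLines); // true

console.log({ dotPlain, dotAll, negatedPlain, negatedDotAll });
```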

To be perfectly honest, I think I originally started my macro using a regex that contained a dot, and the truth is I just didn't bother to remove it, after switching to [^x], and wasn't sure if it was necessary to remove (?s). So I left it in. I should have conducted the tests that you did, but I was lazy.

Only a much simpler regular expression:

[+=]{4}

is needed in the natural habitat of regular expressions (splits rather than the slightly dysfunctional search and replace association to which they were yoked by grep in the 60s)

I don't think that Keyboard Maestro provides a very native (non-script) route to splitting on multi-character strings or regexes (perhaps a For Each collection could be defined in those terms ?), but reaching, for the moment, for a script action, we can:

  1. Split on [+=]{4}, and
  2. take the even-indexed fruits.

return kmvar.local_Source
.split(/[+=]{4}/u)
.filter((_, i) => i % 2 === 0)
.join("")

Lines between fences filtered by Splits.kmmacros (2.3 KB)
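
To make "even-indexed" concrete, here is a small sketch of what the split produces on a toy input. The sample text and variable names are invented for illustration, and it assumes the fences strictly alternate (==== then ++++) with no stray runs of four + or = elsewhere.

```javascript
const toySource = "scene 1\n====\nnote\n++++\nscene 2\n====\nnote\n++++\n";

// Splitting on either fence alternates between script text and annotation text.
const segments = toySource.split(/[+=]{4}/u);
// segments is:
// ["scene 1\n", "\nnote\n", "\nscene 2\n", "\nnote\n", "\n"]
//   index 0       index 1     index 2         index 3     index 4

// Even indices are outside the fences, odd indices are inside them.
const outsideFences = segments.filter((_, i) => i % 2 === 0);

const splitResult = outsideFences.join("");
console.log(JSON.stringify(splitResult));
// "scene 1\n\nscene 2\n\n"
```

The line ending that followed each ++++ survives, since it belongs to the next even-indexed segment, so a blank line is left where each annotation used to be.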

Here's my take on the task at hand:

Text without annotations and fences.kmmacros (18 KB)
