"Invert" and reorder list of numbers (for deleting pages in pdf)

Hi everyone!

Long-term objective: I would like to automatically remove user-specified pages from a large number of pdf files. To be more clear: let us say I have 10 different pdf files. For each pdf file, the user will individually input (into a table, see below) those page numbers which he would like to keep while the other pages will be deleted from the document.

I already found a JavaScript function which returns the total number of existing pages in a given pdf file. I also already found a JavaScript function which can delete a single page from a given pdf file at a time. Of course, this latter function can be used in a loop to remove multiple pages one after another.

Short-term objective: However, this requires two things (which I currently cannot manage on my own):

  1. "Invert" the user-specified list of pages to keep into a pages to delete-list. For clarification: Let us say the user wants to keep pages 2, 3, and 5 out of a 6-page document. Then the pages to delete-list should contain 1, 4, and 6.
  2. Reorder the pages to delete-list in order to start with the deletion of the highest page number first. As the deletion is an iterative process, starting with the removal of page 1 would change all page numbers for the next page-removal iterations (which would make it very complex to keep track of the indices of those pages which should be removed). For clarity: this would turn our pages to delete-list from "1, 4, 6" into "6, 4, 1".
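
To illustrate point 2, here is a minimal Python simulation (not the actual PDF code) of deleting pages one at a time from a 6-page document. The function name `delete_pages` is made up for this illustration; the list simply stands in for the document.

```python
def delete_pages(page_count, pages_to_delete):
    """Delete 1-based page numbers one at a time, in the order given."""
    pages = list(range(1, page_count + 1))  # stand-in for the PDF's pages
    for number in pages_to_delete:
        # Mimics a "remove page at index" call: positions shift after each removal.
        if number <= len(pages):
            del pages[number - 1]
    return pages

ascending = delete_pages(6, [1, 4, 6])   # lowest page first
descending = delete_pages(6, [6, 4, 1])  # highest page first
print(ascending)   # [2, 3, 4, 6] -> wrong: page 5 was removed instead of 4
print(descending)  # [2, 3, 5]    -> the pages the user wanted to keep
```

Deleting from the back means every remaining page number still refers to the same page on the next iteration.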

So in summary, the user should provide a table (Excel ...) containing the path to each pdf file in one column together with the list of page numbers he would like to keep (simply in the form of a comma-separated string containing those numbers such as "2, 3, 5") in another column. The function should then return the array "6, 4, 1" for a 6-page pdf document.
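
As a rough sketch of the function described above, assuming the keep-list arrives as a comma-separated string and the page count is already known (the name `pages_to_delete` is only illustrative):

```python
def pages_to_delete(keep_spec, page_count):
    """Return the pages NOT in keep_spec, highest page number first."""
    # Parse "2, 3, 5" into {2, 3, 5}; empty fields are skipped.
    keep = {int(tok) for tok in keep_spec.split(",") if tok.strip()}
    # Walk the document from the last page down to page 1.
    return [p for p in range(page_count, 0, -1) if p not in keep]

print(pages_to_delete("2, 3, 5", 6))  # [6, 4, 1]
```

Page numbers in the keep-list that lie outside the document are simply ignored, because only pages 1 to `page_count` are ever considered.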

Please note: I am still free on the input format as long as the user can provide the information via a (Excel ...) table. So the list of pages to keep provided by the user could also be formatted differently.
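
Since the input format is still open, one possible variation is to allow page ranges such as "2-3, 5" in the spreadsheet cell. The following is only a sketch under that assumption; the range syntax and the name `parse_pages` are not part of any existing macro.

```python
def parse_pages(spec):
    """Parse a string like "1, 4-6, 9" into a set of page numbers."""
    pages = set()
    for token in spec.split(","):
        token = token.strip()
        if not token:
            continue  # tolerate empty fields such as a trailing comma
        if "-" in token:
            start, end = (int(part) for part in token.split("-", 1))
            pages.update(range(start, end + 1))  # inclusive range
        else:
            pages.add(int(token))
    return pages

print(sorted(parse_pages("1, 4-6, 9")))  # [1, 4, 5, 6, 9]
```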

Can anyone provide help here please? It is greatly appreciated!

I probably can't help, but I don't think you need to use JavaScript, because the macOS Preview app has the ability to delete individual pages from a PDF file. I'm not sure how AppleScript-friendly Preview is, but if it is, this shouldn't be difficult. (Even if it isn't, I don't see any real problem using Preview.)

You haven't stated why you are insisting on using Excel for storing data. That could be the hardest part of your problem. You probably should give a reason for that, especially since not everyone has Excel so some of us are blocked from helping you.

Hi Airy,

Thank you for your quick reply. I am happy with any solution as long as it works but as I already have a solution using JavaScript and Apple's PDFKit (happy to share it at a later stage), I currently still plan to use that solution.

I do not insist on Excel. In fact, I actually use LibreOffice at the moment. The information just needs to be retrieved from a spreadsheet application.

However, the extraction part is not what I need help with. I simply make KM move the selected cell to the cell with the required information and copy it to a clipboard. The clipboard is then written to a variable. Not very elegant but it works for now.

What I need help with is the "inversion and reordering" part. I have prepared a demo macro so that providing help is made easier:

Remove Selected Pages from PDF File.kmmacros (6.2 KB)

If you want to use that solution, I recommend that you upload that solution so that people can examine it and help you with the remainder of the solution.

I recommend that you explain why this is a requirement. If you choose not to explain, you may not get the best solution.

Dragging up memories of Maths from <cough> 50+ years ago...

You want the complement of the set of pages to keep. The most basic way to get the complement is to delete all the members of the set from the "universal set" -- the complement is whatever's left.

In this case, the universal set is "all the pages", i.e. {1,2,3,...,n}, where n is the number of pages in the document. And we can do what you want in KM with some simple text munging:

Get Pages to Delete.kmmacros (6.8 KB)

Image

There will be faster ways to do this for large lists, but this is very understandable!

Here is a single line that can do the job (taking the inverse of "3 5 8" from the numbers "1 to 10"). Of course, it will need a little finessing to work for this user, but it's at least a proof of concept that a single line version is possible.

comm -23 <(seq 1 10) <(echo "3\n5\n8")

I'm interested in this part: how can JavaScript remove a page from a PDF document? What program are you using?

Of course... You keep hitting me round the head with comm, and I keep immediately forgetting... :man_facepalming:

But be careful when echoing strings containing newlines -- the echo built into each shell handles them differently. Your one-liner works in zsh but fails for me in macOS bash -- for that you need the -e flag:

comm -23 <( seq 1 10 ) <(echo -e "3\n5\n8")

I believe the most portable option is to use printf instead:

comm -23 <( seq 1 10 ) <(printf "3\n5\n8")
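
One caveat to keep in mind when scaling these one-liners past single-digit page numbers: comm expects both inputs to be sorted lexically (as text), whereas seq emits numbers in numeric order. The two orders agree for the "3 5 8" example, but they diverge once page numbers reach 10. The short Python snippet below only illustrates the difference between the two orderings:

```python
# Page numbers 1..12 as text lines, the way seq would emit them.
pages = [str(n) for n in range(1, 13)]

lexical = sorted(pages)           # text order, which comm assumes
numeric = sorted(pages, key=int)  # numeric order, which seq produces

print(lexical)  # ['1', '10', '11', '12', '2', '3', '4', '5', '6', '7', '8', '9']
print(numeric)  # ['1', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11', '12']
```

Piping both inputs through sort before comm, and the result through sort -n afterwards, is one way to keep the approach working for longer documents.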

I know that this is the KM forum and @Nige_S's one-liner solution is very smart (:+1:), but if you have Python and want to use it, objects of class set are very good for such tasks.
This approach is also resistant to page numbers beyond the specified last page.
Python also has nice packages for working with PDFs and for reading Excel files.

#!/usr/bin/env python3
# Call like ./pdf_pages.py last_page_number user_page1 user_page2 user_page3 ...
# E.g. ./pdf_pages.py 127 1 7 13 128

import sys

# Need at least the page count and one page number
if len(sys.argv) < 3:
    sys.exit(1)

# 1st arg -> total number of pages
# 2nd, 3rd, ... args -> page numbers to keep

last_page = int(sys.argv[1])
# Accept both separate arguments (1 7 13) and one quoted string ("1 7 13")
usr_pages = " ".join(sys.argv[2:]).split()

# set of user pages (integers), assuming they are separated by blanks
# for another separator, change the argument of split
s1 = set(int(v) for v in usr_pages)

# set of all pages (integers)
s2 = set(range(1, last_page + 1))

# result in the form of a list
result = list(s2 - s1)
# Sort to be sure of a valid sequence (sets are unordered)
result.sort()
# result can be processed further in different ways
# e.g. grouping, reversing, etc.
# Reversed here so that pages are deleted from the back of the document first
result.reverse()

# return as string - numbers are separated by commas (or whatever we want: change the string before join())
print(",".join(str(v) for v in result))

Props to @Airy for that, not me! Mine's the multi-action mess of a macro above it :wink:

Oops, sorry @Airy, the honors go to you. I have had a few interesting and instructive chats with @Nige_S lately, so I blindly attributed the solution to him :slight_smile:

I owe Nige so much for his expertise, I don't mind if he takes credit from me. It's the least I can do to help repay him.

You can use Skim:
https://sourceforge.net/p/skim-app/wiki/AppleScript/

Hi everyone,

Thanks to all of you for your support!!! I really value the high engagement in this forum.

Here is a working example of my macro. It is commented in a bit more detail to allow others to profit. It is probably not very elegant. Please feel free to provide any kind of ideas to streamline it!

For my first attempt, I went with the approach from Nige_S to invert a list of numbers. The one-liner from @Airy seems much more elegant, but I would like to invest some more time to understand how it works before implementing it.

With respect to the reordering of a list of numbers, I kind of circumvented this by addressing the content of the list from back to front (see last repeat loop in the macro).

If you want to test it:
1] Create a pdf document with a sufficient number of pages and save it to your desktop. I created a 12-page document by exporting a Word document with the page number (1 to 12) written on each page. This helps with debugging in case anything goes wrong.

2A] Create a spreadsheet document looking like this (and insert your macOS user name):


... or ...
2B] Deactivate the first group in the macro, activate the second group in the macro and adjust the variables in the second group. This skips the process of importing information from the spreadsheet.

3] Run the macro. Be aware: the original pdf file will be overwritten, so create a copy first.

Some other comments:

I need to use a spreadsheet application because the paths are already collected in an existing spreadsheet and I do not want to repeat that work.

Please check out the JavaScript segments embedded in the macro. They can extract the number of pages inside a pdf (first segment) as well as delete a specified page from a pdf (second segment). No additional program is needed. I just used the PDFKit already available in macOS. Please note that the JavaScript code was AI-generated as I have almost no experience in JavaScript.

@Airy Could you please show me how to use your single-line version in a macro? I would not even know how to start. Thanks!

To all: Can someone explain to me how to hand over a local KM variable to a JavaScript and how to return a local KM variable from it? I know this is possible with AppleScript but not how to do this with JavaScript.

Remove Selected Pages from PDF File V2.kmmacros (37.8 KB)

The context is important :slight_smile:

You can do all these actions inside JavaScript.
I don't have my MacBook here, but I tested the solution below on an iPad (using the WorkingCopy app) and it works. You just need to fill in the input vars (usr_pages and page_cnt) and then do everything else in a loop over the pages in result, without leaving JavaScript.

let usr_pages= [3, 5, 7]
let page_cnt = 8

// all page numbers from 1 to page_cnt (inclusive)
let doc_pages = Array(page_cnt).fill().map((v,i)=>i + 1)

let result = doc_pages.filter((element) => ! usr_pages.includes(element))

result = result.reverse()

//console.log(result)
return result

For accessing local variables see:
https://wiki.keyboardmaestro.com/action/Execute_a_JavaScript_For_Automation#Local_Instance_Variables

Additionally, it is funny that if I go to the index of PDFKit in the Apple docs, I don't see the instance methods of PDFDocument like remove, but if I put the keywords into a search engine, they magically appear :rofl:

One of the features in my single line version is not supported by KM's Execute Shell Script features. So for example, you get an error if you do this:

image

This is why I wrote that it needs to be "finessed" to make it work for you. There will be a variety of ways to finesse that. One way to finesse it is to store the numbers into a file and then execute it like this: (which works)

image

The next trick is to get the numbers into the script. The first number you may want to get into the script is the number that replaces my "10", and the second numbers are the inversion values. In my solution, I assumed that you would be able to extract them from somewhere (e.g., Excel) and insert them yourself using KM variables.
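
For anyone who would rather do this step in a script than with comm, here is a hedged Python sketch of reading both values from Keyboard Maestro variables. It assumes KM exposes variables to scripts run by an Execute Shell Script action as environment variables prefixed with KMVAR_. The variable names Local_PageCount and Local_PagesToKeep are hypothetical placeholders, and the defaults only mirror the "10" and "3 5 8" constants used above.

```python
import os

def pages_to_delete_from_env(env=os.environ):
    """Read page count and keep-list from KMVAR_ variables (hypothetical names)."""
    page_count = int(env.get("KMVAR_Local_PageCount", "10"))
    keep_spec = env.get("KMVAR_Local_PagesToKeep", "3 5 8")
    # Accept blanks or commas as separators.
    keep = {int(tok) for tok in keep_spec.replace(",", " ").split()}
    # Highest page first, ready for one-at-a-time deletion.
    return [p for p in range(page_count, 0, -1) if p not in keep]

# Printed output is what the action would hand back to the macro.
print(",".join(str(p) for p in pages_to_delete_from_env()))
```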

If you tell me what your variables names are that store your values, I can fix my script to include variables instead of using hard-coded constants like I did above. But if you want someone to show you how to extract the numbers from an Excel page, you may need someone else to assist you with that, because I don't have Excel.

If you want to try my finessed sample code, here are the strings you need to paste into an Execute Shell Script action:

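# comm needs both inputs sorted as text, so sort them first (this matters once page numbers reach 10)
seq 1 10 | sort > /tmp/seq.txt
echo "3 5 8" | tr ' ' '\n' | sort > /tmp/selection.txt
comm -23 /tmp/seq.txt /tmp/selection.txt | sort -n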

If you give me variable names that contain your values, I can update this code.

Here is my proposal implemented only in JXA - I've removed Excel part, set PDF path and list of pages in KM Variables.
After setup, everything is done in one JXA code section (including reverse list calculation).

Filter PDF pages .kmmacros (4.3 KB)

FWIW, note that with the default Modern Syntax selected in the small chevron menu to the left of the code text field,

you don't actually need:

  1. lines 2 and 3 (currentApplication, standAdditions) or
  2. lines 3 and 4 (kmInst, kmeApp)

and as long as that chevron menu either:

  • has Include All Variables checked, or (better)
  • has a check by the names of the local variables which you use in your Execute action,

then you don't need the .getvariable method, or an instance identifier, and can simply obtain the values bound to those KM local variable names directly, by just writing:

  • kmvar.Local_PdfFullPath
  • kmvar.Local_UsrPageList

So, for example:

Filter PDF pages – variant using KM default "Modern Syntax" option.kmmacros (8.1 KB)


JS source:
const main = () => {
    ObjC.import("PDFKit");

    const pdf_full_path = filePath(kmvar.Local_PdfFullPath);

    return either(
        alert("Save pruned version of PDF file")
    )(
        result => result
    )(
        fmapLR(
            pdfDoc => {
                const
                    pageCount = parseInt(pdfDoc.pageCount),
                    pagesToKeep = new Set(
                        kmvar.Local_UsrPageList
                            .split(",").flatMap(x => {
                                const n = parseInt(x);

                                return isNaN(n)
                                    ? []
                                    : [n];
                            })
                    );

                const
                    fpPruned = [
                        ...first(k => `${k}_pruned`)(
                            splitExtension(pdf_full_path)
                        )
                    ].join(""),

                    keptIndices = enumFromTo(1)(pageCount)
                        .reduceRight(
                            (a, i) => pagesToKeep.has(i)
                                ? [i].concat(a)
                                : (pdfDoc.removePageAtIndex(i - 1), a),
                            []
                        )
                        .join(",");

                return (
                    pdfDoc.writeToFile(fpPruned),
                    `Retained pages: ${keptIndices}\n\nWritten out as:\n\t${fpPruned}`
                );
            }
        )(
            doesFileExist(pdf_full_path)
                ? Right(
                    $.PDFDocument.alloc.initWithURL(
                        $.NSURL.fileURLWithPath($(pdf_full_path))
                    )
                )
                : Left(`File not found: "${pdf_full_path}"`)
        )
    );
};

// ----------------------- JXA -----------------------

// alert :: String => String -> IO String
const alert = title =>
    // Display of a given title and message.
    s => {
        const sa = Object.assign(
            Application("System Events"), {
            includeStandardAdditions: true
        });

        return (
            sa.activate(),
            sa.displayDialog(s, {
                withTitle: title,
                buttons: ["OK"],
                defaultButton: "OK"
            }),
            s
        );
    };

// --------------------- GENERIC ---------------------

// Left :: a -> Either a b
const Left = x => ({
    type: "Either",
    Left: x
});


// Right :: b -> Either a b
const Right = x => ({
    type: "Either",
    Right: x
});


// Tuple (,) :: a -> b -> (a, b)
const Tuple = a =>
    // A pair of values, possibly of
    // different types.
    b => ({
        type: "Tuple",
        "0": a,
        "1": b,
        length: 2,
        *[Symbol.iterator]() {
            for (const k in this) {
                if (!isNaN(k)) {
                    yield this[k];
                }
            }
        }
    });


// either :: (a -> c) -> (b -> c) -> Either a b -> c
const either = fl =>
    // Application of the function fl to the
    // contents of any Left value in e, or
    // the application of fr to its Right value.
    fr => e => "Left" in e
        ? fl(e.Left)
        : fr(e.Right);


// enumFromTo :: Int -> Int -> [Int]
const enumFromTo = m =>
    // Enumeration of the integers from m to n.
    n => Array.from(
        { length: 1 + n - m },
        (_, i) => m + i
    );


// doesFileExist :: FilePath -> IO Bool
const doesFileExist = fp => {
    const ref = Ref();

    return $.NSFileManager
        .defaultManager
        .fileExistsAtPathIsDirectory(
            $(fp).stringByStandardizingPath,
            ref
        ) && !ref[0];
};


// filePath :: String -> FilePath
const filePath = s =>
    // The given file path with any tilde expanded
    // to the full user directory path.
    ObjC.unwrap(
        $(s).stringByStandardizingPath
    );


// first :: (a -> b) -> ((a, c) -> (b, c))
const first = f =>
    // A simple function lifted to one which applies
    // to a tuple, transforming only its first item.
    ([x, y]) => Tuple(f(x))(y);



// fmapLR (<$>) :: (b -> c) -> Either a b -> Either a c
const fmapLR = f =>
    // Either f mapped into the contents of any Right
    // value in e, or e unchanged if it is a Left value.
    e => "Left" in e
        ? e
        : Right(f(e.Right));


// splitExtension :: FilePath -> (String, String)
const splitExtension = fp => {
    // The file path split before any extension,
    // or tupled with the empty string, if
    // no extension is seen.
    const
        lastIndex = [...fp].findLastIndex(
            c => "./".includes(c)
        );

    return (-1 !== lastIndex) && ("." === fp[lastIndex])
        ? Tuple(fp.slice(0, lastIndex))(
            fp.slice(lastIndex)
        )
        : Tuple(fp)("");
};

// MAIN ---
return main();

Ha !
I couldn't find this "local menu" for JavaScript (trust me, I've tried) - I suggest adding a sample screenshot to the wiki for dumb users like me :wink:. The wiki description is

You can turn Modern Syntax on or off in the popup menu next to the script.

Additionally, on my first attempt I also tried the kmvar.something syntax, and it didn't work (I don't know why), so I went back to the classical "solution".
I also wanted to do it with sets (like in Python), but I decided that it would be too hard for others to follow (many people don't know constructs like sets), so I used arrays :slight_smile:.

Anyway your solution is more elegant :+1:.

The benefit for me from this thread: it was my first time using the ObjC bridge in JXA, and maybe I will look more into JXA constructs.

BTW - do you know why "compiling" in KM (Return key) sometimes doesn't change the appearance of the code (doesn't show the syntax highlighting)?