PDF Table of Contents as Tab-Indented Outline

This macro aims to return a tab-indented, plain-text version of the Table of Contents
of the selected PDF, if it contains one.

(see, for example, Preview.app > View > Table of Contents)

Run on Eloquent_JavaScript.pdf, for example, it returns a plain-text, tab-indented TOC with page numbers, in this pattern:

TOC for ~/Desktop/Eloquent_JavaScript.pdf:

Introduction    p1
    On programming    p2
    Why language matters    p3
    What is JavaScript?    p5
    Code, and what to do with it    p7
    Overview of this book    p8
    Typographic conventions    p8
Values, Types, and Operators    p10
    Values    p10
    Numbers    p11
    Strings    p13
    Unary operators    p15
    Boolean values    p15
    Empty values    p17
    Automatic type conversion    p18
    Summary    p20
Program Structure    p21
    Expressions and statements    p21
    Bindings    p22
    Binding names    p24
    The environment    p24
    Functions    p24
    The console.log function    p25
    Return values    p25
    Control flow    p26
    Conditional execution    p26
    while and do loops    p28
    Indenting Code    p30
    for loops    p31
    Breaking Out of a Loop    p32
    Updating bindings succinctly    p32
    Dispatching on a value with switch    p33
    Capitalization    p34
    Comments    p34
    Summary    p35
    Exercises    p35
Functions    p38
    Defining a function    p38
    Bindings and scopes    p39
    Nested scope    p40
    Functions as values    p41
    Declaration notation    p42
    Arrow functions    p42
    The call stack    p43
    Optional Arguments    p44
    Closure    p45
    Recursion    p47
    Growing functions    p50
    Functions and side effects    p52
    Summary    p53
    Exercises    p53
Data Structures: Objects and Arrays    p55
    The weresquirrel    p55
    Datasets    p56
    Properties    p57
    Methods    p57
    Objects    p58
    Mutability    p61
    The lycanthrope's log    p62
    Computing correlation    p64
    Array loops    p65
    The final analysis    p66
    Further arrayology    p68
    Strings and their properties    p69
    Rest parameters    p71
    The Math object    p72
    Destructuring    p73
    Optional property access    p74
    JSON    p75
    Summary    p76
    Exercises    p76
Higher-Order Functions    p79
    Abstraction    p80
    Abstracting repetition    p80
    Higher-order functions    p82
    Script dataset    p83
    Filtering arrays    p84
    Transforming with map    p85
    Summarizing with reduce    p85
    Composability    p86
    Strings and character codes    p88
    Recognizing text    p90
    Summary    p91
    Exercises    p91
The Secret Life of Objects    p93
    Abstract Data Types    p93
    Methods    p94
    Prototypes    p95
    Classes    p97
    Private Properties    p99
    Overriding derived properties    p100
    Maps    p101
    Polymorphism    p102
    Getters, setters, and statics    p103
    Symbols    p105
    The iterator interface    p106
    Inheritance    p108
    The instanceof operator    p109
    Summary    p110
    Exercises    p111
Project: A Robot    p112
    Meadowfield    p112
    The task    p114
    Persistent data    p116
    Simulation    p116
    The mail truck's route    p118
    Pathfinding    p119
    Exercises    p121
Bugs and Errors    p123
    Language    p123
    Strict mode    p124
    Types    p125
    Testing    p126
    Debugging    p127
    Error propagation    p128
    Exceptions    p130
    Cleaning up after exceptions    p131
    Selective catching    p133
    Assertions    p135
    Summary    p136
    Exercises    p136
Regular Expressions    p138
    Creating a regular expression    p138
    Testing for matches    p139
    Sets of characters    p139
    International characters    p140
    Repeating parts of a pattern    p142
    Grouping subexpressions    p143
    Matches and groups    p143
    The Date class    p144
    Boundaries and look-ahead    p145
    Choice patterns    p146
    The mechanics of matching    p147
    Backtracking    p147
    The replace method    p149
    Greed    p150
    Dynamically creating RegExp objects    p152
    The search method    p152
    The lastIndex property    p153
    Parsing an INI file    p154
    Code units and characters    p157
    Summary    p157
    Exercises    p159
Modules    p161
    Modular programs    p161
    ES modules    p162
    Packages    p164
    CommonJS modules    p165
    Building and bundling    p168
    Module design    p169
    Summary    p171
    Exercises    p171
Asynchronous Programming    p173
    Asynchronicity    p173
    Callbacks    p175
    Promises    p176
    Failure    p178
    Carla    p180
    Breaking In    p181
    Async functions    p182
    Generators    p184
    A Corvid Art Project    p185
    The event loop    p188
    Asynchronous bugs    p189
    Summary    p191
    Exercises    p191
Project: A Programming Language    p193
    Parsing    p193
    The evaluator    p197
    Special forms    p199
    The environment    p200
    Functions    p202
    Compilation    p203
    Cheating    p203
    Exercises    p204
JavaScript and the Browser    p206
    Networks and the Internet    p206
    The Web    p208
    HTML    p208
    HTML and JavaScript    p211
    In the sandbox    p212
    Compatibility and the browser wars    p212
The Document Object Model    p214
    Document structure    p214
    Trees    p215
    The standard    p216
    Moving through the tree    p217
    Finding elements    p218
    Changing the document    p219
    Creating nodes    p220
    Attributes    p222
    Layout    p222
    Styling    p224
    Cascading styles    p226
    Query selectors    p227
    Positioning and animating    p228
    Summary    p230
    Exercises    p230
Handling Events    p233
    Event handlers    p233
    Events and DOM nodes    p234
    Event objects    p235
    Propagation    p235
    Default actions    p237
    Key events    p237
    Pointer events    p239
    Scroll events    p243
    Focus events    p244
    Load event    p245
    Events and the event loop    p245
    Timers    p246
    Debouncing    p247
    Summary    p248
    Exercises    p249
Project: A Platform Game    p251
    The game    p251
    The technology    p252
    Levels    p252
    Reading a level    p253
    Actors    p255
    Drawing    p258
    Motion and collision    p263
    Actor updates    p266
    Tracking keys    p268
    Running the game    p269
    Exercises    p271
Drawing on Canvas    p273
    SVG    p273
    The canvas element    p274
    Lines and surfaces    p275
    Paths    p276
    Curves    p277
    Drawing a pie chart    p280
    Text    p281
    Images    p282
    Transformation    p283
    Storing and clearing transformations    p286
    Back to the game    p287
    Choosing a graphics interface    p292
    Summary    p293
    Exercises    p294
HTTP and Forms    p296
    The protocol    p296
    Browsers and HTTP    p298
    Fetch    p299
    HTTP sandboxing    p301
    Appreciating HTTP    p301
    Security and HTTPS    p302
    Form fields    p303
    Focus    p304
    Disabled fields    p305
    The form as a whole    p306
    Text fields    p307
    Checkboxes and radio buttons    p309
    Select fields    p310
    File fields    p311
    Storing data client-side    p312
    Summary    p315
    Exercises    p315
Project: A Pixel Art Editor    p318
    Components    p318
    The state    p320
    DOM building    p321
    The canvas    p322
    The application    p325
    Drawing tools    p327
    Saving and loading    p330
    Undo history    p333
    Let's draw    p334
    Why is this so hard?    p335
    Exercises    p336
Node.js    p338
    Background    p338
    The node command    p339
    Modules    p340
    Installing with NPM    p341
    The filesystem module    p343
    The HTTP module    p344
    Streams    p346
    A file server    p347
    Summary    p352
    Exercises    p353
Project: Skill-Sharing Website    p355
    Design    p355
    Long polling    p356
    HTTP interface    p357
    The server    p359
    The client    p366
    Exercises    p372
Exercise Hints    p374
    Program Structure    p374
    Functions    p375
    Data Structures: Objects and Arrays    p376
    Higher-Order Functions    p378
    The Secret Life of Objects    p379
    Project: A Robot    p380
    Bugs and Errors    p381
    Regular Expressions    p381
    Modules    p382
    Asynchronous Programming    p383
    Project: A Programming Language    p385
    The Document Object Model    p386
    Handling Events    p387
    Project: A Platform Game    p388
    Drawing on Canvas    p389
    HTTP and Forms    p391
    Project: A Pixel Art Editor    p392
    Node.js    p394
    Project: Skill-Sharing Website    p395
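
The indentation pattern above can be tried out without PDFKit. A minimal sketch in plain JavaScript, assuming a hypothetical node shape of { label, page, children } in place of real PDFOutline items. The names sampleOutline, outlineLines and tabbedOutline are illustrative only and are not part of the macro:

```javascript
// Hypothetical plain-object stand-in for a PDF outline.
// Each node has a label, a page number, and a list of child nodes.
const sampleOutline = [
    {
        label: "Introduction", page: 1, children: [
            { label: "On programming", page: 2, children: [] },
            { label: "Why language matters", page: 3, children: [] }
        ]
    },
    {
        label: "Values, Types, and Operators", page: 10, children: [
            { label: "Values", page: 10, children: [] }
        ]
    }
];

// outlineLines :: String -> (Node -> String) -> Node -> [String]
const outlineLines = indent => f => {
    // One line for the node itself, followed by the lines of
    // all its descendants, each pushed one indent deeper.
    const go = node => [
        f(node),
        ...node.children.flatMap(go).map(line => `${indent}${line}`)
    ];

    return go;
};

// tabbedOutline :: [Node] -> String
const tabbedOutline = nodes =>
    nodes.flatMap(
        outlineLines("\t")(x => `${x.label}\tp${x.page}`)
    ).join("\n");

console.log(tabbedOutline(sampleOutline));
```

Each level of nesting adds one tab, because every parent re-indents the lines its children have already produced.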

PDF Table of Contents as Tab-Indented Outline.kmmacros (12 KB)


JS source:
(() => {
    "use strict";

    // Tabbed outline version of TOC (if any) in given PDF file.

    // Rob Trew @2024
    // Ver 0.1

    const kmvar = { "local_PDF_Path": "~/Desktop/mastering-emacs-v5.pdf" };

    ObjC.import("PDFKit");

    // main :: IO ()
    const main = () =>
        either(
            alert("TOC, if any, in PDF file.")
        )(
            tabbedOutline => tabbedOutline
        )(
            bindLR(
                pdfDocumentFromFilePathLR(kmvar.local_PDF_Path)
            )(
                pdfDocumentTOCAsTabbedOutlineLR
            )
        );


    // ----------------------- PDF -----------------------


    // pdfDocumentFromFilePathLR :: 
    // FilePath -> Either String (FilePath, PDFDocument)
    const pdfDocumentFromFilePathLR = pdfPath => {
        const fp = filePath(pdfPath);

        return bindLR(
            doesFileExist(fp)
                ? Right(
                    $.PDFDocument.alloc.initWithURL(
                        $.NSURL.fileURLWithPath($(fp))
                    )
                )
                : Left(`File not found: "${fp}"`)
        )(
            maybeDoc => maybeDoc.isNil()
                ? Left(`Not readable as PDF: ${fp}`)
                : Right(
                    Tuple(fp)(maybeDoc)
                )
        );
    };


    // pdfDocumentTOCAsTabbedOutlineLR :: 
    // (FilePath, PDFDocument) -> Either String String
    const pdfDocumentTOCAsTabbedOutlineLR = ([fp, pdfDoc]) => {
        const
            outlineRoot = pdfDoc.outlineRoot,
            uw = ObjC.unwrap;

        return fmapLR(
            indentedForestOutline("\t")(
                x => `${uw(x.label)}\tp${tocPageNumber(x)}`
            )
        )(
            Boolean(outlineRoot.isNil())
                ? Left(`No TOC outline found in '${fp}'.`)
                : Right(
                    nest(outlineRoot)
                )
        );
    };


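    // tocPageNumber :: PDFOutline -> String
    const tocPageNumber = x => {
        // The page label of the outline item's destination,
        // or an empty string if there is no destination
        // or the label is not numeric.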
        const
            d = x.destination,
            s = d.isNil()
                ? ""
                : ObjC.unwrap(d.page.label);

        return isNaN(s)
            ? ""
            : s;
    };

    // ------- ROOT AND NEST FUNCTIONS SPECIALISED -------

    // root :: Tree a -> a
    const root = tree =>
        // Specialised for PDFOutline.
        tree;


    // nest :: Tree a -> [a]
    const nest = tree => {
        // Specialised for PDFOutline.
        const
            uw = ObjC.unwrap,
            cn = uw(tree.numberOfChildren),
            length = isNaN(cn)
                ? 0
                : parseInt(cn, 10);

        return 0 < length
            ? Array.from(
                { length },
                (_, i) => tree.childAtIndex(i)
            )
            : [];
    };


    // ----------------------- JXA -----------------------

    // alert :: String -> String -> IO String
    const alert = title =>
        s => {
            const sa = Object.assign(
                Application("System Events"), {
                includeStandardAdditions: true
            });

            return (
                sa.activate(),
                sa.displayDialog(s, {
                    withTitle: title,
                    buttons: ["OK"],
                    defaultButton: "OK"
                }),
                s
            );
        };


    // --------------------- GENERIC ---------------------

    // Left :: a -> Either a b
    const Left = x => ({
        type: "Either",
        Left: x
    });


    // Right :: b -> Either a b
    const Right = x => ({
        type: "Either",
        Right: x
    });


    // Tuple (,) :: a -> b -> (a, b)
    const Tuple = a =>
        // A pair of values, possibly of
        // different types.
        b => ({
            type: "Tuple",
            "0": a,
            "1": b,
            length: 2,
            *[Symbol.iterator]() {
                for (const k in this) {
                    if (!isNaN(k)) {
                        yield this[k];
                    }
                }
            }
        });


    // bindLR (>>=) :: Either a b ->
    // (b -> Either a c) -> Either a c
    const bindLR = lr =>
        // Bind operator for the Either option type.
        // If lr has a Left value then lr unchanged,
        // otherwise the function mf applied to the
        // Right value in lr.
        mf => "Left" in lr
            ? lr
            : mf(lr.Right);


    // doesFileExist :: FilePath -> IO Bool
    const doesFileExist = fp => {
        const ref = Ref();

        return $.NSFileManager
            .defaultManager
            .fileExistsAtPathIsDirectory(
                $(fp).stringByStandardizingPath,
                ref
            ) && !ref[0];
    };


    // either :: (a -> c) -> (b -> c) -> Either a b -> c
    const either = fl =>
        // Application of the function fl to the
        // contents of any Left value in e, or
        // the application of fr to its Right value.
        fr => e => "Left" in e
            ? fl(e.Left)
            : fr(e.Right);


    // filePath :: String -> FilePath
    const filePath = s =>
        // The given file path with any tilde expanded
        // to the full user directory path.
        ObjC.unwrap(
            $(s).stringByStandardizingPath
        );

    // fmapLR (<$>) :: (b -> c) -> Either a b -> Either a c
    const fmapLR = f =>
        // Either f mapped into the contents of any Right
        // value in e, or e unchanged if it is a Left value.
        e => "Left" in e
            ? e
            : Right(f(e.Right));


    // foldTree :: (a -> [b] -> b) -> Tree a -> b
    const foldTree = f => {
        // The catamorphism on trees. A summary
        // value obtained by a depth-first fold.
        const go = tree => f(
            root(tree)
        )(
            nest(tree).map(go)
        );

        return go;
    };


    // indentedForestOutline :: String -> (a -> String) ->
    // Forest a -> String
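    const indentedForestOutline = indent =>
        // An outline of the given forest, one line per node,
        // each rendered by f and prefixed by one copy of
        // indent for each level of nesting.
        f => xs => xs.flatMap(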
            foldTree(x => vs => [
                f(x),
                ...vs.flat().map(
                    v => `${indent}${v}`
                )
            ])
        )
            .join("\n");

    // MAIN ---
    return main();
})();
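
The Left/Right flow that main relies on can also be tried outside JXA. A rough sketch, in which Left, Right, bindLR and either follow the definitions in the source above, while findFileLR, outlineLR, report and the files Map are hypothetical stand-ins for the real file and PDFKit checks:

```javascript
// Left wraps a failure message, Right wraps a successful value.
const Left = x => ({ type: "Either", Left: x });
const Right = x => ({ type: "Either", Right: x });

// bindLR passes a Right value on to the next step.
// A Left value skips that step and is returned unchanged.
const bindLR = lr => mf => "Left" in lr ? lr : mf(lr.Right);

// either collapses both cases into a single result.
const either = fl => fr => e => "Left" in e ? fl(e.Left) : fr(e.Right);

// Hypothetical stand-in for the file-exists check.
const findFileLR = files => path =>
    files.has(path)
        ? Right(files.get(path))
        : Left(`File not found: "${path}"`);

// Hypothetical stand-in for the outline check.
const outlineLR = doc =>
    0 < doc.outline.length
        ? Right(doc.outline.join("\n"))
        : Left(`No TOC outline found in '${doc.name}'.`);

// report :: Map -> String -> String
const report = files => path =>
    either(msg => `Error: ${msg}`)(text => text)(
        bindLR(findFileLR(files)(path))(outlineLR)
    );

const files = new Map([
    ["a.pdf", { name: "a.pdf", outline: ["Intro\tp1", "\tScope\tp2"] }],
    ["b.pdf", { name: "b.pdf", outline: [] }]
]);

console.log(report(files)("a.pdf"));
console.log(report(files)("b.pdf"));
console.log(report(files)("c.pdf"));
```

Whichever step fails first supplies the message, and later steps are never run on a Left value, so the caller handles every failure in one place.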