Hi
I am back with the same macro, designed to chunk research books. In DEVONthink 4, I want to go straight to the relevant chapters instead of searching through long PDF files. Thanks to @Nige_S, Gemini and I are making progress. I will post: A. Gemini’s description of the problem, B. the macro, and C. the Swift script. If I should be posting differently, please let me know.
Ellen Madono
Project Overview: Automated PDF Chapter Extraction
The Goal
To automate the processing of a digital library by iterating through nested folders, identifying a source PDF and a corresponding Table of Contents (TOC) text file, and passing their paths to a Swift script. The script then "chunks" the PDF into separate files, one for each chapter, based on the page numbers provided in the text file.
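To make the "chunking" step concrete, here is a minimal sketch of how the TOC text file turns into page ranges. It reuses the regular expression from the Swift script in section C, which expects lines in the form [[Title]] ((page)). The sample TOC, the ChapterRange type, and the chapterRanges function are made up for illustration and are not part of the macro.

```swift
import Foundation

// Hypothetical sample TOC in the "[[Title]] ((startPage))" format.
let sampleTOC = """
[[Introduction]] ((1))
[[Chapter 1: Methods]] ((15))
[[Chapter 2: Results]] ((42))
"""

// Illustrative type: an inclusive, 1-based page range for one chapter.
struct ChapterRange {
    let title: String
    let start: Int
    let end: Int
}

// Each chapter ends one page before the next chapter starts;
// the last chapter runs to the end of the PDF.
func chapterRanges(from toc: String, pageCount: Int) -> [ChapterRange] {
    let pattern = "\\[\\[(.+)\\]\\] +\\(\\((\\d+)\\)\\)"
    guard let regex = try? NSRegularExpression(pattern: pattern) else { return [] }
    var starts: [(title: String, page: Int)] = []
    for line in toc.components(separatedBy: .newlines) where !line.isEmpty {
        let range = NSRange(line.startIndex..<line.endIndex, in: line)
        guard let match = regex.firstMatch(in: line, options: [], range: range),
              let titleRange = Range(match.range(at: 1), in: line),
              let pageRange = Range(match.range(at: 2), in: line),
              let page = Int(line[pageRange]) else { continue }
        starts.append((String(line[titleRange]), page))
    }
    return starts.enumerated().map { (index, entry) -> ChapterRange in
        let end = index + 1 < starts.count ? starts[index + 1].page - 1 : pageCount
        return ChapterRange(title: entry.title, start: entry.page, end: end)
    }
}

for chapter in chapterRanges(from: sampleTOC, pageCount: 60) {
    print("\(chapter.title): pages \(chapter.start)-\(chapter.end)")
}
// Introduction: pages 1-14
// Chapter 1: Methods: pages 15-41
// Chapter 2: Results: pages 42-60
```

Lines that do not match the pattern are skipped silently, so a TOC file in a different format yields no chapters rather than an error.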
The Folder Structure
The hierarchy is organized into three levels of nesting:
- Project Root: A master folder containing multiple sub-directories.
- Sub-Directory: Each folder represents a specific volume or book.
- Target Files: Inside each sub-directory, there is exactly one PDF file and one TXT file.
The Immediate Problem
The macro fails during the transition from the sub-directory to the target files. The Keyboard Maestro Engine Log reports: Cant find folder %Variable%Local_BookFolder%
Because the variable token is not being evaluated as a path, the "For Each" scouts cannot enter the folder. Consequently, the variables for the PDF and TOC paths remain empty, and the Swift script terminates with an error.
A. Gemini’s description of the problem
1. Token Evaluation in "For Each" Actions
Why would a "For Each" action (Collection: The items in directory) fail to process a standard variable token like %Variable%Local_BookFolder%? Even though the variable is set in the immediate parent loop, the log suggests Keyboard Maestro is treating the percent signs and variable name as a literal string rather than a resolved path.
2. Directory Field Modes (Text vs. Variable)
In the "In directory" field of a loop, is there a specific setting or toggle (such as the "T" (Text) vs "V" (Variable) mode) that prevents tokens from being processed? If the box is accidentally in Variable mode, will it fail to read the % symbols?
3. Reliability of Environment Variable Handoff
The Swift script relies on ProcessInfo.processInfo.environment to retrieve the paths. If the variables are empty due to the loop failure, the script exits. Are there best practices for ensuring that file paths found in nested loops are "locked in" and visible to external scripts, specifically on macOS Tahoe or M1/M2 systems?
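One way to make the handoff easier to debug is to have the script say exactly which variable is missing. The sketch below assumes Keyboard Maestro's documented convention of exposing variables to shell scripts as environment variables with a KMVAR_ prefix, with spaces in the name replaced by underscores. The kmVariable helper and the sample values are hypothetical. It also rejects a value that still contains a literal %Variable% token, which is the symptom in the log above.

```swift
import Foundation

// Hypothetical helper: returns the value of a Keyboard Maestro variable, or nil
// if it is missing, empty, or still contains an unexpanded %Variable%...% token.
// Assumes the KMVAR_ prefix convention; the `env` parameter exists so the
// function can be tried out with sample data.
func kmVariable(_ name: String,
                in env: [String: String] = ProcessInfo.processInfo.environment) -> String? {
    let key = "KMVAR_" + name.replacingOccurrences(of: " ", with: "_")
    guard let value = env[key]?.trimmingCharacters(in: .whitespacesAndNewlines),
          !value.isEmpty,
          !value.contains("%Variable%") else { return nil }
    return value
}

// Sample environment (made-up paths) showing one good value and one bad one.
let sampleEnv = [
    "KMVAR_Local_PDFPath": "/tmp/Library/BookA/book.pdf",
    "KMVAR_Local_BookFolder": "%Variable%Local_BookFolder%"
]

for name in ["Local_PDFPath", "Local_BookFolder", "Local_TOCPath"] {
    if let value = kmVariable(name, in: sampleEnv) {
        print("\(name) = \(value)")
    } else {
        print("\(name) is missing, empty, or unexpanded")
    }
}
```

In the real script the second argument would be left out so that the values come from the process environment. Printing every variable name that fails the check narrows the problem to a specific loop instead of a single generic error.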
4. Structural Redundancy
Is there a more efficient way to "scout" for two specific file types (PDF and TXT) sitting in the same sub-directory without repeatedly referencing the parent path, which may be contributing to the "Cant find folder" error?
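One possible alternative, offered only as a sketch, is to pass just the project root to the script and let Swift do the scouting with FileManager, so that no nested "For Each" action has to resolve a path. The function names bookFolders and findBookFiles are hypothetical, and the sketch assumes the layout described above: one PDF and one TXT directly inside each sub-directory.

```swift
import Foundation

// Returns the immediate sub-directories of the project root (hidden items skipped).
func bookFolders(in root: URL) -> [URL] {
    let contents = (try? FileManager.default.contentsOfDirectory(
        at: root,
        includingPropertiesForKeys: [.isDirectoryKey],
        options: [.skipsHiddenFiles])) ?? []
    return contents.filter {
        (try? $0.resourceValues(forKeys: [.isDirectoryKey]).isDirectory) == true
    }
}

// Returns the first PDF and the first TXT found directly inside `folder`,
// or nil if either one is missing.
func findBookFiles(in folder: URL) -> (pdf: URL, toc: URL)? {
    let contents = (try? FileManager.default.contentsOfDirectory(
        at: folder,
        includingPropertiesForKeys: nil,
        options: [.skipsHiddenFiles])) ?? []
    guard let pdf = contents.first(where: { $0.pathExtension.lowercased() == "pdf" }),
          let toc = contents.first(where: { $0.pathExtension.lowercased() == "txt" })
    else { return nil }
    return (pdf, toc)
}

// Usage: pass the project root as the first command-line argument.
if CommandLine.arguments.count > 1 {
    let root = URL(fileURLWithPath: CommandLine.arguments[1])
    for folder in bookFolders(in: root) {
        if let files = findBookFiles(in: folder) {
            print("\(folder.lastPathComponent): \(files.pdf.lastPathComponent) + \(files.toc.lastPathComponent)")
        } else {
            print("\(folder.lastPathComponent): PDF or TXT not found")
        }
    }
}
```

With this approach the macro only needs to supply one path, and a folder that lacks either file is reported by name instead of stopping the whole run.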
Note: This macro is part of a larger project to digitize educational research materials. I am looking for the specific reason why standard variable syntax is being rejected by the "In directory" path logic in this nested configuration.
B. the macro
02 Chunker_w- swift Ellen.kmmacros (16.7 KB)
C. Swift script
#!/usr/bin/env swift
import Foundation
import PDFKit
func filterPdf(input: String, output: String, start: Int, end: Int) {
    guard let inputDoc = PDFDocument(url: URL(fileURLWithPath: input)) else { return }
    let outputDoc = PDFDocument()
    if let attrs = inputDoc.documentAttributes { outputDoc.documentAttributes = attrs }
    let startIndex = max(0, start - 1)
    let endIndex = min(inputDoc.pageCount - 1, end - 1)
    if startIndex <= endIndex {
        for i in startIndex...endIndex {
            if let page = inputDoc.page(at: i) {
                outputDoc.insert(page, at: outputDoc.pageCount)
            }
        }
    }
    outputDoc.write(to: URL(fileURLWithPath: output))
}
let env = ProcessInfo.processInfo.environment
// Keyboard Maestro exposes its variables to shell scripts as environment
// variables with a KMVAR_ prefix (not KAVARIABLE_).
let bookPath = env["KMVAR_Local_BookFolder"] ?? env["KMVAR_BookFolder"] ?? ""
let pdfPath = env["KMVAR_PDFPath"] ?? env["KMVAR_Local_PDFPath"] ?? ""
let tocPath = env["KMVAR_TOCPath"] ?? env["KMVAR_Local_TOCPath"] ?? ""
let fileManager = FileManager.default
// Safety check: If these are empty, the loops in the macro failed.
guard !pdfPath.isEmpty, !tocPath.isEmpty else {
    fputs("ERROR: KM variables are empty. The loops did not find the files.\n", stderr)
    fputs("DEBUG: BookPath is '\(bookPath)'\n", stderr)
    exit(1)
}
let bookURL = URL(fileURLWithPath: bookPath)
let outputFolder = bookURL.appendingPathComponent("CHUNKS_FINISHED").path
if !fileManager.fileExists(atPath: outputFolder) {
    try? fileManager.createDirectory(atPath: outputFolder, withIntermediateDirectories: true, attributes: nil)
}
do {
    let tocContent = try String(contentsOfFile: tocPath, encoding: .utf8)
    let lines = tocContent.components(separatedBy: .newlines)
    // Each TOC line is expected to look like: [[Chapter title]] ((startPage))
    let pattern = "\\[\\[(.+)\\]\\] +\\(\\((\\d+)\\)\\)"
    let regex = try NSRegularExpression(pattern: pattern)
    var entries: [(title: String, startPage: Int)] = []
    for line in lines where !line.isEmpty {
        let nsRange = NSRange(line.startIndex..<line.endIndex, in: line)
        if let match = regex.firstMatch(in: line, options: [], range: nsRange) {
            if let titleRange = Range(match.range(at: 1), in: line),
               let pageRange = Range(match.range(at: 2), in: line) {
                entries.append((String(line[titleRange]), Int(line[pageRange]) ?? 0))
            }
        }
    }
    for i in 0..<entries.count {
        let entry = entries[i]
        // The last chapter uses a large sentinel; filterPdf clamps it to the page count.
        let nextStart = (i + 1 < entries.count) ? entries[i+1].startPage : 99999
        let endPage = nextStart - 1
        // Replace characters that are unsafe in file names.
        let invalidChars = CharacterSet(charactersIn: "/\\?%*|\"<>:")
        let cleanTitle = entry.title.components(separatedBy: invalidChars).joined(separator: "-")
        let outputPath = (outputFolder as NSString).appendingPathComponent("\(cleanTitle).pdf")
        filterPdf(input: pdfPath, output: outputPath, start: entry.startPage, end: endPage)
    }
    print("Success! Check the CHUNKS_FINISHED folder.")
} catch {
    fputs("ERROR: Script could not process the files: \(error.localizedDescription)\n", stderr)
    exit(1)
}
