Removing duplicate pages from PDF files

I regularly receive PDF files in which up to 20% of the pages are duplicates. Deleting these pages by hand is tedious, and the only solution I have seen proposed outside this forum is to export all pages as image files, compare the images, remove the duplicates, and join the remaining files into a PDF again (I don’t want to do this).

Is it feasible to do the following with the assistance of Keyboard Maestro?

  • A PDF file is opened by hand in Preview. A single page is displayed, scaled to fit the screen height. The sidebar is also open, displaying thumbnails. The first page is selected. The next page is opened by pressing the down arrow.
  • Keyboard Maestro magic begin
  • Keyboard Maestro saves the part of the screen displaying the page (1).
  • Arrow down is pressed.
  • Keyboard Maestro saves the part of the screen displaying the page (2).
  • Keyboard Maestro compares (1) and (2). If the images are identical, backspace is pressed and page (2) is deleted. If the images are not identical, nothing happens.
  • Arrow down is pressed.
  • Keyboard Maestro magic end
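
The comparison loop in the steps above boils down to checking each page against the last page that was kept. Here is a minimal sketch of that logic in Python. It is not Keyboard Maestro code, and it assumes each page has already been captured as raw bytes; the names `drop_adjacent_duplicates` and `snapshots` are made up for illustration.

```python
from typing import List, Sequence


def drop_adjacent_duplicates(snapshots: Sequence[bytes]) -> List[int]:
    """Return the indices of pages to keep.

    A page is skipped when its snapshot is byte-identical to the
    most recently kept page, mirroring the "compare (1) and (2)" step.
    """
    keep: List[int] = []
    for index, snapshot in enumerate(snapshots):
        if keep and snapshots[keep[-1]] == snapshot:
            continue  # same as the last kept page: treat as a duplicate
        keep.append(index)
    return keep


if __name__ == "__main__":
    pages = [b"A", b"A", b"B", b"A", b"A", b"A"]
    print(drop_adjacent_duplicates(pages))  # prints [0, 2, 3]
```

Only neighbouring duplicates are caught this way; a page that reappears later in the file is kept. An exact byte comparison may also be too strict for screen captures if rendering differs slightly between pages.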

Thanks for input on this.

Chris

Which app are you using to open the PDFs?

I have Preview and PDF Expert (3.6).

Are you able to share some of the PDFs?

I ask because I think the best solution is a shell script.

Like this one: pdftk - Eliminate duplicate pages from pdf - Unix & Linux Stack Exchange

The gist seems to be to split the PDF into single pages, compare each pair, and then delete the ones that are duplicates.
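
To illustrate the compare step, here is a rough sketch in Python rather than shell. It assumes the pages have already been split into one file per page (for example as images) in a single folder, and that duplicate pages produce byte-identical files, which may not hold if the splitting tool embeds per-file metadata. The function name `find_duplicate_pages` and the folder layout are hypothetical, and this is not the script from the linked answer.

```python
import hashlib
from pathlib import Path
from typing import List


def find_duplicate_pages(page_dir: str, pattern: str = "*") -> List[Path]:
    """Return page files whose bytes match an earlier page file.

    Files are visited in name order, so zero-padded names
    (page_001, page_002, ...) keep the first copy of each page.
    """
    seen = set()
    duplicates: List[Path] = []
    for path in sorted(Path(page_dir).glob(pattern)):
        if not path.is_file():
            continue
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        if digest in seen:
            duplicates.append(path)  # same content as an earlier page
        else:
            seen.add(digest)
    return duplicates


if __name__ == "__main__":
    # "pages" is a placeholder folder name; nothing is deleted here.
    for duplicate in find_duplicate_pages("pages"):
        print(duplicate)
```

The sketch only reports duplicates, so the list can be checked before any file is removed and the remaining pages are joined again.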

Are you able to share some of the PDFs?

Unfortunately I can’t; I get them from a third party.

I will try the script solution.