Is it possible to merge many .html files (downloaded) into one?

I just purchased KM and am trying to wrap my head around whether it's possible. I couldn't find anything similar in a search. Thanks!

Sure, it's possible. I'm not sure it makes sense, but that's up to you.
Do you know RegEx? You could write a RegEx that extracts the HTML head and body from each file, and then add each one to the respective part of the merged file.

You'll need these KM Actions:

and maybe some others, but those are the major Actions.
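As a rough illustration of the RegEx idea outside KM, here is a minimal Python sketch. It assumes each file is reasonably well-formed HTML with one <head> and one <body>; the function names are hypothetical, and regex-based HTML handling is fragile, so treat this as a starting point rather than a robust parser.

```python
import re
from pathlib import Path

# Case-insensitive, non-greedy, and DOTALL so multi-line sections are captured.
# The \b stops "<head" from also matching "<header>" (and "<body" from "<bodyx>").
HEAD_RE = re.compile(r"<head\b[^>]*>(.*?)</head>", re.IGNORECASE | re.DOTALL)
BODY_RE = re.compile(r"<body\b[^>]*>(.*?)</body>", re.IGNORECASE | re.DOTALL)


def extract_parts(html: str) -> tuple[str, str]:
    """Return (head_inner, body_inner); an empty string if a section is missing."""
    head = HEAD_RE.search(html)
    body = BODY_RE.search(html)
    return (head.group(1).strip() if head else "",
            body.group(1).strip() if body else "")


def merge_html(documents: list[str]) -> str:
    """Put every head's contents in one <head> and every body's in one <body>."""
    heads, bodies = [], []
    for doc in documents:
        head, body = extract_parts(doc)
        heads.append(head)
        bodies.append(body)
    return ("<!DOCTYPE html>\n<html>\n<head>\n" + "\n".join(heads)
            + "\n</head>\n<body>\n" + "\n<hr>\n".join(bodies)
            + "\n</body>\n</html>\n")


def merge_files(paths: list[Path], out_path: Path) -> None:
    """Read the HTML files in the given order and write one merged file."""
    docs = [p.read_text(encoding="utf-8", errors="replace") for p in paths]
    out_path.write_text(merge_html(docs), encoding="utf-8")
```

For example, `merge_files(sorted(Path("site").glob("*.html")), Path("merged.html"))`, where the `site` folder is hypothetical. One design caveat: every file's <title> and <meta> tags end up in the merged head, so you may want to keep only the first file's head instead.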

For a person new to KM, I'd rate your task at moderate difficulty, but it is a good task to learn on. Just give yourself plenty of time and patience. :wink:

I, and many others here, could do this for you, but you'll learn a lot more if you do it yourself. If you really get stuck, post back and we'll help.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

@bounce, welcome to Keyboard Maestro (KM) and its Forum.
KM is one of the best Mac automation tools available, and its Forum is one of the best and friendliest forums on the Internet. Whenever you hit a tough stumbling block using KM, please feel free to post your question or problem here for help.

You will also find this helpful:
Tip: How Do I Get The Best Answer in the Shortest Time?


For more help, see Getting Started with Keyboard Maestro and the Forum.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


Thanks, that’s helpful. I’ve learned a few regex recipes in the past, but wouldn’t be able to whip them up on the spot. Will take a look at the links and have a go at it this weekend.

It might also help to know what you are trying to accomplish by merging these files.

A standard HTML file starts with <html> and ends with </html>, with a <head>...</head> section followed by a <body>...</body> section. If you just merged all of them together, I have no idea what a browser would display if you opened the result.

That's probably a better way to say it.


Taking your request very literally, as a test example, and NOT using KM: if you were to open TextEdit, copy and paste three HTML source files one after the other, and then save, you would have three HTML files in one source file.

If that is what you want to do with KM, then:

Open source HTML file 1
Select all
Copy to System Clipboard
Write to target text file
Repeat with HTML files 2 and 3, but APPEND to the target text file

The selection of the source HTML files would be part of a for loop.
See other forum answers about files (e.g., renaming a group of files in a folder) for more details.
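The same literal "write, then append" sequence can be sketched in Python as a rough equivalent, not as the KM macro itself. The function name is hypothetical, and the files are processed in whatever order the list gives them.

```python
from pathlib import Path


def append_files(sources: list[Path], target: Path) -> None:
    """Write the first source to target, then append each remaining source."""
    for i, src in enumerate(sources):
        text = src.read_text(encoding="utf-8", errors="replace")
        # "w" for the first file (like "Write to target text file"),
        # "a" for the rest (like "APPEND to the target text file").
        mode = "w" if i == 0 else "a"
        with target.open(mode, encoding="utf-8") as out:
            out.write(text)
            # Keep files from running together on the same line.
            if not text.endswith("\n"):
                out.write("\n")
```

For example, `append_files(sorted(Path("~/Downloads/site").expanduser().glob("*.html")), Path("combined.html"))`, where the folder is hypothetical. As noted above, the result contains several complete <html> documents back to back, which is not valid HTML.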


Yes, definitely; you all probably have a much more elegant solution. Basically, I downloaded a few sites (with SiteSucker) before they went offline. That has left me with a bunch of index.html files for each site. My ultimate goal is to combine each site’s posts into one big PDF per site.

I tried the calibre e-book app to convert the HTML to PDF, and it takes ages to convert just a few posts. Also, there is an iOS app, “pdf search”, that has basic machine-learning search, and for some reason it can’t parse the calibre-converted files on iOS.

Since they are already HTML, I thought combining them into one HTML file per site and then quickly printing it to PDF would be a good option. Any solution that ends with multiple HTML files turned into one PDF would be great.
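If SiteSucker saved each site in its own top-level folder (an assumption; check your actual download layout), a minimal Python sketch for collecting each site's index.html files could look like this. The function name is hypothetical, and sorting by path is only a guess at a sensible order; it may not match the posts' dates.

```python
from pathlib import Path


def index_files_by_site(download_root: Path) -> dict[str, list[Path]]:
    """Map each top-level site folder name to its index.html files, sorted by path."""
    sites: dict[str, list[Path]] = {}
    # Assumes one folder per downloaded site directly under download_root.
    for site_dir in sorted(p for p in download_root.iterdir() if p.is_dir()):
        # rglob searches every subfolder, e.g. site/post/slug/index.html.
        pages = sorted(site_dir.rglob("index.html"))
        if pages:  # skip folders with no pages
            sites[site_dir.name] = pages
    return sites
```

Each list can then be fed to whatever merge or conversion step you choose, producing one output per site.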

Ahhh. Ok. Yeah, I don't think Keyboard Maestro is the tool that I would recommend. (There are hundreds of things that Keyboard Maestro can do, but some things are still done with another tool.)

You might want to post that question (the latter clarification) to https://apple.stackexchange.com which might help you run into folks who have done this before and have recommendations.

Two possible tools come to mind:

  1. PDFpen (or maybe the Pro version) has a tool to take a website and turn it into PDFs.

  2. Pandoc is a command-line tool that converts files from one format to another and might be able to convert each HTML page into a PDF, and then you could combine the PDFs.
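A minimal Python sketch of driving Pandoc from a script follows. As a variation on converting each page and then combining the PDFs, it passes all of one site's pages to a single pandoc call, relying on Pandoc concatenating multiple input files before converting them. It assumes pandoc is installed and on PATH and that a PDF engine is available (Pandoc defaults to a LaTeX engine; `--pdf-engine=weasyprint` or `wkhtmltopdf` are alternatives if installed). The function names are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List, Optional


def pandoc_command(pages: List[Path], output_pdf: Path,
                   pdf_engine: Optional[str] = None) -> List[str]:
    """Build a pandoc command that converts all pages into one PDF."""
    cmd = ["pandoc", *map(str, pages), "-o", str(output_pdf)]
    if pdf_engine:
        cmd.append(f"--pdf-engine={pdf_engine}")
    return cmd


def html_to_pdf(pages: List[Path], output_pdf: Path,
                pdf_engine: Optional[str] = None) -> None:
    """Run pandoc; raises if pandoc is missing or the conversion fails."""
    if shutil.which("pandoc") is None:
        raise FileNotFoundError("pandoc is not installed or not on PATH")
    subprocess.run(pandoc_command(pages, output_pdf, pdf_engine), check=True)
```

Building the argument list separately makes it easy to print and try the command by hand in Terminal before running it on a whole site.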


If the sites are still online, Web2PDF will do the job.
Once you have individual PDF files, you can drag pages from one to another.