I got a couple of HTML-ZIP archives (a folder tree in a ZIP file). It seems that while I can directly convert a ZIP-archive if there is only one HTML file in it, it does not work with a more complex zipped HTML directory that contains a bunch of HTML files, subfolders with further HTML files in them, css files and such. When I click "view" a window pops up showing the contents of the ZIP archive.

The browser of course can display it all. Is there any chance I could convert this complex structure into a MOBI file for the Kindle?

With Calibre or some other tool? Doing it by hand is not an option: the book contains a couple of hundred poems, each in a separate file. I would not mind if the index (indices, actually) were all lost and I ended up with a flat file.

Quote:

I got a couple of HTML-ZIP archives (a folder tree in a ZIP file). It seems that while I can directly convert a ZIP-archive if there is only one HTML file in it, it does not work with a more complex zipped HTML directory that contains a bunch of HTML files, subfolders with further HTML files in them, css files and such.

I store "complex zipped HTML" files in Calibre all the time, and have no trouble viewing or converting them. Perhaps you have zipped the directory it's in, not the index.html file and related files in that directory. Calibre needs to see a format it can read inside the zip.
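If you want to check how your archive is laid out before adding it, here is a minimal Python sketch. It only inspects the ZIP; it does not involve Calibre at all, and the idea that an HTML file should sit at the root of the archive is taken from the suggestion above rather than from anything I have confirmed about Calibre. The function names `top_level_html` and `wrapper_folder` are made up for this example.

```python
import zipfile
from pathlib import PurePosixPath

HTML_SUFFIXES = {".html", ".htm", ".xhtml"}


def _file_names(zip_path):
    """Return the names of all files (not folder entries) in the archive."""
    with zipfile.ZipFile(zip_path) as archive:
        return [name for name in archive.namelist() if not name.endswith("/")]


def top_level_html(zip_path):
    """Return the HTML files that sit directly at the root of the archive."""
    result = []
    for name in _file_names(zip_path):
        path = PurePosixPath(name)
        if len(path.parts) == 1 and path.suffix.lower() in HTML_SUFFIXES:
            result.append(name)
    return result


def wrapper_folder(zip_path):
    """Return the single folder that wraps every file, or None if there is none."""
    names = _file_names(zip_path)
    first_parts = {PurePosixPath(name).parts[0] for name in names}
    has_root_files = any(len(PurePosixPath(name).parts) == 1 for name in names)
    if not has_root_files and len(first_parts) == 1:
        return first_parts.pop()
    return None
```

If `top_level_html("book.zip")` comes back empty and `wrapper_folder("book.zip")` returns a folder name, the directory itself was probably zipped instead of its contents.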

The best way to store is to drag only the index.html file into Calibre. It will follow all the paths and grab all needed files. If it works in your browser by clicking on index.html, it will work in Calibre the same way.
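To see which files the index page actually points at, and whether they exist next to it, something like the following Python sketch may help. It is not how Calibre follows links internally (I don't know the details of that); it just collects the local `href` and `src` targets from one HTML file. `local_links` and `missing_targets` are hypothetical helper names.

```python
from html.parser import HTMLParser
from pathlib import Path
from urllib.parse import unquote, urlparse


class LinkCollector(HTMLParser):
    """Collect the values of all href and src attributes."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)


def local_links(html_text):
    """Return the relative file paths referenced by the HTML, without duplicates."""
    collector = LinkCollector()
    collector.feed(html_text)
    result = []
    for link in collector.links:
        parsed = urlparse(link)
        if parsed.scheme or parsed.netloc:
            continue  # external link such as http://... or mailto:...
        if not parsed.path:
            continue  # fragment-only link such as "#top"
        if parsed.path not in result:
            result.append(parsed.path)
    return result


def missing_targets(index_path):
    """Return the local links in index_path whose target files do not exist."""
    index_path = Path(index_path)
    text = index_path.read_text(encoding="utf-8", errors="replace")
    return [
        link
        for link in local_links(text)
        if not (index_path.parent / unquote(link)).exists()
    ]
```

An empty list from `missing_targets("index.html")` means every file linked from the index is where the index expects it. It checks only that one page, not the pages it links to.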

Quote:

When I click "view" a window pops up showing the contents of the ZIP archive.

It sounds like you have Calibre set to pass zip files to the OS, which just opens them with your zip program. If you tell Calibre to handle them, it will open and view the book inside with its internal reader.

I have two books that are essentially zipped-up HTML files. The structure is one file in the root, with everything else in subfolders. They convert OK, at least up to the ePub generation, where Calibre chokes horribly. I suspect that's because the HTML is so full of crap that I lost all hope of correcting it, together with the fact that each of the two books includes some 20,000 files. Did you try just hitting convert and seeing what happens?

Thanks for trying to help me, guys. It's just that I am totally new to this and somewhat confused.

In order to eliminate the ZIP issue, I unpacked the whole structure into a folder. When I now add the index.htm file to Calibre (is this what is meant by "dragging"?), the book has no metadata (filename?), a size of zero MB, and is not really there: it can't be viewed, and upon clicking "view" the folder listing pops up again.

Under "settings/behavior" I have now activated the internal viewer for ZIP files.

If I add the ZIP file to Calibre, all is well as far as the database is concerned, but I cannot really view the book: I can see only the landing page, without the embedded image files, and the navigation links are not active.

A conversion to MOBI produces an empty book, except for the cover image.

Can you please advise what I am doing wrong?

Thank you, Mixx

Last edited by Mixx; 09-28-2010 at 04:27 AM.
Reason: Discovered the switch for the internal viewer for ZIP files

I had never been able to view html(zip) files from within Calibre, but I had also never had the internal viewer for zip files enabled in preferences. Amazingly, it seems to work.
Live and learn, as they say.

For the other issues, did you try adding books from directories, one book per directory?
Or opening the main file in Word and saving it as RTF?
Both of these have worked for me with problem files, but yours may be more problematic.

Sooo... I just set $preferred_programming_language = "Python" and hacked away. This is the result. Sadly, I couldn't properly test it, because even with just about 280 of the original ~20,000 files, Calibre chokes and dies horribly while creating the ePub output. But at least it should create an HTML file as specified in the link Kovid provided.
The script expects the four files "header", "footer", "prefix" and "postfix" to be in the same directory as the HTML files. (Thus, to use it, you need to extract your ZIP to a directory and put the files I attached inside the same directory.) The four files contain the beginning and the end of the HTML file, together with the text that's prepended and appended to the individual TOC entries. The script expects two parameters: the first is the filename of the output file, the second is the filename of the index file, which will go on top in the TOC. If you don't want to use an index file, just misspell the file name; the script shouldn't care.

Please be aware that this is somewhat kludged together (for you Python-savvy folks out there, please don't hit me!) and may or may not work. There's no graceful error handling. Scratch that: there's no error handling, so horrible, horrible things may or may not happen.
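
The attached script itself is not reproduced in this thread, so here is only a rough Python sketch of what the description above suggests. It is not the original code. Everything beyond the four file names and the two parameters is an assumption: the function name `build_toc`, the extra `directory` argument, the alphabetical ordering, looking only at the top level of the directory, and the form of each entry (a plain link wrapped in the contents of "prefix" and "postfix").

```python
from html import escape
from pathlib import Path
from urllib.parse import quote

HTML_SUFFIXES = (".htm", ".html")


def build_toc(directory, output_name, index_name):
    """Write a table-of-contents HTML file linking to every HTML file in directory.

    Returns the list of page names in the order they were written.
    """
    directory = Path(directory)

    # The four template pieces described in the post above.
    header = (directory / "header").read_text(encoding="utf-8")
    footer = (directory / "footer").read_text(encoding="utf-8")
    prefix = (directory / "prefix").read_text(encoding="utf-8")
    postfix = (directory / "postfix").read_text(encoding="utf-8")

    # Assumption: only the top level is scanned, in alphabetical order,
    # and the output file itself is left out of the listing.
    pages = sorted(
        path.name
        for path in directory.iterdir()
        if path.is_file()
        and path.suffix.lower() in HTML_SUFFIXES
        and path.name != output_name
    )

    # The index file goes on top; a misspelled name is simply ignored.
    if index_name in pages:
        pages.remove(index_name)
        pages.insert(0, index_name)

    entries = [
        f'{prefix}<a href="{quote(name)}">{escape(name)}</a>{postfix}'
        for name in pages
    ]
    toc = header + "".join(entries) + footer
    (directory / output_name).write_text(toc, encoding="utf-8")
    return pages
```

Called as `build_toc(".", "toc.html", "index.htm")` from the directory holding the extracted files, this would write `toc.html` with the index page listed first. Like the original, it has no error handling: a missing "header" file just raises an exception.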