GuteBook is a preprocessor for Project Gutenberg ("PG") and PG Australia HTML files (or, alternatively, the best available .txt file) that lets you quickly and easily prepare one or many ebook versions for current ebook readers.

This project was created by Nick Rapallo (nrapallo) and was adapted from the gutlrf.pl code written by FangornUK on 10th Nov 2006 (and most recently modified in May 2009).

GuteBook (Windows GUI & Perl script) directly retrieves and converts PG or PG Australia HTML files specified by their EText-No. or URL. PG Australia ebooks also require the URL of the HTML file to be specified in place of the Input File, since there is no direct relationship between a PG Australia EText-No. and its URL. Once the HTML is available, the program fixes/filters many HTML items so that it can simultaneously create many current ebook formats, including .epub, .lrf, .mobi, .lit, .imp and .rb versions.

Now anyone can become a seasoned ebook creator with this easy-to-use program. So if your results ARE that good, consider contributing them (in the various ebook formats) to our EBook Upload forum. Even mobileread.com's "elite" ebook creators may find it useful...

For the Windows GUI, download the GuteBook-0.5-Installer.zip file, unzip it and execute the enclosed .exe. This will install the Windows GUI program and all other support files.

I've added a "stripped-down" version with no Windows GUI and no Windows Installer for those who don't want/like Windows and/or bloat. Just download and unzip the GuteBook-noGUI-noInstaller.zip file and run the programs/batch files in the 'bin' directory.

Source code (gutebook.pl) and files are now also available at the MobileRead.com Dev Hub.

As always, Enjoy!

EDIT 27-Mar-2011: When using newer versions of calibre, you must edit GuteBook's rebuild DOS batch file and prefix every line that calls "ebook-convert.exe" with "start /w ". Otherwise, the first time "ebook-convert.exe" is invoked is also the LAST time it runs, and none of the later conversions happen...
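If you have several rebuild batch files to patch, that edit can be automated. Below is a minimal Python sketch, not part of GuteBook; the helper names add_start_wait and patch_batch_file are hypothetical. It assumes each conversion sits on a line containing "ebook-convert.exe", leaves "rem"/"::" comment lines and lines already starting with "start" untouched, and preserves the file's CRLF line endings.

```python
PREFIX = "start /w "  # makes cmd.exe wait for each conversion to finish


def add_start_wait(lines):
    """Return lines with 'start /w ' put in front of every ebook-convert.exe call."""
    fixed = []
    for line in lines:
        stripped = line.lstrip()
        indent = line[: len(line) - len(stripped)]
        lowered = stripped.lower()
        is_comment = lowered.startswith(("rem ", "::"))
        already_waits = lowered.startswith("start ")
        if "ebook-convert.exe" in lowered and not is_comment and not already_waits:
            fixed.append(indent + PREFIX + stripped)
        else:
            fixed.append(line)  # anything else is copied unchanged
    return fixed


def patch_batch_file(path):
    """Rewrite a .bat file in place; newline='' keeps its original line endings."""
    with open(path, newline="") as f:
        lines = f.readlines()
    with open(path, "w", newline="") as f:
        f.writelines(add_start_wait(lines))
```

Because already-prefixed lines are skipped, running the sketch twice on the same batch file does no harm.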

I've successfully used this revised code within that DOS batch file, namely:

No, that's optional, but it is easier than remembering and manually typing the required switches. The Perl script came first and is perfectly usable on its own. The 'samples' directory in the GuteBook Install directory shows some command-line examples that can be used with the Perl script, e.g.:

where 'do-ge' is a .bat/shell script that runs gutebook.pl on the sample conversion of PG EText-No. 28700 with 2px (left/right) margins and smaller text, keeps both the downloaded .zip and the original PG .htm, and produces a REB1200 .imp and a Sony PRS .lrf with a page break at the first <h1> heading in the .htm. (Sorry for the run-on sentence...)

2. Do I have to use/select an ebook output format?

No, but then GuteBook will only preprocess the .htm; it will not create any ebooks, nor set up the batch file used to regenerate the ebooks after you re-edit the modified .htm. However, you can use the resulting .opf with, say, Mobipocket Creator to manually generate a .mobi and then feed that to calibre or any other mobi2... program (like Mobi2IMP). The choice is yours!

3. How do I download a Project Gutenberg Australia book?

Quick HOW-TO example:

on the main GUI screen click the blue text at the bottom right '^PGA List' and your browser will open with the PG Australia GUTINDEX_AUS.htm.
(when using the Perl script, refer to the GUTINDEX_AUS.htm file in the doc directory in your Install directory).

copy those two items into the GUI main screen boxes for EText-No. and Input File, respectively. Don't leave out the 'A' suffix on the EText-No., as that identifies the file as a PG Australia ebook for processing within GuteBook.
(when using the Perl script add: --PGnum 0364A "http://gutenberg.net.au/ebooks04/0400561h.html" ).

fill out the GUI main and options screens. See this post for screenshots.

click convert

enjoy! You may, however, need to re-edit this file, as any font-size reduction has no effect here since the <p>'s were fixed at 14pt.
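The 'A' suffix rule from the steps above can be sketched as a small check. This is a hypothetical Python illustration (the names BookSource and classify_etext are made up, not GuteBook's Perl code): an EText-No. ending in 'A' is treated as PG Australia, and because there is no direct EText-No.-to-URL relationship there, the Input File URL is required.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BookSource:
    etext_no: str              # e.g. "28700" or "0364A"
    is_pg_australia: bool      # True when the EText-No. carries the 'A' suffix
    input_file: Optional[str]  # URL/file override, mandatory for PG Australia


def classify_etext(etext_no: str, input_file: Optional[str] = None) -> BookSource:
    number = etext_no.strip()
    is_pga = number.upper().endswith("A")
    digits = number[:-1] if is_pga else number
    if not digits.isdigit():
        raise ValueError(f"not a valid EText-No.: {etext_no!r}")
    if is_pga and not input_file:
        # PG Australia URLs cannot be derived from the number alone.
        raise ValueError(f"PG Australia EText-No. {number} needs the HTML URL as Input File")
    return BookSource(number, is_pga, input_file)
```

For example, classify_etext("0364A", "http://gutenberg.net.au/ebooks04/0400561h.html") is accepted, while classify_etext("0364A") alone raises an error.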

EText-Nos. below 10,000 are sometimes problematic, as many of the earlier ones don't follow the current/normal file-naming pattern of http://www.gutenberg.org/files/ETEXTNO/ETEXTNO-h.zip.

Let's say you've entered the number 7471 as the book listed on PG (it's a collection of short stories by P. G. Wodehouse).

But GuteBook fails to find, download and convert the file. It's nothing you've done wrong; this ebook just doesn't follow the normal filename pattern, so you need to override it by placing the following link in the Input File box: http://www.gutenberg.org/dirs/etext05/2left10.zip . Just so that you know, I got that link from the Gutenberg ebook page for EText-No. 7471 by copying the link to the .zip text (or HTML) version.
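The "default pattern plus override" logic described above might look like this hypothetical Python sketch (pg_download_url and needs_manual_check are illustrative names). The URL template is the one quoted in this post; PG's site layout may well have changed since, so treat it as an assumption.

```python
from typing import Optional

# File-naming pattern as quoted above (assumed; PG's layout may have changed).
PG_HTML_ZIP = "http://www.gutenberg.org/files/{n}/{n}-h.zip"


def pg_download_url(etext_no: int, input_file: Optional[str] = None) -> str:
    """An explicit Input File always wins; otherwise build the usual .zip URL."""
    if input_file:
        return input_file
    return PG_HTML_ZIP.format(n=etext_no)


def needs_manual_check(etext_no: int) -> bool:
    """Older EText-Nos. (below 10,000) often don't follow the pattern."""
    return etext_no < 10000
```

So 28700 maps to http://www.gutenberg.org/files/28700/28700-h.zip, while 7471 is flagged and should be given its real link (http://www.gutenberg.org/dirs/etext05/2left10.zip) as the Input File.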

Also, since that ebook is just available as text, you will need GutenMark (GUItenMark) installed and selected on the first page (see GUI screenshot). This ebook needs to be converted internally to html using GutenMark so that GuteBook can produce an ebook version.

Try it again as before, but this time override the Input File.

I tried your script extensively. It works really well, and the resulting eBooks are more than adequate, even great compared to ManyBooks' efforts. As always, your GUI is well thought out and works admirably.

It's a shame that there is no automatic cover creation, though I am not aware of any such freeware or open-source software. Apart from that, and some coding idiocies coming from the Gutenberg coders (making TOCs with page numbers as links, for example), the generated eBooks are instantly readable and the different formats compare favorably.

As you're aware yourself, there are still some problems with illustration sizes in ePub. For a first version, this is superb work!

K to you!

As an aside for iPhone/iPod users: Use this to convert your Gutenberg books to ePub for Stanza. They will be perfectly formatted and you can autogenerate a cover. Best solution so far!


Thanks! You are right that there could be better support for cover image extraction and/or automatic cover page generation. I do plan on incorporating these features in new releases (they're already reserved on the GUI options screen). For most cases, an option similar to calibre's "Remove first image from source file" should suffice, but I might also detect <img> tags whose src filename or alt text contains "cover".
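A rough idea of that cover detection, as a hypothetical Python sketch using the standard html.parser module (CoverFinder and guess_cover are illustrative names, not GuteBook code): prefer an <img> whose src filename or alt text mentions "cover", and otherwise fall back to the first image, in the spirit of the calibre option.

```python
from html.parser import HTMLParser
from typing import List, Optional


class CoverFinder(HTMLParser):
    """Collects every <img> src, plus those that look like a cover."""

    def __init__(self):
        super().__init__()
        self.images: List[str] = []
        self.covers: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        a = dict(attrs)
        src = a.get("src") or ""
        alt = a.get("alt") or ""
        if not src:
            return
        self.images.append(src)
        filename = src.rsplit("/", 1)[-1]
        if "cover" in filename.lower() or "cover" in alt.lower():
            self.covers.append(src)


def guess_cover(html: str) -> Optional[str]:
    finder = CoverFinder()
    finder.feed(html)
    finder.close()
    if finder.covers:
        return finder.covers[0]
    # Fallback: first image in the file, like "Remove first image from source file".
    return finder.images[0] if finder.images else None
```

Self-closing tags such as <img ... /> are handled too, because html.parser routes them through handle_starttag.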

The Perl script could also allow one to use a generic cover page placed in, say, the install directory. While this cover image would be external to the source .htm for .mobi ebooks, the other ebook formats would probably use an additional cover.htm page placed before the source .htm. This way it would be more compatible across all formats.
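The cover.htm idea for the non-.mobi formats can be sketched like this. It is a minimal, hypothetical Python illustration (make_cover_htm and add_cover_page are made-up names; the page markup is an assumption): write a one-image cover.htm next to the source and return the reading order with the cover first.

```python
from html import escape
from pathlib import Path
from typing import List


def make_cover_htm(image_href: str, title: str) -> str:
    """A minimal HTML page that shows only the cover image."""
    return (
        "<html><head><title>{t}</title></head>\n"
        '<body><div style="text-align:center">'
        '<img src="{src}" alt="{t}" style="max-height:100%"/>'
        "</div></body></html>\n"
    ).format(t=escape(title), src=escape(image_href))


def add_cover_page(out_dir: str, image_href: str, title: str,
                   source_htm: str) -> List[str]:
    """Write cover.htm next to the source .htm; return the reading order."""
    Path(out_dir, "cover.htm").write_text(make_cover_htm(image_href, title),
                                          encoding="utf-8")
    return ["cover.htm", source_htm]
```

The returned list is what would go into the .opf spine for .epub/.lrf/.lit; for .mobi the image would instead stay external, as described above.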

Quote:

As an aside for iPhone/iPod users: Use this to convert your Gutenberg books to ePub for Stanza. They will be perfectly formatted and you can autogenerate a cover. Best solution so far!

If I impose a (maximum) fixed width and height of 66% for 600x800 screens, any large image would be reduced to roughly 400x530, which would be acceptable to most ebook readers. I could use "width=66%" or just "width=400" and adjust the height after examining the actual image to determine its dimensions and aspect ratio. I'll experiment some more here...
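The "width=400 and adjust the height" approach boils down to fitting each image inside a maximum box while keeping its aspect ratio. A hypothetical Python sketch (max_box and fit_image are illustrative names; it takes 66% as exactly 2/3 of a 600x800 screen, which gives a 400x533 box):

```python
from fractions import Fraction
from typing import Tuple


def max_box(screen_w: int = 600, screen_h: int = 800,
            share: Fraction = Fraction(2, 3)) -> Tuple[int, int]:
    """Largest size an image may occupy: a fixed share of the screen."""
    return int(screen_w * share), int(screen_h * share)


def fit_image(img_w: int, img_h: int, box_w: int, box_h: int) -> Tuple[int, int]:
    """Scale (img_w, img_h) down to fit the box, keeping the aspect ratio."""
    if img_w <= box_w and img_h <= box_h:
        return img_w, img_h  # small images are left alone
    scale = min(Fraction(box_w, img_w), Fraction(box_h, img_h))
    return max(1, int(img_w * scale)), max(1, int(img_h * scale))
```

With that box, a 1200x900 illustration becomes width="400" height="300", while a tall 600x1600 plate is limited by height instead and becomes 199x533.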

Also missing, but a worthy addition, is an autogenerated Table of Contents ("TOC") placed at the end. However, most PG HTML versions already have a "Contents" section, so I'll wait and see if there is demand for such a TOC feature.
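Such a TOC could be built from the headings already in the PG HTML. The following hypothetical Python sketch (HeadingCollector and build_toc are made-up names) collects <h1>-<h3> text plus an anchor taken from the heading's id or from an <a name=...> inside it, and returns an HTML list that could be appended at the end of the book. Headings without any anchor are skipped; adding anchors to them is not shown.

```python
from html import escape
from html.parser import HTMLParser
from typing import List, Optional, Tuple


class HeadingCollector(HTMLParser):
    LEVELS = {"h1": 1, "h2": 2, "h3": 3}

    def __init__(self):
        super().__init__()
        self.headings: List[Tuple[int, str, Optional[str]]] = []
        self._level: Optional[int] = None
        self._anchor: Optional[str] = None
        self._text: List[str] = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag in self.LEVELS:
            self._level, self._anchor, self._text = self.LEVELS[tag], a.get("id"), []
        elif tag == "a" and self._level is not None and self._anchor is None:
            self._anchor = a.get("id") or a.get("name")  # PG often uses <a name=...>

    def handle_data(self, data):
        if self._level is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag in self.LEVELS and self._level is not None:
            text = " ".join("".join(self._text).split())  # collapse whitespace
            self.headings.append((self._level, text, self._anchor))
            self._level = None


def build_toc(html: str) -> str:
    collector = HeadingCollector()
    collector.feed(html)
    collector.close()
    items = [
        '<li class="toc{0}"><a href="#{1}">{2}</a></li>'.format(level, escape(anchor), escape(text))
        for level, text, anchor in collector.headings
        if anchor and text
    ]
    return "<h2>Contents</h2>\n<ul>\n" + "\n".join(items) + "\n</ul>\n"
```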

Obviously, working with HTML as a starting point makes it easier to get all the "bells and whistles" we are used to seeing in hand-crafted ebooks created by those, like yourself, who do a marvellous job! In future, once PG's experimental .mobi and .epub offerings become more standard, I can switch to using those as input instead of the HTML versions.

Working with .txt may require much more "polishing" by hand. Currently, GutenMark transforms any .txt-only ebook into an acceptable .htm ebook. I may incorporate this ability within the Perl script using gut.pl (or newgut.pl, discussed here)!
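For a sense of what the most basic part of a .txt-to-.htm step involves, here is a deliberately tiny Python sketch (txt_to_html is a hypothetical name). It is nowhere near GutenMark, gut.pl or newgut.pl: it only treats blank lines as paragraph breaks, joins the wrapped lines and escapes the text, with no italics, heading or quote handling.

```python
from html import escape


def txt_to_html(text: str, title: str = "") -> str:
    paragraphs, block = [], []
    for line in text.splitlines():
        if line.strip():
            block.append(line.strip())  # PG text is hard-wrapped; re-join the lines
        elif block:
            paragraphs.append(" ".join(block))  # a blank line ends a paragraph
            block = []
    if block:
        paragraphs.append(" ".join(block))
    body = "\n".join("<p>{}</p>".format(escape(p, quote=False)) for p in paragraphs)
    return "<html><head><title>{}</title></head>\n<body>\n{}\n</body></html>\n".format(
        escape(title), body)
```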

While GuteBook cannot be expected to properly detect and handle ALL PG quirks and idiosyncrasies, it makes a valiant attempt. I can improve the Perl script to "accommodate" any easily fixed quirk once I know which PG EText-No. exhibits it. If you experience any formatting glitches, you can post your findings/fixes here and discuss/support their inclusion in future versions of GuteBook.

The biggest problem I've found with PG texts is that some of them have lost information contained in the original source book, e.g.,

Italics--many PG texts represent italics using UPPER CASE, which confounds real upper case with italics. For example, each of the following in the source book is represented the same way in the PG text (as HELLO WORLD):

hello world
Hello World
HELLO WORLD

No "smart quotes". Opening and closing quotes are represented by the same character, sometimes a double quote, and sometimes a single quote. When single quotes are used, this is confused with apostrophes, which are usually represented by single quotes.

Confusion between hyphens and em-dashes. In the PG text, hyphens should be represented by "-" and em-dashes by "--", which is easy to convert, but this convention is not always followed.

Indented text.

In my own private utilities I fix the above: convert to using opening/closing double quotes, correct apostrophe characters, correct use of italics etc. I do this automatically for many cases, and for the more difficult ones, I prompt the user for their resolution (what the correct character should be).
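As a rough illustration of the automatic part of such a clean-up (not the poster's utilities, and not GuteBook code), here is a naive Python sketch with hypothetical helpers smarten and italics_candidates. It assumes the "--" convention was followed, alternates straight double quotes within one passage, curls only in-word apostrophes, and merely flags ALL-CAPS runs as possible italics for a human to decide. A quotation spanning several paragraphs without a closing quote will throw the alternation off, which is exactly why prompting the user for the hard cases makes sense.

```python
import re
from typing import List

LEFT_DQ, RIGHT_DQ = "\u201c", "\u201d"
APOSTROPHE = "\u2019"
EM_DASH = "\u2014"


def smarten(text: str) -> str:
    text = text.replace("--", EM_DASH)  # assumes PG's "--" convention
    text = re.sub(r"(?<=\w)'(?=\w)", APOSTROPHE, text)  # don't -> curly apostrophe
    out, opening = [], True
    for ch in text:
        if ch == '"':
            out.append(LEFT_DQ if opening else RIGHT_DQ)  # naive open/close alternation
            opening = not opening
        else:
            out.append(ch)
    return "".join(out)


def italics_candidates(text: str, min_len: int = 2) -> List[str]:
    """Runs of ALL-CAPS words that may really be italics (for manual review)."""
    pattern = r"\b[A-Z]{%d,}(?:\s+[A-Z]{%d,})*\b" % (min_len, min_len)
    return re.findall(pattern, text)
```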

I wonder what your thoughts are on this and how it applies to GuteBook?


As I start with the HTML version, in most cases, these issues have already been solved by the PG DP consortium.

When dealing with .txt input, I initially chose to use GutenMark, as my goal was not to "reproduce" its functionality but rather to concentrate on making the resulting .htm work/display better in dedicated ebook readers.

So while I applaud your efforts, it's not my focus here. I did look at many PG .txt to .htm routines (gut.pl, newgut.pl, gtxt2html.pl and even gutenbrowser) and I know it is a tremendous undertaking so I just reserved the right to address this issue in future releases.

If this is addressed elsewhere, my apologies, but is there anything like this for Mac users? I've been noticing a kinda profound lack of Mac apps for ebook creation/conversion.

I'm playing with Calibre, but curious if there's others out there.

The tool looks great Nick!

AFAIK, the GUI, Installer and included .exe's are Windows-specific (my setup), but the gutebook.pl Perl script (the work-horse routine) was written to work under Perl 5.8.8 (build 820).

I know that calibre and eBook Publisher work in Mac OS X, however the gutebook.pl script relies on the Windows SBPubX COM interface for .imp creation (my ebook reader's format), so it may also require the Windows eBook Publisher to be installed under wine. I've done this previously for my Mobi2IMP software for my Linux netbook using the same NSIS Installer.

I'm sorry but I don't have access to any Mac to test this stuff on and/or debug it. Perhaps, someone else can help make this work under Mac OS X. I think it's doable!