The CIA maintains a reference manual called the World Factbook. They used to release a new edition each year; recently they decided to maintain only the online edition. I found it to be an excellent source of information, and wanted to make an offline copy.

The result is what I call the World Fact eBook. It is currently only available in Mobipocket format. I decided to focus on Mobipocket because the format has certain specialized HTML tags. This ebook has a search index for article title, keyword, country name, and flag. It can also be used as a dictionary by most versions of Mobipocket Reader. This means that if you are reading news on, for example, the Kindle, you can look up a country name to learn more about it.

The current version, 0.7, can be downloaded here. Epub, IMP, and Sony LRF will be available soon.

P.S. This was my first large project. The source material consisted of over 500 HTML files and close to 800 pictures. Most of the content on the web pages had to be removed. I wrote a fair amount of code to automate the cleanup. I am looking for a new project where I can repeat the process. If you would like some other website converted into an ebook, please let me know. (Please consider the copyright situation before you ask.)

First of all, thanks for the conversion. I had a (relatively quick) look at it, and it is really a nicely done book.

I do have a few technical questions though. I have two books with (I assume) similar source material: they are reference books consisting of many HTML pages with images and some active content (lookup etc.). I know that I'll have to remove the active content, but beyond that I'm pretty clueless as to what tools I should use for the conversion. I'd like to get a TOC like yours, where you first select the character and then get a list of topics starting with that character, but I don't know how to do that (apart from writing the HTML page by hand, which would be a major pain in the ass).

So, my questions are:
- What did you use to parse the HTML files? I'm assuming some scripting language?
- What program did you use to build the Mobi file from the multiple HTML files?

Thanks in advance for your answers.

For the base conversion I wrote specifications for JFlex, which generated Java code from them. Each specification was basically a list of regular expressions, each paired with some Java code to execute when that expression is matched in the source HTML file. If you know Java and regular expressions, you can use JFlex (or C and flex, for that matter).
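To give a feel for the "regular expression plus action" idea, here is a rough sketch in Python rather than JFlex. It is not the actual Factbook code: the patterns, the `RULES` list and the `clean` function are all made up for illustration. It also differs from a real lexer, which scans the input once and picks the longest match; this version simply applies each rule in turn over the whole text.

```python
import re

# Each rule pairs a regular expression with an action that returns the
# replacement text. The patterns are illustrative guesses, not the rules
# actually used for the Factbook conversion.
RULES = [
    # Drop scripts (active content) entirely.
    (re.compile(r"<script\b.*?</script>", re.I | re.S), lambda m: ""),
    # Drop HTML comments.
    (re.compile(r"<!--.*?-->", re.S), lambda m: ""),
    # Strip <font> tags but keep the text inside them.
    (re.compile(r"<font\b[^>]*>|</font>", re.I), lambda m: ""),
    # Rewrite <b>...</b> as <strong>...</strong>.
    (re.compile(r"<b\b[^>]*>(.*?)</b>", re.I | re.S),
     lambda m: "<strong>" + m.group(1) + "</strong>"),
]


def clean(html):
    """Apply every rule, in order, to the whole document."""
    for pattern, action in RULES:
        html = pattern.sub(action, html)
    return html
```

Adding a cleanup step is then just a matter of appending another (pattern, action) pair to the list.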

For the finishing touches I used TextPad. Its search functions support regular expressions, and it can work on several hundred open files at once.

The TOCs didn't quite have to be done by hand. One of the appendices already had one. After changing it to a form I prefer, I copied it to the other files. The anchor tags did have to be put in by hand, though.
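If there is no ready-made TOC to start from, a letter index of the kind asked about (pick a character, then see the topics starting with it) can also be generated by a short script. The Python sketch below is only an illustration, not what was used for this book; `build_toc`, the `toc-` anchor prefix and the (title, href) input format are assumptions, and titles are assumed to be non-empty.

```python
from collections import defaultdict
from html import escape


def build_toc(entries):
    """Build a two-level TOC from (title, href) pairs and return it as HTML."""
    # Group the entries by the first character of the title.
    groups = defaultdict(list)
    for title, href in entries:
        groups[title[0].upper()].append((title, href))

    letters = sorted(groups)
    # First level: one link per starting character.
    lines = ["<p>" + " ".join(
        f'<a href="#toc-{c}">{c}</a>' for c in letters) + "</p>"]
    # Second level: the topics under each character, sorted by title.
    for c in letters:
        lines.append(f'<h2><a name="toc-{c}"></a>{c}</h2>')
        lines.append("<ul>")
        for title, href in sorted(groups[c]):
            lines.append(
                f'<li><a href="{escape(href)}">{escape(title)}</a></li>')
        lines.append("</ul>")
    return "\n".join(lines)
```

For example, `build_toc([("France", "fr.html"), ("Algeria", "ag.html")])` gives a letter bar with A and F, followed by one list per letter.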

I then used Mobipocket Creator to make the ebook. The user interface leaves something to be desired, but given that it saves you the effort of manually creating the OPF file, it's not bad.

I have used Mobipocket Creator to add hyperlinks using its Table of Contents section. The external TOC file it creates can then be merged in and edited down so that only the headings remain in the TOC. It's a very powerful resource to add to one's toolset. The exact syntax I used is in the screenshot attached below.

The side effect is that all the inserted <a name> tags can then be referenced from within the ebook. The referencing may have to be done by hand (I semi-automated this), but half the task is already done: the insertion of the <a name> (or <a id>) tags and the assignment of unique id labels.
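For those without Mobipocket Creator, the anchor-insertion half can be approximated with a small script. This is only a sketch of the idea, not what Creator does internally; the `add_anchors` function, the `sec` prefix and the four-digit numbering are assumptions made for the example.

```python
import re

# Matches the opening tag of any heading, h1 through h6.
HEADING = re.compile(r"<(h[1-6])\b([^>]*)>", re.I)


def add_anchors(html, prefix="sec"):
    """Insert a uniquely named <a name> right after every heading tag.

    Returns the modified HTML and the list of labels that were assigned,
    so the labels can later be used as link targets.
    """
    counter = 0
    labels = []

    def repl(match):
        nonlocal counter
        counter += 1
        label = f"{prefix}{counter:04d}"
        labels.append(label)
        # Keep the original heading tag and append the anchor to it.
        return f'{match.group(0)}<a name="{label}"></a>'

    return HEADING.sub(repl, html), labels
```

The returned label list is what you would feed into whatever builds the links pointing at those anchors.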

This was the technique I used to add all those new hyperlinks to the Webster's Dictionary 1913 v2.0. (A version 2.1 with minor improvements will be uploaded soon.)

First off, that's a great idea!
Perhaps I'll work on an LRF version of this book myself, for fun, and share it.

I just wanted to remind you of the line in the copyright which states:

Code:

"...The official seal of the CIA, however,
may NOT be copied without permission as required by the CIA Act of 1949 (50 U.S.C. section 403m).
Misuse of the official seal of the CIA could result in civil and criminal penalties...."

The rest of the book is in the public domain.

So be sure you don't put the seal in your book!
Otherwise thanks for the effort! Looks like an interesting book to assemble!

If you promise to put real effort into making the ebook look nice, I'll give you my working files.

No, really, I just do this for the fun of HTML encoding; I'm learning it.
But tell me: did you take the print version, create one big HTML file, and add the flag and flag info to every country?

Unfortunately, in LRF I can't keep the original formatting. I was thinking along the lines of creating one chapter per country, with subchapters for the flag, flag info, and map, followed by all the other information.

What approach did you use? (I'd be interested to see how you did it.)
I was basically merging all the info into one big file, cleaning it up a bit with Notepad++'s advanced search and replace, and then adding all the flag files manually (basically some copy-and-paste work).

I used the print version, because it's cleaner to work with than the web version.
(Oh, I also need to remove the tables, which is a bit of a pain; I'm still looking into that. It's easy to remove them with search and replace, but I don't want to delete any valuable info or end up with broken HTML code.)
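One way to drop the table markup without losing the cell text is to turn each row into a paragraph. The Python sketch below is a rough illustration under simplifying assumptions: `flatten_tables` is a made-up helper, the " | " separator is arbitrary, and it assumes simple, non-nested tables whose cells contain only inline content. Nested tables or block-level content inside cells would still need manual attention.

```python
import re


def flatten_tables(html):
    """Replace table markup with plain paragraphs, keeping the cell text."""
    # Put a separator between adjacent cells of the same row.
    html = re.sub(r"</t[dh]>\s*(?=<t[dh]\b)", " | ", html, flags=re.I)
    # Each table row becomes one paragraph.
    html = re.sub(r"<tr\b[^>]*>", "<p>", html, flags=re.I)
    html = re.sub(r"</tr>", "</p>", html, flags=re.I)
    # Remove whatever table tags are left; the text between them stays.
    html = re.sub(r"</?(?:table|thead|tbody|tfoot|td|th)\b[^>]*>", "",
                  html, flags=re.I)
    return html
```

Running it on a copy of the files first and comparing the text before and after is a cheap way to check that nothing valuable disappeared.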

Then at the end I still need to add the appendixes and the rankorder directory (2001rank.html to 2211rank.html). I'm still figuring out how to do that and analyzing its content.

I found it better to keep the files separate. There are about 9 single files, and 2 groups of files (260 country pages and 250 flag pages). The single files have to be edited one at a time, but each group of files can be edited at once.

I started with the web version, but kept none of the original formatting. Instead, I replaced it with some very basic HTML tags.

The formatting of each group is internally consistent. When you figure out what looks best on the Sony Reader, you can change it all at once. If you instead copy everything into one file, you will need to edit it in a linear fashion.
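Editing a whole group at once does not require a particular editor. As a rough sketch, the same regular-expression replacement can be applied to every file in a folder with a few lines of Python. The `edit_group` helper, the folder layout and the UTF-8 encoding are assumptions for this example; check the real encoding of your files and work on a backup copy.

```python
import re
from pathlib import Path


def edit_group(folder, glob, pattern, replacement):
    """Apply one regex replacement to every matching file in a folder.

    Returns the names of the files that were actually changed.
    """
    regex = re.compile(pattern)
    changed = []
    for path in sorted(Path(folder).glob(glob)):
        text = path.read_text(encoding="utf-8")
        new_text = regex.sub(replacement, text)
        # Only rewrite files where the replacement made a difference.
        if new_text != text:
            path.write_text(new_text, encoding="utf-8")
            changed.append(path.name)
    return changed
```

A call such as `edit_group("countries", "*.html", r"<h3>", "<h2>")` (folder name hypothetical) would then restyle every country page in one go.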

It's going to take you at least 20 hours of work to get the source material to where I have it. Editing it one line at a time is really boring.