Computing, archiving, digital media, and a bit of historical speculation

Friday, 13 November 2009

Making epubs from PDFs

As I've said earlier, I've found that epub files are definitely easier to work with on e-readers than PDFs.

My initial thought was that, given I've a reasonable amount of stuff in PDF format, I should convert it. But how?

PDF files are essentially modified PostScript with some embedded metadata, whereas epub is a zip-based format containing a manifest, CSS for formatting, and the document source material in XHTML. Conceptually it is not unlike an OpenOffice document file in structure.
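To make that structure concrete, here is a rough Python sketch that assembles a bare-bones epub container using only the standard library. The function name `build_minimal_epub`, the file names under `OEBPS/`, the placeholder identifier and the one-line stylesheet are all my own illustrative choices rather than anything a particular tool produces. It also leaves out the NCX table of contents that EPUB 2 readers expect, so treat it as a picture of the layout rather than a validated book.

```python
import zipfile
from xml.sax.saxutils import escape

# Points the reader at the package (manifest) file.
CONTAINER_XML = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""

# The manifest: metadata, the list of files, and the reading order (spine).
CONTENT_OPF = """<?xml version="1.0" encoding="UTF-8"?>
<package xmlns="http://www.idpf.org/2007/opf" version="2.0" unique-identifier="bookid">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:title>{title}</dc:title>
    <dc:language>en</dc:language>
    <dc:identifier id="bookid">{identifier}</dc:identifier>
  </metadata>
  <manifest>
    <item id="chapter1" href="chapter1.xhtml" media-type="application/xhtml+xml"/>
    <item id="style" href="style.css" media-type="text/css"/>
  </manifest>
  <spine>
    <itemref idref="chapter1"/>
  </spine>
</package>
"""

CHAPTER_XHTML = """<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
  <title>{title}</title>
  <link rel="stylesheet" type="text/css" href="style.css"/>
</head>
<body>
{body}
</body>
</html>
"""

DEFAULT_CSS = "body { font-family: serif; margin: 1em; }\n"  # hypothetical default style


def build_minimal_epub(target, title, body_xhtml,
                       identifier="urn:uuid:00000000-0000-0000-0000-000000000000"):
    """Write a skeletal epub to `target` (a path or a binary file-like object).

    `body_xhtml` is assumed to be well-formed XHTML markup already.
    """
    safe_title = escape(title)
    with zipfile.ZipFile(target, "w", zipfile.ZIP_DEFLATED) as zf:
        # The mimetype entry has to come first and be stored uncompressed.
        zf.writestr("mimetype", "application/epub+zip",
                    compress_type=zipfile.ZIP_STORED)
        zf.writestr("META-INF/container.xml", CONTAINER_XML)
        zf.writestr("OEBPS/content.opf",
                    CONTENT_OPF.format(title=safe_title, identifier=identifier))
        zf.writestr("OEBPS/chapter1.xhtml",
                    CHAPTER_XHTML.format(title=safe_title, body=body_xhtml))
        zf.writestr("OEBPS/style.css", DEFAULT_CSS)
```

Calling `build_minimal_epub("demo.epub", "Demo", "<p>Hello</p>")` should leave a zip file you can open with any archive tool to see the pieces described above.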

My initial thought experiment, based in part on a very useful howto on hand creation of epub files, was to write a print driver (ok, a ppd) to print the PDF to XHTML using public domain pdf-to-text and pdf-to-html code, apply a default style, and create a manifest based on the embedded metadata.
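The middle of that pipeline might look something like the Python sketch below. It assumes the text has already been pulled out of the PDF by some external pdf-to-text tool and that the embedded metadata is available as a plain dictionary keyed by the usual PDF info fields (Title, Author, Subject). The helper names `text_to_xhtml` and `metadata_to_dc`, the `style.css` reference and the mapping onto Dublin Core elements are my own assumptions for illustration, not part of any existing converter.

```python
import re
from xml.sax.saxutils import escape

XHTML_TEMPLATE = """<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
  <title>{title}</title>
  <link rel="stylesheet" type="text/css" href="style.css"/>
</head>
<body>
{body}
</body>
</html>
"""

# Assumed mapping from PDF info fields to Dublin Core elements in the manifest.
PDF_TO_DC = {
    "Title": "dc:title",
    "Author": "dc:creator",
    "Subject": "dc:description",
}


def text_to_xhtml(text, title):
    """Wrap extracted plain text in XHTML, one <p> per blank-line-separated block."""
    blocks = [b for b in re.split(r"\n\s*\n", text) if b.strip()]
    paragraphs = []
    for block in blocks:
        # Rejoin hard-wrapped lines into one paragraph and escape &, < and >.
        joined = " ".join(block.split())
        paragraphs.append("<p>{}</p>".format(escape(joined)))
    return XHTML_TEMPLATE.format(title=escape(title), body="\n".join(paragraphs))


def metadata_to_dc(pdf_info):
    """Turn a dict of PDF info fields into Dublin Core lines for the manifest."""
    lines = []
    for pdf_key, dc_tag in PDF_TO_DC.items():
        value = pdf_info.get(pdf_key)
        if value:  # skip fields that are missing or empty
            lines.append("<{0}>{1}</{0}>".format(dc_tag, escape(value)))
    return "\n".join(lines)
```

Splitting on blank lines is a crude guess at paragraph boundaries, and it is exactly the sort of thing a real converter would need to do much more cleverly, since extracted PDF text tends to lose headings, columns and reading order.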

However, Stanza also allows PDF files to be saved as epub documents. Given that they already have the technology, I suspect their epub conversion is a little more sophisticated than anything I'd put together, both because their native format is epub and because they are now an Amazon subsidiary.

