PDFRead is a tool for converting non-DRMed PDF and DJVU documents for reading on eBook devices. It does this by creating an image out of each page, enhancing the image and then collating the images in a device-specific format.

PDFRead was created by Ashish Kulkarni and announced, over a year ago, in the thread 'PDFRead 1.7 released'.

I (Nick Rapallo) have been hacking PDFRead v1.7 since fall 2007 (prior to my join date here) and became a developer along with Ashish Kulkarni.

REQUIRED: You must have the (free) eBook Publisher software previously installed to facilitate the conversions to .imp and .oeb. You can install the eBook Publisher software by going here. Then choose to download and install the current version ( Win_eBookPub_2.2.5.exe ).

EDIT: Note you MUST enter your own Title, Author and Category in the GUI screen for the Conversion to begin, otherwise it won't start. (If you don't really need them, just enter 1, 2, 3 or T, A, C.)

I have implemented some enhancements and fixed minor bugs in PDFRead 1.8.2

Changes in this release:

Changelog [2008-04-16] 1.8.2 (by NR)

• added an 'imgdir' In Format where you can select any image in a directory and have all images (files) in that directory loaded. This is similar to an 'imglist' but creates its own list of filenames without needing a (previously created) text file.
• for .prc output, removed the previous limit on image size (480 max. width); a modified 'html2mobi.exe' program is now used, so large white margins should no longer appear. Cybook Gen 3 users are cautioned that images larger than 480x640 may crash the ereader, so please limit the Size H: and V: values! An alternative solution exists with 'mobi2mobi --gen3imagefix' offered by tompe's MobiPerl.
• remembers last 'Output' directory upon startup, but you will need to edit destination filename or may overwrite previous output. To reset it, just type 'default'.

Previous changes:

[2008-03-30] 1.8.1 (by NR)

• can now install PDFRead to a different drive than your C: drive; just keep the same subdirectory structure for the GUI options file to be loaded properly.
• now uses, as a default, the input filename as the output filename (without file extension).
• added .prc output format support using opf2mobi.exe from Mobiperl by tompe on mobileread.com.
• added .cbz/.cbr input support for Comic books using unrar.exe and creating a (sorted) list of image filenames.
• tweaked and added Profiles for PRS500, PRS505, PRC-Mobi (Kindle and Cybook Gen 3), iLiad. On REB1200 only provides a 2 pixel left and right margin to avoid bleeding into the edge of the screen.
The default Profiles are:

• added command-line option '-r' to indicate rotation; '--colorspace' to specify gray or color output; '--color' to override number of colors used; '--overlap_h' and '--overlap_v' to override default overlap between pages.
• added 'colorspace' type to specify output color: gray (max. 16 shades) or rgb (max. 256 colors from 16M)
• added 'color' as an option to use images with fewer colors and thereby reducing output file size proportionately.
• fixed imglist option to allow for relative files to the directory where the text list resides; no longer need full pathnames. (DaleDe's suggestion)
• fixed problem with (broken) list generation introduced by eBook Publisher.
• placing an empty file called 'debug' in the PDFRead home directory will allow the temp directory to not be deleted at completion.

I will continue to maintain PDFRead, hopefully only minor bug fixes and/or enhancements will be needed.

TO DO:
- add GUI option to select between MinFilter 3 (orig) or MinFilter 5 (new) dilate.
- add Mini tutorial to get the best use out of converting into supported ebook formats for the various eBook reader devices.

EDIT: 8 Mar 2009 - FOR KINDLE/iLIAD/CYBOOK USERS (fixed that .jpg quality compromise imposed on .prc files!) After executing the PDFRead Installer, unzip the modified NRhtml2mobi.exe from NRhtml2mobi.zip into the bin directory and overwrite the existing file. It's a hack that may render the .prc unreadable on Palm PDAs or even the Cybook Gen 3. In those cases, use mobi2mobi with the --gen3imagefix switch as indicated above.

Note: A Kindle 2 specific resolution (480x622) has been found to best work with no blank pages in between.

EDIT: 7 Jan 2011 - A (original).pdf** to (enhanced/cropped).pdf method has been devised, but not yet included within the PDFRead GUI program. It's available in this post. Contains a modified pdfread.exe executable that now limits the expansion of small cropped pages to a more reasonable level (finally!).

**Actually, you can use any Input Format (.PDF / .DJVU / .TIFF / .CBZ / .CBR ) in lieu of just .PDF!

The 'Profile' box contains the default settings for each "device". If you select a profile from the drop-down box, then the various default options are loaded in. Afterwards, 'Processing' options can be selected to override these defaults. The 'Out Format' box selects the resulting ebook format to be generated. It also overrides the default settings.

The 'reb1200C' Profile retains 256 colors (but uses 16M colors prior to doing the conversion) while the 'reb1200' (note no 'C' suffix) produces a 16 grayshade image. A profile name with a '-p' suffix denotes a Portrait one; without the suffix, the profile defaults to landscape.

FAQ: When I view a 1150 .IMP why does it open up as a 1200 .IMP?

What can go wrong is a 'bug' that has reared its head before, namely the 1150 .imp "thinks" it's a 1200 .imp. The solution, in the past, has been to re-install the eBook Publisher software. That usually cures the problem!

GEB Librarian can get confused if BOTH the 1150 .imp and 1200 .imp are in the same bookshelf when being uploaded. I think it has to do with the same 'Unique Edition ID' being used for both versions made from the same input. The solution here is to NOT put BOTH in the same bookshelf; use one for your 1150 ebooks and another for the 1200 ebooks.

FAQ: I am converting to .imp output format, but I get an error from the 'generate_imp' module. Why does it fail?

REQUIRED: You must have the (free) eBook Publisher software previously installed to facilitate the conversions to .imp and .oeb.

You can install the eBook Publisher software by going here. Then choose to download and install the current version ( Win_eBookPub_2.2.5.exe ).

FAQ: Why does PDFRead create stray pages or small leftover bits?

When rotating your input, if you want the output to be split over just two page flips, then select 'landscape-half' as your Layout Mode. For everything on one page only, select 'landscape-full'. The default 'landscape' mode will split over the maximum number of page flips so that the full width is displayed. There is also 'landscape-2col' which will display quadrants of the page over four page flips.

FAQ: What is an 'imglist' in the 'In Format' drop-down box?

To use the 'imglist' input format, just create a text file listing the filenames in any image directory; the result is a mini color photo album. The easiest way to get this list of filenames is to open the command prompt in your image directory and issue the DOS command:

Code:

dir /b /on >list.txt

Then open list.txt from PDFRead. Happy Converting!

FAQ: What is an 'imgdir' in the 'In Format' drop-down box?

To use the 'imgdir' input format just select any filename in an image directory to get a mini color photo album without the need for a text file as above for 'imglist'. The only drawback is that there is no control over the order that the images are compiled this way. This method always sorts the filenames alphabetically. Image formats supported are .jpg, .png, .gif, .bmp and .tif.
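If the alphabetical order of 'imgdir' is the problem (page10.jpg lands before page2.jpg), an 'imglist' text file gives you the control back. Here is a minimal Python sketch of that idea, not PDFRead's own code: the helper names (list_images, write_imglist) and the "natural" sort are my assumptions for illustration, and only the extension list comes from the FAQ above.

```python
import re
from pathlib import Path

# Extensions taken from the 'imgdir' FAQ above.
IMAGE_EXTS = {".jpg", ".png", ".gif", ".bmp", ".tif"}


def natural_key(name):
    """Split 'page10.jpg' into ['page', 10, '.jpg'] so that 2 sorts before 10."""
    return [int(part) if part.isdigit() else part.lower()
            for part in re.split(r"(\d+)", name)]


def list_images(directory, natural=False):
    """Return image filenames in a directory, in alphabetical or natural order."""
    names = [p.name for p in Path(directory).iterdir()
             if p.is_file() and p.suffix.lower() in IMAGE_EXTS]
    return sorted(names, key=natural_key if natural else str.lower)


def write_imglist(directory, list_name="list.txt"):
    """Write one relative filename per line (hypothetical helper, not part of PDFRead)."""
    names = list_images(directory, natural=True)
    (Path(directory) / list_name).write_text("\n".join(names) + "\n")
    return names
```

Calling write_imglist on your image directory leaves a list.txt there that can be opened with the 'imglist' In Format. Since the list only holds image files, list.txt itself never ends up in the list.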

FAQ: The resulting text is too blurry! Can anything be done?

• You can choose a landscape mode to improve the clarity/resolution.

• You can reduce the amount of dilation (thickening of the text) by either turning it off with the 'no dilation' option or by increasing the DPI from 300 to say 600.

• If your results are too blurry, try increasing slowly the 'Error Level' as well as trying the above. With the GUI, you can do multiple tests (especially on a limited number of pages) until you get it just right. Then do it again with no pages restriction.

FAQ: Can PDFRead convert any .pdf?

If the .pdf is not built properly (you may have to input the number of pages in the output screen), or is encrypted or has security preventing printing/extraction, then it will not work with PDFRead. The .pdf must be free of any DRM prior to being given to PDFRead.

FAQ: Can PDFRead convert (original).pdf** to (enhanced/cropped).pdf?

To facilitate easy conversion of the intermediate images created by PDFRead into .pdf, I have written a simple batch (sam2pdfread.bat) file that can be placed in the temporary directory (created when an empty file 'debug' is placed in the PDFRead install directory).

To accomplish this (original).pdf to (enhanced/cropped).pdf, do the following:

1. Unzip this file and copy all the files into the PDFRead/bin directory in the default install location (or copy the *.exe programs to a directory in your Windows path).

2. Start PDFRead using the .pdf 'In Format' and .html 'Out Format'.

3. When PDFRead is finished, the .html 'Out Format' will cause the temporary directory with the enhanced/cropped images to be visible.

5. The resulting .pdf will have the name of the InputFilename (actually the resulting .html in that temp dir) and should be moved to a more permanent location. The temp dir can then be deleted.

Note: Doubles the memory storage required as each image is converted to a .pdf while retaining the original image.

The 'sam2pdfread.bat' can be edited to tweak the parameters passed to sam2p and/or pdftk. Just experiment what works best for you!

EDIT: 7 Jan 2011 Fixed 'sam2pdfread.bat' to correct some sam2p housekeeping issues as well as recompiled pdfread.exe to pad image filenames with leading zeros so that combining *.pdf works properly. This now includes a newer version of pdfread.exe and library.zip so you may want to make backups of the copies in your 'bin' folder before copying these files over.
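For the curious, here is why the leading zeros matter when combining *.pdf: a wildcard hands files over in name order, so page10 would come before page2 unless the numbers are padded. This is only a small Python sketch of the idea; the 'page' stem and '.png' extension are made up for illustration and are not the names pdfread.exe actually uses.

```python
def padded_name(index, total, stem="page", ext=".png"):
    """Build a filename whose number is zero-padded to the width of the total.

    The stem and extension are hypothetical placeholders.
    """
    width = len(str(total))
    return f"{stem}{index:0{width}d}{ext}"


# Unpadded names sort in the wrong order for a 12 page book ...
unpadded = sorted(f"page{i}.png" for i in range(1, 13))
# ... while padded names sort in true page order.
padded = sorted(padded_name(i, 12) for i in range(1, 13))
```

With padding, a plain alphabetical sort and the real page order are the same thing, which is all a *.pdf wildcard needs.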

EDIT: 7 Jan 2011 Also, while I was recompiling pdfread.exe I fixed a LONG time irk for me, the expansion of small crops into huge images. This time around, any small cropped image will not increase in size more than 10% of the original image. A good tradeoff! All modified files are included in the 'sam2pdfread.rar' available here.

**Actually, you can use any Input Format (.PDF / .DJVU / .TIFF / .CBZ / .CBR ) in lieu of just .PDF!

TIP: Just hover your mouse over any of the options and a nice tooltip help will pop up if you keep it there steady. Great way to learn the program hands-on!

I tried to convert some files for my Ebookwise1150. I tried several different "output format" parameters: first IMP1, then IMP2. Both generated GEB1200 format IMPs, according to GEB eBook Librarian, and both were unreadable on my 1150. I tried this both from the GUI and from the commandline, same results.

I had not tried any earlier version of the program before this.

I then generated OEB format, and unpacked it in ETI's eBook Publisher, then generated Grayscale VGA-half .. these files worked fine on my 1150. So I have an "out".

The return of landscape is blessedly welcome....

[EDIT] Never mind, I figured out what I was doing wrong. I need to specify "gray" as the color as well as "IMP2" as the output format. All is well.

Last edited by rpresser; 03-29-2008 at 09:14 PM.
Reason: figured out my problem

Same for me. The program correctly makes ETI-2 sized PNGs when I choose IMP2 in the GUI, but it always puts them into an ETI-1 IMP at the last step. I get around it by choosing HTML and then building the IMP in Gemstar Publisher. It's done this for some versions.

The Profile box contains the default settings for each "device". If you select the 'ebw1150' profile, then the Colorspace should default to (16 shades) gray. The 'Out Format' box can also override those default settings. The ebw1150 (or ETI-2) uses the 'imp2' format. Once these are straightened out, you should be able to generate the .imp to properly view on your EBW1150.

What can go wrong is a 'bug' that has reared its head before, namely the 1150 .imp opens up as a 1200 .imp. The solution, in the past, has been to re-install the eBook Publisher software. That usually cures the problem!

Another problem could be that GEB Librarian can get confused if BOTH the 1150 .imp and 1200 .imp are in the same bookshelf when being uploaded. I think it has to do with the same 'Unique Edition ID' being used for both versions made from the same input. The solution here is to NOT put BOTH in the same bookshelf; use one for your 1150 ebooks and another for the 1200 ebooks.

As for landscape mode, if you want the output to be split over just two page turns, then select 'landscape-half' as your Layout Mode. For everything on one page only, select 'landscape-full'. BTW, the 'landscape' default will split over the maximum number of page turns so that the full width is displayed.

Same for me. The program correctly makes ETI-2 sized PNGs when I choose IMP2 in the GUI, but it always puts them into an ETI-1 IMP at the last step. I get around it by choosing HTML and then building the IMP in Gemstar Publisher. It's done this for some versions.

Ditto!

What can go wrong is a 'bug' that has reared its head before, namely the 1150 .imp opens up as a 1200 .imp. The solution, in the past, has been to re-install the eBook Publisher software. That usually cures the problem!

TO DO:
- add PDFRead source to MobileRead Dev Hub
- add Mini-Tutorial to get the best use out of converting PDF/DJVU/TIFF/Imglist/CBZ/CBR into ebook format IMP/RB/LRF/PRC/OEB/HTML for devices like EBW1150/REB1200/PRS-500/PRS-505/KINDLE/CYBOOK GEN 3/ILIAD (run-on sentence?)

I have implemented many enhancements and fixed minor bugs in PDFRead 1.8.1 (see post#1 above in this thread)

I have used this to convert several large PDF books (400-800 Pages) that I downloaded from Google Books. I have used your program in both Vista and XP to convert to LRF format for my Sony prs505 and it works very well in both. I have had a couple books that the program hung up on, usually after 30 or so pages for some reason ?? Thank You for a wonderful program.

Jim

Glad you found it useful!

I'm always looking to tweak the internal Profiles; just to make the program that much better!

I want to produce the largest possible image proportional to the devices' screen without any resizing/zooming effects.

May I ask if the resulting LRF adequately "fills" the screen of the Sony PRS-505 or is there a bit too much (white) margin i.e. top/bottom or left/right?

The books I have been converting seem to vary since they are all scanned books and the 'person' doing the scanning doesn't take great care in making them look nice. But I would generally say that there is more white space at the top and to the right, sometimes losing a line at the bottom. I should add that I have been converting to prs505 portrait format.

Jim

Last edited by themoores1us; 04-08-2008 at 04:44 PM.
Reason: more info

I would treat a big white top margin with full extents, left and right, as normal, but some leftover white on the right may indicate there are scanning artifacts on the page interfering with the internal cropping function that removes as much of the white space as possible (while retaining the same aspect ratio). You could try entering Size H: and V: values different from 583 and 753 to help squeeze the white out! In the end, if the scanned page in portrait mode is not in the same aspect ratio as the 505 screen, then it may be NORMAL to have a small amount of white on top and to the right, so nothing to fret about!
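To show why some white is NORMAL when the aspect ratios differ, here is a rough Python sketch of an aspect-preserving fit into the 583 x 753 size mentioned above. It is not PDFRead's actual cropping/resizing code, and the function name fit_within is just something I made up for the example.

```python
def fit_within(src_w, src_h, box_w=583, box_h=753):
    """Scale a cropped page to fit inside the box while keeping its aspect ratio.

    Returns (out_w, out_h, spare_w, spare_h); the spare values are the
    pixels of the box left uncovered, which show up as white margin.
    """
    scale = min(box_w / src_w, box_h / src_h)
    out_w, out_h = round(src_w * scale), round(src_h * scale)
    return out_w, out_h, box_w - out_w, box_h - out_h
```

A crop of 1166 x 1506 has the same shape as the box and fills it exactly, while a taller, narrower crop such as 1000 x 1500 is limited by its height and leaves spare width. Changing the Size H: and V: values changes the shape of the box, which is why it can help squeeze the white out.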

You may greatly benefit from the 'unpaper' option. If you click the word 'unpaper' on the GUI option screen, your browser will be directed to a website detailing the benefits of using 'unpaper'. It basically tries to counteract the effects of "bad" scanning errors (dark areas on the side, skewed pages, etc).

IT IS VERY TIME CONSUMING THOUGH! Several minutes per page!

To activate it, just place parameters (like '-v' for now) in the input box beside the word 'unpaper' on the GUI option screen. You should only test it on a specific range of page numbers, i.e. 1 to 5 (which may take half an hour!)

Then refine the unpaper input parameters so as to only perform the operations you need. I've only done a few tests and know it can automatically deskew (straighten) pages, but have not done this for an entire book, let alone a 300 page pdf! (Let's see, 5 min per page x 300 page pdf = 1500 min / 60 min per hour => 25 hours or a day!)
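If you want to redo that back-of-the-envelope estimate for your own book, the arithmetic is simple enough to script. A tiny Python sketch follows; the 5 minutes per page is just my rough figure from above and will vary with your machine and unpaper parameters.

```python
def unpaper_estimate_hours(pages, minutes_per_page=5):
    """Estimate total unpaper processing time in hours.

    minutes_per_page defaults to the rough figure quoted above;
    treat it as an assumption, not a measurement.
    """
    return pages * minutes_per_page / 60


full_book = unpaper_estimate_hours(300)   # the 300 page example
test_range = unpaper_estimate_hours(5)    # a 5 page test run
```

Time a short test range first, then plug your own minutes-per-page figure in before committing to a whole book.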

Last edited by nrapallo; 04-08-2008 at 06:24 PM.
Reason: clarified top and right margins may not be a problem

I have downloaded this programme. It looks just what I needed. I have Sony prs505, but am in the UK so can't login to buy books from official site. I purchased a book from a uk site in adobe pdf format. I tried to convert using this programme, and for a while there I thought I had done it....but the file size was only 1kb. When the conversion was running it said it couldn't determine amount of pages. When I typed amount in it seemed happy, but as each page was processed it said it was blank. What have I done wrong?

Just to make sure PDFRead is working properly, could you try converting any sample (non-ebook) pdf you have? Does it work for other ebooks you have?

Also, does the filename of your purchased .pdf contain "troublesome" characters, like foreign character or strange quotes?

Now with that out of the way, if the number of pages is not determinable, then there could be issues with that pdf.

Do you know if that .pdf was created with Acrobat version 8 (when loaded check 'PDF Version' in the 'Document Properties' under the File menu of the Adobe viewer)? If so, there are issues with this recent version!

Please note that if you are comfortable at the command line/dos prompt, then in the bin directory of the PDFRead installed directory, there is a program called pdftk.exe . Just type the command below to get the same info as above:

Code:

pdftk "c:\directory\your purchased.pdf" dump_data
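If you would rather check the page count from a script than read through the dump by eye, something like the following Python sketch should do. It assumes pdftk is on your path (or you pass the full path to pdftk.exe in the PDFRead bin directory) and relies on the 'NumberOfPages:' line that dump_data prints; the function names are mine, not part of PDFRead.

```python
import subprocess


def parse_page_count(dump_text):
    """Return the NumberOfPages value from pdftk dump_data output, or None."""
    for line in dump_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "NumberOfPages":
            return int(value.strip())
    return None


def page_count(pdf_path, pdftk="pdftk"):
    """Run 'pdftk <file> dump_data' and return the page count, or None on failure."""
    result = subprocess.run([pdftk, pdf_path, "dump_data"],
                            capture_output=True, text=True)
    if result.returncode != 0:
        return None
    return parse_page_count(result.stdout)
```

A result of None means pdftk could not read the file or reported no page count, which is a good hint that PDFRead will not be able to determine the number of pages either.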

In the end, if the pdf is not built properly or is encrypted or has security preventing printing/extraction, then it will not work with PDFRead.