Hey, firstly let me thank you for this amazing program. It makes reading PDFs on a 6-inch e-ink device realistic. Most of my books come straight from Amazon, but I have about eight PDFs aside from that. I'm trying to optimize my small collection for my Kindle, but I'm running into some problems with k2pdfopt. Basically, I use Briss to crop my PDFs, Acrobat to remove any hidden text, k2pdfopt to format (output as PDF), and finally calibre to convert. So far, I've managed to convert only one document successfully. Two or three produce the following two issues in k2pdfopt: uneven text sizes at random points (I already tried -col 1 to prevent this) and large images or tables being split up. Here is an example of what I'm talking about:

Any clues as to what can be done to prevent this? If I could get k2pdfopt to deal with this issue, I would be able to successfully convert the rest of my PDFs.

... I'm running into some problems with k2pdfopt. ... Any clues of what can be done to prevent this?

Can you post or send me the source PDF file (or enough of it to reproduce your issues) and the command you are using to convert it? It looks pretty clean, so I'm hoping some fine tuning of the conversion options will help. You can send me a private message if you don't want to post the PDF file in this thread.

Anything new on the "no vertical space before a title" issue? If it's still considered an issue, of course. If needed, I can PM you some samples which exhibit the same behavior. Or is it a structural thing (if wrapping is off, big blocks are simply glued one after another, without looking into them)? TY

Thank you for taking the time to help me. In general, I've been using -n -m 0 -col 1 for my conversions. I've extracted roughly ten pages from each document that seem to produce the error. The first one is a pretty well known text by Tae Kim found on his website. Take note of how the list on the first page is uneven after conversion and the tables are split throughout. The table issue is much more prevalent on the other two documents. On the C++ document, I end up getting pages of distorted images. Hope that helps.

I know what the issue is, so you don't need to send more examples. It's exactly what you suggested--big blocks getting appended to each other without looking inside of them for line spacings. I've got the suggestion on my "to do" list. I'm not sure exactly when I'll get out the next release at this point. I've put a lot of time into k2pdfopt in the last few months and I'm taking a bit of a breather for now, but I still want to keep it progressing.

Thank you for taking the time to help me. In general, I've been using -n -m 0 -col 1 for my conversions....

Try using -mode fw (fit width):

k2pdfopt -mode fw japanese.pdf

You can see what the -mode option does on my usage page, or take a look at the native output help page. If you don't want the output rotated, add -ls-. The -vb option (which is applied by -mode fw), when supplied with a negative value, keeps the scaling more uniform. [Yet another option is to set -odpi to a lower value so that k2pdfopt doesn't over-magnify certain regions.] For documents like the ones you're reading--single column with lots of special spacings, figures, and indentation--the -mode fw option is best if it produces output large enough to read on your e-reader. If you need higher magnification than -mode fw gives, then you should try the default conversion (no arguments), which will allow text wrapping, but you'll probably get some occasional strange formatting, particularly on the Chem book, which has stuff in the margins--that's something k2pdfopt doesn't handle well.
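If you end up trying several combinations of these options, it can help to assemble the command line in a small script. The following is only a sketch: the helper name `k2pdfopt_command` and its parameters are made up for illustration, and it uses nothing beyond the options mentioned above (-mode fw, -ls-, -odpi). It builds and prints the command without running k2pdfopt.

```python
import shlex

def k2pdfopt_command(source_pdf, rotate=True, odpi=None):
    """Build a k2pdfopt argument list (hypothetical helper, not part of k2pdfopt)."""
    args = ["k2pdfopt", "-mode", "fw"]
    if not rotate:
        # -ls- is the option suggested above for unrotated output.
        args.append("-ls-")
    if odpi is not None:
        # A lower -odpi is suggested above to avoid over-magnification;
        # the value itself is left to the caller.
        args += ["-odpi", str(odpi)]
    args.append(source_pdf)
    return args

# Print the command for review; pass the list to subprocess.run to execute it.
print(shlex.join(k2pdfopt_command("japanese.pdf", rotate=False)))
```

Keeping the arguments as a list avoids quoting problems with file names that contain spaces.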

For the C++ doc, set the white threshold to a high value so that anything that isn't perfectly white is considered "not white":

-wt 250

That will do much better at keeping k2pdfopt from splitting up the figures, which have a lot of light grey backgrounds that k2pdfopt thinks are white space by default.
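To illustrate why the threshold matters, here is a toy sketch, not k2pdfopt's actual code. It assumes that a pixel counts as white when its 0-255 gray value is greater than the threshold, and that a row made up entirely of "white" pixels is a candidate place to split. The function names and the lower comparison threshold of 200 are assumptions chosen for the example.

```python
def is_white(gray, white_threshold):
    """Assumed rule: a pixel is white if its gray value exceeds the threshold."""
    return gray > white_threshold

def blank_rows(rows, white_threshold):
    """Return indices of pixel rows that look entirely white (candidate split points)."""
    return [
        i for i, row in enumerate(rows)
        if all(is_white(p, white_threshold) for p in row)
    ]

# Toy "figure": dark marks on a light grey (240) background, then a true white row.
figure = [
    [0, 240, 0, 240],
    [240, 240, 240, 240],  # light grey background inside the figure
    [240, 0, 240, 0],
    [255, 255, 255, 255],  # real white space below the figure
]

print(blank_rows(figure, 200))  # [1, 3]: the grey row is mistaken for white space
print(blank_rows(figure, 250))  # [3]: only the truly white row qualifies
```

With the higher threshold, the grey background row inside the figure no longer looks like a gap, so the figure stays in one piece.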

Here is a better description of how -vb works--a description which I'll put into the next release:

-vb <thresh> Set gap-size vertical-break threshold between regions that cause them to be treated as separate regions. E.g. -vb 2 will break the document into separate regions anywhere there is a vertical gap that exceeds 2 times the median gap between lines of text. These separate regions may then be scaled and aligned independently. Special values: Use -vb -1 to preserve all horizontal alignment and scaling across entire regions (vertical spacing may still be adjusted). Use -vb -2 to exactly preserve each region (both horizontal alignment and vertical spacing--this is the value used by -mode fw, for example). The default is -vb 1.75.
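The rule for positive values can be sketched roughly as follows. This is an illustration of the description above, not k2pdfopt's real implementation: the function name `split_regions` is hypothetical, the gaps are given as plain numbers, and negative values are simplified to "make no gap-based breaks".

```python
from statistics import median

def split_regions(gaps, vb=1.75):
    """Group line indices into regions, breaking where a gap exceeds vb * median gap.

    gaps[i] is the vertical gap between text line i and line i + 1.
    Negative vb is treated here as "no gap-based breaks" (a simplification).
    """
    line_count = len(gaps) + 1
    if vb < 0 or not gaps:
        return [list(range(line_count))]
    limit = vb * median(gaps)
    regions = [[0]]
    for i, gap in enumerate(gaps):
        if gap > limit:
            regions.append([])  # gap is large enough: start a new region
        regions[-1].append(i + 1)
    return regions

# Six lines of text with one large gap (30) between line 2 and line 3.
gaps = [10, 10, 30, 10, 10]
print(split_regions(gaps))         # [[0, 1, 2], [3, 4, 5]]
print(split_regions(gaps, vb=4))   # [[0, 1, 2, 3, 4, 5]]
print(split_regions(gaps, vb=-2))  # [[0, 1, 2, 3, 4, 5]]
```

In the first call the median gap is 10, so the limit is 17.5 and only the gap of 30 causes a break.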

Thank you for answering. I will patiently wait for the next release, whenever that is.

I just tried your software with two PDFs and the results are:
1. When I try to open them with Acrobat Reader, it says "Error when opening the document. The file is broken and cannot be read"
2. When I put them on my Kindle 3, it cannot open them and says "This pdf cannot be opened due to embedded features not yet supported by Kindle"
What am I doing wrong? Thank you for your support...

I just tried your software with two PDFs and the results are:
1. When I try to open them with Acrobat Reader, it says "Error when opening the document. The file is broken and cannot be read"...

Adobe should be able to read the converted file. Would you be able to post the source PDF file and converted file (you can use the paperclip to attach files to posts)? Can Adobe read the source PDF file okay? What options are you using to convert? On what system? What version of Adobe?