pdf to word converter (not online)

I am interested in a PDF to Word converter. Well, actually my dad needs one. He wants to have it ...

Do you have "Word"? (This is not as weird as it seems; a lot of applications can read "Word" documents.)

If so, use "Calibre" to go from PDF to RTF, then use "Word" to go from RTF to your chosen "Word" format.

If not, download "LibreOffice" and a PDF import module.

Those are the only processes I'd recommend, but even then, upwards of half the documents you try to convert will lose formatting to the point of illegibility. (I'm saying half based on my experience.)

I'd love to know why he thinks he needs such a tool!?

If this is about viewing and he doesn't like his PDF viewer, get a different one; there are a lot of open source and free PDF viewers.

If this is about editing one or two files, tell him you'll do it and just use an online tool.

If this is about editing something that will be repeated, the multistage loss of formatting will be so costly with many PDF files that it would be faster to recreate the layout from scratch. (You can still use a tool like "Calibre" to rip the text and images.)

By the by, conversions between two formats with layout markup almost always suck.

upwards of half the documents you try to convert will lose formatting to the point of illegibility.

The quality of most text processors is unbelievably poor. Before I went down the dark path of signal processing, I was a document guru. So many PDF files fail to convert properly because of a combination of poor PDF generation and stupid coding in the conversion tools. Within a PDF document, the contents of each page are described using a PostScript-like (NOT PostScript) language, among whose many commands are commands for positioning and rendering glyph sequences. For reasons that I was never able to discern over the period of a decade, some PDF generators will place text on the page in an almost completely random order. As in, imagine the words of this paragraph being laid down character by character, but not left-to-right, more like insano-mode.

Other things the poor, downtrodden PDF converter may run into: a mixture of fonts and images used together to form the text -- seemingly at random. Guess what, that means you need OCR -- artificial intelligence, basically -- to convert the document. Or the use of multiple fonts, each of which contains a different subset of characters, even when only a single font was used. Or the use of custom encodings for absolutely no reason other than that it made some schmuck programmer's life a miserable fraction easier.
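To make the subset-font and custom-encoding problem concrete, here is a toy sketch. The font names ("F1", "F2") and the code tables are made up for illustration; the point is only that the same numeric code can mean different characters in different fonts, so a converter that lacks the per-font table (or ignores it) gets garbage.

```python
# Hypothetical per-font code-to-character tables, roughly what a
# subsetting PDF generator might produce. These names and values are
# invented for this example, not taken from any real file.
FONT_MAPS = {
    "F1": {1: "H", 2: "e", 3: "l", 4: "o"},
    "F2": {1: "W", 2: "o", 3: "r", 4: "l", 5: "d"},
}


def decode(runs, font_maps):
    """Decode a list of (font_name, codes) runs into text.

    Codes with no known mapping become U+FFFD, the replacement character.
    """
    pieces = []
    for font, codes in runs:
        table = font_maps.get(font, {})
        pieces.append("".join(table.get(code, "\ufffd") for code in codes))
    return "".join(pieces)


# Code 2 is "e" in F1 but "o" in F2, so the font context is essential.
runs = [("F1", [1, 2, 3, 3, 4]), ("F2", [1, 2, 3, 4, 5])]
print(decode(runs, FONT_MAPS))  # HelloWorld
```

Take away the table for one font and every character set in that font comes out as a replacement character, which is about what a lot of converted documents look like.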

On the conversion side, many converters won't even take the simplest of steps to try to undo this mess, such as organizing and sorting glyphs in topological order to recover the line and paragraph structures. It's a freakin' mess, and it's why I maintain that PDF is a print format, not a document format. None of this stuff matters for print, but for document processing it is a nightmare.
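For what it's worth, the "simplest of steps" can be sketched in a few lines. This is a minimal sketch under some loud assumptions, not how any particular converter does it: the `Glyph` record, the tolerances, and the convention that `y` is measured from the top of the page are all mine, and it only handles plain single-column, left-to-right text.

```python
from dataclasses import dataclass


@dataclass
class Glyph:
    char: str
    x: float      # left edge, in page units
    y: float      # baseline, measured from the top of the page (assumed)
    width: float  # advance width of the glyph


def recover_lines(glyphs, line_tol=2.0, space_gap=3.0):
    """Rebuild text lines from glyphs that arrive in arbitrary order.

    Glyphs whose baselines are within line_tol of the first glyph on a
    line are grouped together; each line is then sorted left to right,
    and a space is inserted wherever the horizontal gap between two
    neighbouring glyphs exceeds space_gap.
    """
    lines = []  # each entry is [baseline_y, [glyphs on that line]]
    for g in sorted(glyphs, key=lambda g: g.y):
        if lines and abs(g.y - lines[-1][0]) <= line_tol:
            lines[-1][1].append(g)
        else:
            lines.append([g.y, [g]])

    result = []
    for _, row in lines:
        row.sort(key=lambda g: g.x)
        text = row[0].char
        for prev, cur in zip(row, row[1:]):
            if cur.x - (prev.x + prev.width) > space_gap:
                text += " "
            text += cur.char
        result.append(text)
    return result


# Four glyphs emitted out of order, two per line.
scrambled = [
    Glyph("b", 6.0, 10.0, 5.0),
    Glyph("d", 6.0, 24.0, 5.0),
    Glyph("a", 0.0, 10.0, 5.0),
    Glyph("c", 0.0, 24.0, 5.0),
]
print(recover_lines(scrambled))  # ['ab', 'cd']
```

Real documents need more than this (columns, rotated text, superscripts, right-to-left scripts), but even this much undoes the worst of the insano-mode ordering.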

Note that none of these problems are intrinsically the fault of PDF. PDF has some weird crap in it, like JavaScript, but the fundamentals are solid, and in fact, PDF makes a not-too-bad object persistence format. That's right, I just said that PDF is an okay choice for an object-relational database.