Why Are Ebooks Riddled With Typos?

That's the question over on The Verge: why are ebooks so often riddled with typos? It's an interesting point, and one that must strike people as they stumble and stutter their way through something downloaded from Apple's iBooks or Amazon. The answer comes in two parts, each affecting a different part of the market. Both have the same root cause, which is that people just aren't paying enough attention to the production process.

The first part is in the self-produced and self-published realm. This is where a goodly portion of ebooks come from, after all. And the reason for so many typos here is that almost no self-publishers are passing their work under the nose of an editor. Perhaps this distinction is easier if I use the British English words, editor and subeditor (or "sub"). An editor decides what is going to be published and how stories are going to be tackled. A sub does the work that is lacking here: correcting typos, making sure sentences are whole, perhaps rewriting the occasional line to make it flow.

What almost no one who hasn't worked for a newspaper understands is that all of the copy in a professionally produced publication has been passed under such a nose. The subs rescue many of us from the its/it's problems, dangling participles (not that I even know what those are) and so on. One of the things that everyone writing online has had to learn (or relearn!) is all of these rules. For we don't have subs online.

We've all also had to learn (or relearn) that subbing your own writing is nearly impossible. The eye just skips over the kinds of mistake that a writer is prone to. Sure, spellcheckers help, but they're not perfect. They won't help with grammar or the its/it's thing, or with "arc" appearing where "are" was meant, since both are perfectly good words.
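To see why a spellchecker lets "arc" through, here is a minimal sketch of the simplest kind of checker, one that only asks whether each word appears in a word list. The tiny `DICTIONARY` and the `misspelled` function are made up for illustration; real spellcheckers use far larger lists and often add grammar rules, but the blind spot is the same in principle.

```python
# Hypothetical mini word list; a real spellchecker would load many thousands of words.
DICTIONARY = {"the", "stars", "arc", "are", "bright", "tonight"}


def misspelled(text):
    """Return the words in `text` that are not found in DICTIONARY."""
    return [word for word in text.lower().split() if word not in DICTIONARY]


# "arc" is a real word, so the wrong-word error goes unnoticed.
print(misspelled("the stars arc bright tonight"))  # prints []

# "thc" is not in the word list, so this slip is flagged.
print(misspelled("thc stars are bright tonight"))  # prints ['thc']
```

The first sentence is wrong to any human reader, yet every word in it is spelled correctly, which is exactly the sort of thing a sub catches and software of this kind does not.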

So this is the first reason that many ebooks are filled with mistakes and typos. Many are being written by those who don't know about the vital function of the sub and couldn't afford one even if they did. And given that it is incredibly difficult to sub your own work, this might well be a problem that doesn't have a solution.

Then there's the second point:

Though I’ve only had the Kindle for three weeks, I’ve noticed that the book I’ve been reading, Foucault’s Pendulum, has many typos. This isn’t an out-of-copyright, cheaply made book from a fly-by-night press. This is marketed and published as a 2007 edition of the 1988 book by Mariner, an imprint of Houghton Mifflin. Its list price is $15.95, and it costs $8.77 on Amazon. Many of the typos — the letter "c" in place of what should be an "e" — appear to be the casualties of a hasty OCRing of some actual text of the work. OCR (Optical Character Recognition) is a process of scanning a book and using software which recognizes the scanned words as words, rather than merely as images, converting the images into text files. Anyone who has ever used OCR software knows that the process is far from perfect and always demands a serious attention to detail in the copy editing phase, once scanning is done, because the software doesn’t "read" the text perfectly. This seems to be at least partially what is happening in my Kindle edition of Foucault’s Pendulum, and it’s unacceptable.

Unacceptable or not, that's what someone has done: simply OCR'd the printed text and not subbed it again. This is a problem that has a solution other than just insisting that all OCR'd text gets subbed. Time will take care of it, in fact. The translation dates back to 1988 or so, maybe a year earlier. It's most unlikely that the publisher still has the original software files for that translation or, if it does, that they are in a format that can still be read. Thus the OCR, of course. But for new titles that come out in ebook form this problem shouldn't occur, for all of the editing and subbing of the text will happen in software. The ebook and the printed version will then come from the same master copy of that subbed and perfected version. So we would expect new ebooks to be as free of typos as printed books, which is to say not entirely but nearly so.