Thanks to Mark Coker, head of Smashwords. We welcome other reader contributions. If you have something you'd like to submit, feel free to send it to [email protected].

One of my many joys of running Smashwords is working directly with authors every day who share my passion about the promise of e-books. Their feedback, dreams and frustrations are what guide our development.
The biggest challenge these authors face getting their book into e-book form is that they're held hostage by their preconceptions about how a book should be formatted. Traditional print formatting is very forgiving. If you use spaces or tabs instead of indents, for example, as long as the words are arranged where you want them on screen or in your PDF, the book prints reasonably well and all your bad formatting habits are forgiven.
E-books aren't so forgiving, because for the most part, formatting is the enemy of good e-book formatting. If my statement sounds circular and nonsensical, allow me to elaborate.

In the e-book realm, authors must abandon the notion of the “page.” Pages have no meaning in e-book form, because pages become amorphous shape shifting creatures depending on the e-book reader; the reader’s choice of font size, font style or line spacing; or in the case of the iPhone, whether they’re holding it vertically or sideways.

When the notion of page disappears, it creates other problems for traditionally formatted books. The page numbers in your table of contents or index become meaningless. Your artificial page breaks, made via the common bad habit of multiple paragraph returns, create blank pages. Your forced page breaks disappear.
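The habits described above are easy to check for before uploading a manuscript. What follows is only a rough sketch, assuming the manuscript has been saved as plain text with one paragraph per line; the function name `find_formatting_problems` and the `max_blank_run` threshold are made up for illustration and are not part of any Smashwords tool.

```python
import re

def find_formatting_problems(text, max_blank_run=2):
    """Report print-era habits that tend to break e-book conversion.

    `max_blank_run` is an assumed threshold, not an official rule.
    """
    problems = []
    blank_run = 0
    for number, line in enumerate(text.split("\n"), start=1):
        if line.strip() == "":
            blank_run += 1
            # Report once, at the moment the run exceeds the threshold.
            if blank_run == max_blank_run + 1:
                problems.append((number, "run of empty paragraphs used as a page break"))
            continue
        blank_run = 0
        # Leading spaces or tabs in front of text are a hand-made indent.
        if re.match(r"[ \t]+\S", line):
            problems.append((number, "indent made with spaces or tabs"))
    return problems

sample = "Chapter One\n\n\n\n\tIt was a dark night.\n    It was stormy too.\nPlain paragraph."
for line_number, message in find_formatting_problems(sample):
    print(line_number, message)
```

A check like this only flags the lines; deciding what to replace them with (a real indent style, a real chapter break) is still up to the author.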

The secret to good e-book formatting is to keep it simple: A paragraph return at the end of a paragraph, a proper indent at the beginning of the paragraph, a couple paragraph returns between each chapter, things like that.

For long form narrative books, which is what most people read, readers buy books for the words, not the formatting. Don’t let your formatting get in the way of the words.

Mark, you are getting it wrong. The notion of page is not going anywhere. If you want to be able to quote a book, you need an universally agreeable page number. For TOC/picture index/tables index, you need it too. It may take one or more screens of iPhone or the device of your choice to display a page, but the page is here to stay. Take a look at Sony’s implementation of ePub to get a clue what I’m talking about – just pressing “Next” doesn’t necessarily get you to the next page. To the next screen, positively.

I’m all for better formatted ebooks! Too many ebooks I’ve seen are just one line after the next with no breaks and you barely notice a new chapter has begun. I’ve seen tables spread all over the place with no effort made to make the content understandable. I’ve seen links that go on forever because of poor html. I’ve seen them with footnotes and you don’t even know it til the end because there was no effort made in the text to announce their existence. Often someone overrides the reader’s options for justification, color, font by hard coding them into the book. LOTS of books don’t have tables of contents, or don’t have them linked to the software, which is extremely important to me on my Cybook.

This is not only free ebooks. I have paid for several books with terrible formatting. I bought a book about ancient Italy and the map was about one inch square. A Cybook will display an image nearly as large as the screen. I can’t read a one inch square map. Yes, I do look at the maps provided in the front covers. It helps me understand the story, just like it’s meant to.

To me, good formatting adds a lot to the reading experience. I’d like to see more people taking it seriously.

Yes, paragraphs should be marked. Whether it’s by some weird escape code before/after each paragraph, or by surrounding each one with (<p>) and (</p>) is up to the format used.

> a proper indent at the beginning of the paragraph,

I have no idea what this even means. How things are indented are format-specific, but even so the actual indent should only be a default that can be overridden. Under no circumstances should spaces or tab-characters be used for indenting text.

> a couple paragraph returns between each chapter

No, no, no. Paragraphs are paragraphs and chapters are chapters. You must not “simulate” chapters by using empty paragraphs. In fact, empty paragraphs should be forbidden altogether, since there is no such thing really.

Formatting in ebooks should be made just like in modern, style-based word processors. I.e., the author/publisher/whoever should mark which parts of the text is what. Then the reader software can display TOCs, footnotes, hyperlinks, chapter-breaks etc. as it sees fit. The author can provide wishes (or “hints”) of what defaults should be used all else being equal (or even different defaults for different screen sizes), but it should always be possible to override these.
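To make the idea of marking "which parts of the text is what" a bit more concrete, here is a minimal sketch, not any particular reader's or converter's behavior. It assumes the book is already available as a list of `(kind, text)` pairs; the kinds `"chapter"` and `"para"`, the `ch1`, `ch2` anchor names and the function `render_book` are all invented for this example.

```python
from html import escape

def render_book(blocks):
    """Turn (kind, text) pairs into XHTML body markup plus a linked TOC.

    `kind` is either "chapter" or "para"; both names are made up for this sketch.
    """
    toc, body = [], []
    chapter_count = 0
    for kind, text in blocks:
        if kind == "chapter":
            chapter_count += 1
            anchor = f"ch{chapter_count}"
            toc.append(f'<li><a href="#{anchor}">{escape(text)}</a></li>')
            body.append(f'<h2 id="{anchor}">{escape(text)}</h2>')
        elif kind == "para":
            if text.strip():  # empty paragraphs carry no meaning, so skip them
                body.append(f"<p>{escape(text)}</p>")
        else:
            raise ValueError(f"unknown block kind: {kind!r}")
    toc_html = "<ol>\n" + "\n".join(toc) + "\n</ol>"
    return toc_html, "\n".join(body)

book = [
    ("chapter", "Arrival"),
    ("para", "The ferry docked at dawn."),
    ("para", ""),
    ("chapter", "Salt & Stone"),
    ("para", "Nobody met her at the quay."),
]
toc_html, body_html = render_book(book)
print(toc_html)
print(body_html)
```

Because the chapters are marked as chapters, the table of contents falls out for free as direct links, and no indent, spacing or page break is hard-coded; how those look is left to a stylesheet or to the reading software.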

Amazon’s Kindle compiler has one nice feature–after doing the initial conversion pass on your source file, it allows you to download and tweak the XHTML it cranks out. I’d like to see Smashwords (I’d pay for a software package that did this) take a basic XHTML document and produce the same output in all the formats. To simplify things, there could be an option to predefine a handful of tags, for example, title and chapter heading (preceded by a page break).

It is very unfortunate that Smashwords advocate so many bad practices:
– formatting isn’t the enemy of good formatting at all: you just need to rely on more semantic elements and less style elements
– the concept of the page is important and a real support for paged-media is the next step for e-books if we want to create rich layouts that can adapt themselves to any screen (support for footnotes for example, which are quite different from endnotes, or multi-column layouts for newspapers)
– people should avoid at all cost the idea of separating chapters using empty paragraphs: instead they should properly indicate that there is a chapter, which will also be useful to create a table of contents

Services that rely on direct conversions (Amazon DTP, Smashwords) from DOC or RTF are crossing the line between simple and simplistic. The semantic markup is the most important aspect of a source format, both authors and publishers need to understand this as soon as possible.

> Amazon’s Kindle compiler has one nice feature–after doing the initial conversion pass on your source file, it allows you to download and tweak the XHTML it cranks out.

Eugene, I wasn’t able to get this feature to work and, quite frankly, I found the whole Kindle process to be more of a PITA than any of the other formats I did—and I provided the XHTML document. It didn’t honor many of the tags the help file said it would and I had to reformat a good bit of it to reflect the tags it would honor.

It is possible I wuz doin’ it rong, but I couldn’t find a faster/easier way to do it and get even close to the look I wanted.

The XHTML I got back from the Amazon DTP was pretty much the same as what I was using, so that wasn’t an important step. I ran it though Mobipocket Creator until it looked the way I wanted it to, and didn’t notice any differences between what displayed in Mobipocket Reader and the Kindle preview mode (other than screen width). I haven’t looked at it on an actual Kindle, though. BTW, both Mobipocket Creator and ReaderWorks recognize the CSS attribute “page-break-before” and Mobipocket also has the proprietary tag “mbp:pagebreak.”
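For anyone generating the XHTML with a script, the two page-break mechanisms mentioned above might be emitted along these lines. This is a sketch only: the function `chapter_heading` and its `target` argument are invented here, and whether a given converter honors either form is something to test, as the experiences in this thread suggest.

```python
from html import escape

def chapter_heading(title, target="css"):
    """Return heading markup that asks the reader software for a page break first."""
    safe_title = escape(title)
    if target == "css":
        # Standard CSS property; support depends on the conversion tool.
        return f'<h2 style="page-break-before: always">{safe_title}</h2>'
    if target == "mobipocket":
        # Proprietary Mobipocket tag placed ahead of the heading.
        return f"<mbp:pagebreak />\n<h2>{safe_title}</h2>"
    raise ValueError(f"unknown target: {target!r}")

print(chapter_heading("Chapter 2"))
print(chapter_heading("Chapter 2", target="mobipocket"))
```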

Some excellent discussion here, thanks. I should probably clarify a few points.

1. I’m advocating we move in the direction of simpler formatting, so that more books can be satisfactorily read on more devices and platforms than is currently the case. Project Gutenberg has done well here, IMHO. Their books may not be perfectly formatted, but they are eminently readable anywhere.

2. My post is as much an indictment against bad Word processing habits as it is against unnecessarily complex formatting.

3. My comments do not and cannot apply to all forms of writing. There are many types of books that are unreadable without rigid formatting. Those types of books will be slower to reach mass market adoption in ebook form.

@ Anonymous Coward: Pages cannot persist in the ebook realm unless the world agrees that a page can only consist of a fixed horizontal and vertical dimension and a certain number of words. I think we agree location of information is important, but rather than location being defined as a page it should be defined as where that information can be found.

@ Christine: yes, we all want good formatting that can add to the readability and enjoyment of the book. Too often, however, technologists develop elaborately complex solutions that fall flat and prevent readability. PDFs, for example, offer a horrible and inflexible reading experience unless the formatting is critical to the readability or printing of the content. Often, the content found in PDFs would be more readable if displayed as simple text or HTML.

@ Marcus.. re: indents: I agree 100%, but keep in mind I work with self-published authors. The number one problem we see is authors using spaces or tabs for indents. re: how formatting should be done with style based (or, as Hadrien proposes, semantic) formatting: Yes, it would be wonderful if all books were created that way, and if all that intelligent formatting could translate into all the different reading formats and reading situations. I’m just not optimistic we’ll get there any time soon.

@ Hadrien: Keep in mind, our focus is to take a single file and translate it reasonably well into multiple DRM-free ebook formats. We don’t strive for perfection, nor do we aim for mediocrity. The challenge we all face, especially as we see more and more works introduced from citizen authors, is that it’s difficult to divorce “the way people create” from “the way people *should* create.” The tips we offer our authors help them create a good looking multi-format ebook with minimal effort.

> There are many types of books that are unreadable
> without rigid formatting.

Just out of curiosity, what might these be?

> The number one problem we see is authors using
> spaces or tabs for indents.

If authors themselves don’t know what’s best for them then the publishers could help them by requiring documents to be in e.g. LaTeX. It’s very easy to (learn how to) do basic semantic markup with LaTeX, and usually faster than with word processors.

> formatting should be done with style based (or, as
> Hadrien proposes, semantic) formatting

Actually I meant the semantic aspects of modern style-based wordprocessors, not the styles as such. Hadrien and I are completely on the same page (pun intended).

I certainly hope it is, but apparently not fast enough. Unfortunately there are too many stupid people stuck on illogical notions of how something “has to be”.

> If you want to be able to quote a book, you need an
> universally agreeable page number.

No, you don’t. Why on earth would you think one needs a page number for that? Have you even thought about it, or is it just some odd gut feeling you have?

It’s obvious that page numbers are utterly illogical to use for pageless formats. Logical alternatives are letter numbers, word numbers and paragraph numbers. One could also combine any of them with (hierarchical) chapter numbers. (And just add a suitable SI prefix whenever such a number gets too big. E.g., if L = letter, then 1000 L = 1 kL and 1000 kL = 1 ML etc.)
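The SI-prefix idea is simple enough to sketch in a few lines. The function name `si_count` is made up, and the unit letter "L" for letters is just the one suggested above; nothing here is an existing citation standard.

```python
def si_count(count, unit):
    """Format a position or length such as 1500000 letters as '1.5 ML'."""
    prefixes = ["", "k", "M", "G"]
    value = float(count)
    index = 0
    # Step up one prefix for every factor of 1000.
    while value >= 1000 and index < len(prefixes) - 1:
        value /= 1000
        index += 1
    return f"{value:g} {prefixes[index]}{unit}"

print(si_count(1000, "L"))
print(si_count(1_000_000, "L"))
print(si_count(1_500_000, "L"))
```

The same helper works for word or paragraph counts by passing a different unit letter.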

> For TOC/picture index/tables index, you need it too.

No, you most certainly don’t. TOCs and other indices are of course direct links in any sensible digital format. Again, have you ever even thought about any of this? I know you have used the web since you managed to write that comment, but you seem to be utterly clueless about digital text.

@Mark:
> Pages cannot persist in the ebook realm unless the world agrees that a page can only consist of a fixed horizontal and vertical dimension and a certain number of words.

The world seems to agree on that part, Mark – you’re trying to change that. As I have mentioned, the whole academic research spins around being able to quote things in order to verify someone’s research. Again, I’m not inventing anything here – take a look at Sony Reader. It *can* reflow PDFs. When reflowed, it *may* take up to five *screens* of Sony device to show that particular *page*, but a page in PDF is always that page with universally quoteable page number.

> Unfortunately there are too many stupid people stuck on illogical notions of how something “has to be”.

The whole academic community that works with notion of research is used to quoting other people’s work using page numbers. If you take page numbers away, you’re all on your own convincing all of these “stupid people” that they need to adopt a different way of quoting each other’s works in order to make their research independently verifiable.

And yes, calling people “stupid” when they express views different to yours really tells more about you than anything you say.

Pages may continue on even when a document is born digital. According to a variety of sources, the accepted average word count for a printed page is 250 words. So, using 250-word ‘pages’ as markers, like mileage signs between cities, provides a handy way to judge distances between where you are and where you are going.

For example, if a novel is published online and is 10 chapters and 100,000 words, it would be easier to think of it as 400 pages, even though it may never see print.
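The arithmetic behind that example, as a small sketch. The 250-words-per-page figure is the average quoted above; the function name `estimated_pages` and the choice to round up a partly filled last page are assumptions of this example.

```python
import math

def estimated_pages(word_count, words_per_page=250):
    """Convert a word count into notional 'pages', rounding a partial page up."""
    return math.ceil(word_count / words_per_page)

print(estimated_pages(100_000))  # the 100,000-word novel from the example
```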

An important characteristic of digital content is its ability to deliver to multiple platforms simultaneously—to print, Web and mobile channels. Invariably, the same content will look different when viewed on various output devices, and it should. Each device has its own display characteristics, and the design of the presentation should be optimized for that device. [… more …]

For born-digital books, the standard bibliographic information in a footnote could be followed by: “search on ‘search string'” instead of the page number. Especially with resources like Google Books, that kind of footnote would be a lot more useful.
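A "search on" footnote can also be resolved mechanically into a position that does not depend on pages. The following is a hedged sketch, assuming the book is plain text with a blank line between paragraphs; the function `locate` and the sample text are invented for illustration.

```python
def locate(text, search_string):
    """Return (paragraph_number, word_number) of the first match, or None."""
    paragraphs = [p for p in text.split("\n\n") if p.strip()]
    for paragraph_number, paragraph in enumerate(paragraphs, start=1):
        offset = paragraph.find(search_string)
        if offset != -1:
            # Count the whole words that come before the match in this paragraph.
            word_number = len(paragraph[:offset].split()) + 1
            return paragraph_number, word_number
    return None

book = "The harbor was empty.\n\nSome years later, nobody could say why, the boats came back."
print(locate(book, "nobody could say why"))
```

The word number is only meaningful when the search string starts at a word boundary, and a string that spans two paragraphs will not be found by this simple version.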

> If you think people are going to be more likely to
> accept your arguments if you call them stupid,
> you’re stupid.

I agree, and I don’t think that. (I suspect my “rudeness” is an expression of frustration caused by my helplessness against an overwhelming stupidity in the world. I don’t really dislike even grossly stupid people as such, but I do hate stupidity.)

I didn’t call any particular person stupid. If you yourself think you are one of the “stupid people stuck on illogical notions of how something “has to be”” then you are calling yourself stupid.

However, if you truly get hung up on ad hominem arguments like “You are rude, therefore you are wrong.” then you are indeed one of those people.

> The whole academic community that works with notion
> of research is used to quoting other people’s work
> using page numbers.

Oh, c’mon! Different universities, journals, proceedings, etc. use different formats for references and bibliographies. Heck, even different faculties within a university often use different formats. And the formats vary according to what the targets of the references are. E.g., now when papers refer to webpages (which are inherently pageless) they seldom include any page numbers, and when they do it’s usually more out of ignorance than anything else. (A webpage included as an appendix in a paged format will obviously have pages that can be referred to, but then you’re referring to the paged appendix and not the webpage directly and thus that doesn’t count in this context.)

Very few universities, journals, conferences, etc. have already decided how to format references to ebooks, which are inherently pageless. Thus when they do make that decision they wouldn’t be changing anything if they decide to use e.g. paragraph numbers or letter numbers or just settle for chapter numbers for now. If some of them do decide to go with page numbers for some pageless format, such as webpages or ebooks, then that would indeed be a very stupid decision.

Stupidity is not far off when cluelessness reigns, and unfortunately a large portion of the old academia suffers from total cluelessness regarding digital media. I know professors who can’t read their own email, but have assistants to print out messages on paper and afterwards type in handwritten/spoken responses. I know professors who think the web is a 1-way channel like broadcast TV/radio. These people have unfortunately got stuck in an earlier epoch (often partially without them even realizing it).

As these people just barely lost the vote to keep an artificial “scroll number” back when pages were a new thing we can always hope they might have learned something from that. (Clarification: This paragraph is mostly a joke.)

> calling people “stupid” when they express views
> different to yours

You are either ignorant or lying. I have never, ever in my life (as far back as I can remember, which excludes my first 5 or so years) called someone stupid for expressing a view that is different to mine!

> using 250-word ‘pages’ as markers […] provides a
> handy way to judge distances between where you are
> and where you are going.

‘Paragraphs’, ‘words’ or ‘letters’ (with SI-prefix as needed, e.g. “kilowords”) are just as good, except for the fact that some people are not used to them yet. (Some are, though. E.g., some people are often given the task of writing an “N word essay/article” or even an “N letter essay/article”.)

After thinking about it for a good 5 seconds I’d say that I prefer paragraphs for references (since then there’s a (remote) chance it’d work even with translated versions), and letters for length (since then the actual size is not so language-specific (although it’d still be specific to the type of grapheme used, so maybe there’s something better that would work similarly even with Asian, logograph-based languages)).

Getting used to something like this is a non-issue. If you read/write much then you’d get used to it in no time flat, if you sometimes read/write you only need to know how to approximately convert to whatever metric you’re more familiar with, and if you hardly ever read/write then it doesn’t matter whether you’re used to any particular metric or not.

If someone does decide to name and use some arbitrary number of words then I would highly recommend against using “page” or any other related word that already has a specific meaning. Having pages of different “pages” is just asking for trouble, and will cause needless confusion.

> For born-digital books, the standard bibliographic
> information in a footnote could be followed by:
> “search on ’search string’” instead of the page
> number.

A mobile reading device doesn’t have to have a small screen; e-ink screens will soon (~5 yrs) become foldable/rollable so that, when expanded, they’ll provide a reading surface as big as a letter-sized paper.

A page may not be the most logical division, but it certainly is a very psychological one. (Those who disagree perhaps don’t read much except blogs).

> e-ink screens will soon (~5 yrs) become […] as
> big as a letter-sized paper.
> A page may not be the most logical division, but it
> certainly is a very psychological one.

You are completely missing the point. It doesn’t matter what size some particular screen is. The relevant fact is that the same content will be shown in different text sizes, perhaps with different fonts or line heights and probably also on screens of different sizes. So, some cross-device “page size” metric is no longer the size of your actual page, but some arbitrary number of words, perhaps relative to the default font size the author/publisher specified and perhaps with different heading weights or chapter-related page-breaks, or somesuch arbitrarily chosen parameters that are more or less unrelated to the screen of some device.

So, if you’re talking about this arbitrarily selected combination of parameters for constructing an artificial page metric then it’s in no way related to what is shown on the display of some particular device at some particular time. OTOH, if you’re talking about a page actually shown on the screen of some device at some particular time then the term is meaningless for other devices or even the same device with other settings.

These facts are simple to understand, and as far as I can see there is no other logical conclusion than that using “page” as a metric is illogical and misleading (and thus probably counterproductive).
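The point can be illustrated with a few lines of arithmetic. In this sketch the words-per-screen figures and the position of the quoted passage are made-up numbers, and `screen_of_word` is a hypothetical helper, not how any actual device paginates.

```python
def screen_of_word(word_index, words_per_screen):
    """Return the 1-based screen on which a 0-based word index lands."""
    return word_index // words_per_screen + 1

quote_position = 12_345  # hypothetical word index of a quoted passage
settings = [
    ("phone, large font", 60),
    ("phone, small font", 140),
    ("letter-sized reader", 450),
]
for label, words_per_screen in settings:
    print(label, screen_of_word(quote_position, words_per_screen))
```

The same passage lands on a different "page" for every combination of device and settings, while its word index stays the same everywhere.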

Actually, anyone who has seen legal documents, and many government documents, which may go through many revisions, will know that “page” numbers are less important than section and paragraph numbers.

Each section can contain an arbitrary amount of information, from a few lines, to a zillion piles of (possibly mind-numbing) data.

Pages may mean less and less, and in my opinion were never a terribly good way to indicate where in a book we were talking about (since each size and edition of a print book can end up with the same information ending up with a different page number).

But chapters? Sections? Bookmarks?

Those are rad.

When I read an ebook, I could care less what “page” I’m on. But I do jump around between chapters, and section markers, and bookmarks.

And I use hyperlinks, etc.

We don’t need to come up with a “standard” page size in order to communicate effectively about an electronic book. That’s just silly.

We just need to tag (and perhaps anchor) content throughout the book, with human and/or machine readable tags. A naming convention might be nice, but even that wouldn’t be strictly necessary.
