Helicon Books

EPUB3 and digital books experts

EPUB3 vs. HTML5

By: Ori Idan, Helicon books CEO

Recently I have been reading more and more people saying that there is no need for EPUB3 since we have HTML5. HTML5 has the option to be read offline, so to them EPUB seems to be just one more standard that nobody needs.

In this short article I am going to explain why EPUB3 is not just another standard, why it is important for digital books, and why it does not compete with HTML5 but complements it.

Short background about HTML5 and EPUB3

Before I can explain why there is no contradiction between EPUB3 and HTML5, let's briefly review what HTML5 and EPUB3 are.

HTML5 was started by a group named WHATWG (http://www.whatwg.org/), whose work was later taken up by the W3C (http://www.w3c.org). It is the fifth revision of HTML, which I think is one of the most widely used standards on the Internet.

This is actually not yet a standard. It is now in the "call to review" stage and is scheduled to become a formal W3C recommendation in 2014. It has been a long standardization process, which started around 2004.

I had the chance to sit in on a few of their face-to-face meetings while I was the head of the Israeli W3C office. One thing I remember from these meetings is the many arguments over each and every point of the standard, probably because there are many participants with different opinions and the W3C tries to reach a consensus before releasing a standard. But they finally did it, and we have a specification now.

This revision of HTML is a giant step forward, as it added and standardized many things, among them:

Canvas element for drawing 2d objects.

Video and Audio

New tags such as article, aside, nav etc. that enable easier parsing of data by software (semantic web)

Offline web applications

MIME type and protocol handler registration

In addition to HTML5, the W3C released a new version of the CSS standard (CSS3) and MathML for rendering mathematical equations.

When referring to HTML5, people usually mean HTML, CSS, SVG, JavaScript and MathML.

HTML5 seems to have gained popularity among many websites and applications. I think this standard has the quickest adoption rate among W3C standards, and that it is also the most widely used of them.

However, HTML5 is still HTML and has some drawbacks when it comes to complicated documents that are not web based, such as books.

HTML is a textual representation of a web page. Elements such as images, video, audio, scripts etc. reside in additional files referenced from the main file.

So even a simple HTML document with a few images will consist of not just one file but several files.

This architecture is common and easy to use when working on the web. However, when we want to store the document by any other means, we need a way to pack all components together.
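To make this concrete, here is a minimal sketch in Python that lists the separate files a small page depends on. The sample page, the file names in it and the helper names (ResourceCollector, list_resources) are made up for illustration, and the set of tags checked is an assumption that covers only the common cases.

```python
from html.parser import HTMLParser


class ResourceCollector(HTMLParser):
    """Collect the external files referenced by an HTML page."""

    # (tag, attribute) pairs that usually point at a separate file.
    # This list is an illustrative assumption, not a complete one.
    REFERENCE_ATTRS = {
        ("img", "src"), ("script", "src"), ("video", "src"),
        ("audio", "src"), ("source", "src"), ("link", "href"),
    }

    def __init__(self):
        super().__init__()
        self.resources = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if (tag, name) in self.REFERENCE_ATTRS and value:
                self.resources.append(value)


# Hypothetical page: one HTML file that needs four more files to display.
SAMPLE_PAGE = """<html><head>
<link rel="stylesheet" href="style.css">
<script src="book.js"></script>
</head><body>
<h1>Chapter 1</h1>
<img src="images/cover.jpg" alt="Cover">
<img src="images/map.png" alt="Map">
</body></html>"""


def list_resources(html_text):
    """Return the referenced file paths in document order."""
    collector = ResourceCollector()
    collector.feed(html_text)
    return collector.resources


print(list_resources(SAMPLE_PAGE))
# → ['style.css', 'book.js', 'images/cover.jpg', 'images/map.png']
```

So this one short page already depends on four additional files.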

The IDPF EPUB standard is essentially HTML documents packaged together with a package file that lists the actual files within the publication and defines the order in which the documents should be read (chapters of a book).

The EPUB standard also adds a table of contents file to allow for easy navigation between sections of the publication.
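As a rough illustration of the package file, the following Python sketch reads a small EPUB 3 package document and extracts the reading order and the table of contents file. The sample package, its file names and the function names are invented for this example. The manifest, the spine and the "nav" property are the parts of the package document that a reading system would consult.

```python
import xml.etree.ElementTree as ET

# Namespace used by EPUB package documents.
OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}

# Hypothetical package document for a two-chapter book.
SAMPLE_PACKAGE = """<?xml version="1.0" encoding="UTF-8"?>
<package xmlns="http://www.idpf.org/2007/opf" version="3.0" unique-identifier="book-id">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:identifier id="book-id">urn:uuid:00000000-0000-0000-0000-000000000000</dc:identifier>
    <dc:title>Sample Book</dc:title>
    <dc:language>en</dc:language>
  </metadata>
  <manifest>
    <item id="nav" href="nav.xhtml" media-type="application/xhtml+xml" properties="nav"/>
    <item id="ch2" href="chapter2.xhtml" media-type="application/xhtml+xml"/>
    <item id="ch1" href="chapter1.xhtml" media-type="application/xhtml+xml"/>
    <item id="css" href="style.css" media-type="text/css"/>
  </manifest>
  <spine>
    <itemref idref="ch1"/>
    <itemref idref="ch2"/>
  </spine>
</package>"""


def reading_order(package_xml):
    """Return the content files in the order given by the spine."""
    root = ET.fromstring(package_xml.encode("utf-8"))
    # The manifest maps item ids to file paths.
    hrefs = {
        item.get("id"): item.get("href")
        for item in root.findall("opf:manifest/opf:item", OPF_NS)
    }
    # The spine lists item ids in reading order.
    return [
        hrefs[ref.get("idref")]
        for ref in root.findall("opf:spine/opf:itemref", OPF_NS)
    ]


def navigation_document(package_xml):
    """Return the path of the table of contents file, or None."""
    root = ET.fromstring(package_xml.encode("utf-8"))
    for item in root.findall("opf:manifest/opf:item", OPF_NS):
        if "nav" in (item.get("properties") or "").split():
            return item.get("href")
    return None


print(reading_order(SAMPLE_PACKAGE))        # → ['chapter1.xhtml', 'chapter2.xhtml']
print(navigation_document(SAMPLE_PACKAGE))  # → nav.xhtml
```

Note that the manifest lists chapter 2 before chapter 1; it is the spine, not the file listing, that decides the reading order.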

So in essence EPUB3 is not competing with HTML5; I would say that EPUB3 is a complementary standard. It uses HTML5 and CSS3 for the actual contents.

During the development of EPUB 3 the IDPF made a key decision, to tightly align with HTML5, SVG, CSS 3, and related modern Web Standards. Rather than defining “frozen” profiles of these standards, the general approach of EPUB 3 is to normatively reference the relevant standards in their entirety. This means that if it’s legal HTML5, it’s legal EPUB 3, period. And as HTML5 evolves, EPUB 3 is committed to evolving with it. This decision has made EPUB 3 much more fundamentally a portable document packaging of Web content, rather than a distinct format.

Do we need EPUB3 when everything is on the web?

The other argument I hear many times is that in today's world there is no need for a standard like EPUB, as everything resides on the web. I think there are several reasons why, although we can put everything on the web, there is still a need for EPUB3.

In my opinion, in addition to what Bill McCoy wrote there are three main reasons why EPUB is still needed:

There are cases when a web connection is not available or is too expensive (when travelling, for example).

Bandwidth limitations: imagine a class of more than 20 students trying to access the same website. The server itself will probably be able to serve the pages, but the actual connection will be too slow.

Sustainability - When information resides on a web server, it is up to the web server owner to make it accessible to readers, and when the owner takes the server down, the information will not be accessible any more.

When you download a book to your reader, you own the book and can do with it whatever you like. This is something you don't have even with digital books bought from Amazon for the Kindle. In the Kindle ecosystem, it's up to Amazon to let you read the book (and there have been cases when, for no apparent reason, people could not read a book).

What about offline HTML5?

HTML5 has an option for offline reading of a website: the standard defines a method by which a browser can download all the resources needed for a website. So in essence, is this the same as EPUB?

No, it is not. An EPUB file is one zip file containing a package file and all other resources needed by the book. The package also defines the reading order and a table of contents.

HTML5 does not define a reading order or a table of contents. You have to define your own method for both.

EPUB makes it easier to distribute a book, since it is one file and not many files that need to be downloaded. Also, since an EPUB file is a compressed ZIP file, it is usually smaller than all the files together.

An EPUB reading system knows how to decompress the file and read it. Browsers are not built to use zip files; they are built to load one file at a time and fetch other files as needed.
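The packaging can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the file names (OEBPS/package.opf, OEBPS/chapter1.xhtml), the placeholder contents and the helper names build_epub and find_package_path are invented here, and the result is not a complete, valid book. What it does show is the container layout: a "mimetype" entry stored first and uncompressed, and a META-INF/container.xml file that tells the reading system where the package file is.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

CONTAINER_NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}

# container.xml points the reading system at the package file.
# The path OEBPS/package.opf is an assumption made for this example.
CONTAINER_XML = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/package.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>"""


def build_epub(files):
    """Pack a dict of {path: text} into the bytes of an EPUB-style zip."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        # The mimetype entry comes first and is stored without compression.
        archive.writestr("mimetype", "application/epub+zip",
                         compress_type=zipfile.ZIP_STORED)
        archive.writestr("META-INF/container.xml", CONTAINER_XML,
                         compress_type=zipfile.ZIP_DEFLATED)
        for path, text in files.items():
            archive.writestr(path, text, compress_type=zipfile.ZIP_DEFLATED)
    return buffer.getvalue()


def find_package_path(epub_bytes):
    """Locate the package file the way a reading system would start."""
    with zipfile.ZipFile(io.BytesIO(epub_bytes)) as archive:
        container = ET.fromstring(archive.read("META-INF/container.xml"))
    rootfile = container.find("c:rootfiles/c:rootfile", CONTAINER_NS)
    return rootfile.get("full-path")


# Placeholder contents; a real book would hold full documents here.
book = build_epub({
    "OEBPS/package.opf": "<package/>",
    "OEBPS/chapter1.xhtml": "<html/>",
})

with zipfile.ZipFile(io.BytesIO(book)) as archive:
    entry_names = archive.namelist()

print(entry_names[0])           # → mimetype
print(find_package_path(book))  # → OEBPS/package.opf
```

Everything the book needs travels in that single archive, which is why a reader can copy, download or store it as one file.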

Summary

In this article I've shown how HTML5 is related to EPUB3 and why EPUB3 is not competing with HTML5.

One might say that EPUB3 is essentially a packaged website for offline reading. This packaged website also contains all the book metadata, so it is easier to distribute the book. In the future, the IDPF will continue to define standards that complement EPUB3, such as Fixed Layout books, dictionaries, indexes etc.