Tuesday, February 8, 2011

Inside the new Kindle 'page numbers' feature

The new prerelease software for the Kindle 3 (v 3.1) has a feature called 'Real Page Numbers':

Real Page Numbers -- Our customers have told us they want real page numbers that match the page numbers in print books so they can easily reference and cite passages, and read alongside others in a book club or class. We've already added real page numbers to tens of thousands of Kindle books, including the top 100 bestselling books in the Kindle Store that have matching print editions and thousands more of the most popular books. Page numbers will also be available on our free "Buy Once, Read Everywhere" Kindle apps in the coming months. If a Kindle book includes page numbers, press the Menu key in an open Kindle book to display page numbers.

Page numbering corresponds to a specific print edition, as identified by the print edition's ISBN.

I was curious about the implementation, so I downloaded "The Girl Who Kicked the Hornet's Nest" from my Amazon Archive and had a look.

I noticed that there is now a sidecar file with an ".apnx" file extension. Hmm, could this have something to do with page numbers? As in 'Amazon Page Number indeX'?

Indeed, viewing the file in a hex viewer confirms this suspicion.

At the top of the file there is a string table/dictionary (this one is for 'The Girl Who Kicked the Hornet's Nest').

This is followed by an array of 16-byte values that appear to form a sequence of numbers in ascending order. My guess is that each value is an offset to the position where a given physical page begins. The number of these values seems to be very close to the number of pages in the book; a few additional rows of bytes precede the presumed 'page map' and may have some special significance.
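To test a guess like this, a small decoder helps. This is a minimal sketch, not a description of Amazon's format: it assumes the page map is a flat array of fixed-width unsigned integers in big-endian byte order, starting at a position you locate by eye in the hex viewer. Because the entry width is itself uncertain, `width` is a parameter, and its 4-byte default is only a guess. `read_page_offsets` and `is_ascending` are hypothetical helper names.

```python
def read_page_offsets(data: bytes, start: int, count: int, width: int = 4) -> list[int]:
    """Decode `count` big-endian unsigned integers, each `width` bytes long.

    ASSUMPTIONS (not confirmed): the page map is a flat array of fixed-width,
    big-endian values, and `start`/`count` were found by inspecting the file
    in a hex viewer. The 4-byte default width is a guess.
    """
    end = start + count * width
    if start < 0 or end > len(data):
        raise ValueError("page map range falls outside the file")
    return [int.from_bytes(data[i:i + width], "big") for i in range(start, end, width)]


def is_ascending(offsets: list[int]) -> bool:
    """Check the one property actually observed: the values never decrease."""
    return all(a <= b for a, b in zip(offsets, offsets[1:]))


# Synthetic data: 8 placeholder header bytes, then three 4-byte offsets.
sample = bytes(8) + b"".join(n.to_bytes(4, "big") for n in (0, 1500, 3200))
offsets = read_page_offsets(sample, start=8, count=3)
print(offsets, is_ascending(offsets))  # [0, 1500, 3200] True

# With a real file (hypothetical file name; fill in the range you found):
# data = open("book.apnx", "rb").read()
# offsets = read_page_offsets(data, start=..., count=...)
```

If the decoded values come out non-ascending or absurdly large, that is a hint that the assumed width, byte order, or start position is wrong.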

In the book I looked at, material before page "1" does not display a page number (such as i, ii, iii, iv, etc.). (I wonder whether that's a limitation of Amazon's page-mapping scheme or just how they handled this particular book.) I'd also note that the last page number (in this case '563') is applied to content that almost certainly spans more than one physical page and, indeed, to material that isn't in the physical book. This ebook edition puts the copyright page and a cover image at the end, and neither should have been labeled page '563'. Okay, so it is not perfect, at least in this case.
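Both quirks are what you would expect if the reader simply looks up the last page start at or before the current position. Here is a minimal sketch of that lookup, assuming (as guessed above) that each ascending offset marks the start of one printed page; `page_for_position` and `first_page` are hypothetical names, not Amazon's. Under this model, anything before the first offset gets no number, and everything after the last offset inherits the final page number.

```python
from bisect import bisect_right
from typing import Optional


def page_for_position(offsets: list[int], position: int, first_page: int = 1) -> Optional[int]:
    """Return the printed page number covering `position`, or None.

    ASSUMPTION: offsets[k] is where printed page (first_page + k) begins, and
    offsets are sorted ascending, as guessed above.
    """
    # Number of pages that start at or before `position`.
    k = bisect_right(offsets, position)
    if k == 0:
        return None  # before the first mapped page: nothing to display
    # Everything past the last offset falls into the final page.
    return first_page + k - 1


offsets = [100, 250, 400]
print(page_for_position(offsets, 50))      # None (front matter)
print(page_for_position(offsets, 260))     # 2
print(page_for_position(offsets, 10_000))  # 3 (trailing material keeps the last number)
```

A binary search keeps each lookup logarithmic in the number of pages, which matters little for one book but is the natural fit for a sorted offset table.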

Presumably this scheme also works with Topaz-format books, which Amazon would need to support, and because the page map lives in a separate sidecar file, it's something Amazon can generate after material has been submitted for publishing.

It's not clear how self-published books can get page numbers, since 'locations' don't exist until you bake the .azw file. Hopefully Amazon will clarify this for its KDP ('Kindle Direct Publishing') users.

I noticed that there are also two other file extensions associated with Kindle Store books now (not just those with page numbers):
.ea - an XML file that contains the data for the 'Customers who bought this book also bought' and 'More by this author' lists that now show up after the last page of the book, including each title's ASIN so the Kindle can jump to that title's Kindle Store page.
.phl - an XML file that identifies the position offset of each popular highlight in the book, along with a frequency count for each. That file has probably been around for a while, since the popular highlights feature was introduced for the K2/DX.
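When poking at these files over USB, it helps to list which sidecars a given book actually has. This is a minimal sketch under a loud assumption: that each sidecar sits in the same folder as the book file and shares its base name (the exact on-device layout isn't confirmed here). `find_sidecars` and `SIDECAR_EXTENSIONS` are hypothetical names.

```python
from pathlib import Path

# Sidecar extensions observed in the post.
SIDECAR_EXTENSIONS = (".apnx", ".ea", ".phl")


def find_sidecars(book: Path) -> list[Path]:
    """List the sidecar files present for `book`.

    ASSUMPTION: each sidecar lives in the same folder as the book and shares
    its base name (e.g. Hornet.azw -> Hornet.apnx, Hornet.ea, Hornet.phl).
    """
    candidates = (book.with_suffix(ext) for ext in SIDECAR_EXTENSIONS)
    return [path for path in candidates if path.exists()]


# Example with a hypothetical mount path:
# for sidecar in find_sidecars(Path("/media/Kindle/documents/Hornet.azw")):
#     print(sidecar.name)
```

`Path.with_suffix` replaces only the final extension, so base names containing dots are preserved.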

I was curious as to when these files show up or are updated, so I turned off wireless, connected to my computer and deleted all three. Page numbers went away, the .ea lists went away (leaving only the 'tweet' and 'rate' links), and the popular highlights went away - as expected. Then I did 'Sync and Check for Items'. Still nothing. Finally, I removed the book and added it again. Everything's back!

So to take advantage of the new 'real pages' feature, it appears you must remove the item and download it again.