Rowberry, Simon (2016), "Commonplacing the Public Domain: reading the classics socially on the Kindle". Language and Literature. 25.3. 211–225.

This was published as part of a great special issue of Language and Literature on ‘Reading in the Age of the Internet’ edited by Daniel Allington and Stephen Pihlaja.

Abstract: Amazon leads the market in ebooks with the Kindle brand, which encompasses a range of dedicated e-reader devices and a large ebook store. Kindle users are able to share the experience of reading ebooks purchased from Amazon by selecting passages of text for upload to the Kindle Popular Highlights website. In this article, I propose that the Kindle Popular Highlights database contains evidence that readers are re-appropriating commonplacing – the act of selecting important passages from a text and recording them in a separate location for later re-use – while reading public domain titles on the Kindle. An analysis of keyness in a corpus of 34,044 shared highlights from public domain titles suggests that readers focus on words relating to philosophy and values to draw an understanding of contemporary society from these classic works. This form of highlighting takes precedence over understanding and sharing key narrative moments. An examination of the top ten most popular authors in the corpus, and case studies of Jane Austen’s Pride and Prejudice and William Shakespeare’s Hamlet, demonstrate variation in highlighting practice as readers are choosing to shorten famous commonplaces in order to change their context for an audience that extends beyond the original reader. Through this analysis, I propose that Kindle users’ highlighting patterns are shaped by the behaviour of other readers and reflect a shared understanding of an audience beyond the initial highlighter.

I was invited to give a talk for the Centre for the History of the Book at the University of Edinburgh. I took the opportunity to talk through some of the methodological challenges facing researchers of ebooks.

I’ve been seriously working on research for my history of the Kindle for a couple of years now and I’m still figuring out how to capture the impact of the Kindle on the scale of both the publishing/technology industry and the individual reader.

This tension is clearest when looking at the available data on reading and the shared highlights. There are a large number of individuals making personal choices behind the 500,000 shared highlights of a single edition of Wuthering Heights. If we scale this to over 4 million ebooks and 40 million Kindle users, it becomes extremely difficult to focus on both the local and global trends (and doubly so when access to the data is obfuscated or entirely unavailable): What counts as an appropriate sample? To what degree can individual highlights link to the mass of activity? How much data can I even get hold of?

While I ponder these questions, there's still the problem of method. To work through it, here's a pilot study of the Harry Potter series: a complete unit that is manageable yet has received a fair amount of attention.

On the global level, shared highlights might not be able to tell us much about readership because an unknown number of readers choose not to highlight or share their efforts. The benefit of using Harry Potter, however, comes from the fact it is possible to gauge popularity across the series.

In recent versions of the Kindle software, a helpful 'About This Book' pop-up box appears when opening a title for the first time. Luckily, this pop-up contains the total number of shared highlights and how many unique sections of the title have been highlighted. (These figures may not necessarily be up to date, but all the data here comes from 20 October 2015.)

The data from the Harry Potter series reveals some interesting patterns. Figure 1 shows the total volume of shared highlights for each title, while figure 2 looks at the number of unique highlights per title. The most striking part of figure 1 is that the visible highlights (the top 10 most shared highlights) barely represent 10% of all shared highlights for any individual title.

Figure 1. Total highlights for each Harry Potter title and the visible top 10 highlights

Figure 2. Unique highlights for each Harry Potter title

While the two graphs appear to show that the popularity of the series drops at the end and plummets after the first novel only to pick up towards the middle, there is a far simpler explanation: the longer books receive more highlights as there is more text to highlight.

The only notable exception is Harry Potter and the Philosopher's Stone, where more readers focus on particular passages. The large increase in total highlights without a similar increase in unique highlights likely indicates that more people read the first book than the rest of the series, or at the very least, that readers lose enthusiasm after the first book.

The second macroscopic view we can get from the Popular Highlights is the location of the shared highlights. Jordan Ellenberg has coined the Piketty Index as a way of using popular highlight locations to see how far through a book a reader got before quitting. From the evidence I’m gathering, it looks like the top 10 shared highlights are more likely to appear at the beginning of a book than the end, but what about the Harry Potter series?
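Ellenberg's measure can be sketched simply: average the locations of the most popular highlights and express that as a fraction of the book's length. A minimal sketch of that calculation (my reading of the index; the function name and sample values are hypothetical, not drawn from real Kindle data):

```python
def unread_index(highlight_locations, book_length):
    """Mean location of the top popular highlights as a fraction of book length.

    A value near 0 suggests readers quit early; near 1 suggests they finished.
    """
    return sum(highlight_locations) / len(highlight_locations) / book_length

# Hypothetical Kindle 'locations' for a book 10,000 locations long:
early_quitters = unread_index([200, 450, 600, 750, 900], 10_000)
finishers = unread_index([7_500, 8_200, 9_000, 9_400, 9_800], 10_000)
print(round(early_quitters, 3))  # 0.058
print(round(finishers, 3))       # 0.878
```

If the Harry Potter highlights cluster at the end of each book, the second pattern is what we would expect to see.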

Figure 3. Top 10 shared highlights for each Harry Potter title

Across the series, readers are more likely to highlight passages at the end of a book than the beginning. Not only does this suggest that readers are likely to finish the books, but looking at the content of the highlights from the end of each book makes it clear that some of the most popular parts of the titles are Dumbledore's speeches to Harry and the denouement of the narrative. Given the make-up of Rowling's series and the slow start of most of the books, this inversion makes sense.

And that’s about as much as you can deduce from looking at the global level as far as I can tell. Once I’ve dug into the more traditional annotations and highlights of individual readers, I’ll compare the results with the broad patterns identified here.

One of the problems with studying digital texts is coming up with a bibliographic description that captures enough information for others to identify (and often replicate the conditions of) the object. Unsurprisingly, ebooks have thrown up some interesting challenges for budding digital bibliographers.

Alan Galey has explored this issue across formats in The Enkindling Reciter. From this analysis, it is clear that the format of the ebook is important to record. For example, when talking about Walter Isaacson’s biography of Steve Jobs, the bibliographic record should indicate that the text was the ‘[Kindle edition]’ or ‘[EPUB]’. This is becoming standard practice in several venues, but is this sufficient to identify an edition?

Unfortunately, ebooks are liable to update automatically. Luckily, Amazon have several ways of identifying versions of a text:

The Amazon Standard Identification Number (ASIN) is the 10-character string that identifies each record in Amazon's catalogue, and it can vary between separate editions of the same ebook. For Walter Isaacson's biography of Steve Jobs, the Little, Brown edition is B005J3IEZQ, while the Simon & Schuster edition is B004W2UBYW. This is not a case of the same book being reskinned for different markets, as the Simon & Schuster file is eight times larger than the Little, Brown edition, which I will discuss here.

The APNX file (used to generate page numbers) contains a 'fileRevisionId' (e.g. 1378512022867) and an 'acr', an identifier for a Palm database (often a lengthy string, such as 'CR!EBPXHWBERS4VV2GK50GFF58D17NS'). These values, while not infallible, can be used to match similar files.
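For bibliographers collating these identifiers, the shape of an ASIN can at least be sanity-checked mechanically. A small sketch (the pattern is inferred from observed ASINs such as those above, not from any Amazon specification):

```python
import re

# Sketch of the 10-character ASIN shape described above; the regex is an
# assumption based on observed ASINs, not an Amazon-documented format.
ASIN_RE = re.compile(r"^[A-Z0-9]{10}$")

def looks_like_asin(candidate: str) -> bool:
    return bool(ASIN_RE.match(candidate))

print(looks_like_asin("B005J3IEZQ"))  # True (Little, Brown edition)
print(looks_like_asin("B004W2UBYW"))  # True (Simon & Schuster edition)
print(looks_like_asin("B005J3IE"))    # False (too short)
```

Note that a matching string is no guarantee of a live catalogue record; print ISBN-10s without a check digit of 'X' also fit this shape.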

Even this information is not sufficient for an accurate bibliographic description since, as I have argued elsewhere, the ebook must be considered as a platform of at least four layers: hardware, software, format and content. Without mapping all of these elements, it is impossible to describe an ebook accurately.

Just five words from Isaacson’s biography (“KOBUN CHINO. A Sōtō Zen…”) are sufficient to demonstrate why we need to pay closer attention to more than just the format of an ebook.

In the paperback edition of the text, the text is formatted with small caps and macrons on both the ‘o’s in Sōtō:

Walter Isaacson (2013) Steve Jobs. New York: Simon & Schuster, xiii.

The second generation Kindle renders this in a slightly different manner:

Kindle 2

This in turn is slightly different from the Kindle for Android, iPad, Mac & Cloud Reader edition:

Android 4.4.2 (Sony Xperia D2005 | Kindle for Android 4.13.0.203)

iOS 8.4 (iPad MD522B/A | Kindle for iPad 4.10)

Mac OS X 10.10.4 (Kindle for Mac 1.11.2 [40670])

Kindle Cloud Reader (Chrome 44.0.2403.125 | Mac OS X 10.10.4)

Variation in font and reading preferences aside, there are clear differences between versions that are of interest for the descriptive bibliographer. There are two major differences I want to highlight:

Sōtō doesn’t look right in any of the Kindle editions.

Kobun Chino’s name appears in small caps in the original print version, but not all Kindle platforms replicate this.

The first is a clear limitation of the Kindle platform and its design. Rather than using the rich and varied palette of a Unicode encoding such as UTF-8 (which allows users to include a wide range of alphabets and, more importantly, emoji!), Amazon chose the much more restrictive Latin-1 encoding, which includes a range of diacritics and punctuation common to Latinate alphabets but not a lot else.

Unfortunately, this does not include the 'o' with macron, which just so happens to appear twice in a single word. Rather than simply removing the macrons, the producers have used a workaround: including an image of the character. Unfortunately, the image does not scale properly with the text and it only works with black text on a white background.

This has a couple of consequences for the ebook itself, too, since it makes it impossible to search for 'Sōtō': the text is either rendered into two single-character words or, worse, turned into 'St'. Not only does this make the word difficult to search for, but it also affects the quality of the Kindle's text-to-speech facilities.

Sōtō rendered as “saint”
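The encoding limitation is easy to reproduce. A minimal Python sketch (the snippet is mine, for illustration; 'latin-1' is Python's codec name for ISO-8859-1):

```python
# Latin-1 (ISO-8859-1) covers only code points 0-255; 'ō' (U+014D) falls outside it.
word = "S\u014dt\u014d"  # "Sōtō"

try:
    word.encode("latin-1")
except UnicodeEncodeError as err:
    print(f"cannot encode {word!r}: {err.reason}")

# Silently dropping the unencodable characters reproduces the 'St'
# search result described above.
stripped = word.encode("latin-1", errors="ignore").decode("latin-1")
print(stripped)  # St
```

Whether the Kindle pipeline drops, splits or substitutes the character depends on the undocumented toolchain, but any Latin-1 bottleneck forces one of these lossy choices.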

While the first bibliographic glitch was readily visible, the second would be difficult to spot without comparing different versions of the same edition. Formatting standards such as HTML, which ebooks use as their basic logic, are not hard laws but recommendations for how to display text, and these can vary between interpreters. Small caps is one of the features that is not universally supported across instances of the Kindle application.

This may appear to be a minor aesthetic variation, but once again, it has an effect on the functionality of the ebook. Due to the variation in parsing the ‘small caps’ formatting tag, different versions of the Kindle software do not agree on whether the start of the ‘small caps’ formatting represents the start of a new word.

For example, Kobun Chino's surname is rendered as 'C hino' on the iPad version, but remains 'Chino' on the Kindle for Mac version. This is a problem for readers who try to look up the name through the dictionary, Wikipedia or X-Ray, as the surname may be rendered as two separate words. Again, the text-to-speech functions of the Kindle stumble on this split word too, rendering some of the accessibility functions difficult to navigate.

CHINO VS. C HINO
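One plausible mechanism for this split can be sketched with hypothetical markup (the class name and the extractor are mine; Amazon's actual rendering pipeline is not documented): if the small-capped letter is wrapped in its own element, a naive text processor that joins element contents with spaces will split the word.

```python
from html.parser import HTMLParser

class NaiveExtractor(HTMLParser):
    """Collects text chunks and joins them with spaces (a naive text pipeline)."""
    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)

    def text(self):
        # Treating element boundaries as word boundaries is the bug.
        return " ".join(self.chunks)

# Hypothetical markup imitating small caps by isolating the capital letter:
markup = '<span class="smallcaps">C</span>hino'
parser = NaiveExtractor()
parser.feed(markup)
print(parser.text())  # C hino
```

Any downstream feature that consumes this extracted text, dictionary lookup, X-Ray or text-to-speech, would then inherit the broken word boundary.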

It is clear that identifying the brand and associated file format alone will not suffice, and even the file format may not be enough due to variation among platforms. Hardware and software configurations make a real difference to the version and behaviour of the file. Since Amazon's file formats (AZW, PRC, KF8 and so forth) are not openly documented, it is insufficient to look at the source code alone; noting the software and OS may be a necessary step in ensuring the replicability and accurate documentation of Kindle ebooks. Even this may not be enough to stave off the constantly updating Kindle infrastructure, but at least it's a start towards documenting a specific moment in time.

It has now been 20 years since Amazon sold its first book: the titillating-sounding Fluid Concepts and Creative Analogies by Douglas Hofstadter. Since then, publishers have often expressed concern over Amazon. Recent public spats with Hachette and Penguin Random House have heightened the public's awareness of this fraught relationship.

It has been presented as a David and Goliath battle, despite the underdogs' status as some of the largest publishing houses in the world. As Amazon has become the primary destination for books online, it has been able to lower book prices through its influence over the book trade. Many have argued that this has reduced the book to "a thing of minimal value".

Despite this pervasive narrative of the evil overlord milking its underlings for all they're worth, Amazon has actually offered some positive changes in the publishing industry over the last 20 years. Most notably, the website has increased the visibility of books as a form of entertainment in a competitive media environment. This is an achievement that should not be diminished in our increasingly digital world.

Democratising data

In Amazon's early years, Jeff Bezos, the company's CEO, was keen to avoid stocking books, preferring to work as a go-between for customers and wholesalers. Rather than building costly warehouses, Amazon would buy books as customers ordered them and pass the savings on to customers. (It wasn't long, however, until Amazon started building large warehouses to ensure faster delivery times.)

This promise of a large selection of books required a large database of available books for customers to search. Prior to Amazon's launch, this data was available to those who needed it through Bowker's Books in Print, an expensive data source run by the company that controlled the International Standard Book Number (ISBN) standard in the USA.

The ISBN was the principal way in which people discovered books, and Bowker controlled this by documenting the availability of published and forthcoming titles. This made it one of the most powerful companies in the publishing industry and also created a division between traditionally published and self-published books.

Bowker allowed third parties to re-use its information, so Amazon linked this data to its website. Users could now see any book Bowker reported as available. This led to Amazon's boast that it had the largest bookstore in the world, despite its lack of inventory in the early years. But many other book retailers had exactly the same potential inventory through access to the same suppliers and Bowker's Books in Print.

Amazon's decision to open up the data in Bowker's Books in Print to customers democratised the discovery of books, which had previously been locked into the sales systems of physical bookstores. And as Amazon's reputation improved, it soon collected more data than Bowker.

For the first time, users could access data about what publishers had recently released and basic information about forthcoming titles. Even if customers did not buy books from Amazon, they could still access the information. This change benefited publishers, as readers who can quickly find information about new books are more likely to buy them.

World domination?

As Amazon expanded beyond books, the ISBN was no longer the most useful way of identifying the items it sold. So the company came up with its own version: the Amazon Standard Identification Number (ASIN), Amazon's equivalent of the ISBN. This allowed customers to shop for books, toys and electronics in one place.

The ASIN is central to any Amazon catalogue record, and with Amazon's expansion into selling eBooks and second-hand books, it connects various editions of books. ASINs are the glue connecting eBooks on the Kindle to shared highlights, associated reviews and second-hand print copies on sale. Publishers, and their supporters, can use ASINs to direct customers to relevant titles in new ways.

Will Cookson's Bookindy is an example of this. The mobile app allows readers to find out whether a particular book is available more cheaply at a nearby independent bookstore than on Amazon. Amazon's advantage of being the largest source of book-related information is thus transformed into a way to build the local economy.

ASINs are primarily useful for finding and purchasing books from within the Amazon store, but their uses are expanding. For example, many self-published eBooks don't have ISBNs, so Amazon's data structure can be used to discover current trends in the publishing industry. Amazon's data allows publishers to track the popularity of books in all forms and shape their future catalogues accordingly.

While ISBNs will remain the standard for print books, the ASIN and Amazon's large amount of data clearly benefit publishers by increasing their visibility. Amazon has forever altered bookselling and the publishing industry, but this does not mean that its large database cannot be an invaluable resource for publishers who wish to direct customers to new books outside of Amazon.

Abstract: The Kindle’s launch in 2007 is considered pivotal in the transition of the eBook from marginal interest to mainstream phenomenon. This narrative marginalizes the pre-history of the eBook stemming from Bob Brown’s manifesto, The Readies, in 1929 through to Sony’s big push for public eBook acceptance with the Sony Librie in 2006. Traditional accounts of the eBook recall early failures to monetize the eBook through expensive hardware experiments from 1999 to 2006, but this ignores a wider range of precedents apparent from a media archaeological excavation of the eBook before the Kindle.

The current project traces the development of the eBook from the Kindle to its precursors outside of the dedicated hardware that typically characterizes the eBook's incunabular period. It is clear that dedicated devices did not catch on prior to the Kindle, but this does not mean that a samizdat eBook culture did not exist. eBook reading prior to the launch of the Kindle was facilitated by applications for portable devices such as PalmPilots and Game Boys. This media archaeological approach reveals the birth of the modern standards for eBook formats and how users, frustrated with the lack of available eBooks, often went to great lengths to create their own. This reaches its apex in the development of an eBook application for the Game Boy, where readers built a program to read a range of titles from Robinson Crusoe to Lolita on the games console.

It is possible to see the foundations of the modern eBook in such activity, as the necessity for reflowable text when reading on a Personal Digital Assistant (PDA) led to the formation of the Open eBook Publication Structure (a precursor to the EPUB format) in 1999, and several portable devices such as the Game Boy Advance, PalmPilot and SoftBook had facilities for modems, allowing readers to receive books without using a computer, often seen as one of the core selling points of the original Kindle. Amazon regenerated the eBook marketplace by amalgamating these elements into a single package while leveraging the competitive advantage of its total dominance over online bookselling to transform the commercial eBook marketplace. Through reconstructing this forgotten, and often unauthorized, history, it is possible to find a richer pre-history of the eBook than the generally established historical narrative of public hardware failures.

Abstract: Since the mid-2000s, the ebook has stabilized into an ontologically distinct form, separate from PDFs and other representations of the book on the screen. The current article delineates the ebook from other emerging digital genres with recourse to the methodologies of platform studies and book history. The ebook is modelled as three concentric circles representing its technological, textual and service infrastructure innovations. This analysis reveals two distinct properties of the ebook: a simulation of the services of the book trade and an emphasis on user textual manipulation. The proposed model is tested with reference to comparative studies of several ebooks published since 2007 and defended against common claims of ebookness about other digital textual genres.

Abstract: It is difficult to talk about the digitalization of the book trade without mentioning Amazon, but the constituency and scale of the retailer have not undergone large-scale critical scrutiny. Amazon's infrastructure, including the integration of ISBNs into Amazon Standard Identification Numbers (ASINs), has shaped the book trade over the last two decades, and in places has replaced traditional sources of information such as Bowker's Books in Print and Nielsen BookScan. Amazon thus presents a large cache of data for publishing studies, although Amazon is notoriously secretive.

The current project maps Amazon UK's online bookselling infrastructure and offers an initial foray into how this data can be analysed to present a survey of the contemporary publishing landscape. While Amazon's websites are a living resource that is difficult to map, there is an impetus to archive and analyse data immediately, as Amazon is not an archival resource, aptly demonstrated by its purge of pre-Kindle ebook data in 2007 and its recent closure of the public popular highlights function. To this end, the current project will provide an overview of Amazon's digital infrastructure, followed by two practical applications: (1) tracking the used book marketplace with a focus on Vladimir Nabokov; and (2) analysing Amazon's use as a cataloguing tool for books not on sale through Amazon or third-party sellers. Through these case studies, the paper aims to open conversations about how to use Amazon as a research tool as well as a research object.

Abstract: Mass digitization of text has resulted in the development of textual generators that are much more capable of writing through reading pre-existing chunks of text. While they do not understand the semantics of the text, many of these machines are capable of creating reasonably intelligible discourse through their reading and reassembly of pre-existing texts. Through targeting specific corpora (including Moby Dick and live data from a remote buoy; instructions from WikiHow; and a database of time zones), text generators and Twitterbots are creating engaging literary works. In this paper, I will theorise and historicise the development of reading automata within the wider context of the recent textual return in digital media facilitated by the development of ebooks and Twitter.

PUBLICATION: Indexes as Hypertext (June 1st, 2015)

Abstract: Digital media presents several challenges to the index, but this ignores the fact that the index has played an important role in the development of the computer. Hypertext, or links between chunks of text, is a vital concept in computation, and one which can be traced back to the index. The author explores the link between indexes and hypertext through three case studies of novels with indexes: Vladimir Nabokov's Pale fire, Mark Z. Danielewski's House of leaves and Steven Hall's The raw shark texts. This analysis reveals how indexes can be used as a subversive part of experimental fiction that authors employ to encourage the reader to move beyond superficial forms of reading.