Building a Digital Library

How close are we to fully digitised libraries? Rachel Coldicutt, Director at Caper, and her partner scanned their entire book collection to find out.

This article was originally posted on The Literary Platform, a site showcasing projects experimenting with literature and technology.

My partner, Matt, and I own quite a lot of books. As two English graduates who have worked in publishing, we’ve both spent the last two decades stockpiling knowledge by way of paperbacks. There are stacks of books we’ve worked on, books we’ve read, books we want to read and books that might be useful one day, and in the years we’ve lived together these stacks have grown to occupy most of the first floor of our flat.

I started my career as a lexicographer and reference editor, and I’ve spent most of the last 15 years using massive archives – like Encyclopaedia Britannica or the V&A Collections – to do interesting digital things. Having spent most of my working life organising information, my home life is, quite naturally, a morass of disorganised confusion. For a long time, that was fine, because I knew where everything was. I never needed to make a list and had near instant recall over a wide range of subjects.

But then, I started to get older.

I started to find I couldn’t remember exactly which books I had read, let alone what was in them, and that the random queries I was used to sending out to the recesses of my mind were taking longer and longer to come back. I was also becoming frustrated that the physical objects I owned weren’t, somehow, digitally available and persistent: that I couldn’t look at a book that was sitting on a shelf at home when I was online or on my phone in the same way that I could listen to a piece of music I owned, wherever I was in the world. Also, more practically, I was growing fed up of never being able to find anything.

I’ve recently started looking into scanning my books so that I can read them wherever I am. There’s a brilliant community over at DIY Book Scanner working together to find the best ways of turning physical books into digital files. An open-source community, they’ve made hundreds of physical machines that execute the process of scanning and cleaning the pages in slightly different ways, with some models being able to handle up to 1200 pages an hour.

But first of all you have to make the scanner. And then you have to scan every page of every book you own. So although this was a completely free model, it seemed like a fairly ambitious place to start.

In order to ease myself in, I started to make a searchable index of my cookery books. The aim was to have an at-a-glance guide to which books used which ingredients and techniques, without having to open each one in turn and search through the index.

But most indexes are a mess. Even within the same publishing house, there is unlikely to be a standard taxonomy for recipe indexes. I started with half-a-dozen Italian books, and was quickly up against a war of authenticity which meant that – if I was going to get anywhere – I needed to compile my own taxonomy, including standard variants. Ribollita can become ‘Nona’s Favourite Soup’, a courgette can be a zucchini, pasta can be organised by shape or not organised at all, and there was no agreement on whether beans referred to green ones or to pulses. I managed to do about six during my Christmas holiday, and then I had to stop. It was taking far too long.
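A taxonomy with standard variants of this kind can be sketched as a simple lookup table. The Python below is only an illustration under stated assumptions: the table contents come from the examples above, while the function names and the book titles "Book A" and "Book B" are hypothetical and not part of any tool mentioned in this article.

```python
# Hypothetical taxonomy: preferred term -> variants found in individual book indexes.
TAXONOMY = {
    "courgette": ["zucchini", "courgettes"],
    "ribollita": ["nona's favourite soup"],
}


def build_lookup(taxonomy):
    """Map every variant (and each preferred term itself) to its preferred term."""
    lookup = {}
    for preferred, variants in taxonomy.items():
        for term in [preferred, *variants]:
            lookup[term.lower()] = preferred
    return lookup


def normalise(entry, lookup):
    """Return the preferred term for an index entry, or the cleaned entry if unknown."""
    # Lower-case, trim, and straighten curly apostrophes before looking up.
    key = entry.strip().lower().replace("\u2019", "'")
    return lookup.get(key, key)


def merge_indexes(indexes, lookup):
    """Combine per-book indexes into one: preferred term -> sorted list of books."""
    merged = {}
    for book, entries in indexes.items():
        for entry in entries:
            merged.setdefault(normalise(entry, lookup), set()).add(book)
    return {term: sorted(books) for term, books in merged.items()}


lookup = build_lookup(TAXONOMY)
# "Book A" and "Book B" are placeholder titles, not real entries.
combined = merge_indexes(
    {"Book A": ["Zucchini", "Ribollita"], "Book B": ["courgette"]},
    lookup,
)
```

The slow part is still compiling the variant table by hand; the code only applies it consistently once it exists.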

I started to look into simpler ways of cataloguing our books and wanted to see what I could do with metadata, in lieu of having access to actual content. As an already keen user of LibraryThing, I wanted to get my LibraryThing profile to mirror the actual collection of books on my shelves – hoping this might at some point make it easier for me to replicate my physical collection in a digital format.

I started by very unscientifically asking people I knew on Twitter what they used, and those who used anything all appeared to use Delicious Library. If you want to live a really quantified life, Delicious can help you catalogue your board games, your electronic devices and even your clothes and jewellery, which might be handy if you’re moving to another country or making a comprehensive insurance inventory. I decided, for the time being, to stick just to the books.

The Delicious book catalogue connects each record to an existing authority file – most commonly the one held by Amazon. It costs $35 for a multi-use license, which you download as a local application to a number of devices, allowing you to identify each book by scanning the barcode. If you quite enjoy the self-service checkout at Sainsbury’s, you can quite quickly add accurate edition details about the majority of your books.

As well as keeping a local version of the catalogue, which can be organised into multiple internal categories (or ‘shelves’), you can also export it as XML, XSLT or CSV, or into an MLA-standard Bibliography for citations. If you wanted, you could also use it as a way of finding the value of your books, keeping track of what you’ve lent to whom, and buying and selling similar on Amazon.

I was mostly interested in exporting it to LibraryThing and having an accessible list of all the books I owned that I could look at remotely. I wanted to be able to stand in a bookshop and quickly refer to a list that showed me which Persephone novels or PG Wodehouse novels I owned without worrying about buying a duplicate. Meanwhile, we were also going to categorise every book we owned onto physical shelves, so that we could immediately reach for a thing and know where it was – something I hadn’t been able to do at home for nearly a decade.
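For anyone who would rather query a CSV export directly, the bookshop check might look something like the Python sketch below. It rests on assumptions: the column names `title` and `author` are hypothetical and may not match what any particular application exports, and the sample rows are made up for illustration rather than taken from our real catalogue.

```python
import csv
import io


def load_catalogue(csv_file):
    """Read catalogue rows from an open CSV file into a list of dicts."""
    return list(csv.DictReader(csv_file))


def titles_by(catalogue, name):
    """Return sorted titles whose author field contains `name`, ignoring case."""
    needle = name.lower()
    return sorted(
        row["title"] for row in catalogue if needle in row["author"].lower()
    )


def owns(catalogue, title):
    """Return True if a book with exactly this title (ignoring case) is listed."""
    wanted = title.strip().lower()
    return any(row["title"].strip().lower() == wanted for row in catalogue)


# Made-up sample export; a real file would be opened with open(path, newline="").
sample = io.StringIO(
    "title,author\n"
    "Americana,Don DeLillo\n"
    "Summer Lightning,P. G. Wodehouse\n"
    "Leave It to Psmith,P. G. Wodehouse\n"
)
catalogue = load_catalogue(sample)
wodehouse_titles = titles_by(catalogue, "wodehouse")
already_owned = owns(catalogue, "Americana")
```

Matching on exact titles is deliberately crude; different editions or subtitles of the same book would need a looser comparison, such as matching on ISBN.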

What did we find?

We found about 30 books that don’t belong to us, about 10 duplicates (if you were wondering how many times one person can read “Americana” by Don DeLillo and not remember it’s the same book, the answer is “three”), and a further 150 or so that we don’t want or need, which we then sold to a second-hand bookshop for the princely sum of £25.

That left 1,062 others, which are now – in the real world – distributed across five bookcases and separated into sixteen categories. Fourteen of these are organised by alphabetical order of author, with the other two (music and craft) organised by sub-category, as that made things easier to find.

We were able to barcode scan and organise all of the books with two people working fairly consistently over a single day. The next day, I went through Delicious and organised the books we had scanned into ‘shelves’, roughly equating to the categories we shelved the real books into.

We didn’t have a very hard-and-fast system for classifying physical books. We started the day having quite a few ridiculous conversations that I wouldn’t want many people to overhear – “Should we subdivide the feminist books by ideology?”, “What about the Greeks?”, “Does sheet music go with books about music?” Because physical books have to be put away, we needed to decide. The aim of the day was to put everything on a shelf, in a reasonable order, and that’s pretty much what happened.

It was a different story when I catalogued the books in Delicious. Categorising things on a screen feels much more of a commitment than organising them in the real world. Multiple tags increase the opportunity for ambiguity, but while an unclassified digital record is still searchable, an unclassified physical book is just lost. So only 1,048 of the 1,062 physical books have made it into digital categories – and while I’ve used the physical categories almost daily ever since, I’ve not had good reason to look at the digital ones again.

The digital categories have allowed me to take an overview of the kind of books we have. In reality, it was neither surprising nor very illuminating. 39% are straightforward Fiction; Crime and Thrillers represent an additional 6.9% and Poetry and Drama another 14%. This leaves 41% non-fiction books scattered over 13 categories. Of these the largest are Theory and Essays, Cookery, Music, and History and Culture.

I also have possibly thousands of recipes that are written on bits of paper, torn out of magazines or photocopied from books. I’ve been trying to sort these out for years. This hasn’t helped – although from blogging about the process, I now know from a friend that I could RDF scan these using Evernote, which would probably start to help with my original recipe index project.

It’s now two months since we did this, and I’ve discovered the following:

It’s quite easy to keep books in order on shelves, and quite satisfying to add new ones to the existing system.

It’s much more difficult to remember to scan new books. We need a library-style system of labels or red dots on the spines to indicate which books we have and haven’t scanned and which we have or haven’t virtually shelved. While Matt was very patient about scanning all of the books the first time, I’m not sure our relationship will survive the professionalisation of our bookshelves to quite that extent.

I haven’t included ebooks anywhere.

The most useful output is the LibraryThing profile. I set up a new profile for our joint library. While I haven’t tagged this or put it into categories, it’s completely searchable and accessible to me wherever I have Internet access. Not only will this make buying Matt’s Christmas presents easier, it’s also the best antidote I’ve found to my constantly receding memory. Although I’ve only checked it about half a dozen times, each time has been incredibly useful – it’s either provided the answer to a nagging question, or stopped me from buying something I already own.

I have about 300 books in my personal LibraryThing profile. Even though this is less than a third of our total joint library, I’m still (as an individual) the LibraryThing member who has the most books in common with us as a couple.

But mostly, I’m still hoping the longer-term benefit will be that I’ll one day be able to replicate all of these books digitally. In the meantime, I have an almost-perfect digital record of all the books that we had in our house on one Saturday in August 2012.

