Exercises in democracy: building a digital public library

The Digital Public Library of America—coming to an Internet connection near you?

Most neighborhoods in America have a public library. Now the biggest neighborhood in America, the Internet, wants a library of its own. Last week, Ars attended a conference held by the Digital Public Library of America, a nascent group of intellectuals hoping to put all of America's library holdings online. The DPLA is still in its infancy—there's no official staff, nor is there a finished website where you can access all the books they imagine will be accessible. But if the small handful of volunteers and directors have their way, you'll see all that by April 2013 at the latest.

Last week's conference set out to answer a lot of questions. How much content should be centralized, and how much should come from local libraries? How will the Digital Public Library be run? Can an endowment-funded public institution succeed where Google Books has largely failed (a 4,000-word meditation on this topic is offered by Nicholas Carr in MIT's April Technology Review)?

Enthusiasm for the project permeated the former Christian Science church where the meeting was held (now the church is the headquarters of Brewster Kahle’s Internet Archive). But despite the audience's applause and wide-eyed wonder, there’s still a long way to go.

As it stands, the DPLA has a couple million dollars in funding from charitable trusts like the Alfred P. Sloan Foundation and the Arcadia Fund. The organization is applying for 501(c)(3) status this year, and it’s not hard to imagine it running as an NPR-like entity, with some government funding, some private giving, and a lot of fundraisers. But outside of those details, very little about the Digital Public Library has been decided. "We’re still grappling with the fundamental question of what exactly is the DPLA," John Palfrey, chair of the organization’s steering committee, admitted. The organization must be a bank of documents, and a vast sea of metadata; an advocate for the people, and a partner with publishing houses; a way to make location irrelevant to library access without giving neighborhoods a reason to cut local library funding. And that will be hard to do.

Real content, real concerns

When people hear "Digital Public Library," many assume a setup like Google Books: a single, searchable hub of books that you can read online, for free. But the DPLA will have to manage expectations on that front. Not only are in-copyright works a huge barrier to entry, but a Digital Public Library will be inextricably tied to local libraries, many of which have their own online collections, often overlapping with other collections.

An online library of America will have to strike a balance between giving centralized marching orders and acting as an agent of decentralized cooperation. "On the one hand would [the DPLA only offer] metadata? No, that’s not going to be satisfying. Or are we trying to build a colossal database? No that’d be too hard," Palfrey noted to the audience last Friday. "Access to content is crucial to what the DPLA is, and much of the usage will be people coming through local libraries that are using its API. We need something that does change things but doesn’t ignore what the Internet is and how it works."

Wikimedia was referenced again and again throughout the conference as a potential model for the library. Could the Digital Public Library act as a decentralized national bookshelf, letting institutions and individuals alike contribute to the database? With the right kind of legal checks, it would certainly make amassing a library easier, and an anything-goes model for the library would bypass arguments over the value of any particular work. Palfrey even suggested to the audience that the DPLA fund "Scan-ebagoes"—Winnebagoes equipped with scanning devices that tour the country and put local area content online.

But the Wikimedia model, where anyone can write or edit entries in the online encyclopedia, could present problems for an organization looking to retain the same credibility as a local library. Several local librarians attended the conference and voiced concerns over how to incorporate works of local significance, and texts published straight to an e-book format, into the national library.

One member of the audience, who is also a volunteer for the DPLA, suggested in an afternoon presentation that the Library’s API incorporate an "up-vote, down-vote" system for works submitted by individuals. You could write a cookbook of Mexican food, he suggested, and if you don’t know anything about Mexican food, your book would be down-voted, and in a search it wouldn’t show up at the top of the list. A librarian sitting in front of him cautioned that appraising works before they end up in the Digital Public Library is crucial to maintaining its authority; an up-vote, down-vote system could never be enough of a sanity check. "Well if that’s true then Reddit wouldn’t work," the volunteer shot back. Of course, the trouble is that Reddit doesn’t work, at least not like a library: on Reddit, the voices of women and minorities tend to get shut out in favor of whatever lulz-zeitgeist hit the Internet that morning.

And America is huge: how do you appraise works that may be considered offensive or worthless in some areas (anything from a C-list author’s creationist diatribe, to illustrated sex-instruction books, to The Anarchist Cookbook)? The easy answer is that all information should be accessible to anyone who wants it, but some curation might be necessary to make sure every library in America gets on board. Although he stipulated that his answer was speculative, Palfrey told Ars that individuals would not be contributing to the Digital Public Library, at least at the beginning. "Libraries have done this for a long time, [appraisal] is not a new problem," he said.

Similarly, the Scan-ebago idea is brimming with populist appeal, but Google Books is proof that it’s not always as easy as scanning and uploading documents that people want to see online. As a presentation titled "Government, Democracy, and the DPLA" pointed out, even government testimony, while not copyrightable by law, can contain text or images that are copyrightable, like an image of Mickey Mouse. Scanning books is one thing, but making sure you have all your legal bases covered before you upload text to the Internet is quite another.

The Internet Archive headquarters hosted the Digital Public Library of America's west coast conference.

And how about local bookstores and big publishers? They make the content, and some of them will almost certainly try to stonewall this endeavor. But (unsurprisingly) no anti-digital-public-library publishers showed up at the conference that day. Publisher Tim O’Reilly of O’Reilly Media played the print industry’s white knight at the DPLA’s conference, explaining to the audience how his company adapted to the prevalence of on-demand information. "We’ve insisted from the beginning that our books be DRM free," he said to applause.

Brewster Kahle, another champion of digital (and physical) libraries and the founder of the hosting Internet Archive, suggested that the DPLA buy, say, five electronic copies of an e-book, and digitally lend them out, just like one rents a movie off Amazon or iTunes, which expires in 24 hours or a few days. When an audience member questioned Kahle on what it would take for publishers to nix DRM (or Digital Rights Management restrictions, which confine certain formats to specific e-book readers) for that rent-a-book idea to be more widely viable, Kahle replied facetiously, "Wanting to have a business at the end of the day?"
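To make Kahle's lending idea concrete, here is a minimal Python sketch of a title with a fixed number of purchased copies and loans that lapse on their own. It illustrates the mechanism as described at the conference and is not anything the DPLA or the Internet Archive has published; the class name, method names, and the one-day loan period are all assumptions made for the example.

```python
from datetime import datetime, timedelta


class LendableTitle:
    """Hypothetical model of one e-book title owned in a fixed number of copies."""

    def __init__(self, title, copies_owned, loan_period):
        self.title = title
        self.copies_owned = copies_owned  # e.g. the five copies Kahle mentioned
        self.loan_period = loan_period    # assumed here; could be hours or days
        self.loans = {}                   # patron id -> time the loan expires

    def _drop_expired(self, now):
        # A loan simply lapses once its expiry time has passed.
        self.loans = {p: end for p, end in self.loans.items() if end > now}

    def check_out(self, patron_id, now):
        """Return True if the patron holds a copy after this call."""
        self._drop_expired(now)
        if patron_id in self.loans:
            return True
        if len(self.loans) >= self.copies_owned:
            return False  # every purchased copy is currently on loan
        self.loans[patron_id] = now + self.loan_period
        return True
```

Under this sketch a sixth patron is turned away until one of the five loans expires, which is the artificial scarcity that the comment thread below goes on to argue about.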

Kahle and O’Reilly are among a growing number of publishing-industry types who believe that fixing books to a single e-reader platform is an unsustainable business practice that will naturally become extinct. Their enthusiasm is infectious, but the reality of DRM will certainly be a problem for the Digital Public Library in the short term, if not down the line as well. Wishing DRM away, or convincing charitable investors that it’s not going to be a problem, could be an Achilles heel for the Digital Public Library.

Organizing Metadata (where the DPLA can excel today)

While content is a thorny issue, the DPLA can establish itself as a force that content providers won’t ignore by leveraging the massive amount of metadata it has collected about books, including data for over 12 million books from Harvard’s libraries. These aren’t actual books, but details about books you can find in libraries across the country. Sure, it’s not exactly a romantic liberation of information, but this data is a roadmap to everything that’s available out there, and where users can find it.

Building an API with all of this metadata is also the first step to the ideal because a digital library is useless if search doesn’t work. "It’s critical to think through search: how to leverage the distributed nature of the internet, and keep [content] in open formats that are linkable," O’Reilly said. With an open API, the organization’s extensive database could be distributed to all libraries to build their own digital public library on top of it.

There are other benefits to organizing all the metadata too. Involvement has long been an issue for local libraries, and members of the Digital Public Library’s volunteer development team suggested that the API could be used to build social applications on top of the DPLA platform, or map the database and include links to other relevant online databases of culture, like Europeana. "The DPLA could sponsor some research in managing all the metadata," David Weinberger, a member of the DPLA’s dev team, suggested. But in the meantime, the group is relying on volunteer time from developers at occasional DPLA-sponsored hackathons.

By April 2013, Weinberger said, the DPLA aims to have a working API with a custom ingestion engine to put metadata from library holdings online, a substantial aggregation of cultural collection metadata and DPLA digitizations, and community-developed apps and integration, all built mostly with the help of volunteers and open source enthusiasts.

The problem the DPLA has now, explained Weinberger, is figuring out how to build an API that makes use of all the metadata without giving weight to information that will incorrectly classify a lot of the books. Similarly, he described the DPLA’s "deep, deep problem" of "duping," which happens when two caches of data describe the same book differently, leading to duplicates. Weinberger described the "clunky ingestion engine" as "wildly imperfect." If the project is going to get off the ground, it’ll need a lot of volunteer help, or a lot of money, and the DPLA is counting on the former, and hoping for the latter.
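The "duping" problem is easier to see with a small example. The Python sketch below shows one common approach, which is to reduce each record to a normalized match key and merge records whose keys agree. It is an illustration only and not a description of the DPLA's ingestion engine; the field names (`title`, `author`, `year`, `source`) and the matching rule are assumptions, and real catalog records are far messier.

```python
import re
import unicodedata


def normalize(text):
    """Lowercase, strip accents and punctuation, and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", text)
    unaccented = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    cleaned = re.sub(r"[^a-z0-9 ]+", " ", unaccented.lower())
    return " ".join(cleaned.split())


def match_key(record):
    """Build a comparison key from hypothetical title/author/year fields."""
    author = record["author"]
    if "," in author:
        # Treat "Last, First" and "First Last" as the same name.
        last, first = author.split(",", 1)
        author = f"{first} {last}"
    return (normalize(record["title"]), normalize(author), record.get("year"))


def merge_duplicates(records):
    """Group records that share a match key, remembering every source."""
    merged = {}
    for record in records:
        entry = merged.setdefault(match_key(record), {"record": record, "sources": []})
        entry["sources"].append(record["source"])
    return list(merged.values())
```

A rule this simple will both miss true duplicates (different editions, transliterated names) and merge distinct works that happen to share a key, which is a fair picture of why Weinberger called the problem deep.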

It has to happen, and fast

"Public education is the most radical idea in the world," Kristina Woolsey, director of San Francisco’s Exploratorium, said at last week’s conference. "Another radical idea as big as democracy is the idea of public libraries."

Despite the challenges facing the Digital Public Library of America, it’s a concept that needs to come to fruition sooner rather than later. Not simply because a Digital Library would be a professional accomplishment for many well-meaning intellectuals, but because citizens deserve a way to access, even just for the duration of a rental, the same ideas that people who live near better-funded libraries can access, without having to engage in piracy.

One of the earliest speakers at the conference, Dwight McInvaill, a local librarian for South Carolina’s Georgetown County Library, spoke of how important it is to digitize works for the good of the public. His own library’s digital collection gets over 2 million hits a month. "Small libraries serve 64.7 million people," he said, many of those in poverty. "We must engage forcefully in the bright American Digital Renaissance," McInvaill proclaimed. Either that, or be left in the physical book dark ages.

22 Reader Comments

This is a good idea. But considering how the entire US Government is hostile to any idea of Fair Use, it doesn't matter how good an idea it is. Whatever money these guys get, will go to paying lawyers or to awards for damages after publishers win the inevitable lawsuits. If even one Johnny Citizen gets to read a book that he hadn't paid for, I'll be shocked.

This is a good idea. But considering how the entire US Government is hostile to any idea of Fair Use, it doesn't matter how good an idea it is. Whatever money these guys get, will go to paying lawyers or to awards for damages after publishers win the inevitable lawsuits. If even one Johnny Citizen gets to read a book that he hadn't paid for, I'll be shocked.

Even the broadest understanding of fair use wouldn't include putting all books into a giant library that anyone could access.

This is a good idea. But considering how the entire US Government is hostile to any idea of Fair Use, it doesn't matter how good an idea it is. Whatever money these guys get, will go to paying lawyers or to awards for damages after publishers win the inevitable lawsuits. If even one Johnny Citizen gets to read a book that he hadn't paid for, I'll be shocked.

Even the broadest understanding of fair use wouldn't include putting all books into a giant library that anyone could access.

So besides the means of access (physical vs electronic) and the scale what would be the legal difference between a community library and an online repository of all works?

With a physical library I suppose the argument could be made that only a certain number of copies of each book are available, and while it's head-smackingly ridiculous that we allow access to information and technological progress to be held back by artificially created scarcity simply to kowtow to the invested interests of a dying business model, I suppose some sort of DRM could be worked out that allowed only a certain number of copies of each work to be 'checked out' at a time (there are already systems like this available through many public library e-lending systems).

Restricted works (for whatever reason) would be made available only under conditions specified in the restriction. Classified material available only to those with the proper clearance, copyrighted material available only to those the copyright owner allows, etc.

Government documents that are freely available would require a limited license that prohibits reuse of copyrighted material included in them. The government obviously granted itself a limited license to include the material in the document. That license should allow all readers of the document to see the entire document.

In addition to the library's own collection they should have a searchable index to the collections of other libraries and a searchable index of all known publications would be nice to have also.

Gaining access to electronic editions that are not in the DPLA, would require contacting the library which does have it in their collection and following their rules for accessing the material. A model for this already exists with Interlibrary Loan for physical media & the Link Lists found on many websites that are set up as reference works to point visitors to other sites hosting additional material they may be interested in.

So besides the means of access (physical vs electronic) and the scale what would be the legal difference between a community library and an online repository of all works?

You might as well say "So besides physics and current technology what's stopping us from achieving FTL space travel?"

Means of access, physical vs digital, and the resulting differences in scale are massively important factors here.

NulloModo wrote:

With a physical library I suppose the argument could be made that only a certain number of copies of each book are available...

You "suppose"? As if that's some outlandish argument some zany person might possibly bring up?

lol

NulloModo wrote:

and while it's head-smackingly ridiculous that we allow access to information and technological progress to be held back by artificially created scarcity simply to kowtow to the invested interests of a dying business model...

Right, the "invested interests", i.e. the people who actually invested in the "information" that you inexplicably believe yourself entitled to. Those assholes.

:|

Whine all you want about "failed business models", it won't change the fact that free access to the latest New York Times best sellers is neither a god-given right nor a fundamental law of nature. It's not even logical.

NulloModo wrote:

I suppose some sort of DRM could be worked out that allowed only a certain number of copies of each work to be 'checked out' at a time (there are already systems like this available through many public library e-lending systems).

The current (physical) library model is mutually beneficial to the public, the publishers, and the authors, in large part because tangible books have to be continually replaced for wear and tear. The inconvenience of physically checking out and physically returning physical items is yet another factor. Take away these myriad factors, and the relationship goes from mutualism to parasitism.

As you say, some of these factors could be transposed into the digital realm, in addition to the imposition of DRM so that "renting" didn't become as big a euphemism as "escorts". Unfortunately, even if a system for digital libraries could be set-up to be as mutually beneficial as physical libraries, the tech fundamentalist bitching would likely never cease, as your own post so conveniently demonstrates...

“One member of the audience, who is also a volunteer for the DPLA, suggested in an afternoon presentation that the Library’s API incorporate an "up-vote, down-vote" system for works submitted by individuals.”

I was that volunteer/audience member. Regarding “authority” and the public libraries: I hope it is clear to readers that providing “facts”, while important, is only one of the many things public libraries provide to communities as centers for self-initiated, lifelong learning. I’m of the opinion that public libraries should be more than centers for information consumption, but also hubs for dialogue, active learning, information production, and co-creation. Community created content like the cookbook of Mexican food I mentioned doesn’t have to be *in* the DPLA, a third party could write an application that sits *on* the DPLA and integrates this kind of content into search results. The result would be a DPLA untainted by the misleading, terrible Mexican cookbook that I might (and probably would) write.

it's head-smackingly ridiculous that we allow access to information and technological progress to be held back by artificially created scarcity simply to kowtow to the invested interests of a dying business model

^ This.

"Lending" or "borrowing" an e-book is meaningless from an information perspective; it's as if I offered to lend out a blog post, the text of the Gettysburg Address, the bytes that encode a digital photograph, or the number 17.

Then again, I do remember the original Lisa file system, which actually deleted the original file on your hard drive if you moved it (rather than, say, duplicating it and moving the duplicate) to a floppy. The intention was to be more "intuitive" by making digital documents behave identically to their paper counterparts, and to behave consistently regardless of whether you were moving a file to another directory or another volume, much like the Unix mv command. Needless to say this resulted in horrified users wondering why their documents had vanished (answer: you probably moved them to a floppy disk) and was subsequently abandoned for the now-familiar yet rather inconsistent model that results in a file being duplicated when you drag it to external storage. This does make it harder to archive files to secondary storage in order to free up disk space, but it optimizes the common case and reduces the chance of data loss.

Overall, the "copy my file when I'm sending it somewhere else" model seems to have won out; certainly the scp command is much more popular than, say, an smv command which would delete your original file as soon as you finished copying it over the network.

I'm not clear on how this is different from the services provided by OCLC. Was anyone from that organization present at this meeting?

Quote:

While content is a thorny issue, what the DPLA can leverage to establish itself as a force that won’t be ignored by content providers, is the massive amount of metadata it’s collected about books, including data for over 12 million books from Harvard’s libraries. These aren’t actual books, but details about books you can find in libraries across the country. Sure, it’s not exactly a romantic liberation of information, but this data is a roadmap to everything that’s available out there, and where users can find it.

I'm sure the FBI will be very interested in seeing who's been checking out copies of the Anarchist's Cookbook. So the DPLA will need to figure out how to accommodate patron privacy, or violate one of the central tenets of the ALA.

Quote:

Building an API with all of this metadata is also the first step to the ideal because a digital library is useless if search doesn’t work. "It’s critical to think through search: how to leverage the distributed nature of the internet, and keep [content] in open formats that are linkable," O’Reilly said. With an open API, the organization’s extensive database could be distributed to all libraries to build their own digital public library on top of it.

and while it's head-smackingly ridiculous that we allow access to information and technological progress to be held back by artificially created scarcity simply to kowtow to the invested interests of a dying business model...

Right, the "invested interests", i.e. the people who actually invested in the "information" that you inexplicably believe yourself entitled to. Those assholes.

:|

Whine all you want about "failed business models", it won't change the fact that free access to the latest New York Times best sellers is neither a god-given right nor a fundamental law of nature. It's not even logical.

The idea that a fee should be paid in order to reproduce a creative work is a relatively modern idea in the grand scheme of things. I'm not for the abolishment of copyright, but it's not a fundamental law of nature either, and it should evolve as our technological capabilities do.

Quote:

NulloModo wrote:

I suppose some sort of DRM could be worked out that allowed only a certain number of copies of each work to be 'checked out' at a time (there are already systems like this available through many public library e-lending systems).

The current (physical) library model is mutually beneficial to the public, the publishers, and the authors, in large part because tangible books have to be continually replaced for wear and tear. The inconvenience of physically checking out and physically returning physical items is yet another factor. Take away these myriad factors, and the relationship goes from mutualism to parasitism.

As you say, some of these factors could be transposed into the digital realm, in addition to the imposition of DRM so that "renting" didn't become as big a euphemism as "escorts". Unfortunately, even if a system for digital libraries could be set-up to be as mutually beneficial as physical libraries, the tech fundamentalist bitching would likely never cease, as your own post so conveniently demonstrates...

I don't see how it's possible.

The current system is far more beneficial to publishers and authors than to the public.

How does having to rebuy books that are worn out (and thus putting a strain on library resources) and forcing the inconvenience of having to physically pick up and return the books help the general public?

I'm not nearly as concerned with protecting publishers and authors as I am with making information more readily accessible to the public in general. It's time for the pendulum to start swinging back to benefit the common man instead of big business.

With a physical library I suppose the argument could be made that only a certain number of copies of each book are available, and while it's head-smackingly ridiculous that we allow access to information and technological progress to be held back by artificially created scarcity simply to kowtow to the invested interests of a dying business model, I suppose some sort of DRM could be worked out that allowed only a certain number of copies of each work to be 'checked out' at a time (there are already systems like this available through many public library e-lending systems).

A perfect example of what Brewster Kahle meant with his "Wanting to have a business at the end of the day?" comment.

"The idea that a fee should be paid in order to reproduce a creative work is a relatively modern idea in the grand scheme of things. I'm not for the abolishment of copyright, but it's not a fundamental law of nature either, and it should evolve as our technological capabilities do. "

It's only in modern times that such reproductions can be free, or close to free. Even 100 years ago, there was no easy means of copying a book. You either hand-copied it, or you paid a printer to copy it. Neither was particularly cheap. So yes, in ye-olden-days you could copy a book without being sued or paying the original author/publisher, but it was far from free, and the technology was available only to the wealthy.

Copyright has been evolving, and is a direct response to the technological improvements in copying technology.

"m bear" is absolutely correct - OCLC, also known as WorldCat, already provides exactly this. You can use WorldCat to "Find In A Library" any book, CD, or other item you wish - it is essentially a library search engine - and once you have identified the book you want, you can easily click an Inter-Library Loan request which will cause the book to transport itself from whichever library owns it to your local library.

Essentially, your local library becomes your portal to every book in every library in the world!

Whine all you want about "failed business models", it won't change the fact that free access to the latest New York Times best sellers is neither a god-given right nor a fundamental law of nature. It's not even logical.

I'd phrase this a bit differently:

Quote:

Whine all you want about "pirates stealing your intellectual property", it won't change the fact that a legal monopoly on controlling access to the latest New York Times best sellers is neither a god-given right nor a fundamental law of nature. It's not even logical.

Copyright (or lack thereof) is a practical matter, not a "god-given right" nor a "fundamental law of nature". The public (via their elected government) chose to voluntarily (by passing a copyright law) grant temporary exclusive control (with certain fair use limitations) to authors as an incentive for those authors to create works. The public was not morally obligated to pass such a law, and the purpose was ultimately to benefit the public, not the authors and/or publishers. Financial benefits to authors and publishers are a means to an end (creation of more works), while the public's temporary restriction on copying is a harmful (but at the time necessary) side effect.

Within the context of the technology available at the time when the first US copyright law was passed, the public wasn't really giving up much, since most of the public didn't have the means to produce copies anyway. For the public, the trade-off of temporary restrictions for more incentive to authors was a pretty damn good deal, so the law was in the public interest.

Now, many people have observed some changes:

1) Content industry lobbyists have constantly pushed for longer terms, to the extent that the public domain is now a wasteland of mostly irrelevant old works (with a few classics that withstood the test of time), rather than a useful collection of relatively recent works that have had enough protection to earn their authors a living.

2) The restrictions imposed by copyright now interfere with the useful benefits of modern technology. Digital technology can allow a level of distribution and sharing at nearly zero cost or effort that was impossible with physical printing, but all those possibilities are illegal when applied to works that aren't in the public domain. The result is that we've invented wonderful technology for free distribution of content, but in most cases, we're not legally allowed to use it to its full potential.

These changes have led some (including myself) to conclude that the copyright laws are no longer in the public's interest, at least the way they're currently written. I think there could be some use still for a short term copyright on commercial use and distribution (as advocated by the Pirate Party), but copyright as it stands now is against the public interest in my opinion. The law needs to put the public interest first.

A compromise could work initially. Some content can be rotating and free (sponsored by the site), but the rest just gets a pointer to it elsewhere. The free content could include all public domain works, the top 500 literary classics, key religious texts, and lots of history, how to, and other useful material. Rotate in some more popular books, and it could be a good first stop in the quest to find a book.

The idea that a fee should be paid in order to reproduce a creative work is a relatively modern idea in the grand scheme of things. I'm not for the abolishment of copyright, but it's not a fundamental law of nature either, and it should evolve as our technological capabilities do.

of course, the ability to mass reproduce creative works cheaply is a relatively modern development in the grand scheme of things too.

I'd like to echo the sentiments of m bear and commenter 5. I kept expecting to see mention of OCLC and/or WorldCat in the article. They've already got the connections to public and academic libraries and metadata covered. If some sort of partnership, or at least consultation, has not been considered, I would consider that a huge oversight.

Lots of hype - you could play bingo with some of these presentations - with no real direction after a year, and doesn't do much that WorldCat doesn't cover. On top of that, there are Ragnarok-level battles on the horizon with publishers, and any compromise will end poorly for the patrons (24 hr lending, 5 copies total?!).