Exercises in democracy: building a digital public library

The Digital Public Library of America—coming to an Internet connection near you?

Most neighborhoods in America have a public library. Now the biggest neighborhood in America, the Internet, wants a library of its own. Last week, Ars attended a conference held by the Digital Public Library of America, a nascent group of intellectuals hoping to put all of America's library holdings online. The DPLA is still in its infancy—there's no official staff, nor is there a finished website where you can access all the books they imagine will be accessible. But if the small handful of volunteers and directors have their way, you'll see all that by April 2013 at the latest.

Last week's conference set out to answer a lot of questions. How much content should be centralized, and how much should come from local libraries? How will the Digital Public Library be run? Can an endowment-funded public institution succeed where Google Books has largely failed? (Nicholas Carr offers a 4,000-word meditation on this topic in MIT's April Technology Review.)

Enthusiasm for the project permeated the former Christian Science church where the meeting was held (now the church is the headquarters of Brewster Kahle’s Internet Archive). But despite the audience's applause and wide-eyed wonder, there’s still a long way to go.

As it stands, the DPLA has a couple million dollars in funding from charitable trusts like the Alfred P. Sloan Foundation and the Arcadia Fund. The organization is applying for 501(c)(3) status this year, and it's not hard to imagine it running as an NPR-like entity, with some government funding, some private giving, and a lot of fundraisers. But outside of those details, very little about the Digital Public Library has been decided. "We’re still grappling with the fundamental question of what exactly is the DPLA," John Palfrey, chair of the organization’s steering committee, admitted. The organization must be a bank of documents and a vast sea of metadata; an advocate for the people and a partner with publishing houses; a way to make location irrelevant to library access without giving neighborhoods a reason to cut local library funding. And that will be hard to do.

Real content, real concerns

When people hear "Digital Public Library," many assume a setup like Google Books: a single, searchable hub of books that you can read online, for free. But the DPLA will have to manage expectations on that front. Not only are in-copyright works a huge barrier to entry, but a Digital Public Library will be inextricably tied to local libraries, many of which have their own online collections, often overlapping with other collections.

An online library of America will have to strike a balance between giving centralized marching orders and fostering decentralized cooperation. "On the one hand would [the DPLA only offer] metadata? No, that’s not going to be satisfying. Or are we trying to build a colossal database? No that’d be too hard," Palfrey noted to the audience last Friday. "Access to content is crucial to what the DPLA is, and much of the usage will be people coming through local libraries that are using its API. We need something that does change things but doesn’t ignore what the Internet is and how it works."

Wikimedia was referenced again and again throughout the conference as a potential model for the library. Could the Digital Public Library act as a decentralized national bookshelf, letting institutions and individuals alike contribute to the database? With the right kind of legal checks, it would certainly make amassing a library easier, and an anything-goes model for the library would bypass arguments over the value of any particular work. Palfrey even suggested to the audience that the DPLA fund "Scan-ebagoes": Winnebagos equipped with scanning devices that would tour the country and put local content online.

But the Wikimedia model, where anyone can write or edit entries in the online encyclopedia, could present problems for an organization looking to retain the same credibility as a local library. Several local librarians attended the conference and voiced concerns over how to incorporate works of local significance, as well as texts published straight to an e-book format, into the national library.

One member of the audience, who is also a volunteer for the DPLA, suggested in an afternoon presentation that the Library’s API incorporate an "up-vote, down-vote" system for works submitted by individuals. You could write a cookbook of Mexican food, he suggested, and if you don’t know anything about Mexican food, your book would be down-voted, and in a search it wouldn’t show up at the top of the list. A librarian sitting in front of him cautioned that appraising works before they end up in the Digital Public Library is crucial to maintaining its authority, and that an up-vote, down-vote system could never be enough of a sanity check. "Well if that’s true then Reddit wouldn’t work," the volunteer shot back. Of course, the trouble is that Reddit doesn’t work, at least not like a library: on Reddit, the voices of women and minorities tend to get shut out in favor of whatever lulz-zeitgeist hit the Internet that morning.
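To make the volunteer's proposal concrete, here is a minimal sketch of how votes might push a poorly received submission down a results list. The DPLA has published no such design; the `Submission` class, the net-score rule, and the sample cookbook titles are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Submission:
    """A hypothetical user-contributed work with community votes."""
    title: str
    up: int = 0
    down: int = 0

    @property
    def score(self) -> int:
        # Assumed rule: net score is up-votes minus down-votes.
        return self.up - self.down


def rank(results):
    """Return search results with the highest net score first."""
    return sorted(results, key=lambda s: s.score, reverse=True)


# Hypothetical search results for "Mexican cookbook".
results = [
    Submission("Guesswork Tacos", up=2, down=9),
    Submission("Oaxacan Kitchen", up=14, down=1),
]
ranked_titles = [s.title for s in rank(results)]
```

Note that nothing in this scheme vets a work before it is listed, which is exactly the librarian's objection: voting only reorders what is already in the collection.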

And America is huge: how do you appraise works that may be considered offensive or worthless in some areas (anything from a C-list author’s creationist diatribe, to sex-instructional books with illustrations, to The Anarchist Cookbook)? The easy answer is that all information should be accessible to anyone who wants it, but some curation might be necessary to make sure every library in America gets on board. Although he stipulated that his answer was speculative, Palfrey told Ars that individuals would not be contributing to the Digital Public Library, at least at the beginning. "Libraries have done this for a long time, [appraisal] is not a new problem," he said.

Similarly, the Scan-ebago idea is brimming with populist appeal, but Google Books is proof that putting documents online isn’t always as easy as scanning and uploading what people want to see. As a presentation titled "Government, Democracy, and the DPLA" pointed out, even government testimony, while not copyrightable by law, can contain copyrightable text or images, such as an image of Mickey Mouse. Scanning books is one thing; making sure you have all your legal bases covered before you upload text to the Internet is quite another.

The Internet Archive headquarters hosted the Digital Public Library of America's West Coast conference.

And how about local bookstores and big publishers? They make the content, and some of them will almost certainly try to stonewall this endeavor. But (unsurprisingly) no anti-digital-public-library publishers showed up at the conference that day. Publisher Tim O’Reilly of O’Reilly Media played the print industry’s white knight at the DPLA’s conference, explaining to the audience how his company adapted to the prevalence of on-demand information. "We’ve insisted from the beginning that our books be DRM free," he said to applause.

Brewster Kahle, another champion of digital (and physical) libraries and the founder of the hosting Internet Archive, suggested that the DPLA buy, say, five electronic copies of an e-book and lend them out digitally, much as Amazon or iTunes rents out a movie that expires after 24 hours or a few days. When an audience member asked Kahle what it would take for publishers to nix DRM (Digital Rights Management restrictions, which confine certain formats to specific e-book readers) so that the rent-a-book idea could be more widely viable, Kahle replied facetiously, "Wanting to have a business at the end of the day?"
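Kahle's lending idea comes down to a simple rule: a fixed number of licensed copies, each loan expiring on its own. The sketch below illustrates that rule only; the `LendingPool` class, the three-day loan period, and the patron names are assumptions, not anything the DPLA or the Internet Archive has specified.

```python
from datetime import datetime, timedelta


class LendingPool:
    """Hypothetical pool of licensed e-book copies with expiring loans."""

    def __init__(self, copies, loan_period):
        self.copies = copies            # number of copies the library bought
        self.loan_period = loan_period  # how long a loan lasts
        self.loans = {}                 # patron -> time the loan expires

    def _expire(self, now):
        # Drop loans whose expiry time has passed, freeing those copies.
        self.loans = {p: t for p, t in self.loans.items() if t > now}

    def checkout(self, patron, now):
        """Lend a copy if one is free; return True on success."""
        self._expire(now)
        if patron in self.loans or len(self.loans) >= self.copies:
            return False
        self.loans[patron] = now + self.loan_period
        return True


# Five copies, as in Kahle's example; the loan period is an assumption.
pool = LendingPool(copies=5, loan_period=timedelta(days=3))
start = datetime(2012, 5, 1)
first_five = [pool.checkout(f"patron{i}", start) for i in range(5)]
sixth = pool.checkout("patron5", start)
later = pool.checkout("patron5", start + timedelta(days=4))
```

Here the sixth patron is turned away while all five copies are out, then succeeds once the earlier loans have lapsed. Enforcing that expiry on a reader's device is the part that, in practice, depends on DRM.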

Kahle and O’Reilly are among a growing number of publishing-industry types who believe that fixing books to a single e-reader platform is an unsustainable business practice that will naturally become extinct. Their enthusiasm is infectious, but the reality of DRM will certainly be a problem for the Digital Public Library in the short term, if not down the line as well. Wishing DRM away, or convincing charitable investors that it’s not going to be a problem, could be an Achilles’ heel for the Digital Public Library.

Organizing metadata (where the DPLA can excel today)

While content is a thorny issue, the DPLA can establish itself as a force that content providers won’t ignore by leveraging the massive amount of metadata it has collected about books, including data for over 12 million books from Harvard’s libraries. These aren’t actual books, but details about books you can find in libraries across the country. Sure, it’s not exactly a romantic liberation of information, but this data is a roadmap to everything that’s available out there and where users can find it.

Building an API with all of this metadata is also the first step toward the ideal, because a digital library is useless if search doesn’t work. "It’s critical to think through search: how to leverage the distributed nature of the Internet, and keep [content] in open formats that are linkable," O’Reilly said. With an open API, the organization’s extensive database could be distributed to all libraries, which could then build their own digital public libraries on top of it.

There are other benefits to organizing all the metadata, too. Involvement has long been an issue for local libraries, and members of the Digital Public Library’s volunteer development team suggested that the API could be used to build social applications on top of the DPLA platform, or to map the database and include links to other relevant online databases of culture, like Europeana. "The DPLA could sponsor some research in managing all the metadata," David Weinberger, a member of the DPLA’s dev team, suggested. But in the meantime, the group is relying on volunteer time from developers at occasional DPLA-sponsored hackathons.

By April 2013, Weinberger said, the DPLA aims to have a working API with a custom ingestion engine to put metadata from library holdings online, a substantial aggregation of cultural collection metadata and DPLA digitizations, and community-developed apps and integration, most of it built with the help of volunteers and open source enthusiasts.

The problem the DPLA has now, explained Weinberger, is figuring out how to build an API that makes use of all the metadata without giving weight to information that will incorrectly classify a lot of the books. Similarly, he described the DPLA’s "deep, deep problem" of "duping," which happens when two caches of data describe the same book differently, leading to duplicates. Weinberger described the "clunky ingestion engine" as "wildly imperfect." If the project is going to get off the ground, it’ll need a lot of volunteer help or a lot of money; the DPLA is counting on the former and hoping for the latter.
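The "duping" problem Weinberger described is easier to see with a toy example. The sketch below is not the DPLA's ingestion engine; it assumes hypothetical records with `title`, `author`, and `source` fields and uses a naive normalized key (lowercased, accents and punctuation stripped) to decide when two records describe the same book.

```python
import re
import unicodedata


def normalize(text):
    """Lowercase, strip accents and punctuation, collapse whitespace."""
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^a-z0-9 ]", " ", text.lower())
    return " ".join(text.split())


def match_key(record):
    # Assumed rule: same normalized title and author means the same book.
    return (normalize(record["title"]), normalize(record["author"]))


def merge_duplicates(records):
    """Group the sources holding each book under one normalized key."""
    merged = {}
    for record in records:
        merged.setdefault(match_key(record), []).append(record["source"])
    return merged


# Hypothetical records from two library catalogs.
records = [
    {"title": "Moby-Dick; or, The Whale", "author": "Melville, Herman",
     "source": "library_a"},
    {"title": "moby dick or the whale", "author": "MELVILLE HERMAN",
     "source": "library_b"},
    {"title": "Walden", "author": "Thoreau, Henry David",
     "source": "library_a"},
]
merged = merge_duplicates(records)
```

The two Melville records collapse into one entry held by both libraries. A key this simple would still miss "Herman Melville" against "Melville, Herman," or a record that omits the subtitle, which hints at why the real problem runs so deep.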

It has to happen, and fast

"Public education is the most radical idea in the world," Kristina Woolsey, director of San Francisco’s Exploratorium, said at last week’s conference. "Another radical idea as big as democracy is the idea of public libraries."

Despite the challenges facing the Digital Public Library of America, it’s a concept that needs to come to fruition sooner rather than later. Not simply because a digital library would be a professional accomplishment for many well-meaning intellectuals, but because citizens deserve a way to access, even just for the duration of a rental, the same ideas that people who live near better-funded libraries can access, without having to engage in piracy.

One of the earliest speakers at the conference, Dwight McInvaill, a local librarian for South Carolina’s Georgetown County Library, spoke of how important it is to digitize works for the good of the public. His own library’s digital collection gets over 2 million hits a month. "Small libraries serve 64.7 million people," he said, many of them in poverty. "We must engage forcefully in the bright American Digital Renaissance," McInvaill proclaimed. Either that, or be left in the physical-book dark ages.