SPIEGEL ONLINE

02/19/2010 05:35 PM

Competition for Google

A German Library for the 21st Century

The German Digital Library wants to make millions of books, films, images and audio recordings accessible online. More than 30,000 libraries, museums and archives are expected to contribute their digitized cultural artifacts. The idea, in part, is to compete with Google Books. But will it work?

On a good day this reader gets through as many as 1,216 pages per hour. Hissing quietly, devouring book after book. Now and then it says, "Pffft."

This is a state-of-the-art robot at work. It automatically scans every book placed open in front of it. A slender wedge drops down to the fold, sucks in a page from left and right and lifts the goods. Each page is photographed and, with a gentle puff of air -- pffft -- the robot flips the page.

So it goes, day after day, at the Munich Digitization Center of the Bavarian State Library. Some 45,000 works have been scanned -- from the "Nibelungenlied" on parchment to an original score from the hand of Gustav Mahler.

Admittedly, treasures from the early years of book culture are generally scanned by hand. The robot surrenders when faced with fragile tomes, which can weigh up to half a hundredweight with leather bindings or wooden covers.

Eventually a new Internet portal will benefit from the riches of these Munich databases. The German Digital Library (Deutsche Digitale Bibliothek, or DDB) will become an online center for millions of books, magazines, photographs and films. Libraries, museums and archives all over the country are expected to contribute digitized cultural artifacts.

A Chamber of Wonders

But it will take time. The first trial version may go online in 2011 -- "and that will only be for a restricted group of users," says Ute Schwens, a director of the German National Library in Frankfurt, which is coordinating the DDB.

Germany's Culture Minister Bernd Neumann (CDU) calls the long-term vision a "project of the century." The initiators promise a virtual chamber of wonders, as suitable for lay people as it will be for researchers hunting specific sources and scientific documents. Type in "Beethoven" and you will find not only books about the composer, but -- eventually -- handwritten sheet music, music samples and perhaps a movie version of "Fidelio."

Germany's federal cabinet gave the green light in early December. The goal is to integrate the DDB with Europeana, the European portal launched in 2008 with similar ambitions.

This European industriousness has been spurred, primarily, by Google, which has digitized more than 10 million books around the world already. There were warnings about a private corporation gaining a cultural monopoly. Europeana and the DDB promise to respect the copyrights that Google has so far only reluctantly observed. Jean-Noël Jeanneney, former president of the French National Library, spoke of an "anti-capitalist model to counter Google's power play."

Digitization is also a measure against the vulnerability of the book as a medium. A 2004 fire at the Anna Amalia Library in Weimar destroyed 50,000 volumes, some of them irreplaceable. Digital backup copies could limit such losses in future.

Even as the experts in Germany embark on the preliminary work, it's apparent that the venture faces fearsome challenges. The technical goals alone seem ambitious. The Fraunhofer Institute in Sankt Augustin, near Bonn, is responsible for the DDB's computer technology. It's developing programs to recognize people in films, convert speech recordings to searchable text, and automatically index documents.

Most audacious is the proposed sweep of the new portal. More than 30,000 museums, archives and scientific collections across Germany are supposed to hook up. The DDB's creators will be pleased, for now, with a hundred participants, but prestigious institutions such as the Hamburger Kunsthalle or the Städel Museum in Frankfurt are not even on the list yet.

Overambitious, and Underfunded

Rolf Griebel, Director General of the Bavarian State Library, who regards the project as "good and overdue," nevertheless warns against over-ambitious plans. "I have real doubts about whether the DDB can be filled with content properly and within a reasonable timeframe," he says.

Griebel estimates that scanning a book from the 16th or 17th century costs between €70 and €140, depending on the amount of work. Contemporary titles are cheaper, but the quantities involved are enormous. The German Library Association is proposing to digitize around 5.5 million volumes in the first 10 years. That would cost at least €165 million. But where is the money supposed to come from?

Germans are gazing enviously at France, where President Nicolas Sarkozy recently promised to raise €750 million to pay for the digitization of France's national culture.

The German project, by contrast, may have obvious cultural gaps for years to come. Will the user be content to chance upon occasional prize discoveries? Wouldn't it be preferable to do less, but do it all right? "To start with, it would certainly make sense to limit this to selected areas and themes," says Griebel.

But the DDB's planners won't settle for that. Every cultural activity, every science, every type of document is fair game -- preferably from every German museum and library.

And the search technology will be more sophisticated than just looking up terms, as offered by Google. The DDB collections (under the current plan) will be indexed according to a range of criteria -- place, time, subject area. Such an index can only work if the objects are described in detail.

In this effort the DDB has the benefit of some basic technology from the German government-funded Theseus program. Researchers at Theseus have been working since 2007 on methods of indexing images, films, audio recordings and books. If the computer has a rudimentary understanding of what's going on, it can fill out several fields automatically -- indispensable for the vast quantities of documents the DDB will have to contend with.

The researchers are reporting initial progress in recognition of elements in films and photographs. "Faces are still hard, but it's going well with trees, cars and buildings," says Thomas Niessen, head of Theseus. The computer is also having some success with converting spoken word into searchable text -- it even attempts to distill the relevant people, places and events.

Complete With German Rail Tickets

Meanwhile, a debate is underway about the bigger picture. How exactly should the DDB serve both lay people and researchers? And what should the ideal portal look like?

Reinhard Altenhöner of the German National Library thinks users might be able to post their own contributions. "If a city archive provides material about the history of a street," he says, "the residents could enrich that with their own stories and photos."

Museums could place links in search results to relevant current exhibitions. A small demonstration on the screen illustrates how that might work. "And here," says Altenhöner, clicking another link, "here you can even buy the Deutsche Bahn ticket on the spot."

Such subtle extras can't be found on Google. The search-engine firm prefers projects that can be explained in one sentence. In the case of digitization, the goal is simple: every book in the world in a user-friendly presentation. Beyond that, the best indexing technology is of little benefit.

The consequences of ignoring this axiom are illustrated by the Europeana site. Following a long stagnation, the collection is due to grow to 10 million cultural artifacts by mid-2010. "That makes us a global leader," says Berlin information scientist Stefan Gradmann, a member of Europeana's executive committee.

Some exhibits are accessible now on the site, on a trial basis, using intelligent search. Type, say, "Paris" and Europeana also returns Montmartre and the Tuileries; sources appear relating to Paris, the prince from Greek mythology. The search engine is likewise familiar with his fateful deed, the "theft of Helen." It finds documents, in other words, that do not contain the search term.

But browsing in Europeana is just not very pleasurable. The results are displayed in thumbnail images the size of postage stamps. And if you click through for a closer look, you're taken to the corresponding institute. Soon you're wandering helplessly around a dozen different museum and library Web sites -- and you end up lost somewhere between the "Vlaamse Kunstcollectie" and the "Wielkopolska Biblioteka Cyfrowa."

Would it not be preferable to incorporate all the exhibits within the familiar scope of Europeana? "We would have preferred that," says Gradmann. "But then the museums would not have participated." They insist on presenting their own treasures.

Digital Libraries, Babylonian-Style

If the DDB yields to the vanity of the participating institutes, the result will be a Babylonian structure with 30,000 annexes. Would anyone have the patience to browse an index consisting of 30,000 idiosyncratic Web sites?

The promise to strictly observe copyright also poses problems. The only works the DDB may scan freely are those by authors who have been dead for at least 70 years. For newer documents whose authors cannot be contacted, a settlement will be worked out with the relevant copyright collectives.

DDB coordinator Schwens wants to incorporate contemporary material, too. "It would be a shame," she says, "if current scientific knowledge could not be found via the DDB." Negotiations with publishers are already in progress. Ideally, says Schwens, there would be a "one-stop-shop," where the user can electronically buy or borrow the work that interests them.

Online shop Libreka, operated by the German Booksellers' Association, would be available for book sales. However, Libreka has a dubious reputation: Many of its electronic books have convoluted copy protection. The publishers want it that way. And many of their bestselling books tend not to appear, for fear of digital piracy.

The German digitization project is threatened from two sides: There isn't enough money for the scanning of older works, while access to new works -- which may exist in digital form already -- is liable to be blocked by anxious publishers.

So would it be better to let Google take over the whole thing? By mid-2010 the American firm wants to start trading in electronic books. A half-million titles have been designated for the "Google Editions" project. Sixty-three percent of each sale will go to the publisher, with Google keeping the rest.

The Bavarian State Library's experience with Google has been good so far. Since 2007 Google has digitized out-of-copyright books for Munich's cultural custodians -- around a million volumes so far. In tough negotiations, the library secured the right to have its own copy of each book, to present however it wishes. Open access to these treasures is thus guaranteed.

And the scanning is going like clockwork. Each week around 5,000 volumes leave the halls of the state library. A truck delivers them to a top-secret location in Bavaria, where Google's scanners work away unswervingly. At this rate, it will all be done and dusted in just four years.