Digitizing the past and present at the Library of Congress

The world's largest library uses the latest technology to study and scan ancient books, maps and other historical artifacts. Rob Beschizza takes a look into an incredible archive made possible by technology.

The Library of Congress has nearly 150 million items in its collection, including at least 21 million books, 5 million maps, 12.5 million photos and 100,000 posters. The largest library in the world, it pioneers both preservation of the oldest artifacts and digitization of the most recent--so that all of it remains available to future generations.

I recently took a tour of two LoC departments that exemplify this mission: the Preservation Research and Testing Division in Washington, D.C., and the National Audio-Visual Conservation Center in Culpeper, Va.

The Library collects not only media but also the technology needed to play it, and keeps that equipment in good condition so that long-obsolete formats can still be examined.

The library's preservation specialists use the latest technology to study and scan ancient books, maps and other historical artifacts. One process, called scanning electron microscopy, allows them to create elemental maps of manuscripts, identifying the chemical nature of inks and pigments, or the paper itself. Imperceptible changes made by artists appear plain as day when viewed using x-rays.

X-rays, however, aren't easy to work around. One new technique, hyperspectral imaging, offers similarly revelatory results in the darkroom: ultra-high resolution scans of documents, imaged under sharply restricted wavelengths of light, show details denied to the naked eye. Viewed at sharp angles, old documents even reveal data about the woodblocks used to impress them onto the page.

It's not all about moldy maps and tomes, either: thanks to the poor quality of consumer media, techniques are already being developed to recover information from damaged examples. Researchers already understand, for example, why using sticky labels increases the likelihood of failure in CDs and DVDs. (LightScribe etching has no apparent negative effects.) So when the work of today's unheralded geniuses ends up as priceless, rotting museum pieces, the preservers will be ready.

Paper items rot, but the preservation challenge they present is nothing compared to 1970s-era magnetic tape

An ancient book presents the typical problem for archivists: how to better understand something that may be destroyed simply by the act of examining it? Researchers have adopted policies which forbid sacrificing part of an item in the hope of learning more about it.

"We can't afford any damage to anything," said Eric Hansen, chief of the Preservation Research and Testing Division. "Never take a sample; be completely nondestructive. ... We know there will be advances in technology and that current techniques will become outmoded."

The LoC's Jennifer Wade demonstrates a high-tech scanner

"We can map the elements, the chemical components," said Jennifer Wade, scanning a centuries-old but well-preserved copy of Platina's The Lives of the Popes. "We can simulate changes in heat, cold, and humidity. [But] all we do is provide information about treatment. Others make the restoration decisions."

Fenella France, a research chemist with the Preservation Research and Testing Division, uses a 39-megapixel camera to take high-resolution images of documents ranging from Renaissance-era maps to American state papers.

"We don't filter at the camera, we illuminate with small wavelengths," France said. "We're creating a reference set of samples. We can't take samples of the documents themselves--it's just not going to happen."

This technique creates a set of images like a 'stack of cards,' all identically framed but revealing a different spectral face of the subject.
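The 'stack of cards' idea can be sketched in a few lines of code. This is only an illustration, not the Library's software: the wavelengths, the tiny 2x2 images, the threshold and the function names are all made up for the example. Each card is the same frame shot under one waveband, and reading one pixel down through the stack gives its response at each wavelength.

```python
# Hypothetical image stack: one tiny 2x2 grayscale image per waveband.
# Keys are illustrative wavelengths in nanometres; values are brightness (0-1).
stack = {
    450: [[0.80, 0.80], [0.80, 0.80]],  # blue light: the page looks uniform
    650: [[0.82, 0.81], [0.82, 0.82]],  # red light: still nearly uniform
    940: [[0.85, 0.40], [0.85, 0.85]],  # near infrared: one pixel turns dark
}


def spectrum_at(stack, row, col):
    """Read one pixel through every card: (waveband, brightness) pairs."""
    return [(band, stack[band][row][col]) for band in sorted(stack)]


def revealing_bands(stack, pixel, reference, threshold=0.2):
    """List the wavebands where a pixel differs from a reference pixel
    by more than the (assumed) threshold."""
    (r, c), (ref_r, ref_c) = pixel, reference
    return [
        band
        for band in sorted(stack)
        if abs(stack[band][r][c] - stack[band][ref_r][ref_c]) > threshold
    ]


if __name__ == "__main__":
    print(spectrum_at(stack, 0, 1))
    print(revealing_bands(stack, pixel=(0, 1), reference=(0, 0)))
```

In this toy stack the marked pixel looks like its neighbor in visible light and only stands out in the infrared card, which is the kind of difference the identically framed images make easy to spot.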

On the plan for the city of Washington designed in 1791 by Pierre L'Enfant, a hidden street plan emerges under IR light. A design for a circle emerges on 16th and K.

"It's incredible, it's humbling. It might be 6 p.m. and I'll be exhausted but I think, 'I can't complain--I'm working with the Gettysburg Address!'"

The Gettysburg Address exists on her computer as 8 different documents, each representing a different waveband in the visible spectrum. But only some show the mysterious fingerprint residue that may be Lincoln's own.

"In the next 5-10 years, I wouldn't be surprised if they could pull residual genetic information from the documents. [This is why] one of our foci is making sure that we don't interfere with future research."

One machine used to examine the book is an x-ray fluorescence spectrometer. "The clasp's corroding, degrading, so we're trying to figure out exactly what the corrosion material is," said Wade. "What is it caused by? What could stop it? Interpretation is important."

Among the finds: tracings of an earlier document on a Marco Polo map that dates to 1480. Lost text, revealing the cartographer, on 1516's Carta Marina. James Madison's debate papers, it turns out, contain hidden revisions.

"If it's fragile, even researchers have trouble with it," France said. "I want to make it accessible."

Hansen stands by a collection of badly-damaged audio recordings that may yet be recoverable using new technology: "You can learn about a culture from how it builds and stores things."

A visitor stands before the Waldseemüller world map.

Fenella France stands beside the unique, 400-liter environmental chamber used to publicly house the map. Hurricane-proof glass and a high-tech aluminum enclosure ensure that it is kept at the perfect temperature and humidity; tests had to be performed to ensure the weight would not pose a structural problem for the Library.

"We pretty much know that the Vinland Map contains titanium dioxide in a form that didn't exist until modern times." - Eric Hansen

Printed by Martin Waldseemüller in 1507, the Universalis Cosmographia was the first world map to use the name "America" to identify the new world. The only copy of it is at the Library of Congress.

Far from the bustle and majesty of Capitol Hill, a former nuclear bunker has become home to an unprecedented effort to catalog the nation's creative works. And while the media is more recent than that dealt with in D.C.'s basement labs, plenty of technical challenges remain.

The National Audio-Visual Conservation Center, near Culpeper, Virginia, once contained billions in cash, squirrelled away to kickstart the economy in the event of an atomic apocalypse. Beautifully renovated, it now has 175,000 square feet of offices and laboratories, 135,000 square feet of collections storage, and 55,000 square feet dedicated to storing dangerous nitrate film in optimal conditions. There are more than a million films, television shows, DVDs and games already in its collection.

And it grows, day in, day out. Delivered to loading docks, thousands of items make their way through processing areas until finding a permanent home in the vaults.

Gregory Lukow, chief of the motion picture, broadcasting and recorded sound division at the campus, said that it was staffed by about 100 techs, engineers and other workers. Many items are digitized to ensure their preservation, and to allow researchers to view them remotely in D.C. reading rooms. Staff also host public screenings of classic movies at the in-house cinema.

As the Copyright Office did not register celluloid prints until 1912, early movie makers registered their films as prints of the entire reel on opaque photographic paper. "It's an iconic image in American cinema, that cowboy shooting his gun at the camera, at the audience, at the end of The Great Train Robbery," said Gregory Lukow. "The quality of prints recovered from the paper is shockingly good."

Most of the collections arrive via the copyright registration process. Though works receive copyright protection at the moment of creation, registration provides more legal options in court disputes, ensuring what Lukow called "a tidal wave of material" for the campus to process. But a lot of the material is old -- and not all of it is in good nick.

"The late 1970s is one of the worst times for video longevity," Lukow said. "Magnetic tape is our largest preservation problem."

Gregory Lukow of the Library of Congress shows off the intake bins at its audiovisual campus in Culpeper, Va., packed with the cultural output of a nation. Millions of items are added every year to LoC collections. Highly sensitive items, such as digital prints of movies playing in theaters, often arrive under assumed titles to reduce the likelihood of interception.

The distinctive round-rect casing of RCA Selectavision disks was briefly commonplace in the U.S. Now, the analog video format is a rarity.

There is an entire room at the campus dedicated to rewinding things. Almost every room, however, has cutting facilities of one kind or another.

Into the Film Storage Vaults: maintained at 39°F and 30 percent relative humidity, nitrate film is divided into 124 individually fireproofed chambers, each able to hold about 1,000 cans. Each is designed so that even if a particular reel goes up in flames, it can only damage those in the same insulated cubbyhole. Total capacity: 145,056 cans. Films removed from the vaults must first go through an acclimation chamber before being exposed to normal temperatures and humidity.

The Tony Schwartz collection has an astounding number of field recordings of commercials and other publicly-broadcast media. Passed to the Library after Schwartz's death in 2008, the archive currently fills several large walls. "It's immense," said the Library's Matt Barton. "Thousands of reels of tape, film, video. And I don't know how much correspondence." Schwartz is famous to many as the creator of the "Daisy" campaign ad.

Gregory Lukow describes RCA Selectavision, a video format so homely it is denied even the ironic contemporary cachet enjoyed by LaserDisc and 8-track.

Matt Barton of the Library of Congress's National Audio-Visual Conservation Center.

Not everything that the Library of Congress uses to examine its collections is high-tech.

Gregory Lukow explains the workings of one of the Library's basic tools: a flatbed film viewer designed to let staff play fragile films without the use of projectors and potentially damaging bulbs.

IRENE--image, reconstruct, erase noise, etcetera--is a system that creates a high-resolution digital map of a record's surface without touching it. Recordings on warped and damaged vinyl can be recovered and restored, then played back by a computer program that emulates the movements of a stylus passing over the modeled grooves. Some records, however, are too badly damaged even for IRENE.

Banks of reel-to-reel tape machines stand in one of the conservation center's digitization rooms. Nearby, a robot-operated VCR works through dozens of tapes automatically.

Scott Rife, senior system administrator, explains the library's digital storage system in this video clip: a tape library with 37,500 slots, each able to store 1TB of data. "That's 37 petabytes. As far as we know, this is the largest digital preservation operation in the world." Even so, they remain committed to preserving film as film: "We wouldn't preserve 35mm as digital right now."
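For the curious, the arithmetic behind that figure is easy to check. The sketch below uses only the two numbers quoted above and assumes decimal units (1 petabyte = 1,000 terabytes); the function and constant names are invented for the example.

```python
# Back-of-the-envelope capacity check for the tape library.
SLOTS = 37_500      # slots in the tape library (figure quoted in the article)
TB_PER_SLOT = 1     # terabytes per slot (figure quoted in the article)
TB_PER_PB = 1_000   # assumption: decimal units, 1 PB = 1,000 TB


def library_capacity_pb(slots=SLOTS, tb_per_slot=TB_PER_SLOT):
    """Return the total capacity of the tape library in petabytes."""
    return slots * tb_per_slot / TB_PER_PB


if __name__ == "__main__":
    print(f"{library_capacity_pb():.1f} PB")
```

Under that assumption the total comes to 37.5 petabytes, which Rife rounds down to 37.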

James Snyder, senior systems administrator, explains the challenges involved in capturing hundreds of channels of archivable broadcast material. When completed, the Packard Campus's "Live Capture" room will grab 120 video streams from satellite and FM television, 90 DirecTV channels and 20 DISH Network channels. 72 Mac Minis will capture the output of 42 internet radio stations, 10 FM radio stations, and much of what's played on the XM/Sirius satellite radio service. Each machine is able to capture two sources at once: if an individual capture station fails, another picks up the load. Playlists, as cultural snapshots, are themselves important artifacts.
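How the load gets picked up was not described in detail, but the general idea of two sources per machine with spare capacity held in reserve can be sketched as follows. Everything here is hypothetical: the station and source names, the least-loaded-first rule and both functions are illustrative assumptions, not the Library's actual scheduling software.

```python
SOURCES_PER_STATION = 2  # from the article: each machine captures two sources


def assign(sources, stations):
    """Spread sources across stations, at most two per station.
    Assumed rule: each source goes to the least-loaded station."""
    load = {station: [] for station in stations}
    for source in sources:
        target = min(stations, key=lambda s: len(load[s]))
        if len(load[target]) >= SOURCES_PER_STATION:
            raise RuntimeError("not enough capture capacity")
        load[target].append(source)
    return load


def fail_over(load, failed):
    """Move a failed station's sources to stations with a free slot.
    Modifies and returns the same load mapping."""
    orphans = load.pop(failed)
    for source in orphans:
        spare = [s for s in load if len(load[s]) < SOURCES_PER_STATION]
        if not spare:
            raise RuntimeError("no spare capacity for " + source)
        target = min(spare, key=lambda s: len(load[s]))
        load[target].append(source)
    return load


if __name__ == "__main__":
    stations = ["mini-01", "mini-02", "mini-03", "mini-04"]  # hypothetical names
    sources = ["radio-a", "radio-b", "radio-c", "radio-d", "radio-e"]
    load = assign(sources, stations)
    print(fail_over(load, "mini-01"))
```

The point of the sketch is simply that failover only works if the stations are not all fully loaded: with 72 machines at two sources each, there are 144 capture slots to share out.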

A small museum is set aside at the campus for the most beautiful film and broadcasting equipment in its stores. But it's not just for show: old media often needs old equipment to play it. The LoC has little interest in DRM, due to the inherent likelihood that decryption methods will fail or fade away as time passes. "We don't want to have to hack anything," Lukow said.

Welcome to the Critical Listening Room. James Smetanick describes the work of an audio engineer tasked with preserving sound recordings. The environment is perfect: non-parallel walls and deeply-pocked paneling kill standing waves and reflections. A custom-made Simon Yorke turntable is good enough for government work: maple knobs not required. "I can't complain about coming in each day," Smetanick said.

Michael Hinton, a staffer at the Library of Congress' NAVCC, works in a spartan room housing an enormous film-processing machine.

The Packard campus contains a huge variety of old and obsolete machines used to view, cut or otherwise manipulate media. It's not just for show, either: obscure formats will become unreadable if the vintage tech used to play them isn't maintained.

Oh my stars. This is both delightful and mind numbing. As the son of a mother who has saved the pennies we swallowed when we were kids and has snow in her freezer from the Great Chicago Snowstorm of 1967 I can truly appreciate the compulsion to save.

Geo #28 — No, that system, the TRIS was modified to copy paper prints to 35mm motion picture stock before the Kinetta Scanner was built. When the Kinetta arrived, it was decommissioned and the remaining bits were put in storage. Who knows if it survives to this day?

To Anon, who found it interesting that “the LoC decides to leave some content in analog form,” at least in the Preservation Directorate, we believe that digitization isn’t preservation, it just provides access to a wider audience. We still, as often as we can, want to preserve the original, not only for its inherent value but also because we learn so much about how it was made, what it was made from, who made it, who added to it, who touched it or used it, etc. from the original materials… things we could never learn from a digital image of the object.

Also, you ask about the cases: more info on the case (especially for the Waldseemüller Map) can be found here:

Twenty-five years ago most broadcast stories were shot on 20 min. 3/4 inch U-Matic cassette tapes. Edited to thirty min. U-Matic tapes, the stories were “archived” usually on 60 min. U-Matic tapes. I suggested to our engineer around 1985 we might want to buy a new U-Matic deck, leave it in the box, place it in our climate-controlled room and save it for future conversion to a new format. He laughed. By the time DigiBeta came along, there were nearly 20 years of U-Matic raw stock, edited stories and complete programs. Only two U-Matic decks worked. With staff reduced, eventually nearly 8000 boxes were pitched in a Dumpster.
– Retired SC broadcaster

Well done! Great to see someone getting their hands dirty instead of linking to an existing story (not to say that isn’t appreciated too). Makes me want to get off my ass and digitize my grandfather’s old wire recordings.

Optical illusion: go to the nitrate film vaults video, pause it in the second half and then scroll the page up and down.

On a different note, it’s interesting that the LoC decides to leave some content in analog form. I always thought archival was simply restore, scan, store. But as can be seen in the article, a lot of effort goes into preservation of the originals.

Live-capturing video and audio will probably take up a lot of space compared to just noting down the name of every program and song coming in. With just 10 FM radio stations I think some things might be missed.

Of course MOST brittle books don’t receive that sort of super treatment. There are just WAY too many of them. Most brittle books have the spine guillotined, then they’re scanned, page by page. And of course there was an early attempt at using DEZ for mass deacidification which was a HUGE failure. The ginormous quantity of material that LC has which is slowly disintegrating means that only the gems of the collection can possibly hope to get this sort of loving care; it’s difficult to just scan this amount of material.

Jennifer, LoC’s view on digitization is refreshing in an age of “everything digital=better.” There is no substitute for the real thing. Thanks for sharing :}. And what would future generations learn if only the copy existed? There is so much to learn from a primary source.

I work at the Capitol Theatre, in Rome, NY, and I know that we continually benefit from the terrific early film preservation work of the LoC for our summer film festival, Capitolfest. We operate as a revival house for silent and sound film, and some of the films we show haven’t been seen since their original run, often 80 or more years ago; it’s so wonderful to know that you are watching something truly special and unique on the screen. “Thank you” can’t even begin to cover how grateful we are to James Cozart and all the folks there who help make what we do possible!

LoC does a terrific job preserving our history and culture. Our hats are off to all of you. Kudos to Rob on the essay, too :} It’s first-rate and fascinating.

This is incredible. I knew the Library of Congress archived a lot, but I never knew it was so high-tech. Hats off to these guys who work tirelessly to ensure future generations can view these documents, recordings and other historically significant artifacts.

Anon #24 made a very good point about distinguishing media archival formats from public viewing. But I still stand by my statement LoC sold us out. Silverlight is very Windows-centric, leaving out pre-Intel Mac users and for the most part all Linux users. (Moonlight is always at least one version behind, with questionable validity and legality issues always tied to it.) Yes, they are in the minority, but that’s like saying it was OK for SAT tests to be biased toward middle- and upper-class whites. Also, each successive version of Silverlight adds more features solely in the Windows platform. It’s an exclusive, proprietary format that should not be endorsed by a public institution. There are other alternatives that are much more open with much better long term viability. For those of you with short memories, Microsoft was not always the top dog with its government sanctioned monopoly status, and there is no indication things will stay that way. Vendor lock-in is a very troubling situation and the sooner Microsoft’s market share numbers sink down to the 70% mark or so the better off we will all be in the long run. Technologies we are experiencing now are ALL tied to things developed 20 years ago and big corporations do everything within their power to stifle competition, especially small business because they’re an easier target. When our market place returns to diversity, real competition, and advanced R & D, then we’ll be seeing small and large business innovation again.
Sorry for the rant but what LoC did was just another example of short term solutions for long term issues.

It is wonderful of the Library of Congress to share its national treasures and the tools used to preserve them. I am glad technology is used for more than just making more content, and is helping to revive older-generation content as well. The SLV conservation team also does its best to preserve OZ materials. A toast to human history.

Amazing, amazing essay Rob. I’m especially curious about the system that holds that map. Can you direct me further?

I’ve entertained the idea myself about making complete vacuum enclosures out of industrial sapphire blocks for some of my rare books, but it’s partly a fantasy. How exotic an enclosure could be made, and how it would work, to preserve something inside infinitely.

Does the Library of Congress have a book or something published about all their preservation tech? I would love to read about this kind of stuff, especially ancient map and book preservation.

A couple of years ago the Library of Congress sold out the American public and made a contract with Microsoft — video archiving will be through the time untested, proprietary Silverlight. Preserving our heritage and history takes a back seat to lobbyist power.

The LoC is using MXF wrapped Motion JPEG2000 as the codec for its video PRESERVATION files.

There’s a big difference between Preservation quality files and ones used for ACCESS.

As the story he links to says, Microsoft Silverlight is being used on the library’s website for public access.

“Microsoft has worked out an agreement with the Library of Congress to deploy Microsoft Silverlight on the library’s new Web site. In return, Microsoft will provide an initial grant worth a total of $3 million in technology, services as well as funding.

The grant is to be used to enhance the online accessibility and interactivity of about 800 of the Library of Congress’ prominent holdings. In addition, the deal also entails bringing in kiosks running on Microsoft Vista that highlight featured documents at the library.”

The Silverlight files will be much smaller and more suited to web access and on-site kiosks compared to the much larger MXF/MJPEG2000 files that retain the quality of the content, but need high-horsepower computers to capture, edit and play.

I hope folks have a clearer understanding of the differences between high quality, lossless archival Preservation files and ones suitable for Access on the web.

I think you have a misunderstanding of this deal: digital library projects typically create a digital master in a non- or losslessly-compressed, non-proprietary format, and derive an access copy (the Silverlight version) from that for public use. The MS-LoC deal is for this later-stage access; when/if Silverlight becomes obsolete, LoC has a digital master from which to derive a new access version.

When I taught XML I used to joke about archives storing older documents, and not only having to store the machinery that could read them, but also some old guy who knew how to keep that machine running. I told this until my class happened to have a couple of archivists in it, who grumpily told me they didn’t do that: they only store the ASCII version.

I came to appreciate the process of restoration when I filmed in the labs of the Canadian Conservation Institute who restored the copy book of Daniel Daverne (1816) to pristine condition. This copy book – Daniel’s Journal – is the subject of my documentary “Daniel’s Journal – History Rewritten” See http://danieldaverne.com for ongoing production notes.

In the 90s I worked for Image Premastering making still image Laserdiscs (I still have a pile of ‘em). We had a big optical bench with large collimating lenses a couple of feet in diameter for transferring film to 1″ video and apparently the only other one is at the LoC. That thing still around?

Halyna Bryn, who had a reverence for the work and more patience than anyone I’ve met, worked a bit on that system for you. She now makes Ukrainian Easter eggs.