New digs for old treasures


Reading the grooves

Energy lab uses imagery and software to reconstruct battered and, in some cases, broken old vinyl and wax recordings

Those who remember vinyl records might recall how fragile they could be. Once a record is broken (or damaged through oxidation, excessive scratches, warping or other factors), it generally is impossible to play again. And for the archivists at the Library of Congress, a damaged record has often meant a recording lost to history.

Carl Haber, a scientist at the Energy Department's Lawrence Berkeley National Laboratory, has developed a way to capture the sounds on these damaged records without playing them at all. His technique involves taking high-resolution images of the grooves on the record and then reconstructing the sound from those images. Haber spoke at a recent presentation at the Library of Congress' James Madison Building in Washington.

Haber calls his process IRENE (which stands for Image, Reconstruct, Erase Noise, etc.). The library is currently testing IRENE units to recapture sound from old vinyl records and wax cylinders. Haber himself is developing a new model that can image records in three dimensions, rather than the two dimensions the current version of IRENE captures.

IRENE grew out of Haber's physics work at Lawrence Berkeley, where he investigates methods of optical measurement, pattern recognition and image processing.

According to the Web site How Stuff Works, sounds were recorded by etching onto vinyl the vibrations picked up by a microphone's diaphragm. As a record player's needle moves over the etches laid onto the record or cylinder in a single circular groove, it reproduces the vibrations, hence reproducing the sound. 'If you unwound [a groove on a single disk] it would be over 100 meters long, but the sound is encoded in structures which are a millionth of a meter in scale,' Haber said. 'So there is an incredible disparity between the size of these things and the detail of information on the surfaces.'

IRENE works by taking a high-resolution picture of the disk, or rather, many pictures, each of a very tiny slice of the disk. When the entire disk has been photographed, the software stitches together all the images and reads the grooves to reconstruct the sound, saving it as a digital file. With digital noise-reduction software, scratches and other noises caused by the damage on the record can be removed.
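To make the "reads the grooves" step concrete, here is a minimal Python sketch of one way a measured groove path could be turned into audio samples. It is not IRENE's actual software. It assumes the image processing has already produced a list of side-to-side groove positions, and it assumes the audio signal follows the stylus velocity (the rate of change of that position), which the article does not state. The function name `displacement_to_audio` and the synthetic 440 Hz groove are hypothetical.

```python
import math

def displacement_to_audio(displacement, peak=0.9):
    """Turn groove displacement samples into normalized audio samples.

    Assumption (not from the article): the signal is carried by stylus
    velocity, approximated here by the difference between neighbors.
    """
    # First difference of position approximates velocity.
    velocity = [b - a for a, b in zip(displacement, displacement[1:])]
    largest = max((abs(v) for v in velocity), default=0.0)
    if largest == 0.0:
        # A perfectly straight groove carries no sound.
        return [0.0] * len(velocity)
    # Scale so the loudest sample sits at +/- peak.
    return [peak * v / largest for v in velocity]

# Hypothetical test groove: a 440 Hz wobble sampled 44,100 times per second.
rate = 44100
groove = [math.sin(2 * math.pi * 440 * n / rate) for n in range(rate // 10)]
audio = displacement_to_audio(groove)
```

The output has one fewer sample than the input because each sample is the difference between two neighboring positions.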

Generally, each micron of disk surface requires about a pixel's worth of space on the image. When completed, the image of the entire disk can take anywhere from 4 to 8 gigabytes, although the resulting digital audio files come to about 300 MB. Digitizing a record takes only a little longer than it would take to play the record, Haber said.
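A back-of-the-envelope check shows how a groove roughly 100 meters long, imaged at about one pixel per micron, can add up to gigabytes. The Python sketch below is illustrative only. The width of the imaged strip (`pixels_across`) and the storage per pixel (`bytes_per_pixel`) are hypothetical values chosen for the example, not figures reported by Haber.

```python
def groove_image_bytes(groove_length_m, microns_per_pixel=1.0,
                       pixels_across=40, bytes_per_pixel=1):
    """Estimate image size for a strip that follows the groove.

    pixels_across and bytes_per_pixel are assumptions for illustration.
    """
    # 1 meter = 1,000,000 microns, so this is the pixel count along the groove.
    pixels_along = groove_length_m * 1_000_000 / microns_per_pixel
    return int(pixels_along * pixels_across * bytes_per_pixel)

narrow = groove_image_bytes(100)                   # 40 pixels across
wide = groove_image_bytes(100, pixels_across=80)   # 80 pixels across
print(narrow / 1e9, wide / 1e9)                    # sizes in gigabytes
```

Under these assumptions, a strip 40 to 80 pixels wide works out to 4 to 8 billion bytes, which is in line with the range quoted above.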

IRENE's Web site (http://irene.lbl.gov/ ) offers various samples of the sound quality that can be achieved through optical reconstruction. One recording, for instance, is Leadbelly's 'Goodnight Irene' (after which the project was named).

The version recorded directly from a record suffers from many crackles and pops that become more pronounced as the vocalists sing more softly. The optically reconstructed version has none of the crackles or pops but is marred by a bad hiss. However, another version demonstrates this recording after digital noise reduction has been added, which removes most of the hiss.

Finally, the site offers a clip of the song taken from a CD that used a remastered version of the original tape. In that version, the individual instruments and voices can be heard with the greatest clarity. Haber stressed that the optical approach is most useful in cases where playback of the original disks would be impossible or very costly.

Making Tracks: IRENE extracts the sounds from old disks by taking high-resolution images of the grooves.

Since the 1990s, just about every division of the Library of Congress has been running out of space, and the Motion Picture, Broadcasting and Recorded Sound (MPBRS) Division has been perhaps the hardest hit.

For more than 100 years, the library has collected moving pictures and audio recordings, amassing a trove of more than 4 million movies, videos, recorded broadcasts and sound recordings 'on every imaginable format going back to 1890,' said LOC's Gregory Lukow.

The division had 6 million items stored in seven locations in four states and the District of Columbia. When someone requested an item at the division's reading room in Washington, it might have to be shipped from as far away as Dayton, Ohio.

Since joining the library in 2001, Lukow, now chief of the division, has had the unique opportunity to help design a state-of-the-art center for the storage and preservation of this collection, enabling the transfer of millions of items to trillions of digital bytes to ensure their availability to future generations.

The division is moving this summer into the National Audio Visual Conservation Center (NAVCC), in Culpeper, Va., a gift from the Packard Humanities Institute.

The vaults of an old Federal Reserve Bank facility have been remodeled to provide 140,000 square feet of storage space for irreplaceable materials, and 300,000 square feet of new construction has added conservation labs with automated equipment to digitize old recordings, petabytes of storage and a high-speed link to servers feeding content to the library's reading rooms on Capitol Hill.

Because of the volume of data involved, LOC's effort weighs in as one of the largest government agency information technology projects.

Built for the future

The storage and transport systems have to be scalable over time as well as in size, said Alan Bechara, president of Government Micro Resources, a division of PC Mall Gov. These complex systems are not something that can be upgraded over the usual three- to five-year IT life cycle.

'You have to be looking at a 10-year life cycle for something like this,' he said.

The center and its resources will give a new lease on life to the division's program of converting archival holdings from analog to digital formats, Lukow said.

Holdings on magnetic media include audio- and videotape in every format, from open reels of various widths to cassettes of every kind. There also are recordings on disks and cylinders in addition to movie film, including 140,000 reels of nitrate film, used for most theatrical movies before 1951. Some of the nitrate film has deteriorated and become flammable, presenting unique challenges for preservation, and some has remained in remarkably good shape for more than 100 years, Lukow said.

Shelf life

'Just because it's old doesn't mean it's bad,' he said. The condition depends on how it was originally developed and how it has been stored since. 'The inner clock is entirely different' in each reel. But magnetic formats have a shorter shelf life and 'are degrading every second.'

Not everything is being converted at once. 'It becomes appropriate when it becomes clear that the original recording is deteriorating to the degree that it is nearing the end of its useful life,' Lukow said. 'For magnetic media, that means almost everything.' For film, paper prints and records, the rate of deterioration varies from item to item, he added.

Saving these sights and sounds recorded on tape is a high priority for the division, but it is not easy. Finding a format that will have a longer, more stable life than the original material is a major challenge.

'We made the transition to all-digital preservation for sound recording about three years ago,' Lukow said.

The division uses a WAV file format to preserve sound rather than a physical medium such as compact disk because the files can be stored without physical degradation and retrieval of digital files would not be as hampered by obsolete playback equipment. But until recently, there was no similarly acceptable digital format for preserving video.

'Five or six years ago, we pressed the pause button on our videotape preservation program,' because it was a losing game to keep going from one short-lived tape medium to another, Lukow said.

Division officials recently settled on the Motion JPEG2000 format for video preservation and have helped develop a robotic system that will automatically transfer videotapes to digital files around the clock. But they had to wait to put the system into operation until the division had an up-to-date facility in which to work. That's where the Packard Humanities Institute came in.

Adaptive reuse

The Packard Humanities Institute found a decommissioned facility of the Federal Reserve Bank of Richmond near Culpeper that looked like it might lend itself nicely to the division's needs. It originally consisted of a three-story building with vaults to hold stores of cash that could be used to restart the economy east of the Mississippi River in case of an attack on the country, in addition to a records center and a secure retreat for the bank.

Congress approved acquisition of the 45-acre property through a PHI grant in 1997. The institute provided $155 million for design and construction of a new facility and $80 million start-up operational funding. Remodeling of the original bunker and vault building and construction of a new physical plant and state-of-the-art conservation building began in 2003. The MPBRS division has been moving in, a bit at a time, during the past year.

'All the major construction is done,' Lukow said. 'We're sort of homesteading now,' with equipment and collections being moved in as facilities are readied.

The library is expected to take formal ownership of what is called the Packard Campus of the NAVCC this summer. Lukow said it will take about a year for the center to ramp up to full production in its preservation efforts.

The former Federal Reserve vaults on which the conservation center is built were intended to hold $3 billion in emergency cash, but their new contents will be worth even more than that, Lukow said.

'It's priceless,' he said of the collection of movies, videos and recordings. 'How could you put a price on America's creative heritage?'

Preservation issues

The conservation building contains a large wet lab for film-to-film preservation, and some film will be selectively reformatted by scanning it to digital files.

The wholesale video reformatting effort will rely on the System for the Automated Migration of Media Archives developed by Media Matters, a consulting firm specializing in older audiovisual materials. The company focuses on the archival challenges of magnetic media and developed SAMMA in collaboration with the Library of Congress and other international organizations.

SAMMA is a robotic digital preservation system that reformats the analog content as digital files. It supports a variety of digital formats in addition to MJPEG2000 and captures metadata about the condition of the original material in a Material eXchange Format wrapper. The MJPEG2000 format was chosen for the archival video files because it can compress them without discarding information.

'All the other formats were lossy,' Lukow said, referring to compression methods that shrink files by permanently discarding some of the original information. 'For archival preservation we didn't want that.'
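The distinction Lukow draws can be shown in a few lines of Python. This is only an illustration of the general idea. It uses the standard `zlib` module as a stand-in for lossless compression and a crude rounding step as a stand-in for lossy compression. Neither is the wavelet coding that JPEG2000 actually uses, and the function names are hypothetical.

```python
import zlib

def lossless_round_trip(data: bytes) -> bytes:
    """Compress and decompress; every original byte comes back."""
    return zlib.decompress(zlib.compress(data))

def lossy_round_trip(data: bytes, step: int = 16) -> bytes:
    """Round each byte down to a multiple of step, discarding fine detail.

    A toy stand-in for lossy coding: the discarded detail cannot be recovered.
    """
    return bytes((b // step) * step for b in data)

original = bytes(range(256))
restored = lossless_round_trip(original)
degraded = lossy_round_trip(original)
print(restored == original, degraded == original)
```

For an archive, the first property is the one that matters: the stored file can always be turned back into exactly what was captured.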

The first SAMMA machine was tested at the division's original digs in the library's Madison Building on Capitol Hill, and it has been moved to the Packard Campus. A second machine is on order, and the division expects to eventually have four of them.

Much of the MPBRS audiovisual collection is being digitized, but because of copyright restrictions, don't expect the material to be available on the Internet anytime soon, Lukow said.

'It's all on library premises,' he said. 'This is not an online program. We'd love to put content online,' but although copyright laws allow the library to archive and make content available on site, they prohibit distribution on the Internet without permission.

Remote viewing

Once something has been digitized, high-quality archival files will stay with the NAVCC, and a derivative file in MPEG2 format will be created for viewing. This broadcast-quality file will be stored in servers near the division's reading room in the Madison Building, where files can be accessed by researchers. Level 3 Communications will move those files from Culpeper to Washington.

'What we've got is the latest technology in fiber optics that can support up to 80 wavelengths of 10 gigabit capacity each,' said Jerry Hogge, senior vice president of government markets at Level 3, which provides the link between Culpeper and Washington. 'We provide the connectivity for the center to our backbone and to another interconnection point with them closer to Washington.'
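For a sense of scale, the figures Hogge cites can be run through some simple arithmetic. The Python sketch below multiplies out the quoted link capacity and estimates an idealized transfer time. It ignores protocol overhead and assumes the full rate is available, which a real link would not guarantee. The function names and the 1-terabyte example file are hypothetical.

```python
def aggregate_gbps(wavelengths=80, gbps_per_wavelength=10):
    """Total capacity if every wavelength carries its full rate."""
    return wavelengths * gbps_per_wavelength

def transfer_seconds(size_bytes, link_gbps):
    """Idealized transfer time: bits to send divided by bits per second."""
    return size_bytes * 8 / (link_gbps * 1e9)

total = aggregate_gbps()                       # all 80 wavelengths together
one_lane = transfer_seconds(10**12, 10)        # 1 TB over one 10-gigabit wavelength
all_lanes = transfer_seconds(10**12, total)    # 1 TB over the full fiber
print(total, one_lane, all_lanes)
```

Under these idealized assumptions, a single wavelength moves a terabyte in a little over 13 minutes, and the full set of 80 would do it in 10 seconds.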

To store all this data, the division selected a multitiered system from Government Micro Resources (GMRI) that includes Sun Fire x64 high-speed and high-capacity servers, and storage running the Sun Solaris 10 operating system. The Sun solution was chosen in part because of the company's long-term commitment to storage and its road map for advancing its technology, Bechara said.

This multipetabyte storage system could be the largest ever built for an archival-library facility. It definitely is among the top five projects in terms of size and complexity, Bechara said.

'From our perspective, it's the largest archival system we have worked on,' Bechara said.

Undisclosed location

The system is capable of taking in data at the rate of 2 gigabits/sec and, in full operation, is expected to store about 8 terabytes of new material annually. The system is built for quick disaster recovery, and data will be backed up on tape with a mirror backup at an undisclosed second location. Every 18 months, backup files will be transferred to new, upgraded tapes.

'The library was very methodical in working with us to assess their needs,' which is not common in a customer, Bechara said.

Disk storage can provide more capacity and faster access than tape, but the library chose the tape system for backup because of power, space, cooling and financial considerations, Bechara said. 'You can buy a lot of tapes for the cost of a hard disk.'

The system has been tested and now is mostly up and running, but it was not simple to create.

'It was an engineering feat,' Bechara said. 'You're looking at a lot of raw computing power to run your storage system. If you are technical, this is the kind of job you'd want to work on without getting paid.'

The system contains multiple subsystems and components that have to be benchmarked and tested to assure that they will work together.

'As much as we may think this is plug-and-play, it never is,' Bechara said.