Storing National Treasures

The Library of Congress is undertaking an ambitious effort to build a storage infrastructure for the world's largest collection of films, television and radio broadcasts and recorded sound.

Last month, the Library of Congress issued RFP LOC-NAVCC-06-01 Computer Server and Storage Infrastructure, asking server and storage vendors to bid on one of the world's largest and most ambitious audio-visual storage projects to date.

The task for the storage vendors: provide a storage area network (SAN) that will allow the library to safely, securely and easily store digitized copies of its massive collection of films, television and radio broadcasts and other audio-visual materials. The SAN is to include tape drives, robots and related hardware; host bus connections; Fibre Channel or other storage switches; RAID or MAID storage; and tape media.

It is a daunting task, one that only a handful of vendors have the necessary experience and products to address. Bids are due no later than March 15 with the contract (or contracts) to be awarded May 30. Enterprise Storage Forum will follow along as this landmark storage project unfolds.

A Brief History of the NAVCC

In 1997, Congress and President Clinton authorized the Library of Congress to acquire an approximately 140,000-square-foot former Federal Reserve Bank building near Culpeper, Va., to serve as a storage and preservation facility for the library's audio-visual collections. Soon thereafter, with a $10 million initial grant from the David and Lucile Packard Foundation, the National Audio-Visual Conservation Center (NAVCC) was born.

In the intervening years, the NAVCC has expanded to include three new structures, including an approximately 175,000-square-foot, state-of-the-art Conservation Building, which will house the new data/storage center, reformatting laboratories/digital preservation systems and their respective staffs. When completed, hopefully by January 2007, the NAVCC, at 420,000 square feet, will be both "the first centralized facility in America especially planned and designed for the cataloging, storage and preservation of the nation's heritage collections of moving image and recorded sound materials," as well as the largest facility of its kind.

For Gregory Lukow, chief of the Library's Motion Picture, Broadcasting and Recorded Sound Division (known as MBRS), it has been a long time coming. With such a large collection of records, audiotape, videotape and film, stored mostly in analog form in some seven facilities in four states and the District of Columbia, with more material being added every day, the MBRS division has been fighting an ongoing storage and workflow battle. It has taken years of careful planning and working with a variety of experts in film and audio production, broadcasting and server and storage technologies, as well as the generous support of outside donors, in this case the Packard Humanities Institute, which took over funding from the Packard Foundation, to make the NAVCC a reality.

To Boldly Go Where No SAN Has Gone Before

Fast forward to March 2004, when members of the MBRS division and the Library's Information Technology Services (ITS) group held a two-day meeting with technical consultants from different industries on what the general requirements for the NAVCC computer server and storage infrastructure should be.

"We had no pre-existing models to go by or use as a benchmark," explains Mike Handy of the Library's Information Technology Services (ITS) division. "So we started from scratch and developed our concept."

In the fall of 2004, the Library issued a Request for Comment (RFC), which it sent to the leading OEMs of large-scale server and storage components, with comments due by January 2005.

"We asked them to comment on the RFC and to propose a solution: what solution they would put together given their understanding of what we were looking for and their current product lines," recalls Handy. "We wanted to validate the architectural model, to put it to some scrutiny by the vendors, to see if we were missing something or neglecting some critical issue."

The upshot: The Library needed to re-state the problem and clarify the vendor requirements. So over the next few months, the ITS staff met with individual OEMs, asking them questions about their comments and how the library could improve its server and storage model. A few months later, the library team began drafting the Request for Proposal and developing benchmark tests.

The 61-page RFP finally went out last month. Responses are due no later than March 15. On May 30, the contract (or contracts, since the server and storage components do not necessarily have to be provided by the same vendor) will be awarded.

Delivery and installation (to the Library's Madison Building on Capitol Hill, not the NAVCC, which will still be under construction) is to take place in July, with an acceptance test performed in August. Once the NAVCC is ready, the ITS team will disassemble the SAN and then painstakingly reassemble it in Culpeper, just as it did with the Library's disaster recovery system, housed some 70 miles away from Capitol Hill.

A High Fibre Diet

Because of the project's scope, size and national importance, the SAN, as the primary conduit for transporting the digitized content within the NAVCC Preservation Archive architecture, must be able to handle huge amounts of data safely and securely with minimum downtime. The storage system must be leading edge and use best-of-breed technologies, such as 4Gb Fibre Channel and potentially 10Gb InfiniBand, although the RFP states that costs must be kept low.

The Library also wants bidders to provide information on standards compliance as well as a technology roadmap, including information on upgrades, scalability and migration to new technologies.

Once selected, the winning SAN bidder will have to successfully demonstrate the following (per the RFP) during the acceptance test:

Streaming I/O of at least 750 MB/sec is required for sustained write performance to RAID

Streaming I/O of at least 750 MB/sec is required for sustained read performance to RAID

Streaming I/O of at least 650 MB/sec is required for sustained write performance to RAID and file system

Streaming I/O of at least 650 MB/sec is required for sustained read performance to RAID and file system

Full duplex performance must exceed 1000 MB/sec for sustained full duplex with 50% read and 50% write to RAID and file system

RAID LUN rebuild time of no greater than five hours is required with any configuration provided with no I/O in the background using the same configuration as proposed to meet the streaming I/O requirements

Power of 2 LUN size (e.g. RAID 5 6+1, 7+1 is not acceptable)

A combination of storage types (RAID disk and tape) is assumed, but offerors are free to suggest different solutions

HBA/HCA failure on Preservation Archive storage must not degrade performance, therefore an N+1 failover is required, and N+2 is preferred for both tape and disk

Data must be able to stream from the RAID system to the tape system at a full data rate without degradation in performance at all times. That means that even during a write reconstruction the tape drives must be able to stream at their data rate. This is to prevent tape wind quality issues

Total amount of storage required in the staging area of the Preservation Archive is 100TB

Disk-to-tape performance must exceed 150MB/sec 7x24x365 for both read and write and must not degrade the full duplex RAID performance requirement of 1000 MB/sec. This tape performance is the combined performance for both the primary and disaster recovery archives

Server must support 2GB/sec total throughput

The system must be able to support enough FC buffer credits to operate all of the tape drives at full rate at a distance of 200 KM

Loss of data due to the failure of power, servers, HBA/HCAs, switches, RAID controllers or tapes is considered unacceptable.
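
To give a rough sense of scale for two of these requirements, the short Python sketch below works through the arithmetic. The 100TB, 150MB/sec and 200 KM figures come from the RFP list above; the fiber propagation delay, the maximum frame size and the 400 MB/sec nominal rate of a 4Gb Fibre Channel link are typical values assumed here for illustration, and the function names are hypothetical. The credit count is an upper-bound estimate for keeping one link full, not a figure from the RFP.

```python
import math

# Figures quoted from the RFP requirements listed above.
STAGING_CAPACITY_TB = 100        # staging area of the Preservation Archive
DISK_TO_TAPE_MB_PER_SEC = 150    # minimum sustained disk-to-tape rate
LINK_DISTANCE_KM = 200           # distance the FC buffer credits must cover

# Assumptions for illustration only (not stated in the RFP excerpt).
FIBER_DELAY_US_PER_KM = 5.0      # typical one-way propagation delay in fiber
FC_FRAME_BYTES = 2148            # maximum-size Fibre Channel frame
LINK_MB_PER_SEC = 400            # nominal 4Gb Fibre Channel data rate


def drain_time_days(capacity_tb, rate_mb_per_sec):
    """Days needed to move capacity_tb at rate_mb_per_sec (decimal units)."""
    seconds = capacity_tb * 1_000_000 / rate_mb_per_sec
    return seconds / 86_400


def buffer_credits_needed(distance_km, link_mb_per_sec,
                          frame_bytes=FC_FRAME_BYTES,
                          delay_us_per_km=FIBER_DELAY_US_PER_KM):
    """Estimate the credits needed to keep one link full over distance_km."""
    round_trip_us = 2 * distance_km * delay_us_per_km
    # bytes divided by MB/sec gives the frame transmission time in microseconds
    frame_time_us = frame_bytes / link_mb_per_sec
    return math.ceil(round_trip_us / frame_time_us)


if __name__ == "__main__":
    days = drain_time_days(STAGING_CAPACITY_TB, DISK_TO_TAPE_MB_PER_SEC)
    credits = buffer_credits_needed(LINK_DISTANCE_KM, LINK_MB_PER_SEC)
    print(f"Days to write a full staging area to tape: {days:.1f}")
    print(f"Estimated buffer credits per link at 200 km: {credits}")
```

Under these assumptions, emptying a full staging area at the minimum tape rate would take a little over a week, and each long-distance link would need several hundred buffer credits. A drive streaming below the full link rate would need proportionally fewer.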

Interestingly, retrieval is not a key component of the SAN requirements. The RFP notes that "processes will be designed to retrieve specific digital masters from the archive ... these will not be optimized for traditional retrieval goals such as ease of searching, retrieval speed, etc." This is in large part due to access and copyright restrictions for much of the content. The focus is first and foremost on long-term storage and preserving our nation's visual heritage for future generations.
