NARA lacks processing power

The National Archives doesn't have the computing power it will need to copy and store the millions of electronic files headed its way, Archives officials said.

Deputy archivist Lewis Bellardo raised the problem facing archivists during a speech to members of the Association for Federal Information Resources Management in Washington.

When President Clinton's term expires, Bellardo told the group, the National Archives and Records Administration will receive more than 20 million files from the executive office.

"With the current tools we have at our disposal, if we started processing that documentation when Clinton leaves office," he said, "it would take us 10 years using our full capacity--doing nothing with the rest of the federal government--to preserve it once.

"Then, after 10 years, we have to preserve it again."

To handle the flood of individual documents--some as small as single-page e-mail files--NARA is considering using a Defense Advanced Research Projects Agency supercomputer to add the necessary kick to its Archival Preservation System (APS), Bellardo said.

The supercomputer is only one of the options the Archives is considering.

Most modern computer systems handle fewer than 2 million files annually, he said, "which is not in the ballpark that we need to deal with."
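Bellardo's 10-year figure follows from simple division: more than 20 million files at a throughput of roughly 2 million files a year. The sketch below reproduces that back-of-the-envelope estimate. It treats both figures from the article as round assumptions, and `years_to_process` is a hypothetical helper, not part of any NARA system.

```python
# Back-of-the-envelope backlog estimate using the figures quoted in the article.
# Both inputs are assumptions: "more than 20 million" incoming files and a
# throughput on the order of 2 million files per year.

def years_to_process(total_files: int, files_per_year: int) -> float:
    """Return how many years one full-capacity preservation pass would take."""
    if files_per_year <= 0:
        raise ValueError("files_per_year must be positive")
    return total_files / files_per_year


incoming_files = 20_000_000   # Clinton executive office files (lower bound)
throughput = 2_000_000        # files handled per year (assumed ceiling)

one_pass = years_to_process(incoming_files, throughput)
print(f"One preservation pass: {one_pass:.0f} years")
```

Because the collection must be preserved again after each pass, a throughput this low would keep the system fully occupied with a single accession.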

APS converts electronic files into more secure formats. Before it can do this, archivists must first determine whether the media holding the documents (diskettes, tapes, compact discs) are in good condition, said Kenneth Thibodeau, director of NARA's Center for Electronic Records.

Some have been subject to years of environmental stress and mishandling, he said.

"You don't know [a file's] life history, so the first step is to get it onto a medium that you can trust," Thibodeau said. "That's the core function of the APS."
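Getting a file "onto a medium that you can trust" amounts to making a copy and proving it is identical to the source. A minimal sketch, assuming a copy-then-verify approach with SHA-256 checksums: the article does not say how APS checks its copies, and `copy_to_trusted` and `sha256_of` are hypothetical names used only for illustration.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_to_trusted(src: Path, dest_dir: Path) -> str:
    """Copy src into dest_dir and confirm the copy is bit-identical.

    Returns the SHA-256 digest so it can be stored with the record
    and rechecked at the next migration.
    """
    dest_dir.mkdir(parents=True, exist_ok=True)
    source_hash = sha256_of(src)
    dest = dest_dir / src.name
    shutil.copy2(src, dest)          # copies contents and timestamps
    if sha256_of(dest) != source_hash:
        dest.unlink()                # discard a corrupted copy rather than keep it
        raise OSError(f"verification failed for {src}")
    return source_hash
```

Storing the returned digest lets a later migration confirm that nothing changed between copies without going back to the original, possibly degraded, medium.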

The result is an archive of documents in a standard digital text format.

"It may be going from a square tape to a round disk, but ... they're all completely standardized," Thibodeau said.

Tapes were considered the only reliable storage mechanism when APS was designed in 1990. But that all changed when the courts ruled for the first time that electronic documents are records and must be handled as such.

"We were suddenly saddled with responsibility for 6,000 volumes of stuff, mainly backup tapes coming off various [executive office] systems," Thibodeau said. "At that point we had zero in-house capability."

Even before APS was delivered, NARA had it redesigned to process the new electronic records.

"We went from handling several hundred files to several thousand, which is not too bad," he said. "But that's not going to get us to several million."

Expanding the current system won't work, Thibodeau said. APS operates on PCs linked by Ethernet to an IBM RS/6000 server running an Oracle database.

"If we got a hundred of these PCs on the floor, it wouldn't handle the workload," Thibodeau said. "Before we finished copying the files, we'd have to start migrating them" to the archives.

In addition to talks with DARPA, NARA is looking at the possibility of combining many individual files into a single file.

"If I can put 1,000 messages into one file, my information bottleneck is reduced a thousandfold," Thibodeau said.

The large number of individual files wreaks havoc on the system.

"We can copy 200M in 15 minutes, but we can't copy 100M in thousands of files in 50 hours," he said.
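The bottleneck Thibodeau describes is per-file overhead rather than raw data volume, so packing many small files into one container means later copy and migration steps handle a single large file. A sketch of that idea using Python's standard `tarfile` module, offered under assumptions: the article does not name a container format (tar is a stand-in), and `bundle_messages` is a hypothetical helper.

```python
import tarfile
from pathlib import Path


def bundle_messages(paths: list[Path], archive_path: Path) -> int:
    """Pack many small files into one uncompressed tar archive.

    Assumes file names are unique, since each member is stored under its
    base name. Returns the number of members written.
    """
    with tarfile.open(archive_path, "w") as tar:
        for p in paths:
            tar.add(p, arcname=p.name)
    return len(paths)
```

With 1,000 messages per archive, downstream steps open, copy, and verify one file where they would otherwise handle 1,000, which is the thousandfold reduction in file operations Thibodeau refers to.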

NARA officials hope archiving experts can come up with options.

"We have no predilection about what platform it may eventually run on," he said. Officials want to engineer an ideal system first and then choose a platform to run it on.