Despite government efforts, the Web never forgets

David Colker, Los Angeles Times Staff Writer

Within days of the Sept. 11 attacks, the federal Agency for Toxic Substances and Disease Registry rushed to pull from its Web site a suddenly sensitive report titled "Industrial Chemicals and Terrorism." The agency eliminated all traces of the document and its description of sources for home-brew nerve gases and improvised explosives.

But on the World Wide Web, almost nothing truly dies.

Indeed, the thorny report lives on at several locations, including the site of the Oklahoma City National Memorial Institute for the Prevention of Terrorism, a UC Santa Cruz graduate student's Web site and the databanks of the Internet Archive, a nonprofit venture that has electronically stored an estimated 10 billion Web pages in an effort to preserve the Web's history.

The agency is one of several organizations, public and private, facing this problem. Contrary to concerns about too much censorship in the wake of the Sept. 11 terrorist attacks, the reality is that some agencies are having a hard time censoring anything that was once published on the Internet.

"The Internet is not like a faucet you can turn off and on. It's like a leaky faucet that keeps dripping long after it's turned off," said Gary Bass, executive director of OMB Watch, an organization that strives to cut back on government secrecy.

Still scattered across the electronic ether are a host of "erased" documents, including maps of nuclear reactors, pictures of secret spy satellite facilities and a description of a NASA space propulsion project.

In many cases, agencies have no idea that their erased documents are still available to anyone with a Web browser and an Internet connection. Detailed maps from the Energy Department's International Nuclear Safety Center, for example, are still retrievable through the Internet Archive.

"I have never heard of the archive," said Jeff Binder, director of the center. "Maybe our guys in cybersecurity have."

In the electronic battle against terrorism, the Web has become as porous a landscape as the real battlefronts surrounding Kunduz or Kandahar in Afghanistan.

That's largely due to a kind of Xerox effect on the Web, where pages and even entire digital sites can be easily copied with a few mouse clicks.

Copies of supposedly eradicated reports and documents can be found using common search engines and the Internet Archive's whimsically named Wayback Machine. The "Industrial Chemicals and Terrorism" report can be found in a matter of minutes, even by novices.

Until the Sept. 11 attacks, the porousness of the Web was actually a feature celebrated both in and out of government as a way of providing instant global distribution of information.

Anti-secrecy group now pulling pages

For Steven Aftergood, director of the Project on Government Secrecy at the Federation of American Scientists, the Internet has been a primary tool in the organization's efforts to battle what it considers misuses of government secrecy. The federation collects and disseminates information on nuclear weapons, the "Star Wars" antiballistic missile initiative and other projects.

The Washington-based group was founded after World War II by scientists from the supersecretive Manhattan Project who worried that the government was concealing the dangers of building a nuclear arsenal.

But since Sept. 11, Aftergood has found himself in the awkward position of following the government's lead in protecting sensitive information. So far, he has removed about 200 pages from the federation's site, mostly concerning intelligence and nuclear weapons facilities.

Or so he thought.

One of the sections he removed from the site showed pictures of a facility belonging to an unidentified government agency that analyzes spy satellite data. Among the images was a closeup of a sign pinpointing the location of the building and of the opening in the security fence used to admit cars.

What Aftergood did not know was that in the months that the photographs were on his computer servers, a succession of small programs, known as "bots," had scoured the site, indexed its contents and copied them to the Internet Archive.

It is a routine process that happens behind the scenes, invisible to most Web users. Dozens of bots from search engines continually comb the Web in this manner to keep their search indexes up to date.

Much to Aftergood's chagrin, the bots ensured that the photographs would have a permanent home in cyberspace regardless of his efforts to erase them.

"Once something is posted on the Web, it's a safe bet that it is archived somewhere," he said after being shown that the photos were still available on the Internet. "Once it's out there, it's all but impossible to get back."

There are ways to track down and eradicate some pieces of stray information. One method is what is known as a "robot exclusion": Web site owners can post a bit of code, usually a small file named robots.txt, that asks electronic search bots to pass over all or part of a site.
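A robot exclusion usually takes the form of a plain-text file named robots.txt at the top level of a site. The sketch below, a minimal illustration rather than any agency's actual setup, uses Python's standard urllib.robotparser module to show how a well-behaved crawler checks such a file before fetching a page. The site address, the /reports/ directory and the bot name "SomeSearchBot" are hypothetical, and "ia_archiver" is assumed here as the name the Internet Archive's crawler identifies itself by. Compliance is voluntary: the file only asks bots to stay away, and a crawler that ignores it can still copy the pages.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt. It asks the crawler assumed to be the Internet
# Archive's ("ia_archiver") to skip the whole site, and asks every other
# crawler to skip a hypothetical /reports/ directory.
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /

User-agent: *
Disallow: /reports/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(agent: str, url: str) -> bool:
    """Return True if a crawler that honors robots.txt may fetch `url`."""
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    checks = [
        ("ia_archiver", "http://www.example.com/index.html"),
        ("SomeSearchBot", "http://www.example.com/index.html"),
        ("SomeSearchBot", "http://www.example.com/reports/chemicals.html"),
    ]
    for agent, url in checks:
        print(agent, url, allowed(agent, url))
```

Run as a script, this reports that the archive's crawler is turned away from every page, while the other bot may fetch the home page but not anything under /reports/. Nothing in the file removes copies a crawler has already made, which is why the archive's retroactive-removal policy matters.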

The Internet Archive has gone one step further, allowing site owners to retroactively remove their pages from the archive if they put a robot exclusion in place.

For example, trying to access the Nuclear Regulatory Commission site on the archive results in a "Blocked Site Error" notice "per the request of the site owner."

Brewster Kahle, the founder of the Internet Archive, defended the right of site owners to erase all traces of what was once available to the public, although he worries that heavy use of the exclusion will erode the historical value of his project.

"We are hoping people see the value of their sites being in the library," said Kahle, a San Francisco entrepreneur who is the archive's chief financial backer.

But trying to completely erase electronic information is not as simple as instituting a few lines of code. Once information has been placed on the Internet, it can be easily copied to other sites or put into a variety of media, ranging from paper to CD-ROM to private hard drives.

The Investigative Reporters and Editors organization culled data from a Federal Aviation Administration Web site detailing all actions taken against airports cited for security breaches from 1962 through August of this year. Although the FAA removed the information from its site after Sept. 11, the IRE continues to sell it on CD-ROMs.

The huge FAA database details 380,520 agency investigations, showing the date, airport, nature of the problem and any action taken. FAA officials said someone could use the information to look for weaknesses in an airport's security.

"We took it off because of a determination that it was what's called SSI, security sensitive information, and that it should not have been on our Web site in the first place," FAA spokeswoman Laura Brown said.

Keeping information available defended

But the IRE, which has made the database available in easy-to-use CD-ROM form since 1997, has refused to put the genie back in the bottle, saying this is precisely the sort of information that should be available to the public.

"Given the track record of the FAA in keeping our airports secure, we think journalists should continue our role as watchdogs," said Brant Houston, executive director of the organization.

Houston said that because the FAA has stopped making its updates public, he is not sure whether the IRE will be able to keep adding to its CD-ROM database.

The Web offers so many avenues of escape that even sites specifically targeted for erasure have managed to persist.

One of the most hunted sites is Azzam.com, the Internet home of the pro-Taliban Azzam Publications. For a time, those who typed its standard Web address, www.azzam.com, were greeted with a blank page with the message, "This site has been suspended until further notice."

It soon reappeared at http://www.azzam.sh, in an obscure corner of the Web assigned to St. Helena, a British island territory in the South Atlantic off the west coast of Africa. Its authors advised readers to quickly make their own copies of the site.

"We have written these few words in expectation of our site being closed yet again," the site said. "Therefore, we advise all the Muslims to save a copy of this page and ponder about what it says lest our site is closed and we are not able to say it again. We also advise the Muslims to copy, translate into other languages if necessary and distribute this message all over the Internet."

They need not worry. Nearly two years' worth of the site is stored in the Internet Archive.