Where websites go to die

The National Library of Australia is a world leader in tracing the evolution of the internet. But, writes Lauren Martin, with the average life of a website now only 44 days, time and money are short.

"You are on the web page the media do not want you to know about," trumpets the screen. It bears the banner "Pauline Hanson's One Nation", but in a flowery script that looks amateur compared with the slick corporate logo adopted later when she began posing with the flag draped around her. The paranoia endures in the party, but this picture is from 1998 - when the redhead was still on the outside, running the show.

It is one of 10 versions of the party's website captured for posterity through an innovative, world-leading and quite unfunded project at the National Library of Australia. Through PANDORA - the acronym for Preserving and Accessing Networked Documentary Resources of Australia - you can trace Hanson's evolution as a political force.

You can also follow the downfall of Ansett, clicking through those $99 fares, back when the banner read "The spirit of competition", to this: "Welcome to the Ansett Australia Group Administration website. This site was previously the information website and booking facility."

You can see what the INTERFET website looked like when Australia began its peacekeeping mission in East Timor - though the site no longer exists on the World Wide Web.

You can experience exactly how the Sydney 2000 Olympic Games site operated during the event, though it, too, has otherwise vanished. Likewise the Centenary of Federation, or the republic vote. You can see what the DFAT Club Buggery Fan Club site was about, when it was functioning - and it was not Roy and HG.

If you prove your bona fides, you can even view a select number of turn-of-the-millennium sex sites - though only while on a supervised computer in the Canberra library.

You can, in short, trace the evolution in Australia of the dominant cultural phenomenon of our age: the internet.

PANDORA is an archive of Australia's part in a technological leap which, in a decade, has revolutionised publishing. And it is run by a white-haired librarian, Margaret Phillips, with a staff she can count on one fine-boned hand.

The National Library of Australia is a 20th-century classical revival building, surrounded by columns on the shores of Lake Burley Griffin. More than 500 people work there, but Phillips's five, helped by the technology experts who engineer their efforts and by associates they have trained in a growing number of partner institutions, are ahead of almost anyone in internet archiving.

The Canberra library's PANDORA, begun in 1996, is a model for the massive project by the United States Library of Congress, called MINERVA, which is still in its infancy. It may yet be a model for the British Library, where staff tried archiving about 100 sites a few years back and then decided it was too hard. It is now making a second attempt.

What's more, the National Library of Australia has spearheaded and continued its digital archiving - not just of websites, but pictures, manuscripts, maps and music - with no new money to do so. In contrast, the US Congress in 2000 designated $US100 million (to be doubled by leveraging it with non-federal funds) for the same job.

The questions facing Phillips and her team were daunting: how does a library go about collecting websites? What to keep? How to capture it? How to negotiate permission? How to make sure researchers can use it in the future? How to pay for it?

These were thrown into Phillips's in-box. With her silver chignon and classic suit, she seems an unlikely techno-geek. She spent much of her career at the Queensland Conservatorium of Music library. She came to the NLA to work in the music section and admits to being more scared than excited when, in the mid-'90s, she was asked to head up the digital project. "Experts in the US were saying it couldn't be done. We had to say, 'We will do it, or at least we'll have a go,'" she says.

The NLA's director-general, Jan Fullerton, had heard enough about the obstacles. She finally said, according to Phillips, "Look, just go away and work out what it is we would want to do if there were no inhibitors."

Now the NLA's expertise in internet archiving is sought worldwide. Its "dynamic" archive, which captures audio (election jingles, for instance), video (watch the Bangarra Dance company), Flash and other multimedia features, is unique in the world.

Its selective approach - with clear criteria, cataloguing, search facilities, and staff who exercise quality control - is one of two directions taken by national libraries. Canada's archive takes the same approach but is static without the bells and whistles on each site. The other, mainly used by the Nordic countries, led by Sweden, is a whole-of-domain approach; it tries to harvest all net material from the country at a given time. This way is more comprehensive but less organised.

Phillips does not rule out a whole-of-domain harvest at some time, to show the full context of the Australian web. But she believes that selecting and cataloguing sites for the national bibliography will allow future researchers to easily track what they need from one place: PANDORA.

What to select has always been the key question. The US Library of Congress report puts it this way: "Who collects what for whom in what format is especially fraught with ambiguity and some frank anxiety. Unlike the contentious issue of copyright, for example, there are no laws that can be enacted or revoked to give parameters for good behaviour. Yet because of the ephemeral nature of the data, if we make mistakes about collection development now, we are unlikely to get a second chance to collect in the future. A digital file cannot sit neglected on a bookshelf for 200 years before someone discovers its value. By then it will be corrupted or trapped in an obsolete software encoding."

As of January last year, the web comprised more than 550 billion public pages and linked documents. While it is not even a decade old, the web is enormous and grows by 7 million pages a day. Forty-four per cent of the sites available in 1998 were no longer in existence a year later. The average life of a website is now only 44 days.

Phillips is concerned about what PANDORA cannot yet collect: webcams ("because it's an important commentary on how people are using the internet and their attitudes to privacy"); "blogs" (there are now a few on PANDORA), chat rooms, databases, even games. But for now her staff must concentrate on the limited areas of significance chosen in 1996: government publications, academic journals, conference papers, e-journals, and topical sites.

Many have disappeared from their publishers' sites, but most are still on the web, probably because PANDORA's sites are selected for quality and potential research value. It will be, on balance, a history not so much of the internet as of the best examples of it.

Still, PANDORA's account of Australia's online creative record is dazzlingly diverse. There are all the government reports and publications, of course, and numerous academic e-journals. But also protest sites (against the war in Iraq, against a new McDonald's in Coburg West), religious sites (BuddhaNet and Bible Believers), guide dogs and a literary site called grrrowl, collections around the Bali bombings and the Canberra bushfires. Silverchair is there, and so is Slim Dusty, whose site was first archived in 2001, including his tour dates, and poignantly archived again after his death in September.

A parliamentary committee is inquiring into the role of libraries in the online world, with a report due by mid-October. It has heard that libraries are strategically placed to deliver national benefits, and equity, in the digital era. But they need money.

Perhaps the report will give a clearer indication of whether libraries are now seen to be as crucial to our democracy as they were considered 100 years ago. Or will a yawning gap emerge between our cultural institutions and powerhouses such as the US?

There, the Library of Congress acknowledges the "generous" funding for its digitisation program. This, it says, "embodies the belief of the Founders that self-government depends vitally on free and open access to knowledge and the unhampered pursuit of truth by an informed and involved citizenry".