Libraries to store all UK web content

Millions of tweets, Facebook status updates and even a blog about a bus shelter in Shetland are to be preserved for the nation.

The British Library and five other "legal deposit libraries" have the right to collect and store everything that is published online in the UK.

It is estimated that around a billion pages a year will be available for research.

The move follows 10 years of planning and will also offer visitors access to material currently behind paywalls.

The other institutions involved are the National Libraries of Scotland and Wales, the Bodleian Libraries in Oxford, the University Library, Cambridge and the Library of Trinity College, Dublin.

The archive will cover 4.8 million websites and will include magazines, books and academic journals as well as alternative sources of literature, news and comment such as Mumsnet, the Beano online, Stephen Hawking's website, and the unofficial armed forces' bulletin board, ARRSE.

Ben Sanderson from the British Library said that while people may think information on the web lasts forever, huge amounts of research material have already disappeared.

He added the public had already "lost a lot of the material that was posted by the public during the 7/7 bombings".

MPs' blog sites have also been lost following a death or an election defeat.

Top 100 websites

Mr Sanderson explained that, with much of public life having migrated online, material published in print now gives only part of the story and debate within modern Britain.

He said: "It will be impossible to tell, for instance, the story of the 2015 general election without accessing what appears on the web."

The new databases will cover all areas of interest. For example, the website Style Scout - a fashion blog documenting London street fashion - will give historians a snapshot of what people were wearing in 2013.

As part of the launch of the process, the British Library has commissioned a survey of the top 100 websites that ought to be preserved for historians and researchers.

The sites recommended for preservation include eBay, Facebook, Twitter, Tripadvisor and Rightmove.

Lesser-known ones include the Anarchist Federation, the Dracula Society and The Dreamcast Junkyard - a blog dedicated to the community of gamers who continue to play Dreamcast games online, despite the fact that they were officially discontinued in 2002.

The British Library is also asking for advice from the public as to which websites should be preserved to give an accurate picture to future generations.

Jim Killock, executive director of the Open Rights Group, told the BBC News website: "The idea of the British Library preserving published content from UK websites is a great one.

"My concern is that a lot of Facebook comments are public and people don't realise they're publishing to the world. That's Facebook's fault, not the British Library's - their user settings need to be changed in line with people's expectations.

"Twitter, on the other hand, is avowedly public - it's very clear you're publishing to the world."