during the last years there was a significant growth in initiatives and countries hosting these initiatives, volume of data and number of contents preserved. While this indicates that the web archiving community is dedicating a growing effort on preserving digital information, other results presented throughout the paper raise concerns such as the small amount of archived data in comparison with the amount of data that is being published online.

The Internet Archive's budget is in the region of $15M/yr, about half of which goes to Web archiving. The budgets of all the other public Web archives might add another $20M/yr. The total worldwide spend on archiving Web content is probably less than $30M/yr, for content that [probably] cost hundreds of billions to create.

My rule of thumb has been that collection takes about half the lifetime cost of digital preservation, preservation about a third, and access about a sixth. So the world may spend only about $15M/yr collecting the Web.
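The arithmetic behind these estimates can be checked with a short sketch. The dollar figures are the rough numbers quoted above, not audited data, and the names `worldwide_web_archiving_spend` and `split_spend` are hypothetical helpers introduced only for illustration.

```python
# Figures in $M/yr are the rough estimates quoted in the text, not audited data.
IA_BUDGET = 15.0        # Internet Archive's total budget
IA_WEB_SHARE = 0.5      # "about half" goes to Web archiving
OTHER_ARCHIVES = 20.0   # all other public Web archives combined

# Rule-of-thumb split of the lifetime cost of digital preservation.
COST_SPLIT = {"collection": 1 / 2, "preservation": 1 / 3, "access": 1 / 6}


def worldwide_web_archiving_spend() -> float:
    """Internet Archive's Web-archiving share plus everyone else's budgets."""
    return IA_BUDGET * IA_WEB_SHARE + OTHER_ARCHIVES


def split_spend(total: float) -> dict:
    """Apply the rule-of-thumb split to a total spend in $M/yr."""
    return {stage: round(total * share, 2) for stage, share in COST_SPLIT.items()}


estimate = worldwide_web_archiving_spend()  # below the $30M/yr ceiling
upper_bound = split_spend(30.0)             # split of the $30M/yr ceiling
```

Applied to the $30M/yr ceiling, the split gives roughly $15M/yr for collection, $10M/yr for preservation and $5M/yr for access.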

If we are to continue to preserve even as much of society's memory as we currently do, we face two very difficult choices: either find a lot more money, or radically reduce the cost per site of preservation.

It will be hard to find a lot more money in a world where library and archive budgets are decreasing. For example, the graph shows that the British Library's income has declined by 45% in real terms over the last decade.

the archival failure is caused by changes CNN made to their CDN; these changes are reflected in the JavaScript used to render the homepage.

The detailed explanation takes about 4400 words and 15 images. The changes CNN made appear intended to improve the efficiency of their publishing platform. From CNN's point of view the benefits of improved efficiency vastly outweigh the costs of being unarchivable (which in any case CNN doesn't see). Alas, the W3C's mandating of DRM for the Web means that the ingest cost for much of the Web's content will become infinite. It simply won't be legal to ingest it.

Almost all the Web content that encodes society's memory is supported by one or both of two business models: subscription, or advertising. Currently, neither model works well. Web DRM will be perceived as the answer to both. Subscription content, not just video but newspapers and academic journals, will be DRM-ed to force readers to subscribe. Advertisers will insist that the sites they support DRM their content to prevent readers running ad-blockers. DRM-ed content cannot be archived.

So for International Digital Preservation Day I will end with a call to action. Please:

Use the Wayback Machine's Save Page Now facility to preserve pages you think are important.
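For readers who want to script this, here is a minimal sketch. It assumes the simple unauthenticated form of Save Page Now, in which the page's address is appended to `https://web.archive.org/save/`; rate limits and response details may differ from what is shown, and the helper names `save_page_now_url` and `request_capture` and the `User-Agent` string are hypothetical.

```python
from urllib.parse import quote
from urllib.request import Request, urlopen

# Assumption: the basic Save Page Now form takes the target URL as a path suffix.
SAVE_ENDPOINT = "https://web.archive.org/save/"


def save_page_now_url(page_url: str) -> str:
    """Build the Save Page Now address for a page, escaping unsafe characters."""
    return SAVE_ENDPOINT + quote(page_url.strip(), safe=":/?&=%")


def request_capture(page_url: str, timeout: float = 60.0) -> int:
    """Ask the Wayback Machine to capture a page; returns the HTTP status code.

    This performs a real network request, so call it sparingly.
    """
    request = Request(
        save_page_now_url(page_url),
        headers={"User-Agent": "save-page-now-sketch/0.1"},  # hypothetical UA
    )
    with urlopen(request, timeout=timeout) as response:
        return response.status
```

Calling `request_capture("https://example.com/")` would submit that one page; a polite script would pause between requests when looping over a list of pages.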
