When Good Sites Go Bad – and How to Save Them: Archival Challenges and Solutions for Post-1995 Websites

Before about 1995, preserving a website was simple: you copied the pages and saved them. The site could then be easily reproduced or migrated forward even as the software and hardware that supported it evolved. Collections of pages never became obsolete (although some links might break).

As scripted, CGI-enabled, and database-driven websites became dominant, page-scraping revealed itself as the Rube Goldberg solution it is. Not only are page-scraped collections a pale representation of the original; they rarely capture the context, the interactivity, or the totality of advanced sites.
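The limitation can be seen in a toy sketch (all names here are hypothetical, not from any real archiving tool): a dynamic page is a function of its inputs, while a scrape captures only one rendering of that function.

```python
# A toy dynamic "CGI" page: its output depends on the query parameters,
# so no single snapshot can represent every response the site could give.
def render_page(query: dict) -> str:
    name = query.get("name", "world")
    return f"<html><body>Hello, {name}!</body></html>"

# A page-scraper saves one concrete rendering...
snapshot = render_page({"name": "archivist"})

# ...which faithfully preserves that single response,
assert "Hello, archivist!" in snapshot

# but the archived copy is static: it cannot answer any other query,
# whereas the live site could have greeted anyone.
live_response = render_page({"name": "historian"})
assert "historian" in live_response
assert "historian" not in snapshot  # the scrape has lost the site's behavior
```

The scraped file is a faithful record of one interaction, but the function itself, the thing that made the site dynamic, is gone from the archive.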

To make matters worse, advances in software and hardware often completely break these websites. Each server upgrade is a time of endangerment and loss.

What’s an archivist to do? Continue to scrape pages with Archive-It or Heritrix? Or break sites into highly described atoms that we can preserve, losing context and interactivity in the process?