Find Lost Web Pages

From Wired How-To Wiki

There's nothing more frustrating than searching for a page, finding what looks like a promising result, and then clicking through only to discover that the page is gone. Unfortunately, it happens all the time. Servers get jammed, pages are removed, sites move to new addresses and some servers are simply no longer maintained. So what can you do when you want to find a page that's vanished?

Dealing With the Slashdot Effect

Some sites, particularly smaller independent publishers and bloggers, can't handle the traffic influx from having a link show up on Slashdot or Digg. The sites simply stop responding as their servers become overwhelmed. However, you might still be able to see a cached version of the content using Coral Cache.

Coral Cache

Coral Cache is a free service that uses distributed computing to lessen the so-called "Slashdot effect." Coral Cache was developed to provide a distributed mirror of the original page that can handle the high traffic volume.

You don't need any special software -- just append .nyud.net to the hostname of a regular URL (for example, www.wired.com becomes www.wired.com.nyud.net) and you'll hit the page through Coral Cache rather than connecting directly.
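The hostname rewrite is simple enough to script. Here is a minimal Python sketch of it; the function name coralize is made up for illustration, and keeping any explicit port number unchanged is an assumption rather than documented Coral Cache behavior.

```python
from urllib.parse import urlsplit, urlunsplit

CORAL_SUFFIX = ".nyud.net"  # suffix named in the article


def coralize(url: str) -> str:
    """Return url with the Coral Cache suffix appended to its hostname."""
    parts = urlsplit(url)
    host = parts.hostname
    if not host:
        raise ValueError(f"no hostname found in {url!r}")
    if host.endswith(CORAL_SUFFIX):
        return url  # already points at Coral Cache
    netloc = host + CORAL_SUFFIX
    if parts.port is not None:
        # Assumption: an explicit port is carried over as-is.
        netloc += f":{parts.port}"
    return urlunsplit(
        (parts.scheme, netloc, parts.path, parts.query, parts.fragment)
    )


if __name__ == "__main__":
    print(coralize("http://www.wired.com/"))
```

Only the hostname changes; the path and query string stay where they were, so a deep link to an overloaded article keeps pointing at the same article.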

It won't be quite as fast as you may be used to (compare wired.com directly with the Coral Cache version), but it could help you get to content that's currently being choked by an exceedingly large number of direct connections. It gives you a sense of the rich data that's available for web sites that have been around for years.

Finding Content That's Been Removed

If a web page has been deleted or removed by its publisher, you can often still find it using one of the web's longer-term caching services.

Google Cache

As search engines crawl the web, they cache fresh versions of pages as they go. To access a page in Google's cache, just search for the original page. If it's still in Google's cache, you'll see a small "Cached" link next to the result, leading to the page as it looked the last time Google indexed it.

In some cases, this will lead you straight to the content you want. However, sometimes the method doesn't work. The page owner may have replaced the original page with new content, and if Google's indexing spiders have been back to the page since that change, you won't see the old content.

In such cases, you may be out of luck, but there is one other method you can try.

The Wayback Machine

The Internet Archive is a nonprofit organization founded with the goal of building an Internet library that could offer permanent access to web pages for researchers, historians and scholars.

The Internet Archive's ambitious goal of indexing every page of content that has ever been on the public web is not a reality, but the system certainly tries hard. It just might have the page you seek.

The Wayback Machine is the Internet Archive's search engine. It takes a URL and looks for pages published at that URL over time. Using the Wayback Machine, you can often find pages that were removed or deleted from the live web years ago.

In some cases, the pages may appear a bit mangled and won't necessarily have all the original formatting -- images, stylesheets and scripts may not be referenced properly anymore -- but you can at least get at the actual text content.
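A Wayback Machine lookup can also be written directly as a URL. Below is a minimal Python sketch that builds one. It assumes the web.archive.org/web/<timestamp>/<url> address pattern, where an asterisk in place of the timestamp lists every capture; the function name wayback_url is made up for illustration.

```python
from datetime import date
from typing import Optional

# Assumed address pattern: https://web.archive.org/web/<timestamp>/<url>
WAYBACK_PREFIX = "https://web.archive.org/web"


def wayback_url(page_url: str, when: Optional[date] = None) -> str:
    """Build a Wayback Machine address for page_url.

    With a date, the address asks for a capture near that day.
    Without one, it asks for the list of all captures.
    """
    stamp = when.strftime("%Y%m%d") if when else "*"
    return f"{WAYBACK_PREFIX}/{stamp}/{page_url}"


if __name__ == "__main__":
    print(wayback_url("http://www.wired.com/"))
    print(wayback_url("http://www.wired.com/", date(2008, 3, 1)))
```

Nothing is fetched here. The script only prints addresses you can paste into a browser, and whether a capture exists for a given date still depends on what the archive happened to crawl.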

Tip: There's a hidden bonus to the Wayback Machine. From its earliest days, it has been caching not only HTML, CSS and images, but also all directly linked content on the server. That means when you go looking for the site of a manufacturer that's been out of business for years, their driver download links may still work.

As of March 2008, the Internet Archive boasts 85 billion web pages. It also recently started archiving other content like movies, audio files and live music, though its indexes for multimedia content are not as extensive as the web page offerings.

Prevent Pages From Disappearing In the First Place

Many of today's popular web-based bookmark services offer page caching as a feature. Ma.gnolia, for instance, takes a snapshot of a page when you bookmark it and caches the contents. This is helpful for ensuring that your favorite bookmarked pages don't disappear on you. If they do, just head to ma.gnolia and click through to the cached version.
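If you'd rather keep your own copies, the same snapshot-on-bookmark idea can be approximated locally. The Python sketch below is not how Ma.gnolia or any other bookmark service works; the function names, the file naming scheme and the folder layout are all assumptions made for illustration. It saves only the raw HTML, not images or stylesheets.

```python
import hashlib
import time
from pathlib import Path
from typing import Callable
from urllib.request import urlopen


def fetch_bytes(url: str) -> bytes:
    """Download the raw bytes of a page."""
    with urlopen(url, timeout=30) as response:
        return response.read()


def snapshot(
    url: str,
    folder: Path,
    fetch: Callable[[str], bytes] = fetch_bytes,
) -> Path:
    """Save a timestamped copy of url inside folder and return its path."""
    folder.mkdir(parents=True, exist_ok=True)
    # A short hash of the URL keeps file names safe and groups
    # repeated snapshots of the same page together.
    digest = hashlib.sha256(url.encode("utf-8")).hexdigest()[:12]
    stamp = time.strftime("%Y%m%dT%H%M%SZ", time.gmtime())
    target = folder / f"{digest}-{stamp}.html"
    target.write_bytes(fetch(url))
    return target


if __name__ == "__main__":
    saved = snapshot("http://www.wired.com/", Path("snapshots"))
    print(f"Saved copy to {saved}")
```

Run it each time you bookmark something important and you'll have a dated copy on disk even if the page later disappears from the web and from every cache.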
