What is Proxy Mode?

Proxy Mode is an "offline browsing" mode, allowing you to better evaluate which page elements were captured and which are still being pulled from live sites. Using proxy mode is an excellent way to check the quality and completeness of your archived content. Proxy mode allows you to exclusively browse the archived documents within a single Archive-It collection.

When you view a seed or document in proxy mode, only the most recent capture will be visible. Because it is not possible to view earlier captures in proxy mode, you'll want to integrate this technique into your ongoing quality assurance workflow.

Why use Proxy Mode?

When you browse your collection without a proxy, embedded files in your archived websites can inadvertently redirect to their counterparts on the live web. This means that when you are looking at an archived web page, you could actually be seeing the live version of an embedded document rather than the archived version. When these redirects happen, it is difficult to discern whether your documents were archived successfully. Proxy mode prevents these redirects from happening. For QA purposes, it is important to browse your documents using proxy mode to be certain that your harvests are complete.

How to do it

Use Archive-It's Proxy Mode Add-On

The easiest way to set up and use Proxy Mode in order to browse your web archives "offline" is with our Firefox browser add-on, which enables you to quickly toggle between viewing pages in Wayback and Proxy Modes at the push of a button.

Set up Proxy Mode manually in Firefox or Chrome

To use proxy mode without the Firefox add-on, you need to make a manual adjustment to your web browser's settings. Once you have adjusted your browser, you will only be able to view material from your archived collection, and not any content from the live web. To again view sites on the live web, you will need to adjust your browser back to its original settings.

In Firefox's connection settings dialog, choose the option "Automatic Proxy Configuration URL:" and paste the following URL into the space provided: http://wayback.archive-it.org/proxy.pac

Click "OK" and you are ready to go!

Google Chrome

From the top of your browser, go to Chrome > Preferences....

Click Advanced at the bottom of the page

Under System, click Open Proxy Settings

In System Preferences, click Automatic Proxy Configuration, and enter http://wayback.archive-it.org/proxy.pac

Follow your operating system's directions for applying that rule (on a Mac, click OK, then Apply)
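As an alternative to changing system-wide settings, Chrome can also be started with the same PAC URL on the command line. The sketch below only builds that command. The `--proxy-pac-url` and `--user-data-dir` switches are standard Chromium options, but the binary name `google-chrome`, the profile directory, and the helper name `chrome_proxy_command` are assumptions made for illustration and will differ between systems.

```python
# PAC URL taken from the setup instructions above.
PAC_URL = "http://wayback.archive-it.org/proxy.pac"


def chrome_proxy_command(chrome_binary="google-chrome",
                         profile_dir="/tmp/archive-it-proxy-profile"):
    """Build a command line that starts Chrome with the proxy PAC file.

    chrome_binary and profile_dir are hypothetical defaults; adjust them
    to match your own installation.
    """
    return [
        chrome_binary,
        f"--proxy-pac-url={PAC_URL}",       # use the PAC file for this session
        f"--user-data-dir={profile_dir}",   # separate profile, leaves your normal one untouched
    ]


# Print the command so it can be copied into a terminal.
print(" ".join(chrome_proxy_command()))
```

Because the proxy setting applies only to the Chrome instance started this way, closing that window is enough to return to the live web.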

Browsing in Proxy Mode

Once you have followed the proxy mode setup instructions above, you will be able to click through archived links as you normally would in the Wayback Machine's regular viewing mode.

The easiest way to browse your archives in proxy mode is to work from a list of your seeds and/or specific URLs that are important for you to check for accuracy. Once you have a proxy enabled, you can paste the seed or exact URL you are looking for into the browser's address bar (you don't need to include wayback.archive-it.org, the collection ID, or the capture date). For example, if the archival URL is http://wayback.archive-it.org/193/20080508191419/http://www.sdhistory.org/, you would remove everything from wayback.archive-it.org through the date code, leaving the URL http://www.sdhistory.org/. At this point, if proxy mode has been set up correctly, you will be prompted for a username and password.

When presented with this screen, for the "User Name," enter the collection ID number that corresponds to the URL you are attempting to view (note that in the case of test crawls, the collection ID number must be followed by "-test" in the following manner: ####-test). You can leave the "Password" field empty.

If successful, you will see the archived website disclaimer at the top of the screen, which confirms that you are looking at the archived version rather than the live site. You may need to reload your page.
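If you are working from a long list of archival URLs, the trimming and username rules described in this section can be scripted. The following is a minimal sketch, assuming every archival URL follows the layout of the example above (host, numeric collection ID, numeric capture date, original URL). The function names `split_archival_url` and `proxy_username` are illustrative and not part of any Archive-It tool.

```python
import re

# Assumed layout, inferred from the example in this article:
#   http://wayback.archive-it.org/<collection ID>/<capture date>/<original URL>
ARCHIVAL_URL = re.compile(
    r"^https?://wayback\.archive-it\.org/"
    r"(?P<collection>\d+)/(?P<capture_date>\d+)/(?P<original>.+)$"
)


def split_archival_url(archival_url):
    """Return (collection ID, original URL) for an archival URL."""
    match = ARCHIVAL_URL.match(archival_url.strip())
    if match is None:
        raise ValueError(f"not a recognized archival URL: {archival_url!r}")
    return match.group("collection"), match.group("original")


def proxy_username(collection_id, test_crawl=False):
    """Return the user name for the proxy prompt; the password stays empty."""
    collection_id = str(collection_id).strip()
    if not collection_id.isdigit():
        raise ValueError(f"collection ID must be numeric: {collection_id!r}")
    # Test crawls use the collection ID followed by "-test".
    return f"{collection_id}-test" if test_crawl else collection_id


collection, original = split_archival_url(
    "http://wayback.archive-it.org/193/20080508191419/http://www.sdhistory.org/"
)
print(original)                    # http://www.sdhistory.org/
print(proxy_username(collection))  # 193
```

The first printed value is what you would paste into the address bar, and the second is what you would enter as the user name when prompted.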

Browsing among multiple collections

Please note that if you are viewing a webpage from one collection, then enter the URL for an archived page from a different collection, you will likely encounter a "Not in Archive" screen. To view content in a different collection, click the "Switch Collections" link in the yellow Wayback banner area. This will again prompt you to enter the collection ID number as the username to proceed.

URLs that cannot be viewed in Proxy Mode

It is not currently possible to browse any URLs that begin with https in Proxy Mode. In many cases, you can simply remove the s to view the http version of the URL. However, some sites and browsers force the use of https in certain situations or for certain component documents, and these will not replay in Proxy Mode. The most notable example is facebook.com.
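When preparing a list of URLs to check, the "remove the s" step can be applied in bulk. This is a small sketch using Python's standard library; the helper name `to_http` is hypothetical, and the rewrite cannot help with sites that force https on their own.

```python
from urllib.parse import urlsplit, urlunsplit


def to_http(url):
    """Return the http version of an https URL; other URLs are returned unchanged."""
    parts = urlsplit(url)
    if parts.scheme.lower() != "https":
        return url
    # Swap only the scheme; host, path, and query string are kept as they are.
    return urlunsplit(parts._replace(scheme="http"))


print(to_http("https://www.sdhistory.org/"))  # http://www.sdhistory.org/
```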

Further information

Zombies in the Archive, a blog post from the Web Science and Digital Libraries Research Group at Old Dominion University, further explains how the live web finds its way into web archives.