Integrate external WARC/ARC files into Archive-It collections

If you manage W/ARC files that were created with other capture tools, inherited from legacy web archiving systems, or received as donations, you may upload them to your Archive-It collections for storage and preservation. The contents of these files will replay in Wayback mode and be indexed for metadata-based and full-text search just like any WARC files created with Archive-It's capture technologies. Like crawls, uploads can be made by Archive-It partners with Administrator or User privileges and will count towards your account's data budget.

To enable this additional feature in your account, begin by contacting Archive-It's web archivists. They will determine whether this tool is the best, most efficient way to add your data to Archive-It and, if so, provide support for using it.

How to add W/ARC files to your Archive-It collections

Upload and store

Once enabled in your account, the W/ARC Uploader tool is accessible through a sub-navigational tab in each of your collections:

The interface under this tab includes a file selection tool and a table listing the collection’s in-process and completed upload jobs.

To begin, use the "Browse…" function to find your file(s) in local or networked storage, then click the "Upload" button to initiate the process of adding these files to your collection:

Upload speeds will vary depending upon your local bandwidth and the size of your file(s), so please do not manually refresh, close, or navigate away from this tab in the web application until it automatically refreshes to display your uploaded file(s) in the table below:

This table shows the following information about your uploaded files:

WARC Filename: The name of each uploaded file as it appears in Archive-It storage and the Archive-It Wayback index, following the naming convention: ARCHIVEIT-[Collection number]-EXTERNAL-[UPLOAD TIMESTAMP]-[ORIGINAL FILENAME].

File size: The size of the uploaded file in storage.

MD5 Hash: An MD5 checksum value generated to uniquely “fingerprint” the contents of each uploaded file, which may be compared to the original file’s checksum in order to verify integrity.

Status: Updated according to each file’s stage in the process of being permanently added to Archive-It. Immediately upon upload this status will be Processing. After completing file format validation and depositing in storage, the status will change to Stored.

Date: The day on which each file was uploaded.
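The WARC Filename and MD5 Hash columns described above can be checked against your local copies. The following is a minimal sketch, not an Archive-It tool: the function names are hypothetical, and the regular expression assumes that the upload timestamp contains no hyphens, which this article does not specify.

```python
import hashlib
import re

# Assumed shape of ARCHIVEIT-[Collection number]-EXTERNAL-[UPLOAD TIMESTAMP]-[ORIGINAL FILENAME].
# The timestamp is assumed to contain no hyphens (an assumption, not documented behavior).
UPLOAD_NAME = re.compile(
    r"^ARCHIVEIT-(?P<collection>\d+)-EXTERNAL-(?P<timestamp>[^-]+)-(?P<original>.+)$"
)


def parse_upload_name(name):
    """Split a listed WARC filename into its parts, or return None if it does not match."""
    match = UPLOAD_NAME.match(name)
    return match.groupdict() if match else None


def md5_of_file(path, chunk_size=1024 * 1024):
    """Compute the MD5 hex digest of a local file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_checksum(listed_md5, local_path):
    """Compare the checksum shown in the table with the local file's checksum."""
    return md5_of_file(local_path) == listed_md5.strip().lower()
```

Reading in chunks keeps memory use flat even for multi-gigabyte W/ARC files. If the two checksums differ, the stored copy is not byte-for-byte identical to your local file.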

Replay

Like WARCs created with Archive-It, uploaded files can take up to 24 hours after being stored to become available for browsing in Wayback mode.

To enable access to these external captures, add your preferred URLs from the relevant W/ARC file(s) to your collection as seeds, if they are not seeds already. Once the files are indexed for Wayback, and without any need to crawl these seed URLs anew, you can access the results from the Wayback calendar links in the collection’s Seed tab or, if the collection is public, through the public interface on archive-it.org.
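If you are unsure which URLs a file contains, you can list them locally before choosing seeds. The sketch below uses only the Python standard library and makes several assumptions: the file is a plain or gzip-compressed WARC (ARC files use a different record layout and are not handled), header fields are not folded across lines, and the function names are hypothetical.

```python
import gzip


def open_warc(path):
    """Open a WARC file, transparently handling gzip compression."""
    with open(path, "rb") as probe:
        is_gzip = probe.read(2) == b"\x1f\x8b"
    return gzip.open(path, "rb") if is_gzip else open(path, "rb")


def iter_warc_headers(stream):
    """Yield a dict of lower-cased WARC header fields for each record."""
    while True:
        line = stream.readline()
        if not line:
            return  # end of file
        if not line.startswith(b"WARC/"):
            continue  # blank separator lines between records
        headers = {}
        for raw in iter(stream.readline, b""):
            raw = raw.strip()
            if not raw:
                break  # blank line ends the header block
            name, _, value = raw.partition(b":")
            key = name.decode("utf-8", "replace").strip().lower()
            headers[key] = value.decode("utf-8", "replace").strip()
        # Skip the record's content block so payload bytes are never parsed as headers.
        stream.read(int(headers.get("content-length", "0")))
        yield headers


def candidate_seed_urls(path):
    """Return the sorted, de-duplicated target URLs of all response records."""
    urls = set()
    with open_warc(path) as stream:
        for headers in iter_warc_headers(stream):
            if headers.get("warc-type") == "response" and "warc-target-uri" in headers:
                urls.add(headers["warc-target-uri"])
    return sorted(urls)
```

The result is usually much longer than the list of seeds you need, since it includes every embedded image, stylesheet, and script. Pick the top-level pages you want patrons to start from.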

In Wayback mode, each uploaded document is indexed with its original capture date rather than its upload date. For example, a document captured in November 2015 and uploaded in August 2017 appears in the Wayback calendar under November 2015.
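To predict where a capture will appear, you can convert the capture date recorded in the file into the 14-digit form commonly used in Wayback URLs. This is a rough sketch: it assumes the record carries a WARC-Date header in the second-precision UTC form (for example 2015-11-03T10:00:00Z), and the function name is hypothetical.

```python
from datetime import datetime, timezone


def wayback_timestamp(warc_date):
    """Convert a WARC-Date value such as 2015-11-03T10:00:00Z to YYYYMMDDhhmmss."""
    parsed = datetime.strptime(warc_date, "%Y-%m-%dT%H:%M:%SZ")
    # WARC-Date is expressed in UTC; attach the zone explicitly for clarity.
    parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.strftime("%Y%m%d%H%M%S")
```

Dates with fractional seconds, which newer WARC files may contain, would need a more permissive parser than this one.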

Quality assurance

The W/ARC Uploader tool can deposit externally created files into Archive-It storage and verify their format and integrity. However, it does not guarantee that the indexed contents of these files will replay in Archive-It’s Wayback mode precisely as they do in other or legacy replay mechanisms. Please feel free to report significant differences in the appearance of your archived documents, but understand that, without control of the original capture mechanisms, the Archive-It staff’s ability to provide support will be limited.
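Before uploading, a quick local check can catch files that are not WARC or ARC files at all. The sketch below is not Archive-It's validation, whose details this article does not describe. It only inspects the first bytes of the file, relying on the fact that WARC records begin with "WARC/" and ARC files begin with a "filedesc://" record; the function name is hypothetical.

```python
import gzip


def sniff_archive_format(path):
    """Return "warc", "arc", or None based on the first bytes of the file."""
    with open(path, "rb") as probe:
        is_gzip = probe.read(2) == b"\x1f\x8b"
    opener = gzip.open if is_gzip else open
    with opener(path, "rb") as stream:
        head = stream.read(11)  # long enough to hold b"filedesc://"
    if head.startswith(b"WARC/"):
        return "warc"
    if head.startswith(b"filedesc://"):
        return "arc"
    return None
```

A file that passes this check may still be truncated or malformed further in, so treat a positive result as a first filter only.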