ETH Zurich Web Archive

The ETH Zurich University Archives periodically collects ETH Zurich's most important websites (main site, portals for ETH members and students, special interest websites).

Which websites are archived?

The ETH Zurich Web Archive permanently stores selected parts of ETH Zurich's web presence and makes these websites available to the public. Every webpage belonging to a specific website is captured in a snapshot. These snapshots reflect the condition of the website at the date of archiving.

The Web Archive currently contains several collections of websites.

Snapshots showing the state of the most important ETH websites of institutes, chairs and departments prior to ETH Zurich's web relaunch in 2016.

Snapshots of several older ETH websites.

Snapshots of virtual exhibitions of ETH Library since 1998.

The Web Archive offers a well-considered selection of websites that illustrate the development of ETH Zurich's web presence. To ensure the completeness of a website, all its webpages are archived at the same time. Technical and visual quality control ensures the high quality of the snapshots. The Web Archive guarantees the stability of references and quotations so that users can cite the archived websites as scientific sources.

The historic websites can be found through various portals, such as Hochschularchiv Online (Archival Information System of the ETH Zurich University Archives), the Knowledge Portal of the ETH Library and Archives Portal Europe. The search index covers each website's title, its date of archiving and other metadata, but not the websites' textual content.

For the web archiving process, the ETH Zurich Web Archive uses the remote harvesting method, with Heritrix as the web crawler. The crawler collects all content linked to a start URL, which produces a snapshot of every webpage in a website. It generates files in WARC format as well as log files documenting the crawler's settings.
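As a rough illustration of what collecting all content linked to a start URL involves, the following Python sketch walks a small in-memory link graph breadth-first and stays on the host of the start URL. The link graph, the URLs and the function name crawl are hypothetical, and no network access takes place. It is not Heritrix's implementation or configuration.

```python
from collections import deque
from urllib.parse import urlsplit

# Hypothetical link graph standing in for a live website: page -> linked URLs.
LINKS = {
    "https://www.example.org/": [
        "https://www.example.org/research",
        "https://www.example.org/files/slides.pdf",
        "https://www.youtube.com/watch?v=abc",
    ],
    "https://www.example.org/research": [
        "https://www.example.org/",
        "https://www.example.org/team",
    ],
    "https://www.example.org/team": [],
    "https://www.example.org/files/slides.pdf": [],
}


def crawl(start_url, links):
    """Return every URL reachable from start_url on the same host, in visit order."""
    host = urlsplit(start_url).netloc
    seen = {start_url}
    queue = deque([start_url])
    order = []
    while queue:
        url = queue.popleft()
        order.append(url)
        for target in links.get(url, []):
            # Assumption: links that leave the start host are out of scope.
            if urlsplit(target).netloc != host or target in seen:
                continue
            seen.add(target)
            queue.append(target)
    return order


if __name__ == "__main__":
    for url in crawl("https://www.example.org/", LINKS):
        print(url)
```

Tracking visited URLs in a set keeps the walk from looping when pages link back to each other, which is the usual situation on a real website.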

A prerequisite for archiving in the Web Archive is that a website’s owner is part of ETH Zurich. Only publicly accessible websites, for which no login is required, are archived. Content published on a website, such as PDFs or presentation slides, is stored in the WARC files. Embedded content from external services (e.g. YouTube videos or Google Maps) is not archived. Instead, a placeholder "Resource not in archive" is displayed.
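The rules above can be sketched in Python as a small scope check plus a lookup that falls back to the placeholder. This is a minimal sketch under stated assumptions: the same-host test, the candidate list and all function names are hypothetical and do not describe how the crawler or the viewer is actually configured.

```python
from urllib.parse import urlsplit

PLACEHOLDER = "Resource not in archive"  # placeholder text quoted in the description

# Hypothetical candidates: (URL, payload, requires_login).
CANDIDATES = [
    ("https://www.example.org/index.html", "<html>home</html>", False),
    ("https://www.example.org/talk.pdf", "PDF bytes", False),
    ("https://www.example.org/intranet", "<html>members</html>", True),
    ("https://www.youtube.com/embed/abc", "<video>", False),
]


def in_scope(url, site_host, requires_login=False):
    """Assumed scope rule: same host as the archived site and no login needed."""
    return urlsplit(url).netloc == site_host and not requires_login


def build_archive(candidates, site_host):
    """Keep only the payloads of in-scope resources."""
    return {
        url: payload
        for url, payload, requires_login in candidates
        if in_scope(url, site_host, requires_login)
    }


def render_resource(url, archive):
    """Return the archived payload, or the placeholder if it was never captured."""
    return archive.get(url, PLACEHOLDER)


if __name__ == "__main__":
    archive = build_archive(CANDIDATES, "www.example.org")
    for url, _, _ in CANDIDATES:
        print(url, "->", render_resource(url, archive))
```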

The websites are displayed in the Open Wayback Machine viewer. The archived version can look slightly different from the original. Dynamic content is particularly difficult to archive. The quality control process ensures that the central content of a website is archived.

How to cite websites

Every snapshot, i.e. each version of a website archived at a particular time, is assigned a Digital Object Identifier (DOI). Users can therefore cite these snapshots as sources in their scientific publications.
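A DOI can be turned into a stable link by prefixing it with the doi.org resolver address. The following Python sketch shows one way to build such a link and a simple citation string. The DOI, the website title and the citation layout are hypothetical examples, not values or a citation style prescribed by the Web Archive.

```python
def doi_url(doi):
    """Turn a bare DOI into a link that goes through the doi.org resolver."""
    return "https://doi.org/" + doi.strip()


def cite_snapshot(title, archived_on, doi):
    """Build a simple citation string (layout is an assumption, not a standard)."""
    return f"{title}. Snapshot of {archived_on}. ETH Zurich Web Archive. {doi_url(doi)}"


if __name__ == "__main__":
    # Hypothetical DOI and title, for illustration only.
    print(cite_snapshot("Example Institute website", "2016-03-01", "10.1234/example-snapshot"))
```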

How can I register an ETH Zurich website for archiving?

Would you like to save your ETH website in the ETH Zurich Web Archive? To register, please email archiv@library.ethz.ch

Digital preservation

The WARC files and selected log files of each crawl are stored and managed in the ETH Data Archive. The ETH Data Archive adheres to the OAIS model (Open Archival Information System) and uses the internationally accepted standards METS and PREMIS.
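Preservation metadata standards such as PREMIS can record fixity information, i.e. a checksum that makes it possible to detect whether a stored file has changed. The Python sketch below computes and verifies such a checksum for a byte stream. The choice of SHA-256, the function names and the sample bytes are assumptions. The description above does not state which checks the ETH Data Archive performs.

```python
import hashlib
import io


def stream_digest(stream, chunk_size=65536):
    """Compute the SHA-256 hex digest of a binary stream, reading it in chunks."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def verify_fixity(stream, expected_digest):
    """Return True if the stream still matches the recorded digest."""
    return stream_digest(stream) == expected_digest


if __name__ == "__main__":
    # Sample bytes standing in for the content of a WARC file.
    recorded = stream_digest(io.BytesIO(b"sample WARC content"))
    print("unchanged:", verify_fixity(io.BytesIO(b"sample WARC content"), recorded))
    print("modified: ", verify_fixity(io.BytesIO(b"sample WARC c0ntent"), recorded))
```

Reading in chunks keeps memory use flat, which matters because crawl output files can be large.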

ETH Zurich departments involved in web archiving

The ETH Zurich Web Archive is made possible thanks to the cooperation of various ETH departments. The ETH Zurich University Archives cooperates closely with Corporate Communications and decides which websites are archived. IT Services configures the crawler, harvests the websites and provides infrastructure for temporary storage. The University Archives performs quality control checks, re-crawls websites if necessary and catalogues all archived websites. The snapshots and metadata are archived in the ETH Data Archive.

Other web archives of ETH Zurich websites

The Internet Archive contains crawls of various ETH websites dating back to 1997. However, ETH Zurich has no control over when, how often and which parts of its web presence are archived there. Frequently, webpages belonging to the same website are archived at widely differing dates.
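One way to make this difference concrete is to measure the time span between the earliest and latest capture of the pages that make up a website. The Python sketch below does this for 14-digit capture timestamps (YYYYMMDDhhmmss), the form used in Wayback-style URLs. Both timestamp lists are invented for illustration and do not describe real captures.

```python
from datetime import datetime


def capture_spread(timestamps):
    """Return the time between the earliest and the latest capture."""
    times = [datetime.strptime(ts, "%Y%m%d%H%M%S") for ts in timestamps]
    return max(times) - min(times)


if __name__ == "__main__":
    # Hypothetical captures of three pages belonging to one website.
    single_crawl = ["20160301100000", "20160301100500", "20160301101200"]
    scattered = ["20140105120000", "20150720083000", "20160301100000"]
    print("single crawl:", capture_spread(single_crawl))
    print("scattered:   ", capture_spread(scattered))
```

A small spread indicates that the pages of a website were captured together, while a spread of months or years means the archived pages may never have coexisted in that form on the live site.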