ETH Zurich Web Archive
The ETH Zurich University Archives periodically collects ETH Zurich's most important websites (main site, portals for ETH members and students, special interest websites).
Which websites are archived?
The ETH Zurich Web Archive permanently stores selected parts of ETH Zurich's web presence and makes these websites available to the public. Every webpage belonging to a website is captured in a snapshot. Each snapshot reflects the state of the website on the date of archiving.
The Web Archive currently contains several collections of websites.
- Snapshots showing the state of the most important ETH websites of institutes, chairs and departments prior to ETH Zurich's web relaunch in 2016.
- Snapshots of several older ETH websites.
- Snapshots of virtual exhibitions of ETH Library since 1998.
The Web Archive offers a well-considered selection of websites that illustrate the development of ETH Zurich's web presence. To ensure the completeness of a website, all its webpages are archived at the same time. Technical and visual quality control ensures the high quality of the snapshots. The Web Archive guarantees the stability of references and quotations so that users can cite the archived websites as scientific sources.
The historic websites can be found through various portals, such as Hochschularchiv Online (Archival Information System of the ETH Zurich University Archives), the Knowledge Portal of the ETH Library and Archives Portal Europe. The search index covers each website's title, the date of archiving and other metadata, but not the websites' textual content.
Open the start page of the Hochschularchiv Online database. Enter a keyword in the search field.
It is best to combine your query with the word "website". A list of all archived websites is also available.
In the hit list, click on the entry you are interested in.
Click on "Link auf digitales Original" to access the ETH Data Archive.
If you agree with the copyright notice, click on "I accept the terms".
You will see an overview of the snapshots archived over the past few years. Click on the date you are interested in.
The snapshot is displayed in the Open Wayback Machine.
The banner at the top of the page serves to avoid confusion with the current ETH Zurich website.
The ETH Zurich Web Archive uses the remote harvesting method for web archiving, with Heritrix as the web crawler. Starting from a start URL, the crawler collects all content linked from it, which produces a snapshot of every webpage in the website. The crawler generates files in WARC format as well as log files documenting the crawler's settings.
A prerequisite for archiving in the Web Archive is that a website’s owner is part of ETH Zurich. Only publicly accessible websites, for which no login is required, are archived. Content published on a website, such as PDFs or presentation slides, is stored in the WARC files. Embedded content from external services (e.g. YouTube videos or Google Maps) is not archived. Instead, a placeholder "Resource not in archive" is displayed.
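The harvesting idea described above can be illustrated with a toy example. This is a minimal sketch, not the Heritrix configuration used by the Web Archive: the `PAGES` dictionary, the host `www.example.org` and the `in_scope` rule (same host as the start URL) are assumptions made for illustration only. A real crawler fetches pages over HTTP and writes WARC files.

```python
from collections import deque
from urllib.parse import urljoin, urlsplit

# Hypothetical in-memory "website": page URL -> links found on that page.
PAGES = {
    "http://www.example.org/": [
        "about.html",
        "docs/report.pdf",
        "https://www.youtube.com/embed/xyz",  # embedded external content
    ],
    "http://www.example.org/about.html": ["/", "team.html"],
    "http://www.example.org/team.html": [],
    "http://www.example.org/docs/report.pdf": [],
}


def in_scope(url, start_url):
    """Assumed scope rule: keep only URLs on the same host as the start URL."""
    return urlsplit(url).hostname == urlsplit(start_url).hostname


def crawl(start_url, pages):
    """Breadth-first traversal of everything reachable from start_url."""
    captured, skipped = [], []
    queue = deque([start_url])
    seen = {start_url}
    while queue:
        url = queue.popleft()
        captured.append(url)
        for href in pages.get(url, []):
            target = urljoin(url, href)  # resolve relative links
            if target in seen:
                continue
            seen.add(target)
            if in_scope(target, start_url):
                queue.append(target)
            else:
                skipped.append(target)  # would show as "Resource not in archive"
    return captured, skipped


captured, skipped = crawl("http://www.example.org/", PAGES)
print(captured)
print(skipped)
```

In this sketch the linked PDF ends up in the captured list, while the embedded external video lands in the skipped list, mirroring the behaviour described above.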
The websites are displayed in the Open Wayback Machine viewer. The archived version may look slightly different from the original. Dynamic content is particularly difficult to archive. The quality control process ensures that the core content of a website is archived.
How to cite websites
Every snapshot, i.e. each version of a website archived at a particular time, is assigned a Digital Object Identifier (DOI). Users can therefore cite these snapshots as sources in their scientific publications.
Please cite as follows:
ETH Zurich University Archives, [call number], [website title], [date of archiving], [DOI]
ETH Zurich University Archives, EZ-INF1.1/7, Website of: Aquatic Ecology, original URL http://www.ae.ethz.ch, 01/10/2017, DOI: 10.7893/ethz-hsa-web-7
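For anyone who manages many references, the citation pattern can be assembled programmatically. The following is a minimal sketch: the `Snapshot` class and the helper names are hypothetical and not part of any ETH tool. The `doi_url` helper assumes the general `https://doi.org/` resolver for DOIs.

```python
from dataclasses import dataclass
from urllib.parse import quote


@dataclass(frozen=True)
class Snapshot:
    """Hypothetical container for the fields used in the citation pattern."""
    call_number: str
    title: str
    archived_on: str  # date exactly as it should appear in the citation
    doi: str


def format_citation(snapshot):
    """Join the fields in the order given by the citation pattern."""
    return (
        f"ETH Zurich University Archives, {snapshot.call_number}, "
        f"{snapshot.title}, {snapshot.archived_on}, DOI: {snapshot.doi}"
    )


def doi_url(doi):
    """Build a resolver link for a DOI, keeping the slash unescaped."""
    return "https://doi.org/" + quote(doi, safe="/")


example = Snapshot(
    call_number="EZ-INF1.1/7",
    title="Website of: Aquatic Ecology, original URL http://www.ae.ethz.ch",
    archived_on="01/10/2017",
    doi="10.7893/ethz-hsa-web-7",
)
print(format_citation(example))
print(doi_url(example.doi))
```

The first line printed reproduces the example citation above; the second is a link that can accompany the citation.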
How can I register an ETH Zurich website for archiving?
Would you like to save your ETH website in the ETH Zurich Web Archive? To register, please email firstname.lastname@example.org
The WARC files and selected log files of each crawl are stored and managed in the ETH Data Archive. The ETH Data Archive adheres to the OAIS model (Open Archival Information System) and uses the internationally accepted standards METS and PREMIS.
ETH Zurich departments involved in web archiving
The ETH Zurich Web Archive is made possible by the cooperation of several ETH departments. The ETH Zurich University Archives works closely with Corporate Communications and decides which websites are archived. IT Services configures the crawler, harvests the websites and provides infrastructure for temporary storage. The University Archives performs quality control checks, re-crawls websites if necessary and catalogues all archived websites. The snapshots and metadata are archived in the ETH Data Archive.
Other web archives of ETH Zurich websites
The Internet Archive contains crawls of various ETH websites dating back to 1997, but ETH Zurich has no control over when, how often or which parts of its web presence are archived there. Frequently, webpages belonging to the same website are archived at widely differing dates.
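To get a feeling for how far apart such captures can lie, the capture times of individual pages can be compared. The sketch below assumes the 14-digit timestamp format (YYYYMMDDhhmmss) used in Wayback-style capture URLs. The page paths and timestamps are invented for illustration and are not real capture data.

```python
from datetime import datetime


def parse_capture(timestamp):
    """Parse a 14-digit Wayback-style timestamp (YYYYMMDDhhmmss)."""
    return datetime.strptime(timestamp, "%Y%m%d%H%M%S")


def capture_spread_days(timestamps):
    """Whole days between the earliest and the latest capture."""
    times = [parse_capture(t) for t in timestamps]
    return (max(times) - min(times)).days


# Hypothetical captures of three pages belonging to one website.
captures = {
    "/": "20150301120000",
    "/about.html": "20150915083000",
    "/team.html": "20140110000000",
}
print(capture_spread_days(captures.values()))
```

A spread of zero days would correspond to a snapshot in which all pages were captured together, while a large value indicates pages captured at widely differing dates.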