Abstract

Theses and dissertations published at a university are important research resources. Electronic Theses and Dissertations (ETDs) are theses and dissertations published in electronic form (e.g., as PDF files). Many universities now require that theses and dissertations be submitted in electronic form, which makes these works easier for others to access. ETDs are typically archived on a server at the university that published them. We have developed a mirroring system that stores additional copies of remote ETDs, thereby preserving them and enhancing access to them. The local archive of ETDs is updated regularly. If the university (the publisher) someday fails to provide access to one of its ETDs, or if a copy of an ETD is corrupted, users will still have access to another copy of that ETD. This system will be used for the NDLTD (Networked Digital Library of Theses and Dissertations).
The NDLTD is an initiative to encourage student authors to create ETDs and to make ETDs easily accessible to students via the World Wide Web, thus improving graduate education. The NDLTD currently has over 150 members. Users can browse or search ETDs through the NDLTD website, which also provides a union catalog for searching ETDs.
The Open Archives Initiative (OAI) is dedicated to solving problems of digital library interoperability. OAI has developed a metadata harvesting protocol to support streaming of metadata from one repository to another, ultimately to a provider of user services such as browsing, searching, or annotation. An OAI harvester implements the OAI protocol for metadata harvesting.
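
As an illustration of what harvested metadata looks like, the following minimal sketch parses a ListRecords response and extracts the fields a mirror would need. It assumes OAI-PMH version 2.0 conventions (XML namespaces and the oai_dc Dublin Core format), which may differ from the protocol version used in this work. The repository host, record identifier, and sample response are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespaces as defined by OAI-PMH 2.0 and Dublin Core (an assumption about
# the protocol version; earlier OAI versions used different namespaces).
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hypothetical response from a hypothetical repository, for illustration only.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <responseDate>2002-01-01T00:00:00Z</responseDate>
  <request verb="ListRecords">http://etd.example.edu/oai</request>
  <ListRecords>
    <record>
      <header>
        <identifier>oai:etd.example.edu:etd-0001</identifier>
        <datestamp>2001-12-01</datestamp>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>A Hypothetical Thesis</dc:title>
          <dc:identifier>http://etd.example.edu/theses/etd-0001/</dc:identifier>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>
"""


def parse_records(xml_text):
    """Return one dict per harvested record: identifier, datestamp, title, url."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        header = rec.find("oai:header", NS)
        records.append({
            "identifier": header.findtext("oai:identifier", namespaces=NS),
            "datestamp": header.findtext("oai:datestamp", namespaces=NS),
            "title": rec.findtext(".//dc:title", namespaces=NS),
            # dc:identifier is assumed here to carry the URL of the ETD itself.
            "url": rec.findtext(".//dc:identifier", namespaces=NS),
        })
    return records


if __name__ == "__main__":
    for record in parse_records(SAMPLE_RESPONSE):
        print(record["identifier"], record["url"])
```

In this sketch the URL in each record is what a downstream service, such as a crawler, would use to locate the document that the metadata describes.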
We use an OAI harvester to harvest metadata about ETDs, and then a simple web crawler retrieves the actual documents and stores them on a local machine. This ensures that we have a local copy of the data even if its publisher is unable to provide it. On each run, our OAI harvester harvests only the metadata that has not been harvested since its previous run, so updating the mirror site is easily accomplished. This scheme is effective and can be used to mirror any collection of data, provided the collection has an associated OAI server.
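
The incremental update described above can be sketched as follows. This is a minimal illustration, not the system's actual implementation: it assumes the OAI-PMH 2.0 `from` argument for selective harvesting, and the function names, base URL, and mirror directory layout are hypothetical. The first function builds the request that asks a repository only for records changed since the last harvest; the second decides where the crawler would store a retrieved document.

```python
from pathlib import PurePosixPath
from urllib.parse import urlencode, urlparse


def build_harvest_url(base_url, last_harvest=None, metadata_prefix="oai_dc"):
    """Build a ListRecords request URL.

    If last_harvest (a date string such as "2001-12-01") is given, it is sent
    as the `from` argument so that only newer or changed records are returned.
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if last_harvest is not None:
        params["from"] = last_harvest
    return base_url + "?" + urlencode(params)


def local_mirror_path(mirror_root, etd_url):
    """Map a remote ETD URL to a path under the local mirror directory.

    The layout (mirror root / host name / remote path) is an assumed
    convention chosen for this sketch.
    """
    parsed = urlparse(etd_url)
    relative = parsed.path.lstrip("/") or "index.html"
    return str(PurePosixPath(mirror_root) / parsed.netloc / relative)


if __name__ == "__main__":
    # First run: no previous harvest date, so everything is requested.
    print(build_harvest_url("http://etd.example.edu/oai"))
    # Later runs: request only records added or changed since the last run.
    print(build_harvest_url("http://etd.example.edu/oai", "2001-12-01"))
    print(local_mirror_path(
        "/mirror", "http://etd.example.edu/theses/etd-0001/thesis.pdf"))
```

A full mirror would record the date of each successful harvest and pass it as `last_harvest` on the next run, then fetch each newly listed document to its local path.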