Curatorial Rationale

Traditional curatorial processes (appraisal, acquisition, arrangement and description, housing and storage, reference and access, preservation) will not scale to the ever-increasing volume of government records and publications in the information era. PeDALS seeks to reengineer the curatorial workflow by articulating business rules that repositories can successfully implement in middleware. That process, based on the Open Archival Information System reference model, includes the following high-level activities.

Negotiate a schema for a submission information package (SIP) with each office of origin (OOO). Because the SIP will be specific to each record series and recordkeeping system and because it will require some work on the part of the OOO to implement, the SIP will be kept as simple as possible. In principle, it will consist of little more than the record itself and key metadata from the recordkeeping system. The OOO will write an export routine for the regular transfer of records from automated recordkeeping systems. (The project will also demonstrate an alternative means to develop SIPs for records that have been transferred on an irregular or one-time basis and for harvests of web publications.) The SIP will be an XML package that encapsulates each record and the metadata specific to that record. The project will test both simple and complex record sets.
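
As an illustration only, a SIP of the kind described above might be assembled as in the following Python sketch. The element names (sip, metadata, field, record) and the use of base64 to carry the record inside the XML are assumptions made for the example; the actual schema would be negotiated with each OOO.

```python
import base64
import xml.etree.ElementTree as ET


def build_sip(record_bytes, filename, metadata):
    """Wrap one record and its recordkeeping-system metadata in a SIP.

    All element and attribute names here are hypothetical placeholders.
    """
    sip = ET.Element("sip")
    meta = ET.SubElement(sip, "metadata")
    for name, value in sorted(metadata.items()):
        field = ET.SubElement(meta, "field", {"name": name})
        field.text = value
    # The record is base64-encoded so that any file format survives
    # being embedded in XML text.
    record = ET.SubElement(
        sip, "record", {"filename": filename, "encoding": "base64"}
    )
    record.text = base64.b64encode(record_bytes).decode("ascii")
    return ET.tostring(sip, encoding="unicode")


def read_record(sip_xml):
    """Recover the original record bytes from a SIP made by build_sip."""
    record = ET.fromstring(sip_xml).find("record")
    return base64.b64decode(record.text or "")
```

A repository could call read_record on each SIP it receives to recover the record exactly as the OOO exported it.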

Write rules to validate the records. The middleware must be programmed to generate a list of the records received, with a hash value for each. That list is sent back to the OOO's system to ensure that all the records were received and that no records were altered during transmission. The OOO's system verifies receipt or retransmits records until receipt is verified.
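
The receipt check described above can be sketched as follows. The choice of SHA-256 and the function names are assumptions for illustration; the project has not specified a hash algorithm here.

```python
import hashlib


def build_manifest(received):
    """Repository side: map each record ID to a hash of the bytes received."""
    return {
        record_id: hashlib.sha256(data).hexdigest()
        for record_id, data in received.items()
    }


def records_to_retransmit(sent, manifest):
    """OOO side: list record IDs that are missing from the manifest
    or whose hash does not match what was sent."""
    return sorted(
        record_id
        for record_id, data in sent.items()
        if manifest.get(record_id) != hashlib.sha256(data).hexdigest()
    )
```

The OOO's system would repeat the transfer for the returned identifiers until the list comes back empty.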

Write rules to create a standard archival information package (AIP). The AIP is a generic XML schema that will include the original record as submitted, all the OOO's original metadata, additional metadata generated by the repository, and digital signatures to support integrity checking and demonstration of authenticity. Library and Archives staff have already identified essential schema elements, and project staff will look specifically at research on XML schemas done by other states. The staff will also look at the METS and ARC formats, as well as the National Information Exchange Model and the Global Justice XML Data Model.

Write rules for accessioning and describing the records. These rules will create an entry for each record in an Accessions Register database. The database will include administrative metadata, such as date of acquisition. Repository staff will map metadata supplied by the OOO to the AIP schema and will also write rules to supply additional metadata. For example, knowing the specific record series allows one to assign default metadata for the records, including provenance, series title, and controlled vocabulary subject headings appropriate to that series. Rules can also supply standard, record-specific metadata, such as date of transfer and order, as well as parties to the record (represented as a name and role in the record, such as "Hancock, John | witness"), record date, and locale. Rules can use JHOVE or other resources to identify the record format, and they can capture other preservation information, such as file size and a hash value.
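
A description rule of this kind might look like the following sketch. The series identifier, the default values, the field names, and the use of SHA-256 are all hypothetical; format identification with JHOVE is left out because it would run as a separate tool.

```python
import hashlib

# Hypothetical series-level defaults, keyed by a made-up series identifier.
SERIES_DEFAULTS = {
    "board-minutes": {
        "provenance": "Example Board (placeholder)",
        "series_title": "Board Meeting Minutes",
        "subjects": ["Administrative agencies", "Meetings"],
    },
}


def parse_party(text):
    """Split 'Hancock, John | witness' into a name and a role.

    Raises ValueError if the '|' separator is missing.
    """
    name, role = (part.strip() for part in text.split("|", 1))
    return {"name": name, "role": role}


def describe(series_id, ooo_metadata, record_bytes):
    """Build one accession entry from series defaults, the OOO's
    metadata, and preservation information computed from the record."""
    entry = dict(SERIES_DEFAULTS[series_id])       # series-level defaults
    entry["subjects"] = list(entry["subjects"])    # do not share the default list
    entry.update(ooo_metadata)                     # record-specific values
    entry["parties"] = [parse_party(p) for p in ooo_metadata.get("parties", [])]
    entry["file_size"] = len(record_bytes)
    entry["sha256"] = hashlib.sha256(record_bytes).hexdigest()
    return entry
```

Because the defaults are keyed by record series, staff would write the descriptive values once per series rather than once per record.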

Write rules to ingest the AIP into the LOCKSS system, including rules to validate the ingest process.

Write rules to create a dissemination information package (DIP) from the AIP. The AIP is the official copy of the record, but the AIP schema may not be "user friendly." The DIP transforms the record to a format that can be easily viewed and that will protect the integrity of the record. (For example, records created and submitted in Word might be converted to PDF to make them difficult to alter.) Similarly, project staff may create a variety of DIPs targeted at different audiences. (For example, raw observational data may be provided in that form for those wanting to do analysis, but represented graphically for those without the tools or knowledge to manipulate the data.)

Write rules to publish DIPs to the web, including populating a database with appropriate metadata to support discovery. Given the sensitive and confidential nature of some of the information in the records, the system may need to restrict access to some records for a period of time to comply with state or federal law. Similarly, access to some records may be restricted to the OOO or the repository to protect individuals' privacy and to thwart identity theft; this limited access recreates practical obscurity.
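
One possible shape for such an access rule is sketched below. The field names (restricted_to, restricted_until) and the role labels are assumptions for the example; the actual rules would follow the applicable state or federal law.

```python
from datetime import date


def may_view(record, viewer_role, today):
    """Decide whether a viewer may see a published DIP.

    record["restricted_to"]    : set of roles allowed while restricted,
                                 absent or None if the record is open.
    record["restricted_until"] : date the restriction lapses,
                                 absent or None if it is indefinite.
    """
    allowed_roles = record.get("restricted_to")
    if allowed_roles is None:
        return True                      # no restriction recorded
    until = record.get("restricted_until")
    if until is not None and today >= until:
        return True                      # restriction period has lapsed
    return viewer_role in allowed_roles


# Hypothetical record limited to the OOO and the repository until 2030.
example = {
    "restricted_to": {"ooo", "repository"},
    "restricted_until": date(2030, 1, 1),
}
```

Under this sketch, a record with no lapse date stays limited to the named roles indefinitely, which corresponds to the privacy case described above.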

Monitor the processes. Under the reengineered curatorial process, archivists and librarians will seldom work with specific items. Rather, they will work with the processes described above to manage those items. Because the professionals will not see each item, they must constantly check that the processes are running correctly and adjust them as necessary.