Out with the Old

One of the major challenges we face at HSP with the Digital Center for Americana (DCA) project is how to deal with pesky legacy data. Getting information online to improve access is great, but it takes a lot of effort to select, customize, and design systems so that they function together, integrate data from older systems (legacy data), and then provide the easy online access we have all come to expect.

One such system we are trying to port over, hopefully familiar to everyone over the age of 25, is HSP's card catalog. Consisting of over one million cards, it is too big to tackle in its entirety for this project. Instead, we are charged with porting over 17,000 records relating to graphics items for the DCA and then another ~40,000 records as part of a separate project.

These card marking assistants are helping weed out duplicates for the retrospective conversion of HSP's graphics cards

This card didn't survive the selection process

There are many separate issues when it comes to converting these paper cards to electronic records, the first being data integrity. Some of the cards we are dealing with are over 100 years old, and many of them have not been properly updated. As time goes on, items change location on the shelves, or are moved to entirely different collections or institutions. It is not uncommon for the card pointing to the physical item to be forgotten when such a shift is made.

Additionally, finding information works differently in a card catalog than in an electronic database. In most database systems you can simply run a keyword search to find a record based on a specific morsel of information. To achieve the same task with a card catalog, each item needs separate subject, creator, title, geographic, and publisher cards, to name just a few. This is why our graphics card catalog, known as PC4, is bloated at over 95,000 cards for roughly 50,000 unique records. To ensure a speedy turnaround by our conversion vendor, MARCIVE, a small army of volunteers and assistants carefully checks each card in PC4 for duplicates and obvious inaccuracies, marking duplicate cards with a big X in highlighter. We expect this process to take roughly 1,200 hours of labor.
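To make the duplicate check concrete, here is a minimal Python sketch of the idea behind it. It is not how our volunteers work (they use highlighters, not scripts), and everything in it is an assumption for illustration: the `Card` fields, the choice of title plus call number as the thing that identifies an item, and the sample data are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Card:
    """One physical catalog card (hypothetical fields, for illustration only)."""
    heading_type: str  # which access point the card files under: "title", "creator", ...
    heading: str       # the filing heading at the top of the card
    title: str         # title of the item the card describes
    call_number: str   # shelf location of the item


def record_key(card):
    """Identify the described item; assumes title + call number is enough."""
    normalized_title = " ".join(card.title.lower().split())
    return (normalized_title, card.call_number.strip().upper())


def mark_duplicates(cards):
    """Keep the first card seen for each item and mark the rest with an 'X'."""
    seen = set()
    keep, marked = [], []
    for card in cards:
        key = record_key(card)
        if key in seen:
            marked.append(card)  # another access point for an item we already have
        else:
            seen.add(key)
            keep.append(card)
    return keep, marked


# Made-up sample: three cards for one print, one card for another.
sample = [
    Card("title", "View of the harbor", "View of the harbor", "Bc 12 H255"),
    Card("creator", "Doe, Jane", "View of  the Harbor", "bc 12 h255"),
    Card("subject", "Harbors", "View of the harbor", "Bc 12 H255 "),
    Card("title", "Market day", "Market day", "Bc 47 M345"),
]
keep, marked = mark_duplicates(sample)
```

In this sketch the three harbor cards collapse to a single kept card, which mirrors how several access-point cards in PC4 stand behind one unique record.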

There is much back and forth between us and our vendor over the card conversion. It's not as clear as one would think where information from these cards should fit into MARC fields.

Once we have the duplicates removed, we have to send the cards off for conversion to MARC records. The MARC format has been around for the better part of 50 years in the library world, but it is not a standard used by most archives. We are using MARC because it is a format our vendor understands, and because it can serve as a sort of Rosetta Stone between the four systems (Archivists’ Toolkit, Collective Access, Voyager OPAC, and VuFind) that are being implemented or tweaked as part of the DCA. For systems that do not already use MARC, such as Collective Access or the card catalog itself, we have to develop field maps to make certain the data goes where it needs to.
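A field map is easier to picture with a small example. The Python sketch below is only an illustration, not the map we or MARCIVE actually use: the card field names are hypothetical, and the targets are common MARC 21 bibliographic fields (100 for a personal-name creator, 245 for title, 260 subfield b for publisher, 650 for topical subjects, 651 for geographic subjects). A real map would cover many more fields and handle indicators and punctuation.

```python
# Hypothetical card field name -> (MARC tag, subfield code).
FIELD_MAP = {
    "creator": ("100", "a"),
    "title": ("245", "a"),
    "publisher": ("260", "b"),
    "subject": ("650", "a"),
    "geographic": ("651", "a"),
}


def card_to_marc_fields(card):
    """Translate a card (a dict of field name -> value or list of values)
    into (tag, subfield, value) triples, plus whatever could not be mapped."""
    fields = []
    unmapped = {}
    for name, value in card.items():
        if name not in FIELD_MAP:
            unmapped[name] = value  # set aside for a person to review
            continue
        tag, code = FIELD_MAP[name]
        values = value if isinstance(value, list) else [value]
        for item in values:
            fields.append((tag, code, item))
    fields.sort(key=lambda field: field[0])  # MARC records list fields in tag order
    return fields, unmapped


# Made-up card data for illustration.
card = {
    "title": "View of the harbor",
    "creator": "Doe, Jane",
    "subject": ["Harbors", "Ships"],
    "geographic": "Philadelphia (Pa.)",
    "pencil_note": "moved 1952?",
}
fields, unmapped = card_to_marc_fields(card)
```

Collecting unmapped values separately, rather than dropping them, reflects the back and forth described here: anything without an obvious home in MARC needs a decision from a person.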

A MARC record for one of the thousands of converted cards

All in all, it takes a lot of work to move data from one form of technology to another. When it’s all finished, though, the greatly improved manageability of and access to HSP records, and by extension HSP’s collections, will make the effort worth it.

A record displayed by Collective Access from philaplace.org, the same software we will be using for HSP's DAMS