
I’ve updated the MARC records for all our photographs in our Horizon catalog, and we’ve exported fresh copies. With only a few exceptions, the records are ready to run through a script that will dump most of the data into tabular form for our CONTENTdm import.
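The kind of MARC-to-table script described above can be sketched roughly as follows. This is a minimal illustration, not the script actually used here. It assumes the records were exported in MarcEdit’s line-based mnemonic (.mrk) format, and it writes tab-delimited text, which is a format the CONTENTdm Project Client can import. The crosswalk (which tags and subfields feed which columns), the field names `Title`, `Date`, `Description`, `Identifier`, and the use of 099 for a local photo number are all hypothetical and would need to match the real collection setup.

```python
import csv
import io

# Hypothetical crosswalk from (MARC tag, subfield code) to CONTENTdm field
# names; the real names depend on how the collection's fields are set up.
CROSSWALK = {
    ("099", "a"): "Identifier",
    ("245", "a"): "Title",
    ("260", "c"): "Date",
    ("520", "a"): "Description",
}
FIELDS = ["Title", "Date", "Description", "Identifier"]


def parse_mnemonic(text):
    """Parse MarcEdit-style mnemonic (.mrk) text into records.

    Records are separated by blank lines. A data field line looks like
    '=245  10$aTitle' (tag, two spaces, two indicators, '$'-prefixed
    subfields). The leader and control fields (001-009) are skipped.
    Each record becomes a list of (tag, subfield_code, value) tuples.
    """
    records = []
    for block in text.strip().split("\n\n"):
        fields = []
        for line in block.splitlines():
            if not line.startswith("=") or len(line) < 8:
                continue
            tag, body = line[1:4], line[6:]
            if tag == "LDR" or tag < "010":
                continue
            # body[:2] holds the indicators; the rest is split on '$'
            for chunk in body[2:].split("$")[1:]:
                if chunk:
                    fields.append((tag, chunk[0], chunk[1:]))
        records.append(fields)
    return records


def to_tab_delimited(records):
    """Return tab-delimited text: a header row, then one row per record."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(FIELDS)
    for fields in records:
        values = {name: [] for name in FIELDS}
        for tag, code, value in fields:
            name = CROSSWALK.get((tag, code))
            if name:
                values[name].append(value.strip())
        # Repeated values share one cell, separated by semicolons
        writer.writerow(["; ".join(values[name]) for name in FIELDS])
    return out.getvalue()


# Hypothetical sample record in mnemonic form ('\\' = blank indicators)
SAMPLE = r"""=LDR  00000nkm  2200000 i 4500
=001  12345
=099  \\$aPhoto 1234
=245  00$aRemains of fire on East Washington Street, Petaluma, Calif.
=260  \\$c1978.
"""

if __name__ == "__main__":
    print(to_tab_delimited(parse_mnemonic(SAMPLE)), end="")
```

Splitting on `$` is safe in mnemonic files because MarcEdit encodes a literal dollar sign in the data as `{dollar}`. Joining repeated values with semicolons follows the convention CONTENTdm uses for multiple terms in a field, but that is worth confirming against the collection’s own field settings.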

Remains of fire on East Washington Street, Petaluma, Calif., 1978 (photo courtesy of the Sonoma County Library)

The entire Cataloging crew worked diligently to load all the archival TIFF copies of our photos to Demeter, our archival server, and now I’ve regularized the file naming scheme (preserving the original name in the image metadata along the way). I’ve extracted the image metadata and will pair that up with the record metadata to see where we have discrepancies or missing image files (while we’ve scanned nearly all our photos, we keep discovering the odd one here or there that we’ve missed). Once that task is done, I can upload the full metadata sets into the CONTENTdm client, which will create JPEG2000 images from the TIFF masters. Given the number of images, that process will undoubtedly take some time.
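Pairing the extracted image metadata with the record metadata is essentially a set comparison on a shared identifier. The sketch below shows one way to do it, assuming (hypothetically) that both metadata sets have been exported to CSV with an `Identifier` column, and that identifiers look something like `SCL-0001`. The `normalize` rule is also an assumption; it only smooths over case, spaces, and hyphens so that trivially different spellings still match.

```python
import csv
import io


def normalize(identifier):
    """Hypothetical normalization so 'SCL-0001' and 'scl 0001' compare equal."""
    return identifier.strip().lower().replace(" ", "").replace("-", "")


def load_ids(csv_text, column):
    """Collect the normalized, non-empty identifiers from one CSV column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {normalize(row[column]) for row in reader if row[column].strip()}


def reconcile(record_ids, image_ids):
    """Compare the two identifier sets and report mismatches both ways."""
    return {
        "records_without_images": sorted(record_ids - image_ids),
        "images_without_records": sorted(image_ids - record_ids),
        "matched": len(record_ids & image_ids),
    }


# Hypothetical exports: catalog records and extracted image metadata
RECORDS_CSV = "Identifier,Title\nSCL-0001,Fire\nSCL-0002,Parade\nSCL-0003,Depot\n"
IMAGES_CSV = (
    "FileName,Identifier\n"
    "scl0001.tif,scl 0001\nscl0003.tif,SCL-0003\nscl0009.tif,SCL-0009\n"
)

if __name__ == "__main__":
    report = reconcile(load_ids(RECORDS_CSV, "Identifier"),
                       load_ids(IMAGES_CSV, "Identifier"))
    for key, value in report.items():
        print(key, value)
```

With the sample data, `SCL-0002` shows up as a record with no image (a photo still to be scanned), and `SCL-0009` as an image with no record. Those are the two kinds of discrepancy the paragraph above describes.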

Collections and Image Quality

I’ve divided the photographs into collections based on a variety of factors, including type of material (maps, newspaper excerpts, general photographs), theme (wine industry, Sonoma County Fair), and origin (Sonoma Depot Museum, Healdsburg Museum). The origin-based groups are an interesting case because many years ago we borrowed photos from a number of institutions and photocopied them for our collection. I don’t know whether that decision was due to budget constraints or to restrictions from the lending institutions, but the quality of the photocopies is mediocre. Unfortunately, we digitized quite a few of these photocopies, but we are now talking about working with the lending institutions to digitize the original prints. In the meantime, I’ve updated the metadata and wrestled with the decision of whether to display the digitized photocopies in CONTENTdm before we replace them. They wouldn’t show off our collection in the best light, but at the same time, they provide at least some level of access to the images.

We also have some 7,500 images that volunteers scanned at the start of SCL’s digitization efforts in the late 1990s. Starting in 2005, we contracted out digitization to Backstage Library Works and have full sets of TIFFs and JPEGs on disc for all images scanned since then, but all the early scans are low-resolution JPEGs. Only recently did I find the masters for those early images on one of our workstations, after thinking we had no masters available. I’m probably going to display those images as well, but we will gradually replace them with higher-quality scans as they are requested, since that group consists of photos especially selected for historical interest.

Next Steps

I hope to have a sample of 100-200 records adapted from existing MARC records and another 200 new records up on our test server by the end of the first week of January, if not sooner. This will allow us to troubleshoot the workflow and should uncover any issues with the metadata (I’m crossing my fingers there will be none). Though the timeline has slipped considerably because I converted existing records before starting wholesale creation of new records, we’re very close to having a visible face for our project.

I’ve started to load TIFF and other archival files to Demeter, our new image server. With several hundred discs to load, this project will take a while, even if each disc loads fairly quickly. To some extent, I can load in the background while I work on other things, but changing discs interrupts any workflow.

I’m loading the images into a temporary folder since I need to check them over before moving them to their permanent location. I will probably work through the process in batches rather than try to deal with all 30,000 image files at once. The next fun step will be managing the duplicate file numbers/names that have crept into the collection. My preference would be to pull and renumber the physical photos, but that may not be practical, depending on how many duplicate numbers there are. Once loading is complete and the duplicate numbers are dealt with, I’ll rename the files according to a uniform naming scheme (the file names on disc follow a variety of conventions).
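Finding duplicate numbers and planning the renaming can be handled in one pass: pull the photo number out of each file name, group files by number, and propose a uniform name only where a number is used once. The sketch below assumes, hypothetically, that the photo number is the first run of digits in each name (so `Photo 1234.TIF`, `p1234.tif`, and `1234_master.tiff` all map to 1234), and it uses a made-up target pattern `scl_000042.tif`. Names that break the first-digits assumption, such as something like `IMG2_1234.tif`, would need their own rule.

```python
import re
from collections import defaultdict

# Hypothetical rule: the photo number is the first run of digits in the name
NUMBER = re.compile(r"(\d+)")


def photo_number(filename):
    """Return the photo number embedded in a file name, or None."""
    match = NUMBER.search(filename)
    return int(match.group(1)) if match else None


def plan_renames(filenames, pattern="scl_{:06d}.tif"):
    """Group files by photo number and propose uniform names.

    Returns (renames, duplicates, unparsed):
      renames: {original_name: new_name} for numbers used exactly once
      duplicates: {number: [names]} that need a human decision
      unparsed: names with no digits at all
    """
    by_number = defaultdict(list)
    unparsed = []
    for name in filenames:
        number = photo_number(name)
        if number is None:
            unparsed.append(name)
        else:
            by_number[number].append(name)
    renames, duplicates = {}, {}
    for number, names in sorted(by_number.items()):
        if len(names) == 1:
            renames[names[0]] = pattern.format(number)
        else:
            duplicates[number] = sorted(names)
    return renames, duplicates, unparsed


if __name__ == "__main__":
    files = ["Photo 1234.TIF", "p0042.tif", "1234_master.tiff", "cover.tif"]
    renames, duplicates, unparsed = plan_renames(files)
    print(renames)
    print(duplicates)
    print(unparsed)
```

The function only builds a plan and does not touch any files. Keeping the `renames` mapping as a log preserves each original file name; writing that name into the image metadata itself would take a separate tool such as ExifTool.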

Conversion of MARC photo records for CONTENTdm will wait until the photos are all loaded since I need the TIFFs in place to create the JPEG2000 images.