For about three months, I’ve been assembling a Zotero bibliography of articles published in The American Archivist (AA). I realize this may seem like a lot of work when there’s already a searchable index available on the publication website, but it seemed like a worthwhile use of my time for several reasons.

The metadata that powers the search function on the AA site is flawed. I believe it was generated via OCR by the vendor who set up the portal. I’ve found places where characters were misread (e.g., serf instead of self) or were dropped altogether.

Whatever mechanism was used to sort the tables of contents into sections is inconsistent. When I was reading and reviewing all of the annual addresses by Society of American Archivists (SAA) presidents, I realized that not all of the speeches are designated as presidential addresses. And a few things that are not, in fact, presidential addresses are labeled as such. For example, the greeting an incoming president gives during an annual meeting is occasionally (but not always) labeled as a presidential address.

I’m interested in getting a retrospective look at how certain topics have been covered in the literature of SAA, and it seems efficient for me to discover and tag these articles all in one place. This way, I can easily generate a bibliography of all articles relevant to a particular topic:

Spoiler alert — there hasn’t been much written in AA about correspondence!

I’ve enjoyed using Zotero. It has a fairly intuitive interface, and there are useful browser plug-ins along with a standalone version of the software. Prior to this project, my greatest exposure to Zotero was through overseeing the importing and updating of the bibliography for the SAA Records Management Section.

For this project, I imported all the AA articles labeled:

Articles

Perspective

Presidential Address

Research Articles

Depending on whether they piqued my research interests, I also included some items from the sections labeled Additional Matter and Shorter Features. “International Scene” began as a subsection of the Technical Notes and later became its own subsection in the table of contents; these articles were also chosen based on their applicability to my research interests. I included none of the book reviews, abstracts, council minutes, obituaries, or front matter.

The reason I had to be somewhat choosy is that this was a rather labor-intensive project. The RIS file that I exported from the AA site into Zotero did not list the issue dates the way they appear in the publication itself, so I changed all of those (so that my works cited lists will be accurate). For example, AA was originally published four times a year; the April 1938 issue exported with a date of April 1, 1938. After AA switched to semiannual publication, the dates got especially problematic. I went with Spring/Summer 2011 as the publication date, although these exported with a specific date (e.g., January 1, 2011).
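For anyone who would rather script this cleanup than retype each date, here is a minimal Python sketch of the relabeling described above. The function name and the lookup table are hypothetical. Only the January-to-Spring/Summer pairing comes from my own export; the July-to-Fall/Winter pairing is an assumption that would need checking against the actual RIS file.

```python
from datetime import date

MONTH_NAMES = (
    "January", "February", "March", "April", "May", "June", "July",
    "August", "September", "October", "November", "December",
)

# Assumption: January -> Spring/Summer matches the example in the post;
# July -> Fall/Winter is a guess and should be verified against the export.
SEMIANNUAL_LABELS = {1: "Spring/Summer", 7: "Fall/Winter"}


def issue_label(exported: date, semiannual: bool) -> str:
    """Return the issue date as printed on the journal, not a calendar day."""
    if semiannual:
        season = SEMIANNUAL_LABELS.get(exported.month)
        if season is None:
            raise ValueError(f"no season label known for month {exported.month}")
        return f"{season} {exported.year}"
    # Quarterly issues are cited by month and year only.
    return f"{MONTH_NAMES[exported.month - 1]} {exported.year}"


print(issue_label(date(1938, 4, 1), semiannual=False))  # April 1938
print(issue_label(date(2011, 1, 1), semiannual=True))   # Spring/Summer 2011
```

Raising an error for an unknown month is deliberate: a date that does not fit the expected pattern is better reviewed by hand than silently relabeled.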

The other labor-intensive thing I did was to track down as many published bibliographies as I could find. From 1943 to 1980, the National Archives generated an annual bibliography of publications related to archives, manuscripts, and records. Usually, instead of being listed in the table of contents on the AA website, these bibliographies were buried within the Technical Notes PDF. Although I cannot crop the PDF, I did indicate in my Zotero entry the precise page numbers of the bibliography within the broader PDF. (Unfortunately, there are a handful of these annual bibliographies that I still couldn’t track down: 1958-59, 1963, 1965, 1967, 1968, 1970, 1974, 1979-82.)

If you look at my Zotero bibliography (https://www.zotero.org/cbaileymsls/items/collectionKey/8DZEFJWX), you’ll find the tags I’ve used to identify topics of interest to me. In both the browser client and the desktop client, you can click on one of my tags to find the related articles, and you can sort these entries alphabetically by title or author or chronologically by date of publication (although Zotero has a hard time understanding that the Fall/Winter issue was published after the Spring/Summer issue). This can be useful for identifying when one particular author has published multiple articles or for analyzing the frequency of publication on a particular topic.
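If you export the entries and want them in true publication order, one workaround is to sort outside Zotero with a custom key. The Python sketch below is only an illustration: the function name is hypothetical, and pinning Spring/Summer to June and Fall/Winter to December is my own assumption about where those issues fall in the year.

```python
MONTHS = [
    "January", "February", "March", "April", "May", "June", "July",
    "August", "September", "October", "November", "December",
]

# Assumption: months sort in calendar order, and the two semiannual labels
# are pinned to June and December so Spring/Summer precedes Fall/Winter.
POSITION = {name: number for number, name in enumerate(MONTHS, start=1)}
POSITION.update({"Spring/Summer": 6, "Fall/Winter": 12})


def issue_sort_key(label: str) -> tuple[int, int]:
    """Turn 'Fall/Winter 2011' or 'April 1938' into (year, position in year)."""
    part, year = label.rsplit(" ", 1)
    return int(year), POSITION[part]


issues = ["Fall/Winter 2011", "Spring/Summer 2011", "April 1938"]
print(sorted(issues))                      # plain text order puts Fall first
print(sorted(issues, key=issue_sort_key))  # chronological order
```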

I also tried to make two improvements to author names. Suffixes like Jr. or III didn’t come through in the RIS export, so I added them by hand. And if an author was listed with a first initial and a middle name, the metadata captured only the initial, so I went back and added the middle name for clarity.

I hope in some way this bibliography might also be of use to other folks. Perhaps it can help you find something for which you’ve been searching or challenge you to help fill a void that exists in the archival literature.

I had the opportunity to hear Cliff Missen speak this week about his work with the eGranary Digital Library. In the developing world, Internet access can cost 100 times what it costs in the United States, and even then, the access times are usually quite slow. eGranary addresses this problem by providing access to a vast number of educational resources cached within a local area network. Missen refers to this as storing the “seeds of knowledge.”

For $2,000, subscribers to eGranary can purchase a 4TB hard drive that contains 30 million Internet resources, all of which are full-text searchable. About 60,000 of these resources have also been cataloged and can be searched via their metadata. There are also dozens of portals that have been developed to harness the wealth of resources and facilitate the discovery of information. The number one topic searched is health information. For all documents not in the public domain, the WiderNet Project has secured appropriate copyright permissions.

The collection includes more than 60,000 books, hundreds of full-text journals, and dozens of software applications. It includes not only text documents but also video, audio, and images. Although the Internet resources provided cannot be updated, subscribers can upload local materials as well as create and edit unlimited Web sites within the eGranary system. (20GB of storage is allocated for this purpose.)

To this point, eGranary has been installed in more than 700 schools, hospitals, clinics, and universities in Africa, Asia, and the Caribbean along with a number of prisons within the United States. Cliff Missen and his colleagues at the WiderNet Project deserve credit for trying to bridge the information divide between the developing and developed world. At the end of the day, access to information is more of an economic issue than it is a technological one, and it behooves those of us with access to both wealth and information to develop sound methods of broadening accessibility.

A quick Google search confirms for me that I am not the only one who despises the way the iPhone numbers pictures and videos. I am in the habit of copying from my brother’s computer the pictures that he takes of my nephews, both to procure a source of images that can be used in scrapbooks and to serve as a form of off-site backup. In the years that I’ve been copying these pictures, the iPhone’s file naming procedure has caused me great consternation. I do not own an iPhone myself, but as best I can tell from personal experience plus online research, after pictures are downloaded to a computer, the phone restarts its numbering of new pictures at 001. The result is that when I copy pictures from several folders – each one named for the date on which the pictures were downloaded – I wind up with multiple different images with the same file name. I even sometimes wind up with two copies of the same picture with different file names; apparently, if a picture is left on the iPhone after a download, that picture’s number is also reset. Needless to say, if I have a photo with the number 009 printed on the back, determining when that photo was taken is not clear cut, because a search for that file name may produce three or more images with the same name.
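Both problems can at least be detected automatically. The Python sketch below walks a folder tree, compares files by a SHA-256 hash of their contents, and reports (1) file names reused for different images and (2) identical images stored under different names. The function name and the example file names in the comments are hypothetical, and it reads each file fully into memory, which is fine for photos but not for long videos.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def audit_photos(root: Path):
    """Report reused file names and renumbered copies under root."""
    digest_by_path = {
        path: hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }

    paths_by_name = defaultdict(list)
    paths_by_digest = defaultdict(list)
    for path, digest in digest_by_path.items():
        paths_by_name[path.name.lower()].append(path)
        paths_by_digest[digest].append(path)

    # Same name, different content: distinct pictures sharing a number.
    name_clashes = {
        name: paths
        for name, paths in paths_by_name.items()
        if len({digest_by_path[p] for p in paths}) > 1
    }
    # Same content, different names: one picture that was renumbered.
    renamed_copies = {
        digest: paths
        for digest, paths in paths_by_digest.items()
        if len({p.name.lower() for p in paths}) > 1
    }
    return name_clashes, renamed_copies


# Usage (hypothetical folder name):
# clashes, copies = audit_photos(Path("nephew_pictures"))
```

A report like this does not say when a photo was taken, but it narrows a search for "009" down to the specific folders that need a closer look.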

In addition to generating personal frustration, these interactions with iPhone pictures provide me with a reason to reflect on why metadata matters. Earlier this year on the ACRL TechConnect Blog, Meghan Frazer posted “An Elevator Pitch for File Naming Conventions.” She provides her own recommendations about file naming conventions:

I began using a personal computer in the era of the MS-DOS FAT file system, so the names of the older files that I have migrated from computer to computer over the years are hampered by the 8.3 file name (a maximum of 8 characters for the base file name and 3 characters for the extension). In my attempt to create descriptive file names at the time, I tended to use the extension to indicate something about the content of the file; for example *.let was a letter that I wrote, while *.civ was a file that I created for my civics class. The problem is that by using the extension in this way, I removed any indication of the software program that I used to create these files. A post on How-To Geek provides step-by-step instructions for four methods of batch renaming files in Windows. I know that the word processor I used was WordPerfect, so I could change the extensions easily from the command line. But that would lose the descriptive information contained in my original extension, so instead I’ve decided to try out one of the most highly recommended software solutions, Bulk Rename Utility, in order to have a more sophisticated method for handling my file name changes. (Stay tuned for the results!)
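As an alternative illustration of the same goal (this is not how Bulk Rename Utility works, just a sketch of the idea), a few lines of Python can fold the old descriptive extension into the base name before adding a software extension. The function name is hypothetical, and ".wpd" is my assumption for the WordPerfect extension; substitute whatever your software expects.

```python
from pathlib import Path


def descriptive_name(old: Path, app_extension: str = ".wpd") -> Path:
    """Keep the old extension as a tag in the base name, e.g. essay.civ -> essay_civ.wpd."""
    tag = old.suffix.lstrip(".").lower()
    stem = f"{old.stem}_{tag}" if tag else old.stem
    return old.with_name(stem + app_extension)


# Dry run: print the plan instead of renaming anything.
for name in ["MOM.LET", "essay.civ"]:
    print(name, "->", descriptive_name(Path(name)).name)
```

Printing the plan first makes it easy to spot surprises; once it looks right, each file could be renamed with `Path.rename`.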

To this point, I have relied on personal examples of why metadata matters, but as Frazer points out, if repositories are making files available to the public, file naming matters for both discovery and access. For example, the Moby Dick Big Read project provided audio versions of each chapter of this book, available for download. Moby Dick had been on my list of things to read for years, and I had a long car trip coming up, so I downloaded the chapters to an SD card and popped it into my car’s audio system. It had been very time-consuming to download these files because I had to download each chapter separately, so I hadn’t taken the time to notice how the files were named. But when I tried to play the book, I realized the file names given were going to make my listening very frustrating. The chapters tended to be named c1.mp3, so when these files were sorted into ascending name order, chapter 1 was followed by chapters 10 through 19. When I later renamed these files, I had to be cognizant of the 136 chapters and use leading zeroes so that they would be in the right order. (A more minor issue, but still one easily addressed by the producers from the outset, was that only two of the files were embedded with ID3 information, so the title and artist were not readable by my audio system.)
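The leading-zero fix is easy to script. This Python sketch assumes every chapter file follows the c1.mp3 pattern described above (the function name is hypothetical) and pads the number to as many digits as the highest chapter needs, which is three for 136 chapters.

```python
import re


def padded_name(filename: str, total_chapters: int = 136) -> str:
    """Pad the chapter number with leading zeroes, e.g. c1.mp3 -> c001.mp3."""
    width = len(str(total_chapters))
    match = re.fullmatch(r"c(\d+)\.mp3", filename)
    if match is None:
        return filename  # leave anything unexpected untouched
    return f"c{int(match.group(1)):0{width}d}.mp3"


names = ["c1.mp3", "c10.mp3", "c2.mp3"]
print(sorted(names))                          # ['c1.mp3', 'c10.mp3', 'c2.mp3']
print(sorted(padded_name(n) for n in names))  # ['c001.mp3', 'c002.mp3', 'c010.mp3']
```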

Obviously, repositories can leave the responsibility to the user for renaming files as necessary. I eventually got Moby Dick to play in the correct order, so other people can do it, too. But it strikes me that this is a relatively simple way that archives and digital libraries could both facilitate discovery and access while also modeling best practices for file naming procedures. In the long run, this can only improve the state of born-digital records that are accessioned by repositories. Personally, I don’t think it’s too much to add to the workflow. Here are some resources that can address this need:

University of Michigan, Digital Preservation Glossary – provides succinct definitions of key terms related to digital preservation and distinguishes among various types of metadata

If you want to delve into metadata more fully, you can peruse these sites:

Dublin Core: defines a set of fifteen “core” metadata elements; has the flexibility to address various needs (see the “Levels of interoperability” section to determine what is needed by your organization)

Metadata Encoding & Transmission Standard (METS): as this web site explains, “The METS schema is a standard for encoding descriptive, administrative, and structural metadata regarding objects within a digital library, expressed using the XML schema language of the World Wide Web Consortium. The standard is maintained in the Network Development and MARC Standards Office of the Library of Congress, and is being developed as an initiative of the Digital Library Federation.”

Metadata Object Description Schema (MODS): MODS is an XML schema for a bibliographic element set; one of the most useful features of this web site is the listing of conversions between MODS and other metadata schemas

PREMIS: the Preservation Metadata: Implementation Strategies working group was convened by OCLC and RLG and developed the PREMIS data dictionary with the goal of creating an implementable set of “core” preservation metadata elements; Version 2.2 is available, and Version 3.0 is in the works