Recently, we asked the Princeton University Library Digital Studios to photograph the remaining Volumes 9-11, for addition to the PUDL and the Finding Aids.

We were lucky enough to visit the Digital Studios and see the digitization of the volumes in action. Digital Studio staff members use a number of digital cameras and lighting to achieve the best quality image.

Images are fed to a local computer and continually checked by staff as they shoot.

The entire process can take a few months to complete, from photograph to online availability.

We are happy to be able to share the process with you, and we look forward to announcing soon that the final early volumes are available online.

Though the collection spans his lifetime, the John Foster Dulles Papers focus on Dulles’s service as the fifty-third Secretary of State under the Eisenhower administration. Dulles was formally appointed to the position on January 21, 1953. In December of that year, he made his first Christmas address to the American people, wishing them “peace on earth, good will to men.”

John Foster Dulles Papers (MC016), Box 321

Check the blog for future posts about the progress of the John Foster Dulles digitization project. For more information about the Digitizing the Origins of the Cold War project, see some of our previous posts.

Responsibilities: As the digital archivist at Mudd, I’m responsible for the development, implementation, and execution of processes that facilitate the effective acquisition, description, preservation, and access of born-digital archival collections acquired by the University Archives. The emphasis on ‘born-digital’ is to distinguish my work from that of digitization, which is a process that converts analog material into digital formats. Born-digital records are those that originated as microscopic inscriptions of 0’s and 1’s on a piece of magnetic media.

“Magnetic Force Microscopy (MFM) of a Magnetic Hard Disk,” taken from MIT

Preserving and providing access to those 0’s and 1’s, or bits, is too challenging a problem for any single person to solve, so many of my duties require me to collaborate with others in the University Archives and across campus. This often involves me meeting with our University Archivist, the Assistant University Archivist for Technical Services (to whom I report), and the University Records Manager. As exciting as it is to dive into the past by hacking away at old and new media—and trust me, doing this is really exciting—the most important element of my success is laying the infrastructure for our Digital Curation Program, which we initiated two months ago. Infrastructure is invisible to most of us but critical for all of us. More on that in future posts.

Lest I lead you to believe that I work exclusively in the digital realm, I also do things that archivists have always done: processing paper records and performing reference services in person, on the phone, and over email.

Ongoing projects: Because our Digital Curation Program is rather nascent, I spend a majority of my time drafting policy documents for the program as well as revising workflows for how we process born-digital records. Outside of that, I contribute to several Library-wide working groups and task forces. When I’m not doing one of those two things, you can probably find me working with a new digital preservation tool or strengthening my command of various operating systems.

Worked at Mudd since: I began at Mudd in November of 2013. Prior to Princeton, I served as University Library Associate at the Special Collections Library of the University of Michigan, a post I maintained for nearly two years while I completed my master’s degree in information science at the School of Information. Before Michigan, I had brief stints at the Maryland State Archives and Beinecke Rare Book & Manuscript Library.

Why I like my job/archives: Contrary to general perception, archivists are as concerned with the future as they are with the past. Yes, we manage records that document past activities, but we do so only for future use by researchers. In this way, I see my job as a digital archivist as one that preserves the past in order to promise the future. That promise is harder to ensure when it comes to digital records, but it’s a challenge that I find to be terrifyingly exciting and incredibly meaningful. Also, I learn something new each and every day, which is one of the most fulfilling aspects of my work.

And though I put a lot of time and energy into curating bits, I joined the profession because I like people. I enjoy assisting them with their research questions and it gratifies me that I can contribute to the creation of new knowledge about the past. The roughest days I encounter are immediately turned around when a researcher says “I can’t thank you enough for your assistance” or “without you, I’m not sure I could have answered this question.” Those are my reminders that I chose the right profession.

Favorite item/collection: Recently I responded to a researcher who sought information about the first Japanese student to graduate from Princeton. I spent some time digging around our Historical Subject Files and our Alumni Undergraduate Records collection to learn that in 1876, Hikoichi Orita was the University’s first Japanese student to graduate.

In addition to his alumni files, we have a copy of his student diary, which I told myself I would read slowly over my career. It’s in English, in case you’re interested in viewing it, too. This is a classic example where a researcher informs the interests of the archivist, instead of vice versa.

Late last year, the Mudd Manuscript Library was granted an award by the National Historical Publications and Records Commission (NHPRC) to digitize our most-used Public Policy collections, serve them online, and create a report for the larger archival community about cost-efficient digitization practices. Excerpts from our six-month progress report are below.

Work so far

Project planning

From the time we were awarded the grant to the present, we have produced an overall project plan and timeline, a vendor RFQ and plan of work, in-house quality control procedures for vendor-supplied images, a workplan for in-house scanning, and hardware-specific instructions for in-house scanning. All activities are either on schedule or ahead of schedule. Vendor-supplied digitization is currently eight months ahead of schedule.

Finding a vendor

After distributing an RFQ and collecting bids, we decided on The Crowley Company as our vendor, based on both price and our confidence that they would be able to manage the materials and the work carefully and efficiently.

Managing vendor-supplied digitization

Before materials can go out to the vendor, we first create a manifest of everything we want to send by transforming the EAD-encoded finding aid into an easily read Excel worksheet. Since we want each folder of material to have a cover sheet that gives the collection name, box number, folder number, URL, and copyright policy, we used the collection manifests to make target sheets with this information. A total of 6,943 target sheets were created, printed, and inserted at the front of folders by student workers before materials were sent out to the vendor.
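
The manifest step can be sketched roughly as follows. This is a minimal illustration rather than our actual transform: it assumes un-namespaced EAD with components encoded as <c> elements and box and folder numbers in <container> elements, and it writes CSV text instead of an Excel worksheet. The function names and the sample finding aid are invented for the example.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical, simplified EAD fragment. Real finding aids are much larger
# and may use namespaces or numbered components (<c01>, <c02>, ...).
SAMPLE_EAD = """
<ead>
  <archdesc level="collection">
    <did><unittitle>Example Papers</unittitle></did>
    <dsc>
      <c level="file" id="c001">
        <did>
          <unittitle>Correspondence, A-C</unittitle>
          <container type="box">1</container>
          <container type="folder">1</container>
        </did>
      </c>
      <c level="file" id="c002">
        <did>
          <unittitle>Correspondence, D-F</unittitle>
          <container type="box">1</container>
          <container type="folder">2</container>
        </did>
      </c>
    </dsc>
  </archdesc>
</ead>
"""

FIELDS = ["collection", "component_id", "title", "box", "folder"]

def build_manifest(ead_xml):
    """Return one row per file-level component in the finding aid."""
    root = ET.fromstring(ead_xml)
    collection = root.findtext("archdesc/did/unittitle", default="")
    rows = []
    for comp in root.iter("c"):
        did = comp.find("did")
        if comp.get("level") != "file" or did is None:
            continue
        # Map container type ("box", "folder") to its number.
        containers = {c.get("type"): (c.text or "").strip()
                      for c in did.findall("container")}
        rows.append({
            "collection": collection,
            "component_id": comp.get("id", ""),
            "title": did.findtext("unittitle", default="").strip(),
            "box": containers.get("box", ""),
            "folder": containers.get("folder", ""),
        })
    return rows

def manifest_to_csv(rows):
    """Serialize manifest rows as CSV text that a spreadsheet can open."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()

manifest = build_manifest(SAMPLE_EAD)
print(manifest_to_csv(manifest))
```

Each manifest row carries most of what a target sheet needs; the URL and copyright statement would have to be joined in from elsewhere.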

Once materials have been imaged by the vendor, students sample ten percent of the collection to check for completeness and readability. So far, everything has passed quality control with flying colors.
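
A ten percent sample like the one described above could be drawn along these lines. This is only a sketch: it assumes sampling happens at the folder level and rounds up so small batches still get one check, neither of which is specified above, and the folder identifiers are placeholders.

```python
import math
import random

def sample_for_qc(folder_ids, percent=10, seed=None):
    """Pick a random subset of folders to inspect.

    Rounds up so that even a very small batch gets at least one check.
    Passing a seed makes the selection repeatable.
    """
    folder_ids = list(folder_ids)
    if not folder_ids:
        return []
    k = max(1, math.ceil(len(folder_ids) * percent / 100))
    rng = random.Random(seed)
    return sorted(rng.sample(folder_ids, k))

# Placeholder identifiers for a batch of 50 folders.
folders = [f"box1_folder{n:03d}" for n in range(1, 51)]
picked = sample_for_qc(folders, seed=2013)
print(len(picked), "of", len(folders), "folders selected for review")
```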

Each month, Crowley sends us a report of how many images have been created that month, how many images have been created cumulatively, and average scanning rate per hour. This information is below:

Month         Boxes Scanned   Pages Scanned
2013 March    15              17119
2013 April    32              45761
2013 May      50              49499
2013 June     65              97896
Totals        162             210275

In-house imaging

Imaging of the John Foster Dulles papers started in June. So far, we have completed a pilot of scanning with the sheet-feed of the photocopier, and pilots of microfilm scanning and scanning with a Zeutschel face-up scanner are underway.

Project goals and deliverables

Twelve series or subseries from six collections digitized

To date, five series or subseries have been completely digitized, and three others are in the process of being digitized.

Approximately 416,000 images created and posted online

As of July 1, 2013, 210,275 images have been scanned by the vendor. Of this total, 39,834 images have been posted online. Our vendor is several months ahead of schedule for this project, and in-house scanning is on track. Since beginning in-house scanning in June, 1,838 pages have been scanned by student workers. In the next months, we will calculate the per-page costs for scanning on a Zeutschel face-up scanner and with a microfilm scanner. From there, we plan to image fifty feet of materials with the sheet feeder of the photocopier, 10.3 feet with the Zeutschel face-up scanner, and 33.4 feet with the microfilm scanner.

Six EAD finding aids updated to include links for 17,508 components (folders)

Two finding aids (Council on Foreign Relations Records and Adlai Stevenson Papers) have been updated to include links to digitized content. Another (George F. Kennan Papers) is ready to be updated. This process is managed semi-automatically with a series of shell scripts. After quality control, hard drives of images are sent to Princeton’s digital studios. Staff there verify and copy digital assets to permanent storage. After this, PDF and JPEG2000 files are derived from the master TIFFs, and the relationship between these objects is described in an automatically generated METS file. The digital archival object (<dao>) tag is added to the EAD-encoded finding aid for each component.
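
The last step, adding a <dao> to each component, might look something like the sketch below. Our own process uses shell scripts; this Python version is an illustration only. It assumes un-namespaced EAD with a plain href attribute on <dao>, and the component ids and URL are made up.

```python
import xml.etree.ElementTree as ET

# Hypothetical two-folder finding aid fragment.
SAMPLE_EAD = """<ead><archdesc><dsc>
  <c level="file" id="c001"><did><unittitle>Folder 1</unittitle></did></c>
  <c level="file" id="c002"><did><unittitle>Folder 2</unittitle></did></c>
</dsc></archdesc></ead>"""

# Hypothetical mapping from component id to the URL of its digitized content.
LINKS = {"c001": "https://example.org/objects/c001"}

def add_dao_links(ead_xml, links):
    """Append a <dao> to the <did> of each component that has a link.

    Components that already carry a <dao> are skipped, so the function
    can be re-run on an updated finding aid without duplicating links.
    Returns the new XML text and the number of links added.
    """
    root = ET.fromstring(ead_xml)
    added = 0
    for comp in root.iter("c"):
        url = links.get(comp.get("id"))
        did = comp.find("did")
        if url is None or did is None or did.find("dao") is not None:
            continue
        ET.SubElement(did, "dao", {"href": url})
        added += 1
    return ET.tostring(root, encoding="unicode"), added

updated_xml, count = add_dao_links(SAMPLE_EAD, LINKS)
print(count, "component(s) linked")
```

Skipping components that already have a link matters here because finding aids are updated in batches as each collection comes back from the vendor.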

Digital imaging cost of less than 80 cents per page achieved

The plan of work with our vendor calls for scanning costs well below 80 cents per page. Our first (and likely least expensive) of three in-house scanning pilots estimates the cost of scanning with the sheet feeder of a copier to be two cents per page. We will have numbers for microfilm scanning and scanning with a face-up scanner at the time of our next report.

Metrics for digital imaging of 20th century archival collections for

In-house microfilm conversion

Sheet feeding through a networked photocopier

Vendor supplied images

The information that we have collected thus far is below. Our vendor metrics are based on the quote and plan of work with The Crowley Company. Sheet feed metrics are collected by having a student worker fill out a minimal, time-stamped form at the beginning and end of each scan, and then analyzing that information. These numbers are preliminary. Sheet-fed scans have not yet been checked for quality control — re-scans may increase the total time per page and dollars per page for this method.

                        Vendor      Sheet Feed   Microfilm   Zeutschel
Total pages:            270,600*    1838         -           -
Total feet:             530.95      1.68         -           -
Total time:             -           2:25:14      -           -
Total time (decimal):   -           2.42         -           -
Time per page:          -           0:00:04      -           -
Pages per hour:         270.75      759.33       -           -
Hours per foot:         -           1:26:26      -           -
Feet per hour:          -           0.69         -           -
Cost per page:          TBD         $0.02        -           -

*This number is an estimate, based on an assumed 1200 pages per box. Our reports from Crowley show anywhere from 1050-1750 pages in a box.

Note: in addition to these three methods, we plan to add a fourth – scanning with a face-up scanner (in our case, a Zeutschel scanner table).
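
For anyone who wants to check or reuse the arithmetic, the rate figures follow directly from total pages, total linear feet, and total elapsed time. The sketch below plugs in the sheet-feed pilot totals reported above (1,838 pages, 1.68 feet, 2:25:14); the function name is ours, and summing the student time-stamp forms into a single elapsed time is assumed to have happened already.

```python
from datetime import timedelta

def scanning_metrics(pages, feet, elapsed):
    """Derive throughput figures from totals for one scanning method."""
    seconds = elapsed.total_seconds()
    hours = seconds / 3600
    return {
        "hours": hours,
        "pages_per_hour": pages / hours,
        "seconds_per_page": seconds / pages,
        "hours_per_foot": hours / feet,
        "feet_per_hour": feet / hours,
    }

sheet_feed = scanning_metrics(
    pages=1838,
    feet=1.68,
    elapsed=timedelta(hours=2, minutes=25, seconds=14),
)
for name, value in sheet_feed.items():
    print(f"{name}: {value:.2f}")
```

Run on these totals, the calculation agrees with the reported 759.33 pages per hour and 0.69 feet per hour. Cost per page additionally needs an hourly labor rate, which is not given here.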

Policies and documentation for large-scale digitization initiative created and shared with archival community

It’s February, and we’re now in the second month of our NHPRC-funded digitization project. In twenty-three more months, we’ll have completed scanning and uploading 400,000 pages of our most-viewed material to our finding aids, and anyone with an internet connection will be able to view it.

This is just the most recent effort to introduce digitization as a normal part of our practice at Mudd. As I said in my previous post, we know that it’s well and good that we have collections that document the history of US diplomacy, economics, journalism and civil rights in the twentieth and twenty-first centuries. But for the majority of potential users, who may never be able to come to Princeton, NJ, this is irrelevant. However interested they may be, they may never be able to afford to visit us. And there’s a whole other subset of potential users — let’s call them working people — who can’t come between the hours of 9:00 and 4:45, Monday through Friday. Are we really providing fair and equitable access under these conditions? Since we have the resources to digitize, it’s imperative that we develop the infrastructure and political will to do so.

We know that it’s time to get serious — and smart — about scanning.

The ball has been rolling in this direction for some time. We have three “streams” of making digital content available, and with our new finding aids site, we have an intuitive way of linking descriptions of our materials to the materials themselves.

Images of the collection in the context of the finding aid

Our first is patron-driven digitization.

This is our Zeutschel scanner. It does amazing work, is easy on our materials, and usually requires very little quality control.

Archives have been providing photoduplication services since the advent of the photocopier. At Mudd, we have dedicated staff who have been doing this work for decades. Recently, we’ve just slightly tweaked our processes to create scans instead of paper copies and to (in many cases) re-use the scans that we make so that they’re available to all patrons, not just the one requesting the scan.

A patron (maybe you!) finds something in our finding aids that he thinks he may be interested in, and asks for a copy.

If he’s in our reading room, he flags the pages of material he wants. If he’s remote, he identifies the folders or volumes to be scanned. The archivist tells him how much the scan will cost, and he pre-pays.

Now, the scanning. This either happens on our photocopier (the technician can press “scan” instead of “photocopy” to create a digital file instead of a paper one) or on our Zeutschel scanner. And while we feel happy and lucky to have the Zeutschel, we don’t strictly need it to fulfill our mission to digitize.

The scan is named in a way that associates it with the description of the material in the finding aid, and is then linked up and served online. We currently send the patron an email of this scan, but in the future we may just send them a link to the uploaded content.

Our second stream is targeted digitization based on users’ viewing patterns

We try to keep lots of good information about what our users find interesting. We use a service called Google Analytics to learn about what users are browsing online, and we keep statistics about which physical materials patrons see in the reading room.

From these sources, we create a list of most-viewed materials, and set up a system for our students to scan them in their downtime when they’re working at the front desk.
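
As a rough illustration of how such a list could be put together, the sketch below adds online page views and reading-room requests per folder and ranks the totals. The counts and identifiers are invented, and weighting the two sources equally is an assumption, not a description of our actual method.

```python
from collections import Counter

# Hypothetical counts; the identifiers are placeholders, not real call numbers.
web_pageviews = {"folder_A": 40, "folder_B": 25, "folder_C": 5}
reading_room_requests = {"folder_B": 20, "folder_C": 2}

def most_viewed(web, in_person, top_n=2):
    """Combine both sources (weighted equally) and return the top entries."""
    combined = Counter(web)
    combined.update(in_person)  # adds counts for folders seen in both sources
    return combined.most_common(top_n)

scan_queue = most_viewed(web_pageviews, reading_room_requests)
print(scan_queue)
```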

We do this because we want to make sure that we’re putting the effort into digitizing resources that patrons actually want to see — there are more than 35,000 linear feet of materials at the Mudd Library. We probably won’t ever be able to digitize absolutely everything, and it wouldn’t make sense to start from “A” and go to “Z”. So, we pay attention to trends and try to anticipate what researchers might find useful.

Our final stream — and the one for which we currently have to rely on external support — is large-scale vendor-supplied digitization.

Our current Cold War project is a great example of this. We’ve put together a project plan, chosen materials, called for quotes, and chosen a vendor. We recently shipped our first collection to be digitized, and I’ll be posting information to the blog as we move forward.

Another good example of an externally-supported digitization activity is the scanning of microfilm from our American Civil Liberties Union Records. Our earliest records were microfilmed decades ago, and recently Professor Sam Walker supported the digitization of some of this microfilm so that it could be made available online.

No single stream, whether externally-supported projects, left-to-right scanning, or patron-driven digitization, would be enough to support our goal of maximizing the content available online. We hope that the three, each pursued aggressively, will help us realize our mission of providing equitable access to our materials. And we think that focusing on this Cold War project will help us reflect on and improve all of our digitization activities.

The historian John Lewis Gaddis, author of a 2012 Pulitzer Prize-winning biography of George Kennan, has stated that the Mudd Library holds “the most significant set of papers for the study of modern American history outside of federal hands.”

We want to change this to make it easier for everyone to access our materials. Thanks to the generosity of the National Historical Publications and Records Commission (NHPRC), a taxpayer-funded organization that supports efforts to promote documentary sources, over 400,000 pages of records from six of our most-used collections will be digitized and put online for anyone with an internet connection to access. We hope that our records will become newly accessible and indispensable to international researchers, high school and college students, and anyone else with an interest in the history of the Cold War. As Gaddis wrote in a letter of support for our grant, this kind of access “has the potential, quite literally, to globalize the possibility of doing archival research. That’s no guarantee that this will produce a greater number of great books than in the past. What it will ensure, however, is a quantum leap in the opportunities students and their teachers will have to bring the excitement of working with original documents into all classrooms.”

Collections include:

John Foster Dulles (1888-1959), the fifty-third Secretary of State of the United States for President Dwight D. Eisenhower, had a long and distinguished public career with significant impact upon the formulation of United States foreign policies. He was especially involved with efforts to establish world peace after World War I, the role of the United States in world governance, and Cold War relations between the United States and the Soviet Union. The Dulles papers document his entire public career and his influence on the formation of United States foreign policy, especially for the period when he was Secretary of State.

George F. Kennan (1904-2005) was a diplomat and a historian, noted especially for his influence on United States policy towards the Soviet Union during the Cold War and for his scholarly expertise in the areas of Russian history and foreign policy. Kennan’s papers document his career as a scholar at the Institute for Advanced Study and his time in the Foreign Service.

The Council on Foreign Relations is a nonprofit, nonpartisan research and national membership organization dedicated to improving understanding of international affairs by promoting a range of ideas and opinions on United States foreign policy. The Council has had a significant impact in the development of twentieth century United States foreign policy. The Records of the Council on Foreign Relations document the history of the organization from its founding in 1921 through the present.

The Allen W. Dulles Papers contain correspondence, speeches, writings, and photographs documenting the life of this lawyer, diplomat, businessman, and spy. One of the longest-serving directors of the Central Intelligence Agency (1953-1961), he also served in a key intelligence post in Bern, Switzerland during World War II, as well as on the Warren Commission.

The Adlai E. Stevenson Papers document the public life of Adlai Stevenson (1900-1965), governor of Illinois, Democratic presidential candidate, and United Nations ambassador. The collection contains correspondence, speeches, writings, campaign materials, subject files, United Nations materials, personal files, photographs, and audiovisual materials, illuminating Stevenson’s career in law, politics, and diplomacy, primarily from his first presidential campaign until his death in 1965.

James V. Forrestal (1892-1949) was a Wall Street businessman who played an important role in U.S. military operations during and immediately after World War II. From 1940 to 1949 Forrestal served as, in order, assistant to President Roosevelt, Under Secretary of the Navy, Secretary of the Navy, and the first Secretary of Defense.

The Princeton University Archives, working in conjunction with the Princeton University Library Digital Initiatives, has nearly completed a monumental project that will change the way researchers investigate University history. The student newspaper, The Daily Princetonian, has been digitized from its inception in 1876 through 2002. The site has been available in beta for almost two years, but all issues will be loaded as of June 30, 2012. At the suggestion of The Daily Princetonian alumni board, who have been among the prime backers of this project, the site is named in honor of the newspaper’s long-serving production manager Larry Dupraz, and researchers are able to perform sophisticated keyword searches that can unlock the vast richness of the daily newspaper that documents so much of the University’s history. (For the years 2002-present, users may search online via the Daily Prince site.)

“I wrote my final paper for my Freshman Writing Seminar about how the presence of veterans on Princeton’s campus following World War II affected Princeton’s academic environment and social atmosphere,” said Jennifer Klingman ’13. “My research heavily relied on The Daily Princetonian archives, and I had to spend a lot of time and energy searching for relevant articles in Firestone’s microform versions of the newspaper. It was difficult to comb through the articles, and as a result my research was limited in scope. This spring, I wrote my history department junior paper on academic and social changes taking place at Princeton during the late 1940s and 1950s. The online Daily Princetonian archives proved to be invaluable. I was able to access the archives anywhere and at any time, and use the archives’ search function to find a number of extremely useful articles. My independent work has definitely benefited from the existence of the online archives.”

Freelance journalist W. Barksdale Maynard ’88 states, “I am able to write about the social history of Princeton in an entirely new way and have restructured my research to take full advantage of this exciting new resource. For my Princeton Alumni Weekly article on the early history of automobiles at Princeton, the Dupraz Digital Archives allowed me to identify every reference to cars as early as 1901, to pinpoint who owned them and what kinds. I would never have attempted this article without The Dupraz Digital Archives.”

Maynard’s PAW colleague, Gregg Lange ’70, regularly uses the site for his column, “Rally Round the Cannon,” which examines and appraises University history. “You can piece together the story of Princeton football or Woodrow Wilson in a dozen ways. But the unique accessibility of a daily publication allows more subtle topics to arise and recede, and for cross-generational tales to emerge. Be it Ella Fitzgerald singing at a Princeton dance at age 19, then receiving an honorary degree 54 years later; or student revolts against the clubs’ Bicker selection system in 1917 and 1940 presaging its loss of monopoly in 1968, the combination of detail and long view is indispensable in understanding the ethos of the institution over time, and essentially inaccessible without the DuPraz technology and precision. And existentially, if I never see another microfiche in my life I will die a happy man.”

Maynard added, “My regular column in PAW, ‘From Princeton’s Vault,’ has benefited enormously. Recently I was able to identify the earliest references to Princetonians as ‘tigers,’ which had been guesswork previously. It turns out we were wrong by a decade.”

This has been an international project, with the newspapers sent from Princeton to Brechin Imaging in Canada, where TIFF images are generated using high-end German cameras. The files are then sent via a hard drive to Cambodia, where Digital Divide Data analyzes the structure of each page and uses an optical character recognition (OCR) program to derive machine-readable text, which allows for keyword searching. The hard drive is then shipped to Austin, Texas, where the US office of New Zealand company DL Consulting loads the data into a content-management system called Veridian, which supports searching and browsing, online reading, article extraction and printing, and other features.

Within the library, many hands have worked for this project’s success. At Mudd Library, project archivists Dan Brennan and then Adriane Hanson have overseen the day-to-day work of the project, managing the shipment of the newspapers to Brechin, as well as supervising students with the quality control phase. University Archivist Dan Linke raised the funds from various University and alumni sources and coordinated the project.

Within the greater Library system, Cliff Wulfman, the Library’s Digital Initiatives Coordinator, took the lead in writing the Request for Proposals and then selecting and coordinating the work with DDD, as well as providing technical assistance, support and vision. The Library System Office’s Antonio Barrera designed the front end web page with Phil Menos providing server support, and Deputy University Librarian and Systems Librarian Marvin Bielawski allocated the funds to acquire the Veridian software.

The project employs the METS/ALTO markup standard, the same used by the Library of Congress’s Newspaper Digitization Project, which means that as software changes and improves, we will be able to sustain this resource for many years to come.
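
To give a sense of what the ALTO half of that standard holds, the sketch below pulls the OCR text out of a tiny ALTO fragment and runs a naive keyword check against it. It is an illustration only, not how Veridian indexes pages: the sample page is invented, the ALTO version 2 namespace is an assumption about which version is in use, and real files also carry coordinates and confidence values that are ignored here.

```python
import xml.etree.ElementTree as ET

# Assumed namespace (ALTO version 2); other versions use a different URI.
ALTO_NS = {"alto": "http://www.loc.gov/standards/alto/ns-v2#"}

# Invented sample page: each word is a <String> with a CONTENT attribute.
SAMPLE_ALTO = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v2#">
  <Layout><Page><PrintSpace>
    <TextBlock>
      <TextLine>
        <String CONTENT="Tigers"/><SP/><String CONTENT="win"/>
      </TextLine>
      <TextLine>
        <String CONTENT="opening"/><SP/><String CONTENT="game"/>
      </TextLine>
    </TextBlock>
  </PrintSpace></Page></Layout>
</alto>"""

def alto_lines(alto_xml):
    """Return the recognized text of a page, one string per text line."""
    root = ET.fromstring(alto_xml)
    lines = []
    for line in root.findall(".//alto:TextLine", ALTO_NS):
        words = [s.get("CONTENT", "")
                 for s in line.findall("alto:String", ALTO_NS)]
        lines.append(" ".join(words))
    return lines

def page_contains(alto_xml, keyword):
    """Case-insensitive substring check across the lines of a page."""
    needle = keyword.lower()
    return any(needle in line.lower() for line in alto_lines(alto_xml))

print(alto_lines(SAMPLE_ALTO))
print(page_contains(SAMPLE_ALTO, "tigers"))
```

Because the text lives in an open, documented XML format rather than inside one vendor’s database, it can be re-indexed by whatever software comes next.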