How Materials are Made Available in the Virtual Vietnam Archive

The Virtual Vietnam Archive project began in 2001 as an effort to digitize the complete holdings of the Vietnam Archive. The staff of the Center and Archive have extensive experience with archives, digitization, and digital projects. Using this experience, we have established a set of guidelines and standards that best fit our project and our goals. We constantly review our procedures, as well as industry standards, and adjust our practices as necessary.

Much of the digitization and metadata creation is done by students of Texas Tech University. These students come from a wide variety of backgrounds, but all have an interest in history, and many have a connection to the Vietnam War. Each student undergoes an extensive training program, and their work is monitored as part of the quality control process. In most cases, the same student is responsible for digitizing all of the documents from a single collection. We also employ students who focus exclusively on photographs or slides, or on other special projects. Many of the students continue working until they graduate from the university. Since the beginning of the Virtual Archive project, the Vietnam Center and Archive has employed over 100 students, a mixture of undergraduate and graduate students.

The Virtual Vietnam Archive is a continually evolving project. New materials are added daily, and we are always striving to make our resources more available and easier to access for researchers. If you have any comments or suggestions about the project, please feel free to use our online contact form, or call us at 806-742-9010.

Digitization Workflow

The Vietnam Center and Archive has an extensive collection of hardware available for digitization, and all work is conducted in-house by students or by full-time faculty or staff. This section describes the digitization workflow for our most common types of materials. The metadata we collect for each item is described in the next section.

The Database

The Virtual Vietnam Archive is powered by the Cuadra Star database system. This flexible database program has allowed us to tailor the Virtual Archive to the needs of our collections and our researchers, and to provide a variety of ways to access the digital materials.
All items in a collection, copyrighted and non-copyrighted, are digitized, but only non-copyrighted items are available online to researchers. Personal information, such as addresses, phone numbers, and Social Security Numbers, is removed from all digital copies.

Document/Manuscript Digitization

The majority of the materials in the Virtual Vietnam Archive are printed documents. These materials are digitized, and metadata records are created, by our student staff. Digital files and database records are quality controlled by a full-time staff member before they are made available to researchers. Optical Character Recognition (OCR) is run for each document. The OCR text is added to the metadata record and is also embedded in the PDF file, so that words or phrases can be searched within the file itself.

Equipment

Dell desktop computers w/ Microsoft Windows OS

Adobe Acrobat 9

Fujitsu fi-4220c scanners with both flatbed and automatic document feeders for items up to 8.5"x17"

Epson Expression 10000XL for items up to 11"x17"

Specifications

Master Copy - 300 dpi PDF

Access Copy - compressed PDF

Still Images (photographs, slides, and negatives)

The Virtual Vietnam Archive contains over 100,000 still images. These materials are digitized, and metadata records are created, by our student staff. Digital files and database records are quality controlled by a full time staff member before they are made available to researchers.

Equipment

Dell desktop computers w/ Microsoft Windows OS

Adobe Photoshop Expressions

Epson Perfection V700 for photographs and negatives larger than 35mm

Epson Expression 10000XL for large images, up to 11"x17"

Nikon Super Coolscan 5000 for slides and 35mm or smaller negatives

Specifications

Master Copy - 300 dpi TIFF

Access Copy - 72 dpi JPG
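
To illustrate what these resolutions mean in practice, the following minimal sketch works out the pixel dimensions of a master and an access copy. The 8"x10" print size and the 24-bit RGB color depth are assumptions chosen for the example, not values documented by the Archive, and the function names are hypothetical.

```python
def pixel_dimensions(width_in, height_in, dpi):
    """Return (width_px, height_px) for a scan of the given physical size."""
    return round(width_in * dpi), round(height_in * dpi)


def uncompressed_size_bytes(width_px, height_px, channels=3, bits_per_channel=8):
    """Approximate size of the raw pixel data, ignoring file headers."""
    return width_px * height_px * channels * bits_per_channel // 8


# Assumed example: an 8"x10" photographic print.
master = pixel_dimensions(8, 10, 300)   # 300 dpi master copy
access = pixel_dimensions(8, 10, 72)    # 72 dpi access copy

print("Master copy:", master, uncompressed_size_bytes(*master), "bytes uncompressed")
print("Access copy:", access, uncompressed_size_bytes(*access), "bytes uncompressed")
```

Under these assumptions the master copy holds roughly seventeen times as many pixels as the access copy, which is why the smaller JPG is the version served online.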

Large format items (documents or images)

The Vietnam Center and Archive recently purchased a CopiBook ONYX large format scanner, capable of scanning 17"x24" at 400 dpi. The scanner also features a self-balancing book cradle and can output a variety of formats.

Audio and Moving Images

The holdings of the Vietnam Archive contain a wide variety of audio/visual formats. These materials are digitized by a full time staff member, who also creates the metadata record, including an abstract of the item.

Oral Histories

Oral History interviews are currently recorded on flash media, although early interviews were recorded on cassette or mini-disc. Interviews are usually conducted by phone by Vietnam Archive faculty, who also create the metadata records. Student employees transcribe each interview, and the transcript is then reviewed by the interviewee. Another student conducts a final round of edits, and then the faculty member who conducted the interview reviews the transcript a last time before it is made available to the public.
Over 800 interviews are available online, most including both the audio and a full transcript in PDF format.

Equipment

Dell Desktop w/ Microsoft Windows OS

Marantz recorders

Adobe Audition

Specifications

Master Copy, Audio - WAV

Access Copy, Audio - MP3

Master Copy, Transcript - MS Word

Access Copy, Transcript - PDF

Artifacts

The Vietnam Archive houses a collection of over 4000 artifacts. Artifacts are digitized, and metadata records are created, by student staff members.

Equipment

Nikon D80 Digital Camera

Fujitsu fi-4220c flatbed scanner

Adobe Photoshop Elements

Specifications

300 DPI JPG

Microfilm

The Vietnam Archive has digitized over 5 million pages of microfilm, but due to the processing time involved, only a small portion has been made available online. Metadata record creation is performed by student staff members.

Maps, Posters, and Other Oversized Items

Over 1000 maps and other large items have been digitized and made available online. Only non-fragile items are digitized. Digitization and metadata record creation are performed by full-time staff members.

Equipment

HP DesignJet 815mfp with Windows OS

Adobe Photoshop

Adobe Acrobat

Specifications

Master Copies - 300 DPI Tiff or PDF

Access Copy - PDF and JPG

Servers and Storage

The Vietnam Center and Archive maintains a number of servers, storage devices, and backup systems. Three primary servers are in use: one for website and file access, and two running redundant copies of the database. Record creation is conducted on one server, and new records are transferred to the public access server nightly. Redundancy of the servers allows for seamless transition of users to alternate servers in case of unavailability or failure of a server, ensuring near-continuous access to the Virtual Vietnam Archive. 60TB of near-line storage is also available through a Storage Area Network (SAN). All materials are backed up on magnetic tape using Dell Robotic Backup Libraries. More about our backup system can be found in the Digital Preservation section below.

Equipment

Dell PowerEdge Servers

Dell EMC CX300 and Dell EMC CX4-120 Storage Arrays

PowerVault 136T and ML6020 tape libraries

Metadata

The Vietnam Center and Archive includes an extensive amount of information about each item in the database record for that item. To develop our metadata list, we started with the Dublin Core Metadata Element Set.
We then added our own metadata fields customized to the types of materials and the subject matter covered. The following list shows the primary metadata fields we collect. Note that not all fields are used for every media type. Additional metadata about some items and material types may be collected and stored in databases that are not accessible to the public. Additionally, the files themselves for some items may include embedded metadata.

The Virtual Vietnam Archive index currently contains over 20 million searchable terms.

If item is not going to be available online to researchers, reason why

If personal information has been removed from digital copy

Fields specific to Maps:

Country

Series

Scale

Edition

Contour Interval

Geographic Features

Latitude/Longitude

Military Grid Zones

Fields specific to Finding Aids:

Linear Feet of Collection

Scope and Content

Biographical Note or Administrative History

Access Level

Collection Inventory

Accession Numbers

Digitized Materials?

EAD Version Available?

Fields specific to Oral Histories:

Interviewer

Date transcription completed

Transcription software

Military information about the donor or interviewee (such as military branch, rank, unit, awards, etc.)

Digital File Hashes

Record creator

Record updator

History of record updates
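
As an illustration of how a field such as Digital File Hashes can be used, the sketch below computes a checksum for a file and compares it with a previously recorded value to detect corruption or unintended changes. The choice of SHA-256 and the function names are assumptions made for this example; the hash algorithm actually stored in the Archive's records is not stated here.

```python
import hashlib


def file_sha256(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large master files need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_file(path, recorded_hash):
    """Return True if the file still matches the hash stored in its record."""
    return file_sha256(path) == recorded_hash.strip().lower()
```

A mismatch between the recorded hash and a freshly computed one would indicate that the file has changed since the record was created and should be restored from a backup copy.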

Digital Preservation

The Vietnam Center and Archive has devoted extensive resources to developing a comprehensive digital preservation and disaster recovery plan, consisting of numerous elements.

Disaster Recovery

The first stage of disaster recovery is redundant backup servers. Copies of all access versions of digital files, as well as of the database itself, are maintained on two or more servers. In the event of a hardware failure, the redundant server will take over the research access load.

The next stage of protection is backups on magnetic tape. Nightly backups are run for all new and changed files, along with weekly backups of all data. One complete set of backups is stored offsite at all times.

Many digital files are also burned to gold-based CDs or DVDs and stored in our climate-controlled stacks.
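
The difference between the nightly and weekly tape backups described above can be sketched as a file selection rule: a weekly run takes everything, while a nightly run takes only files modified since the previous backup. This is a simplified illustration, not the Archive's backup software; the function name and the use of file modification times are assumptions.

```python
import os


def files_for_backup(root, last_backup_time=None):
    """Select files under root for a backup run.

    With last_backup_time=None every file is selected (a full, weekly-style
    backup). Otherwise only files modified after that timestamp, given in
    seconds since the epoch, are selected (a nightly-style backup).
    """
    selected = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if last_backup_time is None or os.path.getmtime(path) > last_backup_time:
                selected.append(path)
    return sorted(selected)
```

Selecting by modification time keeps nightly runs small, while the periodic full run ensures that a complete set can be restored without replaying a long chain of nightly tapes.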

Digital Preservation

Although we are confident that the file formats we have chosen to use for master copies will remain available standards for many years to come, the Vietnam Center and Archive is committed to migrating to newer formats as necessary.

Future Goals

The Vietnam Center and Archive is currently exploring the possibility of utilizing cloud storage for both digital preservation and disaster recovery. Ideally, all digital files on our servers, including website files, access and master copies of digital materials, database files (including the installation programs for the database), and disk images of all servers would be stored in a data storage location outside of Texas, ensuring that if something catastrophic occurred to our physical location, the Virtual Archive could continue to exist on the internet.

Physical Materials

All physical materials remain open and accessible to researchers in our reading room. The primary focus of our digitization effort is to provide access to materials, not to serve as a preservation method. The digital copies will, however, reduce the need for handling or accessing the physical originals, helping extend their life span.