Numeric Listings on Archived Microfilm Converted to ASCII Files

Volume 16, Number 2, June 2000

By Joseph King, George Fleming, and Richard Chu

During its early years, NSSDC acquired many non-computer-readable (ncr) data sets (e.g., on
microfilm and microfiche) in addition to the many data sets it acquired on digital magnetic tape.
Most of these ncr data sets are planetary or other images, spectra, line plots, and the like, but
a significant minority are images of pages of numeric listings, such as computer printouts.
Several of these data sets of numeric listings are of potential scientific value today.

Scanning and optical character recognition technologies have advanced in recent years to the
point where it is reasonable to assess, and possibly carry out, the conversion of numeric
listings on microfilm frames into computer-accessible data files.

NSSDC has begun work in this area with a set of IMP 8 magnetotail data from the 1970s, provided by
Los Alamos National Laboratory (LANL), that was previously available only on microfilm. This data
set consists of 54 reels of 16-mm film containing both plots and listings of ion and electron
densities, flow speeds and directions, fluxes, average energies, and pressure anisotropies and
directions, all at about 30-sec resolution.

The 54 reels and their duplicates varied in quality (character sharpness, etc.), so three
high-quality reels were selected and sent to a local vendor, who scanned all frames and applied
optical character recognition to the 2,589 frames of numeric listings. About two weeks later, the
vendor returned the film and a CD to NSSDC. The three reels covered the periods 12/13/73-02/02/74,
02/27/79-04/10/79, and 03/22/80-04/20/80. Data for 45 days within these intervals were digitized.

Display of the new ASCII files showed that most characters on the film had been correctly read and
converted. NSSDC did, however, readily recognize and fix a number of errors. The incidence of
errors that are not readily recognizable appears to be very small (<0.1%), though it is difficult
to quantify. Fortunately for this data set, certain columns are redundant and can serve as checks:
for example, particle density, average energy per particle, and energy density must be mutually
consistent, as must particle density, flow speed, and flux.
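The redundancy follows from the definitions of the moments: energy density equals particle density
times average energy per particle, and flux equals particle density times flow speed. A minimal
sketch of such a check is given below. The column names, units (cm^-3, eV, km/s, cm^-2 s^-1), and
the 5% tolerance are assumptions for illustration only, not the actual layout or units of the
LANL listings.

```python
# Hypothetical units (not confirmed from the LANL listings):
#   n  particle density            [cm^-3]
#   E  average energy per particle [eV]
#   U  energy density              [eV cm^-3]
#   v  flow speed                  [km/s]
#   F  particle flux               [cm^-2 s^-1]

KM_TO_CM = 1.0e5  # converts km/s to cm/s so that n * v is in cm^-2 s^-1


def relative_mismatch(measured: float, expected: float) -> float:
    """Fractional difference between a listed value and the value implied by other columns."""
    if expected == 0.0:
        return 0.0 if measured == 0.0 else float("inf")
    return abs(measured - expected) / abs(expected)


def check_record(n: float, E: float, U: float, v: float, F: float, tol: float = 0.05) -> list:
    """Return the names of the redundancy relations a record violates.

    tol is a hypothetical tolerance that allows for rounding in the printed listing.
    """
    problems = []
    # Energy density should equal density times average energy per particle.
    if relative_mismatch(U, n * E) > tol:
        problems.append("energy_density")
    # Flux should equal density times flow speed, after converting km/s to cm/s.
    if relative_mismatch(F, n * v * KM_TO_CM) > tol:
        problems.append("flux")
    return problems


# A consistent record passes; a record whose energy density was misread (5 read as 8) is flagged.
print(check_record(0.5, 1000.0, 500.0, 300.0, 1.5e7))
print(check_record(0.5, 1000.0, 800.0, 300.0, 1.5e7))
```

A check of this kind cannot say which of the three columns was misread, only that one of them
probably was, so flagged records would still need comparison against the film frame.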

The ASCII files returned by the vendor corresponded one-to-one with the input microfilm frames.
Each file typically held about 13 minutes of data, plus the blank lines and column headers of the
original computer printout. NSSDC has created a more convenient set of one-day files without blank
lines or column headers. In the process, NSSDC also moved the statistical uncertainties of the
parameters, which appear on the film frames in parentheses immediately after each parameter value,
to the end of each record. These new files are FTP-accessible from NSSDC at
ftp://nssdc.gsfc.nasa.gov/spacecraft_data/imp/.
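The reformatting described above can be sketched roughly as follows. This is not NSSDC's actual
procedure. It assumes a hypothetical record layout in which each uncertainty is written directly
after its value with no space (e.g. "0.52(0.03)"), data records begin with a numeric field, and
the first field identifies the day. The real listings may differ on every one of these points.

```python
import re
from collections import defaultdict

NUMBER = r"[-+]?\d*\.?\d+(?:[eE][-+]?\d+)?"
# Hypothetical token format: a value immediately followed by "(uncertainty)".
VALUE_WITH_UNC = re.compile(rf"^({NUMBER})\(({NUMBER})\)$")


def is_data_record(line: str) -> bool:
    """Assume data records start with a numeric field; blank lines and headers do not."""
    tokens = line.split()
    return bool(tokens) and tokens[0].isdigit()


def move_uncertainties_to_end(line: str) -> str:
    """Rewrite 'a b(db) c(dc)' as 'a b c db dc', keeping the original field order."""
    values, uncertainties = [], []
    for token in line.split():
        match = VALUE_WITH_UNC.match(token)
        if match:
            values.append(match.group(1))
            uncertainties.append(match.group(2))
        else:
            values.append(token)  # fields without an uncertainty, e.g. day and time
    return " ".join(values + uncertainties)


def build_daily_records(frame_texts) -> dict:
    """Merge per-frame text (one string per frame file) into per-day lists of records.

    Assumes (hypothetically) that the first field of each record identifies the day.
    """
    days = defaultdict(list)
    for text in frame_texts:
        for line in text.splitlines():
            if not is_data_record(line):
                continue  # drop blank lines and repeated column headers
            record = move_uncertainties_to_end(line)
            days[record.split()[0]].append(record)
    return dict(days)


frame1 = "DAY HHMMSS DENSITY FLUX\n\n79058 001230 0.52(0.03) 1.6E7(0.2E7)\n"
frame2 = "DAY HHMMSS DENSITY FLUX\n79058 001300 0.50(0.03) 1.5E7(0.2E7)\n79059 000010 0.61(0.04) 2.0E7(0.3E7)\n"
for day, records in build_daily_records([frame1, frame2]).items():
    print(day, records)
```

Each per-day list could then be written out as one file. Moving the uncertainties to fixed
positions at the end of the record gives every column a simple whitespace-delimited layout that is
easy to read with standard tools.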

This Los Alamos magnetotail electron and proton data set, obtained by IMP 8 at ~35 Re, is unique
and valuable. It complements the IMP 8 Low Energy Proton and Electron Differential Energy Analyzer
(LEPEDEA) distribution function and moments data set, whose FTP accessibility was announced in the
last NSSDC News. On 11 days in 1979 and 1980, LEPEDEA data and the currently digitized LANL data
are both available. (LEPEDEA data have since been made accessible through FTP Helper; interest in
the LANL data could lead to the same treatment for them.)

Users who would like NSSDC to attempt conversion of the LANL microfilm data for other time
intervals, or of any other NSSDC-held data sets of numeric listings on microfilm (see the NSSDC
Master Catalog), should contact NSSDC.