EAST-COAST DATABASE

Database Formats

Our objective was to gather all of the available bottom-sediment grain-size
data produced by the Woods Hole Coastal and Marine Science Center of the
U.S. Geological Survey into a scientifically edited database that allows
scientists, policy makers, and others to manipulate, query, and display
the original data themselves for their own specific applications. The
database formats therefore had to be comprehensive, yet simple for both
entering and extracting data.

The basic structure of the database is a matrix in which records are rows
representing individual samples and the columns contain information on
sample identification, navigation, classifications, analyzed parameters,
and comments. This is a "flat file" format, which means that it is not
"normalized". Although this format is considered inefficient from the
point of view of database management, it is the simplest way of presenting
the basic data. This structure was chosen to avoid ambiguity and to make
locating fields, entering data, and validating data as simple as possible
while remaining comprehensive. Because we know neither the software
capabilities of the user nor the probable uses that may be made of the
data, we have made no attempt to split the files to reduce blank fields
or remove redundancies. The same data may be presented in more than one
form, for example, phi-class frequencies and cumulative frequencies. Even
though each form can be derived from the other, presenting both spares the
user from programming formulas to calculate one from the other. Although
this violates the principle of having a single entry for any given data
item, it greatly simplifies use of the file. If users wish to make the
database more efficient through "normalization", we feel that they are
better placed to do so themselves, in a way that fits both the
applications available to them and the database structural logic with
which they are familiar. The price paid for the "flat file" approach is
additional storage space, rather wide records, and the possibility that
corrections made here at the source may not be carried through to every
form of the data affected. We have made every effort to prevent this.
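
The relationship between the two redundant forms mentioned above can be sketched in a few lines. This is a minimal illustration, not the procedure used to build the database: it assumes the class frequencies are weight percents listed from the coarsest phi class to the finest, so that each cumulative value is the running total of the classes up to that point. The function names and the frequency values are hypothetical.

```python
from itertools import accumulate


def cumulative_frequencies(class_freqs):
    """Running total of phi-class weight percents.

    Assumes classes are ordered coarsest to finest (hypothetical convention).
    """
    return list(accumulate(class_freqs))


def class_frequencies(cum_freqs):
    """Recover individual class frequencies by differencing the running total."""
    previous = [0.0] + list(cum_freqs[:-1])
    return [c - p for p, c in zip(previous, cum_freqs)]


# Hypothetical weight-percent frequencies for five successive phi classes.
freqs = [5.0, 15.0, 30.0, 35.0, 15.0]
cum = cumulative_frequencies(freqs)
print(cum)                     # [5.0, 20.0, 50.0, 85.0, 100.0]
print(class_frequencies(cum))  # [5.0, 15.0, 30.0, 35.0, 15.0]
```

Because the conversion runs in both directions, storing both forms adds no new information; it only saves the user from writing this step.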

The database presented here contains 58 fields (please refer to the Data Dictionary). The specific fields and parameters were chosen based on the data produced by the Sedimentation Laboratory at the Woods Hole Coastal and Marine Science Center and on the format of information typically found in the literature. Because the data come from numerous projects, the amount and type of information vary. Most samples, or sets of samples, do not have data in every field; however, additional fields, qualifiers, and data can be added virtually without limit to accommodate specific needs.

The database itself is provided in three formats: comma-delimited ASCII text (.csv), Microsoft Excel 2010 (.xls), and Esri shapefile (.shp). The comma-delimited file contains the data, together with column headings, in uncompressed ASCII format. It is supplied for users who do not have a Windows-compatible computer and for users who wish to import the data into applications that accept ASCII text. The formatted files will open in the appropriate software if the user has the applications installed and a properly configured web browser. The database can be accessed through the Data Catalog section of this report.
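
As a rough illustration of reading the comma-delimited file, and of handling the blank fields that most records contain, the sketch below uses Python's standard `csv` module. It is only a sketch under stated assumptions: the column names and values in `SAMPLE_CSV` are hypothetical placeholders rather than the actual headings (those are defined in the Data Dictionary), and the helper functions are invented for this example.

```python
import csv
import io

# Hypothetical excerpt; real headings and values come from the Data Dictionary
# and the distributed .csv file, not from this example.
SAMPLE_CSV = """SAMPLE_ID,LATITUDE,LONGITUDE,GRAVEL,SAND,MUD,COMMENTS
A-1,41.52,-70.67,2.5,80.0,17.5,
A-2,41.55,-70.70,,,,"no analysis"
"""


def load_records(text):
    """Parse comma-delimited text, using the header row as field names.

    Blank fields are mapped to None so that missing data are explicit.
    """
    reader = csv.DictReader(io.StringIO(text))
    return [
        {key: (value if value and value.strip() else None) for key, value in row.items()}
        for row in reader
    ]


def populated_count(record):
    """Number of fields in a record that actually hold data."""
    return sum(value is not None for value in record.values())


records = load_records(SAMPLE_CSV)
for record in records:
    print(record["SAMPLE_ID"], populated_count(record))
# A-1 6
# A-2 4
```

For the distributed file, the same `csv.DictReader` call can be given a file object opened with `newline=""`; the file's character encoding is not stated here and would need to be checked.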