Exploratory Data Analysis

Mapping Data: Spatial Analysis and GIS

An important step in a causal analysis is to define and map the spatial extent of your study area. A map of the study area can help identify other sources of data, facilitate exploratory data analysis, and highlight samples in which spatial autocorrelation may be an issue. Being able to combine data from many different sources is both a strength and a weakness of using a geographical information system (GIS) to produce a map (Waller and Gotway 2004). Brewer (2006) provides some basic principles for mapping data in GIS. Some common questions to ask when mapping your study area include:

1. In what watershed does the study occur?

The Watershed Boundary Dataset (WBD), provided by the U.S. Department of Agriculture’s Natural Resources Conservation Service on the Geospatial Data Gateway, contains the hierarchy and areas for the six nested levels of hydrologic units (region, subregion, basin, subbasin, watershed, and subwatershed). The numbering scheme for the hydrologic units increases by two digits per level, beginning with a two-digit hydrologic unit for region and ending with a twelve-digit hydrologic unit for subwatershed. The WBD also describes different types of hydrological modification, such as stormwater ditches, levees, and navigation canals, at the watershed and subwatershed scales. Such modifications may be candidate causes to consider in the analysis.
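Because each level adds two digits, the code of a subwatershed already contains the codes of every unit that encloses it. The sketch below illustrates that nesting. The function name split_huc and the example code are hypothetical placeholders chosen for illustration; the example code has not been checked against the WBD.

```python
# Names of the six nested WBD levels, keyed by the number of digits in the code.
HUC_LEVELS = {
    2: "region",
    4: "subregion",
    6: "basin",
    8: "subbasin",
    10: "watershed",
    12: "subwatershed",
}


def split_huc(code: str) -> dict:
    """Return the enclosing hydrologic unit code at each level for `code`.

    `code` must be a string of 2, 4, 6, 8, 10, or 12 digits. It is kept as a
    string so that leading zeros are preserved.
    """
    if not (code.isascii() and code.isdigit()) or len(code) not in HUC_LEVELS:
        raise ValueError(f"not a valid hydrologic unit code: {code!r}")
    # Each enclosing unit is simply a prefix of the full code.
    return {
        name: code[:digits]
        for digits, name in HUC_LEVELS.items()
        if digits <= len(code)
    }


if __name__ == "__main__":
    # Placeholder 12-digit code, used only to show the prefix structure.
    for level, unit in split_huc("050600010402").items():
        print(f"{level:>12}: {unit}")
```

Storing the codes as text rather than as numbers matters in practice, because a code that begins with zero loses its first digit when read as an integer.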

2. What rivers and streams flow through the study area?

NHDPlus is a geospatial dataset that provides the locations of streams and rivers and incorporates elements from the National Hydrography Dataset (NHD), the National Elevation Dataset (NED), the National Land Cover Dataset (NLCD), and the WBD. The U.S. Geological Survey web site StreamStats provides stream-flow statistics and drainage-basin characteristics.

3. In what ecoregion does the study area occur?

An ecoregion is an area whose environmental resources, such as vegetation, climate, soils, and geological substrate, are similar. Regions with similar topography, climate, and geology are expected to have water bodies that are similar in hydrology and water chemistry. Knowing the ecoregion may allow you to compare the measurements in your study area to measurements from other water bodies in a relevant region, or to select the data to be included in exposure-response modeling. Descriptions of the ecoregions and data on ecoregions can be downloaded at the National Atlas web site.

6. What software is available?

A variety of geographical information system (GIS) software can be used, including ArcMap (ESRI), R (CRAN Task View: Analysis of Spatial Data), MapWindow (MapWindow Open Source GIS), and the Geographic Resources Analysis Support System (GRASS GIS). Analysts handling spatial data will need a working knowledge of GIS software so that they can perform basic GIS operations such as a spatial query, layering of several different spatial datasets, and buffering. Waller and Gotway (2004) cover some of the fundamentals of using GIS.
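To make buffering and spatial queries concrete, the sketch below selects the sampling sites that fall within a fixed distance of a point of interest, using only the Python standard library. It is a simplified illustration, not a substitute for GIS software: it assumes point locations in decimal degrees, a spherical Earth, and a circular buffer. The function names, site names, and coordinates are all hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; assumes a spherical Earth


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two points in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within_buffer(center, sites, radius_km):
    """Spatial query: names of the sites inside a circular buffer around `center`.

    `center` is a (lat, lon) pair and `sites` maps a site name to a (lat, lon) pair.
    """
    lat0, lon0 = center
    return [
        name
        for name, (lat, lon) in sites.items()
        if haversine_km(lat0, lon0, lat, lon) <= radius_km
    ]


if __name__ == "__main__":
    # Hypothetical coordinates, invented for illustration only.
    point_source = (40.58, -83.13)
    sampling_sites = {
        "site_A": (40.59, -83.13),  # about 1 km from the point source
        "site_B": (40.70, -83.13),  # about 13 km from the point source
    }
    print(within_buffer(point_source, sampling_sites, radius_km=5.0))
```

GIS packages generalize the same idea to lines and polygons, for example by buffering a stream reach rather than a single point, and they handle map projections that this sketch ignores.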

Figure 1. Little Scioto River Map

Norton et al. (2002) and Cormier et al. (2002) performed a causal analysis on the Little Scioto River, near Marion, Ohio, and we have updated that map (Figure 1) using some of the GIS datasets described above. Besides data on location, these shapefiles also contain other information that may be helpful for causal analysis. For example, the NHD dataset contains the reach code, or reach address. The reach code consists of the 8-digit hydrologic unit number followed by a 6-digit arbitrarily assigned sequence of numbers. This reach code is referenced in data provided by other U.S. EPA programs, such as Impaired Waters and Fish Consumption Advisories (the Reach Address Database). The locations of facilities and sites in relevant subwatersheds that are subject to environmental regulation, along with information about them, can be obtained from the U.S. EPA Geospatial Data Access Project. For the Little Scioto, the City of Marion’s wastewater treatment plant was identified as a relevant point source using this dataset. Finally, the locations of recent samples collected by the Ohio EPA were added to the map.
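Since the reach code embeds the 8-digit hydrologic unit, it can be split to link a reach back to its hydrologic unit when joining datasets. A minimal sketch follows, assuming the reach code is stored as a 14-character string of digits. The names ReachCode and parse_reach_code are hypothetical, and the example value is a placeholder rather than a real reach.

```python
from typing import NamedTuple


class ReachCode(NamedTuple):
    hydrologic_unit: str  # 8-digit hydrologic unit number
    sequence: str         # 6-digit arbitrarily assigned sequence


def parse_reach_code(code: str) -> ReachCode:
    """Split a 14-digit reach code into its hydrologic unit and sequence parts."""
    if len(code) != 14 or not (code.isascii() and code.isdigit()):
        raise ValueError(f"expected a 14-digit reach code, got {code!r}")
    return ReachCode(hydrologic_unit=code[:8], sequence=code[8:])


if __name__ == "__main__":
    # Placeholder reach code, used only to show the two components.
    reach = parse_reach_code("05060001000123")
    print(reach.hydrologic_unit, reach.sequence)
```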