The Essential Elements of a Geographic Information System: An Overview

As we said in the first chapter, an information system is fundamentally an
end-to-end system, which deals with the flow of data and information from its
primary sources to the derived information and its ultimate uses. Geographic
information systems are designed to handle information regarding spatial
locations. In this chapter, we will introduce the essential functional
components of a GIS, and will discuss some key concepts in geography and
geographic data processing.

3.1 GIS Functional Elements

There are five essential elements that a
GIS must contain (Figure 3.1; based on the discussion in Knapp, 1978): data
acquisition, preprocessing, data management, manipulation and analysis, and
product generation. For any given application of a geographic information
system, it is important to view these elements as a continuing process. We will
introduce each of the elements in this chapter, and will examine each in greater
detail later in this text. As a guiding principle, the analyst should develop
an end-to-end model of the task at hand. Even when the precise details of the
steps to be taken may depend on the results of intermediate calculations and
analyses, an explicit outline of the process, like a working hypothesis in a
scientific experiment, can be very valuable.

Data acquisition is the process of identifying and gathering the data
required for your application. This typically involves a number of procedures.
One procedure might be to gather new data by preparing large-scale maps of
natural vegetation from field observations, or by contracting for aerial
photography. Other kinds of surveys may be required to determine, for example,
consumer satisfaction and preferences in different parts of a city to help
locate new business offices. Other procedures for data acquisition may include
locating and acquiring existing data, such as maps, aerial and ground
photography, surveys of many kinds, and documents, from archives and
repositories.

One must never underestimate the
costs (in time as well as money) of the data-acquisition phase. A GIS is of no
use to anyone until the relevant data have been identified and located.
Furthermore, the accuracy of the decisions reached through spatial analysis is
limited by the accuracy and precision of the underlying datasets. We often know
too little about the underlying quality of many kinds of spatial data. At
times, however, we may be forced to use maps and other datasets whose
underlying quality is unknown. And without spending some effort ensuring that
various datasets are not only relevant but also reliable, we run the risk of
fooling ourselves.

Preprocessing involves
manipulating the data in several ways so that they can be entered into the
GIS. Two of the principal tasks of preprocessing are data format conversion
and identifying the locations of objects in the original data in a systematic
way. Converting the format of the original data often involves extracting information
from maps, photographs, and printed records (such as demographic reports) and
then recording this information in a computer database. This process is a
time-consuming and costly effort for many
organizations. This is particularly (and sometimes painfully) true when one
calculates the costs of converting large volumes of data, based on paper maps and transparent
overlays, to an automated GIS based on computerized datasets. We will
discuss aspects of the process in section 6.1.

A second key task of the
preprocessing phase is to establish a consistent system for recording
and specifying the locations of objects in the datasets. When this task is
completed, it is possible to determine the characteristics of any specified
location in terms of the contents of any data layer in the system. During these
processes, it is very important to establish specific quality control criteria
for monitoring the operations during the preprocessing phase so that the
databases can be of maximum value to the user.

Data-management functions govern the
creation of, and access to, the database itself. These
functions provide consistent methods for data entry, update, deletion, and
retrieval. Modern database management systems isolate the users from the
details of data storage, such as the particular data organization on a mass
storage medium. When the operations of data management are executed well, the
users usually do not notice. When they are done poorly, everyone notices: the
system is slow, cumbersome to use, and easy to disrupt. Under these latter
circumstances, the smallest human and machine errors create large problems for
both the users and the system operators.
Data-management concerns include issues of security. Procedures must be in
place to provide different users with different kinds of access to the system
and its database. For example, database update may be permitted only after a
control authority has verified that the change is both appropriate and correct.

Manipulation and analysis are
often the focus of attention for users of the system. Many users
believe, incorrectly, that this module is all that constitutes a geographic
information system. In this portion of the system are the analytic operators
that work with the database contents to derive new information. For example, we
might specify a region of interest and request that the average slope of the
area be calculated, based on the contours of elevation that have already been
stored in the GIS database. Since no single system can encompass the complete
range of analytic capabilities a user can imagine, we must have specific
facilities to be able to move data and information between systems. For
example, we may need to move data from our GIS to an external system
where a particular numerical model is available, and then transport the derived
results back into the spatial database inside the GIS. This kind of modularity,
where other data processing and analysis systems can be linked to a GIS, is
very valuable in many circumstances, and permits the system to be easily
extended over time by pairing it with other analytic tools. When one speaks of geoprocessing, one is often focused on the
manipulation and analysis components of a GIS.
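
As a rough sketch of the average-slope request described above, the following Python fragment assumes that the stored contours have already been interpolated to a regular grid of elevations (a step not shown here). The function name, the grid layout, and the `region` argument are illustrative assumptions, not part of any particular GIS.

```python
import math

def average_slope_degrees(elev, cell_size, region):
    """Mean slope in degrees over the cells of `region`.

    elev      -- list of rows of elevations (assumed regular grid)
    cell_size -- ground distance between adjacent cells, same units as elev
    region    -- set of (row, col) cells forming the region of interest
    Cells on the grid edge are skipped because central differences
    need a neighbor on each side. Returns None if no cell qualifies.
    """
    slopes = []
    for r, c in region:
        if 0 < r < len(elev) - 1 and 0 < c < len(elev[0]) - 1:
            dz_dx = (elev[r][c + 1] - elev[r][c - 1]) / (2 * cell_size)
            dz_dy = (elev[r + 1][c] - elev[r - 1][c]) / (2 * cell_size)
            gradient = math.hypot(dz_dx, dz_dy)
            slopes.append(math.degrees(math.atan(gradient)))
    return sum(slopes) / len(slopes) if slopes else None
```

Central differences are only one of several ways to estimate slope from a grid; an operational system may use a different estimator.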

Product generation is the
phase where final outputs from the GIS are created. These output products might
include statistical reports (such as a table listing the average population
densities for each county in California, or a report indicating landowners who
are delinquent in their property taxes), maps (for example, a presentation of
the property boundaries of plots within a township that are owned by public
agencies, or a map of a subdivision indicating where construction workers must
be careful when digging due to the presence of underground pipes and cables),
and graphics of various kinds (such as a set of bar charts that compare the
acreage of different crop types in an area). Some of these products are soft
copy images: these are transient images on television-like computer
displays. Others, which are durable since they are printed on paper and film,
are called hard copy. Increasingly, output products include
computer-compatible materials: tapes and disks in standard formats for storage
in an archive or for transmission to another system. The capability of taking
the output of an analytic process, and placing it back into the geographic
database for future analysis, is extremely important.

These essential components of a
geographic information system are the same as those of any other information
system. Let us compare this sequence of functional elements to a more
conventional information system problem. Consider the steps that are taken in
an automated system to manage employee records for a business. Information
about the individuals must be gathered together, perhaps via a questionnaire
and interview when the individual is hired. This is clearly the data
acquisition phase. Then, because some of the information is inevitably
expressed by different people in different ways (for example, some people will
list their education as "through grade 12", while others will say
"through high school"), the data must be put into a consistent
vocabulary and format. Only after this preprocessing phase can the data be entered into the computer in a consistent form.
Validation of the data entered into the system is a fundamental part of the
preprocessing phase, to ensure the accuracy of the resulting database.
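
The vocabulary problem in the employee example can be sketched in a few lines of Python. The synonym table and the function name below are hypothetical; a real system would need a much larger, carefully reviewed table.

```python
# Hypothetical lookup table mapping free-text answers to one agreed term.
EDUCATION_SYNONYMS = {
    "through grade 12": "high school",
    "through high school": "high school",
}

def normalize_education(raw):
    """Return the agreed term for a free-text education entry.

    Case and extra spaces are ignored. Unknown entries raise an error
    so that a person can review them, which is a simple form of the
    validation step described above.
    """
    key = " ".join(raw.lower().split())
    if key not in EDUCATION_SYNONYMS:
        raise ValueError("unrecognized education entry: " + repr(raw))
    return EDUCATION_SYNONYMS[key]
```

Rejecting unknown entries, rather than storing them unchanged, keeps inconsistent values out of the database at the cost of some manual review.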

Once the data have been converted into a
consistent form and put in the computer database, we have accomplished a large
fraction of the end-to-end task and often expended a large fraction of the
end-to-end costs. Data management functions permit us to update the
information when necessary (for example, when an employee completes an advanced
degree), and to retrieve only the relevant information when required (as in a
summary report of salaries for a particular division of the company). Various
kinds of analytical operations can be run--perhaps using employee
addresses to find out which employees live close to one another in an effort to
encourage car pooling. Finally, we need to be able to develop statistical
reports, graphics of many kinds, and other output products, such as
documentation for management reviews of salary levels. These steps exactly
parallel the five GIS components we will discuss in detail.
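
The car-pooling analysis mentioned above is a simple spatial query. The sketch below assumes that each employee's address has already been converted to a latitude and longitude (that conversion is not shown), and it uses the haversine great-circle formula with a mean Earth radius of 6371 km. The function names and the distance threshold are illustrative.

```python
import math
from itertools import combinations

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an approximation

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def carpool_pairs(homes, max_km):
    """Return name pairs whose homes lie within max_km of each other.

    homes -- dict mapping employee name to (latitude, longitude)
    """
    pairs = []
    for (name_a, pos_a), (name_b, pos_b) in combinations(sorted(homes.items()), 2):
        if haversine_km(pos_a[0], pos_a[1], pos_b[0], pos_b[1]) <= max_km:
            pairs.append((name_a, name_b))
    return pairs
```

Straight-line distance is only a first approximation for car pooling; travel distance along the street network would usually be more relevant.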

3.2 Data in a GIS

It is important to understand the
different kinds of variables that can be stored in any information
system. Nominal variables are those which are described by name, with no
specific order. Categories of land use (such as parks, wilderness areas,
residential districts, and central business districts) and trees (such as Eucalyptus
calophylla, Pinus coulteri, and Quercus agrifolia) are different kinds of nominal variables.
These are common in many kinds of thematic maps. Ordinal variables are
lists of discrete classes, but with an inherent order. Classes of streams
(first order, second order, and so forth; referring to the number of
tributaries which contribute to the stream) or levels of education (primary,
secondary, college, post-graduate) are ordinal variables since the discrete
classes have a natural sequence. Interval variables have a natural
sequence, but in addition, the distances between the values have
meaning. Temperature measured in degrees Celsius is an interval variable, since
the distance between 10°C and 20°C is the same as the distance between 20°C and 30°C. Finally, ratio variables have the same
characteristics as interval variables, but in addition, they have a natural
zero or starting point. Since degrees Celsius is a measurement with an
arbitrary zero point, the freezing point of pure water, it fails the latter
test. Temperature on the Kelvin scale, since it is based on an absolute standard, is a ratio
variable. Per capita income, the fraction of the weight of a soil sample that
passes through a specified sieve, and rainfall per
month are common ratio variables.
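
The practical consequence of these four levels is that each one permits more arithmetic than the one before. The following Python sketch summarizes this; the class, table, and function names are our own illustrative choices. It also works the temperature example numerically, using the standard conversion K = °C + 273.15.

```python
from enum import IntEnum

class Level(IntEnum):
    NOMINAL = 1
    ORDINAL = 2
    INTERVAL = 3
    RATIO = 4

# The comparison that first becomes meaningful at each level.
NEW_OPERATION = {
    Level.NOMINAL: "equality",
    Level.ORDINAL: "ordering",
    Level.INTERVAL: "difference",
    Level.RATIO: "ratio",
}

def meaningful_operations(level):
    """Operations that make sense at this level and all lower ones."""
    return {NEW_OPERATION[lv] for lv in Level if lv <= level}

def celsius_to_kelvin(c):
    return c + 273.15

# Differences in Celsius are meaningful: both steps are 10 degrees.
assert (20.0 - 10.0) == (30.0 - 20.0)

# Ratios in Celsius are not: 20 / 10 gives 2.0, but 20 °C is not
# "twice as hot" as 10 °C. On the Kelvin scale the ratio is about 1.035.
naive_ratio = 20.0 / 10.0
kelvin_ratio = celsius_to_kelvin(20.0) / celsius_to_kelvin(10.0)
```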

In addition to these four kinds of
data, there are two different classes of data found in most geographic
information systems. Consider a simple object in space: a
water well. From the point of view of a GIS, the primitive but essential
piece of information to record about this water well is its location on the
Earth -- a data value pair such as longitude and latitude, thus storing the
simplest kind of spatial data. However, there may be a wide range of additional
information which is required for many applications. This might include the
depth of the well, the volume of water produced over a given period of time,
dates of pump tests, and temporal sequences of measurements of dissolved and
particulate matter in the water from the well. This second set of non-spatial
or attribute data, which is logically connected to the spatial data,
must not be forgotten. In many geographic information systems, there are tools
to both store and manipulate the non-spatial data along with the spatial data.
In some applications, as we will see, the volume of non-spatial data may
actually be larger than the volume of the spatial data, and the logical
connections between the spatial and non-spatial information may be very
important.
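
One way to picture the logical connection between spatial and attribute data is as two tables that share an identifier. The Python sketch below uses that arrangement for the water well; the identifier, coordinates, and attribute values are invented for illustration, and real systems organize this link in various ways.

```python
# Spatial data: well identifier -> (longitude, latitude). Values are made up.
well_locations = {
    "W-1": (-119.70, 34.42),
}

# Attribute data, keyed by the same identifier. Values are made up.
well_attributes = {
    "W-1": {
        "depth_m": 85.0,
        "pump_test_dates": ["1987-03-14", "1987-09-02"],
    },
}

def well_record(well_id):
    """Combine the location and the attributes of one well."""
    lon, lat = well_locations[well_id]
    record = {"well_id": well_id, "longitude": lon, "latitude": lat}
    # A well may have a location but no attributes recorded yet.
    record.update(well_attributes.get(well_id, {}))
    return record
```

Even in this small case the attribute record holds more values than the coordinate pair, which illustrates how the volume of non-spatial data can exceed that of the spatial data.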

A recent issue of The
American Cartographer (January, 1988), the journal of the American Congress
on Surveying and Mapping, proposes a standard for digital cartographic data.
This standard is based on entities in the real world, and a mechanism to
represent these entities in terms of objects in a database. Within this proposal
is a set of definitions of spatial objects, which we now paraphrase to explain
more of the vocabulary of geographic information systems. This brief discussion
also expands on the comments in Chapter 1 about different kinds of spatial
objects. One may divide the different kinds of spatial objects into three
classes, based on spatial dimensions of the objects.

A 0-dimensional object is a point that specifies a geometric location. From a
mathematician's perspective, a point is a primitive location with no areal extent. Points are used in a number of ways in both
computer graphic and digital cartographic data, as well as in a geographic
information system. They are commonly used to indicate features themselves,
such as the exact center of the water well mentioned above, the end of a
street, or the corner of a lot in a subdivision. Points are also used as a
reserved position for a label (such as a place name) or a symbol (such as an
airport or benchmark) on a map, or to carry information for the surrounding region
(such as who owns the region, or the color to be used when the region is
displayed). Points are also used to define more complex spatial objects, such
as lines and areas.

The simplest 1-dimensional object
is a straight line between two points. More complex forms of lines include
connected sets of straight lines (determined by the sequence of points at which
the path changes direction), curves which are based on mathematical functions,
and lines whose direction is specified. Particular sets of mathematical
functions are used to define curves in some disciplines, as in the functional
definition of the curve of a street used by a civil engineer. One advantage of
a directed line segment is that we have a way to distinguish which end is the
beginning of the line, and which end is the end. This may be particularly
valuable in circumstances as diverse as the analysis of flow in pipes (perhaps
indicating source and destination for flow in a potable water supply system) or
models of population flow between countries. When the line segments carry
information about direction, we are also able to distinguish the regions on the
left and right sides of the line. As we shall see later, this can be very
useful in a number of applications.
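
To make the left and right sides of a directed line concrete, here is a minimal Python sketch. It assumes planar coordinates with x increasing to the east and y increasing to the north, and it uses the sign of a two-dimensional cross product; the function name is illustrative.

```python
def side_of_directed_line(start, end, point):
    """Report on which side of the directed line start -> end a point lies.

    All arguments are (x, y) pairs in planar coordinates. A positive
    cross product means the point is to the left when looking from
    start toward end, a negative one means it is to the right.
    """
    cross = ((end[0] - start[0]) * (point[1] - start[1])
             - (end[1] - start[1]) * (point[0] - start[0]))
    if cross > 0:
        return "left"
    if cross < 0:
        return "right"
    return "on line"
```

For a line running east from (0, 0) to (1, 0), the point (0, 1) to the north is reported as "left". Reversing the direction of the line swaps the two sides, which is why the direction must be stored with the line.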

Finally, 2-dimensional objects are
areas, which also come in many forms. In a particular application, we may refer
to a bounded area, or focus on just the boundary, or just the region within the
boundary. The description of the area itself is normally based on the geometry
of the bounding line segments. The area may be either homogeneous or divided
internally, as discussed in Chapter 1. A distinction is often made between sets
of two-dimensional bounded regions, and true three-dimensional surfaces. In
some applications, an analysis based on a two-dimensional planimetric
representation of the Earth may be completely sufficient. We focus on these
kinds of applications in this introductory text.
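
Because an area is described by its bounding line segments, properties of the area can be computed from the boundary alone. As one example, the sketch below computes the planimetric area of a simple polygon (one whose boundary does not cross itself) from its vertices with the standard shoelace formula. It assumes planar coordinates, and the function name is our own.

```python
def polygon_area(vertices):
    """Planimetric area of a simple polygon.

    vertices -- list of (x, y) pairs in order around the boundary;
                the last vertex is joined back to the first.
    """
    twice_area = 0.0
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        twice_area += x1 * y2 - x2 * y1
    # The sign depends on whether the vertices run clockwise or
    # counterclockwise, so take the absolute value.
    return abs(twice_area) / 2.0
```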

The details of the connections
between spatial objects, such as the information about which areas bound a line
segment, are called topology. One of the
distinguishing features of some geographic information system databases is that
they have explicit mechanisms to store topology, as we shall see in Chapter 4.
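
As a small preview, one common way to store this kind of information is to record, for each directed line segment, the area on its left and the area on its right. The Python sketch below is an illustrative assumption about such a structure, not necessarily the form presented in Chapter 4; the record layout and names are hypothetical.

```python
from collections import namedtuple

# One record per directed line segment (arc). None stands for "outside".
Arc = namedtuple("Arc", ["arc_id", "from_node", "to_node", "left_area", "right_area"])

def neighbors(arcs, area):
    """Areas that share at least one arc with the given area."""
    result = set()
    for arc in arcs:
        if arc.left_area == area and arc.right_area is not None:
            result.add(arc.right_area)
        elif arc.right_area == area and arc.left_area is not None:
            result.add(arc.left_area)
    return result

# Two areas, A (west) and B (east), separated by arc 1 running north.
arcs = [
    Arc(1, "n1", "n2", "A", "B"),
    Arc(2, "n2", "n1", "A", None),
    Arc(3, "n1", "n2", "B", None),
]
```

With the connections stored explicitly, a question such as "which areas adjoin A?" is answered by scanning the arc records, without any geometric computation on coordinates.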

Cowen (1987) discusses a geographic
information system from several different points of view. The database approach
stresses the ability of the underlying data structures to contain complex
geographical data. The descriptions of spatial objects in the previous several
paragraphs take this view. In Chapter 4 we examine a number of common
alternatives to storing spatial data. The process-oriented approach
focuses on the sequence of system elements used by an analyst when running an
application -- the five components we discussed at the beginning of this
chapter follow this view. Chapters 5 through 9 in this text represent such an
approach. An application-oriented approach defines a GIS
based on the kinds of information manipulated by the system and the utility of
the derived information produced by the system. Chapter 12 presents a number of
uses of these spatial data processing systems, and clearly emphasizes this
view. A natural resources inventory system is an easily understood example of
this approach. Finally, a toolbox approach emphasizes the software
components and algorithms that should be contained in a GIS. We develop a
number of details from this point of view of a GIS in Chapters 6 and 8. Each of
these different points of view of a geographic information system is useful; we
recommend that the reader consider the differences between them during the
following discussions.