Introduction to Multivariate Data Analysis

We have devoted much attention to the essential characteristics
of a one-variable, or univariate, data set. We have considered
concepts and skills used to describe the characteristics of
location, spread, and shape. We have learned how to generate
measures of central tendency, measures of variation, and
descriptions of the shape of a distribution. Perhaps most
importantly, our emphasis has been on exploring the data and
generating visual and numeric representations to help determine
the very nature of a data set.

We now focus our attention on two-variable relationships, or
bivariate data sets. Each data value in a bivariate data set is
an ordered pair. Here is a bivariate data set. It
shows the total points and total personal fouls for members of two
professional basketball teams.

Does there appear to be any relationship between a player's
points scored and his personal fouls? When presented with data
sets with two or more variables, or multivariate data, a
common question is whether some relationship exists among the
variables.

As we extend our data exploration to
multivariate data sets, we begin to explore relationships between
the variables. Once again, we strive to describe three essential
characteristics of relationships: direction, strength, and
shape. Concepts and skills associated with these
characteristics can help us to effectively describe a multivariate
data set.

Exploring Relationships Through Scatter Plots

To explore the data, we will create scatter plots of
bivariate data. A scatter plot provides a first look at how two
sets of data may relate to each other. We create a scatter plot
with a traditional two-dimensional coordinate system, or what you
may call an xy-plane. The data pairs that make up the data set are
plotted as ordered pairs on the coordinate axes. Here is a
scatter plot of the fouls-points data pairs from the table of
values above.

Note these components of the scatter
plot.

The horizontal and vertical axes are labeled according to
the data being used.

The scales on each axis are clearly marked.

It is not required that the scales for each axis be
identical.

At times, we may begin one or both axes at non-zero
values.

All dots or points that represent data pairs are exactly
the same size.

If two or more data pairs are equal, indicate that by
writing the number of coinciding pairs as a numeral next to the
shared point on the scatter plot.
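
The last guideline, marking repeated data pairs with a numeral,
can be prepared before any plotting is done. The sketch below is
one possible approach in Python, using only the standard library.
The function name count_repeated_pairs and the sample values are
hypothetical; they are not the basketball data discussed above.

```python
from collections import Counter


def count_repeated_pairs(pairs):
    """Return {pair: count} for each ordered pair that occurs more than once."""
    counts = Counter(pairs)
    return {pair: n for pair, n in counts.items() if n > 1}


# Made-up (x, y) values for illustration only.
sample_pairs = [(2, 5), (3, 7), (2, 5), (4, 4), (3, 7), (2, 5)]

# Each repeated pair would be drawn once and labeled with its count.
labels = count_repeated_pairs(sample_pairs)
print(labels)
```

Each key in the result is a point that would be plotted once, and
its value is the numeral to write beside that point.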

A scatter plot gives us a visual display of the
bivariate data set. It may reveal characteristics of shape,
direction, and strength not apparent from the raw data. We now
look at examples to help describe each of these
characteristics.

The scatter plots that follow show relationships between pairs
of data sets. By the direction of the relationship, we describe
how the data pairs increase or decrease with respect to each
other. Figure (a) shows the speed of a bicycle during the first 60
feet of travel. It shows that as distance increases, so does
speed. We use the word positive to describe the
direction of the relationship between distance and speed,
because as the values in one data set increase, so do their
associated values in the other data set.

Figure (b) shows the temperature of an object that has been
placed in a freezer. We see that as time increases, the
temperature decreases. We use the word negative to describe
the direction of the relationship between time and
temperature, because as the values in one data set increase,
the associated values in the other data set decrease.

Some bivariate sets of data appear to have neither a positive
nor a negative relationship. That is the situation revealed in
figure (c) above. It shows the elevation above sea level and the
annual rainfall for several cities throughout the world. As
elevation increases, the associated rainfall values show neither
a corresponding increase nor a corresponding decrease. There
appears to be no conclusive direction to the relationship.
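
The idea of direction can also be checked numerically. The sketch
below compares every two data pairs and counts how often both
values rise together versus how often one rises while the other
falls. This is essentially the statistic known as Kendall's tau,
which this text has not introduced; it is offered only as one
possible illustration. The function names, the sample values, and
the 0.3 cutoff are all assumptions made for the example.

```python
from itertools import combinations


def direction_score(pairs):
    """Return (concordant - discordant) / comparisons, a value in [-1, 1]."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(pairs, 2):
        product = (x2 - x1) * (y2 - y1)
        if product > 0:
            concordant += 1  # both values moved in the same direction
        elif product < 0:
            discordant += 1  # the values moved in opposite directions
    total = len(pairs) * (len(pairs) - 1) // 2
    return (concordant - discordant) / total if total else 0.0


def describe_direction(pairs, threshold=0.3):
    """Label the direction; the 0.3 cutoff is an arbitrary assumption."""
    score = direction_score(pairs)
    if score >= threshold:
        return "positive"
    if score <= -threshold:
        return "negative"
    return "no clear direction"


# Hypothetical values, loosely modeled on the three situations described.
distance_speed = [(10, 4), (20, 7), (30, 9), (40, 12), (50, 13), (60, 15)]
time_temperature = [(0, 70), (10, 55), (20, 43), (30, 36), (40, 33)]
elevation_rainfall = [(1, 3), (2, 1), (3, 4), (4, 2)]

for data in (distance_speed, time_temperature, elevation_rainfall):
    print(describe_direction(data))
```

A score near 1 or -1 means nearly every comparison agrees on the
direction; a score near 0 means increases and decreases roughly
cancel out.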

For an overall look at the strength of the relationship of a
bivariate data set, we can apply to a scatter plot a tool called
the ellipse test. Examples are shown in the next plots. In
each case, an ellipse is used to fully capture the points shown
in the scatter plot. In figure (a), the ellipse is long and
narrow. Its major axis is much longer than its minor axis. In (b),
the major axis is longer than the minor axis, but not to the same
degree as in (a). In (c), however, the ellipse may be better
described as a circle, because the axes show little difference in
length.

The ratio of the major-axis length to the minor-axis length of
an ellipse that surrounds the data points of a scatter plot
provides a rough measure of the strength of the linear
relationship in a bivariate data set. The higher the ratio (that
is, the longer the major axis relative to the minor axis), the
stronger the relationship. As
the ratio approaches 1:1 (that is, as the lengths grow closer to
being equal to each other), the relationship grows weaker. When a
circle is required to capture the plotted points, the linear
relationship between the data pairs shows virtually no strength.
Direction of the relationship, too, is impossible to establish in
this case. Be aware, however, that these ratios can be distorted
by differences in the scales of the horizontal and vertical
axes.
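
The ellipse test is meant to be applied by eye, but a rough
numeric stand-in can illustrate both the ratio and the warning
about scales. The sketch below does not draw an ellipse that
captures every point. Instead, as a swapped-in technique, it
computes the axis ratio of the ellipse implied by the spread of
the data (the covariance of the two variables). The function name
ellipse_axis_ratio and the sample points are hypothetical.

```python
import math


def ellipse_axis_ratio(pairs):
    """Major-to-minor axis ratio of the covariance ellipse of the data."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs) / n
    syy = sum((y - mean_y) ** 2 for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs) / n

    # Eigenvalues of the 2x2 matrix [[sxx, sxy], [sxy, syy]] give the
    # variance along the major and minor axes of the ellipse.
    center = (sxx + syy) / 2
    spread = math.hypot((sxx - syy) / 2, sxy)
    major_var = center + spread
    minor_var = max(center - spread, 0.0)

    if major_var == 0.0:
        return 1.0  # all points coincide; no elongation at all
    if minor_var <= 1e-12 * major_var:
        return math.inf  # points lie on a line: infinitely thin ellipse
    return math.sqrt(major_var / minor_var)


# Made-up points for illustration only.
on_a_line = [(0, 0), (1, 2), (2, 4), (3, 6)]
round_cloud = [(1, 0), (0, 1), (-1, 0), (0, -1)]
# The same round cloud with the vertical values multiplied by 10.
rescaled_cloud = [(x, 10 * y) for x, y in round_cloud]

print(ellipse_axis_ratio(on_a_line))
print(ellipse_axis_ratio(round_cloud))
print(ellipse_axis_ratio(rescaled_cloud))
```

The last two data sets differ only in the units of the vertical
variable, yet the second ratio is far from 1:1. This is the
distortion the scale warning describes: a change of scale alone
can make an ellipse look long and narrow.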

Many of the relationships revealed in scatter plots appear to
be linear relationships, as illustrated below in plot (a),
and further analysis can often confirm that they are. Many
other relationships, however, do not seem to follow a straight
line when plotted. We have at our disposal many other mathematical
models to characterize the shape of relationships, shapes such as
quadratic (b), exponential (c), and periodic
(d), among many others.
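
To get a feel for these four shapes, it can help to generate a
few data pairs from each kind of model and plot them by hand. The
sketch below is one way to do so in Python. The function names
and every coefficient (slope, intercept, growth factor, period,
and so on) are arbitrary assumptions chosen for illustration.

```python
import math


def linear(x, slope=2.0, intercept=1.0):
    """Constant rate of change."""
    return slope * x + intercept


def quadratic(x, a=1.0, b=0.0, c=0.0):
    """Rate of change that itself changes at a constant rate."""
    return a * x ** 2 + b * x + c


def exponential(x, initial=1.0, growth=2.0):
    """Multiplies by a fixed factor for each unit increase in x."""
    return initial * growth ** x


def periodic(x, amplitude=1.0, period=4.0):
    """Repeats the same pattern every `period` units of x."""
    return amplitude * math.sin(2 * math.pi * x / period)


xs = [0, 1, 2, 3, 4]
models = {
    "linear": linear,
    "quadratic": quadratic,
    "exponential": exponential,
    "periodic": periodic,
}

# Print the ordered pairs each model produces for the same x-values.
for name, model in models.items():
    print(name, [(x, round(model(x), 2)) for x in xs])
```

Real data rarely fall exactly on any of these curves, so the
printed pairs show only the idealized shape that a scatter plot
might suggest.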

We have provided examples to help describe
what we mean by the direction, strength, and
shape of a relationship. To help reinforce these concepts,
draw a separate scatter plot to represent each of the following
relationships. In doing so, try to identify real-world contexts
that fit these conditions.

Draw a scatter plot to show a moderately strong positive
relationship with a constant rate of increase for
the data pairs.

Draw a scatter plot to show a weak positive
relationship where the data values represented on the
vertical axis increase much more quickly than the data
values represented on the horizontal axis.

Draw a scatter plot to show a perfect negative linear
relationship.

Draw a scatter plot to show a relationship for which
neither direction nor strength can be determined.