Introduction to Research Methods in Political Science:
The POWERMUTT* Project
(for use with SPSS)

*Politically-Oriented Web-Enhanced Research Methods for Undergraduates: Topics & Tools. Resources for introductory research methods courses in political science and related disciplines.

If we can calculate a mean score for one group of cases (such as the world's countries) on one variable, we can also (1) compare a group's mean scores on two or more different variables and, conversely, (2) compare the mean scores of two or more groups (e.g., countries in different regions of the world) on the same variable. The variables whose means are compared must, of course, be interval, though the variable defining the groups can, and usually will, be nominal or ordinal.

In this topic, we will first examine a technique called the t-test,[1] a measure of whether the difference between two mean
scores is statistically significant. The test is also called
"Student's t" because its inventor, William Gosset (1876-1937), wrote under the
pseudonym "Student." One version of this test (called a
paired-samples t-test) uses one group of cases and compares the group's scores
on two different variables. Another version (the independent-samples t-test) compares the scores of two different groups on the same variable.
The t-test is similar in concept to the chi-square test of statistical significance discussed in an earlier topic, except that it is more powerful because it uses data at a higher level of measurement. T-tests are often employed in experimental research, with paired-samples t-tests commonly used to compare pre- and post-experiment scores, and independent-samples t-tests used to compare two groups of subjects, such as a control group and an experimental group, on the same measure.

We will end the topic with a discussion of a simple form of a very powerful method called analysis of variance. Using one-way analysis of variance, we will compare several groups in terms of the same variable. By partitioning the distribution of scores into between-group variance and within-group variance, we will be able to measure the strength of the differences between groups using a proportional reduction in error measure called eta-squared (η²), and also determine whether the differences are statistically significant.

Paired-Samples t-Tests (comparing scores on two different variables for one group of cases)

How do people feel about the two major political parties? In the 2008
American National Election Study, respondents were asked to rate the Democratic
and Republican parties on a "feeling thermometer," on which 100
represented the warmest, or most favorable of feelings, and 0 the coldest, or
least favorable. When we run
a t-test to compare the means of the two variables (the
data are weighted using the “weight” variable), the result shows that the Democratic party comes out somewhat better. The
first table below shows a mean score for the Democratic Party of 56.87, compared to 48.15 for the Republican Party. Is this difference
sufficiently large that we can reject the null hypothesis that it is simply due
to random sampling error (that is, chance)? The figure in the last column
of the second table below helps us answer this question. The value of t
with 2,046 degrees of freedom (one less than the number of cases) has a
significance level of .000; that is, a difference this large would occur by chance less than one time in a thousand. The difference is clearly statistically
significant. Note: a two-tailed test of significance is used for
"non-directional" hypotheses, in which we suspect that there will be
a difference in scores, but don't know in advance of examining our results
which score will be higher. Normally, hypotheses are
"directional," and we have reason to predict not just that there will
be a difference, but also which score will be the higher one. To
calculate the one-tailed probability, simply divide the two-tailed result by
two. For example, if a relationship were significant at the .04 level using a two-tailed test, it would be significant at the .02 level using a one-tailed test. (Of
course, if the difference is not in the direction we predicted, then our
hypothesis is not confirmed regardless of the level of significance.)
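The same logic can be sketched outside SPSS. The Python snippet below (using SciPy) runs a paired-samples t-test and applies the one-tailed conversion just described; the thermometer scores are invented for illustration, not the actual ANES data:

```python
# Paired-samples t-test: two ratings from the same respondents.
# The scores below are hypothetical stand-ins for the ANES
# feeling-thermometer variables.
from scipy import stats

dem_therm = [70, 55, 60, 85, 40, 65, 75, 50]  # Democratic Party ratings
rep_therm = [50, 60, 45, 30, 55, 40, 35, 45]  # Republican Party ratings

t, p_two_tailed = stats.ttest_rel(dem_therm, rep_therm)

# For a directional hypothesis, halve the two-tailed p-value
# (valid only when the difference is in the predicted direction).
p_one_tailed = p_two_tailed / 2

print(f"t = {t:.3f}, two-tailed p = {p_two_tailed:.3f}, "
      f"one-tailed p = {p_one_tailed:.3f}")
```

Note that `ttest_rel` works on the differences between each pair of scores, which is why the degrees of freedom equal one less than the number of cases.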

Independent-Samples t-Tests (comparing scores on one variable for two different groups of cases)

In 2000, Ralph Nader, running as the Green party candidate for president,
won only about 97,000 votes in Florida (less than two percent of the total),
but these votes almost certainly cost Democrat Al Gore Florida's 25 Electoral
College votes and, with them, the election. By 2004, when Nader again ran
for president, many Democrats had developed bitter feelings toward him.
The 2004 American National Election Study asked respondents their feelings
about a number of prominent politicians, including Nader. On the one hand,
we might expect that most Democrats would be closer to Nader philosophically
than would most Republicans, and so would have warmer feelings about him.
On the other hand, there were the memories of the 2000 election. An
independent-samples t-test will enable us to compare Nader's scores from
respondents of both major parties.

Again weighting by the "weight" variable, we can see, first of all, that respondents of neither party had particularly
warm feelings for Nader, with Republicans averaging 40.88 and Democrats 39.47
(see first table below). For the independent-samples t-test, there are
two versions of the computational formula, depending on whether we assume that
the variances of the two scores are equal. (The technical name for equal
variance is homoscedasticity.) Before deciding which version to
use, we need to determine whether there is a statistically significant
difference (p < .05) between the variances of the two groups' scores. The
test for this uses an F ratio (a measure of statistical significance in the
same family of measures as t and chi-square). In this case, we can see
from the second table below that the F ratio is not statistically significant
(p=.326). We can therefore proceed to use the version of t that assumes
equal variances (though in this case, the results are almost identical
regardless of which version we use). Because we could in advance have made
a case either way as to whether Democrats or Republicans would have warmer
feelings about Nader, we will use a two-tailed test. We find that
the difference between Democrats and Republicans could have easily been due to
chance (p=.419). The relationship is not statistically significant.
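The two-step procedure (test for equal variances, then choose the appropriate version of t) can be sketched in Python with SciPy. SPSS reports Levene's test for the equal-variance check; the ratings below are invented for illustration:

```python
# Independent-samples t-test with a preliminary test for equal variances.
# Hypothetical thermometer ratings standing in for the ANES respondents.
from scipy import stats

dem_ratings = [45, 30, 50, 35, 40, 55, 25, 38]  # Democrats' ratings of Nader
rep_ratings = [50, 35, 42, 48, 30, 44, 36, 40]  # Republicans' ratings of Nader

# Levene's test for equality of variances (what SPSS reports
# alongside the independent-samples t-test).
levene_stat, levene_p = stats.levene(dem_ratings, rep_ratings)

# Choose the version of t accordingly: pooled (equal variances)
# or Welch's (unequal variances).
equal_var = bool(levene_p >= 0.05)
t, p = stats.ttest_ind(dem_ratings, rep_ratings, equal_var=equal_var)

print(f"Levene p = {levene_p:.3f}; t = {t:.3f}, two-tailed p = {p:.3f}")
```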

One-way Analysis of Variance and Eta-squared (η²) (comparing scores on one variable for several different groups of cases)

The independent-samples t-test is a special case of a more general method
that allows comparisons among more than two groups of cases. If we think
of group membership as an independent variable, and the interval or ratio
variable as a dependent variable, we might then ask whether the differences
between the groups are statistically significant, and how strong an indicator
group membership is of the value of the dependent variable. We can answer these questions with one-way
"analysis of variance" (ANOVA) and a related proportional reduction in error measure of association
called eta-squared (η²).

We will illustrate these ideas by comparing the Gross Domestic Product per
capita of countries in different regions of the world. Boxplots displaying this relationship are
shown in the following figure:

Obviously, there are major wealth differences between regions. At the same time, there are important differences within some
regions. European and North American
countries, while the most affluent overall, vary considerably in their wealth. While most Asian countries
are poor, there are a few outliers in this region that are at least as affluent as most
in Europe and North America. But just how good a predictor of wealth is region? Put another way, how much of the variance in wealth is between-region variance, and how much is within-region variance?

For an interval or ratio variable, our best guess as to the score of an
individual case, if we knew nothing else about that case, would be the
mean. The variance gives us a measure of the error we make when we use the mean as our guess, since the greater the variance, the less reliable a predictor the mean will be. For GDP per capita, we obtain the following
parameters for all countries taken together:

How much less will our error be in guessing the value of the dependent
variable (in this case, GDP per capita) if we know the value of the independent
variable (region)? We can calculate the
within-group variance in the same way that total variance is calculated, except
that, instead of subtracting each score from the overall mean, we subtract it
from the group mean (that is, the mean for the region in which the country is located). We can then
determine how much less variance there is about the group means than about the
overall mean.

The formula:

η² = (total variance - within-group variance) / total variance

provides us with the familiar proportional reduction in error. Eta-squared thus belongs to the same “PRE”
family of measures of association as Lambda, Gamma, Kendall's tau, and others.

Recall that variance is the sum of squared deviations from the mean divided
by N (the number of cases). The “Sum of
Squares” numbers in the ANOVA table refer to the sums of squared deviations
from the mean. They are, in other words, the same as the between-groups, within-groups,
and total variances, except that they have not been divided by N. Since N is the same for each, we can omit
this last step. Eta-squared is then
calculated as follows:

η² = (51,659,357,824 - 34,689,556,406) / 51,659,357,824 = .328

In other words, by knowing the region in which a country is located, we can reduce the error we make in guessing
its GDP/capita by 32.8 percent.
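The calculation can also be written out directly. The helper below (plain Python, no SPSS needed) computes eta-squared from raw grouped scores, and the final line reproduces the arithmetic above from the quoted sums of squares:

```python
def eta_squared(groups):
    """(SS_total - SS_within) / SS_total: the proportional reduction in error."""
    scores = [x for group in groups for x in group]
    grand_mean = sum(scores) / len(scores)
    # Total sum of squares: deviations from the overall mean.
    ss_total = sum((x - grand_mean) ** 2 for x in scores)
    # Within-group sum of squares: deviations from each group's own mean.
    ss_within = sum(
        (x - sum(group) / len(group)) ** 2
        for group in groups
        for x in group
    )
    return (ss_total - ss_within) / ss_total

# Reproduce the GDP-per-capita result from the sums of squares in the text:
print(round((51_659_357_824 - 34_689_556_406) / 51_659_357_824, 3))  # 0.328
```

Because N divides both the numerator and the denominator, working with sums of squares rather than variances gives the same η², exactly as the text notes.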

We can also test the statistical significance of this measure
using the F ratio (assuming that we wish to calculate statistical significance
for population data such as that in the "countries" file). In
this case, differences between regions as large as these would occur by chance less than
one time in a thousand. (See the last column of the first table above.) If
only two groups are being compared, the F test is mathematically
equivalent to the t-test: F equals t squared, and the two yield identical significance levels.
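This equivalence is easy to verify numerically. In the SciPy sketch below (made-up scores), the one-way ANOVA F statistic equals the square of the equal-variances t statistic, and the two tests return the same significance level:

```python
# For two groups, one-way ANOVA and the equal-variances t-test coincide.
from scipy import stats

group_a = [12, 15, 14, 10, 13, 16]  # hypothetical scores, group A
group_b = [18, 17, 20, 15, 19, 16]  # hypothetical scores, group B

t, p_t = stats.ttest_ind(group_a, group_b)  # equal-variances t-test
f, p_f = stats.f_oneway(group_a, group_b)   # one-way ANOVA

print(f"t^2 = {t**2:.4f}  F = {f:.4f}")     # the two statistics coincide
```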

The following topic, regression (ordinary least squares), introduces another, even more powerful way to analyze variance.

2. Using the same dataset, and again weighting cases by the
"weight" variable, do independent-samples t-tests to compare the
feelings of Democrats and Republicans toward Joe Biden and Sarah Palin. Using boxplots, display these same relationships graphically.

3. Using the same dataset, and again weighting cases by the
"weight" variable, do comparison of means tests, requesting ANOVA and eta-squared (η²) to see how well party (use “partyid7”) and ideology explain respondents’ scores on the various
“feeling thermometers” included in the file. Using boxplots, display these same relationships graphically.

4. Open the senate.sav file and the senate codebook. Do
a comparison of means test between the voting records of Democrats and Republicans, requesting eta-squared (η²). Repeat with senators’ gender and with
the region of the state they represent as your independent variables. Which independent variable does the best job
of explaining voting record? Using boxplots, display these same relationships graphically.

[1] For the various formulas used to compute t-tests, see Ying (Joy) Zhang, "Confidence Interval and the Student's T Test," http://projectile.sv.cmu.edu/research/public/talks/t-test.htm#types. Note: what we have, using SPSS's terminology, called "paired-samples t-tests," Zhang calls "paired t-tests," and what we have called "independent-samples t-tests," he calls "unpaired t-tests." He also describes "one-sample" t-tests, a subject not covered in POWERMUTT.