Substantively

The model and data are taken directly from Agresti, 1996: 207. The goal is to examine the effect
of the length of alligators on the type of food that they consume. At some point, I will try to find
a meaningful social science example. The length of alligators is measured in meters, and may be a
proxy for age, speed, and other unmeasured factors. The types of prey are classified as fish,
invertebrate, or other. A total of 59 alligators are observed, with 45 distinct scores on measured length.
The goal is to predict the effect of length on the probability of each of the three types of prey,
under the constraint that these three response probabilities must sum to unity.

Statistically

There are a number of alternative approaches to this problem. However, since the dependent
variable is truly categorical, an extension of the binomial sampling model to multiple outcomes is
reasonable. The response probabilities may be linked to the linear combination of predictors by a
variety of functions, but the logistic is a good choice (a common alternative might be the probit).

One could estimate the effects of length on the log odds of each pair of outcomes, or the effects
of length on the odds of each outcome against the other two pooled. There are a couple of problems
with this, however. First, since the equations of each approach are not truly independent (the
same data are used in more than one equation), the estimated standard errors and inferential
statistics may be too optimistic. Second, there is no guarantee that the estimated probabilities of
the three outcomes will sum to unity if the equations are estimated independently.

So, a better approach is to choose contrasts that will enable the estimation of the log odds of any
two of the three outcomes, and to derive the effects with regard to the third. It is also necessary
that the two equations for two outcomes be estimated simultaneously, to ensure consistency.
PROC CATMOD is a good tool for this kind of task.

In CATMOD, the standard approach (others are possible) is to define two equations for the
generalized logits of two outcomes with respect to the third, and to derive parameter estimates
simultaneously by ML. From the parameters of these two equations, it is possible to derive effects
of unit changes in independent variables on the probability of each of the three outcomes. These
effects are, by the nature of the logistic linking function, non-linear. However, they can be easily
understood by graphic methods or by calculation of elasticities.

Code

Comments

The length of the alligator in meters is input, followed by the qualitative variable indicating the
type of prey (Invertebrate, Fish, or Other). Qualitative codes are used for the levels of the
dependent variable in this case. It is often better to use numeric codes, and to use a PROC
FORMAT to assign labels, as controlling the order of the categories of variables in CATMOD is
sometimes troublesome.

The "response logits" statement tells CATMOD to model generalized logits. CATMOD
calculates the log odds of each category of the dependent variable relative to the last category of
the dependent variable. Two alternative response functions are ALOGIT, which calculates the log
odds of a category relative to the next highest (adjacent) category, and CLOGIT, which calculates
the log odds of a category relative to all lower categories. These latter two response functions
are normally used for the analysis of ordinal variables.
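
As a rough illustration of the difference between these response functions, the sketch below computes generalized logits and adjacent-category logits from a set of response probabilities. It follows the verbal descriptions above rather than CATMOD's internal code; the function names and the probabilities are hypothetical, and the exact ordering and sign conventions CATMOD uses should be checked against the SAS documentation.

```python
import math

def generalized_logits(probs):
    """Log odds of each category relative to the last category."""
    ref = probs[-1]
    return [math.log(p / ref) for p in probs[:-1]]

def adjacent_logits(probs):
    """Log odds of each category relative to the next (adjacent) category."""
    return [math.log(probs[i] / probs[i + 1]) for i in range(len(probs) - 1)]

# Hypothetical response probabilities for three categories (assumption).
probs = [0.5, 0.3, 0.2]

print(generalized_logits(probs))  # two logits, each against the last category
print(adjacent_logits(probs))     # two logits, each against the next category
```

With three categories both approaches yield two response functions, and the second of each pair is the same quantity (category two against category three).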

The "direct length" statement tells CATMOD that the variable length is to be treated as a
continuous variable (CATMOD, being a program for the analysis of categorical data, tends to
assume that all variables are CLASS, unless told otherwise).

The model statement simply defines the dependent and independent variable. A large number of
options are available to control the type of estimation (ML is the default for multinomial logits,
but not for everything CATMOD does). Here, we ask to see the predicted probability and
frequencies for each case. The purpose is to examine residuals, and to recover case probabilities.

Comments

CATMOD reports the number of populations (that is, unique scores on X) and the number of
observations. Next, the iteration history of the likelihood function and the parameter estimates are
reported. The final estimate of the -2 log likelihood should be noted, as it is useful in assessing
badness of fit, and improvement in fit. It would be useful if CATMOD would report the likelihood
results for a two parameter model (that is, intercept only equations for the two generalized logits),
which could then be used to assess the improvement due to adding the independent variables to
the model. The four parameters cited in the iteration history are the intercept and the slope of length
for each of the two generalized logits. A summary chi-square test is provided for the effect of the intercept
(with two degrees of freedom, as two intercepts are estimated) and -- more importantly -- the
independent variable, length. Here, we see that we can be reasonably confident that the effect of
length on one or the other, or both, of the log odds of consuming fish versus other or invertebrates
versus other prey is different from zero. That is, length does have an effect. The likelihood ratio statistic for
the overall model has 86 degrees of freedom (45 scores or levels of X are observed for each of
two logits, yielding 90 pieces of information, four of which are consumed by the estimation of the
four free parameters of the model). One cannot readily reject the hypothesis that the model fits
the data (that is, the differences between the predicted frequencies and the actual frequencies in
the 45x2 table are not so large that they could not have reasonably occurred by chance).
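
The degrees-of-freedom arithmetic, and the improvement-in-fit test that the intercept-only likelihood would make possible, can be sketched as follows. This is only an illustration: the function names are ours, and the two -2 log likelihood values are hypothetical placeholders, not figures from the CATMOD output. The p-value function relies on the fact that the upper tail of a chi-square distribution with exactly two degrees of freedom is exp(-x/2); two is the right number here because the full model adds two slopes to the intercept-only model.

```python
import math

def lr_degrees_of_freedom(n_populations, n_logits, n_parameters):
    """Response functions observed minus free parameters estimated."""
    return n_populations * n_logits - n_parameters

def improvement_chi_square(neg2ll_null, neg2ll_full):
    """Drop in -2 log likelihood from the intercept-only to the full model."""
    return neg2ll_null - neg2ll_full

def chi_square_p_value_2df(statistic):
    """Upper-tail probability for a chi-square variate with exactly 2 df."""
    return math.exp(-statistic / 2.0)

# 45 populations, 2 generalized logits, 4 free parameters (from the text).
print(lr_degrees_of_freedom(45, 2, 4))

# Hypothetical -2 log likelihood values, for illustration only (assumption).
stat = improvement_chi_square(neg2ll_null=115.0, neg2ll_full=98.0)
print(stat, chi_square_p_value_2df(stat))
```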

Comments

The four parameters are the intercepts and slope coefficients for the two equations predicting the
log odds of "fish" versus "other" and "invertebrate" versus "other" prey. As in most regressions,
the intercepts are of little interest. They do suggest that fish and invertebrate prey are more
common than other prey at low levels of alligator length. The effects of length on the log odds of
fish versus other prey are not significant at the 5% level, whereas the effects of length on the log
odds of invertebrate versus other prey are highly significant. Facing this result, one might elect to
collapse fish and other, and contrast that category with invertebrates. However, given the small
sample size, and given a substantive concern that calls for treating the three prey categories
separately, we would be more likely to simply note this result, and to proceed to trying to
understand the pattern of effects.

One could, of course, simply discuss the effects of one meter differences in alligator lengths on the
log-odds that an alligator is consuming fish versus other prey and on the log odds that it is
consuming invertebrates versus other prey. However, effects on log odds tend to not speak very
well to audiences. A much better strategy is suggested by Agresti, who shows how the
probability of each of the three outcomes can be calculated from the regression parameters for any
given value of X (here X is a single continuous variable, but the approach holds for any vector of
X values in models with multiple independent variables). The transformation looks like this:

probability that a case falls in category one on Y =

exp (a1 + b1X) / [1 + exp (a1 + b1X) + exp (a2 + b2X)]

probability that a case falls in category two on Y =

exp (a2 + b2X) / [1 + exp (a1 + b1X) + exp (a2 + b2X)]

probability that a case falls in category three (the "reference" or last category) of Y =

1 / [1 + exp (a1 + b1X) + exp (a2 + b2X)]

That is, to calculate the probability in the first category of the outcome, we exponentiate the
equation for that outcome, evaluated at the selected value of X, as the numerator; in the denominator,
we take one plus the sum of the exponentiated equations for all of the estimated logits. Here, since the
dependent variable has only three categories, the denominator has two exponentiated terms, one for
each logit estimated.

The same calculation is performed for the probability of each category of Y, except the last, or
reference category. To calculate the probability for the reference category of the dependent
variable, one is used in the numerator, over the same denominator.
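
A minimal sketch of this transformation is given below. The function name is ours, and the coefficient values are illustrative placeholders of plausible sign and size, not the estimates from the CATMOD output; substitute the fitted a1, b1, a2, and b2 to reproduce the predicted probabilities.

```python
import math

def category_probabilities(x, a1, b1, a2, b2):
    """Probabilities of categories one, two, and three (reference) at X = x."""
    e1 = math.exp(a1 + b1 * x)   # exponentiated first logit equation
    e2 = math.exp(a2 + b2 * x)   # exponentiated second logit equation
    denom = 1.0 + e1 + e2        # the same denominator for all three
    return e1 / denom, e2 / denom, 1.0 / denom

# Placeholder coefficients (assumption, not the estimates from the output).
A1, B1 = 1.6, -0.1   # category one versus reference
A2, B2 = 5.7, -2.5   # category two versus reference

p1, p2, p3 = category_probabilities(2.0, A1, B1, A2, B2)
print(p1, p2, p3, p1 + p2 + p3)
```

Because all three probabilities share one denominator, they sum to unity for any value of X, which is the constraint the simultaneous estimation was meant to guarantee.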

This transformation allows us to calculate the predicted probability of each of the three scores on
Y for any given score (or scores) on X. Then what? There are two approaches. For models with
very simple X vectors (one or two X variables) one can present the response surface in the form
of a line chart or graph. With more X variables, one can hold all but one (or two) X constant (usually at
their mean values), and plot the partial response surfaces.
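
One way to prepare such a chart is to evaluate the predicted probabilities over a grid of X values and hand the resulting columns to whatever plotting tool is convenient. The sketch below does this for a single X; the coefficient pairs and the grid range are hypothetical placeholders, and the helper accepts any number of logit equations.

```python
import math

def category_probabilities(x, coefs):
    """coefs: (intercept, slope) pairs, one per non-reference logit."""
    terms = [math.exp(a + b * x) for a, b in coefs]
    denom = 1.0 + sum(terms)
    return [t / denom for t in terms] + [1.0 / denom]  # reference goes last

# Placeholder coefficients (assumption, not the fitted estimates).
COEFS = [(1.6, -0.1), (5.7, -2.5)]

# Grid of lengths from 1.0 to 4.0 meters in half-meter steps (assumption).
grid = [1.0 + 0.5 * i for i in range(7)]
rows = [(x, category_probabilities(x, COEFS)) for x in grid]

for x, probs in rows:
    print(f"{x:4.1f}  " + "  ".join(f"{p:5.3f}" for p in probs))
```

Each printed row is one point on the response surface: the X value followed by the probability of each outcome, with the reference category in the last column.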

For those who need a regression coefficient expressing effects of units of X on the probability of
Y, the "elasticity" is suggested. First, calculate the predicted probability of each outcome when
all X values are fixed at their sample means. Then, change one of the X variables by one unit (or,
if you want a "standardized coefficient," by one standard deviation) and recalculate the
probabilities of each outcome. The difference in the predicted probabilities can be interpreted as
the effect of a one unit change in X on the probability of each Y outcome, when all other variables
are held constant at sample mean values. Of course, values other than the sample means could be
used, if there were some good reason for doing so.
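
The calculation just described can be sketched as below for the single-X case. The function names, the coefficient values, and the sample mean are all hypothetical placeholders; with real estimates one would substitute the fitted parameters and the observed mean (or a standard deviation for the step).

```python
import math

def category_probabilities(x, coefs):
    """coefs: (intercept, slope) pairs, one per non-reference logit."""
    terms = [math.exp(a + b * x) for a, b in coefs]
    denom = 1.0 + sum(terms)
    return [t / denom for t in terms] + [1.0 / denom]

def discrete_change(x_start, step, coefs):
    """Change in each outcome probability when X moves from x_start by step."""
    before = category_probabilities(x_start, coefs)
    after = category_probabilities(x_start + step, coefs)
    return [p_after - p_before for p_before, p_after in zip(before, after)]

COEFS = [(1.6, -0.1), (5.7, -2.5)]  # placeholder coefficients (assumption)
MEAN_LENGTH = 2.1                   # placeholder sample mean (assumption)

changes = discrete_change(MEAN_LENGTH, 1.0, COEFS)
print(changes)
```

Since the probabilities sum to unity both before and after the change, the three effects sum to zero: a gain in one outcome is always offset by losses in the others.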

Comments

It is a good idea to collect and examine residual statistics carefully in multinomial logistic
regression, just as it is in any other variety of regression modeling. The output above shows, for
each "population" (i.e. unique score on X) the predicted log odds on the "first function" (i.e. the
log odds of fish versus other) and the predicted log odds on the "second function" (i.e. the log
odds of invertebrate versus other prey). Perhaps more usefully, the output shows the frequency
distribution of cases in each population on the dependent variable (i.e. how many fish eaters,
invertebrate eaters and other eaters), and the probabilities of each outcome predicted by the
regression function. For the last population, for example, the one case was actually a fish eater --
and the model predicted a .76 probability of that outcome (generating a residual of .24).

This type of output has a couple of uses. First, it enables us to construct alternative measures of
goodness of fit, if we are so inclined. It is not uncommon to count the numbers of "correct" and
"incorrect" classifications produced by the model, or false positives and false negatives for each
outcome (one must, of course, select some reasonable rule for assigning cases to categories from
the predicted probabilities). Second, and probably more important (at least in somewhat more
complex cases than this example), we can identify outliers (possible indications of measurement
errors or omitted variables), and places where the model consistently over- or under-predicts
(indicating, perhaps, a less than optimal choice of the linking function).
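
Both uses can be sketched in a few lines. The example below assumes one simple assignment rule (each case goes to the category with the largest predicted probability) and computes residuals as the observed indicator minus the predicted probability. The cases are hypothetical, apart from the first, which mirrors the fish eater with a predicted probability of .76 mentioned above; its other two probabilities are made up.

```python
LABELS = ["fish", "invertebrate", "other"]

def classify(probabilities, labels):
    """Assign the category with the largest predicted probability."""
    best = max(range(len(labels)), key=lambda i: probabilities[i])
    return labels[best]

def classification_table(cases, labels):
    """Cross-tabulate observed (rows) by predicted (columns) categories."""
    table = {obs: {pred: 0 for pred in labels} for obs in labels}
    for observed, probabilities in cases:
        table[observed][classify(probabilities, labels)] += 1
    return table

def residuals(observed, probabilities, labels):
    """Observed indicator (1 or 0) minus predicted probability, per category."""
    return [(1.0 if label == observed else 0.0) - p
            for label, p in zip(labels, probabilities)]

# Hypothetical cases: (observed category, predicted probabilities).
CASES = [
    ("fish", [0.76, 0.01, 0.23]),
    ("invertebrate", [0.30, 0.60, 0.10]),
    ("other", [0.55, 0.15, 0.30]),
    ("fish", [0.45, 0.40, 0.15]),
]

table = classification_table(CASES, LABELS)
correct = sum(table[label][label] for label in LABELS)
print(table)
print(correct, "of", len(CASES), "classified correctly")
print(residuals(*CASES[0], LABELS))
```

Other assignment rules are possible, and the choice of rule can change the apparent fit, so any such table should state the rule used.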