Multinomial Logistic Regression | SAS Data Analysis Examples

Multinomial logistic regression is for modeling nominal
outcome variables, in which the log odds of the outcomes are modeled as a linear
combination of the predictor variables.

Please Note: The purpose of this page is to show how to use various data analysis commands. It does not cover all aspects of the research process which researchers are expected to do. In particular, it does not cover data cleaning and checking, verification of assumptions, model diagnostics, and potential follow-up analyses.

Examples of multinomial logistic regression

Example 1. People’s occupational choices might be influenced
by their parents’ occupations and their own education level. We can study the
relationship of one’s occupation choice with education level and father’s
occupation. The occupational choices will be the outcome variable which
consists of categories of occupations.

Example 2. A biologist may be interested in food choices that alligators make. Adult alligators might have
different preferences than young ones. The outcome variable here will be the
types of food, and the predictor variables might be the length of the alligators
and other environmental variables.

Example 3. Entering high school students make program choices among general program,
vocational program and academic program. Their choice might be modeled using
their writing score and their socioeconomic status.

Description of the data

For our data analysis example, we will expand the third example using the
hsbdemo data set. You can download the data here.

The data set contains variables on 200 students. The outcome variable is prog, program type. The predictor variables
are socioeconomic status, ses, a three-level categorical variable,
and writing score, write, a continuous variable. Let’s start by
getting some descriptive statistics of the
variables of interest.
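As a minimal sketch, assuming the downloaded file has been read into a SAS data set named hsbdemo (the data set name is an assumption):

* One-way frequency tables for the categorical variables;
proc freq data = hsbdemo;
   tables prog ses;
run;

* Descriptive statistics for writing score within each program type;
proc means data = hsbdemo n mean std;
   class prog;
   var write;
run;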

Analysis methods you might consider

Multiple logistic regression analyses, one for each pair of outcomes:
One problem with this approach is that each analysis is potentially run on a different
sample. The other problem is that, without constraining the logistic models,
we can end up with estimated probabilities of the outcome categories that
sum to more than 1.

Collapsing the number of categories to two and then doing a logistic regression: This approach
suffers from loss of information and changes the original research questions to
very different ones.

Ordinal logistic regression: If the outcome variable is truly ordered
and if it also satisfies the assumption of proportional
odds, then switching to ordinal logistic regression will make the model more
parsimonious.

Alternative-specific multinomial probit regression: allows
different error structures and therefore relaxes the assumption of independence of
irrelevant alternatives (IIA; see “Things to Consider” below).
This approach requires that the data be structured in a choice-specific format.

Nested logit model: also relaxes the IIA assumption, and also
requires that the data be structured in a choice-specific format.

Multinomial logistic regression

Below we use proc logistic to estimate a multinomial logistic
regression model. The outcome prog and the predictor ses are both
categorical variables and should be indicated as such on the class statement. We
can specify the baseline category for prog using (ref = "2") and
the reference group for ses using (ref = "1"). The param=ref option on
the class statement tells SAS to use dummy coding rather than effect coding
for the variable ses. Note that the levels of prog are defined as 1 = general, 2 = academic (the baseline) and 3 = vocational.
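A sketch of the call described above, assuming the data set is named hsbdemo (the link = glogit option requests the generalized logit model used for nominal outcomes):

proc logistic data = hsbdemo;
   class prog (ref = "2") ses (ref = "1") / param = ref;
   model prog = ses write / link = glogit;
run;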

In the output above, the likelihood ratio chi-square of 48.23 with a p-value < 0.0001 tells us that our model as a whole fits
significantly better than an empty model (i.e., a model with no
predictors).

Several model fit measures, such as the AIC, are listed under
“Model Fit Statistics”.

Two models are tested in this multinomial regression, one comparing
membership in the general versus the academic program and one comparing membership in the
vocational versus the academic program. They correspond to the two equations below:
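(The equations are reconstructed from the model description above; the coefficient symbols $b_{jk}$ are illustrative.)

$$\ln\left(\frac{P(prog = \text{general})}{P(prog = \text{academic})}\right) = b_{10} + b_{11}(ses = 2) + b_{12}(ses = 3) + b_{13} \cdot write$$

$$\ln\left(\frac{P(prog = \text{vocational})}{P(prog = \text{academic})}\right) = b_{20} + b_{21}(ses = 2) + b_{22}(ses = 3) + b_{23} \cdot write$$

where $(ses = 2)$ and $(ses = 3)$ are the dummy indicators produced by the reference coding, with ses = 1 as the omitted group.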

A one-unit increase in the variable write is associated with a
.058 decrease in the relative log odds of being in general program vs.
academic program.

A one-unit increase in the variable write is associated with a
.1136 decrease in the relative log odds of being in vocational program vs.
academic program.

The relative log odds of being in general program vs. in academic program will
decrease by 1.163 when moving from the lowest level of ses (ses = 1) to the
highest level of ses (ses = 3).

The overall effects of ses and write are listed under
“Type 3 Analysis of Effects”, and both are significant.

The ratio of the probability of choosing one outcome category over the
probability of choosing the baseline category is often referred to as relative risk
(and it is also sometimes referred to as odds, as we did when describing the
regression parameters above). Relative risk can be obtained by
exponentiating the linear equations above, yielding regression coefficients that
are relative risk ratios for a unit change in the predictor variable.
In the case of two categories, relative risk ratios are equivalent to
odds ratios, which are listed in the output as well.

The odds ratio for a one-unit increase in the variable write
is .944 (exp(-.0579), computed from the regression coefficients shown above the odds
ratios) for being in general program vs. academic program.

The odds ratio of switching from ses = 1 to ses = 3 is .313 for being
in general program vs. academic program. In other words, the expected risk
of being in the general program is lower for subjects who are high in ses.

Using the test statement, we can also test specific hypotheses within
or even across logits, such as whether the effect of ses = 3 in
predicting general versus academic equals the effect of ses = 3 in
predicting vocational versus academic. Use of the test statement requires the
unique names SAS assigns to each parameter in the model. The outest option
on the proc logistic statement produces an output dataset with
the parameter names and values. We can get these names by printing that dataset,
transposing it first to make it more readable. The noobs option on the proc print
statement suppresses observation numbers, which are meaningless in the parameter dataset.
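A sketch of these steps (the dataset names mlogit_params and mlogit_params_t are illustrative):

* Refit the model, saving parameter names and estimates with outest;
proc logistic data = hsbdemo outest = mlogit_params;
   class prog (ref = "2") ses (ref = "1") / param = ref;
   model prog = ses write / link = glogit;
run;

* Transpose so each parameter is a row, then print without observation numbers;
proc transpose data = mlogit_params out = mlogit_params_t;
run;

proc print data = mlogit_params_t noobs;
run;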

Here we see the same parameters as in the output above, but with their unique SAS-given names.
We are interested in testing whether SES3_general is equal to SES3_vocational,
which we can now do with the test statement. The label preceding the “:”
on the test statement identifies the test in the output, and it must
conform to SAS naming rules (32 characters or fewer; letters,
numerals, and underscores; beginning with a letter or underscore).
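A sketch of the test (the label ses3_equal is illustrative; the parameter names must match those shown in the transposed outest dataset):

proc logistic data = hsbdemo;
   class prog (ref = "2") ses (ref = "1") / param = ref;
   model prog = ses write / link = glogit;
   * Test whether the ses=3 effect is equal across the two logits;
   ses3_equal: test SES3_general = SES3_vocational;
run;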

The effect of ses=3 for predicting general versus academic is not different from the effect of
ses=3 for predicting vocational versus academic.

You can also use predicted probabilities to help you understand the model.
You can calculate predicted probabilities using the lsmeans statement and
the ilink option. For multinomial data, lsmeans requires glm
rather than reference (dummy) coding, even though the two are essentially
the same, so be sure to respecify the coding on the class statement.
However, glm coding only allows the last category to be the reference
group (prog = vocational and ses = 3) and will ignore any other
reference group specifications. Below we use lsmeans to
calculate the predicted probability of choosing the academic or general program type at each level
of ses, holding write at its mean.
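A sketch of this call (the ref= specifications are dropped because glm coding ignores them):

proc logistic data = hsbdemo;
   class prog ses / param = glm;
   model prog = ses write / link = glogit;
   * ilink adds the inverse-linked estimates, i.e., predicted probabilities;
   lsmeans ses / ilink;
run;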

The predicted probabilities are in the “Mean” column. Thus, for ses
= 3 and write = 52.775, we see that the probability of being in the academic
program (program type 2) is 0.7009; for the general program (program type 1),
the probability is 0.1785.

To obtain predicted probabilities for the program type vocational, we can reverse the ordering of the categories
using the descending option on the proc logistic statement.
This will make academic the reference group for prog and 3 the reference
group for ses.
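A sketch of the refit (the descending option reverses the ordering of the response variable prog, so with glm coding academic becomes its reference level):

proc logistic data = hsbdemo descending;
   class prog ses / param = glm;
   model prog = ses write / link = glogit;
   * ses keeps 3 as its reference, the last level under glm coding;
   lsmeans ses / ilink;
run;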

Here we see the probability of being in the vocational program when ses = 3 and
write = 52.775 is 0.1206, which is what we would have expected since (1 –
0.7009 – 0.1785) = 0.1206, where 0.7009 and 0.1785 are the probabilities of
being in the academic and general programs under the same conditions.

Things to consider

The Independence of Irrelevant Alternatives (IIA) assumption: Roughly,
the IIA assumption means that adding or deleting alternative outcome
categories does not affect the odds among the remaining outcomes.

Diagnostics and model fit: Unlike logistic regression where there are
many statistics for performing model diagnostics, it is not as
straightforward to do diagnostics with multinomial logistic regression
models. Some model fit statistics are listed in the output.

Pseudo-R-Squared: The R-squared offered in the output is based on the
change in log-likelihood from the intercept-only model to the
current model. It does not convey the same information as the R-squared for
linear regression, even though “the higher, the better” still applies.

Sample size: Multinomial regression uses a maximum likelihood estimation
method, which requires a large sample size. Because it also estimates multiple
equations, it requires an even larger sample size than ordinal or
binary logistic regression.

Complete or quasi-complete separation: Complete separation means that one value of a predictor variable is
associated with only one value of the response variable, so the predictor perfectly predicts the outcome.
Implausibly large regression coefficients and standard errors in the output are a sign that something is wrong.
You can then do a two-way tabulation of the outcome
variable with the problematic variable to confirm this, and rerun the model
without the problematic variable.

Empty cells or small cells: You should check for empty or small
cells by doing a crosstab between categorical predictors and
the outcome variable. If a cell has very few cases (a small cell), the
model may become unstable or it might not run at all.
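As a sketch, the kind of crosstab that makes empty cells easy to spot (the options suppress percentages so only counts are shown):

* Two-way crosstab to spot empty or sparse cells;
proc freq data = hsbdemo;
   tables ses * prog / norow nocol nopercent;
run;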

Sometimes observations are clustered into groups (e.g., people within
families, students within classrooms). In such cases, you may want to see
our page on non-independence within clusters.