SAS Data Analysis Examples
Exact Logistic Regression

Version info: Code for this page was tested in SAS 9.3.

Exact logistic regression is used to model binary outcome variables in which the
log odds of the outcome is modeled as a linear combination of the predictor
variables. It is used when the sample size is too small for a regular
logistic regression (which uses the standard maximum-likelihood-based estimator)
and/or when some of the cells formed by the outcome and the categorical
predictor variables have no observations. The estimates given by exact logistic
regression do not depend on asymptotic results.

Please note: The purpose of this page is to show how to use various data
analysis commands. It does not cover all aspects of the research process that
researchers are expected to carry out. In particular, it does not cover data
cleaning and checking, verification of assumptions, model diagnostics, or
potential follow-up analyses.

Example

Suppose that we are interested in the factors that influence whether or not a
high school senior is admitted into a very competitive engineering school. The
outcome variable is binary (0/1): admit or not admit. The predictor variables
of interest include student gender and whether or not the student took Advanced
Placement calculus in high school. Because the response variable is binary, we
need to use a model that handles 0/1 outcome variables correctly. Also, because
the number of students involved is small, we will need a procedure that can
perform the estimation with a small sample size.

Description of the data

The data for this exact logistic analysis are counts of applicants, broken down
by whether or not they were admitted, by gender (the variable female), and by
whether or not they had taken AP calculus (the variable apcalc). Since the
dataset is so small, we will read it in directly.
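The original data step is not reproduced here, so the following is only a sketch of how such a dataset could be read in. The dataset name exlogit and the individual cell counts are assumptions: the counts were chosen to agree with the totals described on this page (30 applicants, 15 admitted, 18 males, 16 with AP calculus, and no females with AP calculus who were denied), and they may differ from the original data.

```sas
/* Sketch only: exlogit and the cell counts are assumed, not taken from the page. */
/* One row per combination of female, apcalc and admit; num is the cell count.    */
data exlogit;
  input female apcalc admit num;
  datalines;
0 0 0 7
0 0 1 1
0 1 0 3
0 1 1 7
1 0 0 5
1 0 1 1
1 1 0 0
1 1 1 6
;
run;
```

Storing the data as one row per cell with a count variable keeps the data step short, but every later procedure then has to be told to treat num as a frequency.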

The tables reveal that 30 students applied for the Engineering program. Of
those, 15 were admitted and 15 were denied admission. There were 18 male and 12
female applicants. Sixteen of the applicants had taken AP calculus and 14 had
not. Note that all of the females who took AP calculus were admitted, versus only
about half the males.
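Tables like the ones summarized above could be requested with proc freq. This is a minimal sketch that assumes the counts are stored in a dataset named exlogit (a name assumed here), with one row per cell and the cell count in num.

```sas
/* Sketch: exlogit is an assumed dataset name.                     */
/* weight num tells proc freq that each row stands for num people. */
proc freq data = exlogit;
  weight num;
  tables admit female apcalc;      /* one-way totals                        */
  tables female*apcalc*admit;      /* apcalc by admit, separately by gender */
run;
```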

Analysis methods you might consider

Below is a list of some analysis methods you may have
encountered. Some of the methods listed are quite reasonable, while others have
either fallen out of favor or have limitations.

Exact logistic regression - This technique is appropriate because
the outcome variable is binary, the sample size is small, and some cells are
empty.

Regular logistic regression - Due to the small sample size and the presence of
cells with no subjects, regular logistic regression is not advisable, and it
might not even be estimable.

Two-way contingency tables - You may need to use the fisher or exact option
on the tables statement of proc freq to get Fisher's exact test, due to small
expected values.
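As an illustration of the contingency-table approach, the sketch below requests Fisher's exact test for the two-way table of apcalc by admit. It assumes a dataset named exlogit (an assumed name) holding one row per cell with the count in num.

```sas
/* Sketch: exlogit is an assumed dataset name.                          */
/* The exact option on the tables statement requests Fisher's exact test. */
proc freq data = exlogit;
  weight num;
  tables apcalc*admit / exact;
run;
```

A two-way table looks at only one predictor at a time, which is one reason to prefer the exact logistic model when both female and apcalc are of interest.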

Using the exact logistic model

Let's run the exact logistic analysis using proc logistic with the
exact statement.
We will include the option estimate = both on the exact statement
so that we obtain both the point estimates and the odds ratios in the output.
We will also need to use the freq statement, for which we will specify the
frequency weight variable num.
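Put together, the call described above might look like the following sketch. The dataset name exlogit is an assumption. The variable names, the desc option, the freq statement and estimate = both are the ones named in the text.

```sas
/* Sketch: exlogit is an assumed dataset name.                  */
proc logistic data = exlogit desc;        /* desc: model admit = 1          */
  freq num;                               /* num holds the cell counts      */
  model admit = female apcalc;
  exact female apcalc / estimate = both;  /* exact estimates and odds ratios */
run;
```

The model statement still produces the usual maximum likelihood results, and the exact statement adds the exact conditional analysis for the listed variables.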

The output begins with information about the dataset used and the model
run. Next, we see information about the response variable, including
the number of 0s and 1s. We see a note indicating that the 1s are
being modeled (because we used the desc option on the proc
logistic statement), and a note warning us about the 0 count for one of
the lines of data.

We next see model fit statistics, which can be used to compare models,
and tests of the overall model. We see that the overall model is
statistically significant.

Next, we have tables giving us the maximum likelihood estimates.
After the table giving the association between the predicted probabilities
and the observed responses, we see the results of the exact conditional
analysis. Both the score test and the probability test are given.
The variable female is not statistically significant, but the
variable apcalc is. For a one-unit increase in apcalc (that is, comparing
a student who took AP calculus with one who did not), the expected log odds of
admission (admit) increase by 3.34. The
intercept is not included in the output because its sufficient statistic was
conditioned out when creating the joint distribution of female and apcalc.

The final table in the output is a table of exact odds ratios. The odds of
admission for an applicant who had taken AP calculus were about 28.2 times the
odds for one who had not taken the course.
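The odds ratio and the parameter estimate for apcalc are two views of the same quantity: exponentiating the estimate gives the odds ratio. Using the rounded values quoted on this page (the symbols below are notation introduced here for illustration):

```latex
\widehat{\mathrm{OR}}_{\text{apcalc}}
  = \exp\!\left(\hat{\beta}_{\text{apcalc}}\right)
  \approx \exp(3.34)
  \approx 28.2
```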

We can also graph the predicted probabilities. To do this, we will
create a new variable called p using the output statement. Then we
will use proc gplot to graph p.
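One way this could be done is sketched below. The dataset names exlogit and preds are assumptions, as are the plotting choices (p plotted against apcalc, with a separate line for each value of female). proc gplot requires SAS/GRAPH.

```sas
/* Sketch: exlogit and preds are assumed dataset names.           */
proc logistic data = exlogit desc;
  freq num;
  model admit = female apcalc;
  exact female apcalc / estimate = both;
  output out = preds p = p;       /* save predicted probabilities as p */
run;

/* Sort so that the joined lines are drawn in order of apcalc.    */
proc sort data = preds;
  by female apcalc;
run;

symbol1 i = join v = star;        /* line style for the first group  */
symbol2 i = join v = circle;      /* line style for the second group */

proc gplot data = preds;
  plot p*apcalc = female;         /* one line per value of female    */
run;
quit;
```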

Things to consider

Exact logistic regression is a very memory-intensive procedure, and it
is relatively easy to exceed the memory capacity of a given computer.

Firth logit may be helpful if you have separation in your data.
You can use the firth option on the model statement to run a Firth
logit. This option was added in SAS version 9.2.
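A minimal sketch of a Firth logit, assuming the data are in a dataset named exlogit (an assumed name) with the cell counts in num:

```sas
/* Sketch: exlogit is an assumed dataset name.                        */
/* firth requests Firth's penalized likelihood instead of ordinary ML. */
proc logistic data = exlogit desc;
  freq num;
  model admit = female apcalc / firth;
run;
```

Because the penalized estimates remain finite when the data are separated, this can be a lighter-weight alternative when the exact analysis is too demanding.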

Exact logistic regression is an alternative to conditional logistic
regression if you have stratification, since both condition on the number of
positive outcomes within each stratum. The estimates from these two
analyses will be different because conditional logit conditions only on the
sufficient statistic for the intercept term, while exact logistic regression
conditions on the sufficient statistics of the other regression parameters as
well as that of the intercept term.