A binary variable Y = 0 or 1 can be used to indicate membership in one of two groups. Explanatory variables may explain or predict this membership. When one or more of the explanatory variables is non-numerical, discriminant analysis is inappropriate. Logistic regression can be used with any combination of explanatory variables: numerical, categorical, or some of each.

Example. There were 113
applicants for charge accounts at a department store in a large city in the
northeastern U.S. The variables include gender and marital status,
which are categorical and hence require Logistic Regression rather than
Discriminant Analysis.

Logistic Regression: Regression with a Binary Dependent Variable

Let the two groups be indicated by the binary variable Y, which is equal to 0 or 1. The probability of being in one group or the other depends on the values of some variables in the vector X. Let Px = P(Y=1|X=x) and Qx = 1 - Px = P(Y=0|X=x). When the distribution of X for Y = 1 is multinormal (multivariate normal) with mean µ1, the distribution of X for Y = 0 is multinormal with mean µ0, and the two covariance matrices are equal, then

ln(Px/Qx)

is linear in x. This suggests modeling the binary dependent variable Y by taking ln(Px/Qx) to be linear in x, even when the conditional distributions of X given Y = 0 and 1 are not multinormal. The model

ln(Px/Qx) = b0 + b1'x

is called the logistic regression model.

Function        Name                              Range

Px              prob. that Y=1, given that X=x    0 to 1
Px/Qx           odds                              0 to infinity
ln(Px/Qx)       log odds, or "logit"              negative infinity to positive infinity

In this model, what is the mathematical expression for P itself? A little algebra shows that if logit(P) = z, then P = e^z/(1 + e^z), or 1/(1 + e^-z). This function is called the logistic function.
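The logit and logistic functions are inverses of each other. A minimal sketch in Python (the function names are illustrative, not part of the notes):

```python
import math

def logit(p):
    # log odds: ln(P/Q), where Q = 1 - P
    return math.log(p / (1.0 - p))

def logistic(z):
    # inverse of the logit: P = 1/(1 + e^(-z))
    return 1.0 / (1.0 + math.exp(-z))

# Round-tripping recovers the original probability
p = 0.8
z = logit(p)          # ln(0.8/0.2) = ln 4
print(logistic(z))    # 0.8, up to rounding
```

Note that logit(0.5) = 0, which is why logit(px) = 0 characterizes the median in the breaking-strength example below.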

Estimation

If there is more than one observation at each value x, weighted least squares estimation can be used. The next example illustrates a use of logistic regression other than classification.

Example. It involves the breaking strength of wires. The logit is regressed on x, the weight applied. Weighted regression is used. The weight is 1/Var(y), where here y = logit(p) and Var(y) = Var[logit(p)], which can be shown to be approximately 1/(NPQ), so the weight is NPQ, which is estimated by Npq. The fitted regression equation is

estimated logit = -11.51 + 0.156 Weight.

The median breaking strength can be estimated as the value of x for which px = 1/2, that is, for which logit(px) = 0.
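Setting the fitted logit to zero and solving gives the estimated median directly; a quick check of the arithmetic (coefficients taken from the fitted equation above):

```python
import math

b0, b1 = -11.51, 0.156           # fitted intercept and slope from the example

def p_hat(weight):
    # fitted probability of breaking at a given applied weight
    z = b0 + b1 * weight
    return 1.0 / (1.0 + math.exp(-z))

# logit = 0  =>  b0 + b1*x = 0  =>  x = -b0/b1
median_strength = -b0 / b1
print(round(median_strength, 2))  # 73.78
print(p_hat(median_strength))     # 0.5 by construction
```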

Maximum likelihood estimation is used when there are not repeated observations at each value or pattern of x. The likelihood L (developed below) is maximized. The higher the maximized value of L, the better the fit of the model. This is assessed on a log scale by computing -2 log L, called -2LL. (This criterion corresponds to the residual sum of squares, i.e., the sum of squared errors, in normal multiple regression models.) When there are several explanatory variables, different models can be assessed using -2LL as a figure of merit. A penalty term can be added which increases with the number of parameters used. In AIC (Akaike's information criterion) this penalty is 2 times the number of parameters. In Schwarz's criterion (denoted by SC, SIC, or BIC, for Bayesian information criterion), the penalty is the natural log of n, that is, ln n, times the number of parameters.
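Both criteria are simple functions of -2LL, the number of parameters k, and (for BIC) the sample size n. A sketch, with made-up values of -2LL for illustration:

```python
import math

def aic(neg2LL, k):
    # AIC: -2LL plus a penalty of 2 per parameter
    return neg2LL + 2 * k

def bic(neg2LL, k, n):
    # Schwarz's criterion (SC/SIC/BIC): -2LL plus ln(n) per parameter
    return neg2LL + math.log(n) * k

# Illustrative values: a hypothetical model with 3 parameters fitted to
# n = 113 cases (the sample size of the charge-account example)
print(aic(100.0, 3))       # 106.0
print(bic(100.0, 3, 113))  # 100 + 3 ln(113), about 114.18
```

Since ln(113) > 2, BIC penalizes extra parameters more heavily than AIC here, as it does for any n > e^2 ≈ 7.4.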

Here is an explanation of
maximum likelihood estimation in logistic regression models. We begin
with the simpler case of a sample of Bernoulli variables. If you
have a sample

y1, y2,
. . ., yn

of 0,1 variables with
success probability P, then the log likelihood can be written

y1 ln P + (1 - y1) ln Q + y2 ln P + (1 - y2) ln Q + . . . + yn ln P + (1 - yn) ln Q.

The maximum likelihood estimator is the value of P which maximizes this; it turns out to be simply the sample proportion of 1's,

(y1 + y2 + . . . + yn)/n.
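This can be checked numerically: the Bernoulli log likelihood, evaluated on a toy 0/1 sample (made up for illustration), is largest at the sample proportion.

```python
import math

def loglik(P, ys):
    # sum of y ln P + (1 - y) ln Q over the sample, with Q = 1 - P
    Q = 1.0 - P
    return sum(y * math.log(P) + (1 - y) * math.log(Q) for y in ys)

ys = [1, 0, 0, 1, 1, 0, 1, 1]     # toy sample of 0/1 observations
p_hat = sum(ys) / len(ys)         # sample proportion of 1's = 5/8

# The log likelihood is lower at any other value of P
assert loglik(p_hat, ys) > loglik(p_hat + 0.05, ys)
assert loglik(p_hat, ys) > loglik(p_hat - 0.05, ys)
print(p_hat)  # 0.625
```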

In the logistic regression model,

Px = 1/[1 + exp(-B0 - B'x)],

where B0 is the constant and B is the vector of logistic regression parameters, to be estimated. This is done by maximizing the likelihood by numerical methods.
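As a sketch of what "numerical methods" means here, the log likelihood can be climbed by gradient ascent on a single explanatory variable (toy data and step size chosen for illustration; statistical software uses faster Newton-type iterations):

```python
import math

# Toy data: x is one explanatory variable, y is 0/1 group membership
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [0,   0,   1,   0,   1,   1  ]

b0, b1 = 0.0, 0.0     # start at zero; iterate toward the MLE
rate = 0.01
for _ in range(50000):
    # gradient of the log likelihood: sums of (y - P) and (y - P)*x
    g0 = g1 = 0.0
    for x, y in zip(xs, ys):
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
        g0 += y - p
        g1 += (y - p) * x
    b0 += rate * g0
    b1 += rate * g1

print(b0, b1)   # b1 > 0: estimated probability of Y = 1 rises with x
```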

Testing a reduced model against a full model is based on -2LL, where LL is the natural log of the maximized likelihood. It is based on the fact that (-2LL)reduced - (-2LL)full is, for large n, distributed approximately as chi-square with the number of d.f. equal to kfull - kreduced, where these k's are the numbers of explanatory variables in the two models.
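The test statistic -2(LL_reduced - LL_full) is then compared with a chi-square critical value. A sketch with made-up log likelihoods for two hypothetical nested models:

```python
# Hypothetical maximized log likelihoods (made-up numbers for illustration)
LL_full    = -54.2   # full model, 4 explanatory variables
LL_reduced = -58.9   # reduced model, 2 explanatory variables

# Likelihood-ratio statistic: the increase in -2LL when variables are dropped
stat = (-2 * LL_reduced) - (-2 * LL_full)
df = 4 - 2

# Chi-square critical value for df = 2 at the 0.05 level
CRIT_05 = 5.991
print(stat)            # 9.4
print(stat > CRIT_05)  # True: reject the reduced model at the 0.05 level
```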

Incorporating prior probabilities

Logistic regression implicitly uses prior probability estimates obtained from the sample. Thus the estimate of B0 will contain a term ln(p/q), where p and q are the proportions of cases from Group 1 and Group 0 in the sample. If the appropriate prior probabilities are instead p' and q', then b0 should be adjusted by replacing the sample term with the prior one:

b0' = b0 - ln(p/q) + ln(p'/q').
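The standard adjustment replaces the sample log-odds term ln(p/q) in the intercept by the prior log odds ln(p'/q'). A sketch with made-up proportions and an illustrative intercept:

```python
import math

# Sample proportions of Group 1 and Group 0 cases (made-up numbers)
p, q = 0.30, 0.70
# Appropriate prior probabilities p' and q' (made-up numbers)
pp, qq = 0.05, 0.95

b0 = -1.2   # intercept estimated from the sample (illustrative)

# Remove the sample log-odds term, add the prior log-odds term
b0_adjusted = b0 - math.log(p / q) + math.log(pp / qq)
print(b0_adjusted)   # more negative: Group 1 is rarer under the priors
```

The slope coefficients are unaffected; only the intercept shifts.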
