When a logistic regression model has been fitted, estimates of π are marked with a hat symbol above the Greek letter pi (written π̂) to denote that the proportion is estimated from the fitted regression model. Fitted proportional responses are often referred to as event probabilities (i.e. π̂n events out of n trials).

The following information about the difference between two logits demonstrates one of the important uses of logistic regression models:
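As a reference sketch, the standard identity behind this statement can be written as follows. The subscripts 1 and 2 are assumed labels for the event probabilities in two groups (for example exposed and unexposed); they are not notation taken from this page.

```latex
\operatorname{logit}(\pi_1) - \operatorname{logit}(\pi_2)
  = \log\frac{\pi_1}{1-\pi_1} - \log\frac{\pi_2}{1-\pi_2}
  = \log\left(\frac{\pi_1/(1-\pi_1)}{\pi_2/(1-\pi_2)}\right)
  = \log(\text{odds ratio})
```

Under this identity, exponentiating the coefficient of a binary predictor in a logistic model gives the odds ratio associated with that predictor.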

Logistic models provide important information about the relationship between response/outcome and exposure. It makes no difference to logistic models whether outcomes have been sampled prospectively or retrospectively; this is not the case with other binomial models.

The conditional logistic model can cope with 1:1 or 1:m case-control matching. In the simplest case, this is an extension of McNemar's test for matched studies.
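The link with McNemar's test can be illustrated with a minimal Python sketch for 1:1 matching and a single binary exposure. The pair counts below are hypothetical, and the function name `fit_single_binary_exposure` is an assumption made for illustration; it is not StatsDirect's algorithm. In this special case only the discordant pairs carry information, the conditional estimate of the odds ratio is the ratio of the two discordant counts, and the score test of "no effect" equals the uncorrected McNemar chi-square.

```python
import math

def fit_single_binary_exposure(pairs, iterations=25):
    """Newton-Raphson fit of a 1:1 conditional logistic model with one predictor.

    pairs: list of (x_case, x_control) exposure values, one tuple per matched pair.
    Returns (beta, score_statistic_at_zero).
    """
    diffs = [x_case - x_control for x_case, x_control in pairs]

    def score_and_information(beta):
        score = 0.0
        information = 0.0
        for d in diffs:
            # Conditional probability that the observed case is the case in its pair
            p = 1.0 / (1.0 + math.exp(-beta * d))
            score += d * (1.0 - p)
            information += d * d * p * (1.0 - p)
        return score, information

    # Score test of beta = 0 (concordant pairs contribute nothing, since d = 0)
    score0, information0 = score_and_information(0.0)
    score_statistic = score0 ** 2 / information0

    beta = 0.0
    for _ in range(iterations):
        score, information = score_and_information(beta)
        beta += score / information
    return beta, score_statistic

# Hypothetical pair counts, for illustration only
pairs = [(1, 0)] * 15 + [(0, 1)] * 5 + [(1, 1)] * 10 + [(0, 0)] * 20
case_only_exposed = sum(1 for xc, xk in pairs if xc == 1 and xk == 0)
control_only_exposed = sum(1 for xc, xk in pairs if xc == 0 and xk == 1)

beta, score_statistic = fit_single_binary_exposure(pairs)
mcnemar_chi_square = ((case_only_exposed - control_only_exposed) ** 2
                      / (case_only_exposed + control_only_exposed))

print("conditional odds ratio:", math.exp(beta))
print("ratio of discordant pairs:", case_only_exposed / control_only_exposed)
print("score statistic:", score_statistic, "McNemar chi-square:", mcnemar_chi_square)
```

With more than one predictor, or with continuous predictors, no such closed form exists and the model has to be fitted iteratively.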

Data preparation

You must prepare your data case by case, i.e. ungrouped, with one subject/observation per row. This is unlike the unconditional logistic function, which accepts grouped or ungrouped data.

The binary outcome variable must contain only 0 (control) or 1 (case).

There must be a stratum indicator variable to denote the strata. In case-control studies with 1:1 matching this would mean a code for each pair (i.e. two rows marked stratum x, one with a case + covariates and the other with a control + covariates). For 1:m matched studies there will be 1+m rows of data for each stratum/matching-group.
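The layout rules above can be checked before analysis with a short Python sketch. The function name `check_matched_layout` and the example rows are hypothetical, and the check assumes exactly one case per stratum, as in 1:1 or 1:m matching.

```python
from collections import defaultdict

def check_matched_layout(rows):
    """Validate ungrouped matched case-control data.

    rows: iterable of (stratum_id, outcome) tuples, one per subject.
    Returns {stratum_id: (n_cases, n_controls)}; raises ValueError on a bad layout.
    """
    counts = defaultdict(lambda: [0, 0])  # index 0 = controls, index 1 = cases
    for stratum, outcome in rows:
        if outcome not in (0, 1):
            raise ValueError(f"outcome must be 0 or 1, got {outcome!r}")
        counts[stratum][outcome] += 1

    summary = {}
    for stratum, (n_controls, n_cases) in counts.items():
        # Assumption: 1:1 or 1:m matching, i.e. one case and at least one control
        if n_cases != 1 or n_controls < 1:
            raise ValueError(
                f"stratum {stratum!r} has {n_cases} case(s) and {n_controls} control(s)"
            )
        summary[stratum] = (n_cases, n_controls)
    return summary

# Hypothetical rows: stratum 1 is a 1:1 pair, stratum 2 is a 1:2 matched set
example_rows = [(1, 1), (1, 0), (2, 0), (2, 1), (2, 0)]
print(check_matched_layout(example_rows))
```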

These are artificially matched data from a study of the risk factors associated with low birth weight in Massachusetts in 1986. The predictors studied here are black race (RACE (b)), smoking status (SMOKE), hypertension (HT), uterine irritability (UI), previous preterm delivery (PTD) and weight of the mother at her last menstrual period (LWT).

To analyse these data using StatsDirect you must first open the test workbook using the file open function of the file menu. Then select Conditional Logistic from the Regression and Correlation section of the analysis menu. Select the column marked "PAIRID" when asked for the stratum (match group) indicator. Then select "LBWT" when asked for the case-control indicator. Then select "RACE (b)", "SMOKE", "HT", "UI", "PTD", and "LWT" in one action when you are asked for predictors.

For this example:

Conditional logistic regression

Deviance (-2 log likelihood) = 51.589852

Deviance (likelihood ratio) chi-square = 26.042632 P = 0.0002

Pseudo (McFadden) R-square = 0.33546
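The three summary statistics above are related, and the relationship can be reproduced with a small Python sketch. It assumes that the likelihood ratio chi-square is the null deviance minus the model deviance, on 6 degrees of freedom (one per predictor), and that McFadden's R-square is that chi-square divided by the null deviance. These are the usual definitions rather than a statement of how StatsDirect computes them internally. The helper `chi_square_upper_tail_even_df` is a hypothetical name and only handles even degrees of freedom.

```python
import math

deviance = 51.589852       # -2 log likelihood of the fitted model (from the output)
lr_chi_square = 26.042632  # likelihood ratio chi-square (from the output)
degrees_of_freedom = 6     # assumption: one per predictor

# Assumed definition: LR chi-square = null deviance - model deviance
null_deviance = deviance + lr_chi_square
mcfadden_r_square = lr_chi_square / null_deviance

def chi_square_upper_tail_even_df(x, df):
    """Upper tail probability of a chi-square variate; closed form for even df only."""
    if df % 2 != 0:
        raise ValueError("this closed form needs even degrees of freedom")
    half = x / 2.0
    term = 1.0
    total = 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total

p_value = chi_square_upper_tail_even_df(lr_chi_square, degrees_of_freedom)

# Under 1:1 matching, each pair contributes 2*ln(2) to the null deviance,
# so the null deviance implies the number of pairs analysed.
implied_pairs = null_deviance / (2.0 * math.log(2.0))

print(f"McFadden R-square = {mcfadden_r_square:.5f}")
print(f"P = {p_value:.4f}")
print(f"implied number of pairs = {implied_pairs:.2f}")
```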

Label       Parameter estimate   Standard error   z           P
RACE (b)    0.582272             0.620708         0.938078    0.3482
SMOKE       1.410799             0.562177         2.509528    0.0121
HT          2.351335             1.05135          2.236492    0.0253
UI          1.399261             0.692244         2.021341    0.0432
PTD         1.807481             0.788952         2.290989    0.022
LWT         -0.018222            0.00913          -1.995807   0.046
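The z and P values in the parameter table are consistent with a Wald test, which can be sketched in Python as follows. This assumes z is the estimate divided by its standard error and P is the two-sided tail area of the standard normal distribution; the function name `wald_test` is an assumption made for illustration. Results computed from the rounded figures printed above may differ from the output in the last decimal places.

```python
import math

# (parameter estimate, standard error) as printed in the output above
estimates = {
    "RACE (b)": (0.582272, 0.620708),
    "SMOKE": (1.410799, 0.562177),
    "HT": (2.351335, 1.05135),
    "UI": (1.399261, 0.692244),
    "PTD": (1.807481, 0.788952),
    "LWT": (-0.018222, 0.00913),
}

def wald_test(estimate, standard_error):
    """Return (z, two-sided P) for a single coefficient."""
    z = estimate / standard_error
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)) for the standard normal
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p

wald_results = {label: wald_test(b, se) for label, (b, se) in estimates.items()}
for label, (z, p) in wald_results.items():
    print(f"{label:<10} z = {z:.4f}  P = {p:.4f}")
```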

Label       Odds ratio   95% confidence interval
RACE (b)    1.790102     0.53031 to 6.042622
SMOKE       4.099229     1.361997 to 12.337527
HT          10.499579    1.3374 to 82.429442
UI          4.052205     1.043404 to 15.737307
PTD         6.095073     1.298439 to 28.611218
LWT         0.981943     0.964529 to 0.999673
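The odds ratios and confidence intervals can be reproduced from the parameter estimates and standard errors with a minimal Python sketch. It assumes the odds ratio is the exponential of the estimate and that the interval is a Wald interval, i.e. the exponential of the estimate plus or minus the normal quantile times the standard error. The function name `odds_ratio_with_interval` is hypothetical.

```python
import math
from statistics import NormalDist

# (parameter estimate, standard error) as printed in the parameter table
estimates = {
    "RACE (b)": (0.582272, 0.620708),
    "SMOKE": (1.410799, 0.562177),
    "HT": (2.351335, 1.05135),
    "UI": (1.399261, 0.692244),
    "PTD": (1.807481, 0.788952),
    "LWT": (-0.018222, 0.00913),
}

def odds_ratio_with_interval(estimate, standard_error, level=0.95):
    """Return (odds ratio, lower limit, upper limit) assuming a Wald interval."""
    quantile = NormalDist().inv_cdf(0.5 + level / 2.0)  # about 1.96 for 95%
    half_width = quantile * standard_error
    return (
        math.exp(estimate),
        math.exp(estimate - half_width),
        math.exp(estimate + half_width),
    )

odds_ratios = {label: odds_ratio_with_interval(b, se) for label, (b, se) in estimates.items()}
for label, (ratio, lower, upper) in odds_ratios.items():
    print(f"{label:<10} {ratio:.4f}  {lower:.4f} to {upper:.4f}")
```

For a continuous predictor such as LWT, the odds ratio applies to a one-unit increase in the predictor, which is why it lies close to 1 even though its interval excludes 1.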

You may infer from the results above that hypertension, smoking status and previous pre-term delivery are convincing predictors of low birth weight in the population studied.

Note that the selection of predictors for regression models such as this can be complex and is best done with the help of a statistician. Hosmer and Lemeshow (1989) give a good discussion of the example above, but with non-standard dummy variables (StatsDirect uses a standard dummy/design variable coding scheme adopted by most other statistical software). The optimal selection of predictors depends not only upon their numerical performance in the model, with or without appropriate transformations or study of interactions, but also upon their biophysical importance in the study.