Tobit Regression | SAS Annotated Output

This page shows an example of tobit regression analysis in SAS with footnotes
explaining the output. The data in this example were gathered on undergraduates
applying to graduate school and include undergraduate GPAs, the reputation of
the school of the undergraduate (a topnotch indicator), the students’ GRE score,
and whether or not the student was admitted to graduate school.

The range of possible GRE scores is 200 to 800, which means that our outcome
variable is both left-censored and right-censored. If two students score 800,
they are equal according to our scale but might not truly be equal in aptitude:
we have a ceiling effect. The same is true of two students scoring 200 (a floor
effect). Tobit regression generates a model that accounts for this censoring in
the outcome variable.
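Formally, the tobit model posits a latent (uncensored) aptitude variable that is only observed within the bounds of the scale; with a lower bound of 200 and an upper bound of 800, this can be written as:

```latex
% Latent-variable formulation of the tobit model
y_i^{*} = \mathbf{x}_i'\boldsymbol{\beta} + \varepsilon_i,
\qquad \varepsilon_i \sim N(0, \sigma^2)

y_i =
\begin{cases}
200     & \text{if } y_i^{*} \le 200 \quad \text{(left-censored)} \\
y_i^{*} & \text{if } 200 < y_i^{*} < 800 \\
800     & \text{if } y_i^{*} \ge 800 \quad \text{(right-censored)}
\end{cases}
```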

In the output above, we see that 25 of our observations are censored while
375 are not. Next, we program our model in SAS. This can be done with proc
lifereg. We specify our model, indicating that our response is censored.
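A minimal sketch of this setup is below; the dataset name tobit and the variable names gre, gpa, and topnotch are assumptions for illustration:

```sas
/* flag right-censored observations: censor = 1 when the score hits the
   ceiling of the scale (gre = 800), censor = 0 otherwise */
data tobit;
  set tobit;
  censor = (gre >= 800);
run;

/* gre*censor(1) marks observations with censor = 1 as right-censored;
   d = normal requests the normal error distribution a tobit model assumes */
proc lifereg data = tobit;
  model gre*censor(1) = gpa topnotch / d = normal;
run;
```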

a. Dataset – This indicates the dataset used in the analysis. If a
dataset is not specified in the model command, SAS uses the most recently
created/modified dataset by default.

b. Dependent Variable – This is the response variable predicted by the
model. We are using a tobit model because this response variable is censored: the GRE scores
are scaled from 200 to 800 and cannot be measured outside of this range
(although the phenomenon underlying the scores, in this case aptitude, is not
bounded).

c. Censoring Variable – This is the variable we defined in preparation
for running the tobit model. Values in our dataset will be considered censored
based on the corresponding value of our censoring variable.

d. Censoring Value(s) – These are the values of the censoring variable
that indicate a censored value in the dependent variable. In this example, the
observations where censor = 1 are censored. We indicated this in
the proc lifereg command with the 1 in parentheses after censor.

e. Number of Observations – This is the number of observations from
the dataset used in the model. If an observation is missing data in the outcome or
any of the predictor variables, then it is excluded from the analysis.

f. Noncensored Values – This is the number of observations in the
model that were not censored. In this example, there were 375 observations in
the dataset with 200 < gre < 800.

g. Right Censored Values – This is the number of observations in the
model that were right censored. In this example, there were 25 observations in
the dataset with gre >= 800.

h. Left Censored Values – This is the number of observations in the
model that were left censored. In this example, there were zero observations in
the dataset with gre <= 200.

i. Interval Censored Values – This is the number of observations in
the model that were interval censored (where the outcome variable fell in an
interval that was censored). This type of censoring was not used in this model,
and so there were zero observations in the dataset in this category.

j. Name of Distribution – This indicates the distribution assumed for the
error terms of the model. In a tobit model, this distribution is normal.

k. Log Likelihood – This is the log likelihood of the fitted model. It is
used in the Likelihood Ratio Chi-Square test of whether all predictors’
regression coefficients in the model are simultaneously zero.
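With normal errors, the log likelihood that SAS maximizes combines a density contribution from each uncensored observation and a tail-probability contribution from each right-censored one (a left-censored term, which would involve the CDF at 200, is omitted here because our data contain no left-censored values):

```latex
\log L \;=\; \sum_{\text{uncensored}}
\left[ \log \phi\!\left(\frac{y_i - \mathbf{x}_i'\boldsymbol{\beta}}{\sigma}\right) - \log \sigma \right]
\;+\; \sum_{\text{right-censored}}
\log\!\left[ 1 - \Phi\!\left(\frac{800 - \mathbf{x}_i'\boldsymbol{\beta}}{\sigma}\right) \right]
```

where φ and Φ are the standard normal density and cumulative distribution function.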

l. Algorithm Converged – This indicates that the SAS convergence criterion
for the iterative steps used in maximizing the likelihood was met. The default
criterion in SAS is the relative gradient convergence criterion, and the
default precision is 1E-8.

m. Type III Analysis of Effects – This is an analysis of the model effects
using Type III (partial) tests: each effect is tested adjusting for all of the
other effects in the model, so the result does not depend on the order in
which the variables are specified in the model.

n. Parameter – This lists the model parameters. Our model includes an
intercept and the specified predictor variables.

o. DF – These are the degrees of freedom associated with each of the model
parameters.

p. Estimate – These are the regression coefficients. These
coefficients are interpreted as you would interpret coefficients from an OLS regression: the
expected GRE score changes by Estimate for each unit increase in the
corresponding predictor.

Intercept – If all of the predictor variables in the model are
evaluated at zero, the predicted GRE score would be Intercept =
205.8515. For subjects from non-topnotch undergraduate institutions (topnotch
evaluated at zero) with zero gpa, the predicted GRE score would be
205.8515. This may seem very low, considering the mean GRE score is 587.7,
but note that evaluating gpa at zero is out of the range of plausible
values for gpa.

gpa – If a subject were to increase his gpa by one
point, his expected GRE score would increase by 111.3085 points while
holding all other variables in the model constant. Thus, the higher a
student’s gpa, the higher the predicted GRE score.

topnotch – If a subject attended a topnotch institution for
her undergraduate education, her expected GRE score would be 46.65774 points
higher than a subject with the same grade point average who attended a
non-topnotch institution. Thus, subjects from topnotch undergraduate
institutions have higher predicted GRE scores than subjects from
non-topnotch undergraduate institutions if grade point averages are held
constant.
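As a quick check of these interpretations, the predicted GRE score for a student with a 3.0 gpa from a non-topnotch institution, using the estimates above, is:

```latex
\widehat{gre} = 205.8515 + 111.3085 \times 3.0 + 46.65774 \times 0 = 539.78
```

For a student with the same gpa from a topnotch institution, the prediction is 539.78 + 46.66 = 586.43.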

q. Standard Error – These are the standard errors of the individual
regression coefficients.

r. 95% Confidence Limits – This is the confidence interval (CI) for an
individual coefficient, given that the other predictors are in the model.
With a level of 95% confidence, we’d say that we are 95% confident that the
“true” coefficient lies between the lower and upper limits of the interval.
The CI is equivalent to the hypothesis test of the coefficient: if the 95% CI
includes zero, we’d fail to reject the null hypothesis that a particular
regression coefficient is zero, given the other predictors are in the model,
at the 0.05 alpha level. An advantage of a CI is that it is illustrative; it
provides a range where the “true” parameter may lie.
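These limits are computed from the estimate and its standard error using the standard normal quantile; for a 95% interval:

```latex
\hat{\beta} \;\pm\; z_{0.975}\, SE(\hat{\beta})
\;=\; \hat{\beta} \;\pm\; 1.96 \times SE(\hat{\beta})
```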

s. Chi-Square – This is the Chi-Square test statistic
corresponding to the hypothesis that the given parameter’s estimate is equal to zero.
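This is a Wald test: the statistic is the squared ratio of the estimate to its standard error, which is compared against a chi-square distribution with one degree of freedom:

```latex
\chi^2 \;=\; \left( \frac{\hat{\beta}}{SE(\hat{\beta})} \right)^{2}
\;\sim\; \chi^2_{1} \quad \text{under } H_0 : \beta = 0
```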

t. Pr>ChiSq – This is the probability of observing the Chi-Square test
statistic (or a more extreme test statistic) under the null hypothesis that a
particular predictor’s regression coefficient is zero, given that the rest of
the predictors are in the model. For a given alpha level, Pr>ChiSq
determines whether or not the null hypothesis can be rejected. If Pr>ChiSq
is less than alpha, then the null hypothesis can be rejected and the parameter
estimate is considered statistically significant at that alpha level.

Intercept – The Chi-Square test statistic for the intercept is
16.14, with an associated p-value of < 0.001. If we set our alpha level
at 0.05, we would reject the null hypothesis and conclude that the intercept
is statistically different from zero, given that gpa and topnotch
are in the model.

gpa – The Chi-Square test statistic for the predictor gpa
is 53.65 with an associated p-value of <0.001. If we
set our alpha level to 0.05, we would reject the null hypothesis and
conclude that the regression coefficient for gpa has been found to be
statistically different from zero given topnotch is in the model.

topnotch – The Chi-Square test statistic for the predictor
topnotch is 8.77 with an associated p-value of
0.0031. If we set our alpha level to 0.05, we would reject the null
hypothesis and conclude that the regression coefficient for topnotch
has been found to be statistically different from zero given gpa is
in the model.

u. Scale – This is the estimated standard error of the regression.
This value, 111.4882, is comparable to the root mean squared error that would be
obtained in an OLS regression.