Week_12_Mini-Lecture_Slides

Week 12
November 17-21
Four Mini-Lectures
QMM 510
Fall 2014
ML 12.1
Chapter Contents
13.1 Multiple Regression
13.2 Assessing Overall Fit
13.3 Predictor Significance
13.4 Confidence Intervals for Y
13.5 Categorical Predictors
13.6 Tests for Nonlinearity and Interaction
13.7 Multicollinearity
13.8 Violations of Assumptions
13.9 Other Regression Topics
Much of this is like Chapter 12, except that we have more than one predictor.
Chapter 13
Multiple Regression
Simple or Multivariate?
• Multiple regression is an extension of simple regression to include more than one independent variable.
• Limitations of simple regression:
  • often simplistic
  • biased estimates if relevant predictors are omitted
  • lack of fit does not show that X is unrelated to Y if the true model is multivariate
Visualizing a Multiple Regression
Regression Terminology
• Y is the response variable and is assumed to be related to the k predictors (X1, X2, …, Xk) by a linear equation called the population regression model:
Y = β0 + β1X1 + β2X2 + … + βkXk + ε
(Use Greek letters for population parameters.)
• The estimated (fitted) regression equation is:
ŷ = b0 + b1X1 + b2X2 + … + bkXk
(Use Roman letters for sample estimates.)
Fitted Regression: Simple versus Multivariate
If we have more than two predictors, there is no way to visualize it …
Data Format
n observed values of the response variable Y and its proposed predictors X1, X2, …, Xk are presented as a table with n rows, with one column for Y and one column per predictor (the predictors alone form an n × k matrix).
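As a rough illustration of this layout, the sketch below stores the predictors as a list of n rows with k values each, keeps Y in a separate list, and fits the coefficients by solving the normal equations in plain Python. The function name fit_ols and the data are made up for illustration; in practice Excel, MINITAB, or MegaStat would do this calculation.

```python
# Minimal least-squares fit of y = b0 + b1*x1 + ... + bk*xk (illustrative sketch).

def fit_ols(X, y):
    """Return [b0, b1, ..., bk] by solving the normal equations (X'X) b = X'y.

    X is a list of n rows, each holding k predictor values; y has n responses.
    A column of ones is prepended for the intercept.
    """
    rows = [[1.0] + [float(v) for v in row] for row in X]
    p = len(rows[0])  # k + 1 coefficients
    # Build the augmented matrix [X'X | X'y].
    A = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))]
         for i in range(p)]
    # Gaussian elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("X'X is singular: predictors are linearly dependent")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(col + 1, p):
            factor = A[r][col] / A[col][col]
            for c in range(col, p + 1):
                A[r][c] -= factor * A[col][c]
    # Back substitution.
    b = [0.0] * p
    for i in reversed(range(p)):
        b[i] = (A[i][p] - sum(A[i][j] * b[j] for j in range(i + 1, p))) / A[i][i]
    return b


# Made-up data: n = 5 observations, k = 2 predictors,
# constructed without noise so that y = 1 + 2*x1 + 3*x2.
X_demo = [[1, 2], [2, 1], [3, 5], [4, 3], [5, 8]]
y_demo = [1 + 2 * x1 + 3 * x2 for x1, x2 in X_demo]
b_demo = fit_ols(X_demo, y_demo)
print([round(v, 6) for v in b_demo])
```

Because the demo data contain no noise, the fitted coefficients should match the values used to build Y up to rounding error.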
Common Misconceptions about Fit
• A common mistake is to assume that the model with the best fit is preferred.
• Sometimes a model with a low R2 may give useful predictions, while a model with a high R2 may conceal problems.
• Thoroughly analyze the results before choosing the model.
Four Criteria for Regression Assessment
• Logic - Is there an a priori reason to expect a causal relationship
between the predictors and the response variable?
• Fit - Does the overall regression show a significant relationship
between the predictors and the response variable?
• Parsimony - Does each predictor contribute significantly to the
explanation? Are some predictors not worth the trouble?
• Stability - Are the predictors related to one another so strongly that
the regression estimates become erratic?
Chapter 13
Assessing Overall Fit
F Test for Significance
• For a regression with k predictors, the hypotheses to be tested are
H0: All the true coefficients are zero
H1: At least one of the coefficients is nonzero
• In other words,
H0: β1 = β2 = … = βk = 0
H1: At least one of the coefficients is nonzero
The ANOVA calculations for a k-predictor model resemble those for
a simple regression, except for degrees of freedom:
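A small sketch of the ANOVA arithmetic, assuming the standard layout in which the regression has k degrees of freedom and the error has n − k − 1. The function name anova_f and the sums of squares are hypothetical and not taken from the lecture's data.

```python
def anova_f(ssr, sse, n, k):
    """Return (df_regression, df_error, MSR, MSE, F) for a k-predictor regression."""
    df_reg = k            # one d.f. per predictor
    df_err = n - k - 1    # n observations minus (k slopes + 1 intercept)
    msr = ssr / df_reg    # mean square due to regression
    mse = sse / df_err    # mean square error
    return df_reg, df_err, msr, mse, msr / mse


# Hypothetical sums of squares (SST = SSR + SSE = 1000).
df_reg, df_err, msr, mse, f_calc = anova_f(ssr=600.0, sse=400.0, n=25, k=4)
print(df_reg, df_err, msr, mse, f_calc)
```

With k = 1 the degrees of freedom reduce to 1 and n − 2, which is the simple regression case.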
Coefficient of Determination (R2)
• R2, the coefficient of determination, is a common measure of overall fit.
• It can be calculated in one of two ways (always done by computer):
R2 = 1 − SSE/SST or R2 = SSR/SST
• For example, for the home price data,
Adjusted R2
• It is generally possible to raise the coefficient of determination R2 by including additional predictors.
• The adjusted coefficient of determination penalizes the inclusion of useless predictors.
• For n observations and k predictors:
R2adj = 1 − (1 − R2)(n − 1)/(n − k − 1)
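To see the penalty at work, the sketch below compares a 4-predictor model with a 5-predictor model whose extra predictor barely reduces SSE. It assumes the usual definitions R2 = 1 − SSE/SST and adjusted R2 = 1 − (1 − R2)(n − 1)/(n − k − 1); the sums of squares are invented for illustration.

```python
def r_squared(sse, sst):
    """Coefficient of determination from error and total sums of squares."""
    return 1 - sse / sst


def adjusted_r_squared(r2, n, k):
    """Adjusted R2 for n observations and k predictors."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


n, sst = 25, 1000.0                     # hypothetical sample size and SST
r2_small = r_squared(400.0, sst)        # model with k = 4 predictors
r2_big = r_squared(398.0, sst)          # k = 5: one nearly useless predictor added
adj_small = adjusted_r_squared(r2_small, n, 4)
adj_big = adjusted_r_squared(r2_big, n, 5)
print(round(r2_small, 4), round(adj_small, 4))
print(round(r2_big, 4), round(adj_big, 4))
```

In this made-up case R2 rises slightly when the fifth predictor is added, while the adjusted R2 falls.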
How Many Predictors?
• Limit the number of predictors based on the sample size.
• A large sample size permits many predictors.
• When n/k is small, the R2 no longer gives a reliable indication of fit.
• Suggested rules are:
Evan’s Rule (conservative): n/k ≥ 10 (at least 10 observations per predictor)
Doane’s Rule (relaxed): n/k ≥ 5 (at least 5 observations per predictor)
These are just guidelines – use your judgment.
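The two rules of thumb can be checked with a few lines of code. This is only a sketch: the function name predictor_rules and the sample sizes are made up, and the thresholds are the 10 and 5 observations per predictor quoted above.

```python
def predictor_rules(n, k):
    """Check the two rules of thumb for n observations and k predictors."""
    ratio = n / k
    return {
        "n_per_predictor": ratio,
        "conservative_ok": ratio >= 10,  # at least 10 observations per predictor
        "relaxed_ok": ratio >= 5,        # at least 5 observations per predictor
    }


print(predictor_rules(n=50, k=3))   # passes both rules
print(predictor_rules(n=30, k=4))   # passes only the relaxed rule
```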
Chapter 13
Predictor Significance
• Test each fitted coefficient to see whether it is significantly different from zero.
• The hypothesis tests for the coefficient of predictor Xj are
H0: βj = 0
H1: βj ≠ 0
• If we cannot reject the hypothesis that a coefficient is zero, then the corresponding predictor does not contribute significantly to the prediction of Y.
Test Statistic
• Excel reports the test statistic for the coefficient of predictor Xj:
tcalc = (bj − 0)/sj, where sj is the standard error of bj
• Find the critical value tα for the chosen level of significance α from Appendix D, or from Excel using =T.INV.2T(α,df) for a two-tailed test, with d.f. = n − k − 1.
• To reject H0 we compare tcalc to tα for the different hypotheses (or reject if p-value ≤ α).
• The 95% confidence interval for coefficient βj is
bj − t.025 sj ≤ βj ≤ bj + t.025 sj
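A minimal sketch of the two-tailed test and the interval for one coefficient, assuming the usual forms tcalc = bj/sj and bj ± t·sj with d.f. = n − k − 1. The coefficient and its standard error are hypothetical; the critical value 2.086 is the two-tailed 5% value for 20 d.f. and would normally come from a t table or =T.INV.2T(0.05,20).

```python
def coefficient_test(b_j, s_j, t_crit):
    """Two-tailed t test of H0: beta_j = 0 and the matching confidence interval."""
    t_calc = (b_j - 0) / s_j
    reject_h0 = abs(t_calc) > t_crit
    interval = (b_j - t_crit * s_j, b_j + t_crit * s_j)
    return t_calc, reject_h0, interval


# Hypothetical fitted coefficient and standard error; n = 25, k = 4, so d.f. = 20.
t_calc, reject_h0, interval = coefficient_test(b_j=2.5, s_j=1.0, t_crit=2.086)
print(t_calc, reject_h0, interval)
```

Rejecting H0 in the two-tailed test is equivalent to the confidence interval excluding zero, which is the case for these made-up numbers.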
Chapter 13
Confidence Intervals for Y
Standard Error
• The standard error of the regression (se) is another important measure of fit. Except for d.f., the formula for se resembles se for simple regression.
• For n observations and k predictors
se = sqrt(SSE/(n − k − 1))
• If all predictions were perfect (SSE = 0) then se = 0.
Approximate Confidence and Prediction Intervals for Y
• Approximate 95% confidence interval for conditional mean of Y:
ŷ ± t.025 se/√n
• Approximate 95% prediction interval for individual Y value:
ŷ ± t.025 se
Quick 95 Percent Confidence and Prediction Interval for Y
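• The t-values for 95% confidence are typically near 2 (as long as n is not too small).
• Very quick confidence and prediction intervals for Y, without using a t table, are:
ŷ ± 2 se/√n (confidence interval for the conditional mean of Y)
ŷ ± 2 se (prediction interval for an individual Y value)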
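The quick intervals are easy to compute by hand or in code. The sketch below assumes the approximations ŷ ± 2 se/√n for the conditional mean and ŷ ± 2 se for an individual value, with se = sqrt(SSE/(n − k − 1)). The function names, SSE, n, k, and the fitted value of 100 are all made up for illustration.

```python
import math


def standard_error(sse, n, k):
    """Standard error of the regression for n observations and k predictors."""
    return math.sqrt(sse / (n - k - 1))


def quick_intervals(y_hat, se, n):
    """Approximate 95% intervals using t = 2 instead of a t-table value."""
    half_ci = 2 * se / math.sqrt(n)   # conditional mean of Y
    half_pi = 2 * se                  # individual Y value
    return (y_hat - half_ci, y_hat + half_ci), (y_hat - half_pi, y_hat + half_pi)


se_demo = standard_error(sse=400.0, n=25, k=4)   # sqrt(400/20)
ci_demo, pi_demo = quick_intervals(y_hat=100.0, se=se_demo, n=25)
print(round(se_demo, 3), ci_demo, pi_demo)
```

The prediction interval is wider than the confidence interval by a factor of √n, here 5.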
ML 12.2
Standardized Residuals
• Use Excel, MINITAB, MegaStat or other software to compute
standardized residuals.
• If the absolute value of any standardized residual is at least 2, then
it is classified as unusual (as in simple regression).
Leverage and Influence
• A high leverage statistic indicates unusual X values in one or more
predictors.
• Such observations are influential because they are near the edge(s)
of the fitted regression plane.
• Leverage for observation i is denoted hi (computed by MegaStat)
Chapter 13
Unusual Observations
Leverage
• For a regression model with k predictors, an observation
whose leverage exceeds 2(k+1)/n is unusual.
• In Chapter 12, the leverage rule was 4/n. With k = 1 predictor,
we get 2(k+1)/n = 2(1+1)/n = 4/n.
• So this leverage criterion applies to simple regression as a
special case.
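The two screening rules (standardized residual of at least 2 in absolute value, leverage above 2(k+1)/n) can be sketched as follows. The function names and the residual and leverage values are hypothetical; real values would come from MegaStat or similar software.

```python
def leverage_cutoff(n, k):
    """High-leverage threshold 2(k+1)/n for k predictors and n observations."""
    return 2 * (k + 1) / n


def flag_unusual(std_residuals, leverages, n, k):
    """Flag each observation for an unusual residual and/or high leverage."""
    cutoff = leverage_cutoff(n, k)
    return [
        {"obs": i, "unusual_residual": abs(r) >= 2, "high_leverage": h > cutoff}
        for i, (r, h) in enumerate(zip(std_residuals, leverages), start=1)
    ]


print(leverage_cutoff(n=50, k=3))   # multiple regression with 3 predictors
print(leverage_cutoff(n=50, k=1))   # simple regression: same as 4/n

# Made-up values for four of the n = 50 observations.
flags = flag_unusual([0.4, -2.3, 1.1, 2.0], [0.05, 0.08, 0.21, 0.16], n=50, k=3)
for row in flags:
    print(row)
```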
Example: Heart Death Rate in 50 States
• n = 50 states, k = 3 predictors
• Standard error: se = 27.422
• 4 states (FL, HI, OK, WV) have unusual residuals (> 2 se), highlighted by MegaStat
• High leverage criterion is 2(k+1)/n = 2(3+1)/50 = 0.160; MegaStat highlights the high leverage observations (> 0.160)
• Note: Only unusual observations are shown (there were n = 50 observations)
Chapter 13
Categorical Predictors
ML 12.3
What Is a Binary or Categorical Predictor?
• A binary predictor has two values (usually 0 and 1) to denote the presence or absence of a condition.
• For example, for n graduates from an MBA program:
Employed = 1
Unemployed = 0
• These variables are also called dummy, dichotomous, or indicator variables.
• For clarity, name the binary variable after the characteristic that corresponds to the value 1.
Effects of a Binary Predictor
• A binary predictor is sometimes called a shift variable because it shifts the regression plane up or down.
• Suppose X1 is a binary predictor that can take on only the values of 0 or 1.
• Its contribution to the regression is either b1 or nothing, resulting in an intercept of either b0 (when X1 = 0) or b0 + b1 (when X1 = 1).
• The slope does not change: only the intercept is shifted. For example,
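With made-up coefficients b0 = 10, b1 = 4, and b2 = 1.5 (assumptions for illustration, not values from the lecture), a short sketch shows the two parallel fitted lines, one for X1 = 0 and one for X1 = 1:

```python
def predict(b0, b1, b2, x1_binary, x2):
    """Fitted value for one binary predictor X1 and one numeric predictor X2."""
    return b0 + b1 * x1_binary + b2 * x2


b0, b1, b2 = 10.0, 4.0, 1.5   # hypothetical fitted coefficients

for x1 in (0, 1):
    intercept = predict(b0, b1, b2, x1, 0)               # value at X2 = 0
    slope = predict(b0, b1, b2, x1, 1) - intercept       # change per unit of X2
    print(f"X1 = {x1}: intercept = {intercept}, slope on X2 = {slope}")
```

The intercept moves from b0 to b0 + b1 when X1 switches from 0 to 1, while the slope on X2 is the same in both groups.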
Testing a Binary for Significance
• In multiple regression, binary predictors require no special treatment. They are tested as any other predictor using a t test.
More Than One Binary
• More than one binary is needed when the number of categories to be coded exceeds two.
• For example, for the variable GPA by class level, each category is a binary variable:
Freshman = 1 if a freshman, 0 otherwise
Sophomore = 1 if a sophomore, 0 otherwise
Junior = 1 if a junior, 0 otherwise
Senior = 1 if a senior, 0 otherwise
Masters = 1 if a master’s candidate, 0 otherwise
Doctoral = 1 if a PhD candidate, 0 otherwise
What if I Forget to Exclude One Binary?
• Including all binaries for all categories may introduce a serious problem of collinearity for the regression estimation. Collinearity occurs when there are redundant independent variables.
• When the value of one independent variable can be determined from the values of other independent variables, one column in the X data matrix will be a perfect linear combination of the other column(s).
• The least squares estimation would fail because the data matrix would be singular (i.e., would have no inverse).
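The redundancy is easy to see with the class-level binaries. In the sketch below (the function name code_binaries and the four students are made up), coding all six levels makes every row of binaries sum to 1, which duplicates the intercept column of ones. Omitting one level, here Doctoral as an arbitrary baseline, removes that exact dependence.

```python
LEVELS = ["Freshman", "Sophomore", "Junior", "Senior", "Masters", "Doctoral"]


def code_binaries(values, levels, omit=None):
    """Return (column names, rows of 0/1 values), skipping the omitted level."""
    kept = [lv for lv in levels if lv != omit]
    return kept, [[1 if v == lv else 0 for lv in kept] for v in values]


students = ["Junior", "Freshman", "Doctoral", "Senior"]   # hypothetical sample

# All six binaries: each row sums to 1, the same as the intercept column.
all_names, all_rows = code_binaries(students, LEVELS)
row_sums_all = [sum(r) for r in all_rows]

# Five binaries: the omitted level is the baseline, coded as all zeros.
names, rows = code_binaries(students, LEVELS, omit="Doctoral")
row_sums = [sum(r) for r in rows]

print(row_sums_all)
print(names)
print(row_sums)
```

With one level omitted, each remaining binary's coefficient is read as a shift relative to the omitted baseline category.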
Chapter 13
Other Regression Problems
• Outliers? (omit only if clearly errors)
• Missing Predictors? (usually you can’t tell)
• Ill-Conditioned Data (adjust decimals or take logs)
• Significance in Large Samples? (if n is huge, almost any regression will be significant)
• Model Specification Errors? (may show up in residual patterns)
• Missing Data? (we may have to live without it)
• Binary Response? (if Y = 0,1 we use logistic regression)
• Stepwise and Best Subsets Regression (MegaStat does these)