3
Goals for Course
To enable researchers to conduct careful analyses with existing VA (and non-VA) datasets.
We will
–Describe econometric tools and their strengths and limitations
–Use examples to reinforce learning

4
Requirements
Familiarity with multivariate analysis
We expect you to
–do the readings
–ask questions if you don’t understand something
–do your best

7
LiveMeeting Rules
Mute your phone
Don’t use the hold button
If you have to make or answer another call, please hang up and then dial back in

8
Features of web system
Main page
–Slides, white board, applications demo
Side page
–Chat
–Questions and answers
–Informal poll
–Polls

9
Virtual Interaction
Polls
me (or the presenter) with questions or
If you are in a group setting, please me the next two items.

10
Randomized Clinical Trial
RCTs are the gold-standard research design for assessing causality
What is unique about a randomized trial?
–The treatment / exposure is randomly assigned
Benefits of randomization:
–Causal inferences

11
Randomization
Random assignment distinguishes experimental and non-experimental design
Random assignment should not be confused with random selection
–Selection can be important for generalizability (e.g., randomly selected survey participants)
–Random assignment is required for understanding causation
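The distinction between random selection and random assignment can be sketched in a few lines. This is only an illustration: the population of 1,000 IDs, the sample size, and the 50/50 split are made-up assumptions, not part of the course material.

```python
import random

rng = random.Random(42)  # fixed seed so the sketch is reproducible

# Hypothetical population of 1,000 person IDs (illustrative only).
population = list(range(1000))

# Random selection: decides WHO is studied (helps generalizability).
sample = rng.sample(population, 100)

# Random assignment: decides WHICH ARM each sampled person enters
# (this is what supports causal inference).
shuffled = sample[:]
rng.shuffle(shuffled)
treatment = shuffled[:50]
control = shuffled[50:]

print(len(treatment), len(control))
```

A study can use either step without the other: a randomly selected survey has no assignment, and an RCT run on volunteers has assignment without random selection.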

12
Limitations of RCTs
Generalizability to real life may be low
Hawthorne effect (both arms)
RCTs are expensive and slow
Can be unethical to randomize people to certain treatments or conditions
Quasi-experimental design can fill an important role

15
“i” is an index. If we are analyzing people, it typically refers to the person. There may be other indexes.

16
[Equation with labeled parts: DV, intercept, two covariates, error term]

17
[Equation in different notation with labeled parts: DV, intercept, j covariates, error term]
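The equations on these slides did not survive extraction. A reconstruction consistent with the callout labels (DV, intercept, covariates, error term) and with the B_0, B_1, and u_i notation used on later slides might look as follows. The covariate symbols X and Z and the summation form are assumptions; the slide's own "different notation" is not recoverable.

```latex
% Two covariates: DV = intercept + two covariates + error term
Y_i = B_0 + B_1 X_i + B_2 Z_i + u_i

% j covariates, written compactly
Y_i = B_0 + \sum_{j} B_j X_{ij} + u_i
```

Here i indexes the unit of analysis (typically the person) and j indexes the covariates.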

18
Error term
Error exists because
1. Other important variables might be omitted
2. Measurement error
3. Human indeterminacy
Understand and minimize error
Error can be additive or multiplicative
See Kennedy, P. A Guide to Econometrics

19
Example: is height associated with income?

20
Y=income; X=height
Hypothesis: Height is not related to income (B_1=0)
If B_1=0, then what is B_0?
To test the hypothesis, we need to estimate B_1 with a sample of data
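A minimal sketch of estimating B_1 from a sample and forming a t statistic for the hypothesis B_1=0. The data are simulated with made-up values (heights in inches, income in $1,000s), and the helper name `ols_simple` is hypothetical. Note that if B_1=0 the model reduces to income = B_0 + error, so B_0 is simply mean income.

```python
import math
import random

def ols_simple(x, y):
    """Return (b0, b1, se_b1) for y = b0 + b1*x + u fitted by OLS."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    sigma2 = sum(e ** 2 for e in resid) / (n - 2)  # residual variance
    se_b1 = math.sqrt(sigma2 / sxx)
    return b0, b1, se_b1

# Simulated sample (made-up values, not real data).
rng = random.Random(0)
height = [rng.gauss(67, 4) for _ in range(500)]
income = [20 + 0.5 * h + rng.gauss(0, 10) for h in height]

b0, b1, se_b1 = ols_simple(height, income)
t_stat = b1 / se_b1  # compare with a t distribution on n - 2 df
print(f"b0={b0:.2f}  b1={b1:.2f}  se={se_b1:.2f}  t={t_stat:.2f}")
```

In practice this would be one line in R or Stata; the hand-rolled version only makes the arithmetic visible.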

32
Classic Linear Regression
No “superestimator”
CLR models are often used as the starting point for analyses
5 assumptions for the CLR
Variations in these assumptions will guide your choice of estimator (and the happiness of your reviewers)

33
Assumption 1
The dependent variable can be calculated as a linear function of a specific set of independent variables, plus an error term
For example, [equation]

34
Violations of Assumption 1
Omitted variables
Non-linearities
–Note: by transforming independent variables, a nonlinear relationship can often be expressed as a linear function
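The note on transformations can be illustrated with a small sketch. Here the outcome depends on ln(x) rather than x, so a straight line in x is misspecified, while the same OLS formula applied to the transformed covariate fits exactly. The data are noise-free and made up, and `fit_line` is a hypothetical helper.

```python
import math

def fit_line(x, y):
    """OLS intercept, slope, and R-squared for y = b0 + b1*x + u."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    ss_res = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return b0, b1, 1 - ss_res / ss_tot

x = [float(v) for v in range(1, 51)]
y = [3 + 2 * math.log(v) for v in x]  # nonlinear in x (made-up values)

raw_fit = fit_line(x, y)                         # misspecified: linear in x
log_fit = fit_line([math.log(v) for v in x], y)  # linear in ln(x)

print("raw x :", raw_fit)
print("ln(x) :", log_fit)
```

The model stays linear in its coefficients in both cases; only the covariate entering it changes.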

36
Assumption 1 and Stepwise
Statistical software allows for creating models in a “stepwise” fashion
Be careful when using it.
Why?
–Little penalty for adding a nuisance variable
–BIG penalty for missing an important covariate
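The asymmetry between the two penalties can be seen in a small simulation. This is a sketch under made-up assumptions: the true coefficient on x1 is 2, the omitted covariate x2 is correlated with x1, and the nuisance variable is pure noise. The helper names are hypothetical.

```python
import random

def demean(v):
    m = sum(v) / len(v)
    return [a - m for a in v]

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def slope_one(x, y):
    """Slope from regressing y on x (with intercept)."""
    xd, yd = demean(x), demean(y)
    return dot(xd, yd) / dot(xd, xd)

def slopes_two(x1, x2, y):
    """Slopes from regressing y on x1 and x2 (with intercept)."""
    a, b, yd = demean(x1), demean(x2), demean(y)
    s11, s22, s12 = dot(a, a), dot(b, b), dot(a, b)
    s1y, s2y = dot(a, yd), dot(b, yd)
    det = s11 * s22 - s12 ** 2
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det

rng = random.Random(1)
n = 5000
x1 = [rng.gauss(0, 1) for _ in range(n)]
x2 = [0.8 * a + rng.gauss(0, 1) for a in x1]   # correlated with x1
nuisance = [rng.gauss(0, 1) for _ in range(n)] # unrelated to everything

# Scenario A: true model has only x1; a nuisance variable is added.
y_a = [1 + 2 * a + rng.gauss(0, 1) for a in x1]
b_nuis = slopes_two(x1, nuisance, y_a)[0]      # stays near 2

# Scenario B: true model has x1 and x2; x2 is left out.
y_b = [1 + 2 * a + 3 * b + rng.gauss(0, 1) for a, b in zip(x1, x2)]
b_omit = slope_one(x1, y_b)                    # drifts toward 2 + 3*0.8

print(f"with nuisance variable: {b_nuis:.2f}")
print(f"with omitted covariate: {b_omit:.2f}")
```

The nuisance variable costs a little precision but leaves the estimate near its true value, while omitting the correlated covariate biases it.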

37
Assumption 2
Expected value of the error term is 0
E(u_i)=0
Violations lead to biased intercept
A concern when analyzing cost data (Wei will talk about the smearing estimator)
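One common reason this matters for cost data: when ln(cost) is modeled, the error has mean 0 on the log scale, but exp(error) does not have mean 1, so naively exponentiating predictions understates mean cost. The sketch below uses Duan's smearing factor (the average of the exponentiated residuals) on simulated, made-up data; it is an illustration, not the treatment given later in the course.

```python
import math
import random

rng = random.Random(11)
n = 5000
x = [rng.gauss(0, 1) for _ in range(n)]
log_cost = [5 + 0.5 * xi + rng.gauss(0, 1) for xi in x]  # made-up model
cost = [math.exp(v) for v in log_cost]

# OLS of ln(cost) on x.
x_bar, y_bar = sum(x) / n, sum(log_cost) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, log_cost)) / sxx
b0 = y_bar - b1 * x_bar
resid = [yi - b0 - b1 * xi for xi, yi in zip(x, log_cost)]

# Smearing factor: mean of exp(residual); exceeds 1 when residuals vary.
smear = sum(math.exp(e) for e in resid) / n

naive_mean = sum(math.exp(b0 + b1 * xi) for xi in x) / n
smeared_mean = smear * naive_mean
actual_mean = sum(cost) / n

print(f"smearing factor: {smear:.3f}")
print(f"naive {naive_mean:.1f}  smeared {smeared_mean:.1f}  actual {actual_mean:.1f}")
```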

40
Violating Assumption 3
Effects
–OLS coefficients are unbiased
–OLS is inefficient
–Standard errors are biased
Plotting is often very helpful
Different statistical tests for heteroskedasticity
–GWHet, but statistical tests have limited power
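As a numeric stand-in for a residual plot, one informal check is to compare the residual spread at low and high values of a covariate. The sketch below is not a formal test and is not GWHet; the data are simulated with an error whose spread grows with x, which is an assumption made for illustration.

```python
import random

rng = random.Random(3)
n = 3000
x = [rng.uniform(1, 10) for _ in range(n)]
y = [1 + 2 * xi + xi * rng.gauss(0, 1) for xi in x]  # error sd grows with x

# Fit simple OLS and collect residuals.
x_bar, y_bar = sum(x) / n, sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
b0 = y_bar - b1 * x_bar
resid = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]

# Compare mean squared residuals in the lowest and highest thirds of x.
pairs = sorted(zip(x, resid))
third = n // 3
low = sum(e ** 2 for _, e in pairs[:third]) / third
high = sum(e ** 2 for _, e in pairs[-third:]) / third
spread_ratio = high / low

print(f"slope {b1:.2f}, residual spread ratio (high/low) {spread_ratio:.1f}")
```

A ratio far from 1 corresponds to the fan shape one would see when plotting residuals against x.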

41
Fixes for Assumption 3
Transforming the dependent variable may eliminate heteroskedasticity
Robust standard errors (Huber-White or sandwich estimators)
Wei and Ciaran will address this issue in more detail in later courses
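A minimal sketch of a Huber-White (sandwich) standard error for the slope in a one-covariate regression, using the basic HC0 form without small-sample corrections. The simulated data and the helper name `slope_with_ses` are assumptions for illustration; statistical packages offer this as a built-in option.

```python
import math
import random

def slope_with_ses(x, y):
    """Return (b1, conventional SE, HC0 robust SE) for simple OLS."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    dx = [xi - x_bar for xi in x]
    sxx = sum(d * d for d in dx)
    b1 = sum(d * (yi - y_bar) for d, yi in zip(dx, y)) / sxx
    b0 = y_bar - b1 * x_bar
    resid = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]
    # Conventional SE assumes one common error variance.
    se_conv = math.sqrt(sum(e * e for e in resid) / (n - 2) / sxx)
    # HC0 sandwich SE weights each squared residual by its own leverage term.
    se_hc0 = math.sqrt(sum((d * e) ** 2 for d, e in zip(dx, resid))) / sxx
    return b1, se_conv, se_hc0

rng = random.Random(5)
x = [rng.gauss(0, 1) for _ in range(2000)]
y = [1 + 2 * xi + xi * rng.gauss(0, 1) for xi in x]  # error spread grows with |x|

b1, se_conv, se_hc0 = slope_with_ses(x, y)
print(f"b1={b1:.3f}  conventional SE={se_conv:.3f}  robust SE={se_hc0:.3f}")
```

The coefficient is the same under both approaches; only the standard error changes.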

42
Assumption 4
Observations on independent variables are considered fixed in repeated samples
E(x_i u_i)=0
Violations
–Errors in variables
–Autoregression
–Simultaneity

43
Assumption 4: Errors in Variables
Measurement error in the dependent variable (DV) is absorbed into the error term.
Error in measuring covariates can be problematic
–Is error correlated with error from DV?
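The contrast between the two kinds of measurement error can be sketched with a simulation. It assumes classical measurement error (independent of the true values and of the DV's error), a true slope of 2, and made-up data; correlated errors, as the slide's question hints, would behave differently.

```python
import random

def ols_slope(x, y):
    """Slope from regressing y on x (with intercept)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    return sxy / sxx

rng = random.Random(7)
n = 5000
x_true = [rng.gauss(0, 1) for _ in range(n)]
y_true = [1 + 2 * x + rng.gauss(0, 1) for x in x_true]

# Case 1: noise in the DV only. It joins the error term; slope stays near 2.
y_noisy = [y + rng.gauss(0, 1) for y in y_true]
slope_dv_error = ols_slope(x_true, y_noisy)

# Case 2: noise in the covariate. The slope shrinks toward 0; with equal
# signal and noise variance the expected slope is about 2 * 1/(1+1) = 1.
x_noisy = [x + rng.gauss(0, 1) for x in x_true]
slope_x_error = ols_slope(x_noisy, y_true)

print(f"DV measured with error:        {slope_dv_error:.2f}")
print(f"covariate measured with error: {slope_x_error:.2f}")
```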

44
Common Violations
Including lagged dependent variable(s) as covariates
Contemporaneous correlation (often called endogeneity)
–Hausman test (but very weak in small samples)
Mark Smith will talk more about this.

47
Statistical Software
SAS is for data management
R and Stata are for analyses
–http://www.r-project.org/
Stattransfer
(Always transfer SCRSSN as double precision)
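The reason for the double-precision advice can be shown directly: a 4-byte float carries only about 7 significant decimal digits, so a long numeric identifier stored that way may silently change. The sketch assumes a nine-digit identifier; the value 123456789 is made up and `to_single_precision` is a hypothetical helper.

```python
import struct

def to_single_precision(value):
    """Round-trip a number through a 4-byte IEEE 754 float."""
    return struct.unpack("f", struct.pack("f", float(value)))[0]

scrssn = 123456789  # made-up nine-digit identifier, not a real SCRSSN

as_single = int(to_single_precision(scrssn))
as_double = int(float(scrssn))  # Python floats are double precision

print(as_single)  # 123456792: the identifier has changed
print(as_double)  # 123456789: preserved exactly
```

A corrupted identifier will fail to merge with other files, or worse, match the wrong person.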