Regression Analysis using ACS 5-year estimates

Hello!

I am a grad student at NC State working with a fellow student on a project involving ArcGIS and ACS 5-year estimate data. We would like to construct a model using ordinary least squares (and eventually geographically weighted regression) with various ACS variables as independent variables. I am unsure how to incorporate the MOEs (margins of error) included in ACS data into these models.

Any help for a newbie at using survey data to construct models accurately (particularly in ArcGIS) is much appreciated!

The first thought that comes to mind is to use the MOEs (or calculate them, if you're combining estimates) as a "reliability check" and pick a cutoff point: for example, the difference between the upper bound and lower bound has to be less than 10% of the estimate, or 12%, or whatever makes sense for the particular variables and geography levels you're planning to use. If an estimate passes that reliability check, you can use it in your models. If not, treat it as missing data (basically set those cases to unknown).
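As a rough illustration of that screening step, here is a minimal Python sketch. The function names, the 10% default, and the sample numbers are all made up for the example; the rule itself (interval width relative to the estimate) is just the one described above, and you would swap in whatever cutoff suits your variables.

```python
import math

def passes_reliability(estimate, moe, max_relative_width=0.10):
    """Return True if the interval width is small relative to the estimate."""
    if estimate is None or moe is None or estimate <= 0:
        return False  # a zero or missing estimate cannot be judged by a ratio
    width = 2 * moe  # (estimate + moe) - (estimate - moe)
    return width / estimate < max_relative_width

def screen(records, max_relative_width=0.10):
    """Replace estimates that fail the check with NaN (treated as missing)."""
    return [
        est if passes_reliability(est, moe, max_relative_width) else math.nan
        for est, moe in records
    ]

# Made-up (estimate, MOE) pairs, for illustration only
tracts = [(52000, 2000), (48000, 9000), (61000, 3000)]
screened = screen(tracts)  # the second value fails the 10% rule and becomes NaN
```

Keeping the cutoff as a parameter makes it easy to see how many cases you lose at 10% versus 12% before committing to one.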

If you're finding that a lot of estimates don't pass your reliability cutoff, there are some things you can try:
-work with a "coarser" geography level (e.g., county subdivisions instead of tracts)
-combine categories/estimates (e.g., age 65-74 and age 75-84 become age 65-84)
-work with the 5-year ACS file instead of the 1-year ACS file. The data are not as recent, but they are often more reliable (lower MOEs). It sounds as if you're already working with the 5-year estimates, so this might not help you out here.
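If you do combine categories, you need an MOE for the combined estimate. A minimal sketch, assuming the root-sum-of-squares approximation that the Census Bureau describes in its ACS guidance for sums of estimates (it ignores any covariance between the pieces, so treat the result as approximate). The function name and the numbers are made up for illustration.

```python
import math

def combine_estimates(parts):
    """Sum estimates and approximate the MOE of the sum.

    parts: iterable of (estimate, moe) pairs.
    """
    parts = list(parts)
    total = sum(est for est, _ in parts)
    # Approximate MOE of a sum: square root of the sum of squared MOEs
    moe = math.sqrt(sum(m ** 2 for _, m in parts))
    return total, moe

# Made-up (estimate, MOE) pairs for two age groups in one tract
age_65_74 = (300, 40)
age_75_84 = (400, 30)
total, moe = combine_estimates([age_65_74, age_75_84])
```

The combined MOE grows more slowly than the combined estimate, which is why merging categories tends to help an estimate pass a relative cutoff.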

I agree with Diana that a first step is assessing the reliability. Both the Census Bureau and ESRI have published some guidelines on MOEs and the coefficient of variation (CV) which are worth checking out. Also, I would suggest using some of ESRI's learning communities and support. You may want to check out:

In addition to using the MOEs to evaluate the data quality up front, you might also consider using them to do some sensitivity analysis on your estimated coefficients. For example, estimate your model with the reported estimates and then re-estimate using values from the top or bottom of the confidence interval for a particular variable and see how much it affects your model results.
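Here is a minimal sketch of that kind of sensitivity check in Python, using a one-variable least-squares fit written out by hand so it runs without extra libraries. The helper name and all the numbers are invented for illustration; in practice you would rerun your actual OLS or GWR model in ArcGIS (or another tool) with the shifted values.

```python
def ols_slope_intercept(x, y):
    """Fit y = intercept + slope * x by ordinary least squares."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Made-up ACS-style estimates, their MOEs, and an outcome variable
estimates = [10.0, 20.0, 30.0, 40.0]
moes = [1.0, 4.0, 2.0, 8.0]
outcome = [12.0, 19.0, 33.0, 41.0]

# Three versions of the independent variable: as reported, and shifted
# to the bottom and top of each estimate's interval
scenarios = {
    "reported": estimates,
    "lower": [e - m for e, m in zip(estimates, moes)],
    "upper": [e + m for e, m in zip(estimates, moes)],
}

# Refit under each scenario and compare the estimated slopes
slopes = {name: ols_slope_intercept(x, outcome)[0] for name, x in scenarios.items()}
for name, slope in slopes.items():
    print(f"{name}: slope = {slope:.3f}")
```

If the coefficient moves a lot between scenarios, that is a sign the model result depends heavily on sampling error in that variable.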