Motor Vehicle Related Expenditures and Disposable Income

Introduction

This paper estimates how overall consumption of motor vehicles and related parts varies with disposable income across the fifty states. I began by collecting seven macroeconomic variables, in addition to disposable income, that have general macroeconomic significance and may be specifically relevant to predicting motor vehicle-related expenditure.

Data

The data for this estimate span from 1997 to 2015 at an annual frequency. The personal consumption expenditures data for gasoline and other energy-related goods (var: gas), motor vehicles and parts (var: motor), and transportation services (var: transpo) come from the Bureau of Economic Analysis, as do the GDP (var: gdp), population (var: pop), and disposable income (var: dispinc) data. The source of the average annual Fahrenheit temperature (var: temp) data is the National Climatic Data Center. The unemployment data (var: u) come from the Bureau of Labor Statistics. All dollar amounts are expressed in 2015 dollars.

The table below provides the variables’ summary statistics:

The correlation coefficient matrix to the right suggests that multicollinearity is present. Dispinc is highly correlated with four variables: gdp and dispinc have a correlation of 0.998, which may follow naturally from both being measures of overall income; dispinc and transpo have a correlation of 0.959; dispinc and gas, 0.901; and dispinc and pop, 0.997.

In addition, several other variable pairs have correlation coefficients above 0.91. The first model will therefore attempt both to avoid excessive multicollinearity and redundancy and to include variables whose presence makes sense according to microeconomic theory.

Model: motor = transpo dispinc

The model contains the following two independent variables: transportation services expenditures and disposable income. The regression equation follows:

I anticipated that an increase in transpo would result in a decrease in motor, due to the substitutability of the two categories, and that dispinc and motor would have a positive relationship, so that an increase in dispinc would result in an increase in motor. The estimates did not bear this out. There are several possible reasons: significant multicollinearity, autocorrelation, omitted variables, a non-linear true model, or other misspecification issues (e.g., incorrect application of microeconomic theory).

A test for the extent of multicollinearity (the "vif" option) indicates that the variance inflation factors (VIF) are high. The VIF measures how much the variance of a coefficient estimate is inflated by linear dependence among the predictors; a score above five indicates that the variables are highly correlated. Both predictors have VIF scores above 12, which suggests that the parameter coefficient estimates are unreliable and poorly estimated. The "collinoint" option likewise shows that the variables are highly collinear. The tables below contain the model's parameter estimates and other regression results:

Therefore, multicollinearity is certainly a problem in this model. However, the predictor variables do explain 0.6781 of the variation in motor, so the model is worth investigating further.

Another issue may be autocorrelation. With time series data, it is likely that the variable’s current observed value relates to the variable’s value during the previous time period. When this is the case, the model violates the assumption that the error terms are serially uncorrelated. This creates inefficient estimates.


The Durbin-Watson test shows strong first-order autocorrelation, 0.611, so the model does not meet the assumption of zero covariance in the error term. The associated p-value is below 0.05, indicating positive autocorrelation. The SAS output below shows the parameter estimates, the Durbin-Watson test statistic, and the degree of first-order autocorrelation:

Usually, a plot of the model's residuals provides a visual hint as to whether, and in which direction, autocorrelation exists. In addition, we can apply tests for normality. The Shapiro-Wilk test, whose null hypothesis is that the residuals are normally distributed, returns a high value, 0.7475, so we have insufficient evidence to reject normally distributed residuals. However, the d statistic above is sufficient evidence of positive autocorrelation. Therefore, autocorrelation is an additional problem in the model.
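The Durbin-Watson d statistic itself is simple to compute from a residual series: d = Σ(e_t − e_{t−1})² / Σe_t². A value near 2 indicates no first-order autocorrelation; a value well below 2 indicates positive autocorrelation. A sketch with hypothetical residuals:

```python
# Durbin-Watson statistic: d = sum((e_t - e_{t-1})^2) / sum(e_t^2).
# d ~ 2 means no first-order autocorrelation; d << 2 means positive autocorrelation.
def durbin_watson(resid):
    num = sum((resid[t] - resid[t - 1]) ** 2 for t in range(1, len(resid)))
    den = sum(e ** 2 for e in resid)
    return num / den

# Hypothetical residuals that drift slowly, as positively autocorrelated errors do
resid = [1.2, 1.0, 0.7, 0.3, -0.2, -0.6, -0.9, -0.7, -0.2, 0.4]
print(round(durbin_watson(resid), 2))  # well below 2 -> positive autocorrelation
```

By contrast, residuals that flip sign every period would push d toward 4, the negative-autocorrelation end of the scale.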

Model revision: motor = dispinc temp u

Since dispinc and transpo are so highly collinear, the revised model excludes transpo. I also excluded gas because of its 0.901 correlation with dispinc. The revised model adds temp and u. I anticipated that an increase in temp would lead to an increase in motor, an increase in u would lead to a decrease in motor, and an increase in dispinc would lead to an increase in motor. The regression equation and parameter estimates follow:

The revised model's parameter coefficient estimates have signs that match microeconomic theory, which suggests the revision has removed some of the excessive multicollinearity. The variance inflation results confirm this: a VIF of 1 indicates no multicollinearity, and values modestly above 1 indicate only moderate correlation between the predictors. The model's predictors are no longer highly correlated. In addition, the t value of dispinc has slightly increased in magnitude. Temp has a low t value but remains in the model because of its theoretical significance. U has a statistically significant t value, with a probability of only 0.0095 of having occurred due to random error.

So, this revision has significantly decreased the model's multicollinearity and may be approaching the true model. However, autocorrelation remains: from the original model to the revised model, the first-order autocorrelation actually increased, from 0.611 to 0.665.

To correct for this, I created four new variables to get a sense of the period-to-period correlation: a lag of dispinc (di_lag), a lag of motor (mot_lag), a lag of temp (temp_lag), and a lag of u (u_lag). The matrix below shows that dispinc and its lag have a nearly perfect correlation, motor and its lag have a high correlation, temp and its lag have a slightly negative correlation, and u and its lag have a high correlation.

So, the lagged version of the model still contains high positive autocorrelation. It also produced nearly identical parameter estimates. Its output follows:

The next model correction I attempted increased the lag to three periods. This significantly reduced the autocorrelation by the third lag. However, lagging costs degrees of freedom and may not be valid; given that the data contain only 19 time periods, the loss may not be worth the trade-off. More advanced time series methods may therefore be better suited to these data. In addition, introducing dummy variables that partition the sample according to major economic events, e.g., recessionary periods, or before and after 9/11, might reduce the degree of autocorrelation. The results of this regression follow.
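The dummy-variable idea can be sketched as follows. The recession years flagged here follow the NBER-dated downturns that fall inside the 1997-2015 sample (the 2001 recession and the Great Recession); this is an assumption about how one might code the indicator, not output from the paper's regressions:

```python
# Recession dummy: flag downturn years so the regression can absorb
# level shifts around major economic events.
years = list(range(1997, 2016))          # the 19 annual observations
recession_years = {2001, 2008, 2009}     # assumed NBER-dated downturn years

recession = [1 if y in recession_years else 0 for y in years]
print(sum(recession))  # number of years flagged
```

A before/after-9/11 indicator would be built the same way, with the dummy switching from 0 to 1 starting in 2002.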

Conclusion

My original model suffered from severe multicollinearity and autocorrelation. To correct for multicollinearity, I excluded transpo because of its high correlation with dispinc, and added u and temp. Still, autocorrelation remained. To correct for it, I lagged the variables by one time period, but it proved difficult to minimize autocorrelation in this model. The presence of autocorrelation makes the parameter estimates ambiguous and difficult to interpret because the estimates no longer have the minimum variance property. Further improvements would involve investigating and applying more advanced time series methods; for example, it would be useful to take a more detailed look at the theoretical validity of lagging by three time periods for these data.