You can use any model you want, as long as it’s linear and has a positive slope

It is inherently challenging to use a model to make predictions beyond the range of the data to which it was fit: predictions about the future, for example, based on the past and present. Or, as the Danish proverb more elegantly says, “It’s difficult to make predictions, especially about the future.” Still, there’s no excuse for predictions quite as bad as these, from the U.S. Department of Transportation (DOT), of total vehicle travel in the U.S., which I learned about via Andrew Gelman’s blog:

The black curve is the actual number of trillions of miles driven, and the lines are the predictions made by the DOT in various years. As noted by Clark Williams-Derry, “the US Department of Transportation has been making the virtually identical vehicle travel forecasts for well over a decade. All of those forecasts project rapid and incessant growth in vehicle travel for as far as the eye can see. Meanwhile, actual traffic volumes have flattened out, and may actually be falling.” (Original graph source: SSTI).

What could the prediction, which clearly looks nothing like recent reality, be based on? A suggestion in the comments to Gelman’s post was that it’s simply extrapolating the known positive correlation between vehicle use and GDP, together with the relatively stable exponential growth in GDP over the past century or so — i.e. assuming that the positive growth trend is inescapable, despite many years’ data to the contrary.
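As a rough sketch of what that suggestion implies (the symbols here are my own notation, not anything taken from the DOT forecasts): write V(t) for total vehicle miles, G(t) for GDP, c for an assumed constant of proportionality, and g > 0 for an assumed steady GDP growth rate.

```latex
\begin{align}
  V(t) &= c\,G(t), & G(t) &= G_0\,e^{g t} \\
  \Rightarrow\quad V(t) &= c\,G_0\,e^{g t}, & \frac{dV}{dt} &= g\,V(t) > 0 .
\end{align}
```

Under these two assumptions the forecast can only ever go up, whatever recent data show. They also imply that V(t)/G(t) = c is constant in time, so a plot of miles driven per dollar of GDP is a direct check on the proportionality assumption.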

Since this is somewhat relevant to the class I’m teaching (Physics of Energy and the Environment), and since we’re starting to cover transportation (see e.g. this article), and since I’m fond of graphs, I thought I’d look up and plot the driving data normalized in different ways. Here they are (w/ data sources at the end):

Total miles (as above):

Total miles per person:

Total miles per $ of GDP, in constant 2009 dollars:

It seems fairly evident from all of these, especially miles driven per $ of GDP, that the trend of the data has changed considerably in the past one to two decades. Why, then, would the DOT act as if the 1960-1990 trend is immutable? I will leave it to you, dear reader, to supply your own possible reasons…
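To make the failure mode concrete, here is a minimal sketch in Python of what happens when a straight line fit to an early stretch of data is extended past a change in trend. The series is entirely made up for illustration (linear growth that stops at a chosen break year); it is not the DOT data, and the names fit_line and synthetic_series, along with all the numbers, are hypothetical.

```python
# Illustrative only: a SYNTHETIC series, not the DOT vehicle-travel data.

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def synthetic_series(years, break_year=2005, start=1.0, rate=0.05):
    """Made-up values: linear growth until break_year, flat afterwards."""
    first_year = years[0]
    return [start + rate * (min(y, break_year) - first_year) for y in years]


years = list(range(1980, 2021))
values = synthetic_series(years)

# Fit using only the early years, when growth really was linear.
early_years = [y for y in years if y <= 2000]
early_values = values[: len(early_years)]
intercept, slope = fit_line(early_years, early_values)

# Extrapolate the early trend to the final year and compare.
predicted_2020 = intercept + slope * 2020
actual_2020 = values[-1]

print(f"fitted slope:       {slope:.3f} per year")
print(f"extrapolated 2020:  {predicted_2020:.2f}")
print(f"synthetic 'actual': {actual_2020:.2f}")
```

The fit to the early window is perfect, and that is exactly the problem: nothing in the fit can signal that the trend later stops. Refitting on a recent window, or at least checking the residuals of the most recent points, would show the mismatch.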

I have always regarded principal components analysis (and its close cousin, factor analysis) with a great deal of scepticism. Factor analysis with poor graphing goes the extra mile.

The projections of vehicle miles reminded me of a cautionary example about the limits of linear regression, from a stats text used in my freshman year in college. The weights of the starting defensive line at the University of Texas were plotted and projected (I’m old enough that in those days, UT was a prestige football programme).

Recall the old maxim of Box – all models are wrong; some models are useful.

Were the defensive linemen expected to have an infinite mass, or zero? In any case, this reminds me of the widely criticized (and quite wrong) prediction in Nature long ago of women’s and men’s running speeds becoming equal around now, again a product of linear extrapolation. I think the article can be accessed here: