Note: For those of you who are unfamiliar with regression analysis, or for whom it's been a while, this website explains it as follows:

“Regression analysis is most often used for prediction. The goal in regression analysis is to create a mathematical model that can be used to predict the values of a dependent variable based upon the values of an independent variable. In other words, we use the model to predict the value of Y when we know the value of X.”

A couple of weeks ago, Andrew began a series of articles about expectations for Chris Johnson going forward. The articles got me thinking, so I decided to run a regression using the data he posted on NFL rushing title winners dating back to 1978.

I posted the regression results in the comments section of Andrew's second C.J. article. Since then, I've given some thought to the best way to specify the model. For this article, I specified two models. The first looks at yards in each of the five subsequent years as a percent of yards gained in the rushing title year. The second simply looks at yards per year for the five years following a rushing title. In each model, I controlled for the following variables as of the year in which the rushing title was won:

Years in the NFL (as a quadratic)

As expected, the coefficient on years in the NFL was negative for every year in both models. Also as expected, the coefficient on the squared term was positive in every year, meaning that the drop-off in production is more pronounced for players who win rushing titles later in their careers than for those who win them early. The coefficients were statistically significant at the five percent level in both models in the fourth and fifth years following a rushing title.
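With a quadratic control, the effect of one more year in the league is not a single number: if the model contains b1·years + b2·years², the slope at a given career length is b1 + 2·b2·years. The short sketch below evaluates that slope. The coefficients are hypothetical placeholders chosen only to match the signs reported above (negative linear term, positive squared term); the post does not report the fitted values.

```python
# Slope of a quadratic control. If the fitted model contains
#     b1 * years + b2 * years**2
# then the change in the predicted outcome per extra year is b1 + 2 * b2 * years.
# b1 and b2 below are hypothetical placeholders, NOT the post's estimates.

def marginal_effect(b1: float, b2: float, years: float) -> float:
    """Derivative of b1*years + b2*years**2 with respect to years."""
    return b1 + 2.0 * b2 * years


def turning_point(b1: float, b2: float) -> float:
    """Career length at which the slope is zero (vertex of the parabola)."""
    return -b1 / (2.0 * b2)


b1, b2 = -10.0, 0.5  # hypothetical: negative linear term, positive squared term
for yrs in (2, 5, 8):
    print(f"years={yrs}: slope={marginal_effect(b1, b2, yrs)}")
# years=2: slope=-8.0, years=5: slope=-5.0, years=8: slope=-2.0
print("turning point:", turning_point(b1, b2))  # turning point: 10.0
```

Comparing the slope at several career lengths, and checking where the turning point falls relative to the range of the data, is one way to see what a given pair of signs implies for early versus late title winners.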

Career Attempts (as a quadratic)

My initial thought was that the coefficients on both attempts and attempts squared would be negative. The coefficient on career attempts, however, was positive. As a general rule, it seems counterintuitive that yards per year would increase with career attempts, but this appears to be an artifact of this particular data set. George Rogers, Freeman McNeil, Marcus Allen, Christian Okoye, and Terrell Davis all saw their production drop dramatically following their respective rushing titles. Edgerrin James would twice more rush for over 1,500 yards, but his totals immediately following his back-to-back rushing titles were lower, in part due to injuries. Ricky Williams's year-to-year totals were sporadic due to both suspensions and injuries. These seven players won eight rushing titles in their first four years in the league and didn't maintain that level of productivity. Because they had relatively few career attempts when they won, low career attempts end up associated with steep declines in this sample.

Career Yards/Att

This was calculated through the year in which the respective rushing title was won. I thought the sign on this variable might be positive (more yards per attempt = faster, quicker backs who take less of a pounding). On the other hand, it could be negative: more yards per attempt might indicate smaller, quicker backs who can't take the hits over time. In both models, the coefficient is positive in each of the first three years and negative in the last two. It didn't turn out to be very informative.
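The post says career yards per attempt was computed through the title year but not exactly how. A natural reading, assumed here, is total career yards divided by total career attempts through that season, which weights each season by its carries instead of averaging per-season averages. The season lines are invented numbers for illustration.

```python
# Career yards per attempt through the title season, ASSUMED to be a pooled
# ratio: total yards / total attempts over all seasons up to the title year.
# The season data are invented for illustration.

def career_ypa(seasons: list[tuple[int, int, int]], title_year: int) -> float:
    """seasons holds (year, attempts, yards); seasons after title_year are ignored."""
    att = sum(a for yr, a, _ in seasons if yr <= title_year)
    yds = sum(y for yr, _, y in seasons if yr <= title_year)
    return yds / att


seasons = [(2001, 250, 1000), (2002, 350, 2000), (2003, 300, 1200)]
print(career_ypa(seasons, 2002))  # 5.0  (3000 yards / 600 attempts)

# For contrast, averaging the per-season averages gives a different number:
per_season = [y / a for yr, a, y in seasons if yr <= 2002]
print(round(sum(per_season) / len(per_season), 3))  # 4.857
```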

Yards

The coefficient in the regression of yards as a percent of rushing title yards is negative, as expected. I wasn't sure what to expect in the second model, but the coefficient was similarly negative and consistently statistically significant. It appears that very high rushing totals have a non-negligible impact not only on relative future performance but on absolute performance as well.
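The two models differ only in the dependent variable: relative yards (Model 1) versus absolute yards (Model 2). A minimal sketch, assuming each rushing-title season is stored with its title-year yards and the yards gained in each of the next five seasons, with a retired player recorded as 0 yards (an assumption, not something the post states). The record type and numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TitleSeason:
    """Hypothetical record for one rushing-title season."""
    player: str
    title_yards: int
    later_yards: list[int]  # yards in years T+1 ... T+5 (0 if retired, assumed)


def model1_targets(s: TitleSeason) -> list[float]:
    """Model 1: yards in each later year as a percent of title-year yards."""
    return [100.0 * y / s.title_yards for y in s.later_yards]


def model2_targets(s: TitleSeason) -> list[int]:
    """Model 2: yards in each later year, unscaled."""
    return list(s.later_yards)


example = TitleSeason("Player A", 1600, [1200, 1000, 800, 400, 0])
print(model1_targets(example))  # [75.0, 62.5, 50.0, 25.0, 0.0]
print(model2_targets(example))  # [1200, 1000, 800, 400, 0]
```

Each position in these lists is presumably the outcome for a separate regression, one per column from Year T+1 to Year T+5, which matches the five-column layout of the results.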

I also included controls for the strike-shortened 1982 season and for retirement. Below is a synopsis of my results.
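Putting the controls together, each year's regression can be sketched as ordinary least squares on one row per rushing-title season. This assumes the strike dummy marks a target year that falls in 1982 and the retirement dummy marks a player who had retired by the target year; the post does not define either precisely. The helper names and the pure-Python normal-equations solver are illustrative choices, not the author's software; any statistics package's OLS routine does the same job.

```python
# One per-horizon regression, sketched with hypothetical helper names.

def design_row(years, career_att, career_ypa, title_yards, strike, retired):
    """Regressors measured as of the title year, plus two 0/1 controls."""
    return [
        1.0,                            # constant
        years, years ** 2,              # years in the NFL (quadratic)
        career_att, career_att ** 2,    # career attempts (quadratic)
        career_ypa,                     # career yards per attempt
        title_yards,                    # yards in the title year
        1.0 if strike else 0.0,         # ASSUMED: target year is 1982
        1.0 if retired else 0.0,        # ASSUMED: retired by target year
    ]


def ols(X, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * t for r, t in zip(X, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]


# Sanity check on exact synthetic data: y = 2 + 3x - 0.5x^2 is recovered.
xs = [1, 2, 3, 4, 5, 6]
X = [[1.0, x, x ** 2] for x in xs]
y = [2 + 3 * x - 0.5 * x ** 2 for x in xs]
print([round(b, 6) for b in ols(X, y)])  # [2.0, 3.0, -0.5]
```

A dummy that is 0 for every title season at some horizon adds an all-zero column, which makes the normal equations singular, so it has to be dropped from that year's regression; that is presumably why some cells in the tables read NA.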

Model 1: Yards as Pct of Title Yards

Years: ** **
Years²: ** **
Career Att: * **
Career Att²: * **
Career Yds/Att: ** *
Yards: ** ** ** ** **
Strike Season: ** ** ** ** NA
Retired: NA * ** **
Constant: ** **

Projections, Year T+1 through Year T+5:
C.J. Proj. Pct: 66.2, 76.2, 47.0, 56.9, 33.6
C.J. Proj. Yds: 1327, 1528, 944, 1142, 674

**statistically significant at 5% level
*statistically significant at 10% level
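Model 1's yard projections are its percentage projections applied to Johnson's title-year total. The post doesn't state that total; the sketch assumes 2,006 yards, his 2009 figure. Because the percentages are shown rounded to one decimal, recomputing from them lands within about a yard of the published numbers.

```python
# Recompute Model 1's yard projections from its percentage projections.
# ASSUMPTION: the base is Chris Johnson's 2,006-yard 2009 season.
TITLE_YARDS = 2006

proj_pct = [66.2, 76.2, 47.0, 56.9, 33.6]   # C.J. Proj. Pct, T+1 ... T+5
table_yds = [1327, 1528, 944, 1142, 674]    # C.J. Proj. Yds from the table

recomputed = [round(p / 100 * TITLE_YARDS) for p in proj_pct]
for k, (est, pub) in enumerate(zip(recomputed, table_yds), start=1):
    print(f"T+{k}: {est} recomputed vs {pub} published")
# recomputed == [1328, 1529, 943, 1141, 674]

# The small gaps come from the percentages being rounded to one decimal.
assert all(abs(est - pub) <= 1 for est, pub in zip(recomputed, table_yds))
```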

Model 2: Yards Per Year

Years: ** **
Years²: ** **
Career Att: * **
Career Att²: ** **
Career Yds/Att:
Yards:
Strike Season: ** ** ** NA
Retired: NA ** ** **
Constant: **

Projections, Year T+1 through Year T+5:
C.J. Projected: 1347, 1376, 1211, 1141, 813
Sample Avg: 1248, 1009, 935, 796, 702

**statistically significant at 5% level
*statistically significant at 10% level
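To weigh what these projections would mean over a five-year stretch, it can help to look at them as totals and as gaps over the sample average. The arithmetic below uses only figures from the two tables of results.

```python
# Figures copied from the Model 1 and Model 2 tables (Year T+1 ... T+5).
model1_yds = [1327, 1528, 944, 1142, 674]   # Model 1: C.J. Proj. Yds
model2_yds = [1347, 1376, 1211, 1141, 813]  # Model 2: C.J. Projected
sample_avg = [1248, 1009, 935, 796, 702]    # Model 2: Sample Avg

# Year-by-year gap between Johnson's Model 2 projection and the sample average.
gaps = [m - s for m, s in zip(model2_yds, sample_avg)]
print("Model 2 minus sample average:", gaps)  # [99, 367, 276, 345, 111]

# Five-year totals for each series.
print("Five-year totals:", sum(model1_yds), sum(model2_yds), sum(sample_avg))
# Five-year totals: 5615 5888 4690
```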

As you can see, the totals I came up with here are nowhere near those from the previous regression, but I do think this model is better specified. What are your thoughts? How would you feel if Chris Johnson posted these numbers over the next five years? Also, what, if anything, should I have included in these models?