Friday, June 3, 2016

Here I am back at my original blog to post the results of a study comparing the mortality rates of two sports leagues, the National Basketball Association (NBA) and the National Hockey League (NHL). This will be part report and part lesson in statistics.

Those of you not already asleep are welcome to follow the rest of this.

Motivation: Earlier this week, the Facebook group alt.obituaries posted the death notices of two hockey players, both under 70 at the time of their deaths. This made me wonder if hockey might be bad for one's long term health, so I decided to compare it to basketball. The sports are not comparable in terms of play, but they do have very similar lengths of seasons.

Designing the data sets: In the interest of nice round numbers, I decided to look at the rosters of all the teams in the 1975-76 season of the NBA and NHL, which was forty years ago. I chose that season because of the ages of the two players who died this week, Tom Lysiak and Rick MacLeish, who died at 63 and 66 respectively. I figured they would both have been playing in that season and I was correct.

Data source and methodology: I went to the websites Hockey-Reference.com and Basketball-Reference.com, both part of the Sports-Reference.com family of websites. While I cannot prove the 100% reliability of these sources, every player in both sports that I knew had died in the past year was listed as deceased, so I am going to assume this website is as reliable a source as I am likely to find.

While this source is reliable, it is not 100% complete. My original idea was to include players from the two rival leagues, the American Basketball Association and the World Hockey Association, both of which fielded teams in 1975-76, but I could not find the data for those teams.

The question, stated as a Null Hypothesis and an Alternate Hypothesis: Is there a difference between the mortality rates of the NHL and the NBA? In a statistical test like this, the Null Hypothesis H0 always has to state that nothing special is happening, which in this case would be that there is no difference between the two mortality rates. The Alternate Hypothesis HA is that there is something special happening, that there is a difference and it is significant. Because of the two hockey deaths this week, I assumed hockey - the clearly more violent of the two sports - might be worse than basketball, but two deaths are not enough information to justify assuming a direction, so we will make the test two-tailed, which means we would be surprised either by significantly greater mortality in the NHL versus the NBA or vice versa. We need to set a confidence level for what we will consider significant. The standard for publishing papers in nearly every field is 95% confidence, which means we need p-values less than .05.

Data set sizes: There are more men on a hockey roster than on a basketball roster and injuries are more common, so the NHL list for that season is longer than the NBA list. In the NHL, n = 461, while in the NBA, n = 228.

Significant statistic #1: The average age in the NHL that season was lower than the average age in the NBA. The numbers for the birth years were as follows, rounded to two places after the decimal.

This has to do with the fact that most NBA players come out of college while most NHL players come up through the ranks of junior hockey. The difference was only 0.7 years, or about 250 days, but because of the size of the sets and the low standard deviations in both sets, the two-tailed p-value was .0265, so the difference is statistically significant. In terms of mortality rates, though, a 250-day difference in age is negligible in effect, especially for two sets of men averaging less than 70 years old. It should be noted that I only took down the year of birth, not the date, so the averages rounded to two places after the decimal would likely come out slightly different if the birth dates had been recorded more precisely.
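For readers who want to see why a gap of only 0.7 years can clear the .05 bar, here is a minimal sketch of a large-sample, two-tailed test for a difference in means, worked from summary statistics. The set sizes (461 and 228) and the 0.7-year gap come from this post. The standard deviations in the loop are hypothetical values chosen only for illustration, and the function name `two_sample_z` is my own, so do not expect the output to reproduce the .0265 figure.

```python
import math


def two_sample_z(mean_diff, sd_a, n_a, sd_b, n_b):
    """Large-sample, two-tailed test for a difference in means.

    Uses the normal approximation, which is reasonable when both
    samples contain a few hundred observations.
    """
    # Standard error of the difference between two independent means
    se = math.sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    z = mean_diff / se
    # Two-tailed p-value from the standard normal distribution
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# n = 461 (NHL), n = 228 (NBA) and the 0.7-year gap are from the post.
# The standard deviations below are HYPOTHETICAL, for illustration only.
for sd in (3.0, 4.0, 5.0):
    z, p = two_sample_z(0.7, sd, 461, sd, 228)
    print(f"assumed SD = {sd} years: z = {z:.2f}, two-tailed p = {p:.4f}")
```

The point of looping over several assumed standard deviations is that the same 0.7-year gap moves from significant to not significant as the spread grows, which is why both the set sizes and the low standard deviations matter.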

Significant statistic #2: The percentage of now dead players from the 1975-76 NBA rosters is 15.8%, while the NHL rosters from the same year have only 9.3% deceased. The p-value for this difference is .0092. When publishing papers, lower p-values are better, so this difference is more impressive than the difference of less than a year in age between the rosters. Moreover, dead/alive is categorical data, and tests on categorical data need larger samples than tests on numerical data. For the dead/alive comparison, our two data sets are on the modest end of acceptable, while for the numerical age comparison the same sets are rather large.
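Here is a minimal sketch of one common way to test a difference like this, a pooled two-proportion z-test. Two assumptions to flag: the counts of deceased players are reconstructed from the rounded percentages above (15.8% of 228 and 9.3% of 461), and the pooled z-test is my choice for illustration, not necessarily the test behind the .0092 figure, so the p-value it prints need not match that number exactly. The function name `two_proportion_z` is made up for this example.

```python
import math


def two_proportion_z(x_a, n_a, x_b, n_b):
    """Pooled two-proportion z-test, two-tailed.

    x_a, x_b are the counts of "successes" (here, deceased players);
    n_a, n_b are the sizes of the two groups.
    """
    p_a = x_a / n_a
    p_b = x_b / n_b
    # Under the null hypothesis both groups share one rate, so pool them
    pooled = (x_a + x_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-tailed p-value from the standard normal distribution
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# ASSUMPTION: counts are back-calculated from the rounded percentages.
nba_n, nhl_n = 228, 461
nba_dead = round(0.158 * nba_n)
nhl_dead = round(0.093 * nhl_n)

z, p = two_proportion_z(nba_dead, nba_n, nhl_dead, nhl_n)
print(f"NBA deceased: {nba_dead}/{nba_n}, NHL deceased: {nhl_dead}/{nhl_n}")
print(f"z = {z:.3f}, two-tailed p = {p:.4f}")
print("Reject H0 at the .05 level" if p < 0.05 else "Fail to reject H0")
```

Other reasonable choices, such as a chi-square test with a continuity correction or Fisher's exact test, would give somewhat different p-values from the same counts.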

Guessing at the source of the difference: Obviously, the more violent nature of hockey does not make it a "deadlier" sport in the long run. The best guess I have for the cause of the significant difference is race and nationality.

African Americans are extremely well represented in basketball, especially in comparison to the population of the United States. The NHL in 1975-76 was still largely a white and mostly Canadian league. (The upstart WHA had a lot of European players. Once the leagues merged, the league became less and less Canadian over time, with a large influx of Eastern European talent after the collapse of the Soviet Union.) The difference in mortality very likely has more to do with comparing the overall mortality rates of black Americans to white Canadians.

Where to go next: The data set makes it hard to compare these cohorts to the general population. When I did a baseball vs. football comparison years ago, I hunted down players born in certain years, which meant I could use the excellent database of the Social Security system to compare the mortality rates against the expected mortality for men of precisely that age. I don't have a good idea of how to collect a set of about 400 Canadian men who would have been between 20 and 35 in 1976 and a similar set of about 200 American men. If we could find such sets, we could have an idea of whether playing these sports is hazardous to your long-term health. The earlier study I did said both baseball and football players are slightly healthier than the population as a whole.