Sabermetrics and Math: How sports can teach statistics

Do those words scare you? If they do, you’re in good company. Mathematical anxiety is a well-studied phenomenon that manifests for a number of different reasons. It’s an issue I’ve talked about before at length, and something that frustrates me no end. In my opinion, though, one of the biggest culprits is how math alienates people. Let’s try an example:

If the average of three distinct positive integers is 22, what is the largest possible value of these three integers?
A: 64
B: 63
C: 33
D: 42
E: 48

Too easy? How about this one:

The average of the integers 24, 6, 12, x and y is 11. What is the value of the sum x + y?

A: 11
B: 17
C: 13
D: 15

I do statistics regularly, and I find these tricky. Not because the underlying math is hard, or because they’re fundamentally “difficult,” but because you have to read the question three or four times just to figure out what it is asking. This is exacerbated at higher levels, where you need to first understand the problem, and then understand the math.*
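For the record, once the wording is untangled the arithmetic is short. Here is a minimal sketch in Python that checks both questions. The brute-force search over triples is just one way to confirm the first answer, chosen for illustration rather than as the intended exam method.

```python
from itertools import combinations

# Question 1: three distinct positive integers with a mean of 22 must sum to 66.
# Try every distinct triple and keep the largest single value that appears.
target_sum = 22 * 3
largest = max(
    max(triple)
    for triple in combinations(range(1, target_sum), 3)
    if sum(triple) == target_sum
)
print(largest)  # 63, from the triple (1, 2, 63)

# Question 2: five numbers with a mean of 11 must sum to 55,
# so x + y is whatever is left after subtracting the three known values.
x_plus_y = 11 * 5 - (24 + 6 + 12)
print(x_plus_y)  # 13
```

In both cases the only real step is turning the mean back into a total by multiplying by the number of values.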

One of my main objectives as a statistics instructor is to take “fear” out of the equation (math joke!), and make my students comfortable with the underlying mathematical concepts. I’m not looking for everyone to become a statistician, but I do want them to be able to understand statistics in everyday life. Once they have mastered the underlying concepts, we can then apply them to new situations. Given that most of my students are athletically minded or have a basic understanding of sports, sport is a logical place to start.

The mean number of teeth in adults is 32. The mean number of teeth among hockey players is considerably less | Chris Neil picture source: NHLPA

First, a little backstory. The world of sports has undergone a major shift in the past 20 years. In the 50s and 60s it was a much smaller enterprise; now it is a multi-million-dollar business in which player performance is vitally important. When every dollar counts, you use every tool at your disposal to maximise your assets, including recording everything you can (as documented in the book and film Moneyball). Shots, goals, assists, batting averages, yards gained, completions: you name it, there are stats available. And it’s not just owners, management and staff who use this information. Armchair fans now use it to draft the best fantasy team possible, as there is a large amount of money to be won by competing in these leagues. As a result, a lot of data is freely available online.

Let me illustrate this with an example. One of the first concepts people learn about is the difference between the mean, the median and the mode.

To recap: the mean is the average value, the median is the middle value (which is useful if your data are very skewed), and the mode is the most common value. Typically, this is accompanied by an example using birth weight, or something similarly relatable. However, it’s hard to see why these numbers would ever differ, because in most “example” data they are nearly identical: the data are approximately normally distributed, or are skewed for some other, usually more convoluted, reason. But not so in the case of sports.
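If it helps to see the three definitions side by side, here is a minimal sketch using Python’s built-in statistics module. The ages are made up for illustration and are not the NHL data; they are chosen to be roughly symmetric, so all three measures land on the same value, as they tend to in textbook examples.

```python
import statistics

# Hypothetical ages for a small roster (invented for illustration, not NHL data).
ages = [22, 24, 25, 26, 26, 26, 27, 28, 30]

mean_age = statistics.mean(ages)      # sum of the values divided by how many there are
median_age = statistics.median(ages)  # middle value once the list is sorted
mode_age = statistics.mode(ages)      # value that appears most often

print(mean_age, median_age, mode_age)  # all three are 26 for this sample
```

With data like these, nothing forces a student to think about why the three measures exist separately.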

Note: All examples use data on all players from the 2010-2011 NHL season. They were taken from Hockey-Reference, which has a great list of stats on the NHL going all the way back to 1917 (!).

Let’s start with age and look at the mean, median and modal values. The mean is 26.6, the median is 26.0 and the mode is 26. This tells us that the mean age of players in the NHL is 26.6, the “middle value” for age is 26, and the most common age is 26. Graphically, it looks like this:

Those are all very similar, which makes it difficult to see the difference between the values. However, all students have an intuitive understanding of age – they see most players are 20 to 30 years old, and there are very few who continue to play into their late 30s (except Teemu Selanne, who is actually Benjamin Button).

This changes when we look at another important statistic in hockey – goals. In this case, the mean is 7.5, the median is 4.0 and the mode is 0. This is interesting, as it tells us the “average” number of goals scored by a player in the NHL is 7.5 and the median, or “middle value,” is 4.0, but the most common value is 0, i.e. a large number of players in the NHL didn’t score any goals. The data are highly skewed and, more importantly, students can understand why, so they can dedicate their energy to understanding what that skew “means” in statistical terms.

The distribution of goals scored in the 2010-11 NHL season | Source: Hockey-reference

Here, the concept of “skew” is very clear, and you can see that the most common number of goals scored in the NHL is 0, i.e. many players didn’t score any goals at all! This is considerably easier to understand than an example on blood pressure, birth weights, or mileage on cars, and takes the intimidation factor out of statistics.
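To see the same pattern in miniature, here is a small sketch. The goal totals are invented to mimic the shape of the real distribution (lots of zeros, a few high scorers) and are not taken from Hockey-Reference. The skew measure is Pearson’s second skewness coefficient, one simple option among several.

```python
import statistics

# Hypothetical goal totals for 15 players (invented for illustration, not the real data).
goals = [0, 0, 0, 0, 1, 2, 3, 4, 5, 7, 9, 12, 18, 25, 34]

mean_goals = statistics.mean(goals)      # pulled upward by the few high scorers
median_goals = statistics.median(goals)  # the 8th of the 15 sorted values
mode_goals = statistics.mode(goals)      # the most common total

# Pearson's second skewness coefficient: 3 * (mean - median) / standard deviation.
# A positive value indicates a long tail to the right.
skew = 3 * (mean_goals - median_goals) / statistics.stdev(goals)

print(mean_goals, median_goals, mode_goals)  # 8 4 0
print(skew > 0)  # True: the distribution is right-skewed
```

The ordering mode < median < mean is the signature of a right-skewed distribution, and it is the same ordering the real goals data show.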

This is one example of how sports can be used to highlight a statistical concept that I find students struggle with. However, here’s where the real power of sports stats comes in: you can scale this up to cover advanced concepts. You want to compare means between groups (i.e. t-tests)? You can calculate the mean number of goals scored by forwards and defencemen and compare them (forwards score more goals). Need to do a chi-square test? Look at the number of forwards and defencemen on each team and test whether the split differs between teams (it doesn’t). Need to talk about regression? Why not model goals scored against time on ice to see if more time results in more goals? The possibilities keep going from there.**
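As a rough sketch of the first of those ideas, comparing means between groups, the snippet below computes Welch’s t statistic by hand using only the standard library. The goal totals are hypothetical, and welch_t is a helper name introduced here for illustration. Turning the statistic into a p-value needs the t distribution, which a package such as SciPy provides (for example, scipy.stats.ttest_ind with equal_var=False).

```python
import math
import statistics

# Hypothetical goal totals by position (invented for illustration, not NHL data).
forwards = [12, 18, 7, 25, 9, 14, 30, 5]
defencemen = [2, 5, 0, 8, 3, 6, 1, 4]

def welch_t(a, b):
    """Welch's t statistic: difference in sample means over its estimated standard error."""
    standard_error = math.sqrt(
        statistics.variance(a) / len(a) + statistics.variance(b) / len(b)
    )
    return (statistics.mean(a) - statistics.mean(b)) / standard_error

t_stat = welch_t(forwards, defencemen)
print(round(t_stat, 2))  # positive here, because the forwards' mean is higher
```

Welch’s version is used here because it does not assume the two groups have equal variance, which seems a safer default when one group contains a handful of very high scorers.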

The thing I like the most about this is how accessible this makes things. Take away the intimidating part of math, and all of a sudden it’s not nearly as scary. You can change sports to pretty much anything else – baseball, football (association or gridiron), or even other widely available databases – movie revenue by genre, number of albums sold by pop artists, voter turnout in recent elections, whatever connects with your students. Once you’ve made the example relatable and have removed the “fear” part of the statistics equation, math can suddenly become much more interesting and engaging to students. And once they’re engaged, learning will become that much easier.

=====

*I should point out: I’m not against difficult problems, as comprehension is an important skill to develop in order to apply statistics to new situations. But let’s leave that for another day, and not start there. The way we teach statistics and math now is like asking a toddler to do cartwheels on a balance beam above a lake of hungry alligators before they can walk.

**If you would like me to provide webinars/slideshares on statistical concepts in future posts, let me know in the comments.
