Bayesian Training Camp

Almost without exception, the lead-up to the NHL season produces at least one surprise, and usually several. How much weight should we give a standout performance from an unknown? How worried should we be about a struggling veteran?

Oddly enough, an 18th-century Englishman can help us with the answer.

This will sound like math, but the numbers aren’t really essential. Thomas Bayes’ famous theorem is expressed mathematically, but understanding the numbers matters less than understanding the basic line of thinking it describes for making judgments.

Let’s try it using a player, one in his mid-20s who has played 100-odd NHL games and was an NHL’er for half of last season. We could call him Player X and keep this hypothetical, but to keep this easy to track we’ll use a concrete example: Jesse Joensuu. What we’re trying to determine is whether Joensuu (or X) is an NHL player; our hypothesis is that he is.

The Prior

The first thing to do is establish what we think about Joensuu before we see so much as a second of training camp. This is easiest before camp, when we haven’t seen anything and are completely uninfluenced; it’s harder to do after the fact. Since I’m writing this, we’ll use my estimates – your mileage may vary, and you may have more or less experience watching him play, but it’s the process rather than the exact numbers that matters. Here are the main things I know about the player:

I saw Joensuu play 42 regular season games last year, plus some time in the preseason. He looked good before the year but terrible during it. Based on my observations alone, I don’t think much of him.

Joensuu’s numbers from 2013-14 are interesting. His relative Corsi was middling on a lousy team, but he also had flat-out brutal zonestarts. Further, he got murdered by PDO (a 4.4 percent on-ice shooting percentage and a 0.879 on-ice save percentage). PDO isn’t the same as luck, but most players don’t have a massive impact for good or bad on shooting percentage or save percentage. In Joensuu’s case, he’s a winger (meaning his defensive responsibilities are lessened and he shouldn’t be dramatically impacting save percentage) and he’s been a decent offensive guy overseas and in the AHL (suggesting that he’s not going to kill shooting percentage for his whole line). I’m inclined to be a little more charitable now than I was going solely by my eyes.

Joensuu had another long audition in 2010-11 with the Islanders. I didn’t see many of these games and don’t remember any that I did, so I’m leaning on the numbers here. He had a PDO of 100.2, which suggests that last year’s number was an aberration. He had tough zonestarts but a lousy Corsi number on a terrible team. That’s a few years back, and he’s probably improved since, but that’s suggestive too.

Okay, so I’m more charitable toward him after looking at the numbers than I was before, but I still think he’s probably a ‘tweener rather than a real NHL’er. Let’s say I think there is a one-in-five (20 percent) chance that he’s an NHL’er.

Training Camp Performance

Once we know our beliefs going in, we can assess what happens at training camp. For the sake of argument, let’s say that Joensuu does what he did a year ago and has a great training camp. We now have to establish two things:

First, we need to figure out what the odds are of Joensuu having a great camp if our hypothesis is true – in other words, if he is an NHL player. We know that most real NHL players are going to stand out in training camp; for this exercise let’s say that there is a 75 percent chance that a real NHL player will have a great training camp.

Second, we need to figure out what the odds are of Joensuu having a great camp if our hypothesis is false – in other words, what the chances are of a good AHL’er coming in and having a great camp. We know some do; let’s say that once every four camps a good AHL player will really stand out (i.e. a 25 percent chance for any given training camp).

Nate Silver goes through this process really well in his book, so I’m going to steal his chart and adapt it to our example:

Using Bayes’ theorem, we see that our initial 20 percent guess that Joensuu is an NHL player jumps to 43 percent if he has a great camp. Now, we used made-up numbers here – readers are free to tweak the estimates above any which way they like and come up with a different answer based on their own beliefs. The important thing is process.
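For anyone who wants to check the arithmetic, here is a minimal Python sketch of the calculation. The function and argument names are my own invention for illustration; the three inputs are the estimates from above (a 20 percent prior, a 75 percent chance of a great camp if he is an NHL player, and a 25 percent chance if he is not).

```python
def posterior(prior, p_if_true, p_if_false):
    """Bayes' theorem for a yes/no hypothesis and one piece of evidence."""
    # Chance the hypothesis is true AND we see the evidence.
    true_and_evidence = prior * p_if_true
    # Chance the hypothesis is false but we see the evidence anyway.
    false_and_evidence = (1 - prior) * p_if_false
    # Of all the ways to see the evidence, the share where the hypothesis holds.
    return true_and_evidence / (true_and_evidence + false_and_evidence)


# Estimates from the article (illustrative guesses, not measured rates).
after_great_camp = posterior(prior=0.20, p_if_true=0.75, p_if_false=0.25)
print(f"{after_great_camp:.0%}")  # 43%
```

Written out by hand, that is 0.15 / (0.15 + 0.20), or about 0.43. Swap in your own three estimates and the same function gives your own answer.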

You said the math didn’t matter!

Again, the important thing here isn’t so much the exact math as it is process. There are plenty of people out there who think in this general manner without ever putting it into numbers.

It’s all about weighting our information correctly. The first thing is to have a firm prior – in other words, not to forget the miles and miles of road that led us to where we are now. Every one of these players has a track record coming in, and when deciding whether or not they’re ready it’s extremely important to keep that track record in mind. That’s why NHL training camps aren’t really about the guys at the top of the roster or the prospects at the bottom – they’re about all the guys in the middle, the Joensuus and Pitlicks and Pinizzottos and Pakarinens, the players close enough together coming in that the training camp results can push one in front of another.
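To see why the prior carries so much of the load, here is a rough sketch that holds the camp estimates fixed (75 percent and 25 percent, as in the example) and varies only the prior. The four priors are hypothetical stand-ins for a long shot, a ‘tweener, a coin flip and a near-lock; they are not estimates for any actual player.

```python
def posterior(prior, p_if_true=0.75, p_if_false=0.25):
    """Chance the hypothesis is true after seeing a great camp."""
    true_and_evidence = prior * p_if_true
    false_and_evidence = (1 - prior) * p_if_false
    return true_and_evidence / (true_and_evidence + false_and_evidence)


# Hypothetical priors: long shot, 'tweener, coin flip, near-lock.
priors = [0.05, 0.20, 0.50, 0.80]
updated = {p: posterior(p) for p in priors}

for p, post in updated.items():
    print(f"prior {p:.0%} -> after a great camp {post:.0%} (shift {post - p:+.0%})")
```

Under these assumed numbers, the same great camp moves the middle priors far more than the ones at either end, which is the sense in which camp is about the guys in the middle.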

The second thing is to have a good idea of how useful new information is – in this case, to have a good idea of what a strong training camp actually means. It’s vital not to overvalue an exercise like training camp, to remember that middling guys can look great for a handful of games and vice versa. It’s equally critical not to undervalue training camp – it’s useful information that shouldn’t be ignored in the decision-making process.
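One way to put a number on "how useful" is the likelihood ratio: how much more likely a great camp is from a real NHL player than from a good AHL’er. The sketch below uses the odds form of Bayes’ theorem (posterior odds equal prior odds times the likelihood ratio). The 3-to-1 ratio comes from the 75 and 25 percent estimates in the example; the other two ratios are hypothetical extremes I have added to show overvaluing and undervaluing camp.

```python
def update(prior, likelihood_ratio):
    """Odds-form Bayes update: posterior odds = prior odds * likelihood ratio."""
    prior_odds = prior / (1 - prior)
    posterior_odds = prior_odds * likelihood_ratio
    # Convert odds back into a probability.
    return posterior_odds / (1 + posterior_odds)


prior = 0.20  # the article's one-in-five starting point

as_estimated = update(prior, 0.75 / 0.25)  # camp is 3-to-1 evidence
overvalued = update(prior, 0.95 / 0.05)    # hypothetical: camp treated as nearly decisive
undervalued = update(prior, 1.0)           # hypothetical: camp treated as meaningless

print(f"as estimated: {as_estimated:.0%}")
print(f"overvalued:   {overvalued:.0%}")
print(f"undervalued:  {undervalued:.0%}")
```

Treat camp as nearly decisive and one good stretch swamps the whole track record; treat it as meaningless and the estimate never moves off 20 percent.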

The math is just one way of expressing the way we balance new knowledge and old knowledge before coming to a decision. As is often the case in hockey and beyond it, putting that balance into numbers is far less important than making sure we’re striking the correct balance regardless of how we’re expressing the information.

Great article. Interesting kind of numbers approach that results in something that feels kind of organic or intuitive. I’ve been cheering on Joensuu since he got to Edmonton. I still hold out hope that he turns into the player that some of his numbers suggest. I think he got a raw deal in zone starts and linemates last year as well as absolutely terrible puck luck and so deserves another shot. His upside is way better than the alternatives. If you just look at the last non-NHL seasons of some of his competitors he really stands out. Pitlick, Pakarinen, Pinizzotto and Moroz all have an NHLE in the 20-22pts range, so there is not much separating them. Joensuu’s NHLE from his lockout season in SM-Liiga is 46pts(!). He was also a shot machine (in a very small sample size for NYI) getting 15 shots in just 71 minutes of ice time. That works out to about 3 shots per game based on about 14 minutes of ice time. That’s an impressive little run. He also has size and decent foot speed (a bit lanky and clumsy looking though). On paper he looks like a pretty good bet. If he shows well at training camp and preseason again, I think he can make the team at the start of the season. Whether he sticks or not is another story. I have the numbers at 33%, 75%, 25% which results in 60% adjusted probability.

Guaranteed so many “anti math” people got halfway through the article, were confused by numbers, and jumped to the conclusion that you’re trying to put an exact formula to quantify a player’s chance of making the Oilers.

In reality, you’re using a combination of watching games (NERD) and weighing previous information in light of proven statistics to ballpark your opinion of a player. AND the article makes a point much larger than simply the value of Uncle Jesse. Good read young Willis.

JW, as you set up the example you were using Bayes’ theorem to predict the results of a good training camp.

However, your example is really a multistage inference because you haven’t established any strong prior basis for Joensuu to be .40. We would need the modified Bayes’ theorem for multistage inferences, where in the first stage you show how Joensuu fits into the population of players that make the team in, say, a fourth line role versus some other population of players, say rookies drafted in the top ten.

In single stage inferences, when we are shown the percentage likelihood, we humans are conservative and guess low, i.e. we would guess that X player trying to make the team in a fourth line role won’t make the NHL even with a good training camp.

In multistage examples, it has been shown that we overinflate the first stages and treat it as a ‘best guess’ that the first stage succeeds, i.e. we treat Joensuu as if he falls into the .40 category of players that make the team in a fourth line role.

That should have read .20 as JW said our prior was 20 percent, one in five chance.

The first single stage inference question becomes: how did we get Joensuu as a .20 starting input for the Bayes formula for a good training camp? Without a strong basis for .20, as was already mentioned, we’d be misusing Bayes’ theorem.

As much as JW wanted us thinking about the overall process, he left out the part about the starting point having to be well-grounded data for the impact of the state change to be meaningful.