Remember when Billy Packer declared the 2008 Final Four game between Kansas and North Carolina over? Billy got a bit of blowback for that, especially after UNC pulled within four points midway through the second half. I always felt Billy was on safe ground with his statement. Granted, I suppose “over” taken literally means there was no chance of the game becoming interesting. I took it to mean UNC had no realistic chance of winning, although of course some small chance remained. But just how safe was Billy’s statement?

Previous attempts to quantify in-game win probabilities in college basketball are limited, and they have left me unsatisfied because none of them account for information known before the game starts. For instance, if Kansas and Alcorn State were tied five minutes into a game, we could come up with a better estimate than saying each team has an equal chance of winning at that point. This post documents my first attempt to do better.

My first step was to estimate a team’s chances of winning from the time and score alone, assuming a game between teams of equal strength. To do this, I used my ratings (accounting for game location) to filter the play-by-play data down to games between nearly equal teams. That limits the sample to about 700 games, but that’s enough to make reasonable estimates of the probability. For each game, I recorded the lead at a given time and whether the leading team won the game. As an example, there were 76 times that a team led by four with ten minutes to go in the first half. Those teams won 56.6% of the time.
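The tally described above can be sketched in a few lines of Python. This is a minimal illustration, not the code behind the post: the `win_rates` function, the (minutes_remaining, lead, won) snapshot format, and the toy snapshots are all hypothetical, and filtering to nearly equal teams is assumed to have happened upstream.

```python
from collections import defaultdict

def win_rates(snapshots):
    """Empirical win rate for each (minutes_remaining, lead) situation.

    snapshots: iterable of (minutes_remaining, lead, won) tuples, where lead is
    measured from one team's perspective and won says whether that team won.
    Returns {(minutes_remaining, lead): (games, win_rate)}.
    """
    tallies = defaultdict(lambda: [0, 0])  # situation -> [wins, games]
    for minutes_remaining, lead, won in snapshots:
        cell = tallies[(minutes_remaining, lead)]
        cell[0] += 1 if won else 0
        cell[1] += 1
    return {key: (games, wins / games) for key, (wins, games) in tallies.items()}

# Hypothetical toy snapshots (30 minutes remaining = 10 left in the first half).
snapshots = [(30, 4, True), (30, 4, True), (30, 4, False), (30, 5, True)]
rates = win_rates(snapshots)
```

With real data, each cell's sample is small (76 games for the four-point case above), which is why the raw rates are noisy.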

We can’t take that number literally, because teams with a 5-point lead at that time won 67.2% of the time, a jump of more than ten percentage points for a single extra point of lead, which is larger than is logical. So I applied some smoothing to the data, then some logistic regression, and finally got a table of values that makes sense, as shown below.
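One way to do the logistic-regression step for a single point in the game is sketched below. The post doesn't say what form its regression took, so this assumes a one-parameter model, p = 1 / (1 + exp(-b * lead)), with no intercept on the reasoning that a tie between equal teams is a coin flip. The `fit_lead_slope` helper and the counts are hypothetical. Because every lead shares one slope, the fitted curve changes smoothly with the lead rather than lurching between adjacent cells the way raw percentages can.

```python
import math

def fit_lead_slope(rows, iterations=50):
    """Fit p(win | lead) = 1 / (1 + exp(-b * lead)) for one time slice.

    rows: list of (lead, wins, games) aggregated counts. Uses Newton's method
    on the binomial log-likelihood and returns the slope b.
    """
    b = 0.0
    for _ in range(iterations):
        grad = 0.0  # first derivative of the log-likelihood
        hess = 0.0  # negated second derivative (always >= 0)
        for lead, wins, games in rows:
            p = 1.0 / (1.0 + math.exp(-b * lead))
            grad += lead * (wins - games * p)
            hess += lead * lead * games * p * (1.0 - p)
        if hess == 0.0:
            break
        step = grad / hess
        b += step
        if abs(step) < 1e-12:
            break
    return b

# Hypothetical (lead, wins, games) counts for one moment in the game.
rows = [(-6, 8, 30), (-2, 20, 45), (2, 25, 45), (6, 22, 30)]
b = fit_lead_slope(rows)
smoothed = {lead: 1.0 / (1.0 + math.exp(-b * lead)) for lead in range(-6, 7)}
```

A fuller model would pool all time slices, for example by letting the slope depend on time remaining, but that goes beyond what the post describes.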

You can read the values as percent times 10. So the team with a four-point lead with 10 minutes left in the first half has a 58.1% chance of winning. This table ignores a couple of important things, namely which team has possession of the ball and the pace of the game. I’m going to punt on the latter for now, since the effect of pace on winning probabilities requires additional study. For possession, it seems reasonable to add a point to whichever team has the ball, since that’s the expected value of a possession. (Update: My original logic was batty on this issue. It’s more correct to add a half-point for possession.)
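A lookup with the half-point possession adjustment might look like this. It assumes a table row keyed by non-negative integer lead, linear interpolation for the resulting half-point leads, and that a trailing team's chance is the mirror image of the leader's (reasonable for equal teams). `table_prob`, `win_prob`, and every row entry except the 581 for a four-point lead with 30 minutes left are hypothetical.

```python
import math

def table_prob(row, lead):
    """Even-strength win probability for a (possibly fractional) lead.

    row maps non-negative integer leads to table values (percent times 10)
    for one moment in the game; leads in between are linearly interpolated.
    """
    if lead < 0:
        # Assumption: for equal teams, trailing by x mirrors leading by x.
        return 1.0 - table_prob(row, -lead)
    lo = math.floor(lead)
    frac = lead - lo
    p_lo = row[lo] / 1000.0
    if frac == 0:
        return p_lo
    p_hi = row[lo + 1] / 1000.0
    return p_lo + frac * (p_hi - p_lo)

def win_prob(row, lead, has_ball):
    """Credit the team with the ball with an extra half-point of lead."""
    effective_lead = lead + 0.5 if has_ball else lead - 0.5
    return table_prob(row, effective_lead)

# Hypothetical row for 30 minutes remaining; only 581 (4-point lead) is from the post.
row_30 = {0: 500, 1: 520, 2: 541, 3: 561, 4: 581, 5: 601}
```

For example, `win_prob(row_30, 4, True)` interpolates halfway between the 4- and 5-point entries. The row must cover one more lead than the largest lead you query.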

I feel that this table is very accurate for teams of even strength, but unfortunately such matchups are rare in college basketball. Even the two national semifinals, which pair comparable teams, would not have made it through my filter for nearly equal teams. The difficult part is accounting for team strength.

I need an example to explain why. Let’s say we have a game where one team has a 90% chance to win before the game starts. Now suppose the game is tied at halftime. From our trusty chart, our favorite would have a 50% chance of winning were it an even match with its opponent. The simple thing to do would be to weight the two values by the fraction of the game that remains, which at halftime means averaging them: our team now has a 70% chance to win. This linear approach seems to make sense, but one can quickly poke holes in it.

Suppose instead the favorite jumped out to a 15-point lead five minutes into the game. Our chart gives an even-strength team a 70.4% chance of winning in that case. Using the linear method, the favorite would now have an 87% chance of winning. But wait: our favorite just jumped all over its opponent, and its chance of winning dropped slightly? Think of it another way. With these two teams starting tied and 40 minutes of basketball ahead of them, the underdog had a 10% chance of victory. Now, facing a 15-point deficit with just 35 minutes remaining, the ’dog has a better chance of winning? It doesn’t make sense.
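The linear method above can be written as a weighted average that gives the pregame estimate a weight equal to the fraction of the game remaining. This weighting is inferred rather than stated in the post: it reproduces the 70% halftime figure and gives 87.55% for the 15-point example, in line with the 87% quoted above. `linear_blend` is a hypothetical name.

```python
def linear_blend(pregame, even_strength, minutes_remaining, game_minutes=40):
    """Weight the pregame estimate by the fraction of the game still to play."""
    w = minutes_remaining / game_minutes
    return w * pregame + (1 - w) * even_strength

halftime_tie = linear_blend(0.90, 0.50, 20)   # about 0.70
early_run = linear_blend(0.90, 0.704, 35)     # about 0.8755, below the pregame 0.90
```

Because the result is always between the two inputs, no lead can push the favorite above its pregame 90%, which is exactly the flaw described above.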

(From this point on, I only recommend reading if you like awkwardly structured sentences and math. Just know that I have a good formula to calculate win probabilities given the score, time remaining, possession, and the relative strength of the teams involved. And also know that I’ll be tweeting the in-game probabilities at five-minute game-time intervals during the Final Four.)

I’ve used two tricks to overcome this. First, I’m not going to treat time as linear. This doesn’t change much in the 35-minute-mark example, but think about the halftime example. I don’t believe our favorite had only a 70% chance to win at that point; I believe it was higher. I’m not going to bore you with theory on this point, and I haven’t looked at data to support the idea, so for now I’m simply accepting it. If a rationale is needed, players try harder as the game goes on. To account for this, I’m altering the time scale of the game by taking the square root of the fraction of time remaining. That’s a mouthful, but it means that at halftime, instead of assuming 50% of the game is yet to be played, I pretend that 70.7% of the game is left.
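In symbols, with $t$ the minutes remaining in a 40-minute game, the time transform replaces the fraction of the game left with its square root. This just writes out the arithmetic from the paragraph above; the symbol $w$ is introduced here for convenience.

```latex
w(t) = \sqrt{\frac{t}{40}}, \qquad
w(20) = \sqrt{0.5} \approx 0.707, \qquad
w(35) = \sqrt{0.875} \approx 0.935
```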

However, at the 35-minute mark, no weighted average of our initial 90% and the even-strength 70.4% can give us a number higher than 90%, which is what would make sense. To fix this, I use log5 to adjust the favorite’s initial estimate, rating the favorite at 90% and the opponent at 29.6% (100% minus 70.4%), its even-strength estimate at this point. That returns a value of 95.5%, which I use in place of the 90% in the linear calculation of win probability. (I actually convert the probabilities to odds before doing this.) Putting 95.5% and 70.4% into this sausage machine returns a 95.3% probability that our favored team will win once it has a 15-point lead five minutes into the game. That our favorite’s chances went from 90% to 95.3% with its early run sounds reasonable.
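Here is a sketch that reproduces the numbers in this paragraph. The post names log5, the conversion to odds, and the square-root time scale, but not the exact blend; this assumes the two odds are averaged with weight sqrt(fraction of time remaining) on the log5-adjusted estimate, which yields 95.5% and 95.3% for the 15-point example. Treat it as a reconstruction under that assumption rather than the author's formula. The function names are hypothetical, and probabilities must stay strictly between 0 and 1 for the odds conversion to work.

```python
import math

def log5(p_a, p_b):
    """Bill James's log5: chance A beats B, given each side's win probability."""
    return p_a * (1 - p_b) / (p_a * (1 - p_b) + p_b * (1 - p_a))

def to_odds(p):
    return p / (1 - p)

def from_odds(odds):
    return odds / (1 + odds)

def in_game_prob(pregame, even_strength, minutes_remaining, game_minutes=40):
    """Blend a log5-adjusted pregame estimate with the even-strength table value.

    Assumption: the blend is a weighted average of odds, with weight
    sqrt(minutes_remaining / game_minutes) on the adjusted pregame estimate.
    """
    # The opponent is rated at its own even-strength chance at this moment.
    adjusted = log5(pregame, 1 - even_strength)
    w = math.sqrt(minutes_remaining / game_minutes)
    odds = w * to_odds(adjusted) + (1 - w) * to_odds(even_strength)
    return from_odds(odds)

early_run = in_game_prob(0.90, 0.704, 35)   # about 0.953
halftime_tie = in_game_prob(0.90, 0.50, 20) # roughly 0.87, above the linear 0.70
```

Under this reading, a tie at halftime leaves the 90% favorite at roughly 87%, consistent with the belief stated earlier that the true figure is higher than 70%.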

There’s a lot more calibration to do with this system, but since I only thought of doing this a few days ago, I needed to get something working before the Final Four started. It will let us get a feel for how important events affect the outcome of each game this weekend.

By the way, according to the formula, UNC had about a 5% chance of coming back on Kansas when the Tar Heels were down 28 with 5 minutes to go in the first half. If that seems high, it may be. In my database of evenly matched games, the largest deficit a team faced at that point in the game was 22. But amazingly, I have cases where a team overcame 21- and 19-point deficits. So perhaps Billy Packer was slightly crazy for jumping to conclusions when he did.