Essays on education, debate, and math instruction; neat math problems; and whatever else I get around to.

Saturday, February 11, 2017

The Logit Score: a new way to rate debate teams

I recently published an article on a new debate team-rating method I invented, called the logit score. I hope the logit score will take its place among win-loss record, average speaker points, median speaker points, opponent wins, ranks, and so on as an effective way to rate (and thus rank) debate teams at a tournament.

What is the logit score?

The basic idea is simple: the logit score combines win-loss record, speaker points, and opponent strength into one score using a probability model. In other words, the logit score is the answer to the question, "Given these speaker points and these wins and losses to those particular opponents, what is the likeliest strength of this team?"

Let's take a step back and acknowledge a truth not universally acknowledged in debate: results should be thought of as probabilities, not certainties. A good team won't always beat a bad team--just usually. Off days, unusual arguments, mistakes, and odd judging decisions all contribute to a slight risk of the bad team winning. That means actual rounds need to be thought of as suggesting, but not definitively proving, which team is better. Team A beats team B. Team A is probably better--but then again, it could have had an off day, been surprised by a weird argument, or drawn a terrible judge. If team A got much, much higher speaker points, it was very likely the better team. If team A only edged out team B by a little bit, then the uncertainty grows.
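That intuition is easy to make concrete with the logistic function this kind of model is built on. This is just a sketch: the strength numbers below are made up for illustration, and the convention of using the strength gap as the log-odds is an assumption about the model, not a quote from the article.

```python
import math

def win_prob(strength_a, strength_b):
    """Modelled chance that team A beats team B, assuming (as a
    logistic model does) it depends only on the strength gap."""
    return 1.0 / (1.0 + math.exp(-(strength_a - strength_b)))

# a much stronger team wins often, but not always
print(round(win_prob(2.0, 0.0), 3))  # 0.881
# a slight edge is barely better than a coin flip
print(round(win_prob(0.3, 0.0), 3))  # 0.574
```

Even a team two full strength units ahead still loses about one round in eight, which is exactly why single results shouldn't be read as certainties.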

That's where the logit score comes in. Estimating team A's actual, true strength means combining all of those probabilities and uncertainties into one model. I won't get into the specifics (the details are in the article), but the basic idea is to use a logistic regression to combine the probabilities of each win and loss against specific opponents with the specific speaker points received. The logit score for a team means: "If team A were estimated to be stronger, these results would be a bit more likely, but those other results would be far less likely. If team A were estimated to be weaker, these results would be far less likely, even though those other results would be a bit more likely. This logit score is the proper balance that makes all the results most likely overall." Because it factors all the results into one probability model, the logit score isn't sensitive to outliers: unusually high or low speaker points, losses to outstanding teams, and wins over terrible teams don't affect the logit score much at all.
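That balancing act can be sketched in code. To keep it short, the sketch below fits strengths from wins and losses only--a Bradley-Terry-style simplification that leaves out the speaker-point side of the article's actual model--by climbing the likelihood gradient until the strengths make the observed results most likely. All names and numbers are illustrative.

```python
import math

def fit_logit_scores(teams, rounds, iters=3000, lr=0.1, reg=0.01):
    """Estimate each team's strength by maximizing the likelihood of the
    observed results under P(A beats B) = sigmoid(s_A - s_B).
    A small L2 penalty (reg) pins down the model's arbitrary zero point.
    rounds: list of (winner, loser) pairs."""
    s = {t: 0.0 for t in teams}
    for _ in range(iters):
        grad = {t: -reg * s[t] for t in teams}
        for winner, loser in rounds:
            p_win = 1.0 / (1.0 + math.exp(-(s[winner] - s[loser])))
            grad[winner] += 1.0 - p_win  # push the winner up by the "surprise" of the win
            grad[loser] -= 1.0 - p_win   # and the loser down by the same amount
        for t in teams:
            s[t] += lr * grad[t]
    return s

# toy season: A beats B twice, B beats C once, C upsets A once
teams = ["A", "B", "C"]
rounds = [("A", "B"), ("A", "B"), ("B", "C"), ("C", "A")]
scores = fit_logit_scores(teams, rounds)
ranking = sorted(teams, key=scores.get, reverse=True)
print(ranking)
```

Notice that in this toy data C ends up rated above B even though B beat C head to head: C's lone win came against the strongest team, and the model weighs opponent strength automatically.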

Does the logit score have any empirical results to back it up?

I took a past college debate season, used those results to give every team a logit score, and then looked to see how well logit scores "retrodicted" the actual results of the season. That is to say, how often did the higher-logit-scoring team win rounds against the lower-logit-scoring team? As a baseline for comparison, I also ran the same kind of analysis ranking the teams by win-loss record.

The logit score rankings got slightly more rounds correct than the win-loss record rankings.
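The retrodiction check itself is simple to sketch. Here a hypothetical ratings dictionary stands in for the fitted logit scores; only the comparison logic matters.

```python
# hypothetical ratings standing in for fitted logit scores
scores = {"A": 0.4, "B": -0.4, "C": 0.0}
# the season's rounds, recorded as (winner, loser) pairs
season = [("A", "B"), ("A", "B"), ("B", "C"), ("C", "A")]

def retrodiction_accuracy(rounds, scores):
    """Fraction of rounds the ratings 'retrodict' correctly,
    i.e. rounds where the higher-rated team actually won."""
    correct = sum(1 for winner, loser in rounds if scores[winner] > scores[loser])
    return correct / len(rounds)

print(retrodiction_accuracy(season, scores))  # 0.5
```

On this made-up data, the ratings call half the rounds correctly; the comparison in the article is the same computation run once with logit-score rankings and once with win-loss rankings.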

The slightly higher accuracy is not, on its own, a reason to rush to adopt logit scores. It merely shows that the logit scores aren't doing anything crazy. For the most part, the logit score reshuffles teams ever so slightly among their nearest peers. The moves are slight ups or downs, not drastic shifts.

The real reason to consider using logit scores is that (a) they are less sensitive to outliers, which can matter a lot at a six- or eight-round tournament; and (b) they factor in more information. Win-loss record only uses speaker points as a tiebreaker; points are secondary. Measures of opponent strength usually come third. In other words, a team that gets a really tough random draw and goes 4-2 by dropping the first two rounds might miss out on breaking if no 4-2s break--win-loss record comes first, and opponent strength never factors in under that scenario. The logit score, on the other hand--because wins, points, and opponents are all factored in at once--could reflect that this team is in fact very strong because its only two losses came against very good opponents. (See how important it is to be less sensitive to outliers?) More information also rewards well-rounded teams: teams that win rounds on squeaker decisions without earning great speaker points are penalized more under a logit-score system than under a win-loss-then-speaker-points system.

2 comments:

Honestly the question running through my head after reading this article is what we value in debate. I'm convinced the logit score is a good way to rank debaters' skill in debate, but it occurs to me that skill isn't always necessarily what we are ranking. Old-fashioned methods seem to rank debaters based on performance. It seems like there might be reasons to artificially value this when determining elimination seeding at the least, and potentially preliminary brackets as well. (Using the logit score in preliminary seeding is especially strange--where do you draw the lines between one bracket and another? If the divisions are drawn based on record, is the bracket a given team is placed in dependent on their logit score or their record? Alternatively, some kind of evenly-proportioned brackets, or maybe size them corresponding to Pascal's triangle? Endless possibilities!)

To be clear I'm not at all making an argument for or against this system; it just seems to be the natural extension of this line of inquiry.

I'm happily amazed that I've convinced you that the logit score is a good way to rank teams -- glad my work was persuasive. I would imagine it being used to determine who breaks and rank them for elims. It doesn't seem necessary to do it for prelims, but I guess you could. (Win-loss first, then ranked by logit score, and paired high-low maybe?)