Monday, November 7, 2011

New scoring scheme for Low-Key 2012?

Low-Key scoring has gone through various phases.

In the 1990s, we scored based on the fastest rider. The fastest man and the fastest woman each week would score 100. Those slower would score based on the percentage of the fastest rider's score. This was super-simple, but when an exceptionally fast rider showed up, everyone else would score lower than normal. Additionally, this was frustrating for the fastest rider (typically Tracy Colwell among the men), since no matter how hard he or she pushed, the result would be the same 100 points.

So with Low-Key 2.0 in 2006, we switched to using the median rider (again treating men and women separately). The median is much less sensitive to whether a particular individual shows up or not, so scores were now more stable. However, there was still an issue with women, and most especially with our hybrid-electric division, since the smaller turnouts in these divisions again made the score sensitive to who showed up.
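A minimal sketch of median-based scoring follows. The post does not give the exact formula, so 100 × (median time / rider time) is an assumption, and the rider names and times are made up for illustration.

```python
from statistics import median

def median_scores(times):
    """Score each rider as 100 * (median time / rider time).

    The formula is an assumption; the post only says the median rider
    is the reference.
    """
    ref = median(times.values())
    return {rider: 100.0 * ref / t for rider, t in times.items()}

# Hypothetical finish times in seconds for one week.
week = {"A": 1800.0, "B": 2000.0, "C": 2500.0}
week_scores = median_scores(week)

# The same week with an exceptionally fast rider D added.
week_with_d = dict(week, D=1500.0)
week_with_d_scores = median_scores(week_with_d)
```

In this made-up week, adding rider D moves the median from 2000 s to 1900 s, so rider B drops from 100 to 95 points. Under fastest-rider scoring the reference would instead have moved from 1800 s to 1500 s, a much larger shift.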

So in 2010 I updated the system so that all riders were scored using a single median time, except that instead of actual time, I used an "effective men's time", using our history of Low-Key data to generate conversion factors from women's and hybrid-electric times to men's times. Mixed tandems were scored by averaging a men's and a women's effective time.
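Here is a rough sketch of the effective-time idea. The conversion factors below are placeholders, not the real Low-Key values (which are not given here), and the tandem rule is one reading of "averaging a men's and a women's effective time".

```python
from statistics import median

# Hypothetical conversion factors to "effective men's time".
# The real factors come from Low-Key history and are not in the post.
FACTORS = {"male": 1.00, "female": 0.90, "hybrid": 1.10}

def effective_time(time, division):
    """Convert an actual time to an effective men's time."""
    if division == "mixed_tandem":
        # Assumed reading: mean of the male and female conversions.
        return time * (FACTORS["male"] + FACTORS["female"]) / 2
    return time * FACTORS[division]

def combined_scores(results):
    """results maps rider -> (time in seconds, division)."""
    eff = {name: effective_time(t, d) for name, (t, d) in results.items()}
    ref = median(eff.values())  # one median for everyone
    return {name: 100.0 * ref / e for name, e in eff.items()}

# Made-up week: three men and one woman.
example = {
    "M1": (1800.0, "male"),
    "M2": (2000.0, "male"),
    "M3": (2200.0, "male"),
    "W1": (2100.0, "female"),
}
example_scores = combined_scores(example)
```

With these placeholder numbers the lone woman's effective time is 1890 s against a median of 1945 s, so she scores over 100 even though no other women showed up.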

This worked even better. Now if just a few women show, it's possible for them to all score over 100 points, as happened at Mix Canyon Road this past Saturday.

But the issue with Mix Canyon Road was that, because the climb is so challenging and for many it was a longer-than-normal drive to reach, the turn-out among more endurance-oriented riders was relatively poor. The average rider at Mix would have scored over 100 points during, for example, Montebello (data here). Going by the scores, it seems almost everyone who did both climbs had "a bad day" at Mix. That is far from the truth!

There is another scoring scheme I've been contemplating for many years. It's one which doesn't use a median time for each week, but rather compares the times of riders who did multiple weeks to come up with a relative time ratio for each climb. So if, for example, five riders did both Montebello and Mix, and if each one of them took exactly 10% longer to climb Mix, then a rider on Mix should score the same as a different rider on Montebello as long as the Mix rider's time was exactly 10% longer than the Montebello rider's time, once again after adjusting for whether the rider is a male, female, or hybrid-electric.

So why haven't I made this switch yet? It sounds good, right?

Well, for one it's more work for me. I'd need to code it. But that's not too bad because I know exactly what I need to do to make it work.

Another is it's harder to explain. It involves an iterative solution, for example. I like things which are easy to explain. Median time is simple.

But another is it would mean scores for any week wouldn't be final until the entire series was complete. So a rider might celebrate scoring 100.01 points on Montebello, only to see that score drop to below 100 points later in the series. Why? Because the time conversion factor for a given climb would depend on how all riders did on that climb versus other climbs. And it's not as simple as I described: for example if rider A does climbs 1 and 2, and rider B does climbs 2 and 3, then that gives me valuable information about how climb 1 compares to climb 3. In effect I need to use every such connection to determine the conversion factor between these climbs.
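To illustrate how such connections can be combined, here is one possible sketch. It is not necessarily the algorithm that will be used for the series: it assumes each time is roughly (rider factor) × (climb factor) and fits both by alternating averages in log space. The function name, data layout, and numbers are all hypothetical.

```python
import math
from collections import defaultdict

def climb_factors(results, reference, iterations=200):
    """Estimate each climb's time factor relative to `reference`.

    results: list of (rider, climb, time) tuples.
    Assumed model: log(time) = rider_log[rider] + climb_log[climb].
    """
    log_t = {(r, c): math.log(t) for r, c, t in results}
    rider_log = {r: 0.0 for r, _, _ in results}
    climb_log = {c: 0.0 for _, c, _ in results}
    for _ in range(iterations):
        # Re-estimate each rider from the climbs that rider did.
        acc = defaultdict(list)
        for (r, c), lt in log_t.items():
            acc[r].append(lt - climb_log[c])
        for r, vals in acc.items():
            rider_log[r] = sum(vals) / len(vals)
        # Re-estimate each climb from the riders who did it.
        acc = defaultdict(list)
        for (r, c), lt in log_t.items():
            acc[c].append(lt - rider_log[r])
        for c, vals in acc.items():
            climb_log[c] = sum(vals) / len(vals)
    base = climb_log[reference]
    return {c: math.exp(v - base) for c, v in climb_log.items()}

# Rider A does climbs 1 and 2; rider B does climbs 2 and 3.
chain = [
    ("A", "climb1", 2000.0), ("A", "climb2", 2200.0),
    ("B", "climb2", 2500.0), ("B", "climb3", 3000.0),
]
factors = climb_factors(chain, reference="climb1")
```

Nobody in this made-up data rode both climb 1 and climb 3, yet the fit still links them through climb 2: climb 2 comes out 10% slower than climb 1, and climb 3 comes out 20% slower than climb 2. Adding a later week's results would change the fit, which is why scores could not be final until the series ends.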

But while scores might change for a climb, the ranking between riders during the climb would not. That's the most important thing. Finish faster than someone and you get a higher score. The conversion factor between men and women, for example, would stay the same. That's based on close to 10 years of data, so no need to continue to tweak that further.

I'll need to get to work on this and see if I can make progress. I'll describe my proposed algorithm next post.

11 comments:

How about using the climb difficulty rating, which is already calculated at the beginning of the series? Using it straight would create too much of a disparity between climbs (Mix Canyon at 244 vs. Palomares at 57), but I'm sure you could figure out a better way to use it.

This could be done by weighting each result in proportion to the climb rating. For example, a climb rated 200 could count twice in the results, while a climb rated 100 (OLH, by definition) could count once. Then, to rank in the overall standings, a rider would need climbs whose ratings sum to at least half the total weighting of all climbs ridden so far.

But I don't want to do this because I view being fast on short climbs to be as valid as being fast on long climbs. Some may excel in one versus the other, and each type of rider should get their chance.

Rather the key issue here is the average ability of riders in a particular week. The new algorithm will automatically figure that out. I already started writing the code on the train this morning.

Note I did something very similar in the distant, distant past. See: http://lowkey.djconnel.com/1995/results_analysis.html

That code is now lost, written in an inferior scripting language (awk)... to be honest a bit of what I did then is now somewhat over my head :). But I'll do the best I can.

Dan- why don't you just key off of one person who shows up every week, say, YOU. In fact, you could just score it based on the 2010 (or any other year) Dan. E.g. 2010 Dan = 100. That seems pretty fair and is also very easy and has none of the problems you are worried about.

That's not quite fair, since if I have a good day, everyone suffers, and if I have a bad day, everyone benefits, so I'd probably get taken into a ditch on the side of the road... I'd fear for my safety! :)

The approach I propose here essentially uses everyone as a week-to-week reference. Everyone who does both weeks 1 and 2, for example, contributes to the reference time of week 2 relative to week 1. But also do people who do week 1 and week 3, along with those who do week 3 and week 2. The connections are infinitely complex, yet code-wise it's extremely simple.

This is a thought-provoking idea. I see what you are trying to do, but the property that weekly scores can change all the way through to the final climb is confusing and adds uncertainty (or excitement?) to the series scoring. If you computed the series total scores the old way and the new way, I see several possible outcomes. One is that the series positions don't change at all, and then one questions whether the change was needed. Another is that the series positions change a little with some riders being rewarded and some being penalized. This could result in a situation where top positions flip based on the final round of results due to math that nobody else really understands. Another outcome is that the series positions change a lot with the new system which I think would tend to undermine confidence in the scoring system itself.

Yes -- the overall results could in theory, likely not in practice, be difficult to predict week-to-week. The present system is quite deterministic since existing scores are final. "If I beat rider XXX by YYY points I will finish ahead of him/her in the overall". With scores getting discarded + volunteer points it's a bit complex already, but at least predictive.

With this scheme nobody, not even I, will be able to predict what will be needed for rider A to beat rider B. For example, rider A might be ahead of rider B, with every score higher than rider B's, then beat rider B in the final climb, and rider B could still move ahead. That's extremely unlikely but possible.

But it will solve the theoretical problem of riders killing themselves on Mix then getting crappy scores compared to what riders got on Montebello when the Mix riders were busy coordinating :).

I think you need a metric for the relative "quality" of the field, and then weight the results according to that. I suggest that you calculate a metric for each rider which is that rider's mean score over all previous low-key hill climbs. Riders making their first appearance are unknown quantities and their data is not included in the calculation. (Aside - you could have a page that shows all-time low-key ranking based on this per-rider metric.) You can then establish the quality of the field for a particular climb as the mean of all riders that participated in that climb.

The first ride of the series is Montebello and generally has a good turn-out so you can use that to establish the baseline field quality score for the series. For each subsequent climb you divide that climb's field quality score by the Montebello figure and this is the weighting applied to that climb's results. So if the field quality is 10% higher on Mix than on Montebello, all the scores get boosted by 10%, though I bet the effect is more like low single digit percentage points.
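A small sketch of this field-quality weighting, under stated assumptions: `history` is a hypothetical mapping from rider to past scores, and all names and numbers are made up.

```python
from statistics import mean

def field_quality(participants, history):
    """Mean of each known rider's mean past score.

    First-time riders (no history) are skipped, as suggested above.
    Raises statistics.StatisticsError if nobody has a history.
    """
    known = [mean(history[r]) for r in participants if history.get(r)]
    return mean(known)

def weighted_scores(raw_scores, quality, baseline_quality):
    """Scale a climb's raw scores by its field quality relative to baseline."""
    weight = quality / baseline_quality
    return {r: s * weight for r, s in raw_scores.items()}

# Hypothetical past scores; D is making a first appearance.
history = {"A": [110, 110], "B": [100, 100], "C": [90, 90], "D": []}

montebello_quality = field_quality(["A", "B", "C"], history)
mix_quality = field_quality(["A", "B", "D"], history)

mix_raw = {"A": 104.0, "B": 95.0, "D": 101.0}
mix_adjusted = weighted_scores(mix_raw, mix_quality, montebello_quality)
```

Here the Mix field rates 105 against a Montebello baseline of 100, so every Mix score is boosted by 5%, including the newcomer's.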

You're correct: the scores could be calculated causally. The reference time could be the mean time for Montebello, which would be the reference field, then I could adjust all subsequent fields for relative field quality. That's an interesting idea. I hadn't thought of that before...

Interesting effect of that: were I to reorder the weeks the scores would be different. That wouldn't be true with this scheme, nor with the prior one. It's not a major problem, though, just a curiosity.

I've got the code probably half-way done, not counting outlier pruning. I'll experiment on it with this year's results. But I really like your suggestion. It wouldn't be hard to code, either. We have enough regulars there would be decent statistical weight for the comparison.

You could base the riders' metrics only on rides from previous years and then the scoring would be the same regardless of the order of events during the year.

Also, you might want to use a bias value rather than a weighting. I mean that you would subtract the Montebello score from this climb's score, and just add that into all the results. The effect is to shift the distribution curve up, and gives more benefit to riders with lower scores. One can argue whether that is fairer or not though.
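The difference between the two adjustments can be seen in a few lines. This is only an illustration with made-up field quality figures (105 for this climb, 100 for Montebello); the function names are hypothetical.

```python
def apply_weight(score, quality, baseline):
    """Multiplicative adjustment: scale by relative field quality."""
    return score * quality / baseline

def apply_bias(score, quality, baseline):
    """Additive adjustment: shift every score by the quality difference."""
    return score + (quality - baseline)

low, high = 60.0, 120.0
weighted = (apply_weight(low, 105.0, 100.0), apply_weight(high, 105.0, 100.0))
biased = (apply_bias(low, 105.0, 100.0), apply_bias(high, 105.0, 100.0))
```

With these numbers the lower-scoring rider gets 63 with weighting versus 65 with a bias, while the higher-scoring rider gets 126 versus 125, which is the "more benefit to riders with lower scores" effect.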

The climbs don't change much from year to year, do they (though Mix may be an exception)? If not, then given a power file from the current year you could calculate the VE for a climb and compare it to the true climb to determine differences in environmental conditions (principally wind but perhaps air density, too). Then you could use that to normalize the time across years for the same climb, independent of who shows up that particular day. If Climb A usually takes 10% longer than Climb B but in a particular year took 12% longer, you could (with a power file) figure out whether that was because of who showed up or because of the weather conditions.