Predicting regional competitors from single Open results

Let’s get this straight. The CrossFit Open is five workouts for a reason: a single result does not accurately predict whether you’ll be headed to the Regional competition, and Open results have even less predictive power for Games competitors. There is more to being the “Fittest on Earth” than performing well on a single 10-minute AMRAP.

Sure, these statements seem obvious. But I’m a scientist, and I like numbers. And, frankly, sometimes my wife is hysterical about her performance, to the point where she throws out words like ‘impossible’ after completing a single Open workout and not performing to her liking. So, I set out to look at some data, and challenge myself to calculate some probabilities (’cause this is how I show my affection…). I’m writing as I work (or waste time…), and I’m not confident that I can attain my goal, that is, to provide a probability distribution of qualifying for Regional competitions (roughly the top 50 Open competitors from each region), given a result from a single Open workout. I may just provide some graphs and a few numbers that suggest the first paragraph in this post is true, without actually getting to this distribution thing. After all, I’m an ecologist, I’m more or less self-taught in statistics and probability, and I have a job. Plus, I like playing fetch with my dogs.

I’ve taken results from the 2012 and 2013 Open competitions for the top 180 finishing women in five US regions: South East, Central East, South West, Southern California, and Northern California. There’s much more data to be copied than what I’m working with, but I think these data are representative of the whole, and I couldn’t figure out how to access raw data without copy-paste.

From memory: In 2012, the top 60 went to Regionals, while in 2013 the top 48 were selected. Similarly, the top 48 will be selected in 2014. I’m rounding the selection to 50, given that there are probably a few qualifiers who will compete on a team or decline altogether. This is likely a conservative selection cut-off.

A few plots

(I’m not proud of these – they are quick and dirty Excel ‘charts’… don’t tell my students).

Fig. 1: Maximum workout placing across all five workouts for two years and five regions (women only).

Fig. 2: Minimum workout placing across all five workouts for two years and five regions (women only).

The first couple of plots are simple: of the top 180 women in five regions and across two years of the Open, what were their maximum and minimum placings? There is a lot of variation in both plots, and I was tempted to conclude that the scoring method of the Open, which is used to calculate the overall placing (x-axis), weights the maximum placing more heavily. I think I’d have to calculate a coefficient of variation to be sure though, given that the scales on the y-axis are pretty different for the two plots.
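For what it’s worth, here is a minimal sketch of that coefficient of variation check in Python. The coefficient of variation is just the standard deviation divided by the mean, so it has no units and can be compared between plots with very different y-axis scales. The placings below are made up for illustration (they are not the Open data), and the function name is my own.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean (unitless)."""
    m = mean(values)
    if m == 0:
        raise ValueError("mean is zero, so the CV is undefined")
    return stdev(values) / m


# Made-up placings for illustration only (NOT the real Open results).
max_placings = [40, 90, 150, 268, 60]
min_placings = [2, 15, 30, 59, 8]

cv_max = coefficient_of_variation(max_placings)
cv_min = coefficient_of_variation(min_placings)

print(f"CV of maximum placings: {cv_max:.2f}")
print(f"CV of minimum placings: {cv_min:.2f}")
```

Because the measure is scale-free, multiplying every placing by ten leaves it unchanged, which is what makes it a fair way to compare the two plots.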

Bigger picture

No qualifying athlete (top 50) placed worse than 268th in any one workout, and the average maximum (worst) placing was 90th. Further, every qualifying athlete finished at least one workout better than 60th place, with an average minimum (best) placing of about 15th. What this means to me is that if you want to qualify, don’t have any placings worse than 300th, and place within the top 50 at least once (probably more… maybe that’ll be the next calculation: number of workouts placed in top 50 or 60). I round these numbers a bit for a couple of reasons: (1) there are more competitors this year, and (2) I suspect consistency (staying inside the top 60) is more important here.
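The summary numbers above, plus the “next calculation” (how many workouts each qualifier placed inside the cut-off), could be sketched roughly like this. I did all of this in Excel, so the code is only an illustration: the four athletes are invented, and the record layout (overall rank plus a list of five workout placings) is an assumption of mine.

```python
from statistics import mean

# Invented records for illustration: (overall_rank, [placing in each of 5 workouts]).
athletes = [
    (1, [3, 1, 12, 5, 2]),
    (25, [40, 18, 95, 22, 60]),
    (49, [120, 55, 30, 268, 44]),
    (120, [300, 80, 150, 410, 95]),
]

CUTOFF = 50  # rounded qualifying cut-off used in this post


def summarize_qualifiers(records, cutoff=CUTOFF):
    """Summarize worst and best single-workout placings among qualifiers."""
    qualifiers = [placings for rank, placings in records if rank <= cutoff]
    worst = [max(p) for p in qualifiers]  # maximum placing per athlete
    best = [min(p) for p in qualifiers]   # minimum placing per athlete
    # Number of workouts each qualifier finished inside the cut-off.
    inside = [sum(1 for x in p if x <= cutoff) for p in qualifiers]
    return {
        "worst_single_placing": max(worst),
        "mean_max_placing": mean(worst),
        "largest_min_placing": max(best),
        "mean_min_placing": mean(best),
        "workouts_inside_cutoff": inside,
    }


summary = summarize_qualifiers(athletes)
for key, value in summary.items():
    print(key, value)
```

With real data, the list of per-athlete counts is the piece that would speak to consistency.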