WOD data: CrossFit Open 15.1

note: some readers noticed that I had fewer athletes in my data than CrossFit was reporting. I originally collected the scores after score submissions closed on March 2, but apparently they need more time to settle, so I scraped again on March 6 and got about 40k more athletes. The numbers and plots have been updated. The biggest difference was that the Open has grown 27% since last year (previously I reported 7%).

15.1 is in the books, which means everyone is done with their toes-to-bar, but I’m just getting started with my scraping and plotting. Just like last year, I’ve collected scores to make it easier to see what’s really going on. I’ve got my act together this year (2.5 million data points to work with already), so expect a post after each WOD (follow me, @swiftsam, for updates).

First, how big is the Open these days? As of March 6, there were 261,832 athletes listed on the leaderboard under Individual Men and Individual Women. I stuck to those categories since they are the majority of the story, and most consistent across years. That 261k is 27% larger than last year and 4.1x as big as the 2012 Open, which had 63k athletes. Pretty awesome, but plenty of room to grow.

We do have some slackers who ponied up the $20, but couldn’t get out of bed this weekend. About 15% of all registered athletes were no-shows on the big day, slightly more so for men (17%) than women (14%). 41.5% of athletes this year are women, up from 38.6% last year and 36.1% in 2012.

One of the new features of the 2015 Open is the officially scored Scaled division. It could make the whole event less intimidating and increase signups, but given 27% growth is similar to previous years, it doesn’t seem like it had a huge effect. More likely it will address the awkward part of the Open where significant portions of people who can’t yet do a chest to bar pull up or muscle up end up standing around bored. On 15.1 we saw 27% of athletes opt for the scaled version, but there was a huge gender difference: 46% of women vs 19% of men. CrossFit ladies, I’d love to hear how you decided between scaled and Rx.

And finally, to the workouts. Wait, the workouts? Yep, we’ve got 15.1 and 15.1A. They happened back to back, but it appears they’re being scored as if they were totally independent. Don’t ask me, I just handle the numbers.

Workout 15.1

Well, it looks like everybody agrees that banging out 10 deadlifts at a snatch weight isn’t the time to take a break and run out of time. Both genders saw the end of round 4 as a fine accomplishment and 6715 athletes stopped at exactly 120 reps. Similar peaks at 150 and 180 continue the round-rounds fetish. The Rx weights were scaled remarkably well across genders, with peaks and distributions almost exactly the same in the pink and blue. The exception was the cluster of 750 women who got stuck on the first snatch. Tough skill to pick up on the spot.

Workout 15.1A

1-rep-max clean and jerk
6-minute time cap

A 1RM WOD doesn’t leave much room for behavioral quirks in the scores; both genders have pretty nice normal distributions. If I were coaching (and there are plenty of reasons I’m not), I might suggest that people throw some 1lb plates on with each lift. The Open is scored by the sum of ranks, meaning that men who got 185 overhead accumulated an average of 81150 (bad) rank points, but with just two more pounds, they move ahead of the huge tie, and get only 74683 rank points on average (rank for each score varied by body weight). 6466 points for lifting an extra 2 pounds 1 time is probably the best deal you’re going to get in this Open.
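To make the tie arithmetic concrete, here is a minimal sketch in Python. It assumes tied athletes get spread over consecutive ranks by some tiebreaker (the paragraph above mentions body weight), so the average rank for a score is the midpoint of the block of ranks it occupies. The function name and the toy field of lifts are made up for illustration; they are not the real leaderboard data or my scraping code.

```python
from collections import Counter


def average_rank_by_score(scores):
    """Return {score: average rank}, where rank 1 is the best (highest) score.

    Assumption: athletes tied on a score occupy consecutive ranks
    (split by some tiebreaker), so their average rank is the midpoint
    of that block of ranks.
    """
    counts = Counter(scores)
    avg_rank = {}
    better = 0  # number of athletes with a strictly higher score
    for score in sorted(counts, reverse=True):
        n = counts[score]
        # Tied athletes hold ranks better+1 .. better+n; the mean of that run:
        avg_rank[score] = better + (n + 1) / 2
        better += n
    return avg_rank


# Toy field: a big pile-up at 185 and one athlete who added two pounds.
lifts = [225] * 3 + [205] * 5 + [187] * 1 + [185] * 10 + [165] * 4
avg_rank = average_rank_by_score(lifts)

# Rank points saved, on average, by lifting 187 instead of 185.
points_saved = avg_rank[185] - avg_rank[187]
print(avg_rank[185], avg_rank[187], points_saved)
```

The bigger the pile-up at a round number, the more ranks you jump by going just over it, which is why the small plates pay off.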

“Full Effort Expected”

That was the guideline announced by CrossFit on Sunday night after it became apparent that some athletes (especially those working for a team score) may have “sandbagged” 15.1 to save energy for a big 1RM on 15.1A.

This plot shows each athlete’s score on 15.1 across the bottom, and 15.1A on the y-axis. The cloud of athletes clusters from the bottom left to top right because, for the most part, the stronger people are stronger. If the ‘sandbagging’ problem was widespread, we would see a cluster in the upper left: people who completed few reps on 15.1, but then threw up a monster clean and jerk. Of course, it’s also possible that someone towards that corner is just a beast who trains to go heavy and can’t lift his or her tree trunk legs up to the bar. It doesn’t look like a big problem to me, but there are more men drifting off the top left edge than women. Don’t hate the player, hate the gameable scoring system.
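If you wanted to go beyond eyeballing the plot, one rough way to pick out the upper-left corner is to compare each athlete's percentile on the two workouts. The sketch below is only an illustration in Python: the field names (`reps_15_1`, `lift_15_1a`), the 25%/75% cutoffs, and the eight toy athletes are all assumptions, and landing in that corner is not proof of sandbagging.

```python
from bisect import bisect_left


def percentile_ranks(values):
    """For each value, the fraction of the field with a strictly lower value."""
    ordered = sorted(values)
    n = len(values)
    return [bisect_left(ordered, v) / n for v in values]


def flag_upper_left(athletes, low=0.25, high=0.75):
    """Names of athletes with few reps on 15.1 but a heavy lift on 15.1A.

    The cutoffs are arbitrary illustration values, not an official rule.
    """
    reps_pct = percentile_ranks([a["reps_15_1"] for a in athletes])
    lift_pct = percentile_ranks([a["lift_15_1a"] for a in athletes])
    return [
        a["name"]
        for a, r, l in zip(athletes, reps_pct, lift_pct)
        if r < low and l >= high
    ]


# Toy data: athlete A did very few reps but has the heaviest lift.
athletes = [
    {"name": "A", "reps_15_1": 60, "lift_15_1a": 245},
    {"name": "B", "reps_15_1": 120, "lift_15_1a": 185},
    {"name": "C", "reps_15_1": 150, "lift_15_1a": 205},
    {"name": "D", "reps_15_1": 180, "lift_15_1a": 225},
    {"name": "E", "reps_15_1": 90, "lift_15_1a": 155},
    {"name": "F", "reps_15_1": 121, "lift_15_1a": 165},
    {"name": "G", "reps_15_1": 200, "lift_15_1a": 235},
    {"name": "H", "reps_15_1": 100, "lift_15_1a": 175},
]
flagged = flag_upper_left(athletes)
print(flagged)
```

Anyone flagged this way could just as easily be the heavy lifter who can't get their legs up to the bar, so treat it as a list of people to look at, nothing more.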

Note that CrossFit presumably removed the scores of the ~20 athletes who were judged to be in violation of this spirit-of-the-rules situation, so those are not included here.

I plan to do something like this for each of the Open WODs as well as some breakdowns using height, weight, and PRs in the profiles. Leave a comment or a tweet with any ideas for interesting things to look into. Until then, rest up. For all we know, 15.2 might have three workouts.

¡Excellent post Sam! Very interesting to know how you are performing against all the athletes in the world. Thanx for the good work

http://twitter.com/cw360 cw360

This is awesome! Would love to see this for every WOD. Thanks!

Dusty Gibson

Very cool analysis on all of the data. Quick question though: how are you able to pull all of the data from the Games website so quickly? We are currently doing an in-house competition that piggybacks off of the CrossFit Open competition, and the ability to pull the data from CrossFit HQ’s website quickly would be extremely helpful.

It takes somewhere around 20 minutes to update all of the scores currently, but if you had a more specific need, it could be even easier.

Dusty Gibson

Awesome thanks! Hopefully I can use this to update our box data a little quicker. The custom leaderboards have also been down so that has made it even more difficult.

http://gettingtorx.com Ron

Great analysis, but are you sure about the Total numbers?

Doing a really quick look, I got 261,540 total registrations: 1,810 pages of women and 2,549 pages of registered men at 60/page. Granted, the last page is not a full 60, but for a rough check that would add about 40,000 to your headcount. I run the numbers from a Masters perspective.

swiftsam

Hi Ron, good question. I’ll have to investigate. The pages that I scraped should have yielded somewhere around 260k, but I only ended up with 221k unique athlete_ids. These are the mysteries that come with scraping data from a very busy website. Maybe they changed something while I was scraping. I’ll see if I can find the other 40k.

http://gettingtorx.com Ron

Hi Dan – This is fun stuff. Not sure if I like the Open because of the WODs or because of the myriad of data it creates….

Thanks for posting your screen scrape code. I’ll definitely look into it. Did an analysis of 2014 Master’s Qualifier via copy-paste and excel macros which was hella slow, but very fun and enlightening. In this case it helps to be over 40 – since we can compare ourselves in 5 year age groups….

Don’t forget that affiliates have until Wednesday to verify their athlete scores so I’m sure some of those “no-shows” just haven’t been verified yet.

swiftsam

Oh, thanks for the detail Dan, I don’t think I knew they had more time. I’ll refresh 15.1 when I start pulling 15.2.

http://www.firstcitycrossfit.com Erica Mirich

CF HQ’s page says that 272,000 people signed up. When I look at my app it says that 108,567 women participated in 15.1/15.1a. Looking at your data, even adding in the “no shows” you have the total number of women as 97,472. So, it looks like your starting numbers are off. Great work on the data breakdown though!

swiftsam

Thanks! more good points consistent with Ron’s observations. I’ll be investigating.

Harold

Great job. I also scraped the board and am building the following site now. The idea here is for the user to analyze the open data instead of me doing any work. Adding more functionality to the site in the next day or two and will push it out soon.

Why did you give x-axis/15.1 *25 pts* for every *100 points* on the y-axis/15.1A? Were you trying to make some particular point?!

swiftsam

Hi Christine,

The scales are just different for 15.1 and 15.1A because the range of scores is so much larger on 15.1A. I probably should have made the plot more square to give a better sense of the relationship.

Alex

I scaled because 75# snatch was too heavy for me. I could do the other skills/lifts no problem. 55# was doable.

Vanessa

I would be interested in the scaled graphs also. Great work!

Babs Bos

I do have a very nice theory as to why women scaled so much: they probably don’t let their egos get in the way of participating. Men don’t want to unless they know they can rx most things.
I like my theory

Jennifer

I enjoyed your article but disagree with you on one point. I am one of the 500 women who got stuck on the RX snatch. I did not try to pick up the skill “on the spot”. I have worked on it tirelessly for the year I have been doing crossfit, yet this was about 5 lbs off my max snatch weight. I chose to do the RX workout and attempt the snatch knowing this would rank me higher than doing the workout scaled. I put max effort into both portions of the workout and was bummed I could not hit the lift and continue to the next round. However, I did not recently learn how to do any portion of the workout and I know other women were in the same spot. It was not a matter of skill, but perhaps a weight that was difficult for some of us that were able to complete the other portions of the workout.

swiftsam

My very first Open wod was 13.1 and I got stuck at 100 reps because … I couldn’t snatch 135. I absolutely understand committing to the WOD at Rx even though you don’t think you can do all the movements. I think trying anyway is one of the best parts of the Open. I didn’t mean to disparage anyone’s efforts. On the contrary, I meant to acknowledge how tricky the snatch movement can be and how it’s not something you can just brute force with Open adrenaline.

Sam, great article! Just a question: what would have happened if 15.1 and 15.1A were treated as one workout? What if the reps were added to the lbs lifted and there was a total score? Would this have changed the leaderboard?

Eva

Since you were asking the girls on how they decided whether to do the scaled or Rx’d 15.1: I guess for most girls it was about if they can do toes-to-bars or not since this is a difficult thing to do for beginners as well as some more advanced athletes. The weight of the snatch might be a reason too, but here I think it’s the same for the boys 😉

Jen

Some girls in my gym decided to do the scaled because they knew they’d stick at either the ttb or the weight of the snatch, but also a few decided getting higher reps for the scaled version would just make for a more fun workout – less ego, as someone suggested above. Not me though, I was RX or nothing and ground those 15 ttb out one at a time.

Ben

Great to see this. I have a matlab leaderboard scraper but haven’t dusted it off yet this year.

The most surprising thing to me was the relative amount of variance in the two workouts for each gender–equating a rep to a pound, men’s scores in the two workouts appear to have similar standard deviations, while for women the standard deviation on 15.1 is maybe twice that on 15.1a. This is interesting because of the amount of talk there’s been about how a better way to score these workouts would have been to add the scores together, partly because then there’d be no incentive for affiliates trying to advance a team to regionals to sandbag 15.1. My initial reaction was that this might work well for the women: browsing the leaderboard it seemed that many competitive women had similar numerical scores on 15.1 and 15.1a; but that it would unduly weight 15.1a for the men, since the average for competitive men seemed to be higher by ~100 for 15.1a than 15.1. However, based on your presentation of the scores, I’d say the opposite is true: adding the scores would be reasonable for men, who have a similar spread of scores on each workout, but less so for women, since for them it would de-emphasize 15.1a.

swiftsam

Great points Ben, thanks. My original interest in this last year started with the way the scores are combined and athletes ranked for Regionals. I think the sum of ranks across workouts has a lot of problematic properties, and this 15.1/1A issue highlights them. If I have time I’ll take a look at the comparison you are describing.

Sara Southey

This is really interesting. I’d love to see it for the Masters categories too. I’m over 40 but no less passionate about seeing how I do in the grand scheme of things!

swiftsam

I’m not so far from 40 myself Sara :). Your scores are included in the distributions as the Masters athletes are included in the overall leaderboard. I’d like to do breakdowns for each demographic category, but there are so many combinations people might be interested in that I think something interactive might be best. I’m hoping to get that set up.

Jackie

Love the charts.. very interesting! Thanks for sharing.

Jennifer Boer

Hey Sam,

As a soon-to-be epidemiologist and crossfit athlete I really like what you’re doing here. I will be following you!

Kind regards,

Jennifer.

http://www.crossfit.com Dale Saran

Sam – amazingly interesting stuff you’re doing. You should talk to Moe and Jonathon, the guys at BTWB. Only one thing: FFS can you please put a capital ‘F’ in your “Crossfits?” Yes, unfortunately, I’m that guy. Lynne used to do that job on the message boards but now I gotta do it in public.
Dale
P.S. I told my daughter the EXACT same thing you did about adding pound plates. Spealler figured this out at the Games a few years back and used that fact to move up from those naturally “clumpy” data points.

swiftsam

Hi Dale, thanks for reaching out from HQ with kind words. I’ve updated everything to the proper casing. I don’t think I’d ever noticed that it was particular before!

Zapata

Awesome, Sam. As a BI engineer, this is fascinating

Colton

I love the study, great figures. I am an avid crossfitter and a PhD student and I love numbers. Let me know if you ever need help with analysis or ideas, again awesome work.

I would love to get my hands on those data to do some analyzing. I have never used R and tried to read up on it to use your scraping tool, but I think it will take me a while to obtain the knowledge… 🙁

Keep up the good work!

http://boxevents.org Brigt Erland Nersveen

I’ve been toying with R and your code for some hours now. I have come to the conclusion I need to set up an Apache server and MySQL? Is that correct?

I would love this data to plug into Tableau for visualization…

swiftsam

To use my code exactly, you would need MySQL, but not Apache. There are many other related solutions that would work. I will try to get to exporting and sharing all of the data for interested parties once things calm down a bit. Thanks for your interest.

Harold

Brigt

If you want to analyze data and use R, here is a site I built for you. I’ve put all the data here and all the analyses are done using R in the background. So, you can analyze the data, and you don’t need to know how to use R; I’ve done all that hard work for you

Hey Sam – I really like what you are doing, and I hate to appear to criticize, but I’ve had discussions with folks at my box and they are quoting your 27% yoy growth figure as kind of disappointing.

I think that is not a correct growth figure. CrossFit announced the 2014 total of individuals + masters as around 209,500 – I’m guessing that was the total registration count. This year the individuals are around 261,000 registered plus about 61,000 masters + teens for an Open all up total above 322,000. (2015 data based on leaderboard page counts) That’s roughly 54% year over year registration growth (322,000 / 209,500 ≈ 1.54). Some of the folks registered will not post any scores – won’t actually participate in the open workouts – and it looks like those names without any score in any workout are dropped from the leaderboard after opens complete. So the final participant count will be lower than 322,000. But comparing registration to registration or participant to participant, the growth from 2014 to 2015 is significantly more than 27%.

Respectfully
John Conrad

swiftsam

Hi John,

No worries about criticism. Data’s no good if it’s not right. I haven’t yet included the Masters > 55 or Teen divisions in any of my analyses. It’s just been another level of complexity I didn’t get to. For growth, I’m comparing to numbers I scraped from previous years of the leaderboard, and I only included Individual Men and Individual Women in those counts. So I think I’m ok now at 27%.

For dropouts, I’ve made sure to include everyone who was registered, even if they never completed a WOD. In previous years, you could select “Roster” to get that list. This year, I just made sure to scrape before and after every round of scores was completed.