Wednesday, September 11, 2013

On Statistics and Half Ironman: Going from a DNF to a 15 minute PR at the half distance

Then, this July, The Route 66 HIM happened. I had a ‘just ok’ swim, but I started to
feel bad with about 10 miles left on the bike.
The run was awful. I cramped the
whole way, and both calves finally locked up around mile 11, sending me face
first into the dirt and qualifying me for a free ambulance ride. Game Over.

I was seriously considering that my efforts at the 70.3
distance were somehow cursed, or at least statistically improbable.

But, just the same, Cedar
Point 2013 was on my calendar, and I wanted a bit of redemption. So, how do you come back from your first DNF?

To paraphrase Seth, I think first you have to acknowledge that, if you do this triathlon thing long enough, you'll DNF. It's almost certain. So, it happens. How do you get back up?

For me, I'd first have to not cramp up and fall
over...but I just didn't have time to really dig into that. I've got a new job and I’m training for
Ironman; there just wasn't time. So,
the bigger problem was time management.

Time Management

My two ‘biggest’ rocks outside of family life were work and
Ironman training. These two time
commitments were about 60-70 hours a week, combined. Additionally, I was spending another 1-2
hours a week figuring out which workouts I should be doing, and how I should be
training.

So, when I heard that longtime friend, exercise physiologist,
PhD student, and super triathlete Laura Wheatley was starting a coaching
business, it seemed like a good idea to solicit her help.

I've worked with a
lot of coaches in the past. Most of them
want to talk about what an ‘art’ coaching is, and I’ll concede that it somewhat
is. Few will answer my questions when I
ask why. Fewer still have good reasons
when they do answer those questions. And while I’m not an exercise physiologist, I’m a scientist just the
same, and the scientific process doesn't change. Said another way, I’m an evidence-based, research-based, pessimistic
math guy who won’t do something just because that’s what your n~=50 coaching
experience says works. I’m always going
to ask hard questions and expect proof, and Laura is one of the few people who
have answered those questions in a reasonable way.

So, I have a coach.
Poof, 2-3 hours free per week, more confidence that I’m doing the right
kind of work, and a lot of experience I can call on as needed. I can focus on the work, and not the
planning. Additionally, my run has been bad for a long time, and I needed a new approach to make it less bad. Stick to your core
competencies, as the business guys say.
But first things first: I now had the opportunity to invest those hours in fixing my
cramping issue, hopefully for good…

On Fixing Cramping

No one really knows why cramping happens in a specific
instance. Lots of things can cause
it. It’s a multifactorial problem. It could be overexertion, glycogen depletion,
inadequate hydration, an electrolyte problem, or something yet undiscovered, and
there are decent arguments around each.
It’s something I've struggled
with in the past, but usually only after a race or towards the end. A DNF based on cramping was a whole new
thing.

So, I had a complicated multifactorial problem and about 4 weeks to
solve it. The way I wanted to solve the
problem was to manipulate each individual factor and evaluate. But that wasn't going to work:

1. There wasn't time.
2. How do I know that two factors aren't dependent on one another, or both on a third?
3. I lacked a testing methodology, because the issues I experienced in racing I wasn't experiencing in training, for various reasons, of which probably only some were known or guessed at.

I was really left in a situation where the only reasonable
option was a shotgun approach. Or, to
quote Ripley from "Aliens," ‘I say we take off and nuke the entire site from orbit. It's the only
way to be sure.’

So, I’d have to be ok
with not knowing why. I could build any
number of models related to why I was cramping. But that’s the thing about models. My hero statistician is George E.P. Box
(What? Everyone has a hero statistician,
right?). He says “Since all models are wrong the scientist cannot
obtain a "correct" one by excessive elaboration. On the contrary
following William of Occam he should seek an economical description of natural
phenomena. Just as the ability to devise simple but evocative models is the
signature of the great scientist so overelaboration and overparameterization is
often the mark of mediocrity.”

George also says ‘All models
are wrong, some are useful’ or something like that…

I think George would have
been down with Ripley. And I had to be
ok with not knowing ‘why’.

So, I overhauled my entire nutrition plan. This time I hired yet another expert, friend
and Coach Kevin McCarthy, to review my nutrition from the Route 66 half and
make recommendations. Kevin was the
first to see me after the Route 66 half, and he probably had a better gauge of
my physical and mental state than I did. Laura was of course doing the same thing,
giving me great and practical advice on nutrition. She
was also making some changes to my training that I felt would help quite a
bit. I also did an exhaustive amount of
research on my own. Lastly, I talked to
almost every experienced age grouper I trusted, including many of my fellow
Trisharks.

Once I had a lot of recommendations from
many sources, I consolidated them very, very deliberately, and with great
rigor, into what would become my nutrition plan version 2.0. This is an approach I’m very comfortable with
as a data scientist. This is a proxy
for a statistical technique called ensemble learning. If you need to develop some rules or
generalized learning, and you can’t dig deep on the why because a problem is
too complex or you lack time, ensemble learning is where it’s at. Said simply, you use the ‘vote’ of an
ensemble of learners to obtain better predictive performance than you could
from a single constituent learner. (If
you’re a statistician reading this, also consider that the decisions of the
trees in my little live action roleplay version of a random forest were, from
talking to me and their own personal experience, subject to bootstrap aggregation
and perhaps boosting as well. :P )
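For the code-minded, here is a minimal sketch of the ‘vote’ idea in Python. To be clear, this is a toy and not the actual process I followed: the question names and the answers in `advice` are invented purely for illustration, and a simple majority vote is only the most basic way to combine an ensemble.

```python
from collections import Counter


def majority_vote(recommendations):
    """Return the most common recommendation and its share of the votes."""
    counts = Counter(recommendations)
    winner, votes = counts.most_common(1)[0]
    return winner, votes / len(recommendations)


# Hypothetical answers from five advisors, invented for illustration only.
# Each list holds one answer per advisor to the same question.
advice = {
    "sodium_per_hour": ["more", "more", "same", "more", "less"],
    "fluid_per_hour": ["same", "more", "more", "same", "more"],
}

# Combine the individual opinions into one plan, keeping the level of
# agreement so that weakly supported changes can be tested more carefully.
plan = {question: majority_vote(answers) for question, answers in advice.items()}

for question, (choice, agreement) in plan.items():
    print(f"{question}: {choice} ({agreement:.0%} agreement)")
```

Keeping the agreement share next to each choice is a deliberate design choice in this sketch: a change that barely wins the vote is a good candidate for extra testing on long training days before race day.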

And then I tested, and tested again, on long training days,
to make sure it would work, or at least do no harm.

This is not to say that the concept of ‘phone a friend’ is
especially clever in our sport. It’s
not. But there is a trap we age
groupers sometimes fall into. There is
danger in reading one paper, speaking to a respected friend or coach, or
even reading one pro’s nutrition plan…and
then doing what they do. My solution to
cramping was using formal methodology to avoid this trap, simple as that.

So, how’d it work?

At Cedar Point this
year, despite a continued string of misfortune (race report to follow), I
managed a 15 minute PR at the half distance.
More importantly, I did it without a single cramp, at approximately the
same effort level I had previously raced at.
I’ll never really know what went wrong at Route 66, and that does bug me
on some level, but truth be told I’d never be 100% certain, even if I had an
infinite number of identical races in which I could isolate and manipulate
variables. The real world is never the
lab.

Perhaps even more
importantly, I stopped trying to ‘do it all’ myself and gained a team, which, as
a busy part-time long course age group athlete, is really invaluable. If you only get to race a few times a year,
you don’t have much opportunity to experiment in race conditions or train
suboptimally.

Big thanks to Laura, Kevin, and all the local athletes I spoke to who got me this far. Also, thanks to all the professional and age group athletes writing blogs like this one; you can be certain I've data mined you all. :)