Third Eye: area of forehead scratched when thinking and slapped when confronted with stupidity

In late June, possibly the most explosive exercise physiology study in several years appeared to great fanfare in the media. The paper of Heuberger and colleagues, published in The Lancet Haematology, was accompanied by several prominent newspapers reporting that human recombinant erythropoietin (EPO) works no better than placebo in well-trained cyclists. The evidence that EPO works to improve cycling performance is flimsy, and it may actually be useless, or so the story went. This was a randomised controlled trial, published in an arm of one of the most prestigious medical journals in the world. But this is one of those stories that sounded too good to be true…

The design

The first thing to say about the study is that its overall design is extremely robust: it followed all the accepted procedures required to ensure a rigorous methodology (both practical and statistical), and the authors went to great lengths to ensure that blinding and randomisation were achieved. The study recruited 48 cyclists who were assigned either to a placebo group, who would receive injections of saline, or to an EPO group who would receive injections of EPO each week for 8 weeks. Before these treatments started, the cyclists performed an incremental exercise test to task failure, and a 45-minute self-paced submaximal test in which the aim was to maximise power output for that duration. This test was intended to simulate a competitive time trial under controlled laboratory conditions. At regular intervals during the treatments, the incremental test was repeated. At the end of the treatment period, the cyclists performed a final incremental test, the submaximal time trial, and a competitive road race set up for the study in which the cyclists actually climbed Mont Ventoux.

The headline results

The study showed, surprisingly, that the EPO group did not perform any better than the placebo group during the submaximal time trial, nor during the ascent of Mont Ventoux. Both groups climbed the mountain in just over 1 h 40 min. In contrast, the administration of EPO did improve the maximal oxygen uptake (VO2max) as well as the maximal power output in the incremental test. But, given the null results for the performance tests, which were deemed of greater relevance than the incremental results to competitive cycling, the conclusion drawn was that EPO was ineffective in improving cycling performance.

Are these conclusions valid?

We can approach the answer to this question in a number of ways, and I’m going to focus on two of these: the nature of the tests and the physiology of elite cycling. For the conclusions above to be valid, the tests used must be valid and reliable. The tests should also reflect the physiology of road race cycling in order to be able to generalise the results to the competitive situation. Here is where the study begins to break down.

The incremental exercise test, itself much maligned by the authors as not being representative of the competitive situation, produced results that are entirely in line with previous studies on EPO administration: the increase in haemoglobin concentration, and thus red cell mass, increased VO2max and thus maximal aerobic exercise performance. This has been demonstrated repeatedly in the literature and it is precisely why cyclists and other athletes have abused this drug since the late 1980s. But what of the charge that EPO has no effect on tests more relevant to cycling competition?

The submaximal time trial was performed before and after the treatment phase of the study, and both groups improved their performance on this test at the end of the treatment. There was, however, no difference in performance (as mean power output) between the groups. The Ventoux race backs this result up, and it was this test that grabbed the headlines, largely due to the history associated with what Lance Armstrong has called “that fucking mountain”. The authors themselves conclude that EPO “did not improve submaximal exercise test or road race performance” (p. 12). There is a serious problem with this conclusion: the Mont Ventoux race was only conducted once on a windy day, and so there is no basis on which to suggest that road race performance was improved or not. You can only speak of improvements in something if you measure that something more than once. If I measured my body mass only at the end of a diet, I doubt you’d agree that the diet “didn’t work” unless I showed you what the scales read before I started it. But this is exactly analogous to the way in which the Ventoux race was used.

Never mind though, the submaximal test still shows no effect, right? Well, not quite, actually. You see, there has been a fierce debate in exercise physiology about which tests should be used to quantify exercise performance and monitor the effect of interventions designed to enhance performance. One school of thought is that time trials are the best because the variance in performance is lower than time to task failure tests (the incremental test used here is, effectively, a time to task failure test, albeit with an increasing power output). Another viewpoint contends that time to task failure tests are useful because you can measure and interpret the physiological responses as well as having a performance measure, something you cannot do if you allow the cyclist to choose what power they produce. The truth, as usual, is somewhere in the middle, as Amann and colleagues demonstrated nicely some years ago. Consequently, when performance-related studies are done on a particular theme (e.g., dietary nitrate), studies are done using both types of test for completeness; both have their merits and drawbacks. Some studies even do a bit of both (by doing a pre-load constant power phase, followed by a short time trial).

The major drawback of time trial type tests is that they require very careful familiarisation. Performing a self-paced 45-minute effort on a stationary ergometer is a novel task. As with any novel task you get better at it when you attempt it a second time for no reason other than previous experience. You may even do better the third time. This learning effect needs to be eliminated before using the test for research purposes. In the study in question, no such familiarisation took place. We cannot know, therefore, how much of the change in submaximal performance was due to EPO, and how much was due to learning how to do the test. Simply stated, the submaximal test is not likely to be a reliable indicator of the effects (or not) of EPO in the way it was performed in this study.

Cycling physiology

The authors of the Lancet study can be forgiven for wanting to test cyclists under conditions that closely mimicked competitive cycling. Unfortunately, arguably neither test achieved this. Moreover, neither the submaximal time trial nor the Mont Ventoux race challenged the cyclists in a way that would reveal EPO’s effects. A 45-min time trial is likely to be performed in the upper reaches of the so-called ‘heavy’ domain (above the lactate threshold, but below the critical power/maximal steady state). At best, the cyclists would be exercising at or slightly above the critical power. The point here is that the vast majority of cyclists will be exercising at 85% VO2max or less. The influence of EPO on performance at such intensities is likely to be small.

The benefits of EPO are most obvious at intensities close to VO2max (see Wilkerson et al., 2005, for an example of this), yet neither of the performance tests used in the study reached such intensities. The authors contended that maximal performance tests, usually lasting less than 20 minutes, are less relevant because cyclists perform for longer than that “most of the time”. The problem with this statement is that what cyclists in a grand tour are doing “most of the time” is trying NOT to exert themselves! It’s why they stay in the peloton. It may be difficult to believe, but I have it on good authority that the average power sprinters produce on a flat stage is often less than 200 watts, such is the protection offered by the peloton and an organised team.

Mountain stages, which represent the stages in which Grand Tours are won and lost, clearly require greater effort than sprint stages, but even here teams work to protect their team leaders, offering a series of wheels to follow until the last 3-5 km of a major climb. As a result, the General Classification contenders may only exceed their critical power/maximal steady state for the last 15-20 minutes of a mountain stage. At this point their VO2 will be close to VO2max (which itself will be diminished due to altitude). It is only at this point that EPO, if used, will reveal its effects. In short, performance tests lasting 10-20 minutes are exactly the kind of tests you’d want to do if you were trying to find out if EPO worked or not. To dismiss them in favour of unfamiliar and/or poorly controlled race simulations is folly.

The take home message here is that even though the study of Heuberger and colleagues was a well-structured randomised controlled trial, it was always going to succeed or fail on the strength of the physiological tests within it. The submaximal trial and the Mont Ventoux race have so many problems associated with them that their outcomes are difficult, if not impossible, to interpret. Consequently, when somebody tells me “There’s a study that shows EPO does not work”, I’ll reply “Delete “not”, pal”.

Earlier this morning, we witnessed a unique and surprisingly exciting time trial over 26.2 miles in Nike’s Breaking2 project, now being rebranded as #VeryNearlyBreaking2 (allegedly). I thought that, since almost everyone else will, I’d give my thoughts on the event. Two confessions/declarations before I start: first, my old PhD supervisor and close collaborator Prof Andy Jones was a consultant on this project, as was Dr Phil Skiba, whom I have got to know very well since he started working with Andy’s Exeter group. Second, I only watched the last 25 km of the event because it was a Saturday and I wasn’t going to get up at 4:30 am to watch the first half of a marathon in which almost nothing happens.

I had no inside information about the event, mostly because I didn’t pry, other than the odd “how’s that thing going?” to which the reply would usually be “fine” or “interesting”. Most of what I could glean either was or now is in the public domain, largely thanks to the Twitter feeds of Phil and Andy and the work of Alex Hutchinson. But the event itself was, for me, a lesson in the physiology of endurance running and the acute effects of prolonged high-intensity effort. As the post’s title implies, there are plenty of other lessons to learn from this, but this one is mine.

Kipchoge completed the 26.2 miles in a blistering 2:00:25, by ~2.5 minutes the fastest a human has ever run the distance. At 25 km, he was bang on schedule, but in the last 10 km he lost time, and lost about 15-20 s in the last 5 km alone. Why could he not hold that pace in spite of all of the assistance from the arrowhead of pacers (or the car)? To understand that, I think we need to look at what it takes, physiologically, to run a marathon. We hear a lot about maximal oxygen uptake, lactate threshold (LT) and running economy, as these largely place limits on what is possible. The maximal oxygen uptake is the size of the engine, the LT is the rev limiter (sort of) and the running economy is the miles per gallon (or, in this case, per gram of glycogen) that the engine can offer. What is less well appreciated is that there is not one “rev limiter” in physiology, but two. Running faster than your LT has implications for endurance because you develop a “slow component” of oxygen uptake which increases the oxygen cost of exercise (it reduces economy, in other words). But the slow component can be stabilised provided you run at a speed slower than what is called your “critical speed”. This is the second rev limiter, and I’m going to argue that it is crucial to Kipchoge’s efforts today.

The critical speed, simply defined (!), is the speed asymptote of the speed-duration relationship (analogous to the critical power in the power-duration relationship). Andy Jones and I have written about these concepts in the scientific literature here and here if you want more detail on the definition. The key point for this post is that a 2 hour marathon effort requires a running speed that is necessarily below the critical speed, because above this point, the slow component of oxygen uptake cannot be stabilised, leading rapidly to the attainment of maximal oxygen uptake and inevitably to task failure. But the sustainable pace for a marathon run this fast must have been very close to the critical speed (see Jones & Vanhatalo (2017) for more on this). Thus, Kipchoge’s task was to run as close to his critical speed as possible, and stay there. For the most part, he succeeded.
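To make the asymptote idea concrete, here is a minimal sketch of the commonly used two-parameter hyperbolic speed-duration model, in which time to task failure above the critical speed (CS) is D′ / (speed − CS). The CS of 5.9 m/s and D′ of 300 m are purely illustrative assumptions, not Kipchoge’s measured values; only the 2:00:25 finishing time comes from the event.

```python
import math

MARATHON_M = 42195.0  # official marathon distance in metres


def average_speed_m_s(hours, minutes, seconds, distance_m=MARATHON_M):
    """Mean speed needed to cover distance_m in the given time."""
    return distance_m / (hours * 3600 + minutes * 60 + seconds)


def time_to_failure_s(speed_m_s, critical_speed_m_s, d_prime_m):
    """Two-parameter hyperbolic model: t = D' / (speed - CS).

    At or below CS the model sets no limit, so infinity is returned.
    In reality other factors (e.g. fuel depletion) still limit duration.
    """
    if speed_m_s <= critical_speed_m_s:
        return math.inf
    return d_prime_m / (speed_m_s - critical_speed_m_s)


kipchoge_speed = average_speed_m_s(2, 0, 25)  # about 5.84 m/s
cs, d_prime = 5.9, 300.0  # hypothetical values, for illustration only

for speed in (kipchoge_speed, 6.0, 6.2):
    limit = time_to_failure_s(speed, cs, d_prime)
    print(f"{speed:.2f} m/s -> {limit:.0f} s")
```

With these assumed numbers, running only 0.1 m/s above CS would be sustainable for less than an hour, which is why a 2 hour effort has to sit just below the asymptote rather than on or above it.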

Kipchoge looked in full control and on pace for much of the Breaking2 effort. But if Kipchoge ran just below his critical speed, why did he slow down in the last 5 km? Well, a normal runner running above LT but below the critical speed is operating in their “heavy domain”, where we strongly suspect glycogen depletion plays a key role in determining exercise capacity, one way or another. The presence of a slow component of oxygen uptake increases the demand on fuel reserves, which probably explains why most studies show marathon efforts being completed just above (but not too far above) LT. We mere mortals cannot get close to critical speed during a marathon lasting 2:30-4:00 hours, but there is good reason to suppose elite athletes can. One reason for that is an apparent “domain compression” in which the LT and critical speed both occur at a very high fraction of maximal oxygen uptake. Another is that elite runners have phenomenal fatigue resistance, in part due to the high percentage of type I (slow twitch) fibres in their muscles.

The above considerations provide a reason for Kipchoge’s basic speed, but not for his slowing down. Obviously, the distance itself places a severe strain on fuel stores, but he was still running exceptionally quickly towards the end, albeit grimacing. He didn’t look like he blew catastrophically. He lost some of the advantage of drafting as the “arrow” collapsed and the car gapped all of the runners, but again this would probably have a minor influence since this only happened in the last few laps. The effects of heat and dehydration were also probably minimal as it wasn’t hot and he was regularly drinking. That leaves few possibilities for his slowing down. Undoubtedly there would have been some muscle damage at this stage, and we know that progressive, slowly-developing fatigue occurs in the heavy domain. However, the slowing in running pace was not progressive, which leads me to conclude that his critical speed itself may have decreased.

The above idea seems at odds with what we have seen experimentally: fatiguing exercise and glycogen depletion do not seem to alter the critical power in cycling, but these experiments are nothing like prolonged exercise performance in elite athletes. I have heard anecdotal reports of a diminished critical power at the end of cycle stage races or long-duration time trials. If true, it would explain the fall-off in pace without catastrophic failure. What makes the 2 hour marathon such a challenge is that it is close to the limit of what we think humans can sustain, and the effort itself likely reduces that sustainable pace in the last 10-15 km.

Kipchoge is the first to treat the marathon like a 2 hour time trial and hold it together for most of the distance. If he can find a way of holding it together for the full duration, an athlete of Kipchoge’s talent really could break 2.

I want my country back. It’s a refrain we’ve heard a lot in the last 5-10 years. It’s generally a call to restore some sense of what being British really means, of what Britain should feel like, and the people Britain should be composed of. What has made this refrain so commonplace is the sense that Britain has lost its sense of identity to an influx of immigrants from the EU and elsewhere. If only we could get shot of that damned institution we’d be able to get our country back.

My father-in-law is an ex-pat who lives in France and drives around in a French car. Every once in a while he drives over to see us. On one such occasion he drove it to a local supermarket in Strood, Kent. At a set of traffic lights he stopped, and a man ventured towards his car, as if he was going to ask directions. As my father-in-law opened the window, he was greeted with “YOU FRENCH BASTARD!” at which point the lights went green. As a result he was unable to explain that he’d lived most of his adult life in Oxfordshire. It’s an amusing anecdote, but underneath it is something far nastier: Britain has become increasingly hostile to foreign people in the last few years, and sometimes this boils over into xenophobia and racism. A colleague of mine from Italy who has lived in the UK for a decade is very clear that this xenophobia is relatively new. It is only in the last few years that strangers have urged him to “fuck off back to your own country” whilst he is walking down the high street.

Immigrants are the modern day bogeymen, easy to label and criticise, but much more difficult to understand. The media and some politicians have effectively dehumanised anybody who isn’t British and use the term “immigrant” to imply that anybody who comes here does so to suck the life and soul out of Britain. UKIP can take much of the blame, but the Tory tub-thumping on immigration, and Labour’s complete inability to offer an alternative voice has made Britain a very unwelcoming place in the last few years. I think that the effect of this is much more corrosive than the odd idiot shouting obscenities in the street.

A fact often lost in the EU debate is that EU migrants contribute very significantly to the British economy and British life. They contribute far more in expenditure and taxation than they claim in benefits. Many of them are highly skilled and would be difficult to replace post-Brexit. Most sensible politicians know this, which is why they usually talk of immigration numbers, points systems and the like, without actually being positive about EU migrant contributions (that would make them look human, you see, and we can’t have that). But the corrosive effect of all of this is that these migrants, given the choice, will probably leave the UK post-Brexit, and our country will be much the poorer for it. This is not because there will be any particular policy that makes them leave, but because the clear attitude of a Britain that votes to leave is that we don’t want to work with YOU.

Although it could be argued that we are just trying to extract ourselves from the political project that is the EU, the truth is that the EU migrants I know have been mulling over whether to stay in the UK or not for a few years now. This has been ever since UKIP gained MPs in the House of Commons, one of whom was briefly the MP in my constituency. With Brexit now a distinct possibility, one colleague, a Dutch national, is virtually resigned to returning to the Netherlands if it occurs because Dutch citizens are not allowed to adopt dual nationality. To continue to work in the UK post-Brexit would require a work permit of some kind, and all of the hassle and uncertainty that goes along with it. Others have cited likely restrictions on work and travel for them and their family members as reasons to leave.

But I’m now going to avoid calling them EU migrants. Instead I’m going to call them what they really are: family and friends. My niece and nephew are both Italian nationals, and I have French, Belgian, Finnish, German, Italian, Greek, Dutch and Polish colleagues in my institution and elsewhere whose lives will be seriously compromised, if not destroyed entirely, in the event of Britain leaving the EU. These are brilliant, talented people who deserve far better than to be the collateral damage in what is, and always was, the Conservative Party’s internecine pet project.

So yes, I want my country back. A country that is welcoming, tolerant and outward looking. And to get that country back I’m voting Remain.

This post is based on a genuine conversation I had on a walk to school with my son (he is 4). I’m posting it here because I think it is as good an advert for having children as anything. Children are amazingly original thinkers even if their view of reality is not always (if ever) quite right. I hope my son keeps his current enthusiasm for finding things out. Here goes:

Dear Alex:

Our trip to Madame Tussauds clearly made quite an impression on you, judging by our conversation on the walk to school. I feel I need to clear a few things up:

The guy with the feather was William Shakespeare. The guy who looked like a scarecrow in the bit that smelled of poo was a plague doctor. The plague is not “The Black of Death” as you keep saying (you’re close though), and you can’t get it because you sneeze a lot. We have drugs and sanitation now, so doctors don’t need to dress like scarecrows.

The guy with the eye patch and the “pirate hat” was Lord Nelson. He has an eye patch because he hurt his eye in the Battle of the Nile. He is NOT a pirate. In answer to your other question, I don’t know what his friends called him but it was probably something like “Horatio”. He definitely wasn’t a pirate. I know that I wasn’t born when he was alive, but that does not increase the probability that he might have been a pirate just because you think he was. And you don’t win this argument by saying that you know he was a pirate because “Lord Nelson is my brother’s dad”. Even if you had a brother, this would be chronologically and biologically implausible. Whilst he has an eye patch and a hat that makes him look like a pirate HE IS NOT A PIRATE. Nelson has a column too, but that doesn’t make him a journalist.

If you want to continue this line of reasoning, I am quite happy to set up an anonymous Twitter account and sign you up to a few internet forums. You’d excel at trolling.

This post is inspired by a discussion between Antoine Vayer, Jeroen Swart and myself a few days ago on Twitter. Vayer is a former coach of the Festina cycling team, and a strong advocate of interpreting power output data in the context of doping. Swart is an exercise physiologist and sports physician who has been involved in the testing of Chris Froome at GSK (or embroiled in the testing, depending on how you look at it – the social media reaction has been quite something. I can’t put my finger on quite what kind of something it’s been, but it’s something nonetheless). I noticed that Vayer had put up some data from an incremental cycling test, with the following challenge to “experts”: “Game 1/10 for experts from Lisbeth ! Who got this VO2 [oxygen uptake] >91 ml/mn/kg ? Is he a cheater or not ? Is it possible ?”. Now, I consider myself something of an expert here. In fact, I’d say that the number of people in the world who understand the VO2 response to exercise better than me could comfortably fit in a double-decker bus, and some of them are dead. So I had a quick look at the data.

The maximal oxygen uptake (VO2max) value was recorded during an incremental test in which it appears that the athlete exercised for about 4 minutes each stage and rested between stages, with gas exchange data recorded every 30 seconds. The 91 mL/kg/min VO2max was recorded towards the end of the penultimate stage displayed in Vayer’s tweet. But there was something odd about it. The VO2, in absolute terms was 6.29 L/min, but this value was achieved at a power output of 425 W. Any exercise physiologist faced with a high VO2 will naturally enquire about the power at which it was achieved, or if faced with a high power output, will enquire about the VO2 achieved. The first thing you learn (or should learn) when analysing test data like this is to ask the question “does it look right?”. If the data deviate wildly from what is considered normal, it’s probably wrong for some reason. This works for athletic data as well as any other because there are robust relationships between VO2 and power output. Vayer’s data make no physiological sense.

The VO2-power output relationship follows this rule-of-thumb: for every watt of power produced, you consume about 10 mL of oxygen per minute in the steady state (give or take 1 mL/min/W). This makes for a very handy error detector in the lab. This works by taking any absolute VO2 value (in this case 6.29 L/min), subtracting ~0.80-1.00 L/min to account for the O2 cost of pedalling, and dividing the answer by power output. In this case, (6290 − 1000)/425 = 5290/425 = 12.4 mL/min/W (this figure is known as the “gain” for VO2). In other words, there appears to be a very large error in VO2 of about 1.0 L/min. The actual VO2 that should be associated with a power output of 425 W is ~5.25 L/min (assuming a baseline pedalling VO2 of 1.0 L/min, which at a cadence of 92 rpm seems reasonable). It may even be lower than that in an efficient athlete. Jeroen Swart estimated 4.95 ± 0.15 L/min VO2 for the same power.
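The error check above is simple enough to script. The following is a minimal sketch, with function names of my own choosing, using the 1.0 L/min pedalling baseline and the 10 mL/min/W rule of thumb as defaults; both are assumptions that could reasonably be varied.

```python
def vo2_gain(vo2_l_min, power_w, baseline_l_min=1.0):
    """VO2 gain in mL/min/W after removing the O2 cost of pedalling."""
    return (vo2_l_min - baseline_l_min) * 1000.0 / power_w


def expected_vo2(power_w, gain_ml_min_w=10.0, baseline_l_min=1.0):
    """Steady-state VO2 (L/min) predicted by the rule of thumb."""
    return baseline_l_min + gain_ml_min_w * power_w / 1000.0


measured_vo2 = 6.29  # L/min, from the test in question
power = 425          # W

print(round(vo2_gain(measured_vo2, power), 1))             # 12.4
print(round(expected_vo2(power), 2))                       # 5.25
print(round(measured_vo2 - expected_vo2(power), 2))        # 1.04
```

Any gain falling well outside roughly 9-11 mL/min/W is a prompt to go looking for a measurement problem before reaching for a physiological explanation.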

By way of contrast, Chris Froome’s recent data produce a gain of ~9.4 mL/min/W (5.91 L/min, less 1 L/min, so 4910/525 = 9.4). This is a normal VO2 gain, and may even be an underestimate given that this was measured during a ramp test, in contrast to the 4 minute stages used by Vayer (quite correctly if steady state VO2 was an important variable to measure). In Froome’s case, the 30 W/min ramp is non-steady state and VO2 will lag power output. Additionally, if Froome had a plateau in VO2 (or was approaching one), this would have reduced the gain still further. Thus, it is reasonable to suppose that his steady-state VO2 gain would be higher than 9.4 mL/min/W. In all likelihood, it would be very close to 10 mL/min/W. Why, then, is the VO2 recorded in the test in question so high? There are four possibilities: 1) an extravagant metabolic response by the cyclist; 2) an error in standardisation of ambient conditions; 3) an ergometer calibration error; or 4) an error in the flow or oxygen sensor and/or calibration.

Extravagant metabolism

It is possible that the relationship between VO2 and power output can be altered by exercising at high intensities for prolonged periods. This increases the VO2 gain, and values >12 mL/min/W have been observed in the literature. So the “slow component of oxygen uptake” could drive the VO2 above the predicted steady state value and increase the gain. I have done a bit of work on the slow component, and I think it may be a factor in this test. It is, however, unlikely to explain the VO2 being 1.0 L/min above expected. This is because, as its name suggests, the slow component takes time to express itself. It takes 90-120 seconds to emerge from the “normal” or “fast component” of the response, and if you fit a curve to it, it has a time constant of at least 200 seconds. It therefore takes many minutes to develop, and in this test many minutes we do not have (nor does the VO2 seem to be systematically rising in the 425 W stage). It is also unlikely that previous test stages generated much slow component behaviour. This is again partly due to their length, and partly due to the modest blood lactate concentrations measured (2.1 mM at 375 W). The slow component only develops above the lactate threshold, and in the 375 W stage the cyclist is only just above it. Any slow component would be small even if the 375 W stage was continued for 6-8 minutes. Finally, the denominator of the VO2 gain equation is against us: the largest slow components recorded in the literature are for exhaustive exercise bouts lasting 10-15 minutes, and they can exceed 1 L/min. To achieve this in the time required in this stage would need VO2 to rise at an unbelievably swift rate, and we just don’t see this happening. So, whilst the slow component of VO2 is real and significant, it does not appear to be an explanation for the high VO2 values reported.
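The timing argument can be put into rough numbers. The sketch below assumes the slow component follows a single delayed exponential, using the 90-120 second delay and the 200 second time constant quoted above (the shortest plausible value, so this is the most generous case). The function name and the model form are illustrative assumptions, not a fit to the actual test data.

```python
import math


def slow_component_fraction(t_s, delay_s, tau_s=200.0):
    """Fraction of the eventual slow-component amplitude expressed at time t_s."""
    if t_s <= delay_s:
        return 0.0
    return 1.0 - math.exp(-(t_s - delay_s) / tau_s)


stage_s = 240.0  # a 4-minute stage

for delay_s in (90.0, 120.0):
    fraction = slow_component_fraction(stage_s, delay_s)
    # Eventual amplitude (L/min) needed to show 1.0 L/min by the end of the stage
    needed = 1.0 / fraction
    print(f"delay {delay_s:.0f} s: {fraction:.2f} expressed, {needed:.1f} L/min needed")
```

Under these assumptions only about half of any slow component would be expressed by the end of a 4-minute stage, so explaining a 1.0 L/min excess would require an eventual amplitude of around 2 L/min, roughly double the largest values mentioned above.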

Ambient conditions and gas exchange calculations

Pulmonary gas exchange variables must be corrected from ambient temperatures and pressures to standardised conditions before interpretation. These days automated systems perform these calculations, but sometimes input is needed as part of routine calibration. For ventilation, values are expressed BTPS – Body Temperature and Pressure, Saturated – because the true volume exhaled at the time matters. For VO2 (and VCO2), the correction provides Standard Temperature and Pressure for Dry gas (STPD). The only way to introduce error here is to fail to update the ambient settings prior to analysis. However, a 10°C error in temperature (unheard of in a well-ventilated or an air-conditioned laboratory) would change VO2 by no more than ~0.3 L/min. A similar error would occur if you got barometric pressure wrong by 30 mmHg. Thus, it is unlikely that a 1.0 L/min error would be introduced here.
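For a feel of the size of these errors, here is a rough sketch using the standard STPD conversion factor for a saturated gas volume measured at ambient conditions. It assumes VO2 scales directly with that factor, which is a simplification of what a real metabolic cart does, and it takes 20°C and 760 mmHg as the true conditions purely for illustration. The water vapour pressures are standard tabulated values for saturated air.

```python
def stpd_factor(temp_c, pressure_mmhg, ph2o_mmhg):
    """Factor converting a saturated gas volume at ambient conditions to STPD."""
    return (273.0 / (273.0 + temp_c)) * ((pressure_mmhg - ph2o_mmhg) / 760.0)


# Saturated water vapour pressure (mmHg) at the two temperatures used below.
PH2O = {20: 17.5, 30: 31.8}

true_factor = stpd_factor(20, 760.0, PH2O[20])
temp_wrong = stpd_factor(30, 760.0, PH2O[30])      # temperature entered 10 C too high
pressure_wrong = stpd_factor(20, 730.0, PH2O[20])  # pressure entered 30 mmHg too low

vo2 = 6.29  # L/min, the value under discussion

temp_error_l_min = vo2 * abs(temp_wrong / true_factor - 1.0)
pressure_error_l_min = vo2 * abs(pressure_wrong / true_factor - 1.0)

print(round(temp_error_l_min, 2))
print(round(pressure_error_l_min, 2))
```

Both errors come out at around a quarter to a third of a litre per minute, well short of the 1.0 L/min discrepancy.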

Ergometer calibration

A cycle ergometer that is not calibrated can produce very strange VO2 responses. I vividly remember taking delivery of an ergometer that had not been stored correctly and its flywheel was off by a few millimetres. Being electrically-braked, a few millimetres is huge, and I was exhausted at a power output 150 W lower than normal with an (apparently) enormous VO2 gain. But my VO2max was in the normal range, so it didn’t seem to be the gas analyser, and it just felt wrong. In the case in hand, it is the VO2 that seems too high, not the power. And an elite cyclist would also very quickly realise if the ergometer was out. As an example, I once started a treadmill test on an athlete who got quite animated about how hard the 10 km/h warm-up felt. It turned out that somebody had changed the settings to miles per hour! The ergometer is not likely to be the problem in this case.

The gas analyser

The origin of the error can be limited to the gas analysis system. I am not sure whether the Oxycon system being used in this test (as Vayer told me it was) was being used in a mixing chamber mode or breath-by-breath, but it seems that the ventilatory variables look normal: minute ventilation is not astoundingly high for an athlete producing 90 mL/kg/min (if anything it is too low). Indeed, the VE/VO2 ratio at maximal exercise is usually >35 in my experience. Here it is ~32. Not that low, but low all the same. In short, the flow sensor or ventilatory volume measurement is not a strong contender for the extra litre/min of VO2.

This leaves us with the O2 sensor itself. The origin of the error is impossible to pin down without knowing the precise specifications of the analyser (there are a number of Oxycon models, some of which use fuel cell type analysers, others paramagnetic sensors), but it is possible that an electrochemical fuel cell analyser had reached the end of its life at the time of the test and started reading high. Alternatively, a calibration error resulting in incorrect zero and/or span calibration could have caused a systematic error in VO2. It is important to state that this error is not peculiar to the 425 W stage: the 375 W stage preceding it produces gain values of around 12 mL/min/W, so this error is evident throughout the test [in a previous edit, I said 14.5 mL/min/W – I’d forgotten to subtract baseline VO2 in the calculation. Mistakes are easy to make with gas exchange data!]. A calibration error on one of the calibration points would amplify the erroneous gain at lower power outputs, wherein the expired O2 fraction (FEO2) would be lower than during maximal exercise (that is, the error will get larger or smaller as FEO2 falls away from 20.95%). That we are not seeing this means that the whole calibration curve is systematically in error. The cause of this (analyser ‘drift’ or fuel cell end-of-life performance) is impossible to call without data from the calibration procedure itself.

In conclusion, I’m overthinking this.

Or, to put it another way, the issues above illustrate why physiological testing is unlikely to ever be a major pillar of anti-doping efforts: there are too many sources of error, as well as too much variation between labs in testing protocols, equipment and ergometers. Anybody who has attended a conference on sports physiology will appreciate that there are almost as many measures of “threshold” as there are people working in the field. Getting scientists and practitioners to agree on measurement standards seems a very long way off. Even if such standards could be agreed, there are no clear physiological “red lines” above which doping can be inferred. This is because athletes, like all humans, occupy a normal distribution of physiological function. More correctly, the parameters of endurance performance (be they physiological, biomechanical, psychological) are all normally distributed, and the sum of these distributions makes the athlete who they are. Doping can and does shift some of those curves, but from where to where? For specific individuals we simply don’t know most of the time, and until we are sure a change in physiological test results is not due to errors we inadvertently introduce, we never will know.

I was asked if I’d write a thing about doping for a website, so I did but the editor never got back to me. Here is what I wrote:

This year the World Athletics Championships is being held in Beijing. But there is no doubt that this event is clouded by a doping scandal that threatens to seriously damage, if not destroy, the sport’s credibility. The scandal was the result of two news reports. The first concerned the allegation that a high proportion of Russian athletes were involved in systematic doping. The second was the Sunday Times/ARD programme and publication in the first weekend of August which leaked extracts from an historic blood profile database held by the International Association of Athletics Federations (IAAF). When these profiles were analysed by experts in biological passport analysis, the results suggested that as many as 1 in 7 athletes from 2001 to 2012 had returned “suspicious” values, tainting many major championships, including the 2012 Olympic Games. Clean athletes have lost medals, failed to make finals, and lost funding because they were out-competed by athletes who were doping.

The reaction to the leak has itself been revealing: the leaders of the IAAF have come out fighting against the findings and denying any wrongdoing; the experts concerned have countered these arguments. Finally, social media platforms have been ablaze with comment and speculation about which athletics star will be implicated next. This scandal can, I think, be better understood if the scientific basis of the allegations is understood. Furthermore, the reaction of those at the top of sport and anti-doping, as well as the public reaction to the scandal, tells us a lot about how a future scandal such as this might be avoided.

What does “a suspicious profile” actually mean?

The athlete biological passport (ABP) is a method used to detect doping, not by finding a drug in blood or urine samples, but by finding evidence of doping in biomarkers such as haemoglobin concentration. The ABP works by an individual athlete regularly giving blood samples, from which a profile is generated with individual limits (“probability thresholds”, currently 99%) set to identify abnormally high or low values. A “suspicious” profile is one which exceeds these limits at some point or points, or which contains deviations or fluctuations that are identifiably abnormal (these are known as “Atypical Passport Findings”). An expert reviews the profile and uses other data (if available) to confirm that it is not likely to be due to pathology or specific circumstances that might affect blood sampling (e.g., illness). If those other data are not available, the result remains “suspicious” and follow-up data collection and additional testing can be performed. The “suspicious” profiles noted by the Sunday Times/ARD experts were those exceeding a 99% probability threshold. This is consistent with the current WADA guidelines.

The reason the “suspicious” profiles are not direct evidence of doping is that the ABP requires further review for this to occur. A suspicious result warrants further review by two independent experts if the first reviewer has ruled out normal physiology and pathology. Only if the three reviewers unanimously agree that the use of a prohibited substance or method is highly likely and illness or other factors are unlikely to account for the results can an “adverse passport finding” result be declared, which then initiates disciplinary procedures against the athlete.

It is worth pausing here briefly to make the point that ABP findings are considered “atypical” and “suspicious” when there is not enough evidence to be certain that things other than doping could explain the finding. Results only become “adverse” (that is, the athlete can be sanctioned) with further evidence and review. The data on which the scandal is based are atypical, NOT adverse findings.

The rigour with which the ABP is administered and managed makes it sound perfect, but it isn’t. As with standard doping control tests, the ABP is designed specifically to avoid false positives (that is, you want to go out of your way to avoid sanctioning a clean athlete). This means that the false negative rate is high. In other words, cheats can and do avoid sanction (more on this later).

The issue the experts in the Sunday Times/ARD story took with the IAAF was that suspicious findings did not seem to be followed up. The IAAF did have procedures in place to do so. The allegation that they did not is, therefore, very serious. I am in no position to say whether this is true or not, but on other points of the story I have some sympathy for the IAAF. Much of the focus in the original story was on the World Championships in Helsinki in 2005. But the sanctioning of athletes using the ABP did not come into force in athletics until 2009. Judging the actions of the IAAF by the standards of today seems a little unfair. That said, Dr Michael Ashenden, one of the experts consulted and an architect of the ABP itself, makes a compelling argument that the IAAF could and should have done more at the time in any case.

The good, the bad, and the ugly

The reaction to the scandal has been an interesting exercise in sports politics and the power of social media. WADA stated that it was shocked and within a week had announced an investigation into the allegations. The full scope of this investigation is not clear, but it is likely to focus, in part, on how the data were disclosed rather than how the IAAF responded to the blood profiles pre-2009. Lord Coe, now president of the IAAF, came out fighting, suggesting that war had been declared on athletics, and dismissing the analysis of the “so-called experts”. This was likely to be part of his successful attempt to get elected to the presidency, but it was also an exercise in how not to manage a crisis. The experts he maligned really were experts in this particular field. They have diagnosed that his sport is sick, and he needs to listen to their advice. More broadly, in the post-Armstrong era, having an international federation “circle its wagons” in this way was bound to lead to accusations of cover-up, whether true or not.

The IAAF’s bullish reaction to the scandal led to commentators on social media in particular savaging the IAAF. This is understandable, given that these commentators are already extremely hostile to any official narratives in the wake of the Lance Armstrong case and the role the UCI played in it. They were also primed by a Tour de France in which suspicion of the GC contenders grew exponentially in spite of attempts to prove innocence. It was to be expected that they would turn their guns on athletics, and several high-profile athletes have been the subject of intense speculation as to whether their ABP profiles are suspicious. Some of that commentary was thoughtful, detailed, and nuanced, but even the most cursory search on Twitter shows many high-profile athletes being very seriously libelled with doping allegations.

Given recent cases, such as McAlpine vs. Bercow, naming names without evidence on social media is extremely ill-advised. The acid test here is: has an athlete been mentioned by name? Has doping guilt been implied, even indirectly? Does the athlete in question have an adverse ABP finding? If the answer to the first two questions is “yes”, and the answer to the last question is “no”, then the athlete has been libelled, in the UK at least.

How should the IAAF deal with this in the future?

It is important to remember that the goal of anti-doping is to protect clean athletes. This is difficult because proving a negative in systems shrouded in secrecy is almost impossible. Some form of transparency that federations and athletes can buy into would be a start. Some athletes have chosen to make their passport data public, whereas others have not given their consent. This is already leading to assumptions that lack of disclosure means that the athletes in question have something to hide. This might be true, but it may also mean that the athletes possess passport profiles that contain atypical findings that are explicable, but they are concerned that key contextual details may be lost or simply dismissed. It is also possible that the explanation for an atypical finding contains information of such a personal nature that the athlete does not want any of the information in the public domain. All that considered, the IAAF and WADA urgently need to build transparency into their systems. One way would be to periodically, or on request, perform a full three-expert review to give an athlete a clean bill of health (or otherwise), and publish the resulting ABP Document Package with the athlete’s consent.

Finally, there is the future of anti-doping per se. If we have learnt anything from the Armstrong case, it is that analytical anti-doping measures will only go so far. What did for Armstrong was tenacious investigative journalism and equally tenacious law enforcement input, followed by a forensic investigation by USADA. In this regard, catching cheats isn’t just about blood. If anti-doping policy was a pheasant shoot, doping controls and the ABP would be akin to beating the ground and sending in the dogs: to be successful, you still need the people with guns in the end. WADA and the sports federations covered by the code need to use all necessary means to catch doping cheats, because the current systems are not protecting clean athletes in the way they should.

This year’s Tour de France is developing into a bit of a split race, being both exciting by stage and predictable by General Classification (GC). This was most clearly demonstrated by the blistering performance of yesterday’s stage winner Steve Cummings of MTN-Qhubeka (the African team’s first stage win, on Mandela Day, no less), followed by Chris Froome hoovering up all attacks against him. It was an eventful ride for Team Sky, with fists, saliva and urine apparently being thrown at them. They are currently the sport’s bad guys, for no reason other than dominance. The last team to dominate like Sky did was one of the liveries led by Lance Armstrong, and Sky’s tactics and public relations stance continue to draw uncomfortable parallels with the Armstrong era. This suspicion has led to calls for Sky (and others) to be more transparent about their power data in particular, since the view goes that teams with nothing to hide should hide nothing.

Something something Armstrong, something something Froome. Right, let’s SCIENCE… [Forget personalities, there’s a link in two paragraphs time in which the awesome David Wilkie uses very simple power modelling to make a bicycle fly.]

Power output and the physiological response to exercise

Mountain stages in the Tour are critical to success. One bad day in the mountains can cost you the race, and a good day can get you a Yellow Jersey. In contrast, sprint stages rarely produce gaps in the GC, and time trial stages are predictable and (to within a minute or so) run to form. That’s problematic if you’re on the wrong side of the minute, but not fatal. Gaps of over 5 minutes are sometimes seen in the mountains. In a mountain climb, where air resistance plays a more limited role, the rider who can sustain the highest power, and thus (bike and body mass accounted for) the highest speed for the duration of all the climbs, is likely to win the Tour. Time trial specialists cannot win the Tour with time trials alone. They must train to climb (contrast Boardman’s Tour performances with those of Wiggins – riders with very similar initial backgrounds but very different training approaches to the road).

Because sustaining a high power output on a climb is crucial, there has been a great deal written about the limits of what is possible. I am not going to add to this debate, as in my view without direct measurement of power as well as an understanding of the rider’s physiological capacities (aerobic and anaerobic) there are too many assumptions to be sure that a conclusion about whether something is possible or not can be drawn. There are performances that might look suspicious, but a 4 min mile performance in running would have looked suspicious in 1935. By 1995 it was considered slow. We do, however, know a few things about what determines sustained power, thanks to scientists like AV Hill, David Wilkie, and a number of others.

The physiological response to exercise depends on the power output you produce. For “moderate” exercise, muscle oxygen uptake rises rapidly and reaches a steady state. Blood lactate either does not rise or rises only transiently. At these work rates, exercise can be sustained for many hours. For “heavy exercise”, when you exceed the lactate threshold, oxygen uptake takes longer to stabilise and does so at a higher value than you would predict from steady state responses to moderate exercise (in other words, you are less efficient). This is the result of a “slow component” of the oxygen uptake response that develops after about 2 minutes of exercise and stabilises after 15-20 minutes. In the heavy domain, exercise can be sustained for between 45-60 min and about 3-4 hours. For “severe” or “high-intensity” exercise, the oxygen uptake slow component does not stabilise (and nor does any other metabolic response) until maximal oxygen uptake (VO2max) is attained. Exhaustion inevitably follows soon after this occurs. The “severe-intensity domain” commences when you exceed the critical power (CP). The CP, in turn, represents the asymptote of the power-duration relationship, first noted by AV Hill in 1925. We’ve written a few papers about these concepts, which you can find here (free) and here (not free). The power-duration relationship can be defined by as few as two parameters, namely the CP and a parameter to define the shape of the curve, denoted W’. The CP is thought to reflect the power of the aerobic systems of energy delivery and the W’ is thought to reflect the “anaerobic capacity”, although we know this is a little simplistic. It is the power-duration relationship that is important for working out what is and is not possible when cycling up a hill.

Defining possible and impossible

If you know the values of CP and W’, and you know the power demand of a task, you can make a clear prediction about what the time limit of the task is. The equation, for those who want it, is:

Time limit = W’/(power output – CP)

The problem is that the above parameter values will vary between athletes and will vary day to day. The parameters and the underlying physiology that determines them can also be influenced by various acute interventions (like glycogen depletion, for example), which adds further uncertainty to any “back of the envelope” calculations that you might wish to make. To know if any performance is abnormal, you need to know what the power-duration parameter values actually are. Consider that an elite cyclist with GC ambitions might have a CP of about 380-440 W, and a W’ of 20-30 kJ, both of which will depend, to some extent, on body mass. This means that to complete an effort lasting 40 minutes, with a W’ of 25 kJ and a CP of 420 W, the “normal” power output sustainable for this duration would be 430 W, or 6.1 W/kg for a 70 kg rider. Notice here that the contribution of the curvature constant to long duration efforts is quite small (about an extra 10 W over 40 min) and thus the most crucial determinant of mountain performance is the maximal sustainable power output, CP.
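
The 40-minute example can be checked with a short sketch of the two-parameter model. The function names are illustrative only, the inputs are the example values from the paragraph above, and the sketch assumes the hyperbolic relationship holds exactly, which no real rider will oblige.

```python
def time_limit_s(power_w, cp_w, w_prime_j):
    """Time limit (s) from: time limit = W' / (power output - CP)."""
    if power_w <= cp_w:
        raise ValueError("the model only predicts a time limit above CP")
    return w_prime_j / (power_w - cp_w)


def sustainable_power_w(duration_s, cp_w, w_prime_j):
    """The same equation rearranged: power output = CP + W' / time limit."""
    if duration_s <= 0:
        raise ValueError("duration must be positive")
    return cp_w + w_prime_j / duration_s


cp = 420.0          # W, example value from the text
w_prime = 25_000.0  # J (25 kJ), example value from the text
duration = 40 * 60  # 40 minutes in seconds
body_mass = 70.0    # kg

power = sustainable_power_w(duration, cp, w_prime)
print(round(power, 1))              # 430.4 W
print(round(power / body_mass, 1))  # 6.1 W/kg
print(round(power - cp, 1))         # 10.4 W contributed by W'
```

Changing `cp` by 10 W shifts the answer by the full 10 W, whereas changing `w_prime` by 5 kJ shifts it by only about 2 W over 40 minutes, which is the point about CP dominating long efforts.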

One reason why I don’t think fixating on a particular W/kg value as “possible” or “suspicious” really works is that it all depends on the value of CP. This value is unknown and variable! Obviously, for a 70 kg rider to sustain 6.1 W/kg without drawing on the W’, CP would need to be at least 430 W. I don’t think that is unreasonable, given previously documented hour record performances and the power outputs produced during them (Bassett et al., 1999). To sustain 430 W would require an oxygen uptake of approximately 5.3 L/min (O2 cost of ~10 mL/min/W, plus ~1 L/min for the O2 cost of spinning the legs at 90-100 rpm), which, if the rider is capable of utilising ~90% of VO2max, would predict a VO2max of 5.9 L/min or 84 mL/kg/min. This is high, but certainly not unheard of. If only 85% of VO2max could be sustained, the required VO2max would be 89 mL/kg/min. That is still not impossible. And this is all assuming a normal mechanical efficiency. Efficiency would decrease due to the development of a slow component of oxygen uptake, but this would add no more than about 200 mL/min to the tally (that is, oxygen uptake would remain submaximal even with this factored in).
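
To make the oxygen cost arithmetic explicit, here is a rough sketch. It assumes the figures quoted above (an O2 cost of ~10 mL/min/W, ~1 L/min for spinning the legs, a 70 kg rider); the function and parameter names are purely illustrative, and the slow component is deliberately ignored.

```python
def required_vo2max(power_w, body_mass_kg, fraction_sustained,
                    o2_cost_ml_min_per_w=10.0, unloaded_vo2_l_min=1.0):
    """Estimate the VO2max needed to sustain a given power output.

    Returns (VO2 at that power in L/min,
             VO2max in L/min,
             VO2max in mL/kg/min).
    """
    if not 0 < fraction_sustained <= 1:
        raise ValueError("fraction_sustained must be in (0, 1]")
    # O2 cost of the external work plus the cost of turning the legs
    vo2_l_min = unloaded_vo2_l_min + power_w * o2_cost_ml_min_per_w / 1000.0
    vo2max_l_min = vo2_l_min / fraction_sustained
    vo2max_ml_kg_min = vo2max_l_min * 1000.0 / body_mass_kg
    return vo2_l_min, vo2max_l_min, vo2max_ml_kg_min


for fraction in (0.90, 0.85):
    vo2, vo2max_abs, vo2max_rel = required_vo2max(430.0, 70.0, fraction)
    print(fraction, round(vo2, 1), round(vo2max_rel, 1))
# 0.9 5.3 84.1
# 0.85 5.3 89.1
```

The answer is sensitive to every assumed input: an O2 cost of 11 rather than 10 mL/min/W, for instance, pushes the required VO2max at 90% utilisation above 90 mL/kg/min.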

Knowing the possible

The above calculations are hypotheticals based on reasonable estimates. The numbers accompanying Froome’s (or Nibali’s or Contador’s or…) performances that appear on the internet are just as hypothetical. In short, we have no direct numbers for either physiological capacity or performance for GC riders at the time of the Tour. Values estimated from those recorded in other parts of the season are likely to underestimate the capacity of a rider who has peaked and ridden conservatively in much of the first week of racing. In addition, direct measures of rolling resistance, wind speed, temperature, altitude, and so on, are also absent. To know what’s possible would require direct power-duration measurements from Froome immediately before the Tour, as well as calibrated power data during each and every stage. It is likely that Sky possess both data sets. They most likely have a variety of physiological measures that could corroborate the power-duration data (i.e., the VO2max, efficiency and LT data would likely fit in the same general picture). But they refuse to place these data in the public domain. Should they? The scientist in me says yes. The sports fan in me says maybe. The pragmatist in me says that there is next to no chance of these contemporary data ever seeing the light of day.

For one thing, Froome’s consent would be needed to release these data, and even if that consent was given, where would the data be stored and how would access be gained? If Froome releases his, every rider in the peloton should be obliged to release theirs, lest there be any accusations of unfair treatment. The teams are highly unlikely to want to do this for competitive reasons. It’s much the same reason why Formula 1 teams do not release telemetry data in real time – a good rival engineer would identify engine modes, brake balance, tire wear etc. and use that information to their own team’s advantage. The sport would become even more about who has the best support crew rather than the best performer.

A less problematic point is that not all teams use the same power-measuring devices. Moreover, where on the bike the power is measured also matters. Although often measured at the crank or the rear wheel hub, it’s the power transferred to the road that counts (producing forward propulsion), but the power actually produced on the pedals that costs (in terms of physiological demand). Frictional losses and rolling resistance (though presumably minimised) will also differ, adding errors to any calculations of who produced watt, where and when…

The Future

There has been some chatter on Twitter and elsewhere of power files from races being used as part of the Athlete’s Biological Passport in cycling. I can see some merit in this, as within each athlete, performances can be compared to their power-duration relationship, their physiology and their blood parameters already used. The observation of abnormal power outputs alongside sudden changes in, for example, the Off-score, might trigger closer scrutiny of that athlete in the coming months.

Finally, I can see the potential for power data from grand tours being released following an agreed embargo period. This would serve an educational and scientific purpose of providing a rich seam of data to be used by anybody who wanted it. Those data could also be used as part of a retrospective anti-doping case. But they’d only ever be part of the story. If there was reasonable circumstantial evidence of doping in the absence of a positive test (like the Armstrong case, for the most part), then power files could add weight to that case. But that weight would only be small given the number of variables involved in ultimately producing power output.

I’ve almost certainly not done this issue justice, but the above thoughts lead me to conclude that the question of data transparency in cycling and what its potential uses are does not have any easy answers.

The similarity between Armstrong-era cycling and today ends with what is written above. Quite a few people have asked David Walsh, the man who was instrumental in taking down Armstrong, why he is not asking Sky and Froome tough questions. I personally think that is wrong-headed. Armstrong’s transformation post-cancer was mind-blowing, whereas Froome’s ascent has been more incremental. Add to that the accumulation of damning evidence throughout Armstrong’s career, covered up by Armstrong with the help of the UCI, and in that case Walsh could ask questions about tangible things in Armstrong’s closet. Froome’s closet is bare by comparison, save for a TUE and some stunning performances on the road. So there was good reason to pursue Armstrong, but much less to pin on Froome and Sky. This is why calls for data transparency are timely.

[EDIT: I’ve got quite a few comments about the LA/CF comparison I made above both here and elsewhere. I’ve a good mind to delete it because I think it detracts from the point I’m trying to make (that power output is interpretable in context but it’s always likely to be very difficult). I’m not going to delete it, however, because it would make the post more boring, and writing really bloody boring stuff is something I’m already pretty good at. My point to those arguing over this is that (like the rest of the post, actually) context is everything. We know most of the details surrounding Armstrong’s “inverted U-shaped” career progression, in which he went from an aggressive stage racer/breakaway specialist to cancer patient to GC domination. In this, he went from not really being notable as a climber to dropping Pantani. That’s astonishing. Froome burst onto the scene in late 2011, but had been with Sky for about 18 months before that, and showed some potential between bouts of illness and injury. The impact of these is not certain because, again, of the context. Sky, due to its links with British Cycling, was and still is awash with riders who are good against the clock, so he wasn’t in the team to be the time triallist. He was clearly there to be part of the Sky train, in a team hell-bent on GC success with Bradley Wiggins. In that context, Froome’s rise to prominence was not particularly fast, but was perhaps unexpected given the circumstances at the time. Importantly, he didn’t completely change his style of riding to break through. I am not naïve enough to think that there aren’t other explanations, but, again, that’s not what the post is about, and I’ve tried to avoid drawing any of those conclusions. Back to the day job…]