Category: Pro Cycling

Last week I attended an event announcing the forthcoming launch of a new fitness app called Pillar. It offers combined training and nutrition advice to help athletes achieve their goals. Pillar is backed by a strong scientific team including Professor James Morton, Team Sky Head of Performance Nutrition, and Professor Graeme Close, England Rugby Head of Performance Nutrition.

James Morton gave a fascinating presentation about the periodisation of carbohydrate (CHO) fuelling, including a detailed description of the nutrition strategy he created to support Chris Froome’s famous 80km attack on stage 19 of the 2018 Giro d’Italia. His recent paper explains the underlying science. These are some of the key points.

Always go into competition fully fuelled with carbohydrate

Well-fuelled athletes perform for longer at higher intensities than those with depleted reserves

Basic biochemistry: fat burning is too slow and supplies of phosphocreatine are too small to sustain intensities over 85% of VO2max

A lighter evening meal on day 1 prepares to “sleep low, train low” ahead of a lower intensity session on day 2

Carbohydrate intake rises after exercise on day 2 in anticipation of a high intensity session on day 3

Fuelling is moderated on the evening of day 3 as day 4 is assigned as a recovery day

Carbohydrate rises later on day 4 to prepare for the next block of training

The Pillar app aims to provide these leading edge scientific principles to amateur cyclists and other athletes

In order to put this into action, you need to know how much carbohydrate you are consuming. My assumption has been that my diet is reasonably healthy, but I have never actually measured it. So I have been experimenting with MyFitnessPal, a free app that can be downloaded onto your phone. This provides a simple and convenient way to track the nutritional composition of your diet, including a barcode scanner that recognises most foods. You can link it to other apps such as Training Peaks to take account of energy expended. However, neither of these tools plans nutrition ahead of training sessions. Pillar aims to fill this gap. It will be interesting to see whether this turns out to be successful.

Some commentators were skeptical of Team Sky’s explanation for Chris Froome’s 80km tour-winning attack on stage 19 of the Giro. His success was put down to the detailed planning of nutrition throughout the ride, with staff positioned at strategic refuelling points along the entire route. If you consider how skeletal the riders look after two and a half weeks of relentless competition, along with the limits on what can be physically absorbed between stages, the nutrition story makes a lot of sense. Did Yates, Pinot and Aru dramatically fall by the wayside simply because they ran out of energy?

The best performing cyclists have excellent balancing skills. This includes the ability to match energy intake with energy demand. The pros benefit from teams of support staff monitoring every aspect of their nutrition and performance. However, many serious club-level cyclists pick up fads and snippets of information from social media or the cycling press that lead them to try out all kinds of ideas, in an unscientific manner, in the hope of achieving an improvement in performance. Some of these activities have potentially harmful effects on the body.

Competitive riders can become obsessed with losing weight and sticking to extremely tough training schedules, leading to both short-term and long-term energy deficits that are detrimental to both health and performance. One of the physiological consequences can be a reduction in bone density, which is particularly significant for cyclists, who do not benefit from gravitational stress on bones, due to the non-weight-bearing nature of the sport. In a recent paper, colleagues at Durham University and I describe an approach for identifying male cyclists at risk of Relative Energy Deficiency in Sport (RED-S).

You need a certain amount of energy simply to maintain normal life processes, but an athlete can force the body into a deficit in two ways: by intentionally or unintentionally restricting energy intake below the level required to meet demand or by increasing training load without a corresponding increase in fuelling.

Our bodies have a range of ways to deal with an energy deficit. For the average, slightly overweight casual cyclist, burning some fat is not a bad thing. However, most competitive cyclists are already very lean, making the physiological consequences of an energy deficit more serious. Changes arise in the endocrine system that controls the body’s hormones. Certain processes can shut down, such as female menstruation, and males can experience a reduction in testosterone. Sex steroids are important for maintaining healthy bones. In our study of 50 male competitive cyclists, the average bone density in the lumbar spine, measured by DXA scan, was significantly below normal. Some relatively young cyclists had the bones of a 70 year old man!

The key variable associated with poor bone health was low energy availability, i.e. male cyclists exhibiting RED-S. These riders were identified using a questionnaire followed by an interview with a Sports Endocrinologist. The purpose of the interview was to go through the responses in more detail, as most people have a tendency to put a positive spin on their answers. There were two important warning signs.

Among riders with low energy availability, bone density was not so bad for those who had previously engaged in a weight-bearing sport, such as running. For cyclists with adequate energy availability, those with very low levels of vitamin D had weaker bones. Across the 50 cyclists, most had vitamin D levels below the level of 90 nmol/L recommended for athletes, including some who were taking vitamin D supplements, but clearly not enough. Studies have shown that the advantages of athletes taking vitamin D supplements include better bone health, improved immunity and stronger muscles, so why wouldn’t you?

In terms of performance, British Cycling race category was positively related with a rider’s power to weight ratio, evaluated by 60 minute FTP per kg (FTP60/kg). Out of all the measured variables, including questionnaire responses, blood tests, bone density and body composition, the strongest association with FTP60/kg was the number of weekly training hours. There was no significant relationship between percentage body fat and FTP60/kg. So if you want to improve performance, rather than starving yourself in the hope of losing body fat, you are better off getting on your bike and training with adequate fuelling.

Cyclists using power meters have the advantage of knowing exactly how many calories they have used on every ride. In addition to taking on fuel during the ride, especially when racing, the greatest benefits accrue from having a recovery drink and some food immediately after completing rides of more than one hour.
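As a rough illustration of how a power meter reading turns into a calorie figure, the sketch below converts average power and duration into mechanical work, and then into an estimate of the energy burned. The function names and the 24% gross efficiency are assumptions chosen for illustration, not values from the study. Efficiency varies from rider to rider, so the result is best read as a good estimate.

```python
KJ_PER_KCAL = 4.184  # one kilocalorie expressed in kilojoules


def work_kj(avg_power_w, duration_s):
    """Mechanical work delivered to the pedals, in kilojoules."""
    return avg_power_w * duration_s / 1000.0


def kcal_burned(avg_power_w, duration_s, gross_efficiency=0.24):
    """Estimated energy burned, in kcal.

    gross_efficiency is an assumed fraction of metabolic energy that
    ends up as work at the pedals (hypothetical default of 24%).
    """
    metabolic_kj = work_kj(avg_power_w, duration_s) / gross_efficiency
    return metabolic_kj / KJ_PER_KCAL


# One hour at an average of 200W.
print(work_kj(200, 3600), round(kcal_burned(200, 3600)))
```

With an efficiency in this region, the number of kilojoules shown by the power meter is close to the number of kilocalories burned, which is why the two are often treated as interchangeable.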

For those wishing to know more about RED-S, the British Association of Sports and Exercise Medicine has provided a web resource.

A related blog will explore the machine learning and statistical techniques used to analyse the data for this study.

In a recent blog, I described an experiment to train a deep neural network to distinguish between photographs of Vincenzo Nibali and Alejandro Valverde, using a very small data set of images. In the conclusion, I suggested that the network was probably basing its decisions more on the colours of the riders’ kit than on facial recognition. This article investigates what the network was actually “looking at”, in order to understand better how it was making decisions.

The issues of accountability and bias were among the topics discussed at the last NIPS conference. As machine learning algorithms are adopted across industry, it is important for companies to be able to explain how conclusions are reached. In many instances, it is not acceptable simply to rely on an impenetrable black box. AI researchers and developers need to be able to explain what is going on inside their models, in order to justify decisions taken. In doing so, some worrying instances of bias have been revealed in the selection of data used to train the algorithms.

I went back to my rider recognition model and used an approach called “Class Activation Maps” to identify which parts of the images accounted for the network’s choice of rider. Making use of the code provided in lesson 7 of the course offered by fast.ai, I took advantage of my existing small set of training, validation and test images of the two famous cyclists. Starting with a pre-trained version of ResNet34, the idea was to replace the last two layers with four new ones, the crucial one being a convolutional layer with two outputs, matching the number of cyclists in the classification task. The two outputs of this layer were 7×7 matrix representations of the relevant image.

The final predictions of the model came from a softmax of a flattened average pooling of these 7×7 representations. The softmax output gave the probabilities of Nibali and Valverde respectively. Since there was no learning beyond the final convolution, the activations of the two 7×7 matrices represented the “Nibali-ness” and “Valverde-ness” of the image. This could be displayed as a heat map on top of the image.
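The final step described above can be sketched in plain Python. This is a minimal illustration, not the fast.ai lesson code: the two 7×7 grids are made-up activations, and the function names are hypothetical. It shows the average pooling, the softmax, and a nearest-neighbour upsampling of the winning 7×7 map to the size of a 224×224 image, which is one simple way to produce an overlay.

```python
import math


def global_average_pool(feature_map):
    """Mean of every cell in a 2D grid (a list of rows)."""
    cells = [value for row in feature_map for value in row]
    return sum(cells) / len(cells)


def softmax(scores):
    """Convert raw scores into probabilities that sum to one."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_with_cam(class_maps):
    """Return class probabilities and the activation map of the winner."""
    pooled = [global_average_pool(m) for m in class_maps]
    probs = softmax(pooled)
    winner = max(range(len(probs)), key=probs.__getitem__)
    return probs, class_maps[winner]


def upsample_nearest(grid, factor):
    """Blow up a small grid by repeating each cell factor x factor times."""
    return [[value for value in row for _ in range(factor)]
            for row in grid for _ in range(factor)]


# Made-up activations: a single hot spot in the "Nibali" map.
nibali_map = [[0.0] * 7 for _ in range(7)]
nibali_map[2][3] = 4.9
valverde_map = [[0.0] * 7 for _ in range(7)]

probs, cam = predict_with_cam([nibali_map, valverde_map])
heat_map = upsample_nearest(cam, 32)  # 7 x 32 = 224 pixels per side
print(probs, len(heat_map), len(heat_map[0]))
```

Because nothing is learned after the final convolution, the cells that dominate the pooled average are exactly the cells that light up in the heat map.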

Examples are shown below for the validation set of 10 images of Nibali followed by 10 of Valverde. The yellow patch of the heat map highlights the part of the image that led to the prediction displayed above each image. Nine out of ten were correct for Nibali and six for Valverde.

Class Activation Maps applied to the validation set

The heat maps were very helpful in understanding the model’s decision making process. It seemed that for Nibali, his face and helmet were important, with some attention paid to the upper part of his blue Astana kit. In contrast, the network did a very good job at identifying the M on Valverde’s Movistar kit. It was interesting to note that the network succeeded in spotting that Nibali was wearing a Specialized helmet whereas Valverde had a Catlike design. Three errors arose in the photos of Valverde’s face, which was mistaken for Nibali’s. In fact, any picture of a face led to a prediction of Nibali, as demonstrated by the cropped image below that was used for training.

Why should that be? Looking back at the training set, it turned out that, by chance, there were far more mugshots of Nibali, while there were more photos of Valverde riding his bike, with his face obscured by sunglasses. This was an example of unintentional bias in the training data, providing a very useful lesson.

The final set of pictures shows the predictions made on the out-of-sample test set. All the predictions are correct, except the first one, where the model failed to spot the green M on Valverde’s chest and mistook the blurred background for Nibali. Otherwise the results confirmed that the network looked at Nibali’s face, the rider’s helmet or Valverde’s kit. It also remembered seeing an image of Nibali holding the Giro trophy in the training set.

Class Activation Maps applied to the test set

In conclusion, Class Activation Maps provide a useful way of visualising the activations of hidden layers in a deep neural network. This can go some way to accounting for the decisions that appear in the output. The approach can also help identify unintentional bias in the training set.

My last blog explored the effectiveness of deep learning in spotting the difference between Vincenzo Nibali and Alejandro Valverde. Since the faces of the riders were obscured in many of the photos, it is likely that the neural network was basing its evaluations largely on the colours of their team kit. A natural next challenge is to identify a rider’s team from a photograph. This task parallels the approach to the kaggle dog breed competition used in lesson 2 of the fast.ai course on deep learning.

Eighteen World Tour teams are competing this year. So the first step was to trawl the Internet for images, ideally of riders in this year’s kit. As before, I used an automated downloader, but this posed a number of problems. For example, searching for “Astana” brings up photographs of the capital of Kazakhstan. So I narrowed things down by searching for “Astana 2018 cycling team”. After eliminating very small images, I ended up with a total of about 9,700 images, but these still included a certain amount of junk that I did not have the time to weed out, such as photos of footballers or motorcycles in the “Sky Racing Team”.

The following small sample of training images is generally OK, though it includes images of Scott bikes rather than Mitchelton-Scott riders and a picture of Sunweb’s Wilco Kelderman labelled as FDJ. However, with around 500-700 images of each team, I pressed on, noting that, for some reason, there were only 166 of Movistar and these included the old style kit.

Small sample of training images

For training on this multiple classification problem, I adopted a slightly more sophisticated approach than before. Taking a pre-trained Resnet50 model, I performed some initial fine-tuning, on images rescaled to 224×224. I settled on an optimal learning rate of 1e-3 for the final layer, while allowing some training of lower layers at much lower rates. With a view to improving generalisation, I opted to augment the training set with random changes, such as small shifts in four directions, zooming in up to 10%, adjusting lighting and left-right flips. After initial training, accuracy was 52.6% on the validation set. This was encouraging, given that random guesses would have achieved a rate of 1 in 18 or 5.6%.

Taking a pro tip from fast.ai, training proceeded with the images at a higher resolution of 299×299. The idea is to prevent overfitting during the early stages, but to improve the model later on by providing more data for each image. This raised the accuracy to 58.3% on the validation set. This figure was obtained using a trick called “test time augmentation”, where each final prediction is based on the average prediction of five different “augmented” versions of the image in question.
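The idea behind test time augmentation can be sketched in a few lines. This is an illustration only, not the fast.ai implementation: the images are tiny nested lists, the five augmentations are fixed rather than random, the cyclic column shift stands in for a proper image shift, and `toy_predict` is a hypothetical stand-in for the trained network.

```python
def identity(image):
    return image


def flip_lr(image):
    """Mirror each row left to right."""
    return [row[::-1] for row in image]


def roll_columns(image, k):
    """Shift columns by k places, wrapping around (a simplification)."""
    return [row[-k:] + row[:-k] for row in image]


# Five versions of each image: the original plus four variants.
AUGMENTATIONS = [
    identity,
    flip_lr,
    lambda im: roll_columns(im, 1),
    lambda im: roll_columns(im, -1),
    lambda im: flip_lr(roll_columns(im, 1)),
]


def tta_predict(predict, image, augmentations):
    """Average the class probabilities over augmented copies of an image."""
    preds = [predict(aug(image)) for aug in augmentations]
    n_classes = len(preds[0])
    return [sum(p[c] for p in preds) / len(preds) for c in range(n_classes)]


def toy_predict(image):
    """Hypothetical two-class model driven by brightness of the left half."""
    half = len(image[0]) // 2
    left = [value for row in image for value in row[:half]]
    p = sum(left) / len(left)
    return [p, 1.0 - p]


image = [[1.0, 1.0, 0.0, 0.0],
         [1.0, 1.0, 0.0, 0.0]]
print(tta_predict(toy_predict, image, AUGMENTATIONS))
```

Averaging smooths out predictions that depend too heavily on the exact framing of a single image.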

Given the noisy nature of some of the images used for training, I was pleased with this result, but the acid test was to evaluate performance on unseen images. So I created a test set of two images of a lead rider from each squad and asked the model to identify the team. These are the results.

75% accuracy on the test set

The trained Resnet50 correctly identified the teams of 27 out of 36 images. Interestingly, there were no predictions of Movistar or Sky. This could be partly due to the underrepresentation of Movistar in the training set. Froome was mistaken for AG2R and Astana, in column 7, rows 2 and 3. In the first image, his 2018 Sky kit was quite similar to Bardet’s to the left and in the second image the sky did appear to be Astana blue! It is not entirely obvious why Nibali was mistaken for Sunweb and Astana, in the top and bottom rows. However, the huge majority of predictions were correct. An overall success rate of 75% based on an afternoon’s work was pretty amazing.

The results could certainly be improved by cleaning up the training data, but this raises an intriguing question about the efficacy of artificial intelligence. Taking a step back, I used Bing’s algorithms to find images of cycling teams in order to train an algorithm to identify cycling teams. In effect, I was training my network to reverse-engineer Bing’s search algorithm, rather than my actual objective of identifying cycling teams. If an Internet search for FDJ pulls up an image of Wilco Kelderman, my network would be inclined to suggest that he rides for the French team.

In conclusion, for this particular approach to reach or exceed human performance, expert human input is required to provide a reliable training set. This is why this experiment achieved 75%, whereas the top submissions on the dog breeds leaderboard show near perfect performance.

Alejandro Valverde has kicked off the 2018 season with an impressive series of wins. Meanwhile Vincenzo Nibali delighted the tifosi with his victory in Milan San Remo. It is pretty easy to tell these two riders apart in the pictures above, but could a computer distinguish between them?

Following up on my earlier blogs about neural networks, I have been taking a look at the updated version of fast.ai’s course on deep learning. With the field advancing at a rapid pace, this provides a good way of staying up to date with the state of the art. For example, there are now a couple of cheaper alternatives to AWS for accessing high powered GPUs, offered by Paperspace and Crestle. The latest fast.ai libraries include many new tools that work extremely well in practice.

There’s a view that deep learning requires hours of training on high-powered supercomputers, using thousands (or millions) of labelled examples, in order to learn to perform computer vision tasks. However, newer architectures, such as ResNet, are able to run on much smaller data sets. In order to test this, I used an image downloader to grab photos of Nibali and Valverde and manually selected about 55 decent pictures of each one.

I divided the images into a training set with about 40 images of each rider, a validation set with 10 of each and a test set containing the rest. Nibali appears in a range of different coloured jerseys, though the Astana blue is often present. Valverde is mainly wearing the old dark blue Movistar kit with a green M. There were more close-up shots of Nibali’s face than of Valverde’s.

I was able to fine-tune a pre-trained ResNet neural network to this task, using some of the techniques from the fast.ai tool box, each designed to improve generalisation. The first trick was to augment the training set by performing minor transformations of the images at random, such as taking a mirror image, shifting left or right and zooming in a bit. The second set of tricks varied the rate of learning as the algorithm iterated repeatedly through the training set. A final useful technique created a set of variants of each test image and took the average of the predictions. Everything ran at lightning speed on a Paperspace GPU. After a run time of just a few minutes, the ResNet was able to score 17 out of 20 on the following validation set.

The confusion matrix shows that the model correctly identified all the Nibali images, but it was wrong on three pictures of Valverde. The first incorrect image (below) shows Valverde in the red leader’s jersey of the Tour of Murcia, which is not dissimilar to Nibali’s new Bahrain Merida kit, though he was wearing red in two of his training images. In the second instance, the network was fooled by the change in colour of Movistar’s kit, which had become rather similar to Astana’s light blue. The figure of 0.41 above the close-up image indicates that the model assigned only a 41% probability that the image was Valverde. It probably fell below the critical 50% level, in spite of the blue/green colours, because there were far more close-up shots of Nibali than Valverde in the training set.

Incorrect images of Valverde

An overall score of 17 out of 20 on the validation set is impressive. However, the network had access to the validation set during training, so this result is “in sample”. A proper “out of sample” evaluation of the model’s ability made use of the following ten images, comprising the test set that was kept aside.

Amazingly, the model correctly identified 9 out of the 10 pictures it had not seen before. The only error was the Valverde selfie shown in the final image. In order to work better in practice, the training set would need to include more examples of the riders’ 2018 kit. A variant of the problem would be to identify the team rather than the rider. The same network can be trained for multiple classes rather than just two.

This experiment shows that it is pretty straightforward to run state of the art image recognition tools remotely on a GPU somewhere in the cloud and come up with pretty impressive results, even with a small data set.

Chris Froome has been logging data on Strava since the beginning of the year. He had already completed over 4,500km, around Johannesburg, in the first four weeks of January. The weather has been hot and he has been based at an altitude of around 1350m. Some have speculated that he has been replicating the conditions of a grand tour, so that measurements can be made that may assist in his defence against the adverse analytical finding made at last year’s Vuelta.

Whatever the reasons, Froome chose to “Empty the tank” with an epic ride on 28 January, completing 271km in just over six hours at an average of 44.8kph. The activity was flagged on Strava, presumably because he completed it suspiciously fast. For example, he rode the 20km Back Straight segment at 50.9kph, finishing in 24:24, nearly four minutes faster than the holder of the KOM: a certain Chris Froome. Since there was no significant wind blowing, one can only assume he was being motor-paced.

One interesting thing about rides displayed publicly on Strava is that anyone can download a GPX file of the route, which shows the latitude, longitude and altitude of the rider, typically at one second intervals. Although Froome is one of the professional riders who prefer to keep their power data private, this blog explores the possibility of estimating power from the GPX file. The plan is similar to the way Strava estimates power.

Knowledge is power

An interesting case study is Froome’s TT Bike Squeeeeze from 6 January, which included a sustained 2 hour TT effort. Deriving speed and gradient from the GPX file is straightforward, though it is helpful to include smoothing (say, a five second average) to iron out noise in the recording. It is simple to check the average speed and charts against those displayed on Strava.
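A minimal sketch of this step is shown below, assuming a GPX 1.1 file with one track point per second. The function names are my own, the sample track is made up, and the trailing moving average is just one reasonable choice of smoother.

```python
import math
import xml.etree.ElementTree as ET

GPX_NS = {"gpx": "http://www.topografix.com/GPX/1/1"}
EARTH_RADIUS_M = 6371000.0


def read_trackpoints(gpx_text):
    """Return (lat, lon, elevation) tuples from GPX 1.1 text."""
    root = ET.fromstring(gpx_text)
    points = []
    for pt in root.iterfind(".//gpx:trkpt", GPX_NS):
        ele = pt.find("gpx:ele", GPX_NS)
        elevation = float(ele.text) if ele is not None else 0.0
        points.append((float(pt.get("lat")), float(pt.get("lon")), elevation))
    return points


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in metres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def speed_and_gradient(points, dt=1.0):
    """Speed (m/s) and gradient (rise over run) between successive points."""
    speeds, gradients = [], []
    for (lat1, lon1, ele1), (lat2, lon2, ele2) in zip(points, points[1:]):
        dist = haversine_m(lat1, lon1, lat2, lon2)
        speeds.append(dist / dt)
        gradients.append((ele2 - ele1) / dist if dist > 0 else 0.0)
    return speeds, gradients


def moving_average(values, window=5):
    """Trailing average over up to `window` samples."""
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1):i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


# Made-up track: heading north at a steady pace on a slight rise.
sample_gpx = (
    '<gpx xmlns="http://www.topografix.com/GPX/1/1"><trk><trkseg>'
    + "".join(
        '<trkpt lat="%.4f" lon="28.0000"><ele>%.1f</ele></trkpt>'
        % (-26.0 + i * 0.0001, 1500.0 + i * 0.1)
        for i in range(10)
    )
    + "</trkseg></trk></gpx>"
)
speeds, gradients = speed_and_gradient(read_trackpoints(sample_gpx))
smooth_kph = [3.6 * s for s in moving_average(speeds)]
print(smooth_kph[-1], moving_average(gradients)[-1])
```

On real data, the smoothed speed can be averaged over the whole ride and compared with the figure Strava displays.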

Several factors affect air density. Firstly, we can obtain the local weather conditions from sources such as Weather Underground. Froome set off at 6:36am, when it was still relatively cool, but his Garmin shows that it warmed up from 18 degrees to 40 degrees during the ride. Taking the average of 29 for the whole ride simplifies matters. Air pressure remained constant at around 1018hPa, but this is always quoted for sea level, so the figure needs to be adjusted for altitude. Froome’s GPS recorded an altitude range from 1242m to 1581m. However we can see that his starting altitude was recorded as 1305m, when the actual altitude of this location was 1380m. We conclude that his average altitude for the ride, recorded at 1436m, needs to be corrected by 75m to 1511m and opt to use this as an elevation adjustment for the whole ride. This is important because the air is sufficiently less dense at this altitude to have a noticeable impact on aerodynamic drag.
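One way to make the altitude adjustment is sketched below, using the standard-atmosphere barometric formula and the ideal gas law for dry air. These formulas and the function names are my assumptions for illustration; humidity is ignored. The inputs are the figures quoted above: 1018hPa at sea level, a corrected altitude of 1511m and an average of 29 degrees.

```python
# Standard-atmosphere constants (assumed; humidity is ignored).
SEA_LEVEL_TEMP_K = 288.15    # reference temperature at sea level
LAPSE_RATE_K_PER_M = 0.0065  # temperature drop per metre of altitude
BAROMETRIC_EXPONENT = 5.25588
R_DRY_AIR = 287.05           # specific gas constant, J/(kg K)


def station_pressure_pa(sea_level_hpa, altitude_m):
    """Convert a sea-level pressure quote to the pressure at altitude."""
    ratio = 1.0 - LAPSE_RATE_K_PER_M * altitude_m / SEA_LEVEL_TEMP_K
    return sea_level_hpa * 100.0 * ratio ** BAROMETRIC_EXPONENT


def air_density(sea_level_hpa, altitude_m, temp_c):
    """Dry air density in kg/m^3 from the ideal gas law."""
    pressure = station_pressure_pa(sea_level_hpa, altitude_m)
    return pressure / (R_DRY_AIR * (temp_c + 273.15))


rho_ride = air_density(1018.0, 1511.0, 29.0)
rho_sea_level = air_density(1018.0, 0.0, 29.0)
print(round(rho_ride, 3), round(rho_sea_level, 3))
```

Under these assumptions the air at ride altitude is roughly one sixth less dense than at sea level, and aerodynamic drag falls in the same proportion.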

An estimate of power requires some additional assumptions. Froome uses his road bike, TT bike and mountain bike for training, sometimes all in the same ride, and we suspect some rides are motor-paced. However, he indicates that the 6 January ride was on the TT bike. So a CdA of 0.22 for drag and a Crr of 0.005 for rolling resistance seem reasonable. Froome weighs about 70kg and fair assumptions were taken for the spec of his bike. Finally, the wind was very light, so it was ignored in the calculations.
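The assumptions above feed into a standard model of the forces on a cyclist. The sketch below is my own simplified version, not Strava’s algorithm: it ignores wind and acceleration, treats the gradient as rise over run, and assumes a combined rider and bike mass of 78kg, an air density of 0.98 kg/m^3 and a drivetrain efficiency of 97.5%, none of which come from the original analysis.

```python
G = 9.81  # gravitational acceleration, m/s^2


def required_power(speed_kph, gradient=0.0, rho=0.98, cda=0.22, crr=0.005,
                   total_mass_kg=78.0, drivetrain_efficiency=0.975):
    """Estimated power (W) to hold a steady speed with no wind.

    rho, total_mass_kg and drivetrain_efficiency are assumed values.
    """
    v = speed_kph / 3.6                                # speed in m/s
    aero = 0.5 * rho * cda * v ** 3                    # aerodynamic drag
    rolling = crr * total_mass_kg * G * v              # rolling resistance
    climbing = total_mass_kg * G * gradient * v        # work against gravity
    return (aero + rolling + climbing) / drivetrain_efficiency


# Compare three drag coefficients at a steady 43.7kph on the flat.
for cda in (0.20, 0.22, 0.25):
    print(cda, round(required_power(43.7, cda=cda)))
```

Applied to each smoothed sample of speed and gradient, and then averaged, a function of this kind gives an estimate of power for a whole ride. Because the aerodynamic term grows with the cube of speed, it dominates at time trial pace.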

Under these assumptions, Froome’s estimated average power was 205W. The red shaded area marks a 2 hour effort completed at 43.7kph, with a higher average power of 271W. His maximal average power sustained over one hour was 321W or 4.58W/kg. There is nothing adverse about these figures; they seem to be eminently within the expected capabilities of the multiple grand tour winner.

Of course, quite a few assumptions went into these calculations, so it is worth identifying the most important ones. The variation of temperature had a small effect: the whole ride at 18 degrees would have required an average of 209W or, at 40 degrees, 201W. Taking account of altitude was important: the same ride at sea level would have required 230W, but the variations in altitude during the ride were not significant. At the speeds Froome was riding, aerodynamics were important: a CdA of 0.25 would have needed 221W, whereas a super-aero rider with a CdA of 0.20 would have needed only 195W. This sensitivity analysis suggests that the approach is robust.

Running the same analysis over the “Empty the tank” ride gives an average power requirement of 373W for six hours, which is obviously suspect. However, if he was benefiting from a 50% reduction in drag by following a motor vehicle, his estimated average power for the ride would have been 244W – still pretty high, but believable.

Posting rides on Strava provides an independently verifiable adjunct to a biological passport.

Many commentators have been licking their lips at the prospect of head-to-head combat between Chris Froome and Tom Dumoulin at next year’s Tour de France. It is hard to make a comparison based on their results in 2017, because they managed to avoid racing each other over the entire season of UCI World Tour races, meeting only in the World Championship Individual Time Trial, where the Dutchman was victorious. But it is intriguing to ask how Dumoulin might have done in the Tour de France and the Vuelta or, indeed, how Froome might have fared in the Giro.

Inspiration for addressing these hypothetical questions comes from an unexpected source. In 2009 Netflix awarded a $1million prize to a team that improved the company’s technique for making film recommendations to its users, based on the star ratings assigned by viewers. The successful algorithm exploited the fact that viewers may enjoy the films that are highly rated by other users who have generally agreed on the ratings of the films they have seen in common. Initial approaches sought to classify films into genres or those starring particular actors, in the hope of grouping together viewers into similar categories. However, it turned out to be very difficult to identify which features of a film are important. An alternative is simply to let the computer crunch the data and identify the key features for itself. A method called Collaborative Filtering became one of the most popular employed for recommender systems.

Our cycling problem shares certain characteristics with the Netflix challenge: instead of users, films and ratings, we have riders, races and results. Riders enter a selection of races over the season, preferring those where they hope to do well. Similar riders, for example sprinters, tend to finish high in the results of races where other sprinters also do well. Collaborative filtering should be able to exploit the fact that climbers, sprinters or TTers tend to finish close to each other, across a range of races.

This year’s UCI World Tour concluded with the Tour of Guangxi, completing the data set of results for 2017. After excluding team time trials, 883 riders entered 174 races, resulting in 26,966 finishers. Most races have up to 200 participants, so if you imagine a huge table with all the racers down the rows and all the races across the columns, the resulting matrix is “sparse” in the sense that there are lots of missing values for the riders who were not in a particular race. Collaborative Filtering aims to fill in the spaces, i.e. to estimate the position of a rider who did not enter a specific race. This is exactly what we would like to do for the Grand Tours.
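To put a number on the sparsity, the figures quoted above can be plugged into a short calculation. The variable names are my own.

```python
n_riders = 883
n_races = 174
n_results = 26966  # finishers actually observed

n_cells = n_riders * n_races        # every possible rider/race pairing
fill_rate = n_results / n_cells     # share of the table that is populated

print(n_cells, round(100 * fill_rate, 1), round(100 * (1 - fill_rate), 1))
```

Fewer than one in five cells of the table holds an actual result, so the model has to estimate the rest.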

It took a couple of minutes to fit a matrix factorisation Collaborative Filtering model, using keras, on my MacBook Pro. Some experimenting suggested that I needed about 50 hidden factors plus a bias to come up with a reasonable fit for this data set. Taking at random the Milan San Remo one-day race, it did a fairly good job of predicting the top ten riders for this long, hilly race with a flat finish.

Model fit (prediction)   Rider                  Actual result
1                        Peter_Sagan            2
2                        Alexander_Kristoff     4
3                        Michael_Matthews       12
4                        Edvald_Boasson_Hagen   19
5                        Sonny_Colbrelli        13
6                        Michal_Kwiatkowski     1
7                        John_Degenkolb         7
8                        nacer_Bouhanni         8
9                        Julian_Alaphilippe     3
10                       Diego_Ulissi           40
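The mechanics of matrix factorisation with bias terms can be sketched without keras. The version below swaps in plain-Python stochastic gradient descent, uses three hidden factors rather than 50, and runs on a made-up table of four riders and four races, where a score of 0.9 means a high finish and 0.2 a low one. The scoring scheme, names and hyperparameters are all assumptions for illustration, not the model fitted for this blog.

```python
import random


def fit_matrix_factorisation(results, n_riders, n_races, n_factors=3,
                             lr=0.05, reg=0.02, epochs=500, seed=0):
    """Fit score ~ mean + rider bias + race bias + rider_vec . race_vec."""
    rng = random.Random(seed)
    rider_vecs = [[rng.gauss(0, 0.1) for _ in range(n_factors)] for _ in range(n_riders)]
    race_vecs = [[rng.gauss(0, 0.1) for _ in range(n_factors)] for _ in range(n_races)]
    rider_bias = [0.0] * n_riders
    race_bias = [0.0] * n_races
    mean = sum(score for _, _, score in results) / len(results)

    def predict(rider, race):
        dot = sum(p * q for p, q in zip(rider_vecs[rider], race_vecs[race]))
        return mean + rider_bias[rider] + race_bias[race] + dot

    samples = list(results)
    for _ in range(epochs):
        rng.shuffle(samples)
        for rider, race, score in samples:
            err = score - predict(rider, race)
            # Gradient steps with a small L2 penalty on every parameter.
            rider_bias[rider] += lr * (err - reg * rider_bias[rider])
            race_bias[race] += lr * (err - reg * race_bias[race])
            p_vec, q_vec = rider_vecs[rider], race_vecs[race]
            for f in range(n_factors):
                p, q = p_vec[f], q_vec[f]
                p_vec[f] += lr * (err * q - reg * p)
                q_vec[f] += lr * (err * p - reg * q)
    return predict


# Toy data: riders 0-1 are sprinters, 2-3 climbers; races 0-1 flat, 2-3 hilly.
SPRINTERS, FLAT_RACES = (0, 1), (0, 1)
results = []
for rider in range(4):
    for race in range(4):
        if (rider, race) == (0, 1):
            continue  # hold out: sprinter 0 did not enter flat race 1
        suited = (rider in SPRINTERS) == (race in FLAT_RACES)
        results.append((rider, race, 0.9 if suited else 0.2))

predict = fit_matrix_factorisation(results, n_riders=4, n_races=4)
print(round(predict(0, 1), 2), round(predict(2, 1), 2))
```

The held-out cell is filled in from the pattern shared with the other sprinter, which is the same mechanism used to estimate how a rider might have fared in a race he did not enter.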

The following figure visualises the primary factors the model derived for classifying the best riders. Sprinters are in the lower part of the chart, with climbers towards the top and allrounders in the middle. Those with a lot of wins are towards the left.

Now we come to the interesting part: how would Tom Dumoulin and Chris Froome have compared in the other’s Grand Tours? Note that this model takes account of the results of all the riders in all the races, so it should be capable of detecting the benefit of being part of a strong team.

Tour de France

The model suggested that Tom Dumoulin would have beaten Chris Froome in stages 1(TT), 2, 5, 6, 10 and 21, but the yellow jersey winner would have been stronger in the mountains and won overall.

Giro d’Italia

The model suggested that Chris Froome would have been ahead in the majority of stages, leaving stages 4, 5, 6, 9, 10(TT), 14 and 21(TT) to Dumoulin. The Brit would have most likely claimed the pink jersey.

Vuelta a España

The model suggested that Tom Dumoulin would have beaten Chris Froome in stages 2, 4, 12, 18, 19 and 21. In spite of a surge by the Dutchman towards the end of the race, the red jersey would have remained with Froome.

Conclusions

Based on a Collaborative Filtering approach, the results of 2017 suggest that Chris Froome would have beaten Tom Dumoulin in any of the Grand Tours.