Racing in depth. Preparing a strategy for AWS DeepRacer League

August has been an incredible month in the AWS DeepRacer world. The Udacity & AWS DeepRacer Scholarship Challenge brought a lot of attention, and this has led to many new community members with varying levels of technical skill. As a community we tried our best to help newcomers with setting up their local training and with studying reinforcement learning. I would like to share my understanding of the competition itself.

If you are looking for an article on how to reach second place, just 0.006s behind Karl-NAB, this one is not it. My main intent here is to present the different aspects that I believe you need to look at when approaching AWS DeepRacer-related competitions, be it the Virtual League, physical races, the weekly challenges from AWS or the AWS DeepRacer Community Contest.

How to approach a competition

1. Identify your goal

When you write an exam in class, you might want to just about pass, pass with a solid score, or perhaps nail it in one category.
When you are an animal running away from a lion, you need to make sure you’re faster than the slowest animal in the group.
When you race, you need to do well on the racing track. What makes you do well?

That really depends on the challenge you’re taking on. I really like the challenges where the goal is to be as close to a given time result as possible. I haven’t tried them yet.

When you are taking part in the main races to try and make it into the finals of the AWS DeepRacer League you want to get a fast lap.

2. Understand how you can evaluate your goal

Decide what your evaluation criteria will be. You could, for example, check whether your car can complete a lap using the fewest actions. TonyJ from the community told me he managed to get a 15-second lap on Shanghai Sudu using just the sharp left and sharp right actions at one speed. He would win in the category for fewest actions to complete a lap. It’s pretty impressive, isn’t it?

But this isn’t the evaluation of your racing goal.

If the goal is to go fastest on the racing track, I will not base my verification on how well the car is doing on the training track. Up to some point the two might be correlated; past it, the car might do worse in the race as it does better in training. It might also be unable to complete a training lap and yet do well in the race. That’s why I keep repeating: submit early, submit often.

Except for the Community challenge - you only get one lap, then the challenge is over for you. What would you do to evaluate your goal in such a case?

3. Choose your strategy

To me the strategy is a combination of many elements:
what you expect of your model - the reward function
what you expect of your car - the action space
what you expect of your learning - the hyperparameters
what track you want to train on - and it may be many
what approach you want to apply to your training on a given track - the starting point, direction etc
what tooling you want to use - the AWS DeepRacer Console? Maybe local setup? How will you analyze the logs?
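To make the first three elements a bit more concrete, here is a minimal sketch of what they could look like side by side. All values are hypothetical examples and not my actual setup; the hyperparameter names are only illustrative. The reward function assumes the usual `params` dictionary with the `all_wheels_on_track`, `distance_from_center` and `track_width` entries.

```python
# Hypothetical example values throughout; a sketch, not a recommended setup.

# What you expect of your car: the discrete (steering angle, speed) pairs
# the model may choose from.
ACTION_SPACE = [
    {"index": 0, "steering_angle": -30.0, "speed": 2.0},
    {"index": 1, "steering_angle": 0.0, "speed": 4.0},
    {"index": 2, "steering_angle": 30.0, "speed": 2.0},
]

# What you expect of your learning: illustrative names and values only.
HYPERPARAMETERS = {
    "batch_size": 64,
    "discount_factor": 0.999,
    "learning_rate": 0.0003,
}


def reward_function(params):
    """What you expect of your model: stay on track, close to the centre line."""
    if not params["all_wheels_on_track"]:
        return 1e-3  # near-zero reward when the car leaves the track

    # 0.0 on the centre line, 1.0 at the edge of the track
    offset = params["distance_from_center"] / (0.5 * params["track_width"])
    return float(max(1e-3, 1.0 - offset))
```

Each what-if then becomes a small, deliberate change to one of these three pieces while the others stay fixed, so you can tell which change made the difference.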

My personal advice would be: instead of asking others how to get the best results, first turn that into a what-if question and check it, then consider starting a discussion with others to get some new hints. In July I thought: what if I let my car use just four actions? I did that, to a point where I could not even complete the Tokyo training track or any other track I used in training. How did it do? It turned out it converged faster and while it wasn’t a winner, it was enough to put me in fifth spot.

Bear in mind that this was one of the many, many what-ifs that I tried. I had to ditch most of them, and some seemed to give good outcomes but turned out to be an accidental correlation. But I strongly believe none of them were a waste of time (apart from those where I had a plan and made a mistake in implementing it, so the car learned to do the opposite; I still learned a lesson though). This is something that training locally let me do at a scale that I wouldn’t be able to reach in the console, with options that the console hides from you (and it’s great that it does, as it makes it so easy to start).

4. Monitor your progress

So you have your goal, your evaluation technique, your strategy. How will you monitor that? There is a temptation to rely on the reward graph. The reward is growing so it must be progressing, right? Maybe. What if the reward function has a flaw? If I reward the car for going off the track, the reward will also be increasing, but that will not help me complete the track.
I recommend submitting a new model for the race every now and then and comparing how its performance aligns with your training results. Submit it regularly for some period of time to gather the data and reevaluate your strategy.
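Here is a small sketch of that flaw, with made-up step data. The reward function has an inverted condition, so the episode that leaves the track collects far more reward than the one that stays on it, while its progress barely moves. The step dictionaries are a hypothetical, trimmed-down stand-in for the real input parameters.

```python
def flawed_reward(params):
    """Buggy on purpose: the condition is inverted, so leaving the track pays best."""
    if params["all_wheels_on_track"]:
        return 1e-3
    return 1.0


def episode_summary(steps, reward_fn):
    """Return (total reward, final progress in percent) for one episode."""
    total = sum(reward_fn(step) for step in steps)
    return total, steps[-1]["progress"]


# Hypothetical episodes: one stays on track, one drifts off almost immediately.
on_track = [{"all_wheels_on_track": True, "progress": p} for p in (10.0, 20.0, 30.0, 40.0)]
off_track = [{"all_wheels_on_track": False, "progress": p} for p in (2.0, 2.5, 3.0, 3.0)]

reward_on, progress_on = episode_summary(on_track, flawed_reward)
reward_off, progress_off = episode_summary(off_track, flawed_reward)

# The reward graph would favour the off-track episode,
# even though it covers far less of the track.
print(reward_off > reward_on, progress_off < progress_on)  # True True
```

This is why I look at progress and lap completion next to the reward, and not at the reward alone.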

5. Get to know your model

I mentioned tooling and I will keep reminding you: analyse the data that you’re given. I focused my efforts around Jupyter Notebook and Pandas. RayG from the community is using some awesome real time graphing with bash. I love his graphs. I hope he’ll share some of them at some point.

It doesn’t matter what tools you use as long as you try to learn when you’re doing well and when not so much, and try to adjust it to be better than you were yesterday.

Try to identify where you have problems. If you have a point on the track that your car struggles to get past, maybe you’re trying to train the car to do it wrong? Or maybe your action space is too tight to get it done? Or maybe too big to converge into a good action within reasonable time? I’m not saying data will give you all the answers, but the ability to process it and come to conclusions will give you some advantage.
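As a rough illustration of finding such a point, the sketch below counts where unfinished episodes ended. The record layout (`last_waypoint`, `progress`) is a hypothetical summary you would first have to extract from your own logs, and the numbers are invented. I use plain Python here; the same grouping is a one-liner in Pandas.

```python
from collections import Counter


def trouble_spots(episodes, top=3):
    """Count the waypoints where unfinished episodes ended, most frequent first."""
    ends = Counter(
        episode["last_waypoint"]
        for episode in episodes
        if episode["progress"] < 100.0  # only episodes that did not complete a lap
    )
    return ends.most_common(top)


# Hypothetical per-episode summaries.
episodes = [
    {"last_waypoint": 42, "progress": 35.0},
    {"last_waypoint": 42, "progress": 36.5},
    {"last_waypoint": 97, "progress": 80.0},
    {"last_waypoint": 0, "progress": 100.0},
    {"last_waypoint": 42, "progress": 34.0},
]

print(trouble_spots(episodes))  # [(42, 3), (97, 1)]
```

In this made-up data most failures cluster around one waypoint, which is the place to look at first: the reward, the available actions, or both.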

6. Follow the rules

Sure, there are more rules, but these two I find the most important - you cannot use a model that someone makes for you and you should complete a lap. How far can you push the rest?

In May many people were wondering if it was possible to use higher speeds in the race since some did. Some asked AWS and the response was: sure. The following month the max speed in the console was increased to 8 m/s. And it’s not even the maximum value allowed - in local training you can set it to more, at a cost of losing grip.

In July some people realised you could back-spin - the training code has a safety check that blocks progress from going down when your car starts going back (losing grip). But if it went from 0 to 99.9, it wasn’t going back according to the check. Speed -20, Turn -60 and you can “complete a lap” with a single action. Luckily it doesn’t satisfy the AWS criteria of a complete lap. I don’t remember how many people got those times removed from the final results, I think it was seven or eight of them.
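To show why such a check can be fooled, here is a toy model of it. This is not the actual training code, only my simplified reading of the behaviour described above; the stricter variant and its `max_jump` value are hypothetical.

```python
def progress_went_back(previous, current):
    """Toy safety check: flag a step only when progress (in percent) drops."""
    return current < previous


def progress_is_plausible(previous, current, max_jump=5.0):
    """Hypothetical stricter check: also reject implausibly large forward jumps."""
    return 0.0 <= current - previous <= max_jump


# A car spinning backwards across the start line: progress jumps from 0 to 99.9.
print(progress_went_back(0.0, 99.9))      # False: the toy check sees no problem
print(progress_is_plausible(0.0, 99.9))   # False: too large a jump for one step
print(progress_is_plausible(10.0, 10.8))  # True: an ordinary step forward
```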

7. Luck is an ingredient, statistics are power

Be careful when watching your car. You may easily get lured into thinking that your car is doing well because you saw it behave so. You may begin to think that your model is good, because you got a good time.
There is an ingredient of luck in racing. A given model can give results that vary and with enough samples the complete lap times will fall into a normal distribution. Can you prove it’s a good model and not a slightly stretched happy tail of it? That’s why I encourage you to keep looking at the logs, to gather statistics and try and draw conclusions. Your model is only as good as you can prove it to be.
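A minimal sketch of the kind of statistics I mean, using only the standard library and invented lap times. It shows how far the best lap sits from the model’s average, measured in standard deviations, which hints at whether the best time is typical or a lucky tail.

```python
from statistics import mean, stdev


def lap_summary(lap_times):
    """Summarise complete lap times (seconds); needs at least two distinct values."""
    average = mean(lap_times)
    spread = stdev(lap_times)
    best = min(lap_times)
    return {
        "laps": len(lap_times),
        "mean": average,
        "stdev": spread,
        "best": best,
        # How many standard deviations the best lap is below the average.
        "best_z": (best - average) / spread,
    }


# Hypothetical complete lap times of one model, in seconds.
lap_times = [10.2, 10.5, 10.4, 9.6, 10.3, 10.6, 10.4]
summary = lap_summary(lap_times)
print(summary["best"], round(summary["mean"], 2))
```

If the best lap is a long way below the average, I would treat it as luck until more laps say otherwise.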

But then at some point you’ll see that stability of performance brings more average times and fewer better ones. In such cases you may need to decide to play the risky game and say farewell to stability in favour of a chance to get a faster lap. Say hello to your luck again.

And there is another point. At some point it makes sense to call it a day - it might be that more training will not bring more improvement, but constant submissions might get you into the happy tail of the normal distribution.
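A back-of-the-envelope sketch of that effect, assuming lap times really are normally distributed and each submission is independent of the others (both are simplifications). The mean, standard deviation and target below are made-up numbers.

```python
from statistics import NormalDist


def chance_of_beating(target, mean_time, stdev_time, attempts):
    """Probability that at least one of `attempts` laps is faster than `target`."""
    # Chance that a single lap comes in under the target time.
    p_single = NormalDist(mean_time, stdev_time).cdf(target)
    # Complement of "every attempt misses the target".
    return 1.0 - (1.0 - p_single) ** attempts


# Hypothetical model: laps average 10.3 s with a 0.3 s standard deviation.
for attempts in (1, 10, 50):
    print(attempts, round(chance_of_beating(9.8, 10.3, 0.3, attempts), 3))
```

The chance of a single lucky lap is small, but it keeps growing with the number of submissions even when the model itself does not improve.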

8. Don’t forget to improve yourself, not just the car

That’s it, really. I’m pretty sure that isn’t the full picture of racing, especially since I am aware of one big issue of mine: I still don’t know much about machine learning.

The idea of DeepRacer sparked my curiosity at re:Invent 2018. I decided to take part in the race so that at least the fact that I got the car doesn’t go to waste. Then the AWS Summit in London happened. Then the community happened and I decided to contribute to building it. I was improving slightly every month and now I need to prepare for the finals in December at re:Invent 2019.

I have learned a lot so far, but mainly about DeepRacer. That’s why I’m currently following a Coursera series “Mathematics for Machine Learning” from Imperial College London, after which I will start learning about machine learning itself.

The folks at AWS have set everything up for us to start racing and have, as a side effect (or on purpose), hidden most of the internals behind a higher-level view with only a few things we can influence (a couple of hyperparameters, action space, reward function). I have this strong feeling that getting to know the internals might lead to much faster learning, but above all it should let me apply all the experience outside of DeepRacer. That’s the point, isn’t it?

When the fun stops, stop.

The race is very addictive and can be very involving. It can also be quite pricey. Remember to maintain a healthy balance. Get enough sleep, watch the billing dashboard and Cost Explorer (every day, seriously), set your personal goals and be happy to let go when you reach them if going further only means more struggle.

It’s rewarding to get such a result, but I remembered at every point that I kept pushing only as long as there was a potential to get more. If there wasn’t, all my important goals were already satisfied and getting to a spot with the reward was an extra.

Still, I know I wouldn’t be able to carry on for another month like August; there was too much of it. Luckily I can now focus more on learning and helping to build the community. I might even bake some bread, I haven’t for a month.