Behavioral cloning is a method by which human subcognitive skills
can be captured and reproduced in a computer program. As the human
subject performs the skill, his or her actions are recorded along
with the situation that gave rise to the action. A log of these
records is used as input to a learning program. The learning program
outputs a set of rules that reproduce the skilled behavior. This
method can be used to construct automatic control systems for
complex tasks for which classical control theory is inadequate. It
can also be used for training.

In this exercise, I outline the method I took to apply behavioral
cloning, discuss some of the tools I used and developed, and reflect
on some of the lessons I learned along the way. This is not a
step-by-step "How-To" script, so much of the Python code below is
redacted, though in some cases the results are shown.
Nevertheless, perhaps this article can help students who
stumble into this fascinating topic in the future. So, let's get
started!

2 Method

The method is quite simple. We somehow acquire training
data—perhaps by recording it ourselves—in a computerized driving
simulator, which Udacity provides. Images from that simulator are
shown below. The training data comprise images of the road as seen
from cameras mounted on the simulated car, along with corresponding
control inputs (in this case, just the steering angle). The
training data are used to train a Deep Learning neural network model
so that it recognizes road/car configurations and generates the
appropriate steering angle. The model is then used to generate
inputs (steering angles) in real time for the simulation, unpiloted
by a human driver. Here are some of the tools we use along the way.

Keras and TensorFlow (really, TensorFlow) are tailor-made for modern
high-performance parallel numerical computation on GPUs, which are
easily obtained through cloud-computing environments like Amazon
AWS. Yet, everything in this project was conducted just on the
ancient laptop shown in Figure 2. While better hardware would
almost certainly be essential for real Deep Learning applications
and autonomous vehicles, for toy problems like this one it may not
be necessary.

Figure 2: Project Lab

3 Data Collection and Preparation

Behavioral cloning relies on training neural networks with data
exhibiting the very behavior you wish to clone. We use the driving
simulator provided by Udacity, which in its "training mode" can emit
a stream of data samples as the user operates the car. Each sample
consists of a triplet of images and a single floating-point number
in the interval [-1, 1], recording the view and the steering angle
of the simulated car at regular intervals. The three images
are meant to be from three "cameras" mounted on the simulated car's
left, center, and right, giving three different aspects of the scene
and in principle providing stereoscopic depth information.

The driving simulator also has an "autonomous mode" in which the
car interacts with a network server to exchange telemetry that
guides the car. The simulator sends the network server camera
images and the network server is expected to reply with steering
angles. So, not only is the driving simulator critical for
understanding the problem and helpful for obtaining training
data, it is absolutely essential for evaluating the solution.

The data are recorded into a CSV "index file" and corresponding
image files. Each line in the index file correlates images with the
steering angle, throttle, brake, and speed of the car. The images
are identified by filenames referring to three camera image files.

Deep Learning lore says that it is often prudent to randomize the
data when possible and always prudent to split the data into
training and validation sets, which we can do in just a few lines of
shell code.
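The shell one-liners themselves are omitted; as a sketch, the same
shuffle-and-split can be done in a few lines of Python (the
`fraction` and `seed` values here are illustrative):

```python
import random

def split_log(rows, fraction=0.8, seed=42):
    """Shuffle driving-log rows and split them into training and
    validation subsets (80/20 by default)."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)   # deterministic shuffle
    cut = int(len(rows) * fraction)
    return rows[:cut], rows[cut:]
```

Writing the two subsets back out as driving_log_train.csv and
driving_log_validation.csv is then a matter of the csv module or
plain file I/O.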

Now, Paul Heraty argues that it can be useful in the early stages of
developing a solution to "overtrain" it on a small sample comprising
disparate canonical examples. I can confirm that this was
extremely good advice. One of the chief difficulties I
encountered as a newcomer to Deep Learning and its community of
tools was simply "getting it to work in the first place,"
independent of whether the model actually was very good. One of the
chief strategies for overcoming this difficulty I found was to "try
to get a pulse:" develop the basic machinery of the model and
solution first, with little or no regard for its fidelity. Working
through the inevitable blizzard of error messages one first
encounters is no small task. Once it is cleared and the
practitioner has confidence that the tools are working well, then it
becomes possible to rapidly iterate and converge to a good
solution. Creating an "overtraining sample" is worthwhile because
overtraining is a vivid expectation that can quickly be realized
(especially with only 3 samples), and if overtraining does not occur
you know you have deeper problems. With a little magic from Bash,
Awk, etc., we can select three disparate samples, with neutral
steering, extreme left steering, and extreme right steering.
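In Python rather than Bash/Awk, one hypothetical way to pick such a
triple, assuming the steering angle sits in the fourth CSV column of
the simulator's log:

```python
def pick_overtrain_samples(rows):
    """Pick three disparate samples from parsed driving-log rows:
    extreme left steering (most negative angle), nearly neutral
    steering, and extreme right steering (most positive angle)."""
    angle = lambda row: float(row[3])   # assumed steering column
    left = min(rows, key=angle)
    neutral = min(rows, key=lambda row: abs(angle(row)))
    right = max(rows, key=angle)
    return [left, neutral, right]
```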

4 Exploratory Analysis

It often pays to explore your data with relatively few
constraints before diving in to build and train the actual
model. You may gain insights that guide you to better models
and strategies, and help you avoid pitfalls and dead-ends.

To that end, first we just want to see what kind of input data
we are dealing with. We know that they are RGB images, so we
load a few of them for display. Here, we show the three frames
taken from the driving_log_overtrain.csv file described
above—center camera only—labeled by their corresponding
steering angles. As you can see, the image with a large
negative angle seems to have the car on the extreme right edge
of the road. Perhaps the driver in this situation was executing
a "recovery" maneuver, turning sharply to the left to veer away
from the road's right edge and back to the centerline.
Likewise, with the next figure that has a large positive angle,
we see that the car appears to be on the extreme left edge of
the road. Perhaps the opposite recovery maneuver was in play.
Finally, in the third and last image, which has a neutral
steering angle (0.0), the car appears to be sailing right down
the middle of the road, a situation that, absent extraneous
factors (other cars, people, rodents), should not require
corrective steering.

Figure 3: Large Negative Steering Angle

Figure 4: Large Positive Steering Angle

Figure 5: Neutral Steering Angle

We can see that the images naturally divide roughly into "road"
below the horizon and "sky" above the horizon, with background
scenery (trees, mountains, etc.) superimposed onto the sky.
While the sky (really, the scenery) might contain useful
navigational information, it is plausible that it contains
little or no useful information for the simpler task of
maintaining an autonomous vehicle near the centerline of a
track, a subject we shall return to later. Likewise, it is
almost certain that the small amount of car "hood" superimposed
onto the bottom of the images contains no useful information.
Therefore, let us see what the images would look like with the
hood cropped out on the bottom by 20 pixels, and the sky cropped
out on the top by 60 pixels, 80 pixels, and 100 pixels.

Figure 6: Hood Crop: 20, Sky Crop: 60

Figure 7: Hood Crop: 20, Sky Crop: 80

Figure 8: Hood Crop: 20, Sky Crop: 100
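As a sketch, the cropping itself is just array slicing on a
(height, width, channels) image, assuming the simulator's 160x320
RGB frames:

```python
import numpy as np

def crop(image, sky=60, hood=20):
    """Remove `sky` rows from the top and `hood` rows from the
    bottom of an image array shaped (height, width, channels)."""
    return image[sky:image.shape[0] - hood, :, :]
```

With a hood crop of 20 and sky crops of 60, 80, and 100, a 160x320
frame becomes 80x320, 60x320, and 40x320 respectively.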

Next we perform a very simple analysis of the target labels, which
again are steering angles in the interval [-1, 1]. In fact, as
real-valued outputs it may be a stretch to call them "labels" and
this is not really a classification problem. Nevertheless in the
interest of time we will adopt the term.

The data have non-zero mean and skewness, perhaps arising
from a bias toward left-hand turns when driving on a closed
track.

The data are dominated by small steering angles because the car
spends most of its time on the track in straightaways. The
asymmetry in the data is more apparent if I mask out small
angles and repeat the analysis. Steering angles occupy the
interval [-1, 1], but the "straight" samples appear to be within
the neighborhood [-0.01, 0.01].
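A minimal sketch of that analysis, masking out near-neutral angles
before computing the mean and skewness (threshold 0.01, as above):

```python
import numpy as np

def masked_stats(angles, threshold=0.01):
    """Mean and Fisher skewness of steering angles after masking
    out near-neutral samples with |angle| <= threshold."""
    a = np.asarray(angles, dtype=float)
    a = a[np.abs(a) > threshold]
    mean = a.mean()
    skew = np.mean(((a - mean) / a.std()) ** 3)
    return mean, skew
```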

We might consider masking out small angled samples from the
actual training data as well, a subject we shall return to in
the summary.

A simple trick we can play to remove this asymmetry—if we
wish—is to join the data with its reflection, effectively
doubling our sample size in the process. For illustration
purposes only, we shall again mask out small angle samples.

In one of the least-surprising outcomes of the year, after
performing the reflection and joining operations, the data now
are symmetrical, with mean and skewness identically 0.
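On the label side, the join-with-reflection trick is one line; by
construction the result is symmetric, with zero mean and skewness:

```python
import numpy as np

def reflect_and_join(angles):
    """Concatenate steering angles with their negation, doubling
    the sample size and forcing a symmetric distribution."""
    a = np.asarray(angles, dtype=float)
    return np.concatenate([a, -a])
```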

Of course, in this analysis I have only reflected the target
labels. If I apply this strategy to the training data,
naturally I need to reflect along their horizontal axes the
corresponding input images as well. In fact, that is the
purpose of the Xflip, yflip, rmap, rflip, and sflip
utility functions described elsewhere.
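Those utilities themselves are described elsewhere; a hypothetical
sketch of what an sflip-style function could look like on an
(image, angle) pair:

```python
import numpy as np

def sflip(sample):
    """Hypothetical sflip: mirror the image left-to-right and
    negate the steering angle, keeping the pair consistent."""
    image, angle = sample
    return image[:, ::-1, :], -angle
```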

It turns out there is another approach to dealing with the bias
and asymmetry in the training data. In lieu of reflecting the
data, which by definition imposes a 0 mean and 0 skewness, we
can instead just randomly flip samples 50% of the time. While
that will not yield a perfectly balanced and symmetric data
distribution, given enough samples it should give us a crude
approximation. Moreover, it saves us from having to store more
images in memory, at the cost of some extra computation.
Essentially, we are making the classic space-time trade-off
between memory consumption and CPU usage.
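A sketch of that alternative: flip each sample with probability 0.5
as it is drawn, recomputing rather than storing the flipped copy:

```python
import random
import numpy as np

def maybe_flip(image, angle, rng=random):
    """Return the sample unchanged or horizontally flipped (with
    negated angle), each with probability 0.5.  Nothing extra is
    stored: the flip is recomputed on every draw."""
    if rng.random() < 0.5:
        return image[:, ::-1, :], -angle
    return image, angle
```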

Here, we see that as we increase the number of samples we draw from
the underlying data set, while randomly flipping them, the mean
tends to diminish. The skewness does not behave quite so well,
though a coarser smoothing kernel (larger bin sizes for the
histograms) may help. In any case, the following figure does
suggest that randomly flipping the data with large enough sample
sizes does help balance out the data.

Figure 12: abs(angle)>0.01 - Random Flipping

The sflip utility function flips not only the target labels—the
steering angles—but also the images (as it must). We check that
by again displaying the 3 samples from driving_log_overtrain.csv
as above, but this time with each of them flipped.

Figure 13: Large Negative Steering Angle Flipped

Figure 14: Large Positive Steering Angle Flipped

Figure 15: Neutral Steering Angle Flipped

If we compare these 3 figures depicting the flipped samples from the
driving_log_overtrain.csv set, with the original unflipped samples
in the figures above we can confirm the expected results. The
images are indeed horizontally-reflected mirror images, and the
corresponding steering angles indeed have their signs flipped
(though, trivially for the neutral-steering case). Armed with some
intuition about the data we can now turn to developing and training
the model.

5 Modeling

There are many approaches to selecting or developing a model. The
one I took was to assume that there is wisdom and experience already
embedded in the Deep Learning models that already have been
developed in the autonomous-vehicle community. I decided to choose
one of those models as a starting point, and then build off of it or
adapt it as needed. There are many to choose from, but two
well-known and often-used ones are a model from comma.ai and a model
from NVIDIA. Comparing these models we see some general shared
characteristics.

start with a non-trainable normalization layer that scales the
input so that each pixel color channel is in [-1.0, 1.0].

no sigmoid, softmax, or one-hot encoding

consider adding pooling and dropout layers

optimize MSE using the Adam optimizer so that at least I do not
have to worry about learning-rate parameters

An example model is laid out in Keras code below. It's not redacted
because while this is a real Keras model, it is not the one that I
used and almost certainly is far too simple for the task at hand.
It's provided for illustration purposes only.
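A model exhibiting those shared characteristics could be sketched as
follows; the layer sizes here are arbitrary illustrations:

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (Input, Lambda, Cropping2D, Conv2D,
                                     MaxPooling2D, Dropout, Flatten, Dense)

model = Sequential([
    Input(shape=(160, 320, 3)),
    # Non-trainable normalization: each channel scaled into [-1, 1].
    Lambda(lambda x: x / 127.5 - 1.0),
    # Crop the sky off the top and the hood off the bottom.
    Cropping2D(cropping=((60, 20), (0, 0))),
    Conv2D(16, (5, 5), activation='relu'),
    MaxPooling2D((2, 2)),
    Dropout(0.25),
    Flatten(),
    Dense(32, activation='relu'),
    Dense(1),   # one linear output: no sigmoid, softmax, or one-hot
])

# MSE loss with the Adam optimizer: no learning-rate tuning needed.
model.compile(loss='mse', optimizer='adam')
```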

As another sanity check, we conduct a small training (3 epochs, 30
samples per epoch, batch size 10) of the data in
driving_log_overtrain.csv. This is just to "get our feet wet" and
to verify quickly that the code written above even works. Note that
we use the same file for the validation set. This is just a test,
so it does not really matter what we use for the validation set.
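A self-contained sketch of such a pulse check, with a deliberately
tiny stand-in model and three synthetic samples in place of
driving_log_overtrain.csv (the 30-samples-per-epoch generator setup
is simplified to a plain fit here):

```python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, Flatten, Dense

# Tiny stand-in model and synthetic data: three frames and three
# steering angles, mimicking the overtrain set.
model = Sequential([Input(shape=(160, 320, 3)), Flatten(), Dense(1)])
model.compile(loss='mse', optimizer='adam')

images = np.random.rand(3, 160, 320, 3).astype('float32')
angles = np.array([-0.9, 0.0, 0.9], dtype='float32')

# Same data for training and "validation": this only checks that
# the machinery runs end to end, not that the model is any good.
history = model.fit(images, angles, batch_size=10, epochs=3,
                    validation_data=(images, angles), verbose=0)
```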

Next, we perform the actual training on the
driving_log_train.csv file, validating against the
driving_log_validation.csv file. After this training we
actually save the model to model.json and the model weights to
model.h5, files suitable for input into the network service in
drive.py.
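Saving the architecture and weights as separate files might be
sketched like this, with a stand-in model; note that the project's
weight file is named model.h5, while newer Keras versions insist on
a .weights.h5 suffix:

```python
import os
import tempfile
from tensorflow.keras.models import Sequential, model_from_json
from tensorflow.keras.layers import Dense, Input

# Stand-in for the trained network.
model = Sequential([Input(shape=(4,)), Dense(1)])

outdir = tempfile.mkdtemp()
json_path = os.path.join(outdir, "model.json")
weights_path = os.path.join(outdir, "model.weights.h5")

with open(json_path, "w") as f:
    f.write(model.to_json())          # architecture as JSON
model.save_weights(weights_path)      # weights as HDF5
```

The network service can then rebuild the model with model_from_json
and load the weights with load_weights.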

7 Lessons

This was a whirlwind tour of applying behavioral cloning to a
toy example for self-driving cars, and to protect the innocent a lot
of the technical details had to be withheld. Still, we can
summarize some key lessons learned in the exercise.

Think outside the box for tooling. Not everything need be done in
just one language, such as Python.

Consider generating a tiny overtraining sample for testing your
pipeline and model against an expected "non-result."

As in any scientific inquiry, think carefully and pose a question
that can reasonably be answered with the resources you have.

If presented with a question—or a homework assignment—read the
problem statement and its objectives carefully, so it can guide
your investigation.

Embrace the notion of Getting a Pulse, using test data like your
overtrain sample to help work through the mechanics of your data
pipeline, your model, and your training code, irrespective of the
accuracy of the predictions it generates.

Think carefully about what you already believe about the problem
space and try to use it to your advantage. At the risk of
overgeneralizing, simple operations like resizing and cropping
images may save your model from having to learn things you already
know.

Read the documentation! You may find some delightfully useful
tools buried in your toolboxes.

For instance, you might consider performing pre-processing steps
like normalization, resizing, and cropping right in the model.
It simplifies your code and you get those operations for free
wherever the model is used, at the possible expense of more
numerical processing.
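For example, normalization and cropping can live as ordinary layers
at the front of a Keras model (the crop sizes here match those
explored earlier):

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, Lambda, Cropping2D

# Pre-processing as model layers: these operations now travel with
# the model wherever it is deployed, at some numerical cost.
preprocess = Sequential([
    Input(shape=(160, 320, 3)),
    Lambda(lambda x: x / 127.5 - 1.0),        # normalize to [-1, 1]
    Cropping2D(cropping=((60, 20), (0, 0))),  # sky=60, hood=20
])
```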

Likewise, try to keep in mind sound software engineering
principles, like the classic space-time trade-off. For instance,
you may be able to fit all your data in memory after all, and if
you can, it may speed up training considerably. This will not
always be the case, but pay attention to when it is.

Think carefully about your model and what it is—and
isn't—capable of. For instance, if you have a neural network
with no memory or anticipatory functions, you might downplay
the importance of features within your data that contain
information about the future as opposed to features that contain
information about the present.

Consider starting simple and adding complexity only as needed,
rather than the other way around. For instance, start with a
small model and set aside pre-processing steps like augmentation,
until and unless they become necessary. Smaller models and less
pre-processing translate to faster training and more rapid
iteration.

Likewise, consider stripping away complexity that adds friction to
the iteration cycle. If you have a small model, a small training
set, and you find cloud-based GPUs unwieldy, run some initial
experiments to see if you can iterate right on your local
environment. You might not be able to, but if you can it is often
worth it.