README.md

Traffic Sign Recognition

Introduction

In this project, I developed a Convolutional Neural Network (CNN) model to classify traffic signs. The model was trained and validated on images from the German Traffic Sign Dataset. After training, its performance was evaluated on random traffic sign images found on the web.

My final model results are as follows:

Training set accuracy of 99.90%

Validation set accuracy of 98.84%

Test set accuracy of 97.29%

New Test set accuracy of 100% (6 new images)

The steps undertaken in this project are as follows:

Load the data set

Explore, summarize and visualize the data set

Design, train and test a model architecture

Use the model to make predictions on new images

Analyze the softmax probabilities of the new images

Data Set Summary & Exploration

1. Basic summary of the data set

I used the NumPy library to calculate summary statistics of the traffic signs data set:

The size of original training set is 34799

The size of the validation set is 4410

The size of test set is 12630

The shape of a traffic sign image is 32x32x3 represented as integer values (0-255) in the RGB color space

The number of unique classes/labels in the data set is 43
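The summary statistics above can be computed with a few lines of NumPy. The arrays below are synthetic stand-ins with the same shapes as the real dataset splits, which in the project are loaded from the German Traffic Sign Dataset pickle files:

```python
import numpy as np

# Hypothetical stand-ins for the loaded dataset splits.
np.random.seed(0)
X_train = np.zeros((34799, 32, 32, 3), dtype=np.uint8)
y_train = np.random.randint(0, 43, size=34799)
X_valid = np.zeros((4410, 32, 32, 3), dtype=np.uint8)
X_test = np.zeros((12630, 32, 32, 3), dtype=np.uint8)

n_train = X_train.shape[0]
n_valid = X_valid.shape[0]
n_test = X_test.shape[0]
image_shape = X_train.shape[1:]
n_classes = len(np.unique(y_train))

print("Training examples:", n_train)
print("Validation examples:", n_valid)
print("Test examples:", n_test)
print("Image shape:", image_shape)
print("Unique classes:", n_classes)
```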

2. Exploratory visualization of the dataset

Following the data set summarisation, I performed exploratory visualizations of the data set. This included:

Plotting an image for each label from the training set. Some of these have been displayed below for convenience.

Plotting bar charts to see the image distribution across the training & validation sets.

We notice that the distribution in the training set is not balanced: some classes have fewer than 300 examples while others are well represented with more than 1,000. A similar story is observed in the validation set distribution, as showcased above.
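The imbalance check behind those bar charts can be sketched with NumPy's `bincount`. The labels below are synthetic stand-ins for the real `y_train`, so unlike the real data they come out roughly uniform:

```python
import numpy as np

# Hypothetical labels; the real y_train comes from the dataset pickle.
rng = np.random.default_rng(0)
y_train = rng.integers(0, 43, size=34799)

# Count how many training examples each of the 43 classes has.
counts = np.bincount(y_train, minlength=43)
print("Least represented class:", counts.argmin(), "with", counts.min(), "examples")
print("Most represented class:", counts.argmax(), "with", counts.max(), "examples")
print("Imbalance ratio: %.1fx" % (counts.max() / counts.min()))
```

On the real training set this ratio is large (under 300 examples for the rarest classes versus over 1,000 for the most common), which is what motivates the augmentation step below.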

Design and Test a Model Architecture

1. Augmentation

Given the small size of the training set and the uneven distribution of examples for each class label, I decided to generate additional images using image augmentation techniques such as:

Central Scaling

Translation along the horizontal and vertical axes

Rotation

Varying lighting conditions, and,

Motion blurring.
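A few of these transforms can be sketched with plain NumPy; the actual pipeline may well use a library such as OpenCV, so treat these dependency-free implementations as illustrative stand-ins:

```python
import numpy as np

def translate(img, dx, dy):
    # Shift along the horizontal/vertical axes; np.roll wraps around,
    # which is a crude but dependency-free stand-in for a true shift.
    return np.roll(np.roll(img, dy, axis=0), dx, axis=1)

def adjust_brightness(img, factor):
    # Simulate varying lighting conditions by scaling pixel intensities.
    return np.clip(img.astype(np.float32) * factor, 0, 255).astype(np.uint8)

def central_scale(img, zoom=1.25):
    # Crop the centre, then resize back up with nearest-neighbour indexing.
    h, w = img.shape[:2]
    ch, cw = int(h / zoom), int(w / zoom)
    top, left = (h - ch) // 2, (w - cw) // 2
    crop = img[top:top + ch, left:left + cw]
    rows = np.arange(h) * ch // h
    cols = np.arange(w) * cw // w
    return crop[rows][:, cols]

img = np.random.randint(0, 256, (32, 32, 3), dtype=np.uint8)
augmented = [translate(img, 3, -2), adjust_brightness(img, 0.6), central_scale(img)]
```

Each transform preserves the 32x32x3 shape, so augmented images can be appended directly to the training set.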

The additional benefit of the augmentation process was that it resulted in an increase in the diversity of the data consumed by the model for feature learning.

Following the augmentation process, the size of the training set grew to 108,242 images. The resulting class label distribution has been visualised below.

2. Pre-processing

This phase is crucial to improving the performance of the model and also determines how quickly the model converges to a solution. My pre-processing pipeline did the following:

RGB color space -> Grayscale: The images were converted from their original RGB color space to grayscale. This reduced the number of channels in the network's input and consequently the amount of memory required to represent these images. My basis for doing this stemmed from the observation that the colors in the image did not impart any additional information that could be leveraged by the network for the classification task.

Feature Scaling & Mean Normalisation: I normalized each image to ensure it had a mean of 0 and standard deviation of 1, and,

Contrast Limited Adaptive Histogram Equalization (CLAHE): I used this algorithm for its ability to amplify local details in areas that are darker or lighter than most of the image. Performing local contrast enhancement on the input images improved the model's feature learning ability and consequently its classification accuracy.
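The grayscale conversion and standardisation steps can be sketched as follows. This is a minimal NumPy version; the CLAHE step is only noted in a comment because it typically relies on OpenCV (`cv2.createCLAHE`) rather than plain NumPy:

```python
import numpy as np

def preprocess(img):
    # RGB -> grayscale using the standard luminance weights.
    gray = img.astype(np.float32) @ np.array([0.299, 0.587, 0.114], dtype=np.float32)
    # In the real pipeline, CLAHE (e.g. via cv2.createCLAHE) would be applied
    # to the 8-bit grayscale image here; omitted to keep the sketch dependency-free.
    # Feature scaling & mean normalisation: zero mean, unit standard deviation.
    gray = (gray - gray.mean()) / (gray.std() + 1e-8)
    # Restore the channel axis so the network input is (32, 32, 1).
    return gray[..., np.newaxis]

img = np.random.randint(0, 256, (32, 32, 3), dtype=np.uint8)
out = preprocess(img)
print(out.shape)
```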

3. Final model architecture

The architecture of my final model is inspired by the model presented by Pierre Sermanet and Yann LeCun in their paper "Traffic Sign Recognition with Multi-Scale Convolutional Networks".
My model consisted of 3 CONV layers and 2 FC layers, adding up to a total of 5 layers. Moreover, similar to the model presented by Sermanet and LeCun, my model also provides the classifier with different scales of receptive fields / multi-scale features. This is achieved by branching the output of the 1st CONV layer, performing a second round of subsampling, and finally feeding it to the classifier in combination with the output from the 3rd CONV layer.
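As a sanity check on this multi-scale wiring, the feature-map sizes can be worked through with the standard convolution output-size formula. The 5x5/3x3 kernel sizes below are illustrative assumptions in the spirit of LeNet/Sermanet-LeCun, not necessarily the exact ones used in my model:

```python
def conv_out(size, kernel, stride=1, pad=0):
    # Standard formula: floor((size - kernel + 2*pad) / stride) + 1
    return (size - kernel + 2 * pad) // stride + 1

def pool_out(size, kernel=2, stride=2):
    # 2x2 max-pool with stride 2 halves each spatial dimension.
    return (size - kernel) // stride + 1

# Illustrative 32x32 grayscale input through three 'valid' conv layers.
s = conv_out(32, 5)              # CONV1 -> 28x28
branch = pool_out(pool_out(s))   # CONV1 output subsampled twice for the skip branch -> 7x7
s = pool_out(s)                  # pool -> 14x14
s = conv_out(s, 5)               # CONV2 -> 10x10
s = pool_out(s)                  # pool -> 5x5
s = conv_out(s, 3)               # CONV3 -> 3x3
print("CONV3 map:", s, "branch map:", branch)
```

The classifier then sees the flattened concatenation of both scales, which is what gives it access to coarse and fine features simultaneously.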

4. Solution Approach

My final model results were:

Training set accuracy of 99.90%

Validation set accuracy of 98.84%

Test set accuracy of 97.29%

Below I provide an overview of the steps I undertook to reach my final model architecture. NOTE: for each attempt I only highlight the model parameters / architectural changes I made in response to the preceding attempt to improve the training and validation accuracy.

Attempt 1: validation accuracy 93.0%

I started with the baseline LeNet-5 architecture.

Epochs = 10

Learning rate = 0.001

Batch size = 128

Adam Optimiser

Attempt 2: validation accuracy 93.2%

Changes made:

Epochs = 20

Learning rate decayed by a factor of 10 after 10 epochs

Dropout (with keep probability = 0.8) over C1, C3, C5 and F6

The validation accuracy peaked at ~93% despite all these efforts.
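The step decay from Attempt 2 can be expressed as a small schedule function. The drop factor and interval below reflect my reading of "decayed by a factor of 10 after 10 epochs"; treat them as assumptions:

```python
def step_decay(base_lr, epoch, drop=0.1, every=10):
    # Multiply the base learning rate by `drop` once per `every` completed epochs.
    return base_lr * (drop ** (epoch // every))

# 20 epochs at base rate 0.001: constant for epochs 0-9, then 10x smaller.
schedule = [step_decay(0.001, e) for e in range(20)]
print(schedule[0], schedule[10])
```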

Attempt 3: validation accuracy 96.1%

Changes made:

Increased the size of the training set by generating augmented images. The training set roughly tripled in size to ~108k images.

Attempt 6: validation accuracy 98.84%

Model Evaluation on new images

In addition to evaluating the performance of my network over the test set, I decided to test how well the network classifies random traffic sign images pulled from the web.

The following images were used:

and the results were as follows:

As showcased above the prediction accuracy was 100%!

Happy days!

Future work

In order to improve the performance of my model I would perform the following:

Evaluate the performance of the network on a bigger and more diverse production data set

Leverage model ensembles

Do a more detailed analysis of model performance by looking at predictions in more detail. For example, calculate the precision and recall for each traffic sign type from the test set and then compare performance on the six new images.

Create a deeper network akin to AlexNet/ZFNet by stacking more conv layers prior to each pooling operation.

Utilise global average pooling to reduce the number of FC layers in the classifier, thereby also reducing the overall number of parameters across the network.

Alternatively, rather than constructing the network from scratch, I would like to leverage transfer learning by using existing CNN architectures to classify the traffic sign images. In particular, I would like to use ResNet/GoogLeNet/VGGNet.
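The per-class precision/recall analysis listed under future work could be sketched with plain NumPy. The three-class example below is a hypothetical miniature of the 43-class problem:

```python
import numpy as np

def per_class_precision_recall(y_true, y_pred, n_classes):
    # Build a confusion matrix, then read precision/recall off its diagonal.
    cm = np.zeros((n_classes, n_classes), dtype=np.int64)
    np.add.at(cm, (y_true, y_pred), 1)
    tp = np.diag(cm).astype(np.float64)
    precision = tp / np.maximum(cm.sum(axis=0), 1)  # column sums = predicted counts
    recall = tp / np.maximum(cm.sum(axis=1), 1)     # row sums = true counts
    return precision, recall

# Tiny worked example with 3 hypothetical sign classes.
y_true = np.array([0, 0, 1, 1, 2, 2])
y_pred = np.array([0, 1, 1, 1, 2, 0])
p, r = per_class_precision_recall(y_true, y_pred, 3)
print("precision:", p)
print("recall:   ", r)
```

Running the same function over the 43-class test-set predictions would reveal which sign types the model confuses, which is more informative than the aggregate accuracy alone.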