Deep Learning Simplified

Image Recognition

In the CNN tutorial, I covered the basics of how image recognition/classification works. Now we will extend that beyond just classifying digits, using some very powerful pre-trained models to help with our unique task. Our tutorial will be based on this link, but I will extend it to show how to train for a variety of different tasks and also talk a bit more about transfer learning and its nuances.

Benefit from intricate, pre-trained image processing models, such as the Inception model.

Installation:

For this implementation, we will be using some components not available in the current stable TensorFlow version (r0.10). So if you are using that version or an older one, head over to this link and click on the right build for your machine, then install it with something like:

sudo pip install --upgrade tensorflow-0.11.0rc0-py2-none-any.whl

Next get into your workspace and download the models from tensorflow with:

git clone https://github.com/tensorflow/models/
cd models/slim

Fine-tuning a model

Directories

We will first set up the directories: one for the pre-trained model (trained on the original dataset), one for the new dataset we wish to train on, and a train directory to store the new fine-tuned checkpoints.
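As a concrete sketch, the layout could be created like below. The directory names here are my own choice, nothing in slim mandates them:

```shell
# Hypothetical layout; the directory names are arbitrary choices.
mkdir -p checkpoints      # pre-trained Inception V3 checkpoint goes here
mkdir -p flowers_data     # the new (flowers) dataset, in TFRecord format
mkdir -p train_logs       # fine-tuned checkpoints and summaries land here
ls -d checkpoints flowers_data train_logs
```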

Data

We will download the flowers dataset and fine-tune Inception V3, which was originally trained on ImageNet, on it. The program below converts the images into TFRecord format. Later on, we will take a look at how to do this with a dataset of images of our own.
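With the slim repo checked out, the download-and-convert step is handled by its download_and_convert_data.py script. A sketch of the invocation, where the dataset directory name is my own choice:

```shell
# Run from inside models/slim; flowers_data is an arbitrary directory name.
DATASET_DIR=flowers_data
mkdir -p "$DATASET_DIR"
if [ -f download_and_convert_data.py ]; then
  # Downloads the flowers images and writes TFRecord shards plus a labels file.
  python download_and_convert_data.py \
    --dataset_name=flowers \
    --dataset_dir="$DATASET_DIR"
else
  echo "run this from the models/slim directory"
fi
```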

Inception V3 was originally trained on ImageNet (1000 classes), but we will fine-tune the model to perform on the flowers dataset (5 classes). We can’t just use the entire original model and expect results, so we will keep the earlier layers from the model and implement new layers at the end, fine-tuned for our task. The intuition behind this is simple: the earlier layers learn general features that can be assumed to be ubiquitous in all images, so we can keep the exact weights for these layers. But the later layers, especially the ones nearing the softmax classification, are highly skewed towards the original task (ImageNet). They were trained to pick up concrete features based on the input images, so we will remove these weights and retrain those layers on our new task. The early conv layers’ weights will be frozen and will maintain their original values while fine-tuning. I have oversimplified this principle of transfer learning, but below is a good set of rules to go off of, from cs231n. For a more formal study, I suggest this paper for more information on how transferable features can be.

New dataset is small and similar to original dataset. Since the data is small, it is not a good idea to fine-tune the ConvNet due to overfitting concerns. Since the data is similar to the original data, we expect higher-level features in the ConvNet to be relevant to this dataset as well. Hence, the best idea might be to train a linear classifier on the CNN codes.

New dataset is large and similar to the original dataset. Since we have more data, we can have more confidence that we won’t overfit if we were to try to fine-tune through the full network.

New dataset is small but very different from the original dataset. Since the data is small, it is likely best to only train a linear classifier. Since the dataset is very different, it might not be best to train the classifier from the top of the network, which contains more dataset-specific features. Instead, it might work better to train the SVM classifier from activations somewhere earlier in the network.

New dataset is large and very different from the original dataset. Since the dataset is very large, we may expect that we can afford to train a ConvNet from scratch. However, in practice it is very often still beneficial to initialize with weights from a pretrained model. In this case, we would have enough data and confidence to fine-tune through the entire network.
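The four cases above can be condensed into a tiny lookup. This is purely illustrative, not part of any tooling:

```shell
# Illustrative only: the cs231n decision table as a shell function.
# $1 = dataset size (small|large), $2 = similarity (similar|different)
strategy() {
  case "$1-$2" in
    small-similar)   echo "train a linear classifier on the CNN codes" ;;
    large-similar)   echo "fine-tune through the full network" ;;
    small-different) echo "train a linear classifier on earlier-layer activations" ;;
    large-different) echo "fine-tune the entire network, initialized from the pretrained weights" ;;
  esac
}
strategy small similar
```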

It’s also a good idea to keep some constraints in mind, such as using a small learning rate. Our final linear classifier layer(s) may be randomly initialized, but if we choose to fine-tune some of the later conv layers as well, we need to use a small learning rate. The original kernel weights for these later conv layers are still pretty decent, so we don’t want to distort them too much. If we use a large learning rate, the effect from backpropagation will be large, especially because our linear classifiers have been randomly initialized.

With the parameters that we are passing in above, we are fine-tuning our Inception V3 model on the flowers dataset. Since the dataset is small and similar to the original dataset, we only have to train the logit-producing (FC) layers. If we wanted to train earlier layers, we would just append the appropriate layer names to checkpoint_exclude_scopes and trainable_scopes as below. To get the layer names for Inception V3 (or other pretrained models), head over to this link.
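For reference, a slim fine-tuning invocation with those two flags might look like this. The paths are hypothetical choices; the Logits/AuxLogits scope names for Inception V3 come from the slim repo:

```shell
# Sketch only; run from models/slim with the checkpoint and data in place.
TRAIN_DIR=train_logs
DATASET_DIR=flowers_data
CHECKPOINT=checkpoints/inception_v3.ckpt
if [ -f train_image_classifier.py ]; then
  python train_image_classifier.py \
    --train_dir="$TRAIN_DIR" \
    --dataset_name=flowers \
    --dataset_split_name=train \
    --dataset_dir="$DATASET_DIR" \
    --model_name=inception_v3 \
    --checkpoint_path="$CHECKPOINT" \
    --checkpoint_exclude_scopes=InceptionV3/Logits,InceptionV3/AuxLogits \
    --trainable_scopes=InceptionV3/Logits,InceptionV3/AuxLogits
else
  echo "run this from the models/slim directory"
fi
```

To also fine-tune, say, the last inception block, you would append its scope name to both flag lists; whether that helps is dataset-dependent.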

Our Data

So far we’ve seen how to fine-tune a pretrained model on a new dataset, but some parts may still be unclear when we want to train on a dataset of our own. So now, we will take a closer look at creating a TFRecord-formatted dataset for fine-tuning, and we will also do some evaluation with qualitative results. More details are available on the inception page. Let’s move WORKSPACE and the inception directory from models/inception/ to models/slim/cat_classifier. We will also create TRAIN and VALIDATION directories to hold our image data (details below). Also create an empty TFRecord_data folder to store our formatted data in later.

Let’s build an iconic cat classifier. We’ll first get our images from Google Images, and we will use Fatkun Batch to download them. Once you download the dogs and cats (they download into separate folders at the location you specified), let’s go ahead and convert them to TFRecord format. We will first need to arrange our data to look like below:
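A minimal sketch of that arrangement (the image file names are hypothetical): build_image_data.py expects one sub-directory per label under each of the TRAIN and VALIDATION directories.

```shell
# One sub-directory per class, under TRAIN and VALIDATION.
mkdir -p TRAIN/dog TRAIN/cat VALIDATION/dog VALIDATION/cat
# The downloaded images then go in, e.g.:
#   TRAIN/dog/dog-001.jpg
#   TRAIN/cat/cat-001.jpg
#   VALIDATION/dog/dog-101.jpg
#   VALIDATION/cat/cat-101.jpg
ls -d TRAIN/dog TRAIN/cat VALIDATION/dog VALIDATION/cat
```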

shards: “we aim for selecting the number of shards such that roughly 1024 images reside in each shard.”

labels.txt: Needs to go inside: Note that the labels skip 0 since it is reserved for the background class.

1:dog
2:cat

OUTPUT_DIRECTORY=TFRecord_data/
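Putting the pieces together, the build-and-convert step with bazel might look like the sketch below. The shard counts here are hypothetical and should suit your dataset size; the flag names come from the inception repo’s build_image_data tool:

```shell
# Sketch; run from models/slim/cat_classifier after moving WORKSPACE there.
TRAIN_DIR=TRAIN/
VALIDATION_DIR=VALIDATION/
OUTPUT_DIRECTORY=TFRecord_data/
LABELS_FILE=labels.txt
if command -v bazel >/dev/null 2>&1 && [ -f WORKSPACE ]; then
  bazel build //inception:build_image_data
  bazel-bin/inception/build_image_data \
    --train_directory="$TRAIN_DIR" \
    --validation_directory="$VALIDATION_DIR" \
    --output_directory="$OUTPUT_DIRECTORY" \
    --labels_file="$LABELS_FILE" \
    --train_shards=2 \
    --validation_shards=2 \
    --num_threads=2
else
  echo "bazel and the inception WORKSPACE are required"
fi
```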

Note: when executing the bazel command you may get some odd errors; just retry in a few seconds, or delete the folder and restart. You may also get errors from copying and pasting from here or from a notepad (especially for the bazel-bin command), so make sure the command is cleanly copied.

So now, inside TFRecord_data, we will have our TFRecord-formatted images as sharded files; with build_image_data.py these are typically named in the pattern train-00000-of-0000N and validation-00000-of-0000N.