Kaggle Competition — Image Classification

How to build a CNN model that can predict the classification of the input images using transfer learning

First misconception: Kaggle is just a website that hosts machine learning competitions. I believe this misconception makes many beginners in data science, including me, think that Kaggle is only for data professionals or experts with years of experience. In fact, Kaggle has much more to offer than competitions alone!

There are so many open datasets on Kaggle that we can simply start by playing with a dataset of our choice and learn along the way. If you are a beginner with zero experience in data science and are thinking of taking more online courses before joining, think again! Kaggle even offers fundamental yet practical programming and data science courses. Besides, you can always post your questions in the Kaggle discussion forums to seek advice or clarification from the vibrant data science community.

In the following sections, I hope to share the journey of a beginner in his first Kaggle competition (together with his team members), along with some mistakes and takeaways. You can check out the code here. The sections are organized as below:

Approach

Whenever people talk about image classification, Convolutional Neural Networks (CNN) will naturally come to their mind — and not surprisingly — we were no exception.

With little knowledge of and no prior experience with CNNs, Google was my best teacher, and I can't help but highly recommend this concise yet comprehensive introduction to CNNs written by Adit Deshpande. Its high-level explanation broke the once formidable structure of a CNN into simple terms I could understand.

Image Preprocessing

Generate batches of tensor image data with real-time data augmentation that will be looped over in batches

The data augmentation step was necessary before feeding the images to the models, particularly given the imbalanced and limited dataset. By artificially expanding our dataset with different transformations, scales, and shear ranges applied to the images, we increased the amount of training data.
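A minimal sketch of this kind of real-time augmentation with Keras' `ImageDataGenerator` (the class the quoted description comes from). The image size, transform ranges, and dummy arrays below are illustrative assumptions, not the competition's actual settings:

```python
import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Artificially expand the training set with random transformations.
datagen = ImageDataGenerator(
    rescale=1.0 / 255,       # scale pixel values to [0, 1]
    rotation_range=20,       # random rotations up to 20 degrees
    width_shift_range=0.1,   # random horizontal shifts
    height_shift_range=0.1,  # random vertical shifts
    shear_range=0.2,         # shear transformations
    zoom_range=0.2,          # random zoom in/out
    horizontal_flip=True,    # mirror images left-right
)

# Dummy batch standing in for real training images (8 RGB images).
images = np.random.randint(0, 256, size=(8, 224, 224, 3)).astype("float32")
labels = np.random.randint(0, 18, size=(8,))

# flow() yields endlessly looping batches of augmented image tensors.
batch_x, batch_y = next(datagen.flow(images, labels, batch_size=8))
```

In practice you would pass the generator to `model.fit` so each epoch sees freshly transformed copies of the images rather than the originals.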

— FIRST Mistake —

I believe every approach is the product of multiple tries and mistakes. So let's talk about our first mistake before diving into our final approach.

We began by trying to build our CNN model from scratch (yes, literally!) to see how it performed on the training and testing images. Little did we know that most people rarely train a CNN model from scratch, for the following reasons:

Insufficient dataset (training images)

CNN models are complex and normally take weeks, or even months, to train, even with clusters of machines and high-performance GPUs.

The cost and time involved don't guarantee or justify the resulting model's performance

Transfer Learning

So… What the heck is transfer learning?

Transfer learning is a machine learning method where a model developed for one task is reused as the starting point for a model on a second task. In our case, it is the method of taking a pre-trained model (the weights and parameters of a network previously trained on a large dataset) and "fine-tuning" it with our own dataset.

After that we created a new fully connected output layer, followed by a dropout layer for regularization purpose.

Finally, we added a softmax layer for 18 classes (18 categories of the images) and combined the base model with the new output layers created.

At this stage, we froze all the layers of the base model and trained only the new output layer.

This is the beauty of transfer learning as we did not have to re-train the whole combined model knowing that the base model has already been trained.
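The steps above can be sketched in Keras roughly as follows. I'm assuming an InceptionV3 base here (suggested by the layer-249 cutoff mentioned below); the dense layer width, dropout rate, and optimizer are illustrative choices, not necessarily what our team used:

```python
from tensorflow.keras.applications import InceptionV3
from tensorflow.keras.layers import Dense, Dropout, GlobalAveragePooling2D
from tensorflow.keras.models import Model

def build_model(weights="imagenet"):
    # Pre-trained convolutional base without its original classifier head.
    base = InceptionV3(weights=weights, include_top=False,
                       input_shape=(299, 299, 3))

    # New fully connected layer, followed by dropout for regularization.
    x = GlobalAveragePooling2D()(base.output)
    x = Dense(1024, activation="relu")(x)
    x = Dropout(0.5)(x)
    # Softmax layer for the 18 image categories.
    outputs = Dense(18, activation="softmax")(x)

    # Combine the base model with the new output layers.
    model = Model(inputs=base.input, outputs=outputs)

    # Freeze every base layer: only the new head is trained at this stage.
    for layer in base.layers:
        layer.trainable = False

    model.compile(optimizer="rmsprop",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```

With the base frozen, only the small new head's weights are updated, which is why this first training pass is fast compared to training the whole network.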

Fine Tuning the Combined Model

Once the top layers were well trained, we fine-tuned a portion of the inner layers.

The fine-tuning was done by selecting and training the top two inception blocks (all layers after layer 249 in the combined model). The training process was the same as before, differing only in the number of trainable layers.
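As a sketch, the unfreeze-and-retrain step might look like this in Keras. The cutoff of 249 comes from the text above; the SGD settings are an assumption, chosen small so fine-tuning only nudges the pre-trained weights:

```python
from tensorflow.keras.optimizers import SGD

def fine_tune(model, unfreeze_from=249):
    # Keep the lower layers frozen; they capture generic visual features.
    for layer in model.layers[:unfreeze_from]:
        layer.trainable = False
    # Unfreeze the top inception blocks (and the new output head).
    for layer in model.layers[unfreeze_from:]:
        layer.trainable = True

    # Recompile with a low learning rate so the pre-trained weights
    # are adjusted gently rather than overwritten.
    model.compile(optimizer=SGD(learning_rate=1e-4, momentum=0.9),
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```

After this recompile, another round of `model.fit` on the augmented data trains the unfrozen blocks together with the new head.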

YES, and we’re done!

Results

The final accuracy was 78.96%.

We tried tuning the hyperparameters in different ways, but to no avail.

When all the results and methods were revealed after the competition ended, we discovered our second mistake…

— SECOND Mistake —

We did not use ensemble models with stacking method.

The common point from all the top teams was that they all used ensemble models.

Instead, we trained several pre-trained models separately and selected only the best one. Relying on a single model made our solution less robust on the testing data and more prone to overfitting.
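For illustration, the simplest form of ensembling is soft voting: averaging the softmax outputs of several trained models before taking the argmax. Stacking, which the top teams used, goes a step further and feeds these per-model predictions into a meta-learner. The arrays below are dummy data, not competition results:

```python
import numpy as np

def soft_vote(prediction_sets):
    """Average class probabilities from several models, then pick the argmax."""
    avg = np.mean(prediction_sets, axis=0)  # shape: (n_samples, n_classes)
    return np.argmax(avg, axis=1)

# Dummy softmax outputs from three models on 4 samples over 18 classes.
rng = np.random.default_rng(0)
preds = [rng.dirichlet(np.ones(18), size=4) for _ in range(3)]

labels = soft_vote(preds)  # one predicted class per sample
```

Even this simple averaging tends to smooth out the individual models' errors, which is part of why ensembles generalize better than any single member.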

Final Thoughts

Despite the short duration of the competition, I learned so much from my team members and other teams: from understanding CNN models and applying transfer learning to formulating our approach and learning from the methods other teams used.

The process wasn’t easy. The learning curve was steep. The learning journey was challenging but fruitful at the same time. And I’m definitely looking forward to another competition! 😄

Thank you for reading.

If you enjoyed this article, feel free to hit that clap button 👏 to help others find it.

As always, if you have any questions or comments feel free to leave your feedback below or you can always reach me on LinkedIn. Till then, see you in the next post! 😄