My experience with the new deep learning course from deeplearning.ai @Coursera

I am deeply intrigued by the advances in AI of recent years, fueled by deep learning techniques. Like millions of others around the globe, I started learning about these techniques and became more and more convinced that we may be lucky to live at a time when AI is really taking off after many winters. In fact, when Richard Socher, famous for his work in NLP and for CS224d at Stanford (a deep learning for NLP course), said that this may be an AI spring, it bolstered my belief and I decided to invest more time in it.

Being a full-time software professional, mainly working in the enterprise on analytics and data-driven insights, also motivated me to learn these techniques and to look for opportunities to apply them: to understand data and to help businesses use the predictive power of machine learning and deep learning.

With this motivation I took several online MOOCs, starting with Andrew Ng's Intro to ML on Coursera, and I fell deeply in love with the subject. Initially it was hard for me because I was unfamiliar with Matlab, but I worked my way through by searching and reading the documentation. It took me about two months to finish, but it boosted my confidence, and I followed it with further study of these techniques from places like edx.org, coursera.org, udacity.com, udemy.com and other online material.

Recently I came across a new course from Andrew Ng on Coursera, offered by his company deeplearning.ai. I immediately knew that a course from Andrew Ng would be an exciting learning opportunity. By then I had gained some understanding of deep learning and had already coded some solutions, mostly in Python and TensorFlow. But since I had learned mostly from online resources, I knew I was building up some knowledge debt: I was applying these techniques and had some sense of how they affected my models' performance, but it was not always clear how they really work and, above all, why they help deep learning models. So I had been waiting for a course like this, where the important concepts of deep learning are explained in depth and step by step, so that I could grasp them and apply them efficiently in practice. I am lucky to have taken this course; it was exactly what I wanted.

The Deep Learning Specialization is a five-course program, and the first course is an introduction to neural networks and deep learning. Below I break down my experience and notes on the first course. I plan to complete the full specialization and will cover the remaining courses in future posts.

Before getting into the course details, I would like to mention a few prerequisites that I believe will help. These may differ from what the course officially lists.

Prerequisites:

Programming: basic Python programming skills are required, meaning the ability to write simple code that is only a few lines long. Knowledge of the numpy library is helpful.

Math: knowledge of linear algebra, such as matrices and vectors, their addition, element-wise multiplication and division, and matrix (dot) products, will be helpful. If you are familiar with these operations in the numpy library, that would be really great. Some familiarity with calculus, like taking the first-order derivative of simple functions, will also help.

All of these are nice to have but not really mandatory to pass, as the course covers enough to give you the required knowledge.

For math, you can get help from khanacademy.org whenever you need to revisit a concept.
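To make these prerequisites concrete, here is a minimal numpy sketch of the operations mentioned above, plus a numerical check of a simple derivative. The arrays and the function f are arbitrary illustrations, not material from the course.

```python
import numpy as np

# Arbitrary example data: a 2x3 matrix and a 3x1 column vector
A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])
v = np.array([[1.0], [0.0], [2.0]])

added = A + A            # element-wise addition of same-shaped matrices
halved = A / 2.0         # element-wise division; the scalar is broadcast
product = np.dot(A, v)   # matrix (dot) product: (2x3) @ (3x1) -> (2x1)

def f(x):
    """Example function f(x) = x^2, whose derivative is f'(x) = 2x."""
    return x ** 2

# Central-difference estimate of f'(3), which should be close to 6
eps = 1e-6
x0 = 3.0
numeric_derivative = (f(x0 + eps) - f(x0 - eps)) / (2 * eps)
```

If these few lines read naturally to you, you are well prepared for the programming side of the course.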

Comparison with similar courses: there are many great courses for beginning a journey into neural networks and deep learning. A few of the best are the Deep Learning Nanodegree (DLND) at Udacity and the fast.ai MOOC, parts 1 and 2.

What distinguishes the Coursera course is that it really gets under the skin of things, taking a step-by-step, bottom-up approach from very simple to more complex. It teaches you forward propagation, how to define a loss, and how to minimize it with the gradient descent algorithm, using backpropagation to compute the gradients. Then it builds on that to construct deep networks and gives a practical understanding of all the hyperparameters and how to tune them, covering dropout, Adam, RMSprop and the other most important concepts for building practical, efficient deep learning models.

Udacity's DLND moves fast: it builds an intuitive understanding of neural networks without getting into the math, goes straight to the coding, and lets you build sophisticated, practical deep learning models. It stresses programming networks rather than understanding the nuts and bolts.

Fast.ai takes a top-down approach: it mostly uses the Keras API to build models and leverages transfer learning. Then Jeremy pieces together the different parts of the working model, explains them, and sometimes provides resources for self-study. It is a fast track (as the name says) to developing very powerful models and using the most advanced techniques for practical purposes.

Let's get into the first course and see what is in it:

Course-1: Neural Networks and Deep Learning:

Week-1: The first week is very introductory. Andrew familiarizes us (the students) with the basics of neural networks without any math or many details, which I expect he will cover in later weeks.

He explains what a simple neural network is and how it works from a bird's-eye view. He describes supervised learning with neural networks and notes that most of deep learning's success so far has been in supervised learning.

He reminds us that much of the research on neural networks is decades old, and explains why the field is taking off now. Fundamentally, the flood of data from smartphones, connected devices and social media, together with the exponential growth of data collection, has made it possible to experiment with large neural networks.

Massive compute power (GPUs entering mainstream computing, thanks to gaming technology) has made it much easier to develop and train large models in reasonable time, and researchers have kept pushing forward, backed by commercial and academic interest never seen before.

There is also a great interview (over 45 minutes) with Geoffrey Hinton, the Godfather of Deep Learning, which I found really enjoyable and informative. I learned that Hinton continues to experiment with many of his early ideas about how the brain works, turning them into models like his recirculation algorithm.

There is a basic 10-question quiz, mostly on the topics discussed in the lectures.

With this, I conclude my update on week-1. See you next time for week-2.

Week-2:

As expected, Andrew takes a bottom-up approach, unlike many other deep learning MOOCs.

He lets you learn the internal details bit by bit and then sew them together into the bigger picture.

This week is about learning logistic regression and implementing it from scratch to create a classifier, using only Python, numpy and matplotlib (for visualization).

The video lectures help you understand the most critical building blocks of logistic regression for binary classification.

First, they explain how each neuron can be viewed as a logistic regression unit, and how logistic regression can in turn be thought of as a shallow neural network.

They then show how the model works by estimating $\hat{y} = P(y=1 \mid x) = \sigma(w^T x + b)$, where $\sigma(z) = 1/(1+e^{-z})$ is the sigmoid function.
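As a minimal sketch of that formula, assuming a single example with three features and made-up values for w, b and x (none of these numbers come from the course), the prediction is just a dot product, a bias and a sigmoid:

```python
import numpy as np

def sigmoid(z):
    """Squash any real input into the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical parameters and input, n_x = 3 features
w = np.array([[0.5], [-1.0], [2.0]])   # weights, shape (n_x, 1)
b = 0.1                                 # scalar bias
x = np.array([[1.0], [2.0], [0.5]])    # one example, shape (n_x, 1)

z = np.dot(w.T, x) + b                  # w^T x + b, shape (1, 1)
y_hat = sigmoid(z)                      # estimate of P(y = 1 | x)
```

Here z works out to -0.4, so y_hat is a little below 0.5 and the example would be classified as 0 with the usual 0.5 threshold.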

It goes into the details step by step and (as in Intro to ML) writes out all the math derivations. Andrew explains the loss function used in logistic regression, the cross-entropy loss $\mathcal{L}(\hat{y}, y) = -\big(y \log \hat{y} + (1-y)\log(1-\hat{y})\big)$, and why the squared error falls short: combined with the sigmoid, it makes the optimization problem non-convex. With cross-entropy the cost is convex, so gradient descent can find the global minimum. He shows how to compute the average loss over all training examples (batch gradient descent, not SGD yet) and minimize it by iteratively updating the weights and bias.

Here Andrew covers the calculus needed to derive the derivatives of the loss with respect to the model parameters. He says calculus is great to have, but the course explains the key areas needed to understand neural networks. He does an excellent job of explaining the backpropagation algorithm using the chain rule, presenting it as a computational graph, and I would say it is as good as CS231n (Karpathy's class).
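The derivation boils down to a few lines of numpy. The sketch below assumes the course's convention of stacking m examples as columns of X with labels Y of shape (1, m); the function name cost_and_grads and the random data are hypothetical. It also checks the chain-rule gradient against a finite-difference estimate and takes one gradient-descent step.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost_and_grads(w, b, X, Y):
    """Average cross-entropy cost over m examples and its gradients.

    X has shape (n_x, m), Y has shape (1, m), w has shape (n_x, 1).
    """
    m = X.shape[1]
    A = sigmoid(np.dot(w.T, X) + b)                  # forward pass, shape (1, m)
    cost = -np.mean(Y * np.log(A) + (1 - Y) * np.log(1 - A))
    dZ = A - Y                                       # dL/dz via the chain rule
    dw = np.dot(X, dZ.T) / m                         # shape (n_x, 1)
    db = np.sum(dZ) / m
    return cost, dw, db

# Hypothetical random data: 3 features, 5 examples
rng = np.random.default_rng(0)
X = rng.normal(size=(3, 5))
Y = (rng.random((1, 5)) > 0.5).astype(float)
w = rng.normal(size=(3, 1))
b = 0.0

cost, dw, db = cost_and_grads(w, b, X, Y)

# Finite-difference estimate of dw, one weight at a time
eps = 1e-6
numeric_dw = np.zeros_like(w)
for i in range(w.shape[0]):
    w_plus, w_minus = w.copy(), w.copy()
    w_plus[i, 0] += eps
    w_minus[i, 0] -= eps
    numeric_dw[i, 0] = (cost_and_grads(w_plus, b, X, Y)[0]
                        - cost_and_grads(w_minus, b, X, Y)[0]) / (2 * eps)

# One gradient-descent step with an assumed learning rate
learning_rate = 0.1
w_new = w - learning_rate * dw
b_new = b - learning_rate * db
```

The key result of the derivation is dZ = A - Y: the sigmoid's derivative and the cross-entropy's derivative cancel neatly, which is one reason this loss pairs so well with the sigmoid.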

For me, the most valuable part of this week was how he shows you how to vectorize these operations with the numpy library. This matters because vectorization saves a lot of computation time in a large network and is the only practical approach. So I would stress this: avoid explicit "for loops" and use vector operations instead. It reduces both the running time and the number of lines of code.
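To see what vectorization buys, here is a small sketch that computes $w^T x + b$ for every example twice: once with explicit Python loops and once with a single numpy call. The function names and data sizes are hypothetical; both versions produce the same numbers.

```python
import numpy as np

def linear_loop(w, X, b):
    """Compute w^T x + b for every column of X with explicit Python loops."""
    n_x, m = X.shape
    Z = np.zeros((1, m))
    for j in range(m):
        total = 0.0
        for i in range(n_x):
            total += w[i, 0] * X[i, j]
        Z[0, j] = total + b
    return Z

def linear_vectorized(w, X, b):
    """Same computation in one numpy call; b is broadcast across the columns."""
    return np.dot(w.T, X) + b

# Hypothetical data: 4 features, 1000 examples
rng = np.random.default_rng(1)
X = rng.normal(size=(4, 1000))
w = rng.normal(size=(4, 1))
b = 0.5
```

Timing both functions with Python's timeit module should show the vectorized version running much faster, because the arithmetic happens inside numpy's compiled code rather than in the Python interpreter.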

At the end, there is a quiz and one practice program that goes over a few basic Python and numpy operations needed for logistic regression. There is also one graded programming assignment to build a binary classifier that classifies cat images.

This means you write a few lines of code to create the activation function (sigmoid), calculate the cost, implement gradient descent to optimize your model, and predict classes with your model.
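The pieces fit together roughly like this. The sketch is not the graded assignment's code: the function names are hypothetical, and two Gaussian blobs of synthetic 2-D points stand in for the cat images so that it runs on its own.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_logistic_regression(X, Y, num_iterations=1000, learning_rate=0.1):
    """Batch gradient descent on the average cross-entropy cost.

    X: (n_x, m) features, Y: (1, m) labels in {0, 1}.
    Returns the learned parameters and the cost recorded every 100 iterations.
    """
    n_x, m = X.shape
    w = np.zeros((n_x, 1))   # zero initialization is fine for logistic regression
    b = 0.0
    costs = []
    for i in range(num_iterations):
        A = sigmoid(np.dot(w.T, X) + b)        # forward pass
        dZ = A - Y                             # gradient of the cost w.r.t. z
        if i % 100 == 0:
            costs.append(-np.mean(Y * np.log(A) + (1 - Y) * np.log(1 - A)))
        w -= learning_rate * np.dot(X, dZ.T) / m
        b -= learning_rate * np.sum(dZ) / m
    return w, b, costs

def predict(w, b, X):
    """Label an example 1 when the predicted probability exceeds 0.5."""
    return (sigmoid(np.dot(w.T, X) + b) > 0.5).astype(int)

# Synthetic stand-in data: class 0 centered at -1, class 1 centered at +1
rng = np.random.default_rng(2)
X = np.hstack([rng.normal(-1.0, 1.0, size=(2, 100)),
               rng.normal(+1.0, 1.0, size=(2, 100))])
Y = np.hstack([np.zeros((1, 100)), np.ones((1, 100))])

w, b, costs = train_logistic_regression(X, Y)
train_accuracy = np.mean(predict(w, b, X) == Y)
```

The recorded costs should decrease steadily, since the problem is convex and the learning rate is small.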

I also strongly recommend playing with the number of iterations and the learning rate to see their effect on test accuracy, and plotting learning curves to see how performance changes and to build intuition.
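One way to run that experiment is to record the cost at every iteration for a few learning rates and compare the curves. This is a minimal sketch on hypothetical synthetic data; the learning-rate grid and the name cost_history are assumptions, and measuring test accuracy would additionally require a held-out split.

```python
import numpy as np

def cost_history(X, Y, learning_rate, num_iterations):
    """Train logistic regression from zero weights; return the cost at every iteration."""
    n_x, m = X.shape
    w, b = np.zeros((n_x, 1)), 0.0
    history = []
    for _ in range(num_iterations):
        A = 1.0 / (1.0 + np.exp(-(np.dot(w.T, X) + b)))   # sigmoid(w^T X + b)
        history.append(-np.mean(Y * np.log(A) + (1 - Y) * np.log(1 - A)))
        dZ = A - Y
        w = w - learning_rate * np.dot(X, dZ.T) / m
        b = b - learning_rate * np.sum(dZ) / m
    return history

# Hypothetical two-class data
rng = np.random.default_rng(3)
X = np.hstack([rng.normal(-1.0, 1.0, (2, 100)), rng.normal(1.0, 1.0, (2, 100))])
Y = np.hstack([np.zeros((1, 100)), np.ones((1, 100))])

# Assumed grid of learning rates to compare
curves = {lr: cost_history(X, Y, lr, num_iterations=200) for lr in (0.001, 0.01, 0.1)}
```

Passing each list to matplotlib's plt.plot gives the learning curves: on this data the smallest learning rate barely moves the cost in 200 iterations, while the largest one drops it much faster. Pushing the learning rate too high would eventually make the cost oscillate or diverge instead.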

Week-3:

This week is really interesting: we move from the logistic regression of week 2 to an actual neural network. Andrew explains what a hidden layer is and why we need one, since the logistic regression model did not have it. In fact, he explains that logistic regression can be thought of as the simplest form of neural network, one without any hidden layer. He also goes over why a neural network needs nonlinearity. Without nonlinear activations, the network becomes a series of linear combinations of the input data, so the whole network collapses into a single linear model (a single logistic regression when the output is a sigmoid) whose final weights (W) are combinations of the weights of the previous layers. He then justifies using nonlinearity so the network can learn more complex features and the relations between input features, not just the inputs in isolation.
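The collapse can be checked numerically. In this sketch, with arbitrary random weights and layer sizes chosen only for illustration, two layers with no activation function produce exactly the same output as one layer whose weight is $W_2 W_1$ and whose bias is $W_2 b_1 + b_2$.

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(3, 6))            # 3 features, 6 examples

# Two layers with NO activation function (identity "activation")
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=(4, 1))
W2, b2 = rng.normal(size=(1, 4)), rng.normal(size=(1, 1))
two_layer_output = W2 @ (W1 @ X + b1) + b2

# The same mapping written as a single linear layer
W_combined = W2 @ W1                   # shape (1, 3)
b_combined = W2 @ b1 + b2              # shape (1, 1)
one_layer_output = W_combined @ X + b_combined
```

Applying a nonlinearity such as np.tanh to the hidden layer's output breaks this equivalence, which is exactly why hidden layers need nonlinear activations.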

An important point to note here is how a neural network builds on logistic regression by applying nonlinear activation functions like sigmoid, tanh or ReLU.

He goes over the most popular activation functions and explains their strengths and limitations. Sigmoid saturates when its input is very large or very small, which can happen depending on the data; the gradient then becomes so small that learning slows down dramatically. On the other hand, sigmoid squashes its output into (0, 1), which gives it a probability-like interpretation. He explains that the hyperbolic tangent (tanh) usually works better than sigmoid in hidden layers because its output is centered around zero, but it also saturates for large positive or negative input. ReLU does not saturate for positive input, but its gradient is zero for negative input; Leaky ReLU addresses this with a small slope for negative input. He mentions that ReLU is probably the most widely used activation function to date. Except for the output layer in binary classification, where sigmoid is a better choice (two-class output), ReLU is pretty much always a good default.
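The saturation argument is easiest to see by evaluating each activation's derivative. This sketch defines the four functions and their gradients; the Leaky ReLU slope of 0.01 is an assumed, commonly used value rather than something fixed by the course.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_grad(z):
    s = sigmoid(z)
    return s * (1 - s)                 # at most 0.25, near 0 for large |z|

def tanh_grad(z):
    return 1 - np.tanh(z) ** 2         # equals 1 at z = 0, near 0 for large |z|

def relu(z):
    return np.maximum(0, z)

def relu_grad(z):
    # The derivative at exactly z = 0 is undefined; 0 is used here by convention
    return (z > 0).astype(float)

def leaky_relu(z, alpha=0.01):
    return np.where(z > 0, z, alpha * z)

def leaky_relu_grad(z, alpha=0.01):
    return np.where(z > 0, 1.0, alpha)

z = np.array([-10.0, -1.0, 0.0, 1.0, 10.0])
```

Evaluating these on z shows the pattern: sigmoid_grad and tanh_grad are tiny at z = ±10, relu_grad is 0 for every non-positive input, and leaky_relu_grad keeps a small nonzero slope there.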

There is a quiz on the basics of shallow neural networks and a programming assignment that classifies planar data using a neural network with one hidden layer, tanh in the hidden layer and sigmoid in the output layer. As a tip, take notes and work through the math along with Andrew, and you should be fine. Pay close attention to the vectorized form of the implementation, because you will later write it with the numpy library.
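For reference, one forward and backward pass of such a network fits in a short function. This is a sketch, not the assignment's solution: the names init_params and forward_backward, the 0.01 initialization scale and the XOR-like toy labels are assumptions made for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_params(n_x, n_h, n_y, seed=0):
    """Small random weights break the symmetry between hidden units."""
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.normal(size=(n_h, n_x)) * 0.01,
        "b1": np.zeros((n_h, 1)),
        "W2": rng.normal(size=(n_y, n_h)) * 0.01,
        "b2": np.zeros((n_y, 1)),
    }

def forward_backward(params, X, Y):
    """One forward and backward pass; returns the cost and the gradients."""
    m = X.shape[1]
    W1, b1, W2, b2 = params["W1"], params["b1"], params["W2"], params["b2"]
    # Forward propagation: tanh hidden layer, sigmoid output layer
    Z1 = W1 @ X + b1
    A1 = np.tanh(Z1)
    Z2 = W2 @ A1 + b2
    A2 = sigmoid(Z2)
    cost = -np.mean(Y * np.log(A2) + (1 - Y) * np.log(1 - A2))
    # Backward propagation: chain rule, layer by layer
    dZ2 = A2 - Y
    dW2 = dZ2 @ A1.T / m
    db2 = np.sum(dZ2, axis=1, keepdims=True) / m
    dZ1 = (W2.T @ dZ2) * (1 - A1 ** 2)   # tanh'(z) = 1 - tanh(z)^2
    dW1 = dZ1 @ X.T / m
    db1 = np.sum(dZ1, axis=1, keepdims=True) / m
    return cost, {"dW1": dW1, "db1": db1, "dW2": dW2, "db2": db2}

# Toy problem: 2 features, 4 hidden units, 1 output, 8 examples
rng = np.random.default_rng(5)
X = rng.normal(size=(2, 8))
Y = (X[0:1, :] * X[1:2, :] > 0).astype(float)   # XOR-like labels, not linearly separable
params = init_params(n_x=2, n_h=4, n_y=1)
cost, grads = forward_backward(params, X, Y)
```

Note that every gradient has the same shape as the parameter it updates, which is a handy sanity check when writing the vectorized version yourself.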

OK, with this I conclude the week-3 summary. Best of luck, everyone.

Week-4:

This is the last week of the first course of the five-course specialization. This week we make the neural network larger in terms of the number of layers. So far we have seen logistic regression, which is a neural network without a hidden layer, and a neural network with one hidden layer and a nonlinear activation. In the programming assignments we were achieving about 80% training accuracy, and now it is time to reduce the model's bias by building a bigger network.

Note that a larger model can learn more complex features from the data and will improve training accuracy, but if the model size (the number of layers or neurons per layer) grows out of proportion to the number of training examples, the model will overfit and generalize poorly on test data.

This week Andrew explains, step by step, how the same computations we learned in previous weeks, the linear output $Z^{[l]} = W^{[l]}A^{[l-1]} + b^{[l]}$ (with $A^{[0]} = X$) and the activation $A^{[l]} = g^{[l]}(Z^{[l]})$, are repeated layer after layer with correctly sized $W$ and $b$. Note that $W^{[l]}$ always has shape $(n^{[l]}, n^{[l-1]})$, where $n^{[l]}$ is the number of neurons in layer $l$. For the first hidden layer, $n^{[0]}$ is the number of input features (the rows of $X$); for every later layer, including the output layer, it is the number of neurons in the previous layer. The bias $b^{[l]}$ is a column vector of shape $(n^{[l]}, 1)$.
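The shape rules translate directly into a loop over layers. This sketch assumes ReLU in the hidden layers and a sigmoid output; the layer sizes, the 0.01 initialization scale and the function names init_deep and forward_deep are hypothetical choices for illustration.

```python
import numpy as np

def init_deep(layer_dims, seed=0):
    """layer_dims = [n_0, n_1, ..., n_L]; W_l is (n_l, n_{l-1}), b_l is (n_l, 1)."""
    rng = np.random.default_rng(seed)
    params = {}
    for l in range(1, len(layer_dims)):
        params[f"W{l}"] = rng.normal(size=(layer_dims[l], layer_dims[l - 1])) * 0.01
        params[f"b{l}"] = np.zeros((layer_dims[l], 1))
    return params

def forward_deep(X, params):
    """Repeat Z = W A_prev + b and A = g(Z) for every layer."""
    L = len(params) // 2          # one W and one b per layer
    A = X                         # A_0 = X
    for l in range(1, L + 1):
        Z = params[f"W{l}"] @ A + params[f"b{l}"]
        if l == L:
            A = 1.0 / (1.0 + np.exp(-Z))   # sigmoid output layer
        else:
            A = np.maximum(0, Z)           # ReLU hidden layers
    return A

# Hypothetical network: 5 input features, hidden layers of 4 and 3 units, 1 output
layer_dims = [5, 4, 3, 1]
params = init_deep(layer_dims)
X = np.random.default_rng(6).normal(size=(5, 10))   # 10 examples as columns
AL = forward_deep(X, params)
```

Here W1 has shape (4, 5), W2 has shape (3, 4), W3 has shape (1, 3), and the output AL has one probability per example. Because each layer only needs $n^{[l]}$ and $n^{[l-1]}$, changing layer_dims is all it takes to try a different depth.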

There is a programming exercise where you build a deeper network and evaluate its training and test accuracy. You will try different numbers of layers and plot learning curves to see which configuration gives better accuracy.

With this I conclude my summary of the first course. I would like to stress a few pointers to get the most out of it:

1> Pay attention at all times, and rewind the videos to capture important points that are discussed and explained in detail.

2> Take notes, and write down the formulas and steps along with Andrew as he writes them.

3> Be sure to understand the vectorized implementation of math operations; no company will pay you for writing a for loop where vectorization was possible.