The goal of this course is to give learners a basic understanding of modern neural networks and their applications in computer vision and natural language understanding. The course starts with a recap of linear models and a discussion of the stochastic optimization methods that are crucial for training deep neural networks. Learners will study all the popular building blocks of neural networks, including fully connected, convolutional, and recurrent layers.
Learners will use these building blocks to define complex modern architectures in the TensorFlow and Keras frameworks. In the course project, learners will implement a deep neural network for image captioning, the task of generating a text description for an input image.
The prerequisites for this course are:
1) Basic knowledge of Python.
2) Basic linear algebra and probability.
Please note that this is an advanced course and we assume basic knowledge of machine learning. You should understand:
1) Linear regression: mean squared error, analytical solution.
2) Logistic regression: model, cross-entropy loss, class probability estimation.
3) Gradient descent for linear models. Derivatives of MSE and cross-entropy loss functions.
4) The problem of overfitting.
5) Regularization for linear models.
Do you have technical problems? Write to us: coursera@hse.ru

Taught by

Evgeny Sokolov

Andrei Zimovnov

Alexander Panin

Ekaterina Lobacheva

Nikita Kazeev

Transcript

[MUSIC] Welcome back. We have just studied how deep learning works and how to train your [INAUDIBLE]. With some luck, you've already made it through the practical assignments, so you know that deep learning can help when one of your models just doesn't cut it. You probably hope that this will repeat itself on other [INAUDIBLE]. To a large extent this is true, but today let's talk about where it isn't, and about what deep learning is not. You know some of these things already. Let's start with the limitations. The one thing deep learning is not is magic. It won't just solve all your problems for you. It won't be a silver bullet that you can unpack from [INAUDIBLE] and expect to do much better than anything you have tried for years. This is what a lot of people expect from neural networks, but please don't, because they won't solve your problems for free. Instead, deep learning is a practical field. It has its strengths, which we'll talk about in the second part, but it also has its weak points. For one thing, deep learning lacks a core theoretical understanding. This sounds like a lame accusation against a practical [INAUDIBLE], since the absence of a theory isn't obviously preventing it from working. But when you try to build an architecture or develop something new for a model, the absence of a theoretical core that could explain things for you forces you to do a lot more experimentation. In fact, deep learning only offers you some [INAUDIBLE], like "this works" or "this idea applies wherever you have this situation". Those intuitive rules are not 100% accurate, and this is a problem if you want to develop something new. Another problem is that, to learn complex dependencies, neural networks and deep learning models in general have a lot of parameters.
This means not only that they can capture complex dependencies in the data, but also that they can overfit tremendously. So for any given problem, if you use a neural network, you generally need a much larger dataset to train on than you would with linear models or decision trees. Whenever you end up in a new area that is not image classification or text processing, you will sometimes find that, for practical reasons, it's better to use decision trees or even linear models. Finally, deep learning models are computationally heavy. Whenever you want your machine learning to run super fast or to require as little memory as possible, for example if you're running on smartphones or embedded systems, you'll generally have to do some dark magic to make your neural network run as fast as you require. This isn't true for, say, linear models, which apply almost instantaneously. There's one more disadvantage. Well, it's hard to call it purely a disadvantage, since it has some strong points too: deep learning is pathologically overhyped. Machine learning, the superdomain of deep learning, is overhyped as well, but deep learning is the most advertised, the hottest topic within the hottest area of mathematics, which is machine learning. This is good, because deep learning attracts a lot of talented researchers and practitioners. But the problem is that, since it's so hyped, a lot of people expect wonders from it. So if you're trying to apply deep learning in business, you'll sometimes find yourself in the company of people who don't understand deep learning. They believe that it's some super artificial intelligence, big data, blah blah, yada yada yada, that it will get you the top position in the business and solve all your problems for you. So you should not expect deep learning to work wonders yourself, because wonders, as you know, require a lot of hard work.
You also have to fight with other people who expect otherwise. All those arguments paint a rather grim picture of what deep learning is, but it has a lot of positive sides as well. For one, you can think of deep learning as a kind of machine learning language. Like any language, it is a tool to express something. A natural language is a tool with which you can express something to other humans. A programming language is a means to express what you want your computer to do in a way that the computer can execute. And deep learning, in turn, is a language that allows you to hint to your machine learning model what you want it to learn: hints about what kind of features you want it to have, and what kind of expert knowledge can be applied to this dataset. Let's go through a few examples to prove this point. Let's say you have a usual classification problem. You have two sets of features: the raw low-level features and the high-level features. And you want to predict some kind of target given these. This whole thing is beginning to sound a little abstract, so let's get to a concrete scenario. Say you want to do regression on the price of a car, a second-hand car to be accurate. You have a photo of the car and some high-level features like the brand, the model, maybe the production date, and any blemishes and enhancements installed in this car. You want to build a model that uses both of those feature types. The simplest way to do so is to just concatenate them and feed them into a whole [INAUDIBLE], using a neural network as your model, whatever. Of course you can do that, but the problem is that this approach is kind of inefficient. If we speak about neural networks, the resulting model would look like this, for example. And the main problem with this model is that the first dense layer tries to combine two worlds, two domains of features, and it tries to combine them linearly.
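To make the "combining two worlds linearly" point concrete, here is a minimal plain-Python sketch of what a single unit of that first dense layer computes on the concatenated input. The tiny 2x2 "photo", the two high-level features (age and number of previous owners), and all weights are made-up illustrative values, not anything from the lecture or from a trained model.

```python
# Hypothetical toy inputs (assumptions for illustration only).
pixels = [0.1, 0.8, 0.5, 0.3]   # raw pixel intensities of a 2x2 grayscale image
high_level = [7.0, 120.0]       # age in years, a second made-up numeric attribute


def dense_unit(inputs, weights, bias):
    """Pre-activation of one dense-layer unit: a weighted sum plus a bias."""
    return sum(w * x for w, x in zip(weights, inputs)) + bias


# Naive approach: concatenate both feature types into one flat vector.
features = pixels + high_level

# Arbitrary, untrained weights: one per input, regardless of what it measures.
weights = [0.5, -0.2, 0.1, 0.4, 0.03, -0.01]

# A single number that adds scaled pixel intensities to scaled years.
pre_activation = dense_unit(features, weights, bias=0.0)
print(pre_activation)
```

Each unit of the first dense layer forms exactly this kind of sum, so a weight on a pixel intensity and a weight on the car's age end up in the same linear combination.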
So what the first dense layer does is take the age, measured in years or months, multiply it by some coefficient, and add it to a pixel intensity. It's technically possible. I mean, no one will punish you for doing so unless there's a physicist nearby. But it's kind of [INAUDIBLE], and in practical applications this architecture tends to work worse than it otherwise could. What you can do is say the following thing in this language. You can say that you want to build a representation for those raw features that is as complex as the high-level ones. The way you express this is by, well, stacking more layers. Now you basically have two branches of data, and for some time you process them independently. You take the raw features and apply dense layers, maybe two or three stacked dense layers, that only extract features from raw image pixels. And only then, once you've got features like the presence of a blemish, or maybe a crack in the front glass, or anything else, do you combine them with the high-level features you've got. Now it makes slightly more sense, although it's not a perfect model. In general, stacking more layers to extract features also results in more abstract kinds of features. And if you stack enough layers, you'll eventually get features that are easy to combine. So let's now consider a similar, although slightly different, problem. This time we're still solving car price regression, but we also want to infuse another [INAUDIBLE]. Let's say that, through some kind of external information that we've got, we don't want our network to trust the image data too enthusiastically. For example, I might be unwilling to trust the car dealers that much. Let's say that some of their images have been shown to be too optimistic, showing a car in better condition than the actual one. By default, our network does the exact opposite.
It trusts the images too much, because there are, say, 10,000 image pixels, 100 by 100 pixels, and only, say, 100 attributes that are high-level features. So we want to do the opposite. You can of course achieve this by means of the usual machine learning, simply by over-regularizing the raw features, the pixels, or maybe you're [INAUDIBLE] here. But in deep learning, you can also do this by means of architecture. In this case, we have introduced a thing called the bottleneck layer. It's this one layer with 32 units, which is much smaller than any other layer. It's a bottleneck, so any information that the neural network takes from the image has to go through this layer. It limits the amount of useful features your model can get from the image, and biases it toward trusting raw image features less. This is of course not guaranteed. Technically, if you fit your model for too long, it might just encode everything in some super-complex non-linear dependency and still get all the information through. But it's one way you can approach this problem. [MUSIC]
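The effect of the bottleneck can be sketched with simple bookkeeping. The plain-Python snippet below uses the numbers mentioned in the lecture (100 by 100 pixels, about 100 high-level attributes, a 32-unit bottleneck). The width of the other dense layers (256) and the choice of three layers in the image branch are assumptions made only for illustration, since the slide with the actual architecture is not part of the transcript.

```python
def dense_params(n_in, n_out):
    """Trainable parameters of a dense layer: one weight per input-output pair plus biases."""
    return n_in * n_out + n_out


N_PIXELS = 100 * 100   # from the lecture: a 100 by 100 image
N_ATTRS = 100          # from the lecture: roughly 100 high-level features
BOTTLENECK = 32        # from the lecture: the 32-unit bottleneck layer
HIDDEN = 256           # assumption: width of the other dense layers

# Naive model: pixels and attributes are concatenated before the first dense layer.
naive_inputs = N_PIXELS + N_ATTRS
naive_image_share = N_PIXELS / naive_inputs        # about 0.99

# Two-branch model: the image branch is squeezed through the bottleneck
# before it is concatenated with the high-level attributes.
image_branch = [(N_PIXELS, HIDDEN), (HIDDEN, HIDDEN), (HIDDEN, BOTTLENECK)]
image_branch_params = sum(dense_params(n_in, n_out) for n_in, n_out in image_branch)

merged_inputs = BOTTLENECK + N_ATTRS
merged_image_share = BOTTLENECK / merged_inputs    # about 0.24

print(f"image share of inputs, naive model:      {naive_image_share:.2f}")
print(f"image share of inputs, after bottleneck: {merged_image_share:.2f}")
print(f"parameters in the image branch:          {image_branch_params}")
```

Under these assumptions, the layer that merges the two feature types sees 32 image-derived values next to 100 attributes, instead of 10,000 pixels next to 100 attributes. The image branch itself still has millions of parameters, so the bottleneck narrows the channel rather than shrinking the model, which fits the caveat that the effect is not guaranteed.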