Incorporating Momentum Into Neural Networks Learning

Newton’s cradle is the most popular example of momentum conservation. A lifted and released sphere strikes the stationary spheres, and the force is transmitted through them. This pushes the last sphere upward, which shows that the last ball receives the momentum of the first ball. We can apply a similar principle in neural networks to improve learning speed. The idea of including momentum in neural network learning is to incorporate the previous update into the current change.

Newton’s cradle

With a suitable learning rate, gradient descent approaches a local minimum as the number of iterations goes to infinity. However, that is not applicable in reality: training has to be stopped after a reasonable number of iterations. Moreover, gradient descent converges slowly. This is where momentum helps, because it improves the performance of gradient descent considerably. The cost might converge in fewer iterations if momentum is included in the weight update formula.
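The weight update with momentum can be sketched in a few lines of Python. This is a minimal illustration rather than the exact code behind this post: the function name `momentum_update`, the default `learning_rate` and `momentum` values, the toy cost function, and the use of NumPy arrays are all assumptions made for the example.

```python
import numpy as np

def momentum_update(weights, previous_change, gradient,
                    learning_rate=0.1, momentum=0.9):
    """Return the updated weights and the change that was applied.

    All names and default values here are illustrative assumptions.
    """
    # Current change = plain gradient descent step
    #                  + a fraction of the previous change
    change = -learning_rate * gradient + momentum * previous_change
    return weights + change, change


# Toy usage on cost(w) = sum(w ** 2), whose gradient is 2 * w
w = np.array([1.0, -2.0])
change = np.zeros_like(w)  # there is no previous update before the first step
for _ in range(2):
    w, change = momentum_update(w, change, gradient=2 * w)
```

Setting `momentum=0` in this sketch recovers the standard gradient descent update, so the momentum term is the only difference between the two.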

Furthermore, momentum changes the path you take toward a minimum. Standard gradient descent might get stuck in a local minimum that is far away from the global minimum. With momentum, the accumulated updates can carry the weights past a shallow local minimum, so the search might reach the global minimum, although this is not guaranteed.

In this way, we can improve the learning speed as demonstrated below. For instance, standard gradient descent reduces the cost to 0.09 at the 50th iteration, whereas gradient descent with momentum reaches the same cost at the 26th iteration. This means that momentum gets to the same cost almost 2 times faster. In other words, the number of iterations drops by almost 50%.
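A comparison of this kind can be tried on a small toy problem. The sketch below is not the experiment reported in this post: the quadratic cost, the starting point, the learning rate of 0.02, the momentum value of 0.9, and the target cost are all assumptions chosen only to show how the iteration counts can be measured.

```python
import numpy as np

# Assumed toy cost: 0.5 * (w1**2 + 25 * w2**2),
# steep in one direction and flat in the other
CURVATURE = np.array([1.0, 25.0])

def cost(w):
    return 0.5 * np.sum(CURVATURE * w ** 2)

def gradient(w):
    return CURVATURE * w

def iterations_to_reach(target_cost, momentum,
                        learning_rate=0.02, max_iterations=10_000):
    """Count the updates needed until the cost first drops below target_cost."""
    w = np.array([1.0, 1.0])
    change = np.zeros_like(w)
    for iteration in range(1, max_iterations + 1):
        # momentum=0.0 reduces this line to standard gradient descent
        change = -learning_rate * gradient(w) + momentum * change
        w = w + change
        if cost(w) < target_cost:
            return iteration
    return None  # target not reached within max_iterations


plain_iterations = iterations_to_reach(1e-6, momentum=0.0)
momentum_iterations = iterations_to_reach(1e-6, momentum=0.9)
print("standard gradient descent:", plain_iterations)
print("with momentum:", momentum_iterations)
```

The exact counts depend on the assumed cost function, learning rate, and momentum value, so they are not expected to match the 50 and 26 iterations reported above. What the sketch shows is only the general pattern: on this toy cost, the run with momentum needs fewer iterations to reach the same target.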

So, we’ve focused on incorporating momentum into the weight update procedure of neural networks. Although momentum is an optional add-on, it is very common in real-world applications because it can improve convergence considerably.