無用之用，方為大用 (“The usefulness of the useless is the greatest usefulness”)


Nesterov’s Accelerated Gradient Descent

In this lecture we consider the same setting as in the previous post, that is, we want to minimize a $\beta$-smooth convex function $f$ over $\mathbb{R}^n$. Previously we saw that the plain Gradient Descent algorithm has a rate of convergence of order $\beta/t$ after $t$ steps, while the lower bound that we proved is of order $\beta/t^2$.
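To see why the gap between the rate of order $\beta/t$ (plain Gradient Descent) and the lower bound of order $\beta/t^2$ matters, one can invert each rate to get the number of steps needed to reach accuracy $\varepsilon$. This ignores constants and the dependence on the initial point, just as the order-of-magnitude statements do:

```latex
\frac{\beta}{t} \le \varepsilon \iff t \ge \frac{\beta}{\varepsilon},
\qquad\qquad
\frac{\beta}{t^2} \le \varepsilon \iff t \ge \sqrt{\frac{\beta}{\varepsilon}}.
```

For instance, with $\beta = 1$ and $\varepsilon = 10^{-6}$ this is on the order of $10^6$ steps for a $\beta/t$ rate versus $10^3$ steps for a $\beta/t^2$ rate.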

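We now present a beautiful algorithm due to Nesterov, called Nesterov’s Accelerated Gradient Descent, which attains a rate of order $\beta/t^2$. First we define the following sequences:

$$\lambda_0 = 0, \qquad \lambda_s = \frac{1 + \sqrt{1 + 4\lambda_{s-1}^2}}{2}, \qquad \gamma_s = \frac{1 - \lambda_s}{\lambda_{s+1}}.$$

(Note that $\gamma_s \leq 0$.) Now the algorithm is simply defined by the following equations, with an arbitrary initial point $x_1 = y_1$:

$$y_{s+1} = x_s - \frac{1}{\beta} \nabla f(x_s),$$

$$x_{s+1} = (1 - \gamma_s)\, y_{s+1} + \gamma_s\, y_s.$$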
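The algorithm can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not a reference implementation: vectors are plain Python lists, the gradient is passed in as a callable `grad`, and the names `nesterov_sequences` and `nesterov_agd`, as well as the diagonal quadratic used in the usage lines, are hypothetical choices for illustration. It uses $\lambda_0 = 0$, $\lambda_s = (1 + \sqrt{1 + 4\lambda_{s-1}^2})/2$, $\gamma_s = (1 - \lambda_s)/\lambda_{s+1}$, the updates $y_{s+1} = x_s - \frac{1}{\beta}\nabla f(x_s)$ and $x_{s+1} = (1 - \gamma_s) y_{s+1} + \gamma_s y_s$, and starts from $x_1 = y_1$.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def nesterov_sequences(num_steps: int) -> Tuple[List[float], List[float]]:
    """Return (lambdas, gammas).

    lambdas[s] is lambda_s for s = 0..num_steps+1;
    gammas[s] is gamma_s for s = 1..num_steps (gammas[0] is an unused placeholder).
    """
    lambdas = [0.0]  # lambda_0 = 0
    for _ in range(num_steps + 1):
        prev = lambdas[-1]
        lambdas.append((1.0 + math.sqrt(1.0 + 4.0 * prev * prev)) / 2.0)
    gammas = [0.0]  # placeholder so that gammas[s] matches gamma_s
    for s in range(1, num_steps + 1):
        gammas.append((1.0 - lambdas[s]) / lambdas[s + 1])
    return lambdas, gammas


def nesterov_agd(grad: Callable[[Vector], Vector], beta: float,
                 x1: Sequence[float], num_steps: int) -> Vector:
    """Run num_steps iterations from x_1 = y_1 and return the last y iterate."""
    _, gammas = nesterov_sequences(num_steps)
    x = list(x1)
    y = list(x1)  # y_1 = x_1
    for s in range(1, num_steps + 1):
        g = grad(x)
        # Gradient step: y_{s+1} = x_s - (1/beta) * grad f(x_s)
        y_next = [xi - gi / beta for xi, gi in zip(x, g)]
        # Slide: x_{s+1} = (1 - gamma_s) * y_{s+1} + gamma_s * y_s
        x = [(1.0 - gammas[s]) * yn + gammas[s] * yp for yn, yp in zip(y_next, y)]
        y = y_next
    return y


# Hypothetical test problem: f(z) = 0.5 * sum(a_i * z_i^2), minimized at z = 0.
a = [1.0, 100.0]
beta = max(a)  # for this quadratic, the smoothness constant is the largest curvature


def f(z: Vector) -> float:
    return 0.5 * sum(ai * zi * zi for ai, zi in zip(a, z))


def grad_f(z: Vector) -> Vector:
    return [ai * zi for ai, zi in zip(a, z)]


y_final = nesterov_agd(grad_f, beta, [1.0, 1.0], num_steps=200)
print(f(y_final))
```

Any other $\beta$-smooth convex function could be substituted by supplying its gradient as `grad`; the quadratic is used only because its minimizer and smoothness constant are known exactly.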

In other words, Nesterov’s Accelerated Gradient Descent performs a simple gradient descent step to go from $x_s$ to $y_{s+1}$, and then it ‘slides’ a little further than $y_{s+1}$, in the direction $y_{s+1} - y_s$ determined by the previous point $y_s$.
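The ‘sliding’ can be made explicit by rewriting the second update of the algorithm, $x_{s+1} = (1 - \gamma_s) y_{s+1} + \gamma_s y_s$, using only the definitions $\lambda_0 = 0$, $\lambda_s = (1 + \sqrt{1 + 4\lambda_{s-1}^2})/2$ and $\gamma_s = (1 - \lambda_s)/\lambda_{s+1}$:

```latex
x_{s+1} = (1-\gamma_s)\, y_{s+1} + \gamma_s\, y_s
        = y_{s+1} + (-\gamma_s)\,(y_{s+1} - y_s),
\qquad
-\gamma_s = \frac{\lambda_s - 1}{\lambda_{s+1}} \in [0, 1).
```

Since $\lambda_1 = 1$ and $\lambda_{s+1} > \lambda_s$, the coefficient $-\gamma_s$ (often called the momentum coefficient) equals $0$ at $s = 1$ and stays below $1$. It tends to $1$ as $s \to \infty$, because $\lambda_{s+1} - \lambda_s \to 1/2$ while $\lambda_s \to \infty$, so later iterates carry an increasingly large fraction of the previous displacement $y_{s+1} - y_s$.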

The intuition behind the algorithm is quite difficult to grasp, and unfortunately the analysis will not be very enlightening either. Nonetheless, Nesterov’s Accelerated Gradient Descent is an optimal method (in terms of oracle complexity) for smooth convex optimization, as shown by the following theorem.
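As a numerical illustration (not a proof of optimality), one can compare plain gradient descent with step size $1/\beta$ against the accelerated scheme on a hypothetical ill-conditioned quadratic $f(z) = \tfrac12 (z_1^2 + 10^{-4} z_2^2)$, for which $\beta = 1$ and the minimizer is $0$. The helper names `gradient_descent` and `accelerated` are illustrative; the accelerated loop restates the recursions for $\lambda_s$, $\gamma_s$, $x_s$ and $y_s$ from the algorithm.

```python
import math
from typing import List

Vector = List[float]

A = [1.0, 1e-4]  # hypothetical curvatures: f(z) = 0.5 * sum(A_i * z_i^2)
BETA = max(A)    # smoothness constant of this quadratic


def f(z: Vector) -> float:
    return 0.5 * sum(ai * zi * zi for ai, zi in zip(A, z))


def grad_f(z: Vector) -> Vector:
    return [ai * zi for ai, zi in zip(A, z)]


def gradient_descent(z1: Vector, num_steps: int) -> Vector:
    """Plain gradient descent with step size 1/beta."""
    z = list(z1)
    for _ in range(num_steps):
        z = [zi - gi / BETA for zi, gi in zip(z, grad_f(z))]
    return z


def accelerated(z1: Vector, num_steps: int) -> Vector:
    """Gradient step to y_{s+1}, then slide to x_{s+1}; returns the last y."""
    x, y = list(z1), list(z1)  # x_1 = y_1
    lam = 1.0  # lambda_1, since lambda_0 = 0
    for _ in range(num_steps):
        lam_next = (1.0 + math.sqrt(1.0 + 4.0 * lam * lam)) / 2.0  # lambda_{s+1}
        gamma = (1.0 - lam) / lam_next                              # gamma_s
        y_next = [xi - gi / BETA for xi, gi in zip(x, grad_f(x))]
        x = [(1.0 - gamma) * yn + gamma * yp for yn, yp in zip(y_next, y)]
        y, lam = y_next, lam_next
    return y


start = [1.0, 1.0]
steps = 1000
print("GD  gap:", f(gradient_descent(start, steps)))
print("AGD gap:", f(accelerated(start, steps)))
```

On this problem a single gradient step already solves the first coordinate, so the gap is driven by the flat second coordinate, where plain gradient descent only shrinks the iterate by a factor $1 - 10^{-4}$ per step.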