First published: 2016/09/15 (11 months ago)Abstract: Gradient descent optimization algorithms, while increasingly popular, are
often used as black-box optimizers, as practical explanations of their
strengths and weaknesses are hard to come by. This article aims to provide the
reader with intuitions with regard to the behaviour of different algorithms
that will allow her to put them to use. In the course of this overview, we look
at different variants of gradient descent, summarize challenges, introduce the
most common optimization algorithms, review architectures in a parallel and
distributed setting, and investigate additional strategies for optimizing
gradient descent.