Accelerating SGD for Distributed Deep-Learning Using an Approximated Hessian Matrix

Sebastien Arnold, Chunming Wang

Abstract: We introduce a novel method to compute a rank $m$ approximation of the inverse of the Hessian matrix in the distributed regime. By leveraging the differences in gradients and parameters across multiple workers, we are able to efficiently implement a distributed approximation of the Newton-Raphson method. We also present preliminary results which underline the advantages and challenges of second-order methods for large stochastic optimization problems. In particular, our work suggests that novel strategies for combining gradients will provide further information on the loss surface.

TL;DR: We introduce a novel method to compute a rank $m$ approximation of the inverse of the Hessian matrix in the distributed regime.
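The abstract describes building a rank $m$ approximation of the inverse Hessian from differences in gradients and parameters collected across workers. The sketch below is only an illustration of that general idea using a standard quasi-Newton (L-BFGS-style) two-loop recursion over worker-supplied secant pairs; the function name, pairing scheme, and toy data are assumptions, not the authors' exact algorithm.

```python
# Hedged sketch: apply a rank-m approximation of the inverse Hessian to a
# gradient, built from (parameter-difference, gradient-difference) pairs that
# could be gathered from m workers. Illustrative only.
import numpy as np

def approx_inverse_hessian_apply(grad, s_pairs, y_pairs):
    """Approximate H^{-1} @ grad via the L-BFGS two-loop recursion.

    s_pairs: parameter differences (x_{k+1} - x_k), one per worker/step
    y_pairs: gradient differences  (g_{k+1} - g_k), matched to s_pairs
    """
    q = grad.copy()
    history = []
    # First loop: project out stored curvature directions, most recent first.
    for s, y in zip(reversed(s_pairs), reversed(y_pairs)):
        rho = 1.0 / np.dot(y, s)
        alpha = rho * np.dot(s, q)
        q -= alpha * y
        history.append((alpha, rho, s, y))
    # Initial scaling of the inverse Hessian from the most recent pair.
    s_last, y_last = s_pairs[-1], y_pairs[-1]
    gamma = np.dot(s_last, y_last) / np.dot(y_last, y_last)
    r = gamma * q
    # Second loop: re-introduce curvature corrections, oldest first.
    for alpha, rho, s, y in reversed(history):
        beta = rho * np.dot(y, r)
        r += (alpha - beta) * s
    return r  # approximates H^{-1} @ grad

# Toy usage: m workers each contribute one (s, y) pair from their local updates.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    dim, m = 10, 4
    s_pairs = [rng.normal(size=dim) for _ in range(m)]
    y_pairs = [s + 0.1 * rng.normal(size=dim) for s in s_pairs]  # synthetic curvature
    grad = rng.normal(size=dim)
    step = approx_inverse_hessian_apply(grad, s_pairs, y_pairs)
    print(step)
```

In a distributed setting, each worker could contribute its local parameter and gradient differences, giving a rank $m$ curvature estimate without ever forming the full Hessian; the recursion above costs only $O(m d)$ per application.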

