Iterate Averaging as Regularization for Stochastic Gradient Descent

Abstract

We propose and analyze a variant of the classic Polyak–Ruppert averaging scheme, widely used in stochastic gradient methods. Rather than a uniform average of the iterates, we consider a weighted average, with weights decaying in a geometric fashion. In the context of linear least-squares regression, we show that this averaging scheme has the same regularizing effect as, and is indeed asymptotically equivalent to, ridge regression. In particular, we derive finite-sample bounds for the proposed approach that match the best known results for regularized stochastic gradient methods.
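
To make the averaging scheme concrete, the following is a short, illustrative Python/NumPy sketch (ours, not the paper's pseudocode). It runs plain SGD on a synthetic least-squares problem and maintains a weighted average in which the weight of the t-th iterate is proportional to beta**t, one natural reading of "weights decaying in a geometric fashion". The names eta, beta, and lam, and the mapping lam = (1 - beta) / (beta * eta) used for the ridge comparison, are illustrative assumptions rather than the paper's exact statement.

# Illustrative sketch, not the paper's exact algorithm: plain SGD on a
# synthetic least-squares problem, plus a weighted average of the iterates
# in which iterate t has weight proportional to beta**t, so later iterates
# are discounted geometrically.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: y = X @ w_star + noise.
n, d = 2000, 10
X = rng.standard_normal((n, d))
w_star = rng.standard_normal(d)
y = X @ w_star + 0.5 * rng.standard_normal(n)

eta = 0.01    # SGD step size (illustrative choice)
beta = 0.995  # geometric decay of the averaging weights, in (0, 1)

w = np.zeros(d)      # current SGD iterate
numer = np.zeros(d)  # running weighted sum of the iterates
denom = 0.0          # running sum of the weights
coef = 1.0           # current weight, beta**t

for t in range(n):
    x_t, y_t = X[t], y[t]
    grad = (x_t @ w - y_t) * x_t  # stochastic gradient of the squared loss
    w = w - eta * grad            # unregularized SGD step

    numer += coef * w             # iterate t enters with weight beta**t
    denom += coef
    coef *= beta

w_avg = numer / denom  # geometrically weighted average of the iterates

# Closed-form ridge solution for comparison. The regularization level
# lam = (1 - beta) / (beta * eta) is a heuristic stand-in for the kind of
# correspondence the paper establishes; see the paper for the exact mapping.
lam = (1.0 - beta) / (beta * eta)
w_ridge = np.linalg.solve(X.T @ X / n + lam * np.eye(d), X.T @ y / n)

print("||w_avg  - w_ridge|| =", np.linalg.norm(w_avg - w_ridge))
print("||w_last - w_ridge|| =", np.linalg.norm(w - w_ridge))

Taking beta close to 1 recovers the uniform Polyak–Ruppert average (and the heuristic regularization level above tends to zero), while smaller beta discounts late iterates more heavily and corresponds to a stronger effective ridge penalty.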

Related Material

@InProceedings{pmlr-v75-neu18a,
title = {Iterate Averaging as Regularization for Stochastic Gradient Descent},
author = {Neu, Gergely and Rosasco, Lorenzo},
booktitle = {Proceedings of the 31st Conference On Learning Theory},
pages = {3222--3242},
year = {2018},
editor = {Bubeck, S\'ebastien and Perchet, Vianney and Rigollet, Philippe},
volume = {75},
series = {Proceedings of Machine Learning Research},
month = {06--09 Jul},
publisher = {PMLR},
pdf = {http://proceedings.mlr.press/v75/neu18a/neu18a.pdf},
url = {http://proceedings.mlr.press/v75/neu18a.html},
abstract = {We propose and analyze a variant of the classic Polyak–Ruppert averaging scheme, widely used in stochastic gradient methods. Rather than a uniform average of the iterates, we consider a weighted average, with weights decaying in a geometric fashion. In the context of linear least-squares regression, we show that this averaging scheme has the same regularizing effect as, and is indeed asymptotically equivalent to, ridge regression. In particular, we derive finite-sample bounds for the proposed approach that match the best known results for regularized stochastic gradient methods.}
}
