Abstract

In this paper, we introduce new methods for convex optimization problems with a stochastic inexact oracle. Our first method is an extension of the Intermediate Gradient Method proposed by Devolder, Glineur and Nesterov for problems with a deterministic inexact oracle. Our method can be applied to problems with a composite objective function and with both deterministic and stochastic inexactness of the oracle, and it allows a non-Euclidean setup. We estimate the rate of convergence in terms of the expectation of the non-optimality gap and provide a way to control the probability of large deviations from this rate. We also introduce two modifications of this method for strongly convex problems. For the first modification, we estimate the rate of convergence for the non-optimality gap expectation and, for the second, we provide a bound for the probability of large deviations from the rate of convergence in terms of the expectation of the non-optimality gap. All the rates lead to complexity estimates for the proposed methods which, up to a multiplicative constant, coincide with the lower complexity bound for the considered class of convex composite optimization problems with a stochastic inexact oracle.

Notes

Acknowledgments

The research presented in Sect. 4 of this paper was conducted at IITP RAS and was supported by the Russian Science Foundation grant (project 14-50-00150); the research presented in the other sections was supported by RFBR, research project No. 15-31-20571 mol_a_ved. The authors would like to thank Professor Yurii Nesterov and Professor Arkadi Nemirovski for useful discussions. We are also grateful to two anonymous reviewers for their suggestions, which helped to improve the text.

It remains to calculate the number of oracle calls needed to obtain an \(\varepsilon \)-solution in the sense that \({\mathbb E}\varphi (u_{N}) - \varphi ^* \le \varepsilon \). We perform \(N\) outer iterations (\(k\) runs from 0 to \(N-1\)); on each outer iteration \(k\), we perform \(N_k\) inner iterations; and on each inner iteration, we call the oracle \(m_k\) times. Hence, the total number of oracle calls is \(\sum _{k=0}^{N-1} N_k m_k\).
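Under the counting just described, the total is the sum of the products \(N_k m_k\) over \(k = 0, \dots , N-1\). A minimal Python sketch of this bookkeeping follows; the function name `total_oracle_calls`, its argument names, and the numbers in the usage line are hypothetical illustrations and are not taken from the paper.

```python
from typing import Sequence


def total_oracle_calls(inner_iters: Sequence[int], batch_sizes: Sequence[int]) -> int:
    """Sum N_k * m_k over the outer iterations k = 0, ..., N-1.

    inner_iters[k] plays the role of N_k (inner iterations on outer step k),
    batch_sizes[k] plays the role of m_k (oracle calls per inner iteration).
    Both names are hypothetical, chosen only for this illustration.
    """
    if len(inner_iters) != len(batch_sizes):
        raise ValueError("need one N_k and one m_k per outer iteration")
    return sum(n_k * m_k for n_k, m_k in zip(inner_iters, batch_sizes))


# Illustrative values only: N = 2 outer iterations.
example_total = total_oracle_calls([3, 2], [4, 5])
```

Here the number of outer iterations \(N\) is implicit in the common length of the two sequences, so the sketch only fixes the counting rule and says nothing about how \(N_k\) and \(m_k\) should be chosen.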

Hence, the solution of the problem \(\min _{x \in Q_{k-1}} \varphi (x)\) is the same as the solution of the initial Problem (1). Let us denote \(D_{k-1} := \max _{x,y\in Q_{k-1}} \Vert x-y\Vert \). Clearly, \(D_{k-1} \le 2R_{k-1}\). Note that \(D_{k-1} = R_{k-1} \max _{x,y\in Q_{k-1}}\frac{\Vert x-y\Vert }{R_{k-1}}\), so the diameter of the set \(Q_{k-1}\) with respect to the norm \(\frac{\Vert \cdot \Vert }{R_{k-1}}\) is not greater than 2. We apply Theorems 3.2 and 3.4 with \(LR_{k-1}^2\) in the role of \(L\), \(\frac{\sigma R_{k-1}}{\sqrt{m_{k-1}}}\) in the role of \(\sigma \), \(V\) in the role of \(R\), and 2 in the role of \(D\), use (37), and repeat the argument from the proof of Lemma 4.1. This leads to the following inequality