On Two Continuum Armed Bandit Problems in High Dimensions

Abstract

We consider the problem of continuum armed bandits where the arms are indexed by a compact subset of \(\mathbb {R}^{d}\). For large d, it is well known that mere smoothness assumptions on the reward functions lead to regret bounds that suffer from the curse of dimensionality. A common approach in the literature is to impose further assumptions on the structure of the reward functions. In this work we assume the reward functions to be intrinsically of low dimension k ≪ d and consider two models: (i) the reward functions depend only on an unknown subset of k coordinate variables, and (ii) a generalization of (i) in which the reward functions depend on an unknown k-dimensional subspace of \(\mathbb {R}^{d}\). Under suitable smoothness assumptions on the rewards, we derive randomized algorithms for both problems that achieve nearly optimal regret bounds in terms of the number of rounds n.
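The two structural models can be made concrete with a small sketch. It assumes that model (ii) has the form \(f(\mathbf{x}) = g(\mathbf{A}\mathbf{x})\) with \(\mathbf{A}\) a k × d matrix with orthonormal rows; this parametrization, the link function g and all function names are illustrative assumptions rather than the paper's definitions, and no bandit algorithm is implemented.

```python
import math
import random

# Hypothetical sketch of the two low-dimensional reward models (not the paper's code).
# d: ambient dimension of the arms, k: intrinsic dimension with k << d.


def g(z):
    """Illustrative smooth link function on R^k (an arbitrary choice)."""
    return -sum(t * t for t in z)


def make_coordinate_model(d, k, link, rng):
    """Model (i): f(x) = link(x_S) for an unknown set S of k coordinates."""
    active = sorted(rng.sample(range(d), k))  # the hidden coordinate subset S

    def f(x):
        return link([x[i] for i in active])

    return f, active


def make_subspace_model(d, k, link, rng):
    """Model (ii), assumed form: f(x) = link(A x), A is k x d with orthonormal rows."""
    rows = []
    while len(rows) < k:
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        # Gram-Schmidt: remove components along the rows found so far.
        for r in rows:
            dot = sum(a * b for a, b in zip(v, r))
            v = [a - dot * b for a, b in zip(v, r)]
        norm = math.sqrt(sum(a * a for a in v))
        if norm > 1e-12:
            rows.append([a / norm for a in v])

    def f(x):
        return link([sum(a * b for a, b in zip(row, x)) for row in rows])

    return f, rows


rng = random.Random(0)
d, k = 100, 2
f1, S = make_coordinate_model(d, k, g, rng)
f2, A = make_subspace_model(d, k, g, rng)
x = [rng.uniform(-1.0, 1.0) for _ in range(d)]
print(f1(x), f2(x))
```

Under this assumed form, model (i) is the special case of model (ii) in which the rows of \(\mathbf{A}\) are standard basis vectors, which is why (ii) generalizes (i).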

Notes

Acknowledgments

The project CG Learning acknowledges the financial support of the Future and Emerging Technologies (FET) programme within the Seventh Framework Programme for Research of the European Commission, under FET-Open grant number: 255827.

(A preliminary version of this paper appeared in the proceedings of the 11th Workshop on Approximation and Online Algorithms (WAOA). This is a significantly expanded version including analysis for a generalization of the problem considered in the WAOA paper.)

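with \(\mathbf {L_{1}} = [L_{1,1} {\dots } L_{1,m_{\Phi }}]\) and \(\mathbf {L_{2}} = [L_{2,1} {\dots } L_{2,m_{\Phi }}]\), so that \(\mathbf{N} = \mathbf{L_{1}} - \mathbf{L_{2}}\). By linearity of \(\Phi^{*}\) and the triangle inequality, we then have \(\|\Phi^{*}(\mathbf{N})\| \le \|\Phi^{*}(\mathbf{L_{1}})\| + \|\Phi^{*}(\mathbf{L_{2}})\|\). Using Lemma 1.1 of Candes et al. [14] and denoting \(m=\max \left \{d,m_{\mathcal X}\right \}\), we first have that: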

holds with probability at least \(1 - 2e^{-cm}\), where \(c = \frac {\gamma ^{2}}{2} - 2\log 12\) and \(\gamma > 2\sqrt {\log 12}\) (the latter condition ensures \(c > 0\)). This can be verified using the proof technique of Candes et al. [14, Lemma 1.1]; care must be taken, however, because the entries of \(\mathbf{L_{1}}\) are correlated: they are identical copies of the same Gaussian random variable \(\frac {1}{\epsilon }{\sum }_{j=1}^{m_{\mathcal X}} \eta _{j}\). Furthermore, we have that:

holds with probability at least \(1 - 2e^{-cm}\), with the constants \(c, \gamma\) as defined above. This again follows from the proof technique of Candes et al. [14, Lemma 1.1], here directly, since the entries of \(\mathbf{L_{2}}\) are i.i.d. Gaussian random variables. Since each of (51) and (52) fails with probability at most \(2e^{-cm}\), combining them via a union bound shows that the following holds with probability at least \(1 - 4e^{-cm}\).
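The probability bookkeeping above can be checked numerically. The sketch below uses hypothetical function names and arbitrary illustrative values of \(\gamma\) and \(m\); it evaluates \(c = \frac{\gamma^{2}}{2} - 2\log 12\), rejects \(\gamma \le 2\sqrt{\log 12}\) (for which \(c \le 0\) and the bounds are vacuous), and applies the union bound that turns two failure probabilities of \(2e^{-cm}\) each into an overall success probability of at least \(1 - 4e^{-cm}\).

```python
import math

# Hypothetical helpers (not from the paper) for the constants in (51)-(52).
LOG12 = math.log(12.0)
GAMMA_MIN = 2.0 * math.sqrt(LOG12)  # gamma must exceed 2*sqrt(log 12)


def concentration_constant(gamma):
    """Return c = gamma^2 / 2 - 2 log 12, which is positive iff gamma > 2 sqrt(log 12)."""
    if gamma <= GAMMA_MIN:
        raise ValueError(f"gamma must exceed 2*sqrt(log 12) = {GAMMA_MIN:.4f}")
    return gamma ** 2 / 2.0 - 2.0 * LOG12


def success_lower_bound(gamma, m, num_events=2):
    """Union bound: if each of num_events bounds fails w.p. <= 2 e^{-cm},
    all of them hold w.p. >= 1 - 2 * num_events * e^{-cm}."""
    c = concentration_constant(gamma)
    return 1.0 - 2.0 * num_events * math.exp(-c * m)


# Illustrative values only: gamma = 4, m = max{d, m_X} = 10.
print(concentration_constant(4.0))      # c for gamma = 4
print(success_lower_bound(4.0, 10, 1))  # one bound:   1 - 2 e^{-cm}
print(success_lower_bound(4.0, 10))     # both bounds: 1 - 4 e^{-cm}
```

Because the bound decays exponentially in \(m = \max\{d, m_{\mathcal X}\}\), even moderate values of \(m\) make the failure probability negligible once \(\gamma\) is chosen comfortably above the threshold.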

A.3 Proof of Lemma 5

Proof

Let τ denote the bound on \(\|\widehat {\mathbf {X}}_{DS}^{(k)} - \mathbf {X}\|_{F}\) stated in Lemma 4. We now use a result of Tyagi et al. [41, Lemma 2], which states that if \(\tau < \frac {\sqrt {(1-\rho )m_{\mathcal {X}}\alpha k}}{\sqrt {k}+\sqrt {2}}\), then it follows that