The Relative Entropy Bound for Squint

Wouter M. Koolen

2015-08-13

Introduction

I want to talk about Squint, a new prediction strategy Tim van Erven and I recently discovered (see Koolen and Van Erven (2015)). Squint operates in the following protocol, called the Hedge setting. Prediction proceeds in rounds. In round \(t\) the learner plays a probability vector \({\boldsymbol{w}}_t\) on experts \(k \in \{1, \ldots, K\}\). Then the adversary reveals the vector \({\boldsymbol{{\ell}}}_t \in [0,1]^K\) of expert losses and the learner incurs the dot loss \({\boldsymbol{w}}_t^{\intercal}{\boldsymbol{{\ell}}}_t\).
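To make the Hedge protocol concrete, here is a minimal numerical sketch of one run. The uniform weights and random losses are illustrative assumptions, not the Squint strategy itself; the point is only the interaction pattern: play \({\boldsymbol{w}}_t\), observe \({\boldsymbol{{\ell}}}_t \in [0,1]^K\), pay the dot loss.

```python
import numpy as np

rng = np.random.default_rng(0)
K, T = 4, 10                       # number of experts, number of rounds

total_loss = 0.0
for t in range(T):
    w = np.full(K, 1.0 / K)        # learner plays a probability vector (uniform as a placeholder)
    losses = rng.uniform(size=K)   # adversary reveals expert losses in [0, 1]^K
    total_loss += w @ losses       # learner incurs the dot loss w_t^T ell_t

print(total_loss)
```

A real strategy would of course update \({\boldsymbol{w}}_t\) from the observed losses; here the weights stay fixed purely to show the protocol's shape.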

Now what happens if we do not want to compare to distributions of the form \(\pi(k|{\mathcal K})\) but to a general distribution \(\rho(k)\)? Intuitively, using abbreviations \(R_T^\rho = \operatorname*{\mathbb E}_{\rho(k)} [R_T^k]\) and \(V_T^\rho = \operatorname*{\mathbb E}_{\rho(k)} [ V_T^k]\), we should be able to prove a relative entropy regret bound\begin{equation}\label{eq:newbd}
R_T^\rho ~\preceq~
\sqrt{V_T^\rho {\left({\operatorname{KL}}{\left(\rho\middle\|\pi\right)} + \ln \ln T\right)}}
.
\end{equation} Now \(\eqref{eq:newbd}\) is a proper generalisation of \(\eqref{eq:orig}\), as applying it to \(\rho(k) = \pi(k|{\mathcal K})\) gives \[{\operatorname{KL}}{\left(\rho\middle\|\pi\right)}
~=~
\operatorname*{\mathbb E}_{\pi(k|{\mathcal K})} {\left[\ln \frac{\pi(k|{\mathcal K})}{\pi(k)}\right]}
~=~
- \ln \pi({\mathcal K}).\] When we wrote the paper, we restricted attention to bounds of the form \(\eqref{eq:orig}\) for simplicity. In related work, Luo and Schapire (2015) give relative entropy bounds for a similar algorithm. Since then, several people have asked us whether bounds of the form \(\eqref{eq:newbd}\) actually hold for Squint. They do hold, for unmodified Squint, without any additional effort. In this post I will show how to derive them.
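The identity \({\operatorname{KL}}{\left(\rho\middle\|\pi\right)} = -\ln \pi({\mathcal K})\) for the conditional choice \(\rho(k) = \pi(k|{\mathcal K})\) is easy to check numerically. The prior and the reference subset \({\mathcal K}\) below are made-up example values.

```python
import numpy as np

pi = np.array([0.4, 0.3, 0.2, 0.1])          # example prior on K = 4 experts
in_K = np.array([True, True, False, False])  # membership in the reference set "script K"

# rho = pi conditioned on the subset: restrict to script K and renormalise
rho = np.where(in_K, pi, 0.0) / pi[in_K].sum()

# KL(rho || pi) = E_rho[ ln(rho(k) / pi(k)) ], with the 0 ln 0 = 0 convention
kl = np.sum(rho[in_K] * np.log(rho[in_K] / pi[in_K]))

print(np.isclose(kl, -np.log(pi[in_K].sum())))  # → True
```

Since \(\rho(k)/\pi(k) = 1/\pi({\mathcal K})\) for every \(k \in {\mathcal K}\), every term in the expectation contributes the same \(-\ln \pi({\mathcal K})\), which is exactly the derivation above.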