Batched Bandit Problems

Abstract

Motivated by practical applications, chiefly clinical trials, we study the regret achievable for stochastic multi-armed bandits under the constraint that the employed policy must split trials into a small number of batches. Our results show that a very small number of batches already gives close to minimax optimal regret bounds, and we also evaluate the number of trials in each batch. As a byproduct, we derive optimal policies with low switching cost for stochastic bandits.
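To make the batching constraint concrete, the following is a minimal sketch of a two-batch explore-then-commit policy for a Bernoulli bandit. This is an illustrative toy, not the paper's actual policy: the function `batched_etc`, its parameters, and the fixed batch split are all assumptions made here for exposition.

```python
import random

def batched_etc(means, horizon, batch1_size, rng):
    """Two-batch explore-then-commit on a Bernoulli bandit (toy sketch).

    Batch 1 pulls every arm equally often; batch 2 commits to the arm
    with the best empirical mean for all remaining rounds. Returns the
    committed arm and the pseudo-regret (expected reward gap summed
    over all pulls).
    """
    k = len(means)
    per_arm = batch1_size // k
    best_mean = max(means)
    emp = []
    pseudo_regret = 0.0
    # Batch 1: uniform exploration across arms.
    for mu in means:
        wins = sum(1 for _ in range(per_arm) if rng.random() < mu)
        emp.append(wins / per_arm)
        pseudo_regret += per_arm * (best_mean - mu)
    # Batch 2: commit to the empirically best arm; no further adaptivity.
    commit = max(range(k), key=lambda a: emp[a])
    pseudo_regret += (horizon - per_arm * k) * (best_mean - means[commit])
    return commit, pseudo_regret

rng = random.Random(0)
commit, regret = batched_etc([0.9, 0.1], horizon=1000, batch1_size=100, rng=rng)
```

Under this sketch the policy observes rewards only at the end of each batch, so all regret from the exploration batch is unavoidable; the paper's question is how close such few-batch policies can get to the fully sequential minimax rate.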

Related Material

@InProceedings{pmlr-v40-Perchet15,
title = {Batched Bandit Problems},
author = {Vianney Perchet and Philippe Rigollet and Sylvain Chassang and Erik Snowberg},
booktitle = {Proceedings of The 28th Conference on Learning Theory},
pages = {1456--1456},
year = {2015},
editor = {Peter Grünwald and Elad Hazan and Satyen Kale},
volume = {40},
series = {Proceedings of Machine Learning Research},
address = {Paris, France},
month = {03--06 Jul},
publisher = {PMLR},
pdf = {http://proceedings.mlr.press/v40/Perchet15.pdf},
url = {http://proceedings.mlr.press/v40/Perchet15.html},
abstract = {Motivated by practical applications, chiefly clinical trials, we study the regret achievable for stochastic multi-armed bandits under the constraint that the employed policy must split trials into a small number of batches. Our results show that a very small number of batches already gives close to minimax optimal regret bounds, and we also evaluate the number of trials in each batch. As a byproduct, we derive optimal policies with low switching cost for stochastic bandits.}
}
