Epsilon Best Arm Identification in Spectral Bandits

Tomáš Kocák; Aurélien Garivier

doi:10.24963/ijcai.2021/363

Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence

Main Track. Pages 2636-2642. https://doi.org/10.24963/ijcai.2021/363

We propose an analysis of Probably Approximately Correct (PAC) identification of an ϵ-best arm in graph bandit models with Gaussian distributions. We consider finite but potentially very large bandit models where the set of arms is endowed with a graph structure, and we assume that the arms' expectations μ are smooth with respect to this graph. Our goal is to identify an arm whose expectation is at most ϵ below the largest of all means. We focus on the fixed-confidence setting: given a risk parameter δ, we consider sequential strategies that yield an ϵ-optimal arm with probability at least 1-δ. All such strategies use at least T*_R(μ)log(1/δ) samples, where R is the smoothness parameter. We identify the complexity term T*_R(μ) as the solution of a min-max problem for which we give a game-theoretic analysis and an approximation procedure. This procedure is the key element required by the asymptotically optimal Track-and-Stop strategy.
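The two notions the abstract relies on, smoothness of μ with respect to the graph and ϵ-optimality of an arm, can be illustrated with a minimal sketch. It assumes that smoothness is measured by the graph Laplacian quadratic form, the sum over weighted edges of w_ij (μ_i - μ_j)^2, and that R is an upper bound on this quantity; the abstract does not state this definition, so treat it as an assumption. The function names, the example graph, and the numerical values are hypothetical and serve illustration only.

```python
def smoothness(mu, edges):
    """Laplacian quadratic form of mu (assumed smoothness measure).

    edges is a list of (i, j, weight) triples, one per undirected edge.
    Returns the sum over edges of weight * (mu[i] - mu[j]) ** 2.
    """
    return sum(w * (mu[i] - mu[j]) ** 2 for i, j, w in edges)


def epsilon_best_arms(mu, eps):
    """Indices of arms whose mean is at most eps below the largest mean."""
    best = max(mu)
    return [arm for arm, mean in enumerate(mu) if mean >= best - eps]


# Hypothetical example: four arms on a path graph 0 - 1 - 2 - 3, unit weights.
edges = [(0, 1, 1.0), (1, 2, 1.0), (2, 3, 1.0)]
mu = [0.0, 0.5, 1.0, 0.75]

R = 1.0  # assumed smoothness budget
print(smoothness(mu, edges) <= R)    # does mu satisfy the smoothness constraint?
print(epsilon_best_arms(mu, 0.25))   # arms that count as a correct answer
```

With ϵ = 0 only the arm with the largest mean is acceptable; a larger ϵ widens the set of correct answers, which is what makes identification cheaper when several arms have nearly equal means.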

Keywords:

Machine Learning: Learning Theory

Machine Learning: Online Learning