Epsilon Best Arm Identification in Spectral Bandits

Epsilon Best Arm Identification in Spectral Bandits

Tomáš Kocák, Aurélien Garivier

Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence
Main Track. Pages 2636-2642. https://doi.org/10.24963/ijcai.2021/363

We propose an analysis of Probably Approximately Correct (PAC) identification of an ϵ-best arm in graph bandit models with Gaussian distributions. We consider finite but potentially very large bandit models where the set of arms is endowed with a graph structure, and we assume that the arms' expectations μ are smooth with respect to this graph. Our goal is to identify an arm whose expectation is at most ϵ below the largest of all means. We focus on the fixed-confidence setting: given a risk parameter δ, we consider sequential strategies that yield an ϵ-optimal arm with probability at least 1-δ. All such strategies use at least T*(μ)log(1/δ) samples, where R is the smoothness parameter. We identify the complexity term T*(μ) as the solution of a min-max problem for which we give a game-theoretic analysis and an approximation procedure. This procedure is the key element required by the asymptotically optimal Track-and-Stop strategy.
Keywords:
Machine Learning: Learning Theory
Machine Learning: Online Learning