Boundary Crossing Probabilities for General Exponential Families

Mathematical Methods of Statistics

Abstract

We consider parametric exponential families of dimension K on the real line and study a variant of boundary crossing probabilities that arises in the multi-armed bandit literature. Formally, our result is a concentration inequality that bounds the probability that Bψ(θ̂_n, θ*) ≥ f(t/n)/n, where θ* is the parameter of an unknown target distribution, θ̂_n is the empirical parameter estimate built from n observations, ψ is the log-partition function of the exponential family, and Bψ is the corresponding Bregman divergence. From the perspective of stochastic multi-armed bandits, we pay special attention to the case when the boundary function f is logarithmic, as it enables us to analyze the regret of the state-of-the-art KL-ucb and KL-ucb+ strategies, whose analysis was left open in such generality. Indeed, previous results hold only for the case K = 1, while we provide results for arbitrary finite dimension K, thus considerably extending the existing results. Perhaps surprisingly, we highlight that the proof techniques needed to achieve these strong results already existed three decades ago in the work of T. L. Lai and were apparently forgotten in the bandit community. We provide a modern rewriting of these beautiful techniques, which we believe are useful beyond the application to stochastic multi-armed bandits.
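To make the quantities in the abstract concrete, the following is a minimal sketch for the one-dimensional case (K = 1) only. It assumes the convention Bψ(θ, θ') = ψ(θ') − ψ(θ) − (θ' − θ)ψ'(θ), under which the Bregman divergence of the log-partition function coincides with the Kullback–Leibler divergence KL(P_θ ‖ P_θ'); the article itself may use a different argument order. The function names, the choice of the Bernoulli and unit-variance Gaussian families, and the Monte Carlo setup are all illustrative assumptions and are not taken from the article.

```python
import math
import random


def bregman(psi, grad_psi, theta, theta_prime):
    """Bregman divergence of psi, under the ASSUMED convention
    B(theta, theta') = psi(theta') - psi(theta) - (theta' - theta) * psi'(theta)."""
    return psi(theta_prime) - psi(theta) - (theta_prime - theta) * grad_psi(theta)


# Bernoulli family with natural parameter theta = log(p / (1 - p)).
def psi_bernoulli(theta):
    return math.log1p(math.exp(theta))


def grad_psi_bernoulli(theta):
    return 1.0 / (1.0 + math.exp(-theta))


def logit(p):
    return math.log(p / (1.0 - p))


def kl_bernoulli(p, q):
    """KL divergence between Bernoulli(p) and Bernoulli(q), for 0 < p, q < 1."""
    return p * math.log(p / q) + (1.0 - p) * math.log((1.0 - p) / (1.0 - q))


# Gaussian family with unit variance: natural parameter = mean.
def psi_gauss(theta):
    return 0.5 * theta * theta


def grad_psi_gauss(theta):
    return theta


def crossing_frequency(theta_star, n, t, runs, seed=0):
    """Monte Carlo frequency of the event B(theta_hat_n, theta*) >= f(t/n)/n
    with f = log, for unit-variance Gaussian observations (illustrative only)."""
    rng = random.Random(seed)
    threshold = math.log(t / n) / n
    hits = 0
    for _ in range(runs):
        # For this family the empirical parameter estimate is the sample mean.
        theta_hat = sum(rng.gauss(theta_star, 1.0) for _ in range(n)) / n
        if bregman(psi_gauss, grad_psi_gauss, theta_hat, theta_star) >= threshold:
            hits += 1
    return hits / runs


if __name__ == "__main__":
    p, q = 0.3, 0.6
    print(bregman(psi_bernoulli, grad_psi_bernoulli, logit(p), logit(q)), kl_bernoulli(p, q))
    print(crossing_frequency(theta_star=0.0, n=20, t=100, runs=2000))
```

In the Gaussian case the divergence reduces to (θ̂_n − θ*)²/2, so for a fixed n the crossing event is a two-sided Gaussian tail event and the simulated frequency can be compared against `math.erfc`. The article's results concern general exponential families of dimension K, which this one-dimensional sketch does not attempt to reproduce.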

References

  1. R. Agrawal, “Sample Mean Based Index Policies with O(log n) Regret for the Multi-Armed Bandit Problem”, Adv. in Appl. Probab. 27 (4), 1054–1078 (1995).

  2. J.-Y. Audibert, R. Munos, and Cs. Szepesvári, “Exploration-Exploitation Trade-Off Using Variance Estimates in Multi-Armed Bandits”, Theoret. Comp. Sci. 410 (19), (2009).

  3. J.-Y. Audibert and S. Bubeck, “Regret Bounds and Minimax Policies under Partial Monitoring”, J. Machine Learning Res. 11, 2635–2686 (2010).

  4. P. Auer, N. Cesa-Bianchi, and P. Fischer, “Finite-Time Analysis of the Multiarmed Bandit Problem”, Machine Learning 47 (2), 235–256 (2002).

  5. L. M. Bregman, “The Relaxation Method of Finding the Common Point of Convex Sets and Its Application to the Solution of Problems in Convex Programming”, USSR Comput. Math. and Math. Phys. (Elsevier) 7 (3), 200–217 (1967).

  6. S. Bubeck, N. Cesa-Bianchi, et al., “Regret Analysis of Stochastic and Nonstochastic Multi-Armed Bandit Problems”, Foundations and Trends® in Machine Learning 5 (1), 1–122 (2012).

  7. A. N. Burnetas and M. N. Katehakis, “Optimal Adaptive Policies for Markov Decision Processes”, in Mathematics of Operations Research (1997), pp. 222–255.

  8. O. Cappé, A. Garivier, O.-A. Maillard, R. Munos, and G. Stoltz, “Kullback–Leibler Upper Confidence Bounds for Optimal Sequential Allocation”, Ann. Statist. 41 (3), 1516–1541 (2013).

  9. Y. S. Chow and H. Teicher, Probability Theory, 2nd ed. (Springer, 1988).

  10. I. H. Dinwoodie, “Mesures dominantes et théorème de Sanov”, in Annales de l’IHP Probabilités et statistiques (1992), Vol. 28, pp. 365–373.

  11. A. Garivier, P. Ménard, and G. Stoltz, Explore First, Exploit Next: The True Shape of Regret in Bandit Problems, arXiv preprint arXiv:1602.07182 (2016).

  12. J. C. Gittins, “Bandit Processes and Dynamic Allocation Indices”, J. Roy. Statist. Soc., Ser. B 41 (2), 148–177 (1979).

  13. J. Honda and A. Takemura, “An Asymptotically Optimal Bandit Algorithm for Bounded Support Models”, in Conf. Comput. Learning Theory, Ed. by T. Kalai and M. Mohri (Haifa, Israel, 2010).

  14. T. L. Lai and H. Robbins, “Asymptotically Efficient Adaptive Allocation Rules”, Advances in Appl. Math. 6 (1), 4–22 (1985).

  15. T. L. Lai, “Adaptive Treatment Allocation and the Multi-Armed Bandit Problem”, Ann. Statist., 1091–1114 (1987).

  16. T. L. Lai, “Boundary Crossing Problems for Sample Means”, Ann. Probab., 375–396 (1988).

  17. O.-A. Maillard, R. Munos, and G. Stoltz, “A Finite-Time Analysis of Multi-Armed Bandits Problems with Kullback–Leibler Divergences”, in Proc. 24th Conference On Learning Theory (Budapest, Hungary), 497–514 (2011).

  18. H. Robbins, “Some Aspects of the Sequential Design of Experiments”, Bull. Amer. Math. Soc. 58 (5), 527–535 (1952).

  19. H. Robbins, Herbert Robbins Selected Papers (Springer, 2012).

  20. W. R. Thompson, “On the Likelihood That One Unknown Probability Exceeds Another in View of the Evidence of Two Samples”, Biometrika 25 (3/4), 285–294 (1933).

  21. W. R. Thompson, “On a Criterion for the Rejection of Observations and the Distribution of the Ratio of Deviation to Sample Standard Deviation”, Ann. Math. Statist. 6 (4), 214–219 (1935).

  22. A. Wald, “Sequential Tests of Statistical Hypotheses”, Ann. Math. Statist. 16 (2), 117–186 (1945).

Author information

Corresponding author

Correspondence to O.-A. Maillard.

About this article

Cite this article

Maillard, OA. Boundary Crossing Probabilities for General Exponential Families. Math. Meth. Stat. 27, 1–31 (2018). https://doi.org/10.3103/S1066530718010015

  • DOI: https://doi.org/10.3103/S1066530718010015
