Generalized Bayesian Pursuit: A Novel Scheme for Multi-Armed Bernoulli Bandit Problems

Zhang, Xuan; Oommen, B. John; Granmo, Ole-Christoffer

doi:10.1007/978-3-642-23960-1_16

Part of the book series: IFIP Advances in Information and Communication Technology (IFIPAICT, volume 364)

Abstract

In recent decades, a myriad of approaches to the multi-armed bandit problem have appeared in several different fields. The current top-performing algorithms from the field of Learning Automata reside in the Pursuit family, while UCB-Tuned and the ε-greedy class of algorithms can be seen as state-of-the-art regret-minimizing algorithms. Recently, however, the Bayesian Learning Automaton (BLA) outperformed all of these, and other schemes, in a wide range of experiments. Although the two approaches are seemingly incompatible, in this paper we integrate the foundational learning principles motivating the design of the BLA with the principles of the so-called Generalized Pursuit algorithm (GPST), leading to the Generalized Bayesian Pursuit algorithm (GBPST). As in the BLA, the estimates are truly Bayesian in nature. However, instead of basing exploration upon direct sampling from the estimates, GBPST explores by means of the arm selection probability vector of GPST. Further, as in the GPST, in the interest of higher rates of learning, a set of arms that are currently perceived as being optimal is pursued to minimize the probability of pursuing a wrong arm. It turns out that GBPST is superior to GPST and that, when its learning speed is suitably controlled, it even performs better than the BLA. We thus believe that GBPST constitutes a new avenue of research, in which the performance benefits of the GPST and the BLA are mutually augmented, opening the way to improved performance in a number of applications that are currently being tested.
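
The abstract does not spell out the update rules, so the following is only a minimal Python sketch of the idea it describes: Beta-posterior estimates per arm, exploration through an arm selection probability vector, and pursuit of the set of arms that currently look better than the one just pulled. Everything specific here is an assumption made for illustration and is not taken from the paper: the function name `gbpst_sketch`, the Beta(1, 1) prior, the use of the posterior mean as the point estimate, the learning rate `lam`, and the exact form of the generalized-pursuit-style update.

```python
import random


def gbpst_sketch(true_probs, steps=2000, lam=0.05, seed=0):
    """Illustrative pursuit-style Bernoulli bandit loop (not the paper's exact scheme)."""
    rng = random.Random(seed)
    n = len(true_probs)
    alpha = [1.0] * n  # assumed Beta(1, 1) prior: 1 + number of rewards
    beta = [1.0] * n   # 1 + number of penalties
    p = [1.0 / n] * n  # arm selection probability vector

    for _ in range(steps):
        # Explore by drawing an arm from the selection vector,
        # rather than by sampling from the posteriors as the BLA does.
        i = rng.choices(range(n), weights=p)[0]

        # Bernoulli feedback updates the pulled arm's Beta posterior.
        if rng.random() < true_probs[i]:
            alpha[i] += 1
        else:
            beta[i] += 1

        # Assumed point estimate: the posterior mean of each arm.
        est = [a / (a + b) for a, b in zip(alpha, beta)]

        # Set of arms currently estimated to be better than the pulled arm.
        better = [j for j in range(n) if j != i and est[j] > est[i]]

        # Shrink every probability, then share lam equally among the
        # better-looking arms; the pulled arm receives the remaining mass.
        new_p = [(1 - lam) * pj for pj in p]
        for j in better:
            new_p[j] += lam / len(better)
        new_p[i] = 0.0
        new_p[i] = max(0.0, 1.0 - sum(new_p))
        p = new_p

    return p, alpha, beta
```

In this sketch a smaller `lam` slows the movement of the selection vector and a larger one speeds it up, which is the kind of learning-speed control the abstract refers to. When no arm looks better than the pulled one, the pulled arm itself is pursued.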

Author information

Authors and Affiliations

Dept. of ICT, University of Agder, Grimstad, Norway
Xuan Zhang, B. John Oommen & Ole-Christoffer Granmo
School of Computer Science, Carleton University, Ottawa, Canada
B. John Oommen

Editor information

Editors and Affiliations

Democritus University of Thrace, 68200 N., Orestiada, Greece
Lazaros Iliadis
University of Central Greece, 35100, Lamia, Greece
Ilias Maglogiannis
Frederick University, 1036, Nicosia, Cyprus
Harris Papadopoulos

Cite this paper

Zhang, X., Oommen, B.J., Granmo, OC. (2011). Generalized Bayesian Pursuit: A Novel Scheme for Multi-Armed Bernoulli Bandit Problems. In: Iliadis, L., Maglogiannis, I., Papadopoulos, H. (eds) Artificial Intelligence Applications and Innovations. EANN AIAI 2011 2011. IFIP Advances in Information and Communication Technology, vol 364. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23960-1_16

DOI: https://doi.org/10.1007/978-3-642-23960-1_16
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-23959-5
Online ISBN: 978-3-642-23960-1
eBook Packages: Computer Science, Computer Science (R0)
