Abstract
Over the last decades, a myriad of approaches to the multi-armed bandit problem have appeared in several different fields. The current top-performing algorithms from the field of Learning Automata reside in the Pursuit family, while UCB-Tuned and the ε-greedy class of algorithms can be seen as state-of-the-art regret-minimizing algorithms. Recently, however, the Bayesian Learning Automaton (BLA) outperformed all of these, and other schemes, in a wide range of experiments. In this paper, we integrate the foundational learning principles motivating the design of the BLA with the principles of the so-called Generalized Pursuit algorithm (GPST), even though the two may seem incompatible, leading to the Generalized Bayesian Pursuit algorithm (GBPST). As in the BLA, the estimates are truly Bayesian in nature. However, instead of basing exploration upon direct sampling from the estimates, GBPST explores by means of the arm selection probability vector of GPST. Further, as in the GPST, in the interest of higher rates of learning, a set of arms that are currently perceived as being optimal is pursued to minimize the probability of pursuing a wrong arm. It turns out that GBPST is superior to GPST and that, when its learning speed is controlled appropriately, it even performs better than the BLA. We thus believe that GBPST constitutes a new avenue of research, in which the performance benefits of the GPST and the BLA are mutually augmented, opening the way to improved performance in a number of applications that are currently being tested.
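The combination described in the abstract can be illustrated with a minimal sketch. It is not the paper's exact scheme: the abstract does not state the update rules, so the sketch assumes Beta(1, 1) priors for the Bernoulli arms, uses the posterior mean as each arm's estimate, and applies a continuous generalized-pursuit style update with a learning rate `lam`. The function name `gbpst_sketch` and all parameter names are hypothetical.

```python
import random


def gbpst_sketch(true_probs, steps=2000, lam=0.05, seed=0):
    """Illustrative Bayesian pursuit loop for Bernoulli arms (assumed rules)."""
    rng = random.Random(seed)
    n = len(true_probs)
    alpha = [1.0] * n  # Beta(1, 1) prior: one pseudo-success per arm
    beta = [1.0] * n   # ... and one pseudo-failure per arm
    p = [1.0 / n] * n  # arm selection probability vector

    for _ in range(steps):
        # Explore by sampling an arm from the selection probability vector,
        # rather than by sampling from the posteriors as the BLA does.
        i = rng.choices(range(n), weights=p)[0]
        reward = rng.random() < true_probs[i]

        # Conjugate Bayesian update of the pulled arm's Beta posterior.
        if reward:
            alpha[i] += 1.0
        else:
            beta[i] += 1.0

        # Assumption: the posterior mean serves as the reward estimate.
        est = [a / (a + b) for a, b in zip(alpha, beta)]

        # Pursue the whole set of arms that currently look better than
        # the pulled arm, sharing the probability mass lam among them.
        better = {j for j in range(n) if est[j] > est[i]}
        for j in range(n):
            if j == i:
                continue
            p[j] *= 1.0 - lam
            if j in better:
                p[j] += lam / len(better)

        # The pulled arm takes the remaining mass (clamped against rounding).
        p[i] = max(0.0, 1.0 - sum(p[j] for j in range(n) if j != i))

    return p, est
```

Calling `gbpst_sketch([0.2, 0.5, 0.8])` returns the final selection probabilities and the posterior-mean estimates. In this sketch, `lam` plays the role of the learning speed mentioned in the abstract: larger values concentrate the probability vector faster, at a higher risk of settling on a wrong arm.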
Copyright information
© 2011 IFIP International Federation for Information Processing
About this paper
Cite this paper
Zhang, X., Oommen, B.J., Granmo, O.C. (2011). Generalized Bayesian Pursuit: A Novel Scheme for Multi-Armed Bernoulli Bandit Problems. In: Iliadis, L., Maglogiannis, I., Papadopoulos, H. (eds) Artificial Intelligence Applications and Innovations. EANN/AIAI 2011. IFIP Advances in Information and Communication Technology, vol 364. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23960-1_16
DOI: https://doi.org/10.1007/978-3-642-23960-1_16
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-23959-5
Online ISBN: 978-3-642-23960-1
eBook Packages: Computer Science, Computer Science (R0)