Abstract
This paper studies a special class of bandit problems characterized by a unimodal structure of the arms' expected rewards. Section 1 explains the motivation for studying this problem. Sections 2 and 3 analyze two decision procedures, each based on a stochastic approximation of the best arm of the bandit. Finally, Section 4 discusses a special procedure and presents numerical data obtained by applying it to a concrete N-armed bandit with unimodal structure.
Additional information
Dedicated to Professor Dr. Walter Vogel on the occasion of his 60th birthday on 22 June 1983.
Research supported by the Deutsche Forschungsgemeinschaft, SFB 72.
Cite this article
Herkenrath, U. The N-armed bandit with unimodal structure. Metrika 30, 195–210 (1983). https://doi.org/10.1007/BF02056924