Abstract
This chapter discusses decision making under uncertainty. More specifically, it offers an overview of efficient Bayesian and distribution-free algorithms for making near-optimal sequential decisions under uncertainty about the environment. Due to the uncertainty, such algorithms must not only learn from their interaction with the environment but also perform as well as possible while learning is taking place.
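The trade-off the abstract describes — performing well while still learning about the environment — can be illustrated with a minimal sketch of the UCB1 index policy of Auer, Cesa-Bianchi and Fischer (2002), one of the distribution-free algorithms of the kind surveyed here. The arm distributions and function names below are purely illustrative.

```python
import math
import random

def ucb1(reward_fns, horizon, seed=0):
    """Minimal UCB1 sketch: play each arm once, then repeatedly pull the
    arm maximising empirical mean + sqrt(2 ln t / n_i)."""
    random.seed(seed)
    k = len(reward_fns)
    counts = [0] * k          # number of pulls per arm
    means = [0.0] * k         # empirical mean reward per arm
    total = 0.0
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1       # initialisation: try every arm once
        else:
            arm = max(range(k),
                      key=lambda i: means[i]
                      + math.sqrt(2 * math.log(t) / counts[i]))
        r = reward_fns[arm]()
        counts[arm] += 1
        means[arm] += (r - means[arm]) / counts[arm]  # incremental mean
        total += r
    return total, counts

# Two hypothetical Bernoulli arms with success probabilities 0.3 and 0.7.
arms = [lambda: float(random.random() < 0.3),
        lambda: float(random.random() < 0.7)]
total, counts = ucb1(arms, horizon=2000)
```

The confidence term shrinks as an arm is sampled more often, so the policy explores under-sampled arms early and concentrates on the empirically best arm later, incurring only logarithmic regret in this stochastic setting.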
© 2010 Springer-Verlag Berlin Heidelberg
Cite this chapter
Dimitrakakis, C. (2010). Efficient Methods for Near-Optimal Sequential Decision Making under Uncertainty. In: Babuška, R., Groen, F.C.A. (eds) Interactive Collaborative Information Systems. Studies in Computational Intelligence, vol 281. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-11688-9_5
Print ISBN: 978-3-642-11687-2
Online ISBN: 978-3-642-11688-9