Efficient Methods for Near-Optimal Sequential Decision Making under Uncertainty

Chapter in: Interactive Collaborative Information Systems

Part of the book series: Studies in Computational Intelligence (SCI, volume 281)

Abstract

This chapter discusses decision making under uncertainty. More specifically, it offers an overview of efficient Bayesian and distribution-free algorithms for making near-optimal sequential decisions when the environment is unknown. Because of this uncertainty, such algorithms must not only learn from their interaction with the environment but also perform as well as possible while learning takes place.
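
The trade-off the abstract describes, performing well while still learning, is the exploration-exploitation dilemma, and the multi-armed bandit is its standard toy model. The sketch below is not taken from the chapter itself; it is a minimal Python illustration, assuming Bernoulli reward arms, of one distribution-free rule (the UCB1 index of Auer, Cesa-Bianchi and Fischer, 2002) and one Bayesian rule (Thompson sampling with a Beta prior). The `pull` callback, the `run` driver, and the toy arm probabilities are hypothetical stand-ins for the unknown environment.

```python
import math
import random

def ucb1_arm(counts, means, t):
    """Distribution-free choice: the UCB1 index of Auer et al. (2002)."""
    for a, n in enumerate(counts):
        if n == 0:  # play each arm once before trusting any estimate
            return a
    # Optimism in the face of uncertainty: empirical mean plus a confidence bonus.
    return max(range(len(counts)),
               key=lambda a: means[a] + math.sqrt(2.0 * math.log(t) / counts[a]))

def thompson_arm(successes, failures):
    """Bayesian choice: Thompson sampling with a Beta(1, 1) prior on each arm."""
    samples = [random.betavariate(1 + s, 1 + f)
               for s, f in zip(successes, failures)]
    return max(range(len(samples)), key=lambda a: samples[a])

def run(pull, n_arms, horizon, bayesian=False):
    """Act in an unknown environment through the `pull` callback,
    learning while acting; returns how often each arm was played."""
    counts = [0] * n_arms   # plays per arm
    means = [0.0] * n_arms  # empirical mean reward per arm
    succ = [0] * n_arms     # Bernoulli successes (for the Bayesian posterior)
    fail = [0] * n_arms     # Bernoulli failures
    for t in range(1, horizon + 1):
        a = thompson_arm(succ, fail) if bayesian else ucb1_arm(counts, means, t)
        r = pull(a)  # observed reward, assumed to lie in {0, 1}
        counts[a] += 1
        means[a] += (r - means[a]) / counts[a]  # incremental mean update
        succ[a] += int(r == 1.0)
        fail[a] += int(r == 0.0)
    return counts

# Toy usage: three Bernoulli arms with hidden success probabilities.
probs = [0.2, 0.5, 0.8]
pull = lambda a: float(random.random() < probs[a])
print(run(pull, 3, 10_000))                 # UCB1: the best arm should dominate
print(run(pull, 3, 10_000, bayesian=True))  # Thompson sampling: likewise
```

Both rules are near-optimal in the sense the abstract uses the term: under standard assumptions their regret, the reward lost relative to always playing the best arm, grows only logarithmically with the horizon, so almost all play eventually concentrates on the best action.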

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Dimitrakakis, C. (2010). Efficient Methods for Near-Optimal Sequential Decision Making under Uncertainty. In: Babuška, R., Groen, F.C.A. (eds) Interactive Collaborative Information Systems. Studies in Computational Intelligence, vol 281. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-11688-9_5

  • DOI: https://doi.org/10.1007/978-3-642-11688-9_5

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-11687-2

  • Online ISBN: 978-3-642-11688-9
