Efficient Methods for Near-Optimal Sequential Decision Making under Uncertainty

Chapter in: Interactive Collaborative Information Systems

Part of the book series: Studies in Computational Intelligence (SCI, volume 281)

Abstract

This chapter discusses decision making under uncertainty. More specifically, it offers an overview of efficient Bayesian and distribution-free algorithms for making near-optimal sequential decisions when the environment is unknown. Because of this uncertainty, such algorithms must not only learn from their interaction with the environment but also perform as well as possible while learning takes place.
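
The trade-off the abstract describes, performing well while still learning, is the exploration-exploitation dilemma, and the multi-armed bandit is its standard toy model. The sketch below is not taken from the chapter itself; it is a minimal Python illustration, assuming Bernoulli reward arms, of one distribution-free rule (the UCB1 index of Auer, Cesa-Bianchi and Fischer, 2002) and one Bayesian rule (Thompson sampling with a Beta prior). The `pull` callback, the `run` driver, and the toy arm probabilities are hypothetical stand-ins for the unknown environment.

```python
import math
import random

def ucb1_arm(counts, means, t):
    """Distribution-free choice: the UCB1 index of Auer et al. (2002)."""
    for a, n in enumerate(counts):
        if n == 0:  # play each arm once before trusting any estimate
            return a
    # Optimism in the face of uncertainty: empirical mean plus a confidence bonus.
    return max(range(len(counts)),
               key=lambda a: means[a] + math.sqrt(2.0 * math.log(t) / counts[a]))

def thompson_arm(successes, failures):
    """Bayesian choice: Thompson sampling with a Beta(1, 1) prior on each arm."""
    samples = [random.betavariate(1 + s, 1 + f)
               for s, f in zip(successes, failures)]
    return max(range(len(samples)), key=lambda a: samples[a])

def run(pull, n_arms, horizon, bayesian=False):
    """Act in an unknown environment through the `pull` callback,
    learning while acting; returns how often each arm was played."""
    counts = [0] * n_arms   # plays per arm
    means = [0.0] * n_arms  # empirical mean reward per arm
    succ = [0] * n_arms     # Bernoulli successes (for the Bayesian posterior)
    fail = [0] * n_arms     # Bernoulli failures
    for t in range(1, horizon + 1):
        a = thompson_arm(succ, fail) if bayesian else ucb1_arm(counts, means, t)
        r = pull(a)  # observed reward, assumed to lie in {0, 1}
        counts[a] += 1
        means[a] += (r - means[a]) / counts[a]  # incremental mean update
        succ[a] += int(r == 1.0)
        fail[a] += int(r == 0.0)
    return counts

# Toy usage: three Bernoulli arms with hidden success probabilities.
probs = [0.2, 0.5, 0.8]
pull = lambda a: float(random.random() < probs[a])
print(run(pull, 3, 10_000))                 # UCB1: the best arm should dominate
print(run(pull, 3, 10_000, bayesian=True))  # Thompson sampling: likewise
```

Both rules are near-optimal in the sense the abstract uses the term: under standard assumptions their regret, the reward lost relative to always playing the best arm, grows only logarithmically with the horizon, so almost all play eventually concentrates on the best action.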

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Dimitrakakis, C. (2010). Efficient Methods for Near-Optimal Sequential Decision Making under Uncertainty. In: Babuška, R., Groen, F.C.A. (eds) Interactive Collaborative Information Systems. Studies in Computational Intelligence, vol 281. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-11688-9_5

  • DOI: https://doi.org/10.1007/978-3-642-11688-9_5

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-11687-2

  • Online ISBN: 978-3-642-11688-9
