
Risk-sensitive reinforcement learning algorithms with generalized average criterion

  • Published in Applied Mathematics and Mechanics

Abstract

A new algorithm is proposed that potentially sacrifices the optimality of control policies in exchange for robust solutions. Robustness may become a very important property of a learning system when there is a mismatch between the theoretical model and the practical physical system, when the practical system is non-stationary, or when the availability of control actions changes over time. The main contribution is a set of approximation algorithms together with their convergence results. A generalized average operator, in place of the usual optimal operator max (or min), is applied to study an important class of learning algorithms, dynamic programming algorithms, and their convergence is discussed from a theoretical point of view. The purpose of this research is to improve the robustness of reinforcement learning algorithms on theoretical grounds.
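The abstract's central device, replacing the hard max (or min) in dynamic-programming and Q-learning backups with a generalized average, can be illustrated with a short sketch. The Python below is a minimal illustration, not the paper's exact operator: it uses the exponential (log-sum-exp) average, which recovers max as beta → +∞ and the arithmetic mean as beta → 0; the helper names generalized_average and q_update are hypothetical.

```python
import numpy as np

def generalized_average(q_values, beta=1.0):
    """Generalized average of action values (illustrative assumption:
    the exponential/log-sum-exp form, with beta > 0).

    M_beta(q) = (1/beta) * log( mean_a exp(beta * q(a)) )
    recovers max(q) as beta -> +inf and mean(q) as beta -> 0.
    """
    q = np.asarray(q_values, dtype=float)
    m = q.max()  # shift by the max for numerical stability of the exponentials
    return m + np.log(np.mean(np.exp(beta * (q - m)))) / beta

def q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.95, beta=1.0):
    """One Q-learning backup with the max operator replaced by the
    generalized average (hypothetical helper, for illustration only)."""
    target = r + gamma * generalized_average(Q[s_next], beta)
    Q[s, a] += alpha * (target - Q[s, a])
    return Q
```

Starting from, e.g., Q = np.zeros((n_states, n_actions)) and applying q_update along observed transitions gives a risk-sensitive variant of Q-learning; the parameter beta then trades the optimality of the learned policy against the robustness the abstract emphasizes.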



Author information

Correspondence to Yin Chang-ming (殷苌茗).

Additional information

Communicated by GUO Xing-ming

Project supported by the National Natural Science Foundation of China (Nos. 10471088 and 60572126)


About this article

Cite this article

Yin, Cm., Han-xing, W. & Fei, Z. Risk-sensitive reinforcement learning algorithms with generalized average criterion. Appl Math Mech 28, 405–416 (2007). https://doi.org/10.1007/s10483-007-0313-x

