Abstract
A new algorithm is proposed that sacrifices the optimality of control policies in order to obtain robust solutions. Robustness becomes an important property of a learning system when there is a mismatch between the theoretical model and the actual physical system, when the physical system is non-stationary, or when the availability of control actions varies over time. The main contribution is a set of approximation algorithms together with their convergence results. A generalized average operator, used in place of the usual optimal operator max (or min), is applied to an important class of learning algorithms, dynamic programming algorithms, and their convergence is analyzed from a theoretical point of view. The purpose of this research is to improve the robustness of reinforcement learning algorithms theoretically.
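The central idea of the abstract, replacing the max over actions in the dynamic programming backup with a generalized average, can be sketched as follows. This is an illustrative sketch, not the paper's exact algorithm: it assumes one common choice of generalized average, the exponential (log-sum-exp) average with risk parameter `beta`, which recovers the max operator as `beta` tends to infinity and is non-expansive, so the usual contraction argument for convergence still applies.

```python
import numpy as np

def generalized_average(values, beta):
    """Exponential (log-mean-exp) generalized average over action values.

    As beta -> +inf this approaches max(values); a finite beta trades
    optimality for smoother, more robust value estimates. A shift by the
    maximum keeps the exponentials numerically stable.
    """
    values = np.asarray(values, dtype=float)
    m = values.max()
    return m + np.log(np.mean(np.exp(beta * (values - m)))) / beta

def soft_value_iteration(P, R, gamma=0.9, beta=5.0, tol=1e-8, max_iter=10_000):
    """Value iteration with max replaced by a generalized average.

    P: transition probabilities, shape (A, S, S); R: rewards, shape (A, S).
    Because the generalized average is non-expansive, the backup remains a
    gamma-contraction and the iteration converges to a unique fixed point.
    """
    n_actions, n_states, _ = P.shape
    V = np.zeros(n_states)
    for _ in range(max_iter):
        # Q[a, s] = one-step return of action a in state s plus discounted value
        Q = R + gamma * (P @ V)
        V_new = np.array(
            [generalized_average(Q[:, s], beta) for s in range(n_states)]
        )
        if np.max(np.abs(V_new - V)) < tol:
            break
        V = V_new
    return V
```

Since the generalized average never exceeds the max, the resulting values are bounded by the optimal ones; lowering `beta` spreads weight across near-optimal actions, which is the optimality-for-robustness trade the abstract describes.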
Additional information
Communicated by GUO Xing-ming
Project supported by the National Natural Science Foundation of China (Nos. 10471088 and 60572126)
Cite this article
Yin, Cm., Han-xing, W. & Fei, Z. Risk-sensitive reinforcement learning algorithms with generalized average criterion. Appl Math Mech 28, 405–416 (2007). https://doi.org/10.1007/s10483-007-0313-x