Greedy feature replacement for online value function approximation

Zhao, Feng-fei; Qin, Zheng; Shao, Zhuo; Fang, Jun; Ren, Bo-yan

doi:10.1631/jzus.C1300246

Published: 12 March 2014

Volume 15, pages 223–231 (2014)

Journal of Zhejiang University SCIENCE C

Abstract

Reinforcement learning (RL) in real-world problems requires function approximation, whose success depends on selecting appropriate feature representations. Representational expansion techniques can make linear approximators represent value functions more effectively; however, most of these techniques work well only for low-dimensional problems. In this paper, we present greedy feature replacement (GFR), a novel online expansion technique for value-based RL algorithms that use binary features. Starting from a simple initial representation, GFR expands the feature representation incrementally: new feature dependencies are added to the current representation automatically, and conjunctive features greedily replace current features. The virtual temporal difference (TD) error is recorded for each conjunctive feature to judge whether the replacement can improve the approximation. We provide correctness guarantees and a computational complexity analysis for GFR. Experimental results in two domains show that GFR learns much faster and can handle large-scale problems.
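
The abstract describes GFR only at a high level. To make the general idea concrete (a linear value estimate over binary features, a TD error accumulated for candidate conjunctive features, and greedy replacement once that evidence is large enough), here is a minimal Python sketch. It is not the authors' algorithm: the class name `GreedyConjunctionTD`, the pairwise candidate rule, the `threshold` parameter, summing |TD error| as the "virtual" error, the greedy covering rule in `active`, and the weight initialization of new conjunctions are all assumptions made for illustration.

```python
from itertools import combinations


class GreedyConjunctionTD:
    """Hypothetical sketch: linear TD(0) over binary features, with pairwise
    conjunctions added once their accumulated |TD error| exceeds a threshold.
    This illustrates the idea in the abstract; it is not the authors' GFR."""

    def __init__(self, n_base, alpha=0.1, gamma=0.9, threshold=1.0):
        self.alpha = alpha          # step size
        self.gamma = gamma          # discount factor
        self.threshold = threshold  # assumed discovery threshold
        # A feature is a frozenset of base-feature indices; base features are singletons.
        self.features = {frozenset([i]): 0.0 for i in range(n_base)}
        # Candidate conjunction -> accumulated "virtual" TD error (assumed: sum of |delta|).
        self.virtual_error = {}

    def active(self, base_active):
        """Greedy cover: larger conjunctions are tried first and, when all of
        their base features are on, they replace the features they contain."""
        remaining = set(base_active)
        chosen = []
        for f in sorted(self.features, key=len, reverse=True):
            if f <= remaining:
                chosen.append(f)
                remaining -= f
        return chosen

    def value(self, base_active):
        # Linear approximator with binary features: sum the weights of active features.
        return sum(self.features[f] for f in self.active(base_active))

    def update(self, s, reward, s_next, done=False):
        phi = self.active(s)
        v = sum(self.features[f] for f in phi)
        v_next = 0.0 if done else self.value(s_next)
        delta = reward + self.gamma * v_next - v  # TD error
        # TD(0) step; for binary features the gradient is 1 on each active feature.
        for f in phi:
            self.features[f] += self.alpha * delta
        # Record the TD error for each not-yet-added pairwise conjunction.
        for f, g in combinations(phi, 2):
            cand = f | g
            if cand in self.features:
                continue
            self.virtual_error[cand] = self.virtual_error.get(cand, 0.0) + abs(delta)
            if self.virtual_error[cand] > self.threshold:
                # Start the conjunction at the sum of its parents' weights so the
                # value of states it now covers is roughly unchanged.
                self.features[cand] = self.features[f] + self.features[g]
                del self.virtual_error[cand]
        return delta


agent = GreedyConjunctionTD(n_base=3, alpha=0.5, gamma=0.0, threshold=0.5)
agent.update({0, 1}, reward=1.0, s_next=set(), done=True)
print(agent.active({0, 1}))  # [frozenset({0, 1})]
print(agent.value({0, 1}))   # 1.0
```

In this toy run, one transition with TD error 1.0 pushes the candidate conjunction of base features 0 and 1 past the assumed threshold, so from then on the conjunction is used in place of its two parents whenever both are active. Initializing its weight from the parents is one simple way to keep the value estimate continuous at the moment of replacement; the paper's own replacement test and guarantees may differ.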

Author information

Authors and Affiliations

Department of Computer Science and Technology, Tsinghua University, Beijing, 100084, China
Feng-fei Zhao, Zheng Qin & Jun Fang
School of Software, Tsinghua University, Beijing, 100084, China
Zheng Qin & Bo-yan Ren
Department of Physics and State Key Laboratory of Low-Dimensional Quantum Physics, Tsinghua University, Beijing, 100084, China
Zhuo Shao

Corresponding author

Correspondence to Feng-fei Zhao.

Additional information

Project supported by the 12th Five-Year Defense Exploration Project of China (No. 041202005) and the Ph.D. Program Foundation of the Ministry of Education of China (No. 20120002130007).

About this article

Cite this article

Zhao, Ff., Qin, Z., Shao, Z. et al. Greedy feature replacement for online value function approximation. J. Zhejiang Univ. - Sci. C 15, 223–231 (2014). https://doi.org/10.1631/jzus.C1300246

Received: 06 September 2013
Accepted: 14 January 2014
Published: 12 March 2014
Issue Date: March 2014
DOI: https://doi.org/10.1631/jzus.C1300246

CLC number

TP181
