Abstract
Direct policy search is a promising reinforcement learning framework, particularly for control in continuous, high-dimensional systems such as anthropomorphic robots. However, because of its high flexibility, policy search typically requires a large number of samples to obtain a stable estimate of the policy update, which is prohibitive when sampling is expensive. In this paper, we extend an EM-based policy search method so that previously collected samples can be efficiently reused. The usefulness of the proposed method, called Reward-weighted Regression with sample Reuse (R3), is demonstrated through a robot learning experiment.
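The core idea described in the abstract — reusing samples drawn under an earlier policy by combining importance weighting with a reward-weighted regression (EM-style) update — can be illustrated with a minimal sketch. This is not the authors' R3 algorithm; it is a simplified one-dimensional toy (the linear-Gaussian policy, the reward function, and all parameter values are illustrative assumptions) showing how importance weights correct for the mismatch between the sampling policy and the current policy in the weighted-regression M-step.

```python
import numpy as np

rng = np.random.default_rng(0)

def gauss_pdf(a, mean, sigma):
    """Density of a Gaussian with the given mean and standard deviation."""
    return np.exp(-0.5 * ((a - mean) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

# Hypothetical 1-D task (illustrative assumption): state s, Gaussian policy
# a = theta * s + noise, and a reward that peaks at the action a = 2 * s.
def reward(s, a):
    return np.exp(-0.5 * (a - 2.0 * s) ** 2)

sigma = 0.5          # fixed policy noise level
theta_old = 0.0      # parameter of the old policy that generated the stored samples
theta = 0.5          # parameter of the current policy being improved

# Previously collected samples, drawn under the OLD policy (the data we reuse).
s = rng.uniform(-1.0, 1.0, size=500)
a = theta_old * s + sigma * rng.standard_normal(500)
r = reward(s, a)

# Importance weights: ratio of current-policy density to sampling-policy
# density, so the reused samples behave as if drawn from the current policy.
w = gauss_pdf(a, theta * s, sigma) / gauss_pdf(a, theta_old * s, sigma)

# Reward-weighted regression M-step: weighted least squares of actions on
# states, with each sample weighted by reward times importance weight.
weights = r * w
theta_new = np.sum(weights * s * a) / np.sum(weights * s * s)
```

Without the importance weights `w`, the update would be biased toward the old policy that generated the data; with them, the reused samples contribute consistently to the current policy's update, which is the sample-reuse mechanism the paper builds on.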
Keywords
- Reinforcement Learning
- Importance Sampling
- Importance Weight
- Reward Function
- Neural Information Processing System
© 2009 Springer-Verlag Berlin Heidelberg
Cite this paper
Hachiya, H., Peters, J., Sugiyama, M. (2009). Efficient Sample Reuse in EM-Based Policy Search. In: Buntine, W., Grobelnik, M., Mladenić, D., Shawe-Taylor, J. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2009. Lecture Notes in Computer Science(), vol 5781. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04180-8_48
DOI: https://doi.org/10.1007/978-3-642-04180-8_48
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-04179-2
Online ISBN: 978-3-642-04180-8