Ideas for a Reinforcement Learning Algorithm that Learns Programs

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9782)

Abstract

Conventional reinforcement learning algorithms such as Q-learning are poor at learning complicated procedures or programs because they were not designed to do so. AIXI, a general framework for reinforcement learning, can learn programs as its environment model, but it is not computable. AIXI has a computable and computationally tractable approximation, MC-AIXI(FAC-CTW), but it models the environment as a trie rather than as programs, and it still does not resolve the trade-off between exploration and exploitation within a realistic amount of computation.
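
For contrast, the following minimal sketch (ours, not taken from the paper) spells out the standard tabular Q-learning update in Haskell: what is learned is just a finite map from state-action pairs to numbers, so there is no representation in which a complicated procedure could be expressed as a program.

    import qualified Data.Map as M

    -- A Q-"table": a finite map from (state, action) pairs to estimated values.
    type QTable s a = M.Map (s, a) Double

    -- One tabular Q-learning update for an observed transition (s, a, r, s'):
    -- Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)).
    -- alpha is the learning rate, gamma the discount factor; terminal-state
    -- handling is omitted to keep the sketch short.
    qUpdate :: (Ord s, Ord a)
            => Double -> Double -> [a]   -- alpha, gamma, the (non-empty) action set
            -> s -> a -> Double -> s     -- s, a, reward r, next state s'
            -> QTable s a -> QTable s a
    qUpdate alpha gamma actions s a r s' q =
      M.insert (s, a) (old + alpha * (r + gamma * best - old)) q
      where
        old  = M.findWithDefault 0 (s, a) q
        best = maximum [ M.findWithDefault 0 (s', a') q | a' <- actions ]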

This paper presents our research idea for realizing an efficient reinforcement learning algorithm that retains the property of modeling the environment as programs. The proposed algorithm also models the policy as programs and can imitate other agents in the environment.

The design of the algorithm is based on two points: (1) the ability to program is indispensable for human-level intelligence, and (2) a realistic solution to the exploration/exploitation trade-off is teaching via imitation.
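
To give a rough, concrete picture of the second point, the following Haskell sketch, written under our own simplifying assumptions (it is not the paper's algorithm, and the names Policy, Demo, imitationScore, and bestByImitation are hypothetical rather than MagicHaskeller's API), ranks candidate policies, each represented as a program from observations to actions, by how well they reproduce a teacher's demonstration; imitation thereby substitutes for blind exploration.

    import Data.List (maximumBy)
    import Data.Ord  (comparing)

    type Obs    = Int
    type Action = Int
    type Policy = Obs -> Action   -- a policy is itself a (small) program
    type Demo   = [(Obs, Action)] -- observation/action pairs demonstrated by a teacher

    -- Fraction of the demonstration that a candidate policy reproduces.
    imitationScore :: Demo -> Policy -> Double
    imitationScore demo p =
      fromIntegral (length [ () | (o, a) <- demo, p o == a ])
        / fromIntegral (max 1 (length demo))

    -- Pick, from a non-empty list of candidate programs (in the paper's setting
    -- these would come from exhaustive program generation, cf. MagicHaskeller),
    -- the one that best imitates the teacher.
    bestByImitation :: Demo -> [Policy] -> Policy
    bestByImitation demo candidates =
      snd (maximumBy (comparing fst)
             [ (imitationScore demo p, p) | p <- candidates ])

For example, bestByImitation [(0, 1), (1, 0)] [id, \o -> 1 - o] selects the second candidate, since it reproduces both demonstrated actions while the first reproduces neither.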

Notes

  1. http://nautilus.cs.miyazaki-u.ac.jp/~skata/MagicHaskeller.html.

References

  1. Hutter, M.: Self-optimizing and Pareto-optimal policies in general environments based on Bayes-mixtures. In: Kivinen, J., Sloan, R.H. (eds.) COLT 2002. LNCS (LNAI), vol. 2375, pp. 364–379. Springer, Heidelberg (2002). http://dx.doi.org/10.1007/3-540-45435-7_25

  2. Hutter, M.: Universal algorithmic intelligence: a mathematical top \(\rightarrow \) down approach. In: Goertzel, B., Pennachin, C. (eds.) Artificial General Intelligence. Cognitive Technologies, pp. 227–290. Springer, Heidelberg (2007). http://www.hutter1.net/ai/aixigentle.htm

  3. Katayama, S.: Systematic search for lambda expressions. In: Sixth Symposium on Trends in Functional Programming, pp. 195–205 (2005)

  4. Katayama, S.: Efficient exhaustive generation of functional programs using Monte-Carlo search with iterative deepening. In: Ho, T.B., Zhou, Z.H. (eds.) PRICAI 2008. LNCS (LNAI), vol. 5351, pp. 199–210. Springer, Heidelberg (2008)

  5. Katayama, S.: Towards human-level inductive functional programming. In: Bieger, J., Goertzel, B., Potapov, A. (eds.) AGI 2015. LNCS, vol. 9205, pp. 111–120. Springer, Heidelberg (2015). http://dx.doi.org/10.1007/978-3-319-21365-1_12

  6. Kocsis, L., Szepesvári, C.: Bandit based Monte-Carlo planning. In: Fürnkranz, J., Scheffer, T., Spiliopoulou, M. (eds.) ECML 2006. LNCS (LNAI), vol. 4212, pp. 282–293. Springer, Heidelberg (2006)

  7. Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A.A., Veness, J., Bellemare, M.G., Graves, A., Riedmiller, M., Fidjeland, A.K., Ostrovski, G., Petersen, S., Beattie, C., Sadik, A., Antonoglou, I., King, H., Kumaran, D., Wierstra, D., Legg, S., Hassabis, D.: Human-level control through deep reinforcement learning. Nature 518, 529–533 (2015)

  8. Orseau, L.: Optimality issues of universal greedy agents with static priors. In: Hutter, M., Stephan, F., Vovk, V., Zeugmann, T. (eds.) Algorithmic Learning Theory. LNCS, vol. 6331, pp. 345–359. Springer, Heidelberg (2010). http://dx.doi.org/10.1007/978-3-642-16108-7_28

  9. Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction, 1st edn. MIT Press, Cambridge (1998)

  10. Veness, J., Ng, K.S., Hutter, M., Uther, W., Silver, D.: A Monte-Carlo AIXI approximation. J. Artif. Intell. Res. 40, 95–142 (2011)

  11. Willems, F.M.J., Shtarkov, Y.M., Tjalkens, T.J.: The context tree weighting method: basic properties. IEEE Trans. Inf. Theor. 41, 653–664 (1995)

Acknowledgements

The author thanks the anonymous reviewers who helped improve the paper, especially the reviewer who pointed out [8].

Author information

Corresponding author

Correspondence to Susumu Katayama.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Katayama, S. (2016). Ideas for a Reinforcement Learning Algorithm that Learns Programs. In: Steunebrink, B., Wang, P., Goertzel, B. (eds) Artificial General Intelligence. AGI 2016. Lecture Notes in Computer Science, vol. 9782. Springer, Cham. https://doi.org/10.1007/978-3-319-41649-6_36

  • DOI: https://doi.org/10.1007/978-3-319-41649-6_36

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-41648-9

  • Online ISBN: 978-3-319-41649-6
