
Parallel Reinforcement Learning with Linear Function Approximation

  • Conference paper
Adaptive Agents and Multi-Agent Systems III. Adaptation and Multi-Agent Learning (AAMAS 2005, ALAMAS 2007, ALAMAS 2006)

Abstract

In this paper, we investigate the use of parallelization in reinforcement learning (RL), with the goal of learning optimal policies for single-agent RL problems more quickly by using parallel hardware. Our approach is based on agents using the SARSA(λ) algorithm, with value functions represented using linear function approximators. In our proposed method, each agent learns independently in a separate simulation of the single-agent problem. The agents periodically exchange information extracted from the weights of their approximators, accelerating convergence towards the optimal policy. We develop three increasingly efficient versions of this approach to parallel RL, and present empirical results for an implementation of the methods on a Beowulf cluster.
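The scheme described above — independent SARSA(λ) learners with linear function approximation that periodically exchange information from their approximator weights — can be sketched as follows. This is a minimal illustration, not the paper's implementation: the environment (a small chain task), the one-hot features, and the merge step (plain weight averaging, standing in for the paper's three increasingly efficient exchange methods) are all assumptions made for the example, and the "parallel" agents are simulated sequentially rather than on cluster hardware.

```python
import random

N_STATES = 6          # toy chain: states 0..5, goal at state 5 (assumed environment)
ACTIONS = [0, 1]      # 0 = left, 1 = right
N_FEATURES = N_STATES * len(ACTIONS)

def features(s, a):
    """One-hot feature vector for (state, action); linear FA is tabular here."""
    phi = [0.0] * N_FEATURES
    phi[s * len(ACTIONS) + a] = 1.0
    return phi

def q(w, s, a):
    """Linear action-value estimate: dot product of weights and features."""
    return sum(wi * fi for wi, fi in zip(w, features(s, a)))

def epsilon_greedy(w, s, eps, rng):
    """Epsilon-greedy action selection with random tie-breaking."""
    if rng.random() < eps:
        return rng.choice(ACTIONS)
    qs = [q(w, s, a) for a in ACTIONS]
    best = max(qs)
    return rng.choice([a for a in ACTIONS if qs[a] == best])

def sarsa_lambda_episode(w, alpha=0.1, gamma=0.9, lam=0.8, eps=0.1,
                         rng=None, max_steps=1000):
    """One SARSA(lambda) episode on the chain, updating w in place."""
    rng = rng or random
    z = [0.0] * N_FEATURES              # eligibility traces
    s = 0
    a = epsilon_greedy(w, s, eps, rng)
    for _ in range(max_steps):
        s2 = min(s + 1, N_STATES - 1) if a == 1 else max(s - 1, 0)
        done = (s2 == N_STATES - 1)
        r = 1.0 if done else 0.0
        a2 = epsilon_greedy(w, s2, eps, rng)
        delta = (r if done else r + gamma * q(w, s2, a2)) - q(w, s, a)
        z = [gamma * lam * zi for zi in z]
        z[s * len(ACTIONS) + a] = 1.0   # replacing traces
        for i in range(N_FEATURES):
            w[i] += alpha * delta * z[i]
        if done:
            break
        s, a = s2, a2
    return w

def merge_weights(agent_weights):
    """Hypothetical exchange step: average the agents' weight vectors."""
    n = len(agent_weights)
    return [sum(ws[i] for ws in agent_weights) / n
            for i in range(len(agent_weights[0]))]

def parallel_train(n_agents=4, rounds=20, episodes_per_round=10, seed=0):
    """Agents learn independently, then periodically merge their weights."""
    rng = random.Random(seed)
    weights = [[0.0] * N_FEATURES for _ in range(n_agents)]
    for _ in range(rounds):
        for w in weights:               # "parallel" agents, simulated sequentially
            for _ in range(episodes_per_round):
                sarsa_lambda_episode(w, rng=rng)
        merged = merge_weights(weights)
        weights = [list(merged) for _ in range(n_agents)]
    return merged
```

On a real cluster, the merge step would be a collective communication (e.g. an all-reduce over the weight vectors) rather than a local average, so each merge costs one synchronization round while the episodes in between run fully in parallel.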




Editor information

Karl Tuyls, Ann Nowe, Zahia Guessoum, Daniel Kudenko


Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Grounds, M., Kudenko, D. (2008). Parallel Reinforcement Learning with Linear Function Approximation. In: Tuyls, K., Nowe, A., Guessoum, Z., Kudenko, D. (eds) Adaptive Agents and Multi-Agent Systems III. Adaptation and Multi-Agent Learning. AAMAS 2005, ALAMAS 2007, ALAMAS 2006. Lecture Notes in Computer Science, vol. 4865. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-77949-0_5


  • DOI: https://doi.org/10.1007/978-3-540-77949-0_5

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-77947-6

  • Online ISBN: 978-3-540-77949-0

  • eBook Packages: Computer Science, Computer Science (R0)
