
Analyzing and visualizing multiagent rewards in dynamic and stochastic domains

Autonomous Agents and Multi-Agent Systems

Abstract

The ability to analyze the effectiveness of agent reward structures is critical to the successful design of multiagent learning algorithms. Though final system performance is the best indicator of the suitability of a given reward structure, it is often preferable to analyze the reward properties that lead to good system behavior (i.e., properties promoting coordination among the agents and providing agents with a strong signal-to-noise ratio). This step is particularly helpful in continuous, dynamic, stochastic domains that are ill-suited to the simple table backup schemes commonly used in TD(λ)/Q-learning, where the effectiveness of the reward structure is difficult to distinguish from the effectiveness of the chosen learning algorithm. In this paper, we present a new reward evaluation method that provides a visualization of the tradeoff between the level of coordination among the agents and the difficulty of the learning problem each agent faces. This method is independent of the learning algorithm and is only a function of the problem domain and the agents’ reward structure. We use this reward property visualization method to determine an effective reward without performing extensive simulations. We then test this method in both a static and a dynamic multi-rover learning domain where the agents have continuous state spaces and take noisy actions (e.g., the agents’ movement decisions are not always carried out properly). Our results show that in the more difficult dynamic domain, the reward efficiency visualization method provides a speedup of two orders of magnitude in selecting good rewards, compared to running a full simulation. In addition, this method facilitates the design and analysis of new rewards tailored to the observational limitations of the domain, providing rewards that combine the best properties of traditional rewards.
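The two quantities the abstract visualizes, the level of coordination a reward induces and the difficulty of the learning problem it poses to each agent, can in principle be estimated by sampling the domain, without running any learner. The sketch below is illustrative only and is not the authors' implementation; it assumes one plausible reading of these measures from the collectives literature: coordination as the fraction of single-agent perturbations in which the agent reward and the global reward change in the same direction, and learnability as the ratio of the reward's sensitivity to the agent's own state versus another agent's state. All function and parameter names (global_reward, agent_reward, sample_joint_state, perturb_agent) are hypothetical placeholders.

```python
"""Minimal sketch of sampling-based reward analysis (not the authors' code)."""
import numpy as np

def estimate_reward_properties(global_reward, agent_reward, sample_joint_state,
                               perturb_agent, agent_id, other_agent_id,
                               n_samples=1000):
    """Monte Carlo estimate of (coordination, learnability) for one agent's reward.

    global_reward(z)      -> float, system-level reward for joint state z
    agent_reward(z, i)    -> float, reward agent i receives in joint state z
    sample_joint_state()  -> a random joint state z
    perturb_agent(z, i)   -> copy of z with only agent i's part resampled
    """
    aligned = 0
    own_effect, other_effect = [], []
    for _ in range(n_samples):
        z = sample_joint_state()
        # Perturb only this agent: a well-coordinated (aligned) reward should
        # move in the same direction as the global reward.
        z_own = perturb_agent(z, agent_id)
        dg = global_reward(z_own) - global_reward(z)
        dr = agent_reward(z_own, agent_id) - agent_reward(z, agent_id)
        aligned += (dg * dr) > 0
        own_effect.append(abs(dr))
        # Perturb a different agent: a reward that is easy to learn from should
        # react strongly to the agent's own state and weakly to others'.
        z_other = perturb_agent(z, other_agent_id)
        other_effect.append(abs(agent_reward(z_other, agent_id) -
                                agent_reward(z, agent_id)))
    coordination = aligned / n_samples
    learnability = np.mean(own_effect) / (np.mean(other_effect) + 1e-12)
    return coordination, learnability
```

Under these assumptions, candidate rewards can be compared by plotting each one as a point in the (coordination, learnability) plane, which is one way to read the reward property visualization described in the abstract.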



Author information

Correspondence to Kagan Tumer.

Cite this article

Agogino, A.K., Tumer, K. Analyzing and visualizing multiagent rewards in dynamic and stochastic domains. Auton Agent Multi-Agent Syst 17, 320–338 (2008). https://doi.org/10.1007/s10458-008-9046-9
