
Analyzing and visualizing multiagent rewards in dynamic and stochastic domains

Autonomous Agents and Multi-Agent Systems

Abstract

The ability to analyze the effectiveness of agent reward structures is critical to the successful design of multiagent learning algorithms. Though final system performance is the best indicator of the suitability of a given reward structure, it is often preferable to analyze the reward properties that lead to good system behavior (i.e., properties promoting coordination among the agents and providing agents with a strong signal-to-noise ratio). This step is particularly helpful in continuous, dynamic, stochastic domains that are ill-suited to the simple table backup schemes commonly used in TD(λ)/Q-learning, where the effectiveness of the reward structure is difficult to distinguish from the effectiveness of the chosen learning algorithm. In this paper, we present a new reward evaluation method that provides a visualization of the tradeoff between the level of coordination among the agents and the difficulty of the learning problem each agent faces. This method is independent of the learning algorithm and is only a function of the problem domain and the agents’ reward structure. We use this reward property visualization method to determine an effective reward without performing extensive simulations. We then test this method in both a static and a dynamic multi-rover learning domain where the agents have continuous state spaces and take noisy actions (e.g., the agents’ movement decisions are not always carried out properly). Our results show that in the more difficult dynamic domain, the reward efficiency visualization method provides a speedup of two orders of magnitude in selecting good rewards, compared to running a full simulation. In addition, this method facilitates the design and analysis of new rewards tailored to the observational limitations of the domain, providing rewards that combine the best properties of traditional rewards.
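The two quantities the abstract visualizes, the level of coordination a reward induces and the difficulty of the learning problem it poses to each agent, can in principle be estimated by sampling the domain, without running any learner. The sketch below is illustrative only and is not the authors' implementation; it assumes one plausible reading of these measures from the collectives literature: coordination as the fraction of single-agent perturbations in which the agent reward and the global reward change in the same direction, and learnability as the ratio of the reward's sensitivity to the agent's own state versus another agent's state. All function and parameter names (global_reward, agent_reward, sample_joint_state, perturb_agent) are hypothetical placeholders.

```python
"""Minimal sketch of sampling-based reward analysis (not the authors' code)."""
import numpy as np

def estimate_reward_properties(global_reward, agent_reward, sample_joint_state,
                               perturb_agent, agent_id, other_agent_id,
                               n_samples=1000):
    """Monte Carlo estimate of (coordination, learnability) for one agent's reward.

    global_reward(z)      -> float, system-level reward for joint state z
    agent_reward(z, i)    -> float, reward agent i receives in joint state z
    sample_joint_state()  -> a random joint state z
    perturb_agent(z, i)   -> copy of z with only agent i's part resampled
    """
    aligned = 0
    own_effect, other_effect = [], []
    for _ in range(n_samples):
        z = sample_joint_state()
        # Perturb only this agent: a well-coordinated (aligned) reward should
        # move in the same direction as the global reward.
        z_own = perturb_agent(z, agent_id)
        dg = global_reward(z_own) - global_reward(z)
        dr = agent_reward(z_own, agent_id) - agent_reward(z, agent_id)
        aligned += (dg * dr) > 0
        own_effect.append(abs(dr))
        # Perturb a different agent: a reward that is easy to learn from should
        # react strongly to the agent's own state and weakly to others'.
        z_other = perturb_agent(z, other_agent_id)
        other_effect.append(abs(agent_reward(z_other, agent_id) -
                                agent_reward(z, agent_id)))
    coordination = aligned / n_samples
    learnability = np.mean(own_effect) / (np.mean(other_effect) + 1e-12)
    return coordination, learnability
```

Under these assumptions, candidate rewards can be compared by plotting each one as a point in the (coordination, learnability) plane, which is one way to read the reward property visualization described in the abstract.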



Author information

Correspondence to Kagan Tumer.

Cite this article

Agogino, A.K., Tumer, K. Analyzing and visualizing multiagent rewards in dynamic and stochastic domains. Auton Agent Multi-Agent Syst 17, 320–338 (2008). https://doi.org/10.1007/s10458-008-9046-9
