
Learning to compete, compromise, and cooperate in repeated general-sum games

Published: 7 August 2005
DOI: 10.1145/1102351.1102372

ABSTRACT

Learning algorithms often obtain relatively low average payoffs in repeated general-sum games with other learning agents, due to a focus on myopic best-response and one-shot Nash equilibrium (NE) strategies. A less myopic approach focuses instead on NEs of the repeated game, which suggests that (at the least) a learning agent should possess two properties. First, an agent should never learn to play a strategy that produces average payoffs below the minimax value of the game. Second, an agent should learn to cooperate/compromise when doing so is beneficial. No learning algorithm from the literature is known to possess both of these properties. We present a reinforcement learning algorithm (M-Qubed) that provably satisfies the first property and empirically displays the second property (in self play) across a wide range of games.
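Both properties can be made concrete with a small example. The minimax (security) value in the first property is the largest average payoff an agent can guarantee itself no matter what the opponent does, and for a matrix game it can be computed by linear programming. The sketch below is illustrative rather than the paper's code (the function name `maximin_value` and the SciPy-based formulation are our own choices); it computes the row player's security value in the prisoner's dilemma, where a safe learner should never average below the mutual-defection payoff of 1, while the second property asks it to also reach the mutual-cooperation payoff of 3 against a like-minded opponent.

```python
# Illustrative sketch (not the paper's implementation): compute the
# maximin (security) value of a matrix game via linear programming.
# Property 1 says a learner's average payoff should never fall below
# this value.
import numpy as np
from scipy.optimize import linprog

def maximin_value(A):
    """Security value and maximin mixed strategy for the row player.

    A[i, j] is the row player's payoff when it plays action i and
    the column player plays action j.
    """
    A = np.asarray(A, dtype=float)
    m, n = A.shape
    # Variables z = (x_1, ..., x_m, v); maximize v, i.e. minimize -v.
    c = np.zeros(m + 1)
    c[-1] = -1.0
    # For every opponent column j: v - sum_i x_i * A[i, j] <= 0,
    # so the mixed strategy x earns at least v against every column.
    A_ub = np.hstack([-A.T, np.ones((n, 1))])
    b_ub = np.zeros(n)
    # Strategy weights sum to one; v itself is unbounded.
    A_eq = np.hstack([np.ones((1, m)), np.zeros((1, 1))])
    b_eq = np.array([1.0])
    bounds = [(0, None)] * m + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=bounds)
    return res.x[-1], res.x[:-1]

# Prisoner's dilemma, row player's payoffs (C = cooperate, D = defect):
pd = [[3, 0],   # C vs (C, D)
      [5, 1]]   # D vs (C, D)
v, x = maximin_value(pd)
print(f"security value = {v:.2f}, strategy = {x}")  # v = 1.0 (always defect)
```

The LP maximizes v subject to the mixed strategy earning at least v against every opponent action. For the prisoner's dilemma the solution is the pure strategy Defect with v = 1, so property 1 alone still permits the mutually destructive outcome; property 2 is what asks the learner to find the cooperative repeated-game equilibrium worth 3 per round in self play.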


Published in

ICML '05: Proceedings of the 22nd International Conference on Machine Learning
August 2005
1113 pages
ISBN: 1595931805
DOI: 10.1145/1102351

              Copyright © 2005 ACM

              Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery, New York, NY, United States



              Acceptance Rates

Overall Acceptance Rate: 140 of 548 submissions, 26%
