Learning to compete, compromise, and cooperate in repeated general-sum games

ABSTRACT
Learning algorithms often obtain relatively low average payoffs in repeated general-sum games played against other learning agents, due to a focus on myopic best responses and one-shot Nash equilibrium (NE) strategies. A less myopic approach focuses on NEs of the repeated game, which suggests that a learning agent should possess (at least) two properties. First, the agent should never learn to play a strategy whose average payoff falls below the minimax value of the game. Second, the agent should learn to cooperate or compromise when doing so is beneficial. No learning algorithm in the literature is known to possess both of these properties. We present a reinforcement learning algorithm (M-Qubed) that provably satisfies the first property and, in self play, empirically displays the second property in a wide range of games.
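As a concrete illustration of the first property, the security threshold the abstract refers to is the minimax (maximin) value of the game: the highest average payoff a player can guarantee regardless of how the opponent plays. The sketch below is illustrative only and not part of M-Qubed itself; it computes this value for the row player of a two-player matrix game by linear programming, assuming NumPy and SciPy are available.

```python
import numpy as np
from scipy.optimize import linprog


def maximin(payoffs):
    """Security (maximin) value and strategy for the row player.

    Solves: maximize v subject to x^T A[:, j] >= v for every opponent
    action j, with x a probability distribution over the row actions.
    """
    A = np.asarray(payoffs, dtype=float)
    m, n = A.shape

    # Decision variables: m mixing probabilities x, followed by the value v.
    c = np.zeros(m + 1)
    c[-1] = -1.0  # linprog minimizes, so minimize -v to maximize v

    # For each opponent column j: v - x^T A[:, j] <= 0.
    A_ub = np.hstack([-A.T, np.ones((n, 1))])
    b_ub = np.zeros(n)

    # Probabilities sum to 1; v is unconstrained here.
    A_eq = np.zeros((1, m + 1))
    A_eq[0, :m] = 1.0
    b_eq = np.ones(1)

    bounds = [(0, 1)] * m + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq,
                  b_eq=b_eq, bounds=bounds)
    assert res.success
    return res.x[-1], res.x[:m]


if __name__ == "__main__":
    # Row player's payoffs in the prisoner's dilemma (C, D) x (C, D).
    value, strategy = maximin([[3, 0], [5, 1]])
    print(value, strategy)  # value 1.0, pure strategy on D
</code>
```

For the prisoner's dilemma payoffs above, the security value is 1 (always defect), so a learner satisfying the first property must never settle on a strategy averaging less than 1 per round, while the second property asks it to reach the cooperative payoff of 3 when its associate makes that possible.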