On Learning How Players Learn: Estimation of Learning Dynamics in the Routing Game

Published: 03 January 2018

Abstract

The routing game models congestion in transportation networks, communication networks, and other cyber-physical systems in which agents compete for shared resources. We consider an online learning model of player dynamics: at each iteration, every player chooses a route (or a probability distribution over routes, which corresponds to a flow allocation over the physical network); the joint decision of all players then determines the cost of each path, and these costs are revealed to the players.

We pose the following estimation problem: given a sequence of player decisions and the corresponding costs, we would like to estimate the parameters of the learning model. We consider, in particular, entropic mirror descent dynamics and reduce the problem to estimating the learning rates of each player.
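On the simplex of routes, the entropic mirror descent dynamics mentioned above reduce to a multiplicative-weights update. The sketch below (a hypothetical illustration under the assumption that the update holds exactly, not the paper's estimator) shows the update and how a single player's learning rate could be recovered by least squares on the log-ratios of consecutive route distributions, since the update implies log(x_{t+1,i}) - log(x_{t,i}) = -eta * c_i + const.

```python
import numpy as np

def entropic_mirror_descent_step(x, costs, eta):
    """One entropic mirror descent (multiplicative-weights) update on the simplex."""
    w = x * np.exp(-eta * costs)
    return w / w.sum()

def estimate_learning_rate(x_t, x_next, costs):
    """Fit eta by least squares on centered log-ratios of consecutive distributions."""
    y = np.log(x_next) - np.log(x_t)
    y -= y.mean()                 # removes the shared normalization constant
    c = costs - costs.mean()
    return -(c @ y) / (c @ c)

# Synthetic check: generate one update with a known eta, then recover it.
rng = np.random.default_rng(0)
x = np.ones(4) / 4
costs = rng.uniform(0.0, 1.0, size=4)
x_next = entropic_mirror_descent_step(x, costs, eta=0.7)
print(round(estimate_learning_rate(x, x_next, costs), 3))  # recovers 0.7
```

Because the centered log-ratios are exactly linear in the centered costs under this model, the regression recovers eta exactly on noiseless data; with observed human play, the fit would instead be a least-squares approximation.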

In order to demonstrate our methods, we developed a web application that allows players to participate in a distributed, online routing game, and we deployed the application on Amazon Mechanical Turk. When players log in, they are assigned an origin and destination on a shared network. They can choose, at each iteration, a distribution over their available routes, and each player seeks to minimize her own cost. We collect a dataset using this platform, then apply the proposed method to estimate the learning rates of each player. We observe, in particular, that after an exploration phase, the joint decision of the players remains within a small distance of the set of equilibria. We also use the estimated model parameters to predict the flow distribution over routes, and compare our predictions to the actual distributions, showing that the online learning model can be used as a predictive model over short horizons. Finally, we discuss some of the qualitative insights from the experiments, and give directions for future research.
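As a toy illustration of the short-horizon prediction idea (a hypothetical two-route network with linear congestion costs, not the experimental network or the paper's predictor), one can roll the estimated dynamics forward from the current flow:

```python
import numpy as np

def predict_flows(x0, eta, cost_fn, horizon):
    """Roll the entropic mirror descent dynamics forward to predict route flows."""
    x = np.asarray(x0, dtype=float)
    trajectory = [x]
    for _ in range(horizon):
        costs = cost_fn(x)                 # congestion costs induced by current flow
        w = x * np.exp(-eta * costs)       # multiplicative-weights update
        x = w / w.sum()
        trajectory.append(x)
    return np.array(trajectory)

# Hypothetical two-route example with costs c_i(x) = a_i * x_i + b_i;
# equal costs give the equilibrium split x = (2/3, 1/3).
cost_fn = lambda x: np.array([2.0, 1.0]) * x + np.array([0.0, 1.0])
traj = predict_flows([0.5, 0.5], eta=0.5, cost_fn=cost_fn, horizon=20)
print(traj[-1].round(3))
```

In this example the predicted trajectory approaches the equilibrium split, consistent with the observation above that the joint decision remains within a small distance of the set of equilibria after exploration.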



• Published in

  ACM Transactions on Cyber-Physical Systems, Volume 2, Issue 1
  Special Issue on ICCPS 2016
  January 2018, 140 pages
  ISSN: 2378-962X
  EISSN: 2378-9638
  DOI: 10.1145/3174275
  Editor: Tei-Wei Kuo

            Copyright © 2018 ACM

            Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

            Publisher

            Association for Computing Machinery

            New York, NY, United States

            Publication History

            • Published: 3 January 2018
            • Accepted: 1 April 2017
            • Revised: 1 March 2017
            • Received: 1 July 2016
Published in TCPS Volume 2, Issue 1


            Qualifiers

            • research-article
            • Research
            • Refereed
