Abstract
The routing game models congestion in transportation networks, communication networks, and other cyber-physical systems in which agents compete for shared resources. We consider an online learning model of player dynamics: at each iteration, every player chooses a route (or a probability distribution over routes, which corresponds to a flow allocation over the physical network). The joint decision of all players then determines the cost of each path, and these costs are revealed to the players.
We pose the following estimation problem: given a sequence of player decisions and the corresponding costs, we would like to estimate the parameters of the learning model. We consider, in particular, entropic mirror descent dynamics and reduce the problem to estimating the learning rates of each player.
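As an illustration of the estimation problem above, the following minimal sketch assumes that a player updates her route distribution with the standard entropic mirror descent (exponentiated-gradient) step, and that her learning rate is fitted from a single observed step by minimizing the KL divergence between the observed and predicted distributions. The function names, the grid search, and the synthetic numbers are assumptions made for this example; the abstract does not specify the exact objective or solver.

```python
import numpy as np

def entropic_md_step(x, cost, eta):
    """One entropic mirror descent step on the simplex:
    x_next is proportional to x * exp(-eta * cost).
    Assumes x is strictly positive."""
    logits = np.log(x) - eta * cost
    logits -= logits.max()  # shift for numerical stability
    w = np.exp(logits)
    return w / w.sum()

def kl(p, q):
    """KL divergence D(p || q) for strictly positive distributions."""
    return float(np.sum(p * (np.log(p) - np.log(q))))

def estimate_eta(x_t, cost_t, x_next, grid=None):
    """Hypothetical estimator: choose the learning rate on a grid whose
    predicted update is closest, in KL divergence, to the observed one."""
    if grid is None:
        grid = np.linspace(0.0, 5.0, 501)  # assumed search range
    losses = [kl(x_next, entropic_md_step(x_t, cost_t, eta)) for eta in grid]
    return float(grid[int(np.argmin(losses))])

# Synthetic check: generate one step with a known rate, then recover it.
x0 = np.array([0.5, 0.3, 0.2])    # distribution over three routes
cost = np.array([1.0, 0.4, 0.7])  # revealed path costs (made up)
x1 = entropic_md_step(x0, cost, eta=1.2)
eta_hat = estimate_eta(x0, cost, x1)
```

On this noise-free synthetic step the grid search recovers the generating rate. With real data, the same fit would presumably be run over a window of observations per player, and a continuous one-dimensional optimizer could replace the grid.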
In order to demonstrate our methods, we developed a web application that allows players to participate in a distributed, online routing game, and we deployed the application on Amazon Mechanical Turk. When players log in, they are assigned an origin and destination on a shared network. They can choose, at each iteration, a distribution over their available routes, and each player seeks to minimize her own cost. We collect a dataset using this platform, then apply the proposed method to estimate the learning rates of each player. We observe, in particular, that after an exploration phase, the joint decision of the players remains within a small distance of the set of equilibria. We also use the estimated model parameters to predict the flow distribution over routes, and compare our predictions to the actual distributions, showing that the online learning model can be used as a predictive model over short horizons. Finally, we discuss some of the qualitative insights from the experiments, and give directions for future research.
On Learning How Players Learn: Estimation of Learning Dynamics in the Routing Game
ICCPS '16: Proceedings of the 7th International Conference on Cyber-Physical Systems