On Learning How Players Learn: Estimation of Learning Dynamics in the Routing Game

Published: 03 January 2018

Abstract

The routing game models congestion in transportation networks, communication networks, and other cyber-physical systems in which agents compete for shared resources. We consider an online learning model of player dynamics: at each iteration, every player chooses a route (or a probability distribution over routes, which corresponds to a flow allocation over the physical network); the joint decision of all players then determines the cost of each path, and these costs are revealed to the players.

We pose the following estimation problem: given a sequence of player decisions and the corresponding costs, we would like to estimate the parameters of the learning model. We consider, in particular, entropic mirror descent dynamics and reduce the problem to estimating the learning rates of each player.
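On the simplex of routes, the entropic mirror descent dynamics mentioned above reduce to a multiplicative-weights update. The sketch below (a hypothetical illustration under the assumption that the update holds exactly, not the paper's estimator) shows the update and how a single player's learning rate could be recovered by least squares on the log-ratios of consecutive route distributions, since the update implies log(x_{t+1,i}) - log(x_{t,i}) = -eta * c_i + const.

```python
import numpy as np

def entropic_mirror_descent_step(x, costs, eta):
    """One entropic mirror descent (multiplicative-weights) update on the simplex."""
    w = x * np.exp(-eta * costs)
    return w / w.sum()

def estimate_learning_rate(x_t, x_next, costs):
    """Fit eta by least squares on centered log-ratios of consecutive distributions."""
    y = np.log(x_next) - np.log(x_t)
    y -= y.mean()                 # removes the shared normalization constant
    c = costs - costs.mean()
    return -(c @ y) / (c @ c)

# Synthetic check: generate one update with a known eta, then recover it.
rng = np.random.default_rng(0)
x = np.ones(4) / 4
costs = rng.uniform(0.0, 1.0, size=4)
x_next = entropic_mirror_descent_step(x, costs, eta=0.7)
print(round(estimate_learning_rate(x, x_next, costs), 3))  # recovers 0.7
```

Because the centered log-ratios are exactly linear in the centered costs under this model, the regression recovers eta exactly on noiseless data; with observed human play, the fit would instead be a least-squares approximation.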

In order to demonstrate our methods, we developed a web application that allows players to participate in a distributed, online routing game, and we deployed the application on Amazon Mechanical Turk. When players log in, they are assigned an origin and destination on a shared network. They can choose, at each iteration, a distribution over their available routes, and each player seeks to minimize her own cost. We collect a dataset using this platform, then apply the proposed method to estimate the learning rates of each player. We observe, in particular, that after an exploration phase, the joint decision of the players remains within a small distance of the set of equilibria. We also use the estimated model parameters to predict the flow distribution over routes, and compare our predictions to the actual distributions, showing that the online learning model can be used as a predictive model over short horizons. Finally, we discuss some of the qualitative insights from the experiments, and give directions for future research.
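As a toy illustration of the short-horizon prediction idea (a hypothetical two-route network with linear congestion costs, not the experimental network or the paper's predictor), one can roll the estimated dynamics forward from the current flow:

```python
import numpy as np

def predict_flows(x0, eta, cost_fn, horizon):
    """Roll the entropic mirror descent dynamics forward to predict route flows."""
    x = np.asarray(x0, dtype=float)
    trajectory = [x]
    for _ in range(horizon):
        costs = cost_fn(x)                 # congestion costs induced by current flow
        w = x * np.exp(-eta * costs)       # multiplicative-weights update
        x = w / w.sum()
        trajectory.append(x)
    return np.array(trajectory)

# Hypothetical two-route example with costs c_i(x) = a_i * x_i + b_i;
# equal costs give the equilibrium split x = (2/3, 1/3).
cost_fn = lambda x: np.array([2.0, 1.0]) * x + np.array([0.0, 1.0])
traj = predict_flows([0.5, 0.5], eta=0.5, cost_fn=cost_fn, horizon=20)
print(traj[-1].round(3))
```

In this example the predicted trajectory approaches the equilibrium split, consistent with the observation above that the joint decision remains within a small distance of the set of equilibria after exploration.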



• Published in

  ACM Transactions on Cyber-Physical Systems, Volume 2, Issue 1
  Special Issue on ICCPS 2016
  January 2018, 140 pages
  ISSN: 2378-962X
  EISSN: 2378-9638
  DOI: 10.1145/3174275
  Editor: Tei-Wei Kuo

            Copyright © 2018 ACM

            Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

            Publisher

            Association for Computing Machinery

            New York, NY, United States

            Publication History

            • Published: 3 January 2018
            • Accepted: 1 April 2017
            • Revised: 1 March 2017
            • Received: 1 July 2016
Published in TCPS Volume 2, Issue 1


            Qualifiers

            • research-article
            • Research
            • Refereed
