
Deep Reinforcement Learning-based Trajectory Pricing on Ride-hailing Platforms

Published: 3 March 2022

Abstract

Dynamic pricing plays an important role in reducing traffic load, controlling congestion, and improving revenue. Efficient dynamic pricing strategies can increase capacity utilization, the total revenue of service providers, and the satisfaction of both passengers and drivers. Most existing dynamic pricing techniques focus on short-term optimization and scale poorly when modeling long-term goals, owing to limited solution optimality and prohibitive computational cost. In this article, a deep reinforcement learning framework based on the soft actor-critic (SAC) algorithm is proposed to tackle the dynamic pricing problem on ride-hailing platforms. First, the dynamic pricing problem is formulated as a Markov Decision Process (MDP) with a continuous action space, so no discretization of the action space is needed. Then, a new reward function is constructed from the order response rate and the KL-divergence between the supply distribution and the demand distribution. Experiments and case studies demonstrate that the proposed method outperforms the baselines in terms of order response rate and total revenue.
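The abstract names the ingredients of the reward (order response rate and a supply-demand KL-divergence) but not its exact functional form. The following is a minimal sketch of one plausible shape, assuming the reward is the per-step response rate minus a weighted KL penalty between demand and supply distributions over spatial grid cells; the trade-off weight `beta`, the cell-level counts, and the smoothing constant are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def kl_divergence(p: np.ndarray, q: np.ndarray, eps: float = 1e-8) -> float:
    """KL(p || q) between two discrete distributions (e.g., demand vs. supply
    over spatial grid cells), with additive smoothing to avoid log(0)."""
    p = (p + eps) / (p + eps).sum()
    q = (q + eps) / (q + eps).sum()
    return float(np.sum(p * np.log(p / q)))

def step_reward(answered_orders: int, total_orders: int,
                demand_by_cell: np.ndarray, supply_by_cell: np.ndarray,
                beta: float = 1.0) -> float:
    """Illustrative reward: order response rate minus a weighted KL term
    penalizing mismatch between the demand and supply distributions.
    `beta` is a hypothetical trade-off weight, not taken from the paper."""
    response_rate = answered_orders / max(total_orders, 1)
    return response_rate - beta * kl_divergence(demand_by_cell, supply_by_cell)

# Example: 80 of 100 orders answered; demand/supply counts over 4 grid cells.
r = step_reward(80, 100,
                demand_by_cell=np.array([30.0, 25.0, 25.0, 20.0]),
                supply_by_cell=np.array([10.0, 40.0, 30.0, 20.0]))
print(f"reward = {r:.4f}")
```

A reward of this shape would push the pricing policy toward answering more orders while steering idle supply toward where demand actually is, which matches the two evaluation metrics the abstract reports (order response rate and total revenue).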



Published in

ACM Transactions on Intelligent Systems and Technology, Volume 13, Issue 3 (June 2022), 415 pages
ISSN: 2157-6904
EISSN: 2157-6912
DOI: 10.1145/3508465
Editor: Huan Liu


Publisher

Association for Computing Machinery, New York, NY, United States

Publication History

• Published: 3 March 2022
• Accepted: 1 July 2021
• Revised: 1 April 2021
• Received: 1 January 2021

Published in TIST Volume 13, Issue 3


        Qualifiers

        • research-article
        • Refereed
