
Deep Reinforcement Learning-based Trajectory Pricing on Ride-hailing Platforms

Published: 3 March 2022

Abstract

Dynamic pricing plays an important role in reducing traffic load, controlling congestion, and improving revenue. Efficient dynamic pricing strategies can increase capacity utilization, the total revenue of service providers, and the satisfaction of both passengers and drivers. Most existing dynamic pricing techniques focus on short-term optimization and scale poorly when modeling long-term goals, owing to limited solution optimality and prohibitive computational cost. In this article, a deep reinforcement learning framework based on the soft actor-critic (SAC) algorithm is proposed to tackle the dynamic pricing problem on ride-hailing platforms. First, the dynamic pricing problem is formulated as a Markov Decision Process (MDP) with a continuous action space, so no discretization of the action space is needed. Then, a new reward function is constructed from the order response rate and the KL-divergence between the supply distribution and the demand distribution. Experiments and case studies demonstrate that the proposed method outperforms the baselines in terms of order response rate and total revenue.
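The abstract names the ingredients of the reward (order response rate and a supply-demand KL-divergence) but not its exact functional form. The following is a minimal sketch of one plausible shape, assuming the reward is the per-step response rate minus a weighted KL penalty between demand and supply distributions over spatial grid cells; the trade-off weight `beta`, the cell-level counts, and the smoothing constant are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def kl_divergence(p: np.ndarray, q: np.ndarray, eps: float = 1e-8) -> float:
    """KL(p || q) between two discrete distributions (e.g., demand vs. supply
    over spatial grid cells), with additive smoothing to avoid log(0)."""
    p = (p + eps) / (p + eps).sum()
    q = (q + eps) / (q + eps).sum()
    return float(np.sum(p * np.log(p / q)))

def step_reward(answered_orders: int, total_orders: int,
                demand_by_cell: np.ndarray, supply_by_cell: np.ndarray,
                beta: float = 1.0) -> float:
    """Illustrative reward: order response rate minus a weighted KL term
    penalizing mismatch between the demand and supply distributions.
    `beta` is a hypothetical trade-off weight, not taken from the paper."""
    response_rate = answered_orders / max(total_orders, 1)
    return response_rate - beta * kl_divergence(demand_by_cell, supply_by_cell)

# Example: 80 of 100 orders answered; demand/supply counts over 4 grid cells.
r = step_reward(80, 100,
                demand_by_cell=np.array([30.0, 25.0, 25.0, 20.0]),
                supply_by_cell=np.array([10.0, 40.0, 30.0, 20.0]))
print(f"reward = {r:.4f}")
```

A reward of this shape would push the pricing policy toward answering more orders while steering idle supply toward where demand actually is, which matches the two evaluation metrics the abstract reports (order response rate and total revenue).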



Published in

ACM Transactions on Intelligent Systems and Technology, Volume 13, Issue 3 (June 2022), 415 pages
ISSN: 2157-6904
EISSN: 2157-6912
DOI: 10.1145/3508465
Editor: Huan Liu


Publisher

Association for Computing Machinery, New York, NY, United States

Publication History

• Published: 3 March 2022
• Accepted: 1 July 2021
• Revised: 1 April 2021
• Received: 1 January 2021

Published in TIST Volume 13, Issue 3


        Qualifiers

        • research-article
        • Refereed
