Abstract
As an emerging mobility-on-demand service, the bike-sharing system (BSS) has spread worldwide by offering citizens a flexible, cost-efficient, and environmentally friendly mode of transportation. Demand-supply imbalance is one of the main challenges in BSS, largely because existing repositioning strategies reallocate bikes on a pre-defined periodic schedule without accounting for highly dynamic user demand. Although reinforcement learning has been applied to repositioning problems to mitigate demand-supply imbalance, extending it to BSS faces significant barriers, chiefly the curse of dimensionality in the action space caused by the varying number of workers and bikes across the city. In this paper, we study these barriers and address them with a novel bike repositioning system, BikeBrain, which consists of a demand prediction model and a spatio-temporal bike repositioning algorithm. Specifically, to obtain accurate, real-time usage demand for efficient bike repositioning, we first present a prediction model, ST-NetPre, which directly predicts user demand while capturing highly dynamic spatio-temporal characteristics. We then propose a spatio-temporal cooperative multi-agent reinforcement learning method (ST-CBR) that learns a worker-based repositioning strategy in which each worker in the BSS is treated as an agent. In particular, ST-CBR adopts centralized learning with decentralized execution to achieve effective cooperation among large numbers of dynamic agents based on Mean Field Reinforcement Learning (MFRL), while avoiding a prohibitively large action space. To handle the dynamic action space, ST-CBR uses a SoftMax selector to choose specific actions. Meanwhile, to balance the benefits and costs of agents' operations, an efficient reward function is designed to seek an optimal control policy that accounts for both immediate and future rewards.
Extensive experiments conducted on large-scale real-world datasets show significant improvements of our proposed method over several state-of-the-art baselines in terms of both demand-supply gap and operation cost.
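To make the mean-field idea concrete, the following is a minimal, illustrative sketch (not the paper's implementation) of an MFRL-style tabular update combined with a SoftMax (Boltzmann) action selector: each agent's Q-value is conditioned on the mean action of its neighbours rather than the full joint action, which sidesteps the exponential growth of the joint action space. All names, the action set, and the hyperparameters are assumptions for illustration only.

```python
import numpy as np

N_ACTIONS = 4  # illustrative: e.g. stay, or reposition bikes toward one of three regions

def softmax(x, temperature=1.0):
    """Boltzmann (softmax) distribution over action values."""
    z = np.asarray(x, dtype=float) / temperature
    z = z - z.max()  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def mean_field_q_update(q, s, a, r, s_next, m, m_next,
                        alpha=0.1, gamma=0.95, temperature=0.5):
    """One MFRL-style tabular update for a single agent.

    q      : dict mapping (state, action, mean_action) -> value
    m      : mean action of the agent's neighbours at state s
    m_next : mean action of the neighbours at the next state
    """
    # Soft (Boltzmann-weighted) value of the next state, as in mean-field Q-learning.
    next_qs = np.array([q.get((s_next, b, m_next), 0.0) for b in range(N_ACTIONS)])
    v_next = float(softmax(next_qs, temperature) @ next_qs)
    key = (s, a, m)
    old = q.get(key, 0.0)
    q[key] = old + alpha * (r + gamma * v_next - old)

# Usage: sample an action from the SoftMax selector, then apply one update.
q = {}
probs = softmax([1.0, 2.0, 0.5, 0.0], temperature=0.5)
a = int(np.random.default_rng(0).choice(N_ACTIONS, p=probs))
mean_field_q_update(q, s=0, a=a, r=1.0, s_next=1, m=1, m_next=1)
```

The key design point is that the dimensionality of the Q-function is independent of the number of agents: each agent sees only its own state, its own action, and a neighbourhood mean action, so the method scales to a dynamic agent population.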