ABSTRACT
Multi-user delay-constrained scheduling is important in many real-world applications, including wireless communication, live streaming, and cloud computing. Yet it poses a critical challenge, since the scheduler must make real-time decisions that guarantee both delay and resource constraints simultaneously, without prior knowledge of the system dynamics, which can be time-varying and hard to estimate. Moreover, many practical scenarios suffer from partial observability, e.g., due to sensing noise or hidden correlations. To tackle these challenges, we propose a deep reinforcement learning (DRL) algorithm, named Recurrent Softmax Delayed Deep Double Deterministic Policy Gradient (RSD4), a data-driven method based on a Partially Observable Markov Decision Process (POMDP) formulation. RSD4 guarantees resource and delay constraints via a Lagrangian dual and delay-sensitive queues, respectively. It also efficiently tackles partial observability with a memory mechanism enabled by a recurrent neural network (RNN), and introduces user-level decomposition and node-level merging to ensure scalability. Extensive experiments on simulated and real-world datasets demonstrate that RSD4 is robust to system dynamics and partially observable environments, and achieves superior performance over existing DRL and non-DRL-based methods.
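To make the Lagrangian-dual mechanism in the abstract concrete, below is a minimal, self-contained sketch of how a resource constraint can be folded into an RL reward via dual ascent on a multiplier. All specifics (the resource `budget`, step size `eta`, and the stand-in policy/dynamics) are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

# Sketch of Lagrangian-dual constraint handling: the agent optimizes a
# penalized reward r - lam * usage, while the multiplier lam is raised
# whenever long-run average resource usage exceeds the budget.
rng = np.random.default_rng(0)

budget = 0.8   # average resource budget per slot (assumed value)
lam = 0.0      # Lagrange multiplier for the resource constraint
eta = 0.05     # dual ascent step size (assumed value)

avg_usage = 0.0
for t in range(1, 2001):
    # Stand-in for the policy's action: resource spent this slot.
    # A larger lam discourages usage, mimicking a trained agent's response.
    usage = rng.uniform(0.0, 2.0) / (1.0 + lam)
    reward = np.log1p(usage)                  # throughput-like reward
    lagrangian_reward = reward - lam * usage  # reward fed to the RL agent

    # Dual ascent: running average of usage, then project lam onto [0, inf).
    avg_usage += (usage - avg_usage) / t
    lam = max(0.0, lam + eta * (avg_usage - budget))
```

In the full algorithm the penalized reward would drive an actor-critic update (with an RNN encoding the observation history for partial observability); here the closed-form `usage` rule simply stands in for the policy so the dual dynamics can be seen in isolation.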