Research Article | Open Access
DOI: 10.1145/3492866.3549712

Effective multi-user delay-constrained scheduling with deep recurrent reinforcement learning

Published: 03 October 2022

ABSTRACT

Multi-user delay-constrained scheduling is important in many real-world applications, including wireless communication, live streaming, and cloud computing. Yet it poses a critical challenge, since the scheduler must make real-time decisions that simultaneously guarantee the delay and resource constraints without prior knowledge of the system dynamics, which can be time-varying and hard to estimate. Moreover, many practical scenarios suffer from partial observability, e.g., due to sensing noise or hidden correlations. To tackle these challenges, we propose a deep reinforcement learning (DRL) algorithm, named Recurrent Softmax Delayed Deep Double Deterministic Policy Gradient (RSD4), a data-driven method based on a Partially Observable Markov Decision Process (POMDP) formulation. RSD4 guarantees resource and delay constraints via a Lagrangian dual and delay-sensitive queues, respectively. It also efficiently handles partial observability with a memory mechanism enabled by a recurrent neural network (RNN), and introduces user-level decomposition and node-level merging to ensure scalability. Extensive experiments on simulated and real-world datasets demonstrate that RSD4 is robust to system dynamics and partially observable environments, and achieves superior performance over existing DRL and non-DRL-based methods.
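To illustrate the Lagrangian-dual idea mentioned in the abstract, the following is a minimal sketch (not the paper's exact algorithm; the function names and the scalar-budget setup are illustrative assumptions): an average resource constraint is relaxed into the reward via a multiplier λ, which is updated by projected dual ascent so that it grows when observed usage exceeds the budget and shrinks toward zero otherwise.

```python
def dual_ascent(usages, budget, lr=0.1, lam=0.0):
    """Projected dual ascent on the Lagrange multiplier for an average
    resource constraint: lam increases when usage exceeds the budget,
    and is projected back onto [0, inf)."""
    for u in usages:
        lam = max(0.0, lam + lr * (u - budget))
    return lam

def penalized_reward(reward, usage, lam):
    """Lagrangian-relaxed reward a (hypothetical) DRL agent would maximize:
    the raw reward minus the priced resource usage."""
    return reward - lam * usage
```

For example, a single over-budget step (`usage = 2.0`, `budget = 1.0`, `lr = 0.1`) raises λ from 0 to 0.1, while usages below the budget leave λ at zero; the agent then trains against `penalized_reward` instead of the raw reward.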


Published in

MobiHoc '22: Proceedings of the Twenty-Third International Symposium on Theory, Algorithmic Foundations, and Protocol Design for Mobile Networks and Mobile Computing
October 2022, 442 pages
ISBN: 9781450391658
DOI: 10.1145/3492866

Copyright © 2022 Owner/Author. This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

Publisher: Association for Computing Machinery, New York, NY, United States

Overall acceptance rate: 296 of 1,843 submissions, 16%
