ABSTRACT
Multi-user delay-constrained scheduling is important in many real-world applications, including wireless communication, live streaming, and cloud computing. Yet it poses a critical challenge, since the scheduler must make real-time decisions that guarantee both delay and resource constraints simultaneously, without prior knowledge of the system dynamics, which can be time-varying and hard to estimate. Moreover, many practical scenarios suffer from partial observability, e.g., due to sensing noise or hidden correlations. To tackle these challenges, we propose a deep reinforcement learning (DRL) algorithm, named Recurrent Softmax Delayed Deep Double Deterministic Policy Gradient (RSD4), a data-driven method based on a Partially Observable Markov Decision Process (POMDP) formulation. RSD4 guarantees resource and delay constraints via a Lagrangian dual and delay-sensitive queues, respectively. It also efficiently tackles partial observability with a memory mechanism enabled by a recurrent neural network (RNN), and introduces user-level decomposition and node-level merging to ensure scalability. Extensive experiments on simulated and real-world datasets demonstrate that RSD4 is robust to system dynamics and partially observable environments, and achieves superior performance over existing DRL and non-DRL-based methods.
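To make the Lagrangian-dual mechanism in the abstract concrete, below is a minimal, self-contained sketch of how a resource constraint can be folded into an RL reward via dual ascent on a multiplier. All specifics (the resource `budget`, step size `eta`, and the stand-in policy/dynamics) are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

# Sketch of Lagrangian-dual constraint handling: the agent optimizes a
# penalized reward r - lam * usage, while the multiplier lam is raised
# whenever long-run average resource usage exceeds the budget.
rng = np.random.default_rng(0)

budget = 0.8   # average resource budget per slot (assumed value)
lam = 0.0      # Lagrange multiplier for the resource constraint
eta = 0.05     # dual ascent step size (assumed value)

avg_usage = 0.0
for t in range(1, 2001):
    # Stand-in for the policy's action: resource spent this slot.
    # A larger lam discourages usage, mimicking a trained agent's response.
    usage = rng.uniform(0.0, 2.0) / (1.0 + lam)
    reward = np.log1p(usage)                  # throughput-like reward
    lagrangian_reward = reward - lam * usage  # reward fed to the RL agent

    # Dual ascent: running average of usage, then project lam onto [0, inf).
    avg_usage += (usage - avg_usage) / t
    lam = max(0.0, lam + eta * (avg_usage - budget))
```

In the full algorithm the penalized reward would drive an actor-critic update (with an RNN encoding the observation history for partial observability); here the closed-form `usage` rule simply stands in for the policy so the dual dynamics can be seen in isolation.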