DOI: 10.1145/3458306.3460995
research-article

PAAS: a preference-aware deep reinforcement learning approach for 360° video streaming

Published: 02 July 2021

ABSTRACT

Conventional tile-based 360° video streaming methods, including those based on deep reinforcement learning (DRL), ignore the interactive nature of 360° video streaming and download tiles in a fixed sequential order, and therefore fail to respond to changes in the user's head motion. We show that these existing solutions suffer a drop in either prefetch accuracy or playback stability. Furthermore, each of these methods is constrained to serve a single fixed streaming preference, which incurs extra training overhead and generalizes poorly to unseen preferences. In this paper, we propose a dual-queue streaming framework, with one queue serving accuracy and the other stability, that enables the DRL agent to determine and change the tile download order without incurring overhead. We also design a preference-aware DRL algorithm that incentivizes the agent to learn preference-dependent adaptive bitrate (ABR) decisions efficiently. Compared with state-of-the-art DRL baselines, our method not only significantly improves streaming quality, e.g., increasing the average streaming quality by 13.6% on a public dataset, but also demonstrates better performance and generalization under dynamic preferences, e.g., an average quality improvement of 19.9% on unseen preferences.
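
The abstract does not state how a streaming preference enters the agent's reward. As an illustration only, the sketch below assumes linear scalarization, a common choice in multi-objective reinforcement learning: a per-step vector of objective values is collapsed into one scalar by a normalized weight vector. The function name, the three objectives (viewport quality, rebuffering penalty, quality-switch penalty), and all numeric values are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def scalarized_reward(objectives: Sequence[float],
                      preference: Sequence[float]) -> float:
    """Collapse a vector of objective values into one scalar reward.

    Assumption: linear scalarization with non-negative weights that are
    normalized to sum to 1. The paper's actual reward may differ.
    """
    if len(objectives) != len(preference):
        raise ValueError("objectives and preference must have the same length")
    if any(w < 0 for w in preference):
        raise ValueError("preference weights must be non-negative")
    total = sum(preference)
    if total == 0:
        raise ValueError("preference weights must not all be zero")
    # Normalize the weights so different preferences are comparable.
    return sum((w / total) * r for w, r in zip(preference, objectives))


# Hypothetical per-step objectives:
# (viewport quality, -rebuffering time, -quality switch magnitude)
step_objectives = (3.0, -1.0, -0.5)

# The same streaming outcome is scored differently under two preferences.
quality_first = scalarized_reward(step_objectives, (0.8, 0.1, 0.1))
stability_first = scalarized_reward(step_objectives, (0.2, 0.6, 0.2))
```

Under this assumption, a single agent that receives the preference vector as part of its input can be trained across many sampled preferences, rather than training one model per fixed preference.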


    • Published in

      NOSSDAV '21: Proceedings of the 31st ACM Workshop on Network and Operating Systems Support for Digital Audio and Video
      July 2021
      128 pages
      ISBN: 9781450384353
      DOI: 10.1145/3458306

      Copyright © 2021 ACM


      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Acceptance Rates

      NOSSDAV '21 paper acceptance rate: 15 of 52 submissions (29%)
      Overall acceptance rate: 118 of 363 submissions (33%)
