Telepresence Video Quality Assessment

  • Conference paper
Computer Vision – ECCV 2022 (ECCV 2022)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13697)

Abstract

Video conferencing, which includes both video and audio content, has contributed to dramatic increases in Internet traffic, as the COVID-19 pandemic forced millions of people to work and learn from home. Because of this, efficient and accurate video quality tools are needed to monitor and perceptually optimize telepresence traffic streamed via Zoom, Webex, Meet, and similar services. However, existing models are limited in their ability to predict the quality of multi-modal, live streaming telepresence content. Here we address the significant challenges of Telepresence Video Quality Assessment (TVQA) in several ways. First, we mitigated the dearth of subjectively labeled data by collecting ~2k telepresence videos from different countries, on which we crowdsourced ~80k subjective quality labels. Using this new resource, we created a first-of-its-kind online video quality prediction framework for live streaming, built on a multi-modal learning framework with separate pathways that compute visual and audio quality predictions. Our all-in-one model provides accurate quality predictions at the patch, frame, clip, and audiovisual levels. It achieves state-of-the-art performance on both existing quality databases and our new TVQA database, at considerably lower computational expense, making it an attractive solution for mobile and embedded systems.
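The multi-pathway design described in the abstract can be illustrated with a minimal, hypothetical sketch (not the authors' code): patch-level visual scores are pooled into frame and clip scores, an audio pathway scores the soundtrack, and the two are fused into a single audiovisual prediction. The mean pooling and the fusion weights below are illustrative assumptions, not the paper's learned model.

```python
# Hypothetical sketch of a two-pathway audiovisual quality predictor.
# Feature values stand in for learned per-patch / per-window quality scores.
from statistics import mean

def visual_pathway(patch_scores):
    # patch_scores: list of frames, each a list of per-patch scores in [0, 1].
    # Pool patches -> frame scores, then frames -> a clip-level visual score.
    frame_scores = [mean(frame) for frame in patch_scores]
    return mean(frame_scores), frame_scores

def audio_pathway(window_scores):
    # window_scores: per-window audio quality scores in [0, 1].
    return mean(window_scores)

def audiovisual_quality(patch_scores, window_scores, w_visual=0.6, w_audio=0.4):
    # Fuse the two pathways with illustrative (assumed) weights.
    clip_visual, frame_scores = visual_pathway(patch_scores)
    clip_audio = audio_pathway(window_scores)
    fused = w_visual * clip_visual + w_audio * clip_audio
    return {"frames": frame_scores, "visual": clip_visual,
            "audio": clip_audio, "audiovisual": fused}

# Toy clip: two frames of two patches each, plus two audio windows.
q = audiovisual_quality([[0.8, 0.6], [0.7, 0.9]], [0.5, 0.7])
```

This mirrors the abstract's claim that one model can report quality at the patch, frame, clip, and audiovisual levels, since every intermediate pooling stage is exposed.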



Acknowledgments

This work was supported by Meta Platforms, Inc. A.C. Bovik was supported in part by the National Science Foundation AI Institute for Foundations of Machine Learning (IFML) under Grant 2019844.

Author information

Corresponding author

Correspondence to Zhenqiang Ying.


Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 234 KB)


Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Ying, Z., Ghadiyaram, D., Bovik, A. (2022). Telepresence Video Quality Assessment. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13697. Springer, Cham. https://doi.org/10.1007/978-3-031-19836-6_19

  • DOI: https://doi.org/10.1007/978-3-031-19836-6_19

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-19835-9

  • Online ISBN: 978-3-031-19836-6

  • eBook Packages: Computer Science (R0)
