Abstract
Video conferencing, which includes both video and audio content, has contributed to dramatic increases in Internet traffic, as the COVID-19 pandemic forced millions of people to work and learn from home. Because of this, efficient and accurate video quality tools are needed to monitor and perceptually optimize telepresence traffic streamed via Zoom, Webex, Meet, and similar services. However, existing models are limited in their ability to predict the quality of multi-modal, live streaming telepresence content. Here we address the significant challenges of Telepresence Video Quality Assessment (TVQA) in several ways. First, we mitigated the dearth of subjectively labeled data by collecting ~2k telepresence videos from different countries, on which we crowdsourced ~80k subjective quality labels. Using this new resource, we created a first-of-its-kind online video quality prediction framework for live streaming, built on a multi-modal learning architecture with separate pathways that compute visual and audio quality predictions. Our all-in-one model provides accurate quality predictions at the patch, frame, clip, and audiovisual levels. It achieves state-of-the-art performance on both existing quality databases and our new TVQA database, at considerably lower computational expense, making it an attractive solution for mobile and embedded systems.
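To make the abstract's hierarchy concrete, the sketch below illustrates how patch-level visual predictions could be pooled into frame- and clip-level scores and then fused with an audio-pathway score into a single audiovisual prediction. The mean pooling and the convex-combination fusion (with a hypothetical weight `w_video`) are simplifying assumptions for illustration only, not the paper's learned model.

```python
# Hypothetical sketch of hierarchical, multi-modal quality pooling:
# patch scores -> frame score -> clip score -> audiovisual fusion.
from statistics import mean

def frame_score(patch_scores):
    """Pool per-patch quality predictions into a frame-level score."""
    return mean(patch_scores)

def clip_score(frame_scores):
    """Pool frame-level scores over time into a clip-level score."""
    return mean(frame_scores)

def audiovisual_score(video_score, audio_score, w_video=0.6):
    """Fuse visual and audio pathway scores with a convex combination.

    The weight w_video is an illustrative assumption; a learned model
    would replace this fixed fusion rule.
    """
    return w_video * video_score + (1.0 - w_video) * audio_score

# Example: 2 frames with 3 patch predictions each, plus an audio score.
frames = [[72.0, 68.0, 70.0], [66.0, 64.0, 65.0]]
v = clip_score([frame_score(p) for p in frames])  # 67.5
q = audiovisual_score(v, audio_score=60.0)        # 64.5
print(round(q, 2))
```

In practice, each pathway would be a learned network and the fusion weights trained end to end; the point here is only the patch/frame/clip/audiovisual granularity the model exposes.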
Acknowledgments
This work was supported by Meta Platforms, Inc. A.C. Bovik was supported in part by the National Science Foundation AI Institute for Foundations of Machine Learning (IFML) under Grant 2019844.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Ying, Z., Ghadiyaram, D., Bovik, A. (2022). Telepresence Video Quality Assessment. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13697. Springer, Cham. https://doi.org/10.1007/978-3-031-19836-6_19
DOI: https://doi.org/10.1007/978-3-031-19836-6_19
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-19835-9
Online ISBN: 978-3-031-19836-6
eBook Packages: Computer Science, Computer Science (R0)