Abstract
The perceptual quality of VoIP conversations depends strongly on the pattern of packet losses, i.e., the distribution and duration of packet loss runs. The longer the inter-loss gaps and the shorter the loss gaps, the lower the quality degradation. Moreover, speech sequences impaired by an identical packet loss pattern suffer different degrees of perceptual quality degradation, because dropped voice packets have unequal impact on perceived quality. Therefore, in addition to the packet loss pattern, we consider the voicing feature of the speech wave carried in lost packets when estimating speech quality scores. We distinguish between voiced, unvoiced, and silence packets. This yields better correlation between human-based subjective scores and machine-calculated objective scores, as well as higher accuracy.
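As a rough illustration of the quantities mentioned above, the following sketch splits a per-packet loss trace into loss runs and inter-loss gaps and computes a loss ratio per voicing class. The function names, the boolean trace encoding, and the "V"/"U"/"S" labels are assumptions made for this example; they are not taken from the paper, and the sketch is not the paper's calibration algorithm.

```python
from itertools import groupby


def run_lengths(lost):
    """Split a loss trace (True = packet lost) into loss-run and gap lengths.

    Gaps here include any leading or trailing run of received packets.
    """
    loss_runs, gaps = [], []
    for is_lost, group in groupby(lost):
        length = sum(1 for _ in group)
        (loss_runs if is_lost else gaps).append(length)
    return loss_runs, gaps


def loss_ratio_by_class(lost, voicing):
    """Fraction of packets lost within each voicing class (hypothetical labels)."""
    totals, losses = {}, {}
    for is_lost, label in zip(lost, voicing):
        totals[label] = totals.get(label, 0) + 1
        losses[label] = losses.get(label, 0) + int(is_lost)
    return {label: losses[label] / totals[label] for label in totals}


# Made-up trace: V = voiced, U = unvoiced, S = silence.
trace = [False, True, True, False, False, False, True, False]
labels = ["V", "V", "U", "U", "S", "V", "V", "S"]

print(run_lengths(trace))
print(loss_ratio_by_class(trace, labels))
```

Two traces with the same overall loss ratio can differ both in their run and gap lengths and in which voicing classes the losses fall on, which is the distinction the abstract draws.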
This paper proposes novel no-reference parametric speech quality estimation models that account for the voicing feature of the signal wave carried in missing packets. Specifically, we develop separate speech quality estimation models, which capture the perceptual effect of removed voiced or unvoiced packets, using simple and multiple regression analyses. A further model, developed through multiple linear regression analysis, combines the voiced and unvoiced quality scores to compute the overall speech quality score at the end of an assessment interval. The input parameters of the proposed voicing-aware models, namely Packet Loss Ratio (PLR) and Effective Burstiness Probability (EBP), are extracted from a novel Markov model of voicing-aware packet loss, which captures both the characteristics of the packet loss process and the voicing property of the speech wave carried in lost packets. The voicing-aware packet loss model is calibrated at run time using an efficient algorithm driven by packet loss events. The performance evaluation shows that our voicing-aware speech quality estimation models outperform voicing-unaware models, especially in terms of accuracy, over a wide range of conditions. It also validates the accuracy of the developed parametric no-reference speech quality models: scores predicted by our models achieve an excellent correlation with measured scores (>0.95) and a small mean absolute deviation (<0.25) for the ITU-T G.729 and G.711 speech CODECs.
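The two evaluation figures quoted above can be computed as in the following minimal sketch. It assumes that "correlation" means the Pearson coefficient and that "mean absolute deviation" means the average absolute difference between predicted and measured scores; the abstract does not spell out either definition, and the score values below are made up for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def mean_abs_deviation(predicted, measured):
    """Average absolute difference between predicted and measured scores."""
    return sum(abs(p - m) for p, m in zip(predicted, measured)) / len(predicted)


# Illustrative scores only, not results from the paper.
measured = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.5, 3.5, 4.5]

print(pearson(predicted, measured))
print(mean_abs_deviation(predicted, measured))
```

The made-up data has a constant offset, so the correlation is perfect while the deviation is not zero, which is why the two figures are reported together.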
Cite this article
Jelassi, S., Youssef, H., Hoene, C. et al. Single-ended parametric voicing-aware models for live assessment of packetized VoIP conversations. Telecommun Syst 49, 17–34 (2012). https://doi.org/10.1007/s11235-010-9350-y