Single-ended parametric voicing-aware models for live assessment of packetized VoIP conversations

Telecommunication Systems

Abstract

The perceptual quality of VoIP conversations depends strongly on the pattern of packet losses, i.e., the distribution and duration of packet loss runs. The longer the inter-loss gaps and the shorter the loss gaps, the lower the quality degradation. Moreover, speech sequences impaired by an identical packet loss pattern suffer different degrees of perceptual quality degradation, because dropped voice packets do not have an equal impact on the perceived quality. Therefore, we consider the voicing feature of the speech wave carried in lost packets, in addition to the packet loss pattern, to estimate speech quality scores. We distinguish between voiced, unvoiced, and silence packets. This yields better correlation and accuracy between human-based subjective scores and machine-calculated objective scores.
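To make the notion of a loss pattern concrete, the following minimal Python sketch splits a packet trace into loss runs and inter-loss gaps. The function name `run_lengths` and the 0/1 trace encoding are assumptions chosen for illustration; they are not taken from the paper. For simplicity, runs of received packets at the very start and end of the trace are also counted as gaps.

```python
from itertools import groupby


def run_lengths(trace):
    """Split a loss trace into loss-run and inter-loss-gap lengths.

    `trace` is a sequence of 0/1 flags, one per packet in sequence-number
    order: 1 means the packet was lost, 0 means it was received.
    (Hypothetical encoding, assumed for this illustration.)
    """
    loss_runs, gaps = [], []
    for lost, group in groupby(trace):
        length = sum(1 for _ in group)  # size of this run of equal flags
        (loss_runs if lost else gaps).append(length)
    return loss_runs, gaps


trace = [0, 0, 1, 0, 0, 0, 1, 1, 1, 0, 0]
loss_runs, gaps = run_lengths(trace)
print(loss_runs)  # lengths of consecutive-loss runs
print(gaps)       # lengths of consecutive-reception runs
```

Two traces with the same overall loss ratio can produce very different `loss_runs` and `gaps` lists, which is the distinction the paragraph above draws between isolated and bursty losses.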

This paper proposes novel no-reference parametric speech quality estimation models that account for the voicing feature of the signal wave carried in missing packets. Specifically, we develop separate speech quality estimation models, which capture the perceptual effect of removed voiced or unvoiced packets, using simple and multiple regression analyses. A further model, which combines the voiced and unvoiced quality scores to compute the overall speech quality score at the end of an assessment interval, is developed through a rigorous multiple linear regression analysis. The input parameters of the proposed voicing-aware models, namely Packet Loss Ratio (PLR) and Effective Burstiness Probability (EBP), are extracted from a novel Markov model of voicing-aware packet loss, which properly captures the characteristics of the packet loss process as well as the voicing property of the speech wave carried in lost packets. The voicing-aware packet loss model is calibrated at run time using an efficient algorithm driven by packet loss events. The performance evaluation study shows that our voicing-aware speech quality estimation models outperform voicing-unaware models, especially in terms of accuracy, over a wide range of conditions. It also validates the accuracy of the developed parametric no-reference speech quality models: scores predicted by our models achieve an excellent correlation with measured scores (>0.95) and a small mean absolute deviation (<0.25) for the ITU-T G.729 and G.711 speech CODECs.
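As a rough illustration of two quantities named above, the sketch below computes a per-voicing-class packet loss ratio and the two agreement measures quoted for validation. All function names, the `(voicing, lost)` input format, and the reading of "mean absolute deviation" as the mean absolute difference between predicted and measured scores are assumptions made for this example. It does not reproduce the paper's Markov model, its EBP definition, or its regression coefficients, none of which are given in the abstract.

```python
from collections import Counter
from math import sqrt


def loss_ratio_by_class(packets):
    """Packet loss ratio per voicing class.

    `packets` is an iterable of (voicing, lost) pairs, where voicing is
    "voiced", "unvoiced", or "silence" and lost is a bool.
    (Hypothetical input format, assumed for this illustration.)
    """
    sent, lost = Counter(), Counter()
    for voicing, is_lost in packets:
        sent[voicing] += 1
        if is_lost:
            lost[voicing] += 1
    return {v: lost[v] / sent[v] for v in sent}


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def mean_abs_deviation(predicted, measured):
    """Mean absolute difference between predicted and measured scores."""
    return sum(abs(p - m) for p, m in zip(predicted, measured)) / len(predicted)


packets = [("voiced", True), ("voiced", False),
           ("unvoiced", False), ("silence", True)]
print(loss_ratio_by_class(packets))
```

Under these assumptions, the reported thresholds would correspond to `pearson(predicted, measured) > 0.95` and `mean_abs_deviation(predicted, measured) < 0.25` over a set of test conditions.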

Author information

Corresponding author

Correspondence to Sofiene Jelassi.

About this article

Cite this article

Jelassi, S., Youssef, H., Hoene, C. et al. Single-ended parametric voicing-aware models for live assessment of packetized VoIP conversations. Telecommun Syst 49, 17–34 (2012). https://doi.org/10.1007/s11235-010-9350-y

  • DOI: https://doi.org/10.1007/s11235-010-9350-y
