
Pitch Estimation by Block and Instantaneous Methods

  • Published in: International Journal of Speech Technology

Abstract

Choosing a pitch estimation algorithm is not a simple task: one must balance the accuracy of the estimates against their reliability. Two classes of methods are available. The first, the class of “block methods”, gives noise-robust solutions and has an intrinsic averaging property, but is not very accurate, especially in transition regions. The second, the class of “instantaneous (or event-based) methods”, gives very accurate estimates but is considered inadequate in the presence of noise.

In this paper, we present potential enhancements of pitch estimation performance for both block and instantaneous methods. We discuss mainly two algorithms: a nonlinear cepstral algorithm and a wavelet-based one. The first algorithm, owing to the proposed nonlinear model, improves on the classical linear model in both the accuracy of the estimated pitch in transition regions and the robustness in the presence of noise. The second algorithm retains the inherent accuracy of the estimated pitch and adds robustness in the presence of noise, based on the multiresolution properties of an improved wavelet transform. The enhancements were evaluated on a hand-labeled speech database, and the improved algorithms are now being applied in our research on speech compression and prosody.
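
For orientation, the classical linear cepstral approach, a typical block method, can be sketched as follows. This is a minimal illustration and not the nonlinear algorithm proposed in the paper: the function name `cepstral_pitch`, the Hamming window, the 60–400 Hz search range, and the synthetic test frame are all assumptions chosen for the example.

```python
import numpy as np

def cepstral_pitch(frame, fs, fmin=60.0, fmax=400.0):
    """Estimate the pitch (Hz) of one voiced frame from its real cepstrum.

    Hypothetical helper: window choice and search range are assumptions.
    """
    frame = np.asarray(frame, dtype=float)
    windowed = frame * np.hamming(len(frame))
    # Zero-pad to a power of two of at least twice the frame length.
    nfft = 1 << (2 * len(frame) - 1).bit_length()
    # Real cepstrum: inverse FFT of the log magnitude spectrum.
    log_spectrum = np.log(np.abs(np.fft.rfft(windowed, nfft)) + 1e-12)
    cepstrum = np.fft.irfft(log_spectrum, nfft)
    # Search for the strongest peak among plausible pitch periods (in samples).
    qmin = int(fs / fmax)
    qmax = int(fs / fmin)
    period = qmin + int(np.argmax(cepstrum[qmin:qmax + 1]))
    return fs / period

# Synthetic voiced frame (assumption): impulse train at 125 Hz passed
# through a short decaying exponential standing in for the vocal tract.
fs = 16000
excitation = np.zeros(1024)
excitation[::128] = 1.0  # 16000 / 128 = 125 Hz
frame = np.convolve(excitation, 0.9 ** np.arange(64))[:1024]
estimate = cepstral_pitch(frame, fs)
print(f"estimated pitch: {estimate:.1f} Hz")
```

Because the estimate is computed from a whole analysis frame, it averages over several pitch periods, which is the source of both the noise robustness and the reduced accuracy in transition regions attributed to block methods above.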

Cite this article

Gavat, I., Zirra, M. & Sabac, B. Pitch Estimation by Block and Instantaneous Methods. International Journal of Speech Technology 5, 269–279 (2002). https://doi.org/10.1023/A:1020201125377

  • DOI: https://doi.org/10.1023/A:1020201125377
