Pitch Estimation by Block and Instantaneous Methods

Gavat, Inge; Zirra, Matei; Sabac, Bogdan

doi:10.1023/A:1020201125377

Published: September 2002

Volume 5, pages 269–279 (2002)

International Journal of Speech Technology

Inge Gavat¹,
Matei Zirra² &
Bogdan Sabac¹

Abstract

Choosing a pitch estimation algorithm is not a simple task: one must balance the accuracy of the estimates against their reliability. Two classes of methods are available. The first, the class of “block methods”, gives noise-robust solutions and has an intrinsic averaging property, but is not very accurate, especially in transition regions. The second, the class of “instantaneous (or event-based) methods”, gives very accurate estimates, but is considered inadequate in the presence of noise.
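
To make the notion of a block method concrete, the sketch below estimates a single fundamental frequency for a whole frame from the peak of its autocorrelation. It is only an illustration of the block idea and its built-in averaging (one value per frame), not one of the algorithms of this paper. The function name `autocorr_pitch`, the search range of 60 to 400 Hz, the 8 kHz sampling rate, and the synthetic test frame are all assumptions made for the example.

```python
import numpy as np

def autocorr_pitch(frame, fs, fmin=60.0, fmax=400.0):
    """Return one F0 estimate (Hz) for a whole frame (hypothetical helper)."""
    x = np.asarray(frame, dtype=float)
    x = x - x.mean()                      # remove any DC offset
    # Full autocorrelation; keep the non-negative lags only.
    r = np.correlate(x, x, mode="full")[len(x) - 1:]
    # Restrict the peak search to lags of plausible pitch periods (assumed range).
    lag_min = int(fs / fmax)
    lag_max = min(int(fs / fmin), len(x) - 1)
    lag = lag_min + int(np.argmax(r[lag_min:lag_max + 1]))
    return fs / lag

# Synthetic 40 ms frame: three harmonics of 125 Hz sampled at 8 kHz (assumed values).
fs = 8000
t = np.arange(int(0.04 * fs)) / fs
frame = sum(np.sin(2 * np.pi * 125 * k * t) / k for k in (1, 2, 3))

f0 = autocorr_pitch(frame, fs)
print(f"block estimate: {f0:.1f} Hz")
```

Because the whole frame contributes to a single peak, any pitch change inside the frame is averaged into one value, which is the source of both the robustness and the limited accuracy in transition regions described above.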

In this paper, we present potential enhancements of pitch estimation performance, based on both block and instantaneous methods. We discuss mainly two algorithms: a nonlinear cepstral algorithm and a wavelet-based one. The first algorithm, owing to the proposed nonlinear model, improves on the classical linear model in two respects: the accuracy of the estimated pitch in transition regions and the robustness in the presence of noise. The second algorithm adds, to the inherent accuracy of its pitch estimates, robustness in the presence of noise, based on the multiresolution properties of an improved wavelet transform. The enhancements were evaluated on a hand-labeled speech database, and the improved algorithms are now being applied in our research on speech compression and prosody.
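
For orientation, the classical linear cepstral approach that serves as the baseline can be sketched as follows: take the log magnitude spectrum of a windowed frame, transform it back, and pick the cepstral peak in the range of plausible pitch periods. This is a minimal sketch of the textbook linear cepstrum only; it does not reproduce the nonlinear model proposed in the paper, whose details are not given in this abstract. The function name `cepstrum_pitch`, the Hamming window, the 60 to 400 Hz search range, and the synthetic two-pole "vowel" are assumptions made for the example.

```python
import numpy as np

def cepstrum_pitch(frame, fs, fmin=60.0, fmax=400.0):
    """Estimate F0 (Hz) of one frame from its real cepstrum (hypothetical helper)."""
    x = np.asarray(frame, dtype=float)
    x = x * np.hamming(len(x))                 # taper the frame edges
    log_mag = np.log(np.abs(np.fft.rfft(x)) + 1e-12)  # small floor avoids log(0)
    cepstrum = np.fft.irfft(log_mag, n=len(x))
    # Search the quefrency range that corresponds to the assumed pitch range.
    q_min = int(fs / fmax)
    q_max = min(int(fs / fmin), len(x) // 2)
    q = q_min + int(np.argmax(cepstrum[q_min:q_max + 1]))
    return fs / q

# Synthetic voiced frame (assumed values): an impulse train with a 64-sample
# period at 8 kHz, passed through a two-pole resonator near 700 Hz.
fs, n, period = 8000, 512, 64
excitation = np.zeros(n)
excitation[::period] = 1.0
r, theta = 0.95, 2 * np.pi * 700 / fs
frame = np.zeros(n)
for i in range(n):
    frame[i] = excitation[i]
    if i >= 1:
        frame[i] += 2 * r * np.cos(theta) * frame[i - 1]
    if i >= 2:
        frame[i] -= r * r * frame[i - 2]

f0 = cepstrum_pitch(frame, fs)
print(f"cepstral estimate: {f0:.1f} Hz")
```

The logarithm turns the product of excitation and vocal-tract spectra into a sum, so the periodic excitation appears as a separate peak at the pitch period, well away from the low-quefrency vocal-tract contribution.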

Author information

Authors and Affiliations

Faculty of Electronics and Telecommunications, “Politehnica” University of Bucharest, Romania
Inge Gavat & Bogdan Sabac
Advanced Technology SRL, ADTECH, Bucharest, Romania
Matei Zirra

About this article

Cite this article

Gavat, I., Zirra, M. & Sabac, B. Pitch Estimation by Block and Instantaneous Methods. International Journal of Speech Technology 5, 269–279 (2002). https://doi.org/10.1023/A:1020201125377

Issue Date: September 2002
DOI: https://doi.org/10.1023/A:1020201125377
