Pitch and Voicing Determination of Speech with an Extension Toward Music Signals

Hess, Wolfgang J.

doi:10.1007/978-3-540-49127-9_10

Wolfgang J. Hess Prof.

Part of the book series: Springer Handbooks (SHB)

Abstract

This chapter reviews selected methods for pitch determination of speech and music signals. Because both kinds of signal are time-variant, we first define what is subsumed under the term pitch. We then subdivide pitch determination algorithms (PDAs) into two groups: short-term analysis algorithms, which apply a spectral transform and derive pitch from a frequency-domain or lag-domain representation, and time-domain algorithms, which analyze the signal directly, applying structural analysis, determining individual periods from the first partial, or computing the instant of glottal closure in speech. In the 1970s, when many of these algorithms were developed, the main application in speech technology was the vocoder; today the emphasis is on prosody recognition in speech understanding systems and on high-accuracy pitch period determination for speech synthesis corpora. In musical acoustics, pitch determination is applied in melody recognition and automatic musical transcription, where an additional problem is that several pitches can exist simultaneously.
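
As an illustration of the short-term, lag-domain family mentioned in the abstract, the following is a minimal sketch of a pitch estimate taken from the peak of a frame's autocorrelation function (ACF). It is not the chapter's own algorithm. The function name `autocorrelation_pitch`, the parameters `f0_min` and `f0_max`, and their default search range of 50 to 500 Hz are assumptions chosen for this example, and no voicing decision or post-processing is attempted.

```python
import math


def autocorrelation_pitch(frame, sample_rate, f0_min=50.0, f0_max=500.0):
    """Estimate F0 in Hz for one frame from the peak of its autocorrelation.

    Hypothetical helper for illustration only. Returns None when the frame
    is silent or too short to cover the requested lag range.
    """
    n = len(frame)
    lag_min = max(1, int(sample_rate / f0_max))      # shortest period searched
    lag_max = min(n - 1, int(sample_rate / f0_min))  # longest period searched
    energy = sum(x * x for x in frame)
    if lag_max <= lag_min or energy == 0.0:
        return None

    best_lag, best_value = None, -math.inf
    for lag in range(lag_min, lag_max + 1):
        # Short-term ACF at this lag, normalised by the frame energy.
        acf = sum(frame[i] * frame[i + lag] for i in range(n - lag)) / energy
        if acf > best_value:
            best_lag, best_value = lag, acf
    return sample_rate / best_lag


# Usage: a 50 ms frame of a synthetic 200 Hz tone sampled at 8 kHz.
sample_rate = 8000
tone = [math.sin(2.0 * math.pi * 200.0 * i / sample_rate) for i in range(400)]
estimate = autocorrelation_pitch(tone, sample_rate)
print(estimate)
```

For the synthetic tone the estimate should come out close to 200 Hz. The sum shrinks as the lag grows because fewer samples overlap, which favours the first period peak over its multiples; on real speech this simple peak picking can still lock onto a multiple or a fraction of the true period.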

Abbreviations

ACF:: autocorrelation function
CH:: call home
DFT:: discrete Fourier transform
FFT:: fast Fourier transform
FIR:: finite impulse response
GCI:: glottal closure instant
IP:: internet protocol
LP:: linear prediction
MAP:: maximum a posteriori
PDA:: pitch determination algorithm
PDF:: probability density function
SNR:: signal-to-noise ratio
SVD:: singular value decomposition

Author information

Authors and Affiliations

Institute for Communication Sciences, Dept. of Communication, Language, and Speech, University of Bonn, Poppelsdorfer Allee 47, 53115, Bonn, Germany
Wolfgang J. Hess Prof.

Corresponding author

Correspondence to Wolfgang J. Hess Prof.

Editor information

Editors and Affiliations

INRS-EMT, University of Quebec, 800 de la Gauchetiere Ouest, H5A 1K6, Montreal, Quebec, Canada
Jacob Benesty Dr.
Avayalabs Research, 233 Mount Airy Road, 07920, Basking Ridge, NJ, USA
M. Mohan Sondhi Ph.D.
Alcatel-Lucent, Bell Laboratories, 600 Mountain Avenue, 07974, Murray Hill, NJ, USA
Yiteng Arden Huang Dr.

About this chapter

Cite this chapter

Hess, W.J. (2008). Pitch and Voicing Determination of Speech with an Extension Toward Music Signals. In: Benesty, J., Sondhi, M.M., Huang, Y.A. (eds) Springer Handbook of Speech Processing. Springer Handbooks. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-49127-9_10

DOI: https://doi.org/10.1007/978-3-540-49127-9_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-49125-5
Online ISBN: 978-3-540-49127-9
eBook Packages: Engineering, Engineering (R0)