Abstract
This chapter reviews selected methods for determining the pitch of speech and music signals. Since both kinds of signal are time-variant, we first define what is subsumed under the term pitch. We then divide pitch determination algorithms (PDAs) into two groups. Short-term analysis algorithms apply a spectral transform and derive pitch from a frequency-domain or lag-domain representation. Time-domain algorithms analyze the signal directly and either apply structural analysis, determine individual periods from the first partial, or, in speech, compute the instant of glottal closure. In the 1970s, when many of these algorithms were developed, the main application in speech technology was the vocoder. Today the emphasis is on prosody recognition in speech understanding systems and on high-accuracy pitch period determination for speech synthesis corpora. In musical acoustics, pitch determination is applied in melody recognition and automatic music transcription, where the additional problem arises that several pitches may be present simultaneously.
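To illustrate the short-term, lag-domain approach mentioned above, here is a minimal sketch of a generic autocorrelation (ACF) pitch estimator for a single analysis frame. It is a textbook-style example, not the chapter's own algorithm. The function name `acf_pitch`, the F0 search range of 50 to 500 Hz and the voicing threshold of 0.3 are illustrative assumptions. The sketch removes the frame's DC offset and normalizes the biased ACF by its value at lag zero. It then takes the lag of the largest peak within the search range as the period estimate. Frames whose peak falls below the threshold are treated as unvoiced.

```python
import math
from typing import Optional, Sequence


def acf_pitch(frame: Sequence[float], fs: float,
              f0_min: float = 50.0, f0_max: float = 500.0,
              voicing_threshold: float = 0.3) -> Optional[float]:
    """Hypothetical short-term ACF pitch estimator for one analysis frame.

    Returns an F0 estimate in Hz, or None if the frame is judged unvoiced.
    f0_min, f0_max and voicing_threshold are illustrative defaults.
    """
    n = len(frame)
    if n < 2:
        return None
    mean = sum(frame) / n
    x = [s - mean for s in frame]            # remove DC offset
    r0 = sum(s * s for s in x)               # ACF at lag 0 (frame energy)
    if r0 == 0.0:
        return None                          # silent frame: no pitch

    # Lag search range corresponding to the admissible F0 range.
    lag_min = max(1, int(fs / f0_max))
    lag_max = min(n - 1, int(fs / f0_min))

    best_lag, best_val = None, -math.inf
    for lag in range(lag_min, lag_max + 1):
        # Biased ACF: the overlap shrinks with lag and is not compensated,
        # so r(lag) tapers off and the shortest true period is favored.
        r = sum(x[i] * x[i + lag] for i in range(n - lag))
        r_norm = r / r0                      # normalized so that r(0) = 1
        if r_norm > best_val:
            best_lag, best_val = lag, r_norm

    # Voicing decision: a weak periodicity peak is taken to mean "unvoiced".
    if best_lag is None or best_val < voicing_threshold:
        return None
    return fs / best_lag                     # period in samples -> F0 in Hz
```

Using the biased ACF is a deliberate design choice in this sketch. Because it tapers toward larger lags, it favors the true period over integer multiples of it, which removes one common source of octave errors. The ACF peaks at the period common to all harmonics, so the sketch can also return the fundamental when the first partial is missing. An example is a frame containing only harmonics 2 to 4 of 160 Hz sampled at 8 kHz. The estimate is resolved only to whole samples: at 8 kHz, a 200 Hz tone has a 40-sample period, so adjacent candidates differ by about 5 Hz. A refinement step such as peak interpolation, plus post-processing of the F0 track, would be needed for higher accuracy.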
Abbreviations
- ACF: autocorrelation function
- CH: call home
- DFT: discrete Fourier transform
- FFT: fast Fourier transform
- FIR: finite impulse response
- GCI: glottal closure instant
- IP: internet protocol
- LP: linear prediction
- MAP: maximum a posteriori
- PDA: pitch determination algorithms
- PDF: probability density function
- SNR: signal-to-noise ratio
- SVD: singular value decomposition
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Hess, W.J. (2008). Pitch and Voicing Determination of Speech with an Extension Toward Music Signals. In: Benesty, J., Sondhi, M.M., Huang, Y.A. (eds) Springer Handbook of Speech Processing. Springer Handbooks. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-49127-9_10
DOI: https://doi.org/10.1007/978-3-540-49127-9_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-49125-5
Online ISBN: 978-3-540-49127-9
eBook Packages: Engineering, Engineering (R0)