Text and Language-Independent Speaker Recognition Using Suprasegmental Features and Support Vector Machines

Bajpai, Anvita; Pathangay, Vinod

doi:10.1007/978-3-642-03547-0_29

Conference paper

Part of the book series: Communications in Computer and Information Science ((CCIS,volume 40))

Abstract

This paper demonstrates the presence of speaker-specific suprasegmental information in the Linear Prediction (LP) residual signal, which is obtained by removing the predictable part of the speech signal. Adding this information to existing speaker recognition systems based on segmental and subsegmental features can yield a better-performing combined system. The speaker-specific suprasegmental information can be perceived by listening to the residual and can also be seen as excitation peaks in the residual waveform. The challenge, however, lies in capturing this information from the residual signal. Higher-order correlations among samples of the residual are not known to be captured by standard signal processing and statistical techniques. The Hilbert envelope of the residual is shown to further enhance the excitation peaks present in the residual signal. A speaker-specific pattern is also observed in the autocorrelation sequence of the Hilbert envelope, and further in the statistics of this autocorrelation sequence, which indicates the presence of speaker-specific suprasegmental information in the residual signal. In this work, no distinction is made between voiced and unvoiced sounds when extracting these features. A Support Vector Machine (SVM) is used to classify the patterns in the variance of the autocorrelation sequence for the speaker recognition task.
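
The feature chain named in the abstract (LP residual, Hilbert envelope, autocorrelation sequence, variance) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract gives no parameter values, so the frame length, hop, LP order, and maximum lag below are assumptions, as are the Hamming window, the zero-lag normalisation, and the choice to take the variance across frames at each lag. All function names are hypothetical. The SVM stage is omitted; the returned vector is the kind of input such a classifier would receive.

```python
import numpy as np


def lp_coefficients(frame, order):
    """LP coefficients [1, a_1, ..., a_p], autocorrelation method (Levinson-Durbin)."""
    n = len(frame)
    r = np.array([np.dot(frame[:n - k], frame[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:  # silent or degenerate frame: stop the recursion
            break
        k = -(r[i] + np.dot(a[1:i], r[i - 1:0:-1])) / err
        prev = a.copy()
        a[1:i] = prev[1:i] + k * prev[i - 1:0:-1]
        a[i] = k
        err *= 1.0 - k * k
    return a


def lp_residual(frame, order=10):
    """Prediction error e[n] = s[n] + sum_j a_j s[n-j] (inverse filtering)."""
    # Assumption: coefficients are estimated from a Hamming-windowed frame.
    a = lp_coefficients(frame * np.hamming(len(frame)), order)
    return np.convolve(frame, a)[:len(frame)]


def hilbert_envelope(x):
    """Magnitude of the analytic signal, computed with the FFT."""
    n = len(x)
    gain = np.zeros(n)
    gain[0] = 1.0
    if n % 2 == 0:
        gain[n // 2] = 1.0
        gain[1:n // 2] = 2.0
    else:
        gain[1:(n + 1) // 2] = 2.0
    return np.abs(np.fft.ifft(np.fft.fft(x) * gain))


def normalized_autocorr(x, max_lag):
    """Autocorrelation for lags 0..max_lag, divided by the zero-lag value."""
    full = np.correlate(x, x, mode="full")
    zero = len(x) - 1  # index of lag 0 in the "full" output
    ac = full[zero:zero + max_lag + 1]
    return ac / ac[0] if ac[0] > 0 else ac


def suprasegmental_feature(signal, frame_len=240, hop=80, order=10, max_lag=160):
    """Variance, across frames, of the envelope autocorrelation at each lag.

    Defaults assume 8 kHz speech (30 ms frames, 10 ms hop); they are
    illustrative choices, not values taken from the paper.
    """
    signal = np.asarray(signal, dtype=float)
    rows = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        envelope = hilbert_envelope(lp_residual(frame, order))
        rows.append(normalized_autocorr(envelope, max_lag))
    return np.var(np.array(rows), axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    demo = rng.standard_normal(8000)  # stand-in for one second of speech
    print(suprasegmental_feature(demo).shape)
```

Because every frame is processed identically, the sketch mirrors the abstract's statement that voiced and unvoiced sounds are not treated separately.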

Author information

Authors and Affiliations

DeciDyn Systems, Bangalore, India
Anvita Bajpai
Dept. of Computer Science and Engineering, Indian Institute of Technology Madras, India
Vinod Pathangay

Editor information

Editors and Affiliations

Dept. of Computer Sciences, University of Florida, Gainesville, FL, 32611, USA
Sanjay Ranka
Laurence H. Baker Center for Bioinformatics and Biological Statistics, Iowa State University, USA
Srinivas Aluru
Grid Computing and Distributed Systems Laboratory and, NICTA Victoria Laboratory, Department of Computer Science and Software Engineering, The University of Melbourne, Australia
Rajkumar Buyya
Department of Computer Science, National Tsing Hua University, Taiwan
Yeh-Ching Chung
Computer Science, College of Engineering and Science, Louisiana Tech University, Ruston, LA, 71272, USA
Sumeet Dua & Vir V. Phoha
Department of Computer Sciences, Purdue University, W. Lafayette, IN, 47907, USA
Ananth Grama
Arizona State University, Tempe, AZ, 85281, USA
Sandeep K. S. Gupta
Computer Science and Engineering Department, Indian Institute of Technology Kharagpur, 721 302, WB, India
Rajeev Kumar

About this paper

Cite this paper

Bajpai, A., Pathangay, V. (2009). Text and Language-Independent Speaker Recognition Using Suprasegmental Features and Support Vector Machines. In: Ranka, S., et al. Contemporary Computing. IC3 2009. Communications in Computer and Information Science, vol 40. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-03547-0_29

DOI: https://doi.org/10.1007/978-3-642-03547-0_29
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-03546-3
Online ISBN: 978-3-642-03547-0
eBook Packages: Computer Science, Computer Science (R0)
