Abstract
This chapter presents applications of graph embedding to the problem of text-independent speaker recognition. Speaker recognition is a general term encompassing multiple applications. At its core is the problem of speaker comparison: given two speech recordings (utterances), produce a score that measures speaker similarity. Other applications can be built on speaker comparison: speaker clustering (grouping similar speakers in a corpus), speaker verification (verifying a claim of identity), speaker identification (identifying a speaker out of a list of potential candidates), and speaker retrieval (finding matches to a query set).
Acknowledgements
This work was sponsored by the Department of Defense under Air Force Contract FA8721-05-C-0002. Opinions, interpretations, conclusions, and recommendations are those of the authors and are not necessarily endorsed by the United States Government.
© 2013 Springer Science+Business Media New York
About this chapter
Cite this chapter
Karam, Z.N., Campbell, W.M. (2013). Graph Embedding for Speaker Recognition. In: Fu, Y., Ma, Y. (eds) Graph Embedding for Pattern Analysis. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-4457-2_10
DOI: https://doi.org/10.1007/978-1-4614-4457-2_10
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4614-4456-5
Online ISBN: 978-1-4614-4457-2
eBook Packages: Engineering, Engineering (R0)