Abstract
In this paper we propose liveness verification to enhance the robustness of audio-visual biometric person authentication systems. Liveness verification ensures that the biometric cues used to authenticate an identity are acquired from a live person who is actually present at the time of capture. The proposed liveness checking technique is based on cross-modal association models and involves hybrid fusion of acoustic and visual speech correlation features, which measure the degree of synchrony between the lips and the voice extracted from speaking-face video sequences. Performance evaluation in terms of DET (Detection Error Tradeoff) curves and EERs (Equal Error Rates) on publicly available audio-visual speech databases shows a significant improvement in the robustness of the system against different types of simulated replay attacks.
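As a rough illustration of the two ideas in the abstract, the sketch below scores lip-voice synchrony and estimates an EER from toy scores. It is not the chapter's cross-modal association model: it assumes each clip has already been reduced to one audio energy value and one lip-opening value per video frame, and uses plain Pearson correlation over a few frame lags as a simplified stand-in. All function names and feature choices here are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def synchrony_score(audio_energy, lip_opening, max_lag=2):
    """Largest absolute correlation between the two streams over small frame lags.

    Assumption: both sequences have the same length (one value per video frame).
    """
    n = len(audio_energy)
    best = 0.0
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            a, v = audio_energy[lag:], lip_opening[:n - lag]
        else:
            a, v = audio_energy[:n + lag], lip_opening[-lag:]
        if len(a) >= 2:
            best = max(best, abs(pearson(a, v)))
    return best

def equal_error_rate(genuine, impostor):
    """Approximate EER: sweep thresholds over the observed scores and
    return the mean of FAR and FRR where the two rates are closest."""
    best_gap, eer = None, None
    for t in sorted(set(genuine) | set(impostor)):
        frr = sum(s < t for s in genuine) / len(genuine)     # live clips rejected
        far = sum(s >= t for s in impostor) / len(impostor)  # attacks accepted
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer

# Toy data: a live clip whose lip opening follows the audio energy,
# and a replay attack that pairs the same audio with a still photo.
audio = [0.1, 0.8, 0.9, 0.2, 0.1, 0.7, 0.8, 0.1]
live_lips = [1, 6, 7, 2, 1, 5, 6, 1]
photo_lips = [3] * 8

live_score = synchrony_score(audio, live_lips)
photo_score = synchrony_score(audio, photo_lips)
toy_eer = equal_error_rate([0.9, 0.8, 0.7, 0.6], [0.1, 0.2, 0.3, 0.65])
```

In this toy setup the still-photo attack scores zero because a constant lip signal carries no correlation with the audio. A replay attack that uses real video with mismatched audio would be harder to separate, which is the kind of case where richer correlation features and fusion of several scores become relevant.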
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Chetty, G. (2010). Robust Audio Visual Biometric Person Authentication with Liveness Verification. In: Sencar, H.T., Velastin, S., Nikolaidis, N., Lian, S. (eds) Intelligent Multimedia Analysis for Security Applications. Studies in Computational Intelligence, vol 282. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-11756-5_3
DOI: https://doi.org/10.1007/978-3-642-11756-5_3
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-11754-1
Online ISBN: 978-3-642-11756-5
eBook Packages: Engineering, Engineering (R0)