Abstract
This paper presents a framework for a speech-driven face animation system with expressions. The framework systematically addresses audio-visual data acquisition, expressive trajectory analysis, and audio-visual mapping. Within this framework, we learn the correlation between neutral and expressive facial deformation with a Gaussian Mixture Model (GMM). A hierarchical structure is proposed to map the acoustic parameters to lip facial animation parameters (FAPs). The synthesized neutral FAP streams are then extended with expressive variations according to the prosody of the input speech. Quantitative evaluation of the experimental results is encouraging, and the synthesized face shows realistic quality.
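The abstract does not spell out how the GMM relates neutral to expressive deformation. One common way to use a joint GMM for this kind of mapping is conditional-mean regression, sketched below as an illustration only; it is not confirmed to be the authors' formulation. The function name `gmm_conditional_mean`, the split of the joint vector into a neutral part `x` and an expressive part `y`, and all parameter values are assumptions made for this example.

```python
import numpy as np

def gmm_conditional_mean(x, weights, means, covs, dx):
    """Return E[y | x] under a joint GMM over z = [x; y].

    Hypothetical setup: x is a neutral deformation vector (first dx dims),
    y is the corresponding expressive deformation (remaining dims).
    """
    x = np.asarray(x, dtype=float)
    log_resp, cond_means = [], []
    for w, mu, S in zip(weights, means, covs):
        mu_x, mu_y = mu[:dx], mu[dx:]
        S_xx, S_yx = S[:dx, :dx], S[dx:, :dx]
        diff = x - mu_x
        sol = np.linalg.solve(S_xx, diff)  # S_xx^{-1} (x - mu_x)
        _, logdet = np.linalg.slogdet(S_xx)
        # log of w * N(x; mu_x, S_xx), the unnormalised responsibility
        log_resp.append(
            np.log(w) - 0.5 * (diff @ sol + logdet + dx * np.log(2 * np.pi))
        )
        # per-component conditional mean of y given x
        cond_means.append(mu_y + S_yx @ sol)
    log_resp = np.array(log_resp)
    resp = np.exp(log_resp - log_resp.max())  # stabilised softmax
    resp /= resp.sum()
    return resp @ np.array(cond_means)

# Toy one-component example with made-up parameters (not from the paper).
weights = [1.0]
means = [np.array([0.0, 0.0])]
covs = [np.array([[1.0, 0.5], [0.5, 1.0]])]
y_hat = gmm_conditional_mean([2.0], weights, means, covs, dx=1)
```

In this sketch the mixture parameters are taken as given; in practice they would be estimated from paired neutral and expressive data, for example with expectation-maximisation.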
The work is supported by the National Natural Science Foundation of China (No. 60575032) and the 863 Program (No. 2006AA01Z138).
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Yin, P., Zhao, L., Huang, L., Tao, J. (2007). Expressive Face Animation Synthesis Based on Dynamic Mapping Method. In: Paiva, A.C.R., Prada, R., Picard, R.W. (eds) Affective Computing and Intelligent Interaction. ACII 2007. Lecture Notes in Computer Science, vol 4738. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74889-2_1
DOI: https://doi.org/10.1007/978-3-540-74889-2_1
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-74888-5
Online ISBN: 978-3-540-74889-2
eBook Packages: Computer Science, Computer Science (R0)