Modeling Speaker Behavior: A Comparison of Two Approaches

Lee, Jina; Marsella, Stacy

doi:10.1007/978-3-642-33197-8_17

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7502)

Included in the following conference series:

International Conference on Intelligent Virtual Agents

Abstract

Virtual agents are autonomous software characters that support social interactions with human users. With the emergence of better graphical representation and control over the virtual agent’s embodiment, communication through nonverbal behaviors has become an active research area. Researchers have taken different approaches to authoring the behaviors of virtual agents. In this work, we present our machine learning-based approach to modeling nonverbal behaviors, in which we explore several learning techniques (hidden Markov models, conditional random fields, and latent-dynamic conditional random fields; HMM, CRF, and LDCRF respectively) to predict a speaker’s head nods and eyebrow movements. Quantitative measurements show that LDCRF yields the best learning rate for both head nods and eyebrow movements. An evaluation study was also conducted to compare the behaviors generated by the machine learning-based models described in this paper with those generated by a literature-based model.

Author information

Authors and Affiliations

Institute for Creative Technologies, University of Southern California, 12015 Waterfront Drive, Playa Vista, CA, 90094, USA
Jina Lee & Stacy Marsella

Editor information

Editors and Affiliations

Dept. of Computer and Information Science, Seikei University, Musashino-shi, 180-8633, Tokyo, Japan
Yukiko Nakano
Department of Computer Science and Program for Technocultural Studies, University of California, 1 Shields Avenue, 95616, Davis, CA, U.S.A.
Michael Neff
Baskin School of Engineering, University of California Santa Cruz, 1156 N. High SOE-3, 95064, Santa Cruz, CA, USA
Ana Paiva & Marilyn Walker

About this paper

Cite this paper

Lee, J., Marsella, S. (2012). Modeling Speaker Behavior: A Comparison of Two Approaches. In: Nakano, Y., Neff, M., Paiva, A., Walker, M. (eds) Intelligent Virtual Agents. IVA 2012. Lecture Notes in Computer Science, vol 7502. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33197-8_17

DOI: https://doi.org/10.1007/978-3-642-33197-8_17
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-33196-1
Online ISBN: 978-3-642-33197-8
eBook Packages: Computer Science, Computer Science (R0)
