Using Prosody for Automatic Sentence Segmentation of Multi-party Meetings

Kolář, Jáchym; Shriberg, Elizabeth; Liu, Yang

doi:10.1007/11846406_79

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4188)

Included in the following conference series:

International Conference on Text, Speech and Dialogue

Abstract

We explore the use of prosodic features beyond pauses, including duration, pitch, and energy features, for automatic sentence segmentation of ICSI meeting data. We examine two different approaches to boundary classification: score-level combination of independent language and prosodic models using HMMs, and feature-level combination of models using a boosting-based method (BoosTexter). We report classification results for reference word transcripts as well as for transcripts from a state-of-the-art automatic speech recognizer (ASR). We also compare results using the lexical model plus a pause-only prosody model, versus results using additional prosodic features. Results show that (1) information from pauses is important, including pause duration both at the boundary and at the previous and following word boundaries; (2) adding duration, pitch, and energy features yields significant improvement over pause alone; (3) the integrated boosting-based model performs better than the HMM for ASR conditions; (4) training the boosting-based model on recognized words yields further improvement.
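Finding (1) refers to pause duration at the candidate boundary and at the neighbouring word boundaries. The following is a minimal sketch of how such a pause context could be computed from word-level time alignments. The `Word` type, the `pause_features` function, the feature names, and the choice of 0.0 for missing neighbours are illustrative assumptions, not the feature definitions used by the authors.

```python
from typing import Dict, List, NamedTuple, Union


class Word(NamedTuple):
    """Hypothetical word alignment record (times in seconds)."""
    text: str
    start: float
    end: float


def pause_features(words: List[Word]) -> List[Dict[str, Union[str, float]]]:
    """Return one feature dict per inter-word boundary.

    Boundary i is the gap between words[i] and words[i + 1]. Each dict holds
    the pause at that boundary plus the pauses at the previous and following
    boundaries (0.0 when no such neighbour exists; an assumption made here).
    """
    # Pause after word i: next word's start minus this word's end,
    # clipped at zero so overlapping alignments do not yield negative pauses.
    pauses = [
        max(0.0, words[i + 1].start - words[i].end)
        for i in range(len(words) - 1)
    ]

    features = []
    for i, pause in enumerate(pauses):
        features.append({
            "word": words[i].text,
            "pause_prev": pauses[i - 1] if i > 0 else 0.0,
            "pause_curr": pause,
            "pause_next": pauses[i + 1] if i + 1 < len(pauses) else 0.0,
        })
    return features
```

Feature dicts of this kind could be passed to any boundary classifier; the paper's duration, pitch, and energy features would need additional signal processing that this sketch does not attempt.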

Author information

Authors and Affiliations

International Computer Science Institute, Berkeley, CA, USA
Jáchym Kolář, Elizabeth Shriberg & Yang Liu
Department of Cybernetics, University of West Bohemia in Pilsen, Czech Republic
Jáchym Kolář
SRI International, Menlo Park, CA, USA
Elizabeth Shriberg
University of Texas at Dallas, TX, USA
Yang Liu

Editor information

Editors and Affiliations

Faculty of Informatics, Masaryk University, Brno, Czech Republic
Petr Sojka
Faculty of Informatics, Masaryk University, Botanická 68a, CZ-602 00, Brno, Czech Republic
Ivan Kopeček
Faculty of Informatics, Department of Computer Graphics and Design, Masaryk University, Botanická 68a, 60200, Brno, Czech Republic
Karel Pala

About this paper

Cite this paper

Kolář, J., Shriberg, E., Liu, Y. (2006). Using Prosody for Automatic Sentence Segmentation of Multi-party Meetings. In: Sojka, P., Kopeček, I., Pala, K. (eds) Text, Speech and Dialogue. TSD 2006. Lecture Notes in Computer Science, vol 4188. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11846406_79

DOI: https://doi.org/10.1007/11846406_79
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-39090-9
Online ISBN: 978-3-540-39091-6
eBook Packages: Computer Science, Computer Science (R0)