Spoken Language Dialogue Models

Speech Technology

Abstract

Spoken language interactive systems range from speech-enabled command interfaces to dialogue systems that conduct spoken conversations with the user. In the first case, spoken language serves as an alternative input and output modality, so that commands which the user could type or select from a menu may also be uttered. The system responses can likewise be given as spoken utterances, instead of written language or drawings on the screen, so the whole interaction can be conducted in speech. Spoken dialogue systems, however, are built on models of spoken conversation between participants, so as to allow flexible interaction capabilities. Although the interactions are limited with respect to topics, turn-taking principles and conversational strategies, the systems aim at natural human–computer interaction that enables the user to interact with the system in an intuitive manner. Moreover, by combining insights into the processes that underlie typical human interactions, spoken dialogue modelling also seeks to advance our knowledge and understanding of the principles that govern communicative situations in general.

Author information

Corresponding author

Correspondence to Kristiina Jokinen.

Copyright information

© 2010 Springer Science+Business Media, LLC

About this chapter

Cite this chapter

Jokinen, K. (2010). Spoken Language Dialogue Models. In: Chen, F., Jokinen, K. (eds) Speech Technology. Springer, New York, NY. https://doi.org/10.1007/978-0-387-73819-2_3

  • DOI: https://doi.org/10.1007/978-0-387-73819-2_3

  • Publisher Name: Springer, New York, NY

  • Print ISBN: 978-0-387-73818-5

  • Online ISBN: 978-0-387-73819-2

  • eBook Packages: Engineering, Engineering (R0)
