ABSTRACT
Speakers convey much of the information hearers use to interpret discourse by varying prosodic features such as PHRASING, PITCH ACCENT placement, TUNE, and PITCH RANGE. The ability to emulate such variation is crucial to effective (synthetic) speech generation. While text-to-speech synthesis must rely primarily upon structural information to determine appropriate intonational features, speech synthesized from an abstract representation of the message to be conveyed may employ much richer sources. The implementation of an intonation assignment component for Direction Assistance, a program which generates spoken directions, provides a first approximation of how recent models of discourse structure can be used to control intonational variation in ways that build upon recent research in intonational meaning. The implementation further suggests ways in which these discourse models might be augmented to permit the assignment of appropriate intonational features.
- Assigning intonational features in synthesized spoken directions