Abstract
Many speech recognition applications require only partial information to be extracted from a speech utterance. These include human-machine interactions in which it may be difficult to constrain users’ utterances to the domain of the machine. Also of interest are applications where speech utterances arise from human-human interaction, interaction with speech messaging systems, or any other domain that can be characterized as unconstrained or spontaneous. This chapter is concerned with the problem of spotting keywords in continuous speech utterances. It describes many important speech input applications involving word spotting and discusses Automatic Speech Recognition (ASR) problems that are particularly important in word spotting applications: rejection of out-of-vocabulary utterances, derivation of confidence measures, and development of efficient and flexible search algorithms.
Copyright information
© 1996 Kluwer Academic Publishers
About this chapter
Cite this chapter
Rose, R.C. (1996). Word Spotting from Continuous Speech Utterances. In: Lee, CH., Soong, F.K., Paliwal, K.K. (eds) Automatic Speech and Speaker Recognition. The Kluwer International Series in Engineering and Computer Science, vol 355. Springer, Boston, MA. https://doi.org/10.1007/978-1-4613-1367-0_13
DOI: https://doi.org/10.1007/978-1-4613-1367-0_13
Publisher Name: Springer, Boston, MA
Print ISBN: 978-1-4612-8590-8
Online ISBN: 978-1-4613-1367-0
eBook Packages: Springer Book Archive