Copyright © 2004 Elsevier Ltd. All rights reserved.
Improving out-of-vocabulary name resolution
Received 24 March 2003.
Abstract
This paper presents algorithms for generating targeted name lists for candidate out-of-vocabulary (OOV) words for applications in language processing, particularly speech recognition. Focusing on names, which are shown to be the dominant class of OOVs in news broadcasts, the approach involves offline generation of a large name list and online pruning based on a phonetic distance. The resulting list can be used in a rescoring pass in automatic speech recognition. We also show that a simple variation of the approach can be used to generate alternate name spellings, which may be useful for query expansion in information retrieval. By using a wide variety of sources, including automatic name phrase tagging of temporally relevant news text, OOV coverage can be improved by nearly a factor of two with only a 10% increase in the word list size. For one source, coverage increased from 13% to 94%. Phonetic pruning can be used to reduce the list size by an order of magnitude with only a small loss in coverage.
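The online pruning step described in the abstract can be illustrated with a minimal sketch. The abstract does not define the phonetic distance, so this example substitutes a plain unit-cost edit distance over phone symbols, which is an assumption and not the measure used in the paper. The function names `phone_edit_distance` and `prune_name_list`, the toy lexicon, and its ARPAbet-style pronunciations are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Sequence, Tuple


def phone_edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance over phone symbols.

    Assumption: unit costs for insertion, deletion, and substitution.
    The paper's own phonetic distance is not given in the abstract.
    """
    prev = list(range(len(b) + 1))
    for i, phone_a in enumerate(a, start=1):
        curr = [i]
        for j, phone_b in enumerate(b, start=1):
            substitution = prev[j - 1] + (0 if phone_a == phone_b else 1)
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, substitution))
        prev = curr
    return prev[-1]


def prune_name_list(
    hyp_phones: Sequence[str],
    name_lexicon: Dict[str, List[str]],
    keep: int,
) -> List[Tuple[str, int]]:
    """Rank candidate names by distance to the hypothesized phone string
    and keep only the `keep` closest (ties broken alphabetically)."""
    scored = [
        (name, phone_edit_distance(hyp_phones, phones))
        for name, phones in name_lexicon.items()
    ]
    scored.sort(key=lambda pair: (pair[1], pair[0]))
    return scored[:keep]


# Hypothetical offline name list with illustrative pronunciations.
toy_lexicon = {
    "Milosevic": ["M", "IH", "L", "OW", "S", "AH", "V", "IH", "CH"],
    "Milosevich": ["M", "IH", "L", "OW", "S", "AH", "V", "IH", "CH"],
    "Madison": ["M", "AE", "D", "IH", "S", "AH", "N"],
    "Kostunica": ["K", "AO", "SH", "T", "UW", "N", "IH", "T", "S", "AH"],
}

# Hypothetical phone string for a region flagged as a likely OOV name.
hypothesis = ["M", "IH", "L", "OW", "S", "AH", "V", "IH", "K"]

shortlist = prune_name_list(hypothesis, toy_lexicon, keep=2)
```

In this toy setup the two spellings that share a pronunciation both survive pruning, which mirrors the abstract's remark that a variation of the approach can surface alternate name spellings. A realistic system would likely weight substitutions by phone confusability rather than use unit costs.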
Article Outline
- 1. Introduction
- 2. The problem of OOVs and names
- 3. Related work
- 4. General modules and application contexts
- 4.1. Offline word list generation
- 4.2. Name error detection
- 4.3. Online list pruning
- 4.4. Error resolution
- 5. Vocabulary coverage and name list generation
- 5.1. Analysis of OOV names in broadcast news
- 5.1.1. “New” names of global importance
- 5.1.2. News reporters
- 5.1.3. Spelling and morphological variants
- 5.1.4. Sports figures
- 5.1.5. Villagers and human interest personalities
- 5.2. Filtering vocabulary lists using text-based IE
- 6. Phonetic distance and list ranking
- 6.1. Phonetic distance
- 6.2. Distance-based name list pruning
- 6.3. Phonetic distance for ASR error correction
- 6.4. Name normalization
- 7. Conclusions and future work
- Acknowledgements
- References





