Computer Speech & Language
Volume 19, Issue 1, January 2005, Pages 107-128
 
doi:10.1016/j.csl.2004.03.002
Copyright © 2004 Elsevier Ltd. All rights reserved.

Improving out-of-vocabulary name resolution

David D. Palmer (a, corresponding author) and Mari Ostendorf (b)

(a) Advanced Technology Group, Virage Inc., Woburn, MA 01801, USA
(b) Electrical Engineering Department, University of Washington, Seattle, WA 98195, USA

Received 24 March 2003; revised 25 March 2004; accepted 25 March 2004. Available online 10 May 2004.


Abstract

This paper presents algorithms for generating targeted name lists for candidate out-of-vocabulary (OOV) words for applications in language processing, particularly speech recognition. Focusing on names, which are shown to be the dominant class of OOVs in news broadcasts, the approach involves offline generation of a large name list and online pruning based on a phonetic distance. The resulting list can be used in a rescoring pass in automatic speech recognition. We also show that a simple variation of the approach can be used to generate alternate name spellings, which may be useful for query expansion in information retrieval. By using a wide variety of sources, including automatic name phrase tagging of temporally relevant news text, OOV coverage can be improved by nearly a factor of two with only a 10% increase in the word list size. For one source, coverage increased from 13% to 94%. Phonetic pruning can be used to reduce the list size by an order of magnitude with only a small loss in coverage.
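The online pruning step described in the abstract can be illustrated with a minimal sketch. The abstract does not define the paper's phonetic distance, so this example substitutes plain Levenshtein edit distance over phoneme sequences. The function names, the threshold, and the hand-written phoneme strings are all hypothetical and serve only to show the shape of the idea.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences (stand-in for a phonetic distance)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def prune_name_list(hypothesis, candidates, max_distance):
    """Return candidate names within max_distance of the hypothesis, closest first.

    hypothesis: phoneme sequence from the recognizer (assumed input format).
    candidates: dict mapping name -> phoneme sequence (the offline name list).
    """
    scored = [(edit_distance(hypothesis, phones), name)
              for name, phones in candidates.items()]
    return [name for dist, name in sorted(scored) if dist <= max_distance]


# Illustrative, hand-written phoneme strings (not taken from any real lexicon).
name_list = {
    "Milosevic": ("m", "ih", "l", "ow", "s", "eh", "v", "ih", "ch"),
    "Clinton": ("k", "l", "ih", "n", "t", "ah", "n"),
    "Molloy": ("m", "ah", "l", "oy"),
}
recognized = ("m", "ih", "l", "ow", "s", "ah", "v", "ih", "ch")
shortlist = prune_name_list(recognized, name_list, max_distance=2)
```

A real system would likely weight substitutions by phonetic similarity rather than charging a uniform cost of 1. The uniform cost is used here only to keep the sketch short.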

Article Outline

1. Introduction
2. The problem of OOVs and names
3. Related work
3.1. OOV detection and correction in ASR
3.2. Spelling correction
4. General modules and application contexts
4.1. Offline word list generation
4.2. Name error detection
4.3. Online list pruning
4.4. Error resolution
5. Vocabulary coverage and name list generation
5.1. Analysis of OOV names in broadcast news
5.1.1. “New” names of global importance
5.1.2. News reporters
5.1.3. Spelling and morphological variants
5.1.4. Sports figures
5.1.5. Villagers and human interest personalities
5.2. Filtering vocabulary lists using text-based IE
6. Phonetic distance and list ranking
6.1. Phonetic distance
6.2. Distance-based name list pruning
6.3. Phonetic distance for ASR error correction
6.4. Name normalization
7. Conclusions and future work
Acknowledgements
References





