Computer Speech & Language
Volume 18, Issue 4, October 2004, Pages 375-395
 
doi:10.1016/j.csl.2003.09.005
Copyright © 2003 Elsevier Ltd. All rights reserved.

Pronunciation change in conversational speech and its implications for automatic speech recognition

Murat Saraçlar (a, corresponding author) and Sanjeev Khudanpur (b)

a AT&T Labs – Research, 180 Park Avenue, Florham Park, NJ 07932, USA
b Johns Hopkins University, 3400 N. Charles Street, Baltimore, MD 21218, USA

Received 24 March 2003; 
Revised 16 September 2003; 
Accepted 19 September 2003.
Available online 17 October 2003.


Abstract

Pronunciations in spontaneous speech differ significantly from citation form, and pronunciation modeling for automatic speech recognition has received considerable attention in the last few years. Most methods describe alternate pronunciations of a word using multiple entries in a dictionary or using a network of phones, assuming implicitly that a deviation from the canonical pronunciation results in a “complete” change as described by the alternate pronunciation. We investigate this implicit assumption about pronunciation change in conversational speech and demonstrate here that, in most cases, the change is only partial: a phone is not completely deleted or substituted by another phone but is modified only partially. Evidence supporting this conclusion comes from a three-way analysis of features extracted from the acoustic signal for use in a speech recognition system, canonical pronunciations from a dictionary, and careful phonetic transcriptions produced by human labelers. Most often, when a deviation from the canonical pronunciation is marked, neither the canonical nor the manually labeled phones represent the actual acoustics adequately. Further analysis of the manual phonetic transcription reveals a significant number (>20%) of instances where even human labelers disagree on the identity of the surface form. In light of this evidence, two methods are suggested for accommodating such partial pronunciation change in the automatic recognition of spontaneous speech, and experimental results are presented for each method.

Article Outline

1. Introduction
1.1. The Switchboard corpus
2. Acoustic analysis of pronunciation variation
2.1. Acoustics of the alternate realizations
2.2. Acoustic likelihood of alternate realizations
2.3. Temporal characteristics of alternate realizations
3. Automatic phonetic transcription of acoustic data
4. Speech recognition experiments
4.1. Explicit pronunciation models
4.2. Implicit pronunciation models
4.2.1. State level pronunciation models
4.2.2. Additional ASR experiments with the SLPM
5. Concluding remarks
Appendix A. Mapping 39-dimensional means to 2-dimensional space
References






