Copyright © 2003 Elsevier Ltd. All rights reserved.
Pronunciation change in conversational speech and its implications for automatic speech recognition
Received 24 March 2003.
Abstract
Pronunciations in spontaneous speech differ significantly from citation form, and pronunciation modeling for automatic speech recognition has received considerable attention in the last few years. Most methods describe alternate pronunciations of a word using multiple entries in a dictionary or a network of phones, implicitly assuming that a deviation from the canonical pronunciation results in a “complete” change as described by the alternate pronunciation. We investigate this implicit assumption about pronunciation change in conversational speech and demonstrate that, in most cases, the change is only partial: a phone is not completely deleted or substituted by another phone but is only partially modified. Evidence supporting this conclusion comes from a three-way analysis of features extracted from the acoustic signal for use in a speech recognition system, canonical pronunciations from a dictionary, and careful phonetic transcriptions produced by human labelers. Most often, when a deviation from the canonical pronunciation is marked, neither the canonical nor the manually labeled phones represent the actual acoustics adequately. Further analysis of the manual phonetic transcription reveals a significant number (>20%) of instances where even human labelers disagree on the identity of the surface form. In light of this evidence, two methods are suggested for accommodating such partial pronunciation change in the automatic recognition of spontaneous speech, and experimental results are presented for each method.
Article Outline
- 1. Introduction
- 2. Acoustic analysis of pronunciation variation
- 2.1. Acoustics of the alternate realizations
- 2.2. Acoustic likelihood of alternate realizations
- 2.3. Temporal characteristics of alternate realizations
- 3. Automatic phonetic transcription of acoustic data
- 4. Speech recognition experiments
- 5. Concluding remarks
- Appendix A. Mapping 39-dimensional means to 2-dimensional space
- References













