doi:10.1016/j.specom.2005.01.003
Copyright © 2005 Elsevier B.V. All rights reserved.
Phonological and statistical effects on timing of speech perception: Insights from a database of Dutch diphone perception
Max Planck Institute for Psycholinguistics, Postbus 310, 6500 AH Nijmegen, The Netherlands
Received 18 December 2003;
revised 26 January 2005;
accepted 27 January 2005.
Available online 13 March 2005.
Abstract
We report detailed analyses of a very large database on timing of speech perception collected by Smits et al. (Smits, R., Warner, N., McQueen, J.M., Cutler, A., 2003. Unfolding of phonetic information over time: A database of Dutch diphone perception. J. Acoust. Soc. Am. 113, 563–574). Eighteen listeners heard all possible diphones of Dutch, gated in portions of varying size and presented without background noise. The present report analyzes listeners’ responses across gates in terms of phonological features (voicing, place, and manner for consonants; height, backness, and length for vowels). The resulting patterns for feature perception differ from patterns reported when speech is presented in noise. The data are also analyzed for effects of stress and of phonological context (neighboring vowel vs. consonant); effects of these factors are observed to be surprisingly limited. Finally, statistical effects, such as overall phoneme frequency and transitional probabilities, along with response biases, are examined; these too exercise only limited effects on response patterns. The results suggest highly accurate speech perception on the basis of acoustic information alone.
Keywords: Speech perception; Diphone; Timing; Dutch; Feature
Fig. 1. Percent transmitted information for the consonantal features, by gate. In each panel, the higher set of curves represents phonemes in first position in the diphone, and the lower set of curves represents phonemes in second position in the diphone. (A) Percent transmitted information of the features manner (‘man’), place (‘pla’), and voice (‘voi’), over all segments. (B) Percent transmitted information of the place feature plotted separately for stops (‘stop’), fricatives (‘fric’), nasals (‘nas’), glides (‘gli’), and liquids (‘liq’). Dutch affricates occur in only one place and are therefore not represented. (C) Percent transmitted information of the voicing feature for stops and fricatives (the two manners that distinguish voice) separately.
Fig. 2. Percent transmitted information for the vocalic features length (‘leng’), backness (‘back’), and height (‘heig’). The upper set of curves are for phonemes in first position in the diphone, and the lower set for phonemes in second position in the diphone.
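The percent transmitted information plotted in Figs. 1 and 2 can be illustrated with a short sketch. The excerpt here does not state the exact computation, so the code below assumes the standard relative transmitted information measure of Miller and Nicely (1955): the mutual information between stimulus and response feature values, divided by the stimulus entropy. The function names, the toy confusion counts, and the voicing table are hypothetical and are not taken from the database.

```python
import math
from collections import defaultdict


def percent_transmitted_information(counts):
    """Relative transmitted information, in percent.

    counts maps (stimulus_value, response_value) -> number of responses.
    Assumed measure: 100 * T(stimulus; response) / H(stimulus).
    """
    total = sum(counts.values())
    if total == 0:
        raise ValueError("empty confusion matrix")

    # Marginal probabilities of stimulus and response values.
    stim = defaultdict(float)
    resp = defaultdict(float)
    for (s, r), n in counts.items():
        stim[s] += n / total
        resp[r] += n / total

    # Stimulus entropy in bits; the measure is undefined if it is zero.
    h_stim = -sum(p * math.log2(p) for p in stim.values() if p > 0)
    if h_stim == 0:
        raise ValueError("only one stimulus category; measure undefined")

    # Mutual information between stimulus and response, in bits.
    transmitted = sum(
        (n / total) * math.log2((n / total) / (stim[s] * resp[r]))
        for (s, r), n in counts.items()
        if n > 0
    )
    return 100.0 * transmitted / h_stim


def collapse_by_feature(phoneme_counts, feature_of):
    """Pool a phoneme confusion matrix into a feature confusion matrix."""
    pooled = defaultdict(int)
    for (s, r), n in phoneme_counts.items():
        pooled[(feature_of[s], feature_of[r])] += n
    return dict(pooled)


# Hypothetical toy data: place is confused, voicing never is.
toy_counts = {("p", "t"): 10, ("b", "d"): 10}
voicing = {"p": "voiceless", "t": "voiceless", "b": "voiced", "d": "voiced"}
voicing_counts = collapse_by_feature(toy_counts, voicing)
print(percent_transmitted_information(voicing_counts))
```

In this toy case every response preserves the voicing of the stimulus even though the phoneme itself is misidentified, so the voicing feature is fully transmitted. Computing the measure once per gate and per feature would give curves of the kind shown in the figures.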
Fig. 3. Percent correct recognition for consonants (averaged over individual consonants) in various segmental environments. (A) Consonants as first phoneme of the diphone, followed by another consonant (Cc) or a vowel (Cv). The vowel in or following the diphone is stressed in both cases (pre-stress consonant). (B) Consonants as second phoneme of the diphone, preceded by a consonant (cC) or a vowel (vC). In both cases, the vowel following the target consonant is stressed (i.e. consonant is pre-stress). (C) Same as in B, but with the consonant preceded by a stressed vowel (i.e. consonant is post-stress).
Fig. 4. Percent correct recognition for vowels (averaged over individual vowels) in various segmental environments. (A) Vowels as first phoneme of the diphone, followed by a consonant (Vc) or a vowel (Vv), in both cases with the target vowel stressed (‘str’). (B) Vowels as second phoneme of the diphone preceded by a consonant (cV) or a vowel (vV), in both cases with the target vowel stressed. (C) Same as A, but with the target vowel unstressed (‘unstr’). (D) Same as B, but with the target vowel unstressed.
Fig. 5. Percent correct recognition for consonants (averaged over individual consonants) by stress. (A) Consonants in first position in the diphone, followed by vowels, where the vowel of the diphone is either stressed (‘Cv str’) or unstressed (‘Cv unstr’). (B) Consonants in second position in the diphone, preceded by vowels, where the vowel of the diphone is stressed and the vowel following the diphone unstressed (‘vC post’), or the vowel of the diphone is unstressed and the vowel following it is stressed (‘vC pre’).
Fig. 6. Percent correct recognition for vowels (averaged over individual vowels) by stress. (A) Vowels in first position in the diphone, followed by consonants, where the vowel is either stressed (‘Vc s’) or unstressed (‘Vc u’). (B) Vowels in second position in the diphone, preceded by consonants, where the vowel is either stressed (‘cV s’) or unstressed (‘cV u’). (C) Vowels in initial position of vowel–vowel diphones, where the two vowels are both stressed (‘Vv ss’), both unstressed (‘Vv uu’), stressed–unstressed (‘Vv su’), or unstressed–stressed (‘Vv us’). (D) Same as C, but for vowels in second position of vowel–vowel diphones (vV).
Fig. 7. Percent of all responses for the five overall most popular responses for the second phoneme as a function of gate (‘@’ indicates /ə/).
Fig. 8. Overall observed phoneme probabilities for gate 1 of the second phoneme plotted against phoneme probabilities estimated from the CELEX database. Phoneme symbols are in correspondence with IPA, except for A (indicating /ɑ/), E (/ε/), I (/i/), K (/εi/), L (/œy/), M (/ɑu/), O (/ɔ/), @ (/ə/), ø (/œ/), S (/ʃ/), Z (/ʒ/) and J (/dʒ/).