Elsevier

Journal of Phonetics

Volume 36, Issue 2, April 2008, Pages 268-294

Perceptual learning of Cantonese lexical tones by tone and non-tone language speakers

https://doi.org/10.1016/j.wocn.2007.06.005

Abstract

Two groups of listeners, one of native speakers of a tone language (Mandarin Chinese) and one of native speakers of a non-tone language (English), were trained to recognize Cantonese lexical tones. Performance before and after training was measured using closed response-set identification and pairwise difference rating tasks. Difference ratings were submitted to multidimensional scaling (MDS) analyses to investigate training-related changes in listeners’ perceptual space. Both groups showed comparable initial performance and significant improvement in tone identification following training. However, the two groups differed in terms of the tones they found most difficult to identify, and in terms of the tones that were learned best. Differences between the two groups’ training-induced changes in identification (confusions) and perceptual spaces demonstrated that listeners’ native language experience with intonational as well as tone categories affects the perception and acquisition of non-native suprasegmental categories.

Introduction

When learning a foreign language, listeners must learn to redistribute attention, directing more attention to previously ignored acoustic patterns (Francis & Nusbaum, 2002; Guion & Pederson, 2007) and/or ignoring properties that function in the native language but not in the new one (Yamada & Tohkura, 1992). Previous research has shown that both linguistic experience and perceptual training can change the way listeners attend to the acoustic patterns of speech (Francis, Baldwin, & Nusbaum, 2000; Francis & Nusbaum, 2002; Guion & Pederson, 2007; McCandliss, Fiez, Protopapas, Conway, & McClelland, 2002; Pisoni, Lively, & Logan, 1994). However, until recently, the vast majority of research on phonetic learning has been carried out on features of segmental (vowel and consonant) categories (though cf. Mennen (2004) for a cross-linguistic study of learning intonation). In the present article we extend this research into the domain of lexical tones.

In a tone language, meaningful lexical differences can be indicated simply by changing the fundamental frequency (f0) pattern of a given syllable. For example, in Cantonese the syllable [ji] means “cure” when produced with a 55 (High Level) tone, but when produced with a 21 (Low Falling) tone it means “son”.1 Although other acoustic properties can also function as perceptual cues to tone identity, research on Mandarin has shown that non-f0 features, including the amplitude envelope contour, duration, and voice quality properties such as glottalization, typically function as secondary cues, and that listeners generally base their tone category decisions primarily on f0 contours when these are available (Fu, Zeng, Shannon, & Soli, 1998; Liu & Samuel, 2004; Whalen & Xu, 1992). In Cantonese, no non-f0 property has yet been shown either to correlate consistently with tone identity or to be used consistently by listeners, even in the absence of f0 information (Ciocca, Francis, Aisha, & Wong, 2002; Fok Chan, 1974; Vance, 1976).

Recently, a number of studies have begun to examine tone perception by both tone and non-tone language speakers (e.g. Gandour, Dzemidzic, et al., 2003; Hallé, Chang, & Best, 2004; Krishnan, Xu, Gandour, & Cariani, 2005; Wang, Behne, Jongman, & Sereno, 2004; Wong, 2002; Xu, Gandour, & Francis, 2006). In general, results suggest that speakers of non-tone languages process speech f0 differently, both behaviorally and neurologically, than do speakers of tone languages. Such differences may give rise to the well-known difficulty experienced by adult speakers of non-tone languages when attempting to learn an unfamiliar tone language (cf. Wang, Spence, Jongman, & Sereno, 1999).

Two general kinds of theories have been proposed to account for this observed difficulty. According to what might be termed a “levels of representation” account, speakers of non-tone languages are simply unable to relate lexical tones to familiar (native) linguistic categories because nothing in their native grammar prepares them to use prosodic properties such as f0 in a lexically contrastive manner. Thus, although speakers of a language such as English have categories defined by f0 characteristics, these categories are intonational rather than lexical, and it is this difference that creates the difficulty. By contrast, a “category assimilation” account would attribute the difficulty to problems in mapping foreign tone categories onto native ones, whether intonational or lexical. According to this second kind of theory, which is compatible with, if not explicitly derived from, current models of prosodic phonology that treat intonational categories as comparable to segmental ones (see, e.g., Ladd, 1992, 1996), non-tone language speakers will process foreign lexical tones with reference to their native intonational categories, just as tone language speakers will process foreign tones with reference to their native tone categories. On this view, the greater difficulty experienced by non-tone language speakers derives either from a greater degree of mismatch between native intonational and foreign tone categories than between native and foreign tone categories, or (perhaps) from a weaker (less categorical) mental representation of native intonational categories than of native tone categories, or from both.

The discussion presented by Wayland and Guion (2004) represents a good example of the first (levels of representation) account. They found that native Chinese (Mandarin and Taiwanese) speakers were better at discriminating a Thai tone contrast (mid vs. low tone) than were native English speakers, both before and after training, although both groups still performed worse than native Thai speakers (who also showed improvement at discriminating this difficult contrast after training). Moreover, the native English group showed no significant improvement even after a week (five 30-min sessions) of training. Wayland and Guion (2004) argue that these results suggest either that native Chinese listeners were able to transfer or extend a native-language (L1)-based propensity for tracking f0 to the perception of a new language while English speakers could not, or that native Chinese listeners were able to map the non-native tones onto (different) native Chinese (phonological) tone categories while English speakers could not. In either case, the assumption is that English speakers lack some fundamental capability or representation that Chinese speakers possess and that facilitates the perception and acquisition of non-native lexical tone categories.

In contrast, the work of Hallé et al. (2004), insofar as it adopts Best's (1995) perceptual assimilation model (PAM), appears to represent the second (category assimilation) approach, although their conclusions in fact argue for little or no assimilatory effect in the case examined. Hallé et al. (2004, Experiments 1 and 3) tested French and Taiwanese listeners on discrimination of three Mandarin lexical tone continua, and found that French listeners were uniformly sensitive to f0 differences across the continua, while Taiwanese listeners were more sensitive to differences between tokens belonging to different tone categories than to similar acoustic differences between same-category pairs. They argue that these results suggest that Taiwanese listeners’ perception was influenced by their native tone categories, while French listeners were unable to treat the Mandarin tones as “basic prosodic units bearing contrastive linguistic significance” (i.e. phonological categories) despite being able to process them as “prosodic aspects of speech” (p. 417). While this characterization evokes a “levels of representation” model (i.e. French speakers do not perceive Mandarin tones correctly because they have no native tone categories to assimilate them to), Hallé et al. (2004) also state that “Tone contours thus are not completely irrelevant to a French ear with respect to their putative linguistic value” because “the acoustic correlates of tones, f0 and intensity contour, are used in French, just as in any language, at the sentential intonation level.” Thus, French listeners must have been primarily influenced by “the perceived salience of the phonetic (i.e., intonational) differences involved”, which “might be non-language-specific in this case, or they could be evocative of language-specific intonation patterns” (pp. 417–418). In other words, according to Hallé et al. (2004), French speakers were unable to perceive Mandarin tones in a categorical manner not because they were unable to process tone per se, but simply because they were unable to map the Mandarin tones clearly onto any particular native French phonological (intonational) categories.

A possible third approach may be found in the work of Wang et al. (2004), who compared hemispheric lateralization of Mandarin lexical tone perception in Mandarin, English, and Norwegian listeners. They found that, even though Norwegian listeners were familiar with lexical tones from their native language, these listeners, like the non-tone-language-speaking English listeners, showed no hemispheric asymmetry for Mandarin tones, whereas Mandarin and Mandarin–English bilingual listeners did. They interpret these results as suggesting that hemispheric lateralization depends on familiarity with the specific acoustic properties of the stimuli. Although Norwegian listeners were familiar with the use of pitch as a cue to lexical tone, and did possess native tone category representations, they were not familiar with the specific acoustic features that distinguish Mandarin tones, and therefore processed them bilaterally, just as English listeners did. This argument suggests that what matters most in the cross-language perception of tone categories may not be the native tone or intonational categories themselves, but rather the degree to which the acoustic features that define tones in the native language correspond to those that define tones in the foreign language.

Such a feature-based perspective is supported by research on the cross-language perception of f0 patterns by both tone and non-tone language speakers. Gandour and colleagues have shown that the basic perceptual dimensions of lexical tone space tend to be the same across tone and non-tone languages, but that the relative weighting of each dimension varies across languages (Gandour, 1983; Gandour & Harshman, 1978). These findings support the idea that native intonational as well as tone representations influence the perception of the same acoustic features: f0 height and direction of change. For example, Gandour (1983; see also Guion & Pederson, 2007) showed that Cantonese, Mandarin, Taiwanese, Thai, and English speakers’ perception of f0 contours can be characterized adequately within the same two-dimensional space, defined by height (average f0) and direction of f0 change (level, rising or falling) across the syllable. However, English and Cantonese speakers give more weight to the height dimension than do speakers of the other three languages, while Cantonese and Mandarin speakers give more weight to the direction dimension than do English speakers. These patterns (at least with respect to Mandarin, English and Cantonese listeners) are presumably a consequence of the need to optimize perception of f0 patterns for distinguishing frequency-based categories, irrespective of whether these categories are tonal (Cantonese, Mandarin) or intonational (English).
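The idea of shared dimensions with language-specific weights can be made concrete with the kind of weighted Euclidean distance used in individual-differences (INDSCAL-style) scaling: every listener group uses the same two axes, but stretches or shrinks each one by a group-specific weight. The Python sketch below is illustrative only; the stimulus coordinates, the listener labels and the weight values are invented for the example and are not values reported by Gandour (1983) or Gandour and Harshman (1978).

```python
import math


def weighted_distance(a, b, w_height, w_direction):
    """Weighted Euclidean distance between two (height, direction) points.

    All listener groups share the same two dimensions; a group's weights
    stretch or shrink each dimension's contribution to perceived distance.
    """
    d_height = a[0] - b[0]
    d_direction = a[1] - b[1]
    return math.sqrt(w_height * d_height ** 2 + w_direction * d_direction ** 2)


# Hypothetical stimuli: the first pair differs only in height,
# the second pair differs only in direction of f0 change.
HEIGHT_PAIR = ((4.0, 0.0), (2.0, 0.0))
DIRECTION_PAIR = ((3.0, 0.0), (3.0, 2.0))

# Hypothetical weights for two illustrative listener types (not fitted values).
LISTENERS = {"height-dominant": (0.9, 0.1), "direction-dominant": (0.3, 0.7)}

for name, (w_h, w_d) in LISTENERS.items():
    d_h = weighted_distance(*HEIGHT_PAIR, w_h, w_d)
    d_d = weighted_distance(*DIRECTION_PAIR, w_h, w_d)
    print(f"{name}: height pair {d_h:.2f}, direction pair {d_d:.2f}")
```

With the height-dominant weights, the pair that differs only in height is the more distinct of the two; with the direction-dominant weights the ordering reverses. This is the sense in which one shared two-dimensional space can still yield different perceptual patterns for different language groups.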

Standard (Beijing) Mandarin is typically described as having four tone categories: a High (55) tone (Tone 1), a Rising (25) tone (Tone 2), a Dipping (214) tone (Tone 3), and a Falling (51) tone (Tone 4), as shown in Fig. 1 (Li & Thompson, 1989; Norman, 1988). These f0 patterns are generally maintained in fluent speech, with a few exceptions known as tone sandhi in which the f0 contour of one syllable will vary depending on the tones following (and possibly preceding) it. In most cases, Mandarin tone sandhi changes one of these f0 contours into another one in the list (a phonological change). For example, when a syllable with a Dipping tone (214) is directly followed by another syllable with the same tone, the first syllable will be produced as a rising tone (25) instead. However, one sandhi rule does result in the production of a fifth f0 contour: When a syllable with a Dipping tone is directly followed by any tone other than another Dipping tone, the first syllable will be produced with a low falling (21) tone, similar to the Low Falling tone of Cantonese (Li & Thompson, 1989). Finally, some syllables receive a lexically determined “weak stress” in connected speech, resulting in a reduced f0 contour strongly influenced by the tone of the preceding (fully stressed) syllable but most plausibly described as having a mid-level f0 target (Chen & Xu, 2006).
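The two Dipping-tone sandhi rules just described can be sketched as a simple rewrite over a sequence of Chao tone letters. This Python sketch is a simplification under stated assumptions: it applies only those two rules, conditions each rule on the following syllable's underlying tone, leaves a phrase-final 214 in its citation form, and ignores weak-stress syllables and the prosodic grouping that governs longer runs of Dipping tones. The function name is hypothetical.

```python
def apply_tone3_sandhi(tones):
    """Return surface tones after the two Dipping-tone (214) sandhi rules.

    tones: Chao tone-letter strings for successive syllables, e.g. ["214", "55"].
    Each rule looks at the *underlying* tone of the next syllable:
      214 + 214        -> first syllable becomes rising 25
      214 + other tone -> first syllable becomes low falling 21
    A 214 with no following syllable keeps its citation form.
    Weak-stress (neutral tone) syllables and prosodic grouping are ignored.
    """
    surface = list(tones)
    for i in range(len(tones) - 1):
        if tones[i] == "214":
            surface[i] = "25" if tones[i + 1] == "214" else "21"
    return surface


print(apply_tone3_sandhi(["214", "214"]))
print(apply_tone3_sandhi(["214", "51"]))
print(apply_tone3_sandhi(["55", "214"]))
```

A run such as 214 214 214 comes out here as 25 25 214, which is one possible pattern; actual outputs for longer runs depend on how the syllables are grouped, which this sketch does not attempt to model.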

Other non-f0 features have also been shown to function as cues to Mandarin tones in the absence of identifiable f0 cues, including the shape of the amplitude envelope and syllable duration (Fu et al., 1998; Liu & Samuel, 2004; Whalen & Xu, 1992). Furthermore, although the Mandarin fourth (High Falling) tone is frequently reported to be accompanied by a glottalized (vocal fry) voice quality (Liu & Samuel, 2004), we are not aware of any studies demonstrating that this property functions as a cue to this lexical tone in the way that duration and amplitude envelope have been shown to function in the absence of f0 information. Even so, all of these properties are clearly subordinate to f0 cues in the perception of natural, unmodified speech, and individuals with poor pitch perception (e.g. cochlear implant users) typically exhibit poor perception of Mandarin tones (Wei, Cao, & Zeng, 2004), though some individuals may perform quite well (Peng, Tomblin, Cheung, Lin, & Wang, 2004).

In Hong Kong Cantonese there are six contrastive tones (Bauer & Benedict, 1997; Fok Chan, 1974): High Level (HL): 55; High Rising (HR): 25; Mid-Level (ML): 33; Low Rising (LR): 23; Low Level (LL): 22; Low Falling (LF): 21. Representative f0 contours from an adult male native speaker are shown in Fig. 2. With respect to non-f0 cues, the Low Falling tone is commonly produced with some degree of glottalization, but it has been shown that this property does not function as a consistent cue for Cantonese listeners (Vance, 1976). No other non-f0 cues have been proposed for Cantonese, and, as with Mandarin, Cantonese-speaking cochlear implant users have considerable difficulty identifying Cantonese tones, suggesting that f0 patterns serve as the primary, and perhaps sole, cues to lexical tones in Cantonese (Ciocca et al., 2002; Lee, van Hasselt, Chiu, & Cheung, 2002).
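One crude way to relate Chao tone letters to the two perceptual dimensions discussed earlier (average f0 height and direction of f0 change) is to take the mean of the tone letters as height and the sign of the change from first to last letter as direction. The Python sketch below applies this to the six Cantonese tones. The encoding is an illustrative assumption that operates on nominal tone letters, not on measured f0 contours such as those in Fig. 2, and the function name is hypothetical.

```python
# Six Hong Kong Cantonese tones in Chao tone letters, as listed in the text.
CANTONESE_TONES = {
    "High Level": "55", "High Rising": "25", "Mid-Level": "33",
    "Low Rising": "23", "Low Level": "22", "Low Falling": "21",
}


def tone_features(chao):
    """Return (height, direction) for a Chao tone-letter string such as "25".

    height: mean of the tone letters (1 = lowest, 5 = highest pitch)
    direction: +1 rising, 0 level, -1 falling (last letter vs. first letter)
    """
    levels = [int(ch) for ch in chao]
    height = sum(levels) / len(levels)
    change = levels[-1] - levels[0]
    direction = (change > 0) - (change < 0)  # sign of the change as an int
    return height, direction


for name, chao in CANTONESE_TONES.items():
    height, direction = tone_features(chao)
    print(f"{name:12s} {chao}: height={height:.1f}, direction={direction:+d}")
```

Under this coding the 23 and 25 tones share a direction and differ only in height, whereas the 21 and 23 tones differ in both height and direction, which is the contrast at stake in the feature-weighting predictions discussed later.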

Characterizing the English tone inventory is less straightforward, and there have historically been a number of different characterizations. There is general agreement that some aspects of tonal patterns are specified on prominent syllables and others at final edges (Cruttenden, 1997; Ladd, 1996). For the sake of clarity, we adopt the Tones and Break Indices (ToBI) transcription system (Beckman & Hirschberg, 1994; Beckman, Hirschberg, & Shattuck-Hufnagel, 2005; Pierrehumbert, 1980) because of its familiarity, though our purpose is not to argue for or against this particular system. In this spirit, it may be observed that research on the interpretation of ToBI-defined intonational patterns (Pierrehumbert & Hirschberg, 1990) suggests that the intonational system of (American) English should be able to distinguish about 22 different simple contours. This number is derived by identifying all possible combinations of the six pitch accents, two phrase accents, and two boundary tones defined in the ToBI system, and eliminating combinations that result in acoustically indistinguishable contours (see Ladd, 1992, pp. 81–82).2 While Pierrehumbert and Hirschberg (1990) do not seem to take a strong position on whether every one of these possible contours can actually serve to indicate categorically distinct meanings, as a phoneme should, their discussion of the compositionality of tune meaning suggests that they assume this will prove to be the case. Moreover, Ladd (1996, Table 3.1) shows that these 22 legal intonational patterns of American English correspond surprisingly well to patterns proposed in other (e.g. British) intonational frameworks. Note also that even researchers who explicitly reject this approach still tend to focus on a relatively small number of tones with relatively distinct meanings (e.g. Cruttenden's (1997, pp. 50–54) derivation of seven nuclear tones that “suffice for the usual level of delicacy that is required”). Indeed, the precise number of intonational categories, and even their specific characterization in terms of an autosegmental vs. non-autosegmental model, is not at issue here. For present purposes, the important conclusion to draw from this research is what Ladd (1996) calls the “linguist's theory of intonational meaning” (pp. 39–40), namely that intonational patterns, however they are defined, are associated with specific meanings (see discussions of this approach in Ladd, 1996, Chap. 3, especially Section 3.4, and a similar discussion from a variety of theoretical perspectives in Cruttenden, 1997, Section 4.4), and thus may be represented as part of the listener's phonological system in a more or less categorical manner comparable to that of segmental phonemes.
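The figure of about 22 contours can be traced with simple combinatorics. Assuming that each simple contour consists of exactly one pitch accent, one phrase accent, and one boundary tone (an assumption about what counts as a “simple” contour), the raw number of combinations is

$$
\underbrace{6}_{\text{pitch accents}} \times \underbrace{2}_{\text{phrase accents}} \times \underbrace{2}_{\text{boundary tones}} = 24,
$$

from which combinations yielding acoustically indistinguishable contours are then removed to arrive at the roughly 22 contours cited above. Which particular combinations are merged is not reconstructed here.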

Research by Ward and Hirschberg (1985) strongly supports this hypothesis. They showed that, in American English, the fall–rise contour clearly expresses talker uncertainty, and subsequent research suggests that this intonational pattern, at least, may be considered to be perceived categorically (Pierrehumbert & Steele, 1989). However, the categorical status of other contours is more heavily debated (see Ladd, 1996, Chap. 3 for discussion). A recent investigation of cross-language perception of intonation found evidence supporting a distinction only between two main contours, one falling and one rising, with some degree of (complex) variation within the two classes (Grabe, Rosner, García-Albea, & Zhou, 2003).

Despite the current uncertainty regarding the categorical status of specific English intonational contours, consideration of the set of 22 possible contours described by Pierrehumbert and Hirschberg (1990) (or their British-style equivalents provided by Ladd, 1996, Table 3.1) and their putative meanings provides a plausible basis for characterizing English listeners’ native language experience with f0-based linguistic distinctions. For example, it seems quite reasonable to assume that English speakers should be familiar with making intonational distinctions between an f0 rise (e.g. standard yes–no question contour) and an f0 fall (e.g. neutral declarative intonation). They may also be adept at distinguishing the relative endpoints of rising contours: compare the high-rise (implied) question inviting the listener to agree (H*HH%), as in “I thought it was good…” (implying, what do you think?), with standard yes–no question intonation (L*HH%), as in “Did you think it was good?” (example from Pierrehumbert & Hirschberg, 1990; see pp. 290–291 for discussion). Finally, the basic structure of the ToBI system, and indeed of all autosegmental-metrical approaches to American English intonation, in which intonational contours are considered to be derived from sequences of fundamentally low (L) and high (H) tones (cf. Cruttenden, 1997, Chap. 3; Ladd, 1996, Chap. 3), also suggests that English speakers may be able to distinguish more or less categorically between a relatively low tone (L) and a relatively higher one (H).

Having established the existence of prosodic categories in both Mandarin and English, it is possible to use the similarity of L2 tokens to L1 categories to make some tentative predictions about the perception of Cantonese tones by Mandarin and English speakers from the perspective of a model of category assimilation (e.g. PAM, Best, 1995). Mandarin listeners should be good at identifying the Cantonese 55 (High Level) and 25 (High Rising) tones, because these are acoustically virtually indistinguishable from, and thus should assimilate quite well to, the Mandarin 55 (Tone 1) and 25 (Tone 2) tones, respectively. The Cantonese 23 (Low Rising) tone might also assimilate to the Mandarin 214 (Tone 3) tone, but here the mapping is less straightforward, and the predictions of different category-based theories might differ. For example, Flege's (1995) speech learning model (SLM) holds that a native category that is similar, but not identical, to a non-native one may actually interfere with the perception (and certainly the acquisition) of the non-native one. Thus, the SLM might predict great difficulty for Mandarin speakers’ perception of the 23 tone due to interference from the similar, but not identical, native 214 tone, whereas PAM might lead one to expect a category goodness assimilation, with the 23 tone perceived as a poor (but identifiable) realization of the native 214 tone. On the other hand, depending on the degree of perceived difference between the 23 and 214 tones, PAM could also predict that the 23 tone would be treated as an uncategorizable phoneme, and thus one that would be easy to distinguish but very difficult to learn to identify. Performance on the other three tones would also be expected to depend on their relative degree of perceived similarity to native categories, with the 33 (Mid-Level) tone possibly also mapping moderately well onto the 214 tone, but the 22 (Low Level) and 21 (Low Falling) tones mapping poorly, if at all, to any of the canonical Mandarin tone categories. Ultimately, because these categories are defined in terms of multiple acoustic features (at least average f0 height and direction of f0 change), how close a foreign category is to a native one will necessarily depend on the relative weighting of each dimension in the native as compared to the target language, which already suggests that a feature-weighting model of cross-language perception may be more useful in this domain.

Following a category assimilation model, English listeners should be able to identify the 25 (High Rising) and 23 (Low Rising) tones based on their similarity to English question intonation patterns and final continuation rises (H* HH% and L*LH%), although they might also confuse the two based on their auditory similarity, if the acoustic realization of the Cantonese tone contrast does not correspond sufficiently to that of the English one. They might also be able to identify the 21 (Low Falling) tone on the basis of its similarity to a number of English falling intonational contours.

From a feature-weighting perspective, the results presented by Gandour and colleagues (Gandour, 1983; Gandour & Harshman, 1978) and Guion and Pederson (2007) suggest that Mandarin listeners should be sensitive to both the direction of f0 change and relative (average) f0 height of a syllable, with more weight being given to direction than to height. Thus, with respect to confusions, we might expect Mandarin listeners to be prone to confusing the 23 and 25 (Low and High Rising) tones because these contrast more in terms of height than direction. On the other hand, the 21 and 23 tones (Low Falling and Low Rising) may be less easily confused because they differ both in terms of direction of f0 change and in average f0.

For English listeners, Gandour (1983), Gandour and Harshman (1978) and Guion and Pederson (2007) showed that relative f0 height was considerably more important than direction of f0 change. We might therefore expect English listeners to perform well at distinguishing the three Cantonese level tones (22, 33, and 55). Indeed, English listeners’ performance on Cantonese level tones might even be comparable to that of native Cantonese listeners, because direction of f0 change also plays a relatively weak role compared to f0 height in Cantonese listeners’ tone perception (Gandour & Harshman, 1978; though see Gandour, 1981, for somewhat different results for identification of Cantonese tones specifically). On the other hand, Gandour and Harshman's (1978) observation that English listeners gave less weight to the direction of f0 change than even Cantonese listeners did might suggest that English listeners should have considerable difficulty distinguishing, e.g., the 21 and 23 (Low Falling and Low Rising) tones, which are relatively similar in average f0 but differ substantially in direction of f0 change. Of course, this distinction could in principle still be made on the basis of average f0 across the syllable, in which case English listeners should once again have little trouble with it.
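The confusion predictions in the last two paragraphs can be illustrated by combining a crude (height, direction) coding of the Cantonese tones, derived from their Chao tone letters, with two hypothetical sets of dimension weights: one leaning toward direction (Mandarin-like) and one strongly favoring height (English-like). A smaller weighted distance stands in for greater predicted confusability. The weights and group labels are invented for illustration and are not the values estimated by Gandour (1983), Gandour and Harshman (1978), or Guion and Pederson (2007).

```python
import math

# (average height, direction) per tone, derived from Chao tone letters:
# height = mean of the letters; direction = +1 rising, 0 level, -1 falling.
FEATURES = {
    "55": (5.0, 0), "25": (3.5, 1), "33": (3.0, 0),
    "23": (2.5, 1), "22": (2.0, 0), "21": (1.5, -1),
}

# Hypothetical (w_height, w_direction) weights, NOT fitted values: they only
# encode "direction > height" for a Mandarin-like listener and
# "height >> direction" for an English-like listener.
WEIGHTS = {"Mandarin-like": (0.4, 0.6), "English-like": (0.9, 0.1)}


def distance(t1, t2, weights):
    """Weighted Euclidean distance between two tones in (height, direction) space."""
    (h1, d1), (h2, d2) = FEATURES[t1], FEATURES[t2]
    w_height, w_direction = weights
    return math.sqrt(w_height * (h1 - h2) ** 2 + w_direction * (d1 - d2) ** 2)


PAIRS = [("23", "25"), ("21", "23"), ("22", "33"), ("33", "55")]

for group, weights in WEIGHTS.items():
    row = ", ".join(f"{a}-{b}: {distance(a, b, weights):.2f}" for a, b in PAIRS)
    print(f"{group:14s} {row}")
```

Under these toy weights, the Mandarin-like listener's 21–23 distance is roughly 2.6 times its 23–25 distance, whereas for the English-like listener the two distances are nearly comparable (a ratio of about 1.2), and the level-tone pairs (22–33, 33–55) are farther apart under the English-like weights. This mirrors the qualitative predictions in the text, but only because the weights were chosen to encode them.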

A major advantage of a feature-weighting perspective on cross-language perception is that it allows more specific predictions to be made regarding acquisition. In particular, the relative weight given to both dimensions of contrast, height and direction, by successful learners in both the English and Mandarin groups should become more similar to the relative weighting shown by native Cantonese listeners. Based on the differences between the three language groups’ a priori dimensional weights as identified by Gandour (1983), English listeners should learn to give more weight to the direction dimension (improving their recognition of contour tones) while Mandarin listeners should learn to give more weight to the height dimension (improving their recognition of level tones). In support of this hypothesis, Guion and Pederson (2007) found that native English speakers with significant (adult) Mandarin experience showed a more Mandarin-like pattern of perceptual weighting of f0 slope as compared to native English speakers with no tone language experience.

The present experiment was designed to compare the perceptual learning of non-native tone categories by native speakers of a tone language (Mandarin Chinese) and a non-tone language (American English). The goal was to explore the degree to which patterns of cross-language tone perception and perceptual learning of tones may be understood in terms of the influence of native prosodic categories and/or biases in perceptual weighting of specific prosodic acoustic features resulting from listeners’ prior experience with f0-based linguistic contrasts (both tone and intonational). Thus, we focus not on differences in overall performance, as might be the case in a study of training methods, but rather on differences in the pattern of performance between groups demonstrating comparable degrees of overall learning. That is, instead of simply evaluating the two groups’ overall proportion of correct tone identifications across all trials, we compared each group's performance on individual tones as well as the pattern of their incorrect responses, in an effort to determine how they were treating the separate acoustic properties that define tone (and intonational) categories before and after training.
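Comparing groups tone by tone, rather than by overall proportion correct, amounts to tabulating identification responses into per-tone accuracy and a confusion matrix of presented versus chosen tones. The Python sketch below shows one minimal way to do this. The trial list is invented and the function names are hypothetical; nothing here reproduces the study's actual analysis procedure.

```python
from collections import Counter, defaultdict


def confusion_matrix(trials):
    """Count responses per presented tone.

    trials: iterable of (presented, response) tone labels.
    Returns {presented: Counter({response: count})}.
    """
    matrix = defaultdict(Counter)
    for presented, response in trials:
        matrix[presented][response] += 1
    return matrix


def per_tone_accuracy(matrix):
    """Proportion of trials on which each presented tone was identified correctly."""
    return {tone: counts[tone] / sum(counts.values()) for tone, counts in matrix.items()}


def top_confusions(matrix, n=3):
    """Most frequent off-diagonal (presented, response) pairs, i.e. errors."""
    errors = Counter()
    for presented, counts in matrix.items():
        for response, k in counts.items():
            if response != presented:
                errors[(presented, response)] += k
    return errors.most_common(n)


# Invented responses from one hypothetical listener (not data from this study).
trials = [("23", "25"), ("23", "25"), ("23", "23"), ("25", "25"), ("25", "23"),
          ("21", "21"), ("21", "22"), ("55", "55"), ("33", "22")]
m = confusion_matrix(trials)
print(per_tone_accuracy(m))
print(top_confusions(m))
```

Keeping the full confusion matrix, rather than only the diagonal, is what allows the direction of errors (for example, 23 heard as 25 versus 25 heard as 23) to be compared across listener groups.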

Section snippets

Subjects

Ten native speakers of Mandarin Chinese (five men and five women), 10 native speakers of American English (five men and five women), and 12 native speakers of Cantonese (four men and eight women) participated in this experiment. None of the Mandarin speakers had any familiarity with Cantonese or any other tone language (by self-report), and none of the English speakers had any familiarity with, or exposure to, Cantonese or any other tone language (also by self-report). Mandarin participants were all

Training performance

Despite differences in the spacing of training sessions and the number of tokens presented in each training session, the Mandarin and English groups showed nearly identical patterns of performance across the training sessions, as shown in Fig. 4. A mixed factorial ANOVA with the between-groups factor of Group (English and Mandarin) and the within-subjects factor of training Session (1–6), and subsequent post hoc (Tukey HSD) analysis, showed no significant effect of Group, F(1,17)=0.14, p=0.71,

Overall performance

English and Mandarin listeners’ performance on identifying individual Cantonese tones did not differ significantly on the pretest. The observation that native speakers of a tone language and a non-tone language show qualitatively similar patterns of perception for non-native tone categories supports the conclusions of previous researchers that the mere presence or absence of lexical tone contrasts in the native language is not in itself sufficient to determine cross-language perception of

Conclusions

Both the specific structure of a listener's native tone (or intonational) category inventory and experience-dependent patterns of general perceptual weighting contribute to a better understanding of cross-language perception and acquisition of lexical tones. The assimilatory influence of well-defined native categories (e.g. the English question intonation (L*HH%) and the Mandarin 55 (Tone 1) and 25 (Tone 2) lexical tone categories) clearly explains situations in which non-native

Acknowledgments

This work was conducted while the first author was a postdoctoral fellow and the third author was a visiting research assistant professor in the Department of Speech and Hearing Sciences at the University of Hong Kong. We would like to thank See Lok Kan, Ka Man Wong, Choi Hung Law, Ho Yin Leung and Elaine Eramela for help with data collection. Some of these results were presented at the 147th Meeting of the Acoustical Society of America, New York, NY, May 24–28, 2004. We thank Ken de Jong and

References (66)

  • I. Mennen (2004). Bi-directional interference in the intonation of Dutch speakers of Greek. Journal of Phonetics.
  • A.G. Samuel (1997). Lexical activation produces potent phonemic percepts. Cognitive Psychology.
  • C.-G. Wei et al. (2004). Mandarin tone recognition in cochlear implant subjects. Hearing Research.
  • P.C.M. Wong (2002). Hemispheric specialization of linguistic pitch patterns. Brain Research Bulletin.
  • Y. Xu (1999). Effects of tone and focus on the formation and alignment of F0 contours. Journal of Phonetics.
  • P. Arabie et al. (1987). Three way scaling.
  • A. Arvaniti & G. Garding (in press). Dialectal variation in the rising accents of American English. In J. Hualde, &...
  • R.S. Bauer et al. (1997). Modern Cantonese phonology.
  • M.E. Beckman & J. Hirschberg (1994). The ToBI annotation conventions. Online MS. Available at...
  • M.E. Beckman et al. (2005). The original ToBI system and the evolution of the ToBI framework.
  • C.T. Best (1995). A direct realist view of cross-language speech perception.
  • P. Boersma & D. Weenink (2004). Praat: Doing phonetics by computer, version 4.2.04, downloaded May 10, 2004 from...
  • T. Bongaerts (1999). Ultimate attainment in L2 pronunciation: The case of very advanced late L2 learners. In Birdsong,...
  • Y. Chen et al. (2006). Production of weak elements in speech—Evidence from F0 patterns of neutral tone in Standard Chinese. Phonetica.
  • V. Ciocca et al. (2002). The perception of Cantonese lexical tones by prelingually deaf cochlear implantees. Journal of the Acoustical Society of America.
  • A. Cruttenden (1997). Intonation.
  • J.E. Flege (1995). Second language speech learning: Theory, findings, and problems.
  • Y.Y. Fok Chan (1974). A perceptual study of tones in Cantonese.
  • A.L. Francis et al. (2000). Effects of training on attention to acoustic cues. Perception & Psychophysics.
  • A.L. Francis et al. (2003). On the (non)categorical perception of lexical tones. Perception & Psychophysics.
  • A.L. Francis et al. (2002). Selective attention and the acquisition of new phonetic categories. Journal of Experimental Psychology: Human Perception and Performance.
  • Q.-J. Fu et al. (1998). Importance of tonal envelope cues in Chinese speech recognition. Journal of the Acoustical Society of America.
  • J.T. Gandour (1981). Perceptual dimensions of tone: Evidence from Cantonese. Journal of Chinese Linguistics.