Human frequency-following response: representation of pitch contours in Chinese tones

https://doi.org/10.1016/S0378-5955(03)00402-7

Abstract

Auditory nerve single-unit population studies have demonstrated that phase-locking plays a dominant role in the neural encoding of both the spectrum and the voice pitch of speech sounds. Phase-locked neural activity underlying the scalp-recorded human frequency-following response (FFR) has also been shown to encode certain spectral features of steady-state and time-variant speech sounds, as well as the pitch of several complex sounds that produce time-invariant pitch percepts. By extension, it was hypothesized that the human FFR may preserve pitch-relevant information for speech sounds that elicit time-variant as well as steady-state pitch percepts. FFRs were elicited in response to the four lexical tones of Mandarin Chinese and to a complex auditory stimulus that was spectrally different from, but equivalent in fundamental frequency (f0) contour to, one of the Chinese tones. Autocorrelation-based pitch extraction measures revealed that the FFR does indeed preserve pitch-relevant information for all stimuli: phase-locked interpeak intervals closely followed the fundamental period (1/f0). Stimuli that differed spectrally but shared the same f0 contour likewise showed robust interpeak intervals that tracked the fundamental period. These FFR findings support the viability of early, population-based ‘predominant interval’ representations of pitch in the auditory brainstem that are based on temporal patterns of phase-locked neural activity.

Introduction

Voice pitch is a fundamental auditory perceptual attribute that is important for the perception of speech and music. Evaluating the neural mechanisms underlying pitch perception offers a way to understand how the nervous system processes auditory information. Pitch perception and its physiological bases nevertheless remain controversial. Most periodic complex sounds (including speech) evoke a low pitch associated with their fundamental frequency, sometimes termed periodicity pitch (deBoer, 1976, Evans, 1978, Moore, 1989); this pitch is heard whether or not the sound contains energy at the fundamental frequency itself. In contrast, place or spectral pitch is associated with individual frequency components (Goldstein, 1973, Terhardt, 1973, Burns and Viemeister, 1976, Moore and Glasberg, 1986).
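The point that periodicity pitch does not require energy at the fundamental can be illustrated with a short Python sketch. It assumes an arbitrary 200 Hz fundamental, a 20 kHz sampling rate and a complex built only from harmonics 3 to 6; the helper `autocorr_period` is hypothetical and is not a method taken from this study. The waveform still repeats every 5 ms, so the largest autocorrelation peak away from lag zero falls at 1/f0 even though the spectrum contains nothing at 200 Hz.

```python
import numpy as np

FS = 20_000   # sampling rate in Hz (assumed for illustration)
F0 = 200.0    # fundamental in Hz (illustrative)
DUR = 0.1     # 100 ms, an integer number of f0 cycles

t = np.arange(int(FS * DUR)) / FS
# Harmonics 3-6 only: 600, 800, 1000 and 1200 Hz, no energy at F0 itself.
x = sum(np.sin(2 * np.pi * k * F0 * t) for k in range(3, 7))


def autocorr_period(signal, fs, min_lag_s=0.0025, max_lag_s=0.02):
    """Return the lag (s) of the largest autocorrelation peak in a lag range.

    Hypothetical helper: lag 0 is excluded by min_lag_s, so the peak found is
    the waveform's repetition period rather than its total energy.
    """
    signal = signal - signal.mean()
    ac = np.correlate(signal, signal, mode="full")[len(signal) - 1:]  # lags >= 0
    lo, hi = int(min_lag_s * fs), int(max_lag_s * fs)
    lag = lo + int(np.argmax(ac[lo:hi]))
    return lag / fs


print(1.0 / autocorr_period(x, FS))  # close to 200 Hz, the missing fundamental
```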

Several classes of neural information processing models have been proposed to account for the pitch of complex tones. Rate place neural models use spatial discharge rate patterns along tonotopically organized neural maps to represent the stimulus spectrum. Pitch is then extracted by spectrally based pattern recognition mechanisms that detect patterns of excitation produced by harmonically related components (Goldstein, 1973, Terhardt, 1973). Temporal place models utilize local discharge synchrony information between neighboring neurons (Young and Sachs, 1979), or interspike intervals within single neurons (Srulovicz and Goldstein, 1983) to form a frequency-based central spectrum representation. This frequency domain representation is then analyzed by the pattern recognition mechanisms (Whitfield, 1970, Miller and Sachs, 1984).
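The spectral pattern-recognition idea can be sketched as a toy harmonic template (a "harmonic sieve"): for each candidate f0 on a grid, score how closely the resolved components line up with integer multiples of it, then keep the highest candidate among those that score best, because subharmonics such as f0/2 fit the same components equally well. This is a simplified illustration with assumed parameters (`sigma`, the candidate grid) and a hypothetical helper name; it is not Goldstein's (1973) optimum-processor model or Terhardt's (1973) virtual-pitch algorithm.

```python
import numpy as np


def harmonic_sieve(components_hz, f0_grid, sigma=0.05):
    """Toy spectral template matcher (hypothetical helper).

    Each component contributes exp(-d^2 / (2 sigma^2)), where d is its distance,
    in harmonic-number units, from the nearest harmonic of the candidate f0.
    Subharmonics of the true f0 fit equally well, so among the best-scoring
    candidates the highest one is returned.
    """
    comps = np.asarray(components_hz, dtype=float)
    grid = np.asarray(f0_grid, dtype=float)
    scores = np.empty(len(grid))
    for i, f0 in enumerate(grid):
        ratio = comps / f0
        n = np.maximum(np.round(ratio), 1.0)  # nearest harmonic number (at least 1)
        scores[i] = np.exp(-((ratio - n) ** 2) / (2 * sigma ** 2)).sum()
    best = np.flatnonzero(np.isclose(scores, scores.max()))
    return float(grid[best].max())


grid = np.arange(50.0, 500.5, 0.5)  # candidate f0 values in Hz
# Harmonics 3-5 of 200 Hz, with no component at 200 Hz itself.
print(harmonic_sieve([600.0, 800.0, 1000.0], grid))  # 200.0
```

The tie-break toward the highest candidate is the design choice that keeps this template from reporting an octave-low pitch; real models handle that ambiguity in more principled ways.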

Purely temporal models use the population interval distribution derived by combining interspike intervals of single auditory neurons over a broad range of characteristic frequencies. It has long been appreciated that discharge periodicities and interspike intervals related to the fundamental are present in the responses of auditory nerve fibers (Young and Sachs, 1979, Rose, 1980, Delgutte, 1980, Voigt et al., 1982, Evans, 1983, Miller and Sachs, 1984, Greenberg, 1986, Palmer et al., 1986). The predominant interval hypothesis holds that the perceived pitch corresponds to the most frequent interspike interval present in the auditory nerve at any given time (Licklider, 1951, Moore, 1980, Meddis and Hewitt, 1991, Cariani and Delgutte, 1996). Using computer simulations of the auditory nerve, Meddis and Hewitt (1991) concretely demonstrated the plausibility of the hypothesis. In their electrophysiological study, Cariani and Delgutte (1996) recorded responses of cat auditory nerve fibers and combined interval distributions from many fibers to estimate the population interval distribution of the entire auditory nerve. Both studies found many close correspondences between features of these interval distributions and patterns of human pitch judgments for a variety of complex sounds. It thus appears that a central processor capable of analyzing these intervals could provide a unified explanation for many different aspects of pitch perception (Meddis and O’Mard, 1997, Cariani, 1998). Taken together, these findings indicate that neural phase-locking plays a dominant role in encoding the low pitch of complex sounds.
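The predominant-interval computation can be illustrated with a toy simulation in Python. Everything about the model fibers is an assumption made for illustration: each hypothetical fiber phase-locks to one low harmonic of a 200 Hz fundamental, fires on a random 80% of that harmonic's cycles with 0.1 ms Gaussian jitter, and the stimulus lasts five f0 cycles. All-order interspike intervals (every pairwise interval, not only successive ones) are pooled across fibers into a population interval histogram, and the most frequent interval is read off as the pitch period. It is a sketch of the idea, not a reimplementation of Meddis and Hewitt (1991) or Cariani and Delgutte (1996).

```python
import numpy as np


def simulate_fiber(f0, harmonic, n_cycles, p_fire, jitter_s, rng):
    """Spike times (s) of one hypothetical fiber phase-locked to a harmonic of f0.

    On each cycle of the harmonic the fiber fires at most once, with probability
    p_fire, at a fixed phase plus Gaussian jitter.
    """
    period = 1.0 / (harmonic * f0)
    n = harmonic * n_cycles  # cycles of this harmonic within the stimulus
    fires = rng.random(n) < p_fire
    return np.arange(n)[fires] * period + rng.normal(0.0, jitter_s, fires.sum())


def all_order_intervals(spikes, max_interval_s):
    """All pairwise forward intervals within one spike train, up to a limit."""
    spikes = np.sort(spikes)
    d = spikes[None, :] - spikes[:, None]  # d[i, j] = spikes[j] - spikes[i]
    return d[(d > 0) & (d <= max_interval_s)]


def predominant_interval(f0=200.0, n_cycles=5, fibers_per_harmonic=100,
                         harmonics=(1, 2, 3), p_fire=0.8, jitter_s=1e-4,
                         bin_s=5e-4, max_interval_s=20e-3, seed=0):
    """Pool all-order intervals across fibers and return the most common one."""
    rng = np.random.default_rng(seed)
    pooled = np.concatenate([
        all_order_intervals(
            simulate_fiber(f0, h, n_cycles, p_fire, jitter_s, rng), max_interval_s)
        for h in harmonics for _ in range(fibers_per_harmonic)
    ])
    n_bins = int(round(max_interval_s / bin_s))
    edges = (np.arange(n_bins + 1) + 0.5) * bin_s  # bins centred on multiples of bin_s
    counts, _ = np.histogram(pooled, bins=edges)
    centres = (np.arange(n_bins) + 1) * bin_s
    return centres[np.argmax(counts)], centres, counts


est, centres, counts = predominant_interval()
print(est)  # about 0.005 s, i.e. 1/f0 for f0 = 200 Hz
```

Fibers locked to the 2nd and 3rd harmonics also produce intervals at 1/f0, so that bin collects contributions from every fiber group and dominates the pooled histogram; the short stimulus keeps the 2/f0 bin noticeably smaller.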
Neural phase-locking in the auditory nerve and cochlear nucleus neurons has also been implicated in the temporal encoding of the spectra of steady-state and time-variant speech sounds (Young and Sachs, 1979, Sachs et al., 1983, Miller and Sachs, 1983, Miller and Sachs, 1984, Palmer et al., 1986, Blackburn and Sachs, 1990, Keilson et al., 1997, Rhode, 1998, Recio and Rhode, 2000).

The scalp-recorded human frequency-following response (FFR) reflects sustained phase-locked activity in a population of neural elements within the rostral brainstem (Worden and Marsh, 1968, Marsh et al., 1974, Smith et al., 1975, Glaser et al., 1976). Because the FFR encompasses responses of multiple neural subpopulations with different best frequencies and response latencies, more stimulus-related temporal information may be available in single units and local neuronal ensembles than in the population response as a whole. Stimulus-related temporal structure observed in the FFR therefore represents a lower bound on the neural timing information potentially available for neuronal information processing at the rostral brainstem level.

We recently demonstrated that the phase-locked activity underlying the FFR preserves spectral peaks corresponding to the first two formants of both steady-state speech-like sounds (Krishnan, 1999, Krishnan, 2002) and time-variant speech-like sounds (Krishnan and Parkinson, 2000, Plyler and Ananthanarayan, 2001). Greenberg et al. (1987) showed that the human FFR also preserves pitch-relevant information about complex sounds that produce time-invariant pitch, leading them to conclude that pitch-relevant neural activity is based on the temporal pattern of neural activity in the brainstem.

In light of these earlier findings, it is postulated that the phase-locked activity underlying FFR generation is also sufficiently dynamic to encode the time-varying pitch of speech sounds. The specific aims of this study are to determine whether the phase-locked activity underlying FFR generation (1) is sufficiently dynamic to represent the pitch of stimuli that produce a more complex range of pitch percepts, including changes in the trajectory and direction of pitch change; (2) is more robust for rising than for falling pitch trajectories; (3) preserves certain spectral features of the complex stimuli; and (4) supports the predominant interval hypothesis by showing phase-locked interval bands for stimuli that are equivalent in pitch but differ in their spectra.

To address aims 1–3, FFRs were elicited by the four lexical tones of Mandarin Chinese (for example, with the syllable ma: Tone 1 (high level) ma ‘mother’; Tone 2 (high rising) ma ‘hemp’; Tone 3 (low falling-rising) ma ‘horse’; and Tone 4 (high falling) ma ‘scold’ (Howie, 1976)). This tonal space provides an optimal window for investigating FFRs in response to time-varying f0 contours associated with monosyllabic speech sounds. To address aim 4, FFRs in response to Chinese Tone 3, which exhibits a bidirectional f0 contour, are compared to FFRs elicited in response to a complex auditory stimulus that exhibits the same f0 contour but whose spectral composition is different.
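Aim 4 depends on stimuli that share an f0 contour but differ in spectrum. One way to synthesize such a pair, sketched here in Python under stated assumptions, is to integrate a single f0 contour into an instantaneous phase and build each stimulus from a different set of harmonics of that phase. The falling-rising contour below (110 to 90 to 130 Hz over 250 ms), the harmonic sets and the helper name `harmonic_complex` are hypothetical choices for illustration; they are not the stimulus parameters used in this study.

```python
import numpy as np


def harmonic_complex(f0_contour_hz, fs, harmonics, amplitudes):
    """Sum of harmonics that all follow one time-varying f0 contour (hypothetical).

    Integrating f0 into an instantaneous phase keeps every harmonic locked to
    the same contour, so the periodicity (1/f0) is shared by any choice of
    harmonics, while the spectral envelope is set by harmonics/amplitudes.
    """
    phase = 2 * np.pi * np.cumsum(f0_contour_hz) / fs
    x = np.zeros_like(phase)
    for k, a in zip(harmonics, amplitudes):
        x += a * np.sin(k * phase)
    return x / np.max(np.abs(x))  # peak-normalize


FS = 20_000
t = np.arange(int(0.25 * FS)) / FS
# Hypothetical falling-rising contour loosely resembling Tone 3 (not study values).
f0_contour = np.interp(t, [0.0, 0.10, 0.25], [110.0, 90.0, 130.0])

# Stimulus A: harmonics 1-10 with 1/k amplitudes (energy at f0).
x_a = harmonic_complex(f0_contour, FS, range(1, 11), [1.0 / k for k in range(1, 11)])
# Stimulus B: harmonics 5-10 at equal amplitude (no energy at f0, different spectrum).
x_b = harmonic_complex(f0_contour, FS, range(5, 11), [1.0] * 6)
```

Because both waveforms are built on the same phase track, any interval-based pitch measure should see the same period contour in each, while their spectra differ markedly.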

Section snippets

Subjects

Thirteen adult native speakers of Mandarin, ranging in age from 21 to 27 years, participated in the study. Hearing sensitivity in all subjects was better than 15 dB HL for octave frequencies from 500 to 8000 Hz.

Stimuli

FFRs were elicited using a set of Mandarin monosyllables chosen to contrast the four lexical tones (transcribed in pinyin): yi1 ‘clothing’, yi2 ‘aunt’, yi3 ‘chair’, yi4 ‘easy’. This particular stimulus set allows us to address issues related to encoding of

Representation of voice pitch

Pitch contours extracted from the FFR (solid lines) to each of the four speech stimuli are superimposed on their corresponding stimulus f0 contours (broken lines) in Fig. 2. It is clear from this figure that the phase-locked FFR activity carrying pitch-relevant information faithfully follows the pitch changes presented in each stimulus.
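Agreement between an FFR-derived pitch contour and the stimulus f0 contour can also be summarized numerically. A minimal sketch, assuming both contours have already been sampled at the same time points (with NaN for frames where no pitch was extracted): Pearson correlation captures how well the shapes track each other, and root-mean-square error captures the absolute deviation in Hz. These summary metrics and the helper name `contour_agreement` are assumptions for illustration; the text above draws the comparison from Fig. 2.

```python
import numpy as np


def contour_agreement(stimulus_f0_hz, response_f0_hz):
    """Pearson r and RMS error (Hz) between two f0 contours on a shared time base.

    Hypothetical helper. Frames where either contour is NaN (no pitch
    extracted) are ignored.
    """
    stim = np.asarray(stimulus_f0_hz, dtype=float)
    resp = np.asarray(response_f0_hz, dtype=float)
    valid = np.isfinite(stim) & np.isfinite(resp)
    r = np.corrcoef(stim[valid], resp[valid])[0, 1]
    rms = np.sqrt(np.mean((resp[valid] - stim[valid]) ** 2))
    return float(r), float(rms)


# Illustrative numbers only: a rising contour and a response offset by 2 Hz,
# with one frame where no pitch was extracted.
stim = np.linspace(100.0, 150.0, 11)
resp = stim + 2.0
resp[3] = np.nan
r, rms = contour_agreement(stim, resp)
```

Reporting both values is deliberate: a constant offset leaves r at 1 but shows up in the RMS error, while a contour that has the right mean but the wrong shape does the opposite.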

The short-term autocorrelation functions and the running autocorrelograms for all five stimuli and their corresponding grand average FFRs are plotted in Fig. 3.
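A running autocorrelogram of the kind plotted in Fig. 3 can be sketched as follows: slide a short window along the signal, compute the normalized autocorrelation in each window, and take the lag of the largest peak within a plausible pitch-period range as the local period. The 40 ms window, 10 ms hop, 80 to 400 Hz search range, the helper name `running_autocorrelation_f0` and the synthetic rising-f0 test signal are all assumptions for illustration, not the analysis parameters of this study.

```python
import numpy as np


def running_autocorrelation_f0(x, fs, win_s=0.04, hop_s=0.01, f0_min=80.0, f0_max=400.0):
    """Short-term autocorrelation pitch track (hypothetical helper).

    Returns window-centre times (s), the f0 estimate (Hz) for each window and
    the running autocorrelogram (one normalized autocorrelation per row).
    """
    win, hop = int(round(win_s * fs)), int(round(hop_s * fs))
    lag_lo = int(np.ceil(fs / f0_max))   # shortest period considered
    lag_hi = int(np.floor(fs / f0_min))  # longest period considered
    times, f0s, gram = [], [], []
    for start in range(0, len(x) - win + 1, hop):
        seg = x[start:start + win] - np.mean(x[start:start + win])
        ac = np.correlate(seg, seg, mode="full")[win - 1:]  # lags 0 .. win-1
        if ac[0] > 0:
            ac = ac / ac[0]  # lag 0 normalized to 1
        gram.append(ac)
        lag = lag_lo + int(np.argmax(ac[lag_lo:lag_hi + 1]))
        times.append((start + win / 2) / fs)
        f0s.append(fs / lag)
    return np.array(times), np.array(f0s), np.array(gram)


# Synthetic test signal: f0 rising linearly from 100 to 150 Hz over 250 ms
# (loosely Tone 2-like; illustrative values, not the study's stimuli).
fs = 10_000
t = np.arange(int(0.25 * fs)) / fs
f0_true = 100.0 + 200.0 * t
phase = 2 * np.pi * np.cumsum(f0_true) / fs
x = sum(np.sin(k * phase) for k in range(1, 6))

times, f0_est, autocorrelogram = running_autocorrelation_f0(x, fs)
```

Restricting the peak search to lags between 1/f0_max and 1/f0_min keeps the tracker from locking onto the 2/f0 peak; the price is that true pitches outside that range cannot be reported.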

Representation of voice pitch

The results of this study clearly demonstrated that for several stimuli with time-varying f0 contours, the prominent interval band in the phase-locked FFR neural activity followed closely the fundamental period (1/f0). These findings suggest that a robust neural temporal representation for pitch is preserved in the phase-locked neural activity of an ensemble of neural elements in the rostral brainstem. These findings are consistent with Greenberg et al. (1987), who reported that the FFR encoded

Implications

Our knowledge about processing of speech sounds in the mammalian nervous system is largely derived from animal single-unit population studies at the level of the auditory nerve and cochlear nucleus. These studies have demonstrated that the temporal place code is indeed preserved at these auditory loci. However, it is not known if the temporal place scheme is preserved at more rostral levels in the brainstem where neural phase-locking is limited to frequencies below about 2000 Hz. For any scheme

Acknowledgments

This research was supported in part by a research grant from the National Institutes of Health (R01 DC04584-04; J.T.G.). We are grateful to the anonymous reviewers whose insightful suggestions have appreciably improved the manuscript.

References (72)

  • S.E. Shore et al. (1984). Cochlear microphonic responses of the peripheral auditory system to frequency-varying signals. Am. J. Otolaryngol.
  • J.C. Smith et al. (1975). Far-field recorded frequency following responses: evidence for the locus of brainstem sources. Electroencephalogr. Clin. Neurophysiol.
  • D. Van Lancker et al. (1973). Hemispheric specialization for pitch and tone: evidence from Thai. J. Phon.
  • H.F. Voigt et al. (1982). Representation of whispered vowels in the discharge patterns of auditory-nerve fibers. Hear. Res.
  • Y. Wang et al. (2001). Dichotic perception of Mandarin tones by Chinese and American listeners. Brain Lang.
  • Y. Xu (1997). Contextual tonal variations in Mandarin. J. Phon.
  • A.K. Ananthanarayan et al. (1992). The frequency-following response and the onset response: Evaluation of frequency specificity using a forward-masking paradigm. Ear Hear.
  • C.C. Blackburn et al. (1990). The representation of steady-state vowel sound /e/ in the discharge patterns of cat anteroventral cochlear nucleus neurons. J. Neurophysiol.
  • P. Boersma (1993). Accurate short-term analysis of the fundamental frequency and the harmonics-to-noise ratio of a sampled sound. Proc. Inst. Phon. Sci.
  • E.M. Burns et al. (1976). Nonspectral pitch. J. Acoust. Soc. Am.
  • P. Cariani (1998). Neural computations in the time domain. Poster, ARO Midwinter...
  • P.A. Cariani et al. (1996). Neural correlates of the pitch of complex tones. I. Pitch and pitch salience. J. Neurophysiol.
  • M.J. Collins et al. (1978). Temporal integration of tone glides. J. Acoust. Soc. Am.
  • J.K. Cullen et al. (1979). Rate effects in the detection of short duration tonal glides. J. Acoust. Soc. Am.
  • E. deBoer (1976). On the residue and auditory pitch perception. In: Keidel, W.D., Neff, W.D. (Eds.), Handbook of...
  • B. Delgutte (1980). Representation of speech-like sounds in the discharge patterns of auditory-nerve fibers. J. Acoust. Soc. Am.
  • B. Delgutte et al. (1984). Speech coding in the auditory nerve: II. Processing schemes for vowel-like sounds. J. Acoust. Soc. Am.
  • E.F. Evans (1978). Place and time coding of frequency in the peripheral auditory system: some physiological pros and cons. Audiology.
  • E.F. Evans (1983). Pitch and cochlear nerve fibre temporal discharge patterns. In: Klinke, R., Hartmann, R. (Eds.),...
  • J.L. Flanagan et al. (1960). On the pitch of periodic pulses. J. Acoust. Soc. Am.
  • J.L. Flanagan et al. (1960). Pitch of periodic pulses without fundamental component. J. Acoust. Soc. Am.
  • J. Gandour et al. (1978). Crosslanguage differences in tone perception: A multidimensional scaling investigation. Lang. Speech.
  • J. Gardi et al. (1979). Scalp recorded frequency following responses in neonates. Audiology.
  • O. Ghitza (1988). Temporal non-place information in the auditory-nerve firing patterns as a front-end for speech recognition in a noisy environment. J. Phon.
  • J.L. Goldstein (1973). An optimum processor theory for the central formation of the pitch of complex tones. J. Acoust. Soc. Am.
  • S. Greenberg (1980). Neural Temporal Coding of Pitch and Vowel Quality. UCLA Working Papers in Phonetics, Volume 52...