Human frequency-following response: representation of pitch contours in Chinese tones
Introduction
Voice pitch is a fundamental auditory perceptual attribute that is important for the perception of speech and music. The evaluation of neural mechanisms underlying pitch perception provides an avenue to understand the neural basis of processing auditory information. Pitch perception and its physiological bases remain topics of controversy up to the present. Most periodic complex sounds (including speech) evoke low pitches associated with their fundamental frequency, sometimes termed periodicity pitch (deBoer, 1976, Evans, 1978, Moore, 1989). Energy may or may not be present at the fundamental frequency. In contrast, place or spectral pitch is associated with individual frequency components (Goldstein, 1973, Terhardt, 1973, Burns and Viemeister, 1976, Moore and Glasberg, 1986).
Several classes of neural information processing models have been proposed to account for the pitch of complex tones. Rate place neural models use spatial discharge rate patterns along tonotopically organized neural maps to represent the stimulus spectrum. Pitch is then extracted by spectrally based pattern recognition mechanisms that detect patterns of excitation produced by harmonically related components (Goldstein, 1973, Terhardt, 1973). Temporal place models utilize local discharge synchrony information between neighboring neurons (Young and Sachs, 1979), or interspike intervals within single neurons (Srulovicz and Goldstein, 1983) to form a frequency-based central spectrum representation. This frequency domain representation is then analyzed by the pattern recognition mechanisms (Whitfield, 1970, Miller and Sachs, 1984).
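The pattern recognition step shared by the rate place and temporal place models can be illustrated with a toy harmonic-matching procedure. The sketch below is not the optimum processor of Goldstein (1973) or the model of Terhardt (1973); the function name `harmonic_match_pitch`, the candidate grid, and the mistuning score are assumptions introduced here for illustration only. Given a set of resolved component frequencies, it selects the candidate fundamental whose harmonic series fits them with the least total mistuning, and it breaks ties in favor of the highest candidate so that subharmonics are not chosen.

```python
def harmonic_match_pitch(components_hz, f0_candidates_hz):
    """Pick the candidate f0 whose harmonics best fit the components (toy model)."""
    def mistuning(f0):
        total = 0.0
        for f in components_hz:
            n = max(1, round(f / f0))      # nearest harmonic number of this f0
            total += abs(f - n * f0) / f   # relative distance to that harmonic
        return total

    # Smallest mistuning wins; among equal scores the highest f0 is preferred,
    # because every subharmonic of the true f0 fits the components equally well.
    return min(f0_candidates_hz, key=lambda f0: (mistuning(f0), -f0))


# Hypothetical stimulus: harmonics 3-5 of 200 Hz, with no energy at 200 Hz.
components = [600.0, 800.0, 1000.0]
candidates = [float(f) for f in range(50, 501)]  # 1 Hz grid, 50-500 Hz
pitch = harmonic_match_pitch(components, candidates)
```

In this example the selected fundamental is 200 Hz even though the stimulus contains no component at that frequency, which is the behavior such pattern recognition models are meant to capture.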
Purely temporal models use the population interval distribution derived by combining interspike intervals of single auditory neurons over a broad range of characteristic frequencies. It has long been appreciated that discharge periodicities and interspike intervals related to the fundamental are present in the responses of auditory nerve fibers (Young and Sachs, 1979, Rose, 1980, Delgutte, 1980, Voigt et al., 1982, Evans, 1983, Miller and Sachs, 1984, Greenberg, 1986, Palmer et al., 1986). The predominant interval hypothesis holds that the perceived pitch corresponds to the most frequent interspike interval present in the auditory nerve at any given time (Licklider, 1951, Moore, 1980, Meddis and Hewitt, 1991, Cariani and Delgutte, 1996). Using computer simulations of the auditory nerve, Meddis and Hewitt (1991) concretely demonstrated the plausibility of the hypothesis. In their electrophysiological study, Cariani and Delgutte (1996) recorded responses of cat auditory nerve fibers and combined interval distributions from many fibers to form an estimate of the population interval distribution in the entire auditory nerve. Both studies found close correspondences between features of these interval distributions and patterns of human pitch judgments for a variety of complex sounds. It thus appears that a central processor capable of analyzing these intervals can provide a unified explanation for many different aspects of pitch perception (Meddis and O’Mard, 1997, Cariani, 1998). On this view, neural phase-locking plays a dominant role in the neural encoding of the low pitch associated with complex sounds.
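The logic of the predominant interval hypothesis can be illustrated with a minimal sketch. It is a toy construction, not the simulation of Meddis and Hewitt (1991) or the analysis of Cariani and Delgutte (1996): the three idealized "fibers" fire once per cycle of harmonics 3, 4, and 5 of a 100 Hz fundamental, and the function names, bin width, and interval limit are assumptions made here. Pooling the all-order interspike intervals across fibers yields a histogram whose largest peak falls at the fundamental period (10 ms), even though no fiber responds to energy at 100 Hz.

```python
from collections import Counter


def pooled_interval_histogram(spike_trains, bin_width, max_interval):
    """Pool all-order interspike intervals from every fiber into one histogram."""
    counts = Counter()
    for spikes in spike_trains:
        spikes = sorted(spikes)
        for i, t0 in enumerate(spikes):
            for t1 in spikes[i + 1:]:
                interval = t1 - t0
                if interval > max_interval:
                    break  # later spikes are even farther away
                counts[round(interval / bin_width)] += 1
    return counts


def phase_locked_train(freq_hz, duration_s):
    """Idealized fiber: exactly one spike per cycle of freq_hz (no jitter)."""
    return [k / freq_hz for k in range(int(duration_s * freq_hz))]


# Hypothetical fibers locked to harmonics 3, 4 and 5 of a 100 Hz fundamental.
trains = [phase_locked_train(f, 0.5) for f in (300.0, 400.0, 500.0)]

bin_width = 0.0001  # 0.1 ms bins (assumed)
hist = pooled_interval_histogram(trains, bin_width, max_interval=0.016)

predominant_bin = max(hist, key=hist.get)       # most frequent pooled interval
estimated_pitch_hz = 1.0 / (predominant_bin * bin_width)
```

Each fiber contributes intervals at multiples of its own period, but only the 10 ms interval is common to all three, so it dominates the pooled distribution. Real spike trains are stochastic and the interval limit matters, because intervals at multiples of the fundamental period also accumulate.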
Neural phase-locking in the auditory nerve and cochlear nucleus neurons has also been implicated in the temporal encoding of the spectra of steady-state and time-variant speech sounds (Young and Sachs, 1979, Sachs et al., 1983, Miller and Sachs, 1983, Miller and Sachs, 1984, Palmer et al., 1986, Blackburn and Sachs, 1990, Keilson et al., 1997, Rhode, 1998, Recio and Rhode, 2000).
The scalp-recorded human frequency-following response (FFR) reflects sustained phase-locked activity in a population of neural elements within the rostral brainstem (Worden and Marsh, 1968, Marsh et al., 1974, Smith et al., 1975, Glaser et al., 1976). Because the FFR encompasses responses of multiple neural subpopulations with different best frequencies and response latencies, more stimulus-related temporal information may be available in single units and local neuronal ensembles than in the population response as a whole. Stimulus-related temporal structure observed in the FFR therefore forms the lower limit of the neural timing information potentially available for neuronal information processing at the rostral brainstem level.
We recently demonstrated that the phase-locked activity underlying the FFR does indeed preserve spectral peaks corresponding to the first two formants of both steady-state speech-like sounds (Krishnan, 1999, Krishnan, 2002) and time-variant speech-like sounds (Krishnan and Parkinson, 2000, Plyler and Ananthanarayan, 2001). Greenberg et al. (1987) showed that the human FFR also preserves pitch-relevant information about complex sounds that produce time-invariant pitch, and concluded that pitch-relevant neural activity is based on the temporal pattern of neural activity in the brainstem.
In light of these earlier findings, it is postulated that the phase-locked activity underlying the FFR generation is also sufficiently dynamic to encode time-varying pitch of speech sounds. The specific aims of this study are to determine whether the phase-locked activity underlying FFR generation (1) is sufficiently dynamic to represent the pitch of stimuli that produce a more complex range of pitch percepts, including changes in trajectory and direction of pitch change; (2) is more robust for rising versus falling pitch trajectories; (3) preserves certain spectral features of the complex stimuli; and (4) supports the predominant interval hypothesis by showing phase-locked interval bands for stimuli that are equivalent in pitch but differ in their spectra.
To address aims 1–3, FFRs were elicited from the four (Mandarin) Chinese tones [similar to, e.g., Tone 1, ma (high level) ‘mother’; Tone 2, ma (high rising) ‘hemp’; Tone 3, ma (low falling-rising) ‘horse’; and Tone 4, ma (high falling) ‘scold’ (Howie, 1976)]. This tonal space provides an optimal window for investigating FFRs in response to time-varying f0 contours associated with monosyllabic speech sounds. To address aim 4, FFRs in response to Chinese Tone 3, which exhibits a bidirectional f0 contour, are compared to FFRs elicited in response to a complex auditory stimulus that exhibits the same f0 contour but whose spectral composition is different.
Subjects
Thirteen adult native speakers of Mandarin, ranging in age from 21 to 27 years, participated in the study. Hearing sensitivity in all subjects was better than 15 dB HL for octave frequencies from 500 to 8000 Hz.
Stimuli
FFRs were elicited using a set of monosyllabic Chinese syllables that were chosen to contrast the four lexical tones (pinyin Roman phonemic transcription): yi1 ‘clothing’, yi2 ‘aunt’, yi3 ‘chair’, yi4 ‘easy’. This particular stimulus set allows us to address issues related to encoding of
Representation of voice pitch
Pitch contours extracted from the FFR (solid lines) to each of the four speech stimuli are superimposed on their corresponding stimulus f0 contours (broken lines) in Fig. 2. It is clear from this figure that the phase-locked FFR activity carrying pitch-relevant information faithfully follows the pitch changes presented in each stimulus.
The short-term autocorrelation functions and the running autocorrelograms for all five stimuli and their corresponding grand average FFRs are plotted in Fig. 3.
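A short-term autocorrelation pitch track of the kind referred to above can be sketched as follows. This is a minimal illustration under stated assumptions, not the analysis procedure used in the study: the synthetic rising tone, the 40 ms window, the 10 ms hop, the 80 to 200 Hz search range, and all function names are chosen here for the example. Each frame is correlated with a lagged copy of itself, and the lag with the largest normalized autocorrelation is taken as the pitch period for that frame.

```python
import math


def synth_rising_tone(fs, duration, f_start, f_end, harmonics=(1, 2, 3)):
    """Harmonic complex whose f0 glides linearly from f_start to f_end (Hz)."""
    n = round(fs * duration)
    samples, phase = [], 0.0
    for i in range(n):
        f0 = f_start + (f_end - f_start) * i / (n - 1)
        samples.append(sum(math.cos(h * phase) for h in harmonics))
        phase += 2.0 * math.pi * f0 / fs  # accumulate instantaneous phase
    return samples


def frame_pitch(frame, fs, f_min, f_max):
    """Pitch (Hz) from the lag with the largest normalized autocorrelation."""
    best_lag, best_r = None, -2.0
    for lag in range(int(fs / f_max), int(fs / f_min) + 1):
        a, b = frame[:-lag], frame[lag:]
        num = sum(x * y for x, y in zip(a, b))
        den = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
        r = num / den if den > 0 else 0.0
        if r > best_r:
            best_lag, best_r = lag, r
    return fs / best_lag


def pitch_track(signal, fs, win=0.040, hop=0.010, f_min=80.0, f_max=200.0):
    """Return a list of (frame-center time in s, estimated f0 in Hz)."""
    n_win, n_hop = round(win * fs), round(hop * fs)
    track = []
    for start in range(0, len(signal) - n_win + 1, n_hop):
        f0 = frame_pitch(signal[start:start + n_win], fs, f_min, f_max)
        track.append(((start + n_win / 2) / fs, f0))
    return track


fs = 8000  # Hz, assumed sampling rate for the toy signal
tone = synth_rising_tone(fs, duration=0.25, f_start=100.0, f_end=140.0)
track = pitch_track(tone, fs)
```

Plotting the second element of each pair in `track` against the first gives a contour comparable in form to the solid lines described for Fig. 2. The search range is kept under one octave above the lowest expected f0 so that the lag at twice the period cannot be selected.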
Representation of voice pitch
The results of this study clearly demonstrated that for several stimuli with time-varying f0 contours, the prominent interval band in the phase-locked FFR neural activity followed closely the fundamental period (1/f0). These findings suggest that a robust neural temporal representation for pitch is preserved in the phase-locked neural activity of an ensemble of neural elements in the rostral brainstem. These findings are consistent with Greenberg et al. (1987), who reported that the FFR encoded
Implications
Our knowledge about processing of speech sounds in the mammalian nervous system is largely derived from animal single-unit population studies at the level of the auditory nerve and cochlear nucleus. These studies have demonstrated that the temporal place code is indeed preserved at these auditory loci. However, it is not known if the temporal place scheme is preserved at more rostral levels in the brainstem where neural phase-locking is limited to frequencies below about 2000 Hz. For any scheme
Acknowledgments
This research was supported in part by a research grant from the National Institutes of Health (R01 DC04584-04; J.T.G.). We are grateful to the anonymous reviewers whose insightful suggestions have appreciably improved the manuscript.
References (72)
et al., The 500 Hz frequency-following potential in kangaroo rat: An evaluation with noise masking, Electroencephalogr. Clin. Neurophysiol. (1980)
Tone perception in Far Eastern languages, J. Phon. (1983)
et al., The human frequency following response: its behavior during continuous stimulation, Electroencephalogr. Clin. Neurophysiol. (1976)
et al., Neural temporal coding of low pitch. I. Human frequency following responses to complex tone, Hear. Res. (1987)
Human frequency-following responses: representation of steady-state synthetic vowels, Hear. Res. (2002)
et al., Representation of voice pitch in discharge patterns of auditory-nerve fibers, Hear. Res. (1984)
et al., Scalp recorded early responses in man to frequencies in the speech range, Electroencephalogr. Clin. Neurophysiol. (1973)
et al., Representation of vowel stimuli in the ventral cochlear nucleus of the chinchilla, Hear. Res. (2000)
Temporal coding of 200% amplitude modulated signal in the ventral cochlear nucleus of cat, Hear. Res. (1994)
Neural encoding of single-formant stimuli in the ventral cochlear nucleus of the chinchilla, Hear. Res. (1998)
Cochlear microphonic responses of the peripheral auditory system to frequency-varying signals, Am. J. Otolaryngol.
Far-field recorded frequency following responses: evidence for the locus of brainstem sources, Electroencephalogr. Clin. Neurophysiol.
Hemispheric specialization for pitch and tone: evidence from Thai, J. Phon.
Representation of whispered vowels in the discharge patterns of auditory-nerve fibers, Hear. Res.
Dichotic perception of Mandarin tones by Chinese and American listeners, Brain Language
Contextual tonal variations in Mandarin, J. Phon.
The frequency-following response and the onset response: Evaluation of frequency specificity using a forward-masking paradigm, Ear Hear.
The representation of steady-state vowel sound /e/ in the discharge patterns of cat anteroventral cochlear nucleus neurons, J. Neurophysiol.
Accurate short-term analysis of the fundamental frequency and the harmonics-to-noise ratio of a sampled sound, Proc. Inst. Phon. Sci.
Nonspectral pitch, J. Acoust. Soc. Am.
Neural correlates of the pitch of complex tones. I. Pitch and pitch salience, J. Neurophysiol.
Temporal integration of tone glides, J. Acoust. Soc. Am.
Rate effects in the detection of short duration tonal glides, J. Acoust. Soc. Am.
Representation of speech-like sounds in the discharge patterns of auditory-nerve fibers, J. Acoust. Soc. Am.
Speech coding in the auditory nerve: II. Processing schemes for vowel-like sounds, J. Acoust. Soc. Am.
Place and time coding of frequency in the peripheral auditory system: some physiological pros and cons, Audiology
On the pitch of periodic pulses, J. Acoust. Soc. Am.
Pitch of periodic pulses without fundamental component, J. Acoust. Soc. Am.
Crosslanguage differences in tone perception: A multidimensional scaling investigation, Language Speech
Scalp recorded frequency following responses in neonates, Audiology
Temporal non-place information in the auditory-nerve firing patterns as a front-end for speech recognition in a noisy environment, J. Phon.
An optimum processor theory for the central formation of the pitch of complex tones, J. Acoust. Soc. Am.