Development of Japanese infant speech database from longitudinal recordings☆
Introduction
Infant speech development can be studied using several approaches. One involves analyzing infant utterances acoustically to reveal the developmental changes that occur with age. For this acoustic analysis, the infant utterances can be collected by using either a cross-sectional or a longitudinal approach.
With the cross-sectional approach (e.g., Eguchi and Hirsh, 1969; Keating and Buhr, 1978; Kent and Murray, 1982; Robb and Saxman, 1985), utterances are collected from many infants of different ages. The advantage of this approach is that utterances can be obtained quickly. Its disadvantage is that individual differences may introduce artifacts, because the utterances of infants at a particular age do not necessarily have the same acoustic characteristics: they may be affected by the developmental state of each infant’s speech organs and by each infant’s speaking ability. Results obtained with a cross-sectional approach might therefore not accurately reflect the course of speech development (Bennett, 1983; Kent, 1976).
With the longitudinal approach, in contrast, utterances are collected from the same infant at different ages (e.g., Bennett, 1983; Fairbanks, 1942; Hollien et al., 1994; McRoberts and Best, 1997; Robb et al., 1989; Shepard and Lane, 1968). This approach usually involves only a few infants, and sometimes only one (e.g., McRoberts and Best, 1997; Reissland, 1998). It is more robust against artifacts caused by individual differences than the cross-sectional approach, because it traces developmental change within each infant. However, collecting utterances longitudinally takes a very long time, sometimes several years, so the approach is far more costly in terms of time. Because of this difficulty, fewer studies have employed the longitudinal approach than the cross-sectional approach. Nevertheless, both approaches are important for research on speech development.
One way to offset the time cost of the longitudinal approach is to develop an infant speech database and share it among researchers. The Child Language Data Exchange System (CHILDES) (MacWhinney, 2000) is a project based on this idea. CHILDES is a large-scale set of databases of infant speech in many languages. It contains utterance files, their transcriptions, and other useful information about infant speech. Some of the databases in CHILDES also contain video files of infant utterances. CHILDES continues to grow as researchers working on different languages contribute their databases.
There are four Japanese databases in CHILDES, namely the Noji, Ishii, Hamasaki, and Miyata databases.
The Noji database consists of data collected frequently from one infant between the ages of 0 and 7 years. However, this database only contains utterance transcriptions and does not provide utterance files, because it is based on Noji’s diary.
The other three databases provide utterance files and/or video files, although their collection periods are much shorter than that of Noji’s database. The Ishii database is based on one infant. Data were largely collected bimonthly. The collection periods were from 8 to 23 months and from 41 to 44 months. The Hamasaki database is also based on one infant. Data were collected two or three times per month between the ages of 26 and 43 months. The Miyata database is based on three infants. The data were collected from about 15 months to about 36 months. The Miyata database also provides video files.
The four databases described above are useful for analyzing the development of, for example, an infant’s vocabulary and syntactic rules. However, they might not be suitable for a longitudinal acoustic analysis of infant utterances, because the collection periods are short, the number of utterances is not very large, or utterance files are not provided. These three factors make it difficult to observe longitudinal acoustic changes in infant utterances from birth to childhood.
With this as the background, we first digitally recorded the utterances of Japanese infants and parents over about 5 years. Then, we developed our infant speech database from these longitudinal recordings by extracting utterances and providing them with various pieces of information.
In developing our infant speech database, we placed particular emphasis on information obtained from acoustic analysis. One example is the start and end times of each utterance. With these times, it is possible to analyze developmental changes in utterance duration, pause duration, utterance overlap, speaking rate, and utterance rhythm. Another example is the fundamental frequency, with which it is possible to analyze developmental changes in pitch accent and intonation pattern. A database containing this information about utterances would lead to a better understanding of the developmental acoustic changes in the time and frequency domains that reflect the development of articulation skill and proficiency in language processing.
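As an illustration of how start and end times support such analyses, the following minimal Python sketch computes utterance duration, pause duration, and utterance overlap. The `Utterance` record, the speaker labels, and the sample times are hypothetical and are not taken from the database; the actual file format of the time records is not assumed here.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Utterance:
    speaker: str  # hypothetical label such as "infant" or "mother"
    start: float  # utterance start time in seconds
    end: float    # utterance end time in seconds


def duration(u: Utterance) -> float:
    """Utterance duration in seconds."""
    return u.end - u.start


def pauses(utterances: List[Utterance], speaker: str) -> List[float]:
    """Silent gaps between consecutive utterances of one speaker."""
    own = sorted((u for u in utterances if u.speaker == speaker),
                 key=lambda u: u.start)
    return [nxt.start - cur.end
            for cur, nxt in zip(own, own[1:])
            if nxt.start > cur.end]


def overlap(a: Utterance, b: Utterance) -> float:
    """Time in seconds during which two utterances are produced simultaneously."""
    return max(0.0, min(a.end, b.end) - max(a.start, b.start))


# Invented sample session: the mother starts speaking before the infant finishes.
session = [
    Utterance("infant", 0.0, 1.5),
    Utterance("mother", 1.25, 2.0),
    Utterance("infant", 2.5, 3.0),
]

durations = [duration(u) for u in session]
infant_pauses = pauses(session, "infant")
infant_mother_overlap = overlap(session[0], session[1])
```

Measures such as speaking rate would additionally require a count of units (for example, morae) per utterance, which this sketch does not model.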
Participants
Five infants [A(kk), B(sk), C(sa), D(ma), and E(mk)] and their parents participated voluntarily in the recording. They were all Japanese. All the infants were born and raised in Tokyo or in Kanagawa prefecture, which adjoins Tokyo. Infant gender, birth month, height, and weight are shown in Table 1. They had no symptoms of disorder with respect to speech perception or speech production. Infants B and E are siblings. Infants C and D are also siblings. Infant A has a brother who is 10 years older
Database development
An infant speech database was developed from the longitudinal recordings. The database consists of a session file, an utterance file, a transcription file, a property tag file, a time record file, a comment file, a fundamental frequency file, a voiced/unvoiced label file, and a phoneme label file. These files were stored in directories, which were classified by month and infant. HTML files were developed as links to these files. Fig. 1 is a schematic diagram of the database structure. The
Research applications
Our infant speech database has contributed to several pieces of developmental research. For example, Ishizuka et al. (2007) revealed developmental changes in the spectral peaks of vowels in an infant’s utterances. They found that the spectral peaks gradually diverge to form a set of Japanese vowels by 24 months of age. Amano et al. (2006) analyzed the developmental change of F0 in infants’ and parents’ utterances. They found that the infants’ F0 decreases almost constantly along with age in
Database release
Our infant speech database with its search software and a custom-made audio player is released by The Speech Resources Consortium (http://research.nii.ac.jp/src/eng/index.html) at a price of 85,500 yen. A waveform editor tuned to the database will be available from Arcadia Corporation (http://www.arcadia.co.jp/). This waveform editor is not included in the database price. The waveform editor automatically overlays the F0 data in the database onto a spectrogram. It also automatically overlays the
Conclusion
An infant speech database was developed from 5 years of recordings of utterances of five Japanese infants and their parents. This database contains a large number of utterances and their transcriptions, F0 values, and phoneme labels. This database makes it possible to trace the speech development of a particular infant from birth until 5 years of age. It also makes it possible to trace the parents’ utterances addressed to the infant during the same period. Therefore, this database
References (22)
The pitch of ‘real’ and ‘rhetorical’ questions directed by a father to his daughter: a longitudinal case study. Infant Behav. Develop. (1998)
Fundamental frequency of infants’ and parents’ utterances in longitudinal recordings. J. Acoust. Soc. Amer. (2006)
A 3-year longitudinal study of school-aged children’s fundamental frequencies. J. Speech Hear. Res. (1983)
Development of speech sounds in children. Acta Otolaryngol. (Suppl.) (1969)
An acoustical study of the pitch of infant hunger wails. Child Develop. (1942)
Longitudinal research on adolescent voice change in males. J. Acoust. Soc. Amer. (1994)
A study of word and sentence acquisition: quantitative analysis of longitudinal data. Cognit. Stud. (2003)
Longitudinal developmental changes in spectral peaks of vowels produced by Japanese infants. J. Acoust. Soc. Amer. (2007)
Development of speech communication between mother and child. Pafoumansu Kyouiku (2004)
Speech overlap in Japanese mother–child conversations. J. Child Language (2004)
Fundamental frequency in the speech of infants and children. J. Acoust. Soc. Amer.
☆ Parts of this research were presented at the 7th International Conference on Spoken Language Processing, Denver, CO, September 16–20, 2002.
1 Now at NTT Advanced Technology Corporation.