Abstract
This paper describes the acquisition of a new database of dysarthric speech consisting of aligned acoustic and articulatory data. This database currently includes data from seven individuals with speech impediments caused by cerebral palsy or amyotrophic lateral sclerosis, and from age- and gender-matched control subjects. Each of the individuals with speech impediments is given standardized assessments of speech-motor function by a speech-language pathologist. Acoustic data is obtained by one head-mounted and one directional microphone. Articulatory data is obtained by electromagnetic articulography, which allows the measurement of the tongue and other articulators during speech, and by 3D reconstruction from binocular video sequences. The stimuli are obtained from a variety of sources, including the TIMIT database, lists of identified phonetic contrasts, and assessments of speech intelligibility. This paper also includes an analysis of how dysarthric speech differs from non-dysarthric speech according to features such as phoneme length and pronunciation errors.
Notes
Two sensors, in principle, are sufficient to characterize the six degrees of freedom related to rigid-body motions (Hoole and Zierdt 2010).
References
Aarabi, P., & Shi, G. (2004). Phase-based dual-microphone robust speech enhancement. IEEE Transactions on Systems, Man, and Cybernetics Part B, 34(4), 1763–1773.
Bennett, J. W., van Lieshout, P., & Steele, C. M. (2007). Tongue control for speech and swallowing in healthy younger and older subjects. International Journal of Orofacial Myology, 33, 5–18.
Campbell, J. M., Bell, S. K., & Keith, L. K. (2001). Concurrent validity of the Peabody Picture Vocabulary Test-Third Edition as an intelligence and achievement screener for low SES African American children. Assessment, 8(1), 85–94.
Clear, J. H. (1993). The British National Corpus. In: The digital word: Text-based computing in the humanities (pp. 163–187). Cambridge, MA: MIT Press.
Craig, M., van Lieshout, P., & Wong, W. (2007). Suitability of a UV-based video recording system for the analysis of small facial motions during speech. Speech Communication, 49(9), 679–686.
Enderby, P. M. (1983). Frenchay dysarthria assessment. San Diego: College Hill Press.
Ephraim, Y., & Malah, D. (1985). Speech enhancement using a minimum mean-square error log-spectral amplitude estimator. IEEE Transactions on Acoustics, Speech, and Signal Processing, 33(2), 443–445.
Hasegawa-Johnson, M., Gunderson, J., Perlman, A., & Huang, T. (2006). HMM-based and SVM-based recognition of the speech of talkers with spastic dysarthria. In: Proceedings of the international conference on acoustics, speech, and signal processing (ICASSP 2006), Vol. 3, pp. 1060–1063.
Herndon, R. M. (1997). Handbook of neurologic rating scales (1st ed.). New York: Demos Medical Publishing.
Hoole, P., & Zierdt, A. (2010). Five-dimensional articulography. In: B. Maassen & P. H. van Lieshout (Eds.), Speech motor control: New developments in basic and applied research (Chap. 20, pp. 331–349). Oxford: Oxford University Press.
Hoole, P., Zierdt, A., & Geng, C. (2003). Beyond 2D in articulatory data acquisition and analysis. In: Proceedings of the fifteenth international congress of phonetic sciences, Barcelona, pp. 265–268.
Hosom, J. P., Kain, A. B., Mishra, T., van Santen, J. P. H., Fried-Oken, M., & Staehely, J. (2003). Intelligibility of modifications to dysarthric speech. In: Proceedings of the IEEE international conference on acoustics, speech, and signal processing (ICASSP ’03), Vol. 1, pp. 924–927.
Jayaram, G., & Abdelhamied, K. (1995). Experiments in dysarthric speech recognition using artificial neural networks. Journal of Rehabilitation Research and Development, 32(2), 162–169.
Kaburagi, T., Wakamiya, K., & Honda, M. (2005). Three-dimensional electromagnetic articulography: A measurement principle. Journal of the Acoustical Society of America, 118(1), 428–443.
Kain, A. B., Hosom, J. P., Niu, X., van Santen, J. P., Fried-Oken, M., & Staehely, J. (2007). Improving the intelligibility of dysarthric speech. Speech Communication, 49(9), 743–759.
Kent, R. D. (2000). Research on speech motor control and its disorders: A review and prospective. Journal of Communication Disorders, 33(5), 391–428.
Kent, R. D., & Rosen, K. (2004). Motor control perspectives on motor speech disorders. In: B. Maassen, R. Kent, H. Peters, P. van Lieshout, & W. Hulstijn (Eds.), Speech motor control in normal and disordered speech (Chap. 12, pp. 285–311). Oxford: Oxford University Press.
Kent, R. D., Weismer, G., Kent, J. F., & Rosenbek, J. C. (1989). Toward phonetic intelligibility testing in dysarthria. Journal of Speech and Hearing Disorders, 54, 482–499.
Kim, H., Hasegawa-Johnson, M., Perlman, A., Gunderson, J., Huang, T., Watkin, K., et al. (2008). Dysarthric speech database for universal access research. In: Proceedings of the international conference on spoken language processing (Interspeech ’08), Brisbane, Australia, pp. 1741–1744.
Kroos, C. (2008). Measurement accuracy in 3D electromagnetic articulography (Carstens AG500). In: Proceedings of the 8th international seminar on speech production, pp. 61–64.
Livescu, K., Cetin, O., Hasegawa-Johnson, M., King, S., Bartels, C., Borges, N., et al. (2007). Articulatory feature-based methods for acoustic and audio-visual speech recognition: Summary from the 2006 JHU Summer Workshop. In: Proceedings of the international conference on acoustics, speech, and signal processing (ICASSP 2007), Honolulu.
Markov, K., Dang, J., & Nakamura, S. (2006). Integration of articulatory and spectrum features based on the hybrid HMM/BN modeling framework. Speech Communication, 48(2), 161–175.
Martin, R. (2001). Noise power spectral density estimation based on optimal smoothing and minimum statistics. IEEE Transactions on Speech and Audio Processing, 9(5), 504–512.
Menendez-Pidal, X., Polikoff, J. B., Peters, S. M., Leonzio, J. E., & Bunnell, H. (1996). The Nemours database of dysarthric speech. In: Proceedings of the fourth international conference on spoken language processing, Philadelphia, PA, USA.
Patel, R. (2002). Prosodic control in severe dysarthria: Preserved ability to mark the question-statement contrast. Journal of Speech, Language, and Hearing Research, 45(5), 858–870.
Richmond, K., King, S., & Taylor, P. (2003). Modelling the uncertainty in recovering articulation from acoustics. Computer Speech and Language, 17, 153–172.
Roy, N., Leeper, H. A., Blomgren, M., & Cameron, R. M. (2001). A description of phonetic, acoustic, and physiological changes associated with improved intelligibility in a speaker with spastic dysarthria. American Journal of Speech-Language Pathology, 10, 274–290.
Rudzicz, F. (2007). Comparing speaker-dependent and speaker-adaptive acoustic models for recognizing dysarthric speech. In: Proceedings of the ninth international ACM SIGACCESS conference on computers and accessibility, Tempe, AZ.
Rudzicz, F. (2009). Applying discretized articulatory knowledge to dysarthric speech. In: Proceedings of the 2009 IEEE international conference on acoustics, speech, and signal processing (ICASSP 09), Taipei, Taiwan.
Rudzicz, F. (2010). Adaptive kernel canonical correlation analysis for estimation of task dynamics from acoustics. In: Proceedings of the 2010 IEEE international conference on acoustics, speech, and signal processing (ICASSP10), Dallas, TX.
Shi, G., Aarabi, P., & Jiang, H. (2007). Phase-based dual-microphone speech enhancement using a prior speech model. IEEE Transactions on Audio, Speech, and Language Processing, 15(1), 109–118.
Toda, T., Black, A. W., & Tokuda, K. (2008). Statistical mapping between articulatory movements and acoustic spectrum using a Gaussian mixture model. Speech Communication, 50(3), 215–227. doi:10.1016/j.specom.2007.09.001.
Tsai, R. Y. (1987). A versatile camera calibration technique for high-accuracy 3D machine vision metrology using off-the-shelf TV cameras and lenses. IEEE Journal of Robotics and Automation, RA-3(4), 323–344.
van Lieshout, P., Hulstijn, W., Alfonso, P. J., & Peters, H. F. (1997). Higher and lower order influences on the stability of the dynamic coupling between articulators. In: W. Hulstijn, H. F. Peters, & P. van Lieshout (Eds.), Speech production: Motor control, brain research and fluency disorders (pp. 161–170). Amsterdam: Elsevier Science Publishers.
van Lieshout, P., Merrick, G., & Goldstein, L. (2008). An articulatory phonology perspective on rhotic articulation problems: A descriptive case study. Asia Pacific Journal of Speech, Language, and Hearing, 11(4), 283–303.
Webber, S. G. (2005). Webber photo cards: Story starters.
Westbury, J. R. (1994). X-ray microbeam speech production database user’s handbook. Waisman Center on Mental Retardation & Human Development.
Wrench, A. (1999). The MOCHA-TIMIT articulatory database. http://www.cstr.ed.ac.uk/research/projects/artic/mocha.html.
Yorkston, K. M., & Beukelman, D. R. (1981). Assessment of intelligibility of dysarthric speech. Tigard, OR: C.C. Publications Inc.
Yunusova, Y., Weismer, G., Westbury, J. R., & Lindstrom, M. J. (2008). Articulatory movements during vowels in speakers with dysarthria and healthy controls. Journal of Speech, Language, and Hearing Research, 51, 596–611.
Yunusova, Y., Green, J. R., & Mefferd, A. (2009). Accuracy assessment for AG500, electromagnetic articulograph. Journal of Speech, Language, and Hearing Research, 52, 547–555.
Zierdt, A., Hoole, P., & Tillmann, H. G. (1999). Development of a system for three-dimensional fleshpoint measurement of speech movements. In: Proceedings of the XIVth international congress of phonetic sciences, p. 3.
Zierdt, A., Hoole, P., Honda, M., Kaburagi, T., & Tillmann, H. G. (2000). Extracting tongues from moving heads. In: Proceedings of the 5th speech production seminar, pp. 313–316.
Zue, V., Seneff, S., & Glass, J. (1989). Speech database development: TIMIT and beyond. In: Proceedings of ESCA tutorial and research workshop on speech input/output assessment and speech databases (SIOA-1989), Noordwijkerhout, The Netherlands, Vol. 2, pp. 35–40.
Acknowledgments
The authors acknowledge the support of Toronto Rehabilitation Institute, which receives funding under the Provincial Rehabilitation Research Program from the Ministry of Health and Long-Term Care in Ontario. The views expressed do not necessarily reflect those of the Ministry. Equipment and space have been funded with grants from the Canada Foundation for Innovation, Ontario Innovation Trust, and the Ministry of Research and Innovation. This project is also funded by Bell University Labs, the University of Toronto, and the Natural Sciences and Engineering Research Council of Canada.
Appendix: Articulatory contrasts
Cite this article
Rudzicz, F., Namasivayam, A.K. & Wolff, T. The TORGO database of acoustic and articulatory speech from speakers with dysarthria. Lang Resources & Evaluation 46, 523–541 (2012). https://doi.org/10.1007/s10579-011-9145-0
DOI: https://doi.org/10.1007/s10579-011-9145-0