The TORGO database of acoustic and articulatory speech from speakers with dysarthria

  • Original Paper
Language Resources and Evaluation

Abstract

This paper describes the acquisition of a new database of dysarthric speech consisting of aligned acoustic and articulatory data. The database currently includes data from seven individuals with speech impediments caused by cerebral palsy or amyotrophic lateral sclerosis, together with age- and gender-matched control subjects. Each of the individuals with speech impediments is given a standardized assessment of speech-motor function by a speech-language pathologist. Acoustic data are recorded with one head-mounted microphone and one directional microphone. Articulatory data are obtained by electromagnetic articulography, which tracks the movement of the tongue and other articulators during speech, and by 3D reconstruction from binocular video sequences. The stimuli are drawn from a variety of sources, including the TIMIT database, lists of identified phonetic contrasts, and assessments of speech intelligibility. The paper also analyzes how dysarthric speech differs from non-dysarthric speech in features such as phoneme length and pronunciation errors.
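The phoneme-length comparison mentioned in the abstract can be illustrated with a short sketch. It assumes phone alignments in the TIMIT `.phn` convention (one `start_sample end_sample label` line per phone) at a 16 kHz sampling rate; the file format, the sampling rate, and the helper names `parse_phn`, `mean_durations_ms`, and `duration_ratio` are illustrative assumptions, not the TORGO release format or the authors' analysis code. The alignments in the usage lines are made-up toy values.

```python
from collections import defaultdict
from statistics import mean

SAMPLE_RATE_HZ = 16_000  # assumed sampling rate (TIMIT audio is 16 kHz)


def parse_phn(text: str) -> list[tuple[int, int, str]]:
    """Parse TIMIT-style alignment lines: '<start_sample> <end_sample> <phone>'."""
    segments = []
    for line in text.strip().splitlines():
        start, end, phone = line.split()
        segments.append((int(start), int(end), phone))
    return segments


def mean_durations_ms(segments, sample_rate=SAMPLE_RATE_HZ) -> dict[str, float]:
    """Average duration in milliseconds of each phone label over all its occurrences."""
    durations = defaultdict(list)
    for start, end, phone in segments:
        durations[phone].append(1000.0 * (end - start) / sample_rate)
    return {phone: mean(values) for phone, values in durations.items()}


def duration_ratio(dysarthric: dict[str, float], control: dict[str, float]) -> dict[str, float]:
    """Dysarthric-to-control mean duration ratio for phones present in both sets."""
    shared = dysarthric.keys() & control.keys()
    return {phone: dysarthric[phone] / control[phone] for phone in shared}


# Toy alignments (made-up values, not TORGO data).
control = mean_durations_ms(parse_phn("0 1600 h#\n1600 3200 ae\n3200 4000 t"))
dysarthric = mean_durations_ms(parse_phn("0 1600 h#\n1600 6400 ae\n6400 8000 t"))
ratios = duration_ratio(dysarthric, control)
print(ratios["ae"])  # 3.0: the toy /ae/ lasts 300 ms vs. 100 ms
```

A ratio above 1 means the phone is longer, on average, in the dysarthric set; a real comparison would likely also control for prompt content and per-speaker speaking rate.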

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5

Notes

  1. The Cs5view real-time position display flags a coil in red if its RMS error exceeds 30 units; however, during recording the RMS error rarely exceeded 8 units across all coils, which is low enough to keep position-tracking errors small (Kroos 2008; Yunusova et al. 2009).

  2. Two sensors, in principle, are sufficient to characterize the six degrees of freedom related to rigid-body motions (Hoole and Zierdt 2010).
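Note 2 can be made concrete with a minimal sketch of head-movement correction. It assumes each reference sensor reports a 3D position and that the first sensor's coil axis is available as a direction vector (five-dimensional articulography measures position plus two orientation angles per coil). The two positions fix the origin and one axis of a head-fixed frame; the coil axis, provided it is not parallel to the line joining the sensors, fixes the remaining rotation, which together gives all six rigid-body degrees of freedom. The Gram-Schmidt construction and the names `head_frame` and `to_head_coordinates` are illustrative assumptions, not the Carstens AG500 software or the authors' processing pipeline.

```python
import numpy as np


def head_frame(p1, p2, axis1):
    """Build a head-fixed orthonormal frame from two reference sensors.

    p1, p2: 3D positions of two head-mounted coils.
    axis1:  coil-axis direction of the first coil (need not be unit length);
            it must not be parallel to p2 - p1.
    Returns (origin, R), where the rows of R are the frame's x, y, z axes.
    """
    p1, p2, axis1 = (np.asarray(v, dtype=float) for v in (p1, p2, axis1))
    # Two positions fix the origin and the x axis (the sensor baseline).
    x = p2 - p1
    x /= np.linalg.norm(x)
    # Gram-Schmidt: remove the part of the coil axis that lies along x.
    z = axis1 - np.dot(axis1, x) * x
    norm_z = np.linalg.norm(z)
    if norm_z < 1e-9:
        raise ValueError("coil axis is parallel to the sensor baseline")
    z /= norm_z
    # z cross x completes a right-handed frame (x, y, z).
    y = np.cross(z, x)
    return p1, np.vstack([x, y, z])


def to_head_coordinates(point, origin, R):
    """Express a measured point (e.g., a tongue coil) in the head-fixed frame."""
    return R @ (np.asarray(point, dtype=float) - origin)
```

Because the frame moves rigidly with the head, coordinates computed this way are unaffected by head rotation and translation; converting the measured orientation angles into the vector `axis1` is omitted here.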

References

  • Aarabi, P., & Shi, G. (2004). Phase-based dual-microphone robust speech enhancement. IEEE Transactions on Systems, Man, and Cybernetics Part B, 34(4), 1763–1773.

  • Bennett, J. W., van Lieshout, P., & Steele, C. M. (2007). Tongue control for speech and swallowing in healthy younger and older subjects. International Journal of Orofacial Myology, 33, 5–18.

  • Campbell, J. M., Bell, S. K., & Keith, L. K. (2001). Concurrent validity of the Peabody Picture Vocabulary Test-Third Edition as an intelligence and achievement screener for low SES African American children. Assessment, 8(1), 85–94.

  • Clear, J. H. (1993). The British National Corpus. In: The digital word: Text-based computing in the humanities (pp. 163–187). Cambridge, MA: MIT Press.

  • Craig, M., van Lieshout, P., & Wong, W. (2007). Suitability of a UV-based video recording system for the analysis of small facial motions during speech. Speech Communication, 49(9), 679–686.

  • Enderby, P. M. (1983). Frenchay dysarthria assessment. San Diego: College Hill Press.

  • Ephraim, Y., & Malah, D. (1985). Speech enhancement using a minimum mean-square error log-spectral amplitude estimator. IEEE Transactions on Acoustics, Speech, and Signal Processing, 33(2), 443–445.

  • Hasegawa-Johnson, M., Gunderson, J., Perlman, A., & Huang, T. (2006). HMM-based and SVM-based recognition of the speech of talkers with spastic dysarthria. In: Proceedings of the international conference on acoustics, speech, and signal processing (ICASSP 2006), Vol. 3, pp. 1060–1063.

  • Herndon, R. M. (1997). Handbook of neurologic rating scales (1st ed.). New York: Demos Medical Publishing.

  • Hoole, P., & Zierdt, A. (2010). Five-dimensional articulography. In: B. Maassen & P. H. van Lieshout (Eds.), Speech motor control: New developments in basic and applied research (Chap. 20, pp. 331–349). Oxford: Oxford University Press.

  • Hoole, P., Zierdt, A., & Geng, C. (2003). Beyond 2D in articulatory data acquisition and analysis. In: Proceedings of the fifteenth international congress of phonetic sciences, Barcelona, pp. 265–268.

  • Hosom, J. P., Kain, A. B., Mishra, T., van Santen, J. P. H., Fried-Oken, M., & Staehely, J. (2003). Intelligibility of modifications to dysarthric speech. In: Proceedings of the IEEE international conference on acoustics, speech, and signal processing (ICASSP ’03), Vol. 1, pp. 924–927.

  • Jayaram, G., & Abdelhamied, K. (1995). Experiments in dysarthric speech recognition using artificial neural networks. Journal of Rehabilitation Research and Development, 32(2), 162–169.

  • Kaburagi, T., Wakamiya, K., & Honda, M. (2005). Three-dimensional electromagnetic articulography: A measurement principle. Journal of the Acoustical Society of America, 118(1), 428–443.

  • Kain, A. B., Hosom, J. P., Niu, X., van Santen, J. P., Fried-Oken, M., & Staehely, J. (2007). Improving the intelligibility of dysarthric speech. Speech Communication, 49(9), 743–759.

  • Kent, R. D. (2000). Research on speech motor control and its disorders: A review and prospective. Journal of Communication Disorders, 33(5), 391–428.

  • Kent, R. D., & Rosen, K. (2004). Motor control perspectives on motor speech disorders. In: B. Maassen, R. Kent, H. Peters, P. van Lieshout, & W. Hulstijn (Eds.), Speech motor control in normal and disordered speech (Chap. 12, pp. 285–311). Oxford: Oxford University Press.

  • Kent, R. D., Weismer, G., Kent, J. F., & Rosenbek, J. C. (1989). Toward phonetic intelligibility testing in dysarthria. Journal of Speech and Hearing Disorders, 54, 482–499.

  • Kim, H., Hasegawa-Johnson, M., Perlman, A., Gunderson, J., Huang, T., Watkin, K., et al. (2008). Dysarthric speech database for universal access research. In: Proceedings of the international conference on spoken language processing (Interspeech ’08), Brisbane, Australia, pp. 1741–1744.

  • Kroos, C. (2008). Measurement accuracy in 3D electromagnetic articulography (Carstens AG500). In: Proceedings of the 8th international seminar on speech production, pp. 61–64.

  • Livescu, K., Cetin, O., Hasegawa-Johnson, M., King, S., Bartels, C., Borges, N., et al. (2007). Articulatory feature-based methods for acoustic and audio-visual speech recognition: Summary from the 2006 JHU Summer Workshop. In: Proceedings of the international conference on acoustics, speech, and signal processing (ICASSP 2007), Honolulu.

  • Markov, K., Dang, J., & Nakamura, S. (2006). Integration of articulatory and spectrum features based on the hybrid HMM/BN modeling framework. Speech Communication, 48(2), 161–175.

  • Martin, R. (2001). Noise power spectral density estimation based on optimal smoothing and minimum statistics. IEEE Transactions on Speech and Audio Processing, 9(5), 504–512.

  • Menendez-Pidal, X., Polikoff, J. B., Peters, S. M., Leonzio, J. E., & Bunnell, H. (1996). The Nemours database of dysarthric speech. In: Proceedings of the fourth international conference on spoken language processing, Philadelphia, PA, USA.

  • Patel, R. (2002). Prosodic control in severe dysarthria: Preserved ability to mark the question-statement contrast. Journal of Speech, Language, and Hearing Research, 45(5), 858–870.

  • Richmond, K., King, S., & Taylor, P. (2003). Modelling the uncertainty in recovering articulation from acoustics. Computer Speech and Language, 17, 153–172.

  • Roy, N., Leeper, H. A., Blomgren, M., & Cameron, R. M. (2001). A description of phonetic, acoustic, and physiological changes associated with improved intelligibility in a speaker with spastic dysarthria. American Journal of Speech-Language Pathology, 10, 274–290.

  • Rudzicz, F. (2007). Comparing speaker-dependent and speaker-adaptive acoustic models for recognizing dysarthric speech. In: Proceedings of the ninth international ACM SIGACCESS conference on computers and accessibility, Tempe, AZ.

  • Rudzicz, F. (2009). Applying discretized articulatory knowledge to dysarthric speech. In: Proceedings of the 2009 IEEE international conference on acoustics, speech, and signal processing (ICASSP 09), Taipei, Taiwan.

  • Rudzicz, F. (2010). Adaptive kernel canonical correlation analysis for estimation of task dynamics from acoustics. In: Proceedings of the 2010 IEEE international conference on acoustics, speech, and signal processing (ICASSP10), Dallas, TX.

  • Shi, G., Aarabi, P., & Jiang, H. (2007). Phase-based dual-microphone speech enhancement using a prior speech model. IEEE Transactions on Audio, Speech, and Language Processing, 15(1), 109–118.

  • Toda, T., Black, A. W., & Tokuda, K. (2008). Statistical mapping between articulatory movements and acoustic spectrum using a Gaussian mixture model. Speech Communication, 50(3), 215–227. doi:10.1016/j.specom.2007.09.001.

  • Tsai, R. Y. (1987). A versatile camera calibration technique for high-accuracy 3D machine vision metrology using off-the-shelf TV cameras and lenses. IEEE Journal of Robotics and Automation, RA-3(4), 323–344.

  • van Lieshout, P., Hulstijn, W., Alfonso, P. J., & Peters, H. F. (1997). Higher and lower order influences on the stability of the dynamic coupling between articulators. In: W. Hulstijn, H. F. Peters, & P. van Lieshout (Eds.), Speech production: Motor control, brain research and fluency disorders (pp. 161–170). Amsterdam: Elsevier Science Publishers.

  • van Lieshout, P., Merrick, G., & Goldstein, L. (2008). An articulatory phonology perspective on rhotic articulation problems: A descriptive case study. Asia Pacific Journal of Speech, Language, and Hearing, 11(4), 283–303.

  • Webber, S. G. (2005). Webber photo cards: Story starters.

  • Westbury, J. R. (1994). X-ray microbeam speech production database user’s handbook. Waisman Center on Mental Retardation & Human Development.

  • Wrench, A. (1999). The MOCHA-TIMIT articulatory database. http://www.cstr.ed.ac.uk/research/projects/artic/mocha.html.

  • Yorkston, K. M., & Beukelman, D. R. (1981). Assessment of intelligibility of dysarthric speech. Tigard, OR: C.C. Publications Inc.

  • Yunusova, Y., Weismer, G., Westbury, J. R., & Lindstrom, M. J. (2008). Articulatory movements during vowels in speakers with dysarthria and healthy controls. Journal of Speech, Language, and Hearing Research, 51, 596–611.

  • Yunusova, Y., Green, J. R., & Mefferd, A. (2009). Accuracy assessment for AG500, electromagnetic articulograph. Journal of Speech, Language, and Hearing Research, 52, 547–555.

  • Zierdt, A., Hoole, P., & Tillmann, H. G. (1999). Development of a system for three-dimensional fleshpoint measurement of speech movements. In: Proceedings of the XIVth international congress of phonetic sciences, p. 3.

  • Zierdt, A., Hoole, P., Honda, M., Kaburagi, T., & Tillmann, H. G. (2000). Extracting tongues from moving heads. In: Proceedings of the 5th speech production seminar, pp. 313–316.

  • Zue, V., Seneff, S., & Glass, J. (1989). Speech database development: TIMIT and beyond. In: Proceedings of ESCA tutorial and research workshop on speech input/output assessment and speech databases (SIOA-1989), Noordwijkerhout, The Netherlands, Vol. 2, pp. 35–40.

Acknowledgments

The authors acknowledge the support of Toronto Rehabilitation Institute, which receives funding under the Provincial Rehabilitation Research Program from the Ministry of Health and Long-Term Care in Ontario. The views expressed do not necessarily reflect those of the Ministry. Equipment and space have been funded with grants from the Canada Foundation for Innovation, Ontario Innovation Trust and the Ministry of Research and Innovation. This project is also funded by Bell University Labs, the University of Toronto, and the Natural Sciences and Engineering Research Council of Canada.

Author information

Correspondence to Frank Rudzicz.

Appendix: Articulatory contrasts

See Table 3.

Table 3 Articulatory contrasts, from Kent et al. (1989)

About this article

Cite this article

Rudzicz, F., Namasivayam, A.K. & Wolff, T. The TORGO database of acoustic and articulatory speech from speakers with dysarthria. Lang Resources & Evaluation 46, 523–541 (2012). https://doi.org/10.1007/s10579-011-9145-0

  • DOI: https://doi.org/10.1007/s10579-011-9145-0