The TORGO database of acoustic and articulatory speech from speakers with dysarthria

Rudzicz, Frank; Namasivayam, Aravind Kumar; Wolff, Talya

doi:10.1007/s10579-011-9145-0


Original Paper
Published: 26 March 2011

Language Resources and Evaluation, Volume 46, pages 523–541 (2012)



Abstract

This paper describes the acquisition of a new database of dysarthric speech consisting of aligned acoustic and articulatory data. The database currently includes data from seven individuals with speech impediments caused by cerebral palsy or amyotrophic lateral sclerosis, together with age- and gender-matched control subjects. Each individual with a speech impediment is given standardized assessments of speech-motor function by a speech-language pathologist. Acoustic data are recorded with one head-mounted and one directional microphone. Articulatory data are obtained by electromagnetic articulography, which measures the movement of the tongue and other articulators during speech, and by 3D reconstruction from binocular video sequences. The stimuli are drawn from a variety of sources, including the TIMIT database, lists of identified phonetic contrasts, and assessments of speech intelligibility. This paper also analyzes how dysarthric speech differs from non-dysarthric speech in features such as phoneme length and pronunciation errors.
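One of the comparisons mentioned above is phoneme length in dysarthric versus non-dysarthric speech. As a minimal sketch, assuming phone-level alignments are available as (phone, start, end) tuples in seconds (a hypothetical format, not the database's documented file layout), the mean duration of each phone can be computed per speaker group and expressed as a ratio to the control group. The function names are illustrative, and the toy intervals are invented for demonstration; they are not measurements from the database.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical alignment format: (phone label, start time in s, end time in s).
Interval = tuple[str, float, float]


def mean_durations(intervals: list[Interval]) -> dict[str, float]:
    """Return the mean duration (in seconds) of each phone label."""
    by_phone: dict[str, list[float]] = defaultdict(list)
    for phone, start, end in intervals:
        if end <= start:
            raise ValueError(f"non-positive duration for {phone!r}")
        by_phone[phone].append(end - start)
    return {phone: mean(durs) for phone, durs in by_phone.items()}


def duration_ratios(target: list[Interval], control: list[Interval]) -> dict[str, float]:
    """Mean target duration divided by mean control duration, for phones seen in both groups."""
    t, c = mean_durations(target), mean_durations(control)
    return {p: t[p] / c[p] for p in sorted(t.keys() & c.keys())}


if __name__ == "__main__":
    # Toy intervals for illustration only; not data from the database.
    dysarthric = [("aa", 0.00, 0.30), ("t", 0.30, 0.42), ("aa", 1.00, 1.20)]
    control = [("aa", 0.00, 0.15), ("t", 0.15, 0.21), ("s", 0.21, 0.30)]
    for phone, ratio in duration_ratios(dysarthric, control).items():
        print(f"{phone}: {ratio:.2f}x control duration")
```

Phones that occur in only one group (here `s`) are skipped rather than guessed, since a ratio is undefined without both means.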



Notes

The Cs5view real-time position display flags a coil in red if its RMS error exceeds 30 units; however, the RMS error rarely exceeded 8 units across all coils, a level suitable for minimizing position-tracking errors (Kroos 2008; Yunusova et al. 2009).
Two sensors, in principle, are sufficient to characterize the six degrees of freedom related to rigid-body motions (Hoole and Zierdt 2010).
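The second note can be illustrated with a small geometric sketch. Assuming, as in five-dimensional articulography, that each reference sensor reports a 3D position plus a unit orientation vector (but no roll about its own axis), two sensors fixed to a rigid body such as the head are enough to define a full coordinate frame: the vector between the two sensor positions gives one axis, and the first sensor's orientation, made orthogonal to that axis by Gram–Schmidt, gives a second. The function names and choice of axes are illustrative assumptions, not the head-correction procedure used for this database.

```python
import math

Vec = tuple[float, float, float]


def sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def normalize(a: Vec) -> Vec:
    n = math.sqrt(dot(a, a))
    if n < 1e-12:
        raise ValueError("degenerate vector; sensor geometry does not fix a frame")
    return (a[0] / n, a[1] / n, a[2] / n)


def rigid_frame(p1: Vec, p2: Vec, o1: Vec) -> tuple[Vec, tuple[Vec, Vec, Vec]]:
    """Origin and orthonormal axes of a body frame built from two sensor
    positions (p1, p2) and the orientation vector o1 of the first sensor.
    Fails if o1 is parallel to p2 - p1, since roll would then be undetermined."""
    x = normalize(sub(p2, p1))            # axis along the sensor pair
    proj = dot(o1, x)
    y = normalize((o1[0] - proj * x[0],   # Gram-Schmidt: remove the x component
                   o1[1] - proj * x[1],
                   o1[2] - proj * x[2]))
    z = cross(x, y)                       # completes a right-handed frame
    return p1, (x, y, z)


def to_body(point: Vec, origin: Vec, axes: tuple[Vec, Vec, Vec]) -> Vec:
    """Express a lab-frame point (e.g. a tongue coil) in the body frame,
    removing the rigid-body translation and rotation of the head."""
    d = sub(point, origin)
    return (dot(d, axes[0]), dot(d, axes[1]), dot(d, axes[2]))


if __name__ == "__main__":
    # Head at rest, then rotated 90 degrees about z and translated by (10, 5, 2).
    o, ax = rigid_frame((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
    print(to_body((0.5, 0.3, 0.2), o, ax))
    o2, ax2 = rigid_frame((10.0, 5.0, 2.0), (10.0, 6.0, 2.0), (-1.0, 0.0, 0.0))
    print(to_body((9.7, 5.5, 2.2), o2, ax2))  # same body coordinates, up to rounding
```

Three translational and three rotational degrees of freedom are recovered this way, which is why the moved tongue point maps back to the same body-frame coordinates.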

References

Aarabi, P., & Shi, G. (2004). Phase-based dual-microphone robust speech enhancement. IEEE Transactions on Systems, Man, and Cybernetics, Part B, 34(4), 1763–1773.
Bennett, J. W., van Lieshout, P., & Steele, C. M. (2007). Tongue control for speech and swallowing in healthy younger and older subjects. International Journal of Orofacial Myology, 33, 5–18.
Campbell, J. M., Bell, S. K., & Keith, L. K. (2001). Concurrent validity of the Peabody Picture Vocabulary Test-Third Edition as an intelligence and achievement screener for low SES African American children. Assessment, 8(1), 85–94.
Clear, J. H. (1993). The British National Corpus. In: The digital word: Text-based computing in the humanities (pp. 163–187). Cambridge, MA: MIT Press.
Craig, M., van Lieshout, P., & Wong, W. (2007). Suitability of a UV-based video recording system for the analysis of small facial motions during speech. Speech Communication, 49(9), 679–686.
Enderby, P. M. (1983). Frenchay dysarthria assessment. San Diego: College Hill Press.
Ephraim, Y., & Malah, D. (1985). Speech enhancement using a minimum mean-square error log-spectral amplitude estimator. IEEE Transactions on Acoustics, Speech, and Signal Processing, 33(2), 443–445.
Hasegawa-Johnson, M., Gunderson, J., Perlman, A., & Huang, T. (2006). HMM-based and SVM-based recognition of the speech of talkers with spastic dysarthria. In: Proceedings of the international conference on acoustics, speech, and signal processing (ICASSP 2006), Vol. 3, pp. 1060–1063.
Herndon, R. M. (1997). Handbook of neurologic rating scales (1st ed.). New York: Demos Medical Publishing.
Hoole, P., & Zierdt, A. (2010). Five-dimensional articulography. In: B. Maassen & P. H. van Lieshout (Eds.), Speech motor control: New developments in basic and applied research (Chap. 20, pp. 331–349). Oxford: Oxford University Press.
Hoole, P., Zierdt, A., & Geng, C. (2003). Beyond 2D in articulatory data acquisition and analysis. In: Proceedings of the fifteenth international congress of phonetic sciences, Barcelona, pp. 265–268.
Hosom, J. P., Kain, A. B., Mishra, T., van Santen, J. P. H., Fried-Oken, M., & Staehely, J. (2003). Intelligibility of modifications to dysarthric speech. In: Proceedings of the IEEE international conference on acoustics, speech, and signal processing (ICASSP ’03), Vol. 1, pp. 924–927.
Jayaram, G., & Abdelhamied, K. (1995). Experiments in dysarthric speech recognition using artificial neural networks. Journal of Rehabilitation Research and Development, 32(2), 162–169.
Kaburagi, T., Wakamiya, K., & Honda, M. (2005). Three-dimensional electromagnetic articulography: A measurement principle. Journal of the Acoustical Society of America, 118(1), 428–443.
Kain, A. B., Hosom, J. P., Niu, X., van Santen, J. P., Fried-Oken, M., & Staehely, J. (2007). Improving the intelligibility of dysarthric speech. Speech Communication, 49(9), 743–759.
Kent, R. D. (2000). Research on speech motor control and its disorders: A review and prospective. Journal of Communication Disorders, 33(5), 391–428.
Kent, R. D., & Rosen, K. (2004). Motor control perspectives on motor speech disorders. In: B. Maassen, R. Kent, H. Peters, P. van Lieshout, & W. Hulstijn (Eds.), Speech motor control in normal and disordered speech (Chap. 12, pp. 285–311). Oxford: Oxford University Press.
Kent, R. D., Weismer, G., Kent, J. F., & Rosenbek, J. C. (1989). Toward phonetic intelligibility testing in dysarthria. Journal of Speech and Hearing Disorders, 54, 482–499.
Kim, H., Hasegawa-Johnson, M., Perlman, A., Gunderson, J., Huang, T., Watkin, K., et al. (2008). Dysarthric speech database for universal access research. In: Proceedings of the international conference on spoken language processing (Interspeech ’08), Brisbane, Australia, pp. 1741–1744.
Kroos, C. (2008). Measurement accuracy in 3D electromagnetic articulography (Carstens AG500). In: Proceedings of the 8th international seminar on speech production, pp. 61–64.
Livescu, K., Cetin, O., Hasegawa-Johnson, M., King, S., Bartels, C., Borges, N., et al. (2007). Articulatory feature-based methods for acoustic and audio-visual speech recognition: Summary from the 2006 JHU Summer Workshop. In: Proceedings of the international conference on acoustics, speech, and signal processing (ICASSP 2007), Honolulu.
Markov, K., Dang, J., & Nakamura, S. (2006). Integration of articulatory and spectrum features based on the hybrid HMM/BN modeling framework. Speech Communication, 48(2), 161–175.
Martin, R. (2001). Noise power spectral density estimation based on optimal smoothing and minimum statistics. IEEE Transactions on Speech and Audio Processing, 9(5), 504–512.
Menendez-Pidal, X., Polikoff, J. B., Peters, S. M., Leonzio, J. E., & Bunnell, H. (1996). The Nemours database of dysarthric speech. In: Proceedings of the fourth international conference on spoken language processing, Philadelphia, PA, USA.
Patel, R. (2002). Prosodic control in severe dysarthria: Preserved ability to mark the question-statement contrast. Journal of Speech, Language, and Hearing Research, 45(5), 858–870.
Richmond, K., King, S., & Taylor, P. (2003). Modelling the uncertainty in recovering articulation from acoustics. Computer Speech and Language, 17, 153–172.
Roy, N., Leeper, H. A., Blomgren, M., & Cameron, R. M. (2001). A description of phonetic, acoustic, and physiological changes associated with improved intelligibility in a speaker with spastic dysarthria. American Journal of Speech-Language Pathology, 10, 274–290.
Rudzicz, F. (2007). Comparing speaker-dependent and speaker-adaptive acoustic models for recognizing dysarthric speech. In: Proceedings of the ninth international ACM SIGACCESS conference on computers and accessibility, Tempe, AZ.
Rudzicz, F. (2009). Applying discretized articulatory knowledge to dysarthric speech. In: Proceedings of the 2009 IEEE international conference on acoustics, speech, and signal processing (ICASSP 09), Taipei, Taiwan.
Rudzicz, F. (2010). Adaptive kernel canonical correlation analysis for estimation of task dynamics from acoustics. In: Proceedings of the 2010 IEEE international conference on acoustics, speech, and signal processing (ICASSP10), Dallas, TX.
Shi, G., Aarabi, P., & Jiang, H. (2007). Phase-based dual-microphone speech enhancement using a prior speech model. IEEE Transactions on Audio, Speech, and Language Processing, 15(1), 109–118.
Toda, T., Black, A. W., & Tokuda, K. (2008). Statistical mapping between articulatory movements and acoustic spectrum using a Gaussian mixture model. Speech Communication, 50(3), 215–227. doi:10.1016/j.specom.2007.09.001.
Tsai, R. Y. (1987). A versatile camera calibration technique for high-accuracy 3D machine vision metrology using off-the-shelf TV cameras and lenses. IEEE Journal of Robotics and Automation, RA-3(4), 323–344.
van Lieshout, P., Hulstijn, W., Alfonso, P. J., & Peters, H. F. (1997). Higher and lower order influences on the stability of the dynamic coupling between articulators. In: W. Hulstijn, H. F. Peters, & P. van Lieshout (Eds.), Speech production: Motor control, brain research and fluency disorders (pp. 161–170). Amsterdam: Elsevier Science Publishers.
van Lieshout, P., Merrick, G., & Goldstein, L. (2008). An articulatory phonology perspective on rhotic articulation problems: A descriptive case study. Asia Pacific Journal of Speech, Language, and Hearing, 11(4), 283–303.
Webber, S. G. (2005). Webber photo cards: Story starters.
Westbury, J. R. (1994). X-ray microbeam speech production database user’s handbook. Waisman Center on Mental Retardation & Human Development.
Wrench, A. (1999). The MOCHA-TIMIT articulatory database. http://www.cstr.ed.ac.uk/research/projects/artic/mocha.html.
Yorkston, K. M., & Beukelman, D. R. (1981). Assessment of intelligibility of dysarthric speech. Tigard, OR: C.C. Publications Inc.
Yunusova, Y., Weismer, G., Westbury, J. R., & Lindstrom, M. J. (2008). Articulatory movements during vowels in speakers with dysarthria and healthy controls. Journal of Speech, Language, and Hearing Research, 51, 596–611.
Yunusova, Y., Green, J. R., & Mefferd, A. (2009). Accuracy assessment for AG500, electromagnetic articulograph. Journal of Speech, Language, and Hearing Research, 52, 547–555.
Zierdt, A., Hoole, P., & Tillmann, H. G. (1999). Development of a system for three-dimensional fleshpoint measurement of speech movements. In: Proceedings of the XIVth international congress of phonetic sciences, p. 3.
Zierdt, A., Hoole, P., Honda, M., Kaburagi, T., & Tillmann, H. G. (2000). Extracting tongues from moving heads. In: Proceedings of the 5th speech production seminar, pp. 313–316.
Zue, V., Seneff, S., & Glass, J. (1989). Speech database development: TIMIT and beyond. In: Proceedings of ESCA tutorial and research workshop on speech input/output assessment and speech databases (SIOA-1989), Noordwijkerhout, The Netherlands, Vol. 2, pp. 35–40.


Acknowledgments

The authors acknowledge the support of the Toronto Rehabilitation Institute, which receives funding under the Provincial Rehabilitation Research Program from the Ministry of Health and Long-Term Care in Ontario. The views expressed do not necessarily reflect those of the Ministry. Equipment and space have been funded with grants from the Canada Foundation for Innovation, the Ontario Innovation Trust, and the Ministry of Research and Innovation. This project is also funded by Bell University Labs, the University of Toronto, and the Natural Sciences and Engineering Research Council of Canada.

Author information

Authors and Affiliations

Department of Computer Science, University of Toronto, Toronto, ON, Canada
Frank Rudzicz
The Speech and Stuttering Institute, Toronto, ON, Canada
Aravind Kumar Namasivayam
Oral Dynamics Laboratory, Department of Speech-Language Pathology, University of Toronto, Toronto, ON, Canada
Aravind Kumar Namasivayam
Holland Bloorview Kids Rehabilitation Hospital, Toronto, ON, Canada
Talya Wolff


Corresponding author

Correspondence to Frank Rudzicz.

Appendix: Articulatory contrasts

See Table 3.

Table 3 Articulatory contrasts, from Kent et al. (1989)


About this article

Cite this article

Rudzicz, F., Namasivayam, A.K. & Wolff, T. The TORGO database of acoustic and articulatory speech from speakers with dysarthria. Lang Resources & Evaluation 46, 523–541 (2012). https://doi.org/10.1007/s10579-011-9145-0


Issue Date: December 2012
