On the impact of dysarthric speech on contemporary ASR cloud platforms

De Russis, Luigi; Corno, Fulvio

doi:10.1007/s40860-019-00085-y

Original Article
Published: 06 July 2019

Volume 5, pages 163–172, (2019)

Journal of Reliable Intelligent Environments

Abstract

The spread of voice-driven devices has a positive impact on people with disabilities in smart environments, since such devices allow them to perform daily activities that were previously difficult or impossible. As a result, their quality of life and autonomy increase. However, the speech recognition technology employed in such devices performs poorly for people with communication disorders such as dysarthria. People with dysarthria may be unable to control their smart environments, at least with the needed proficiency; this problem may negatively affect the perceived reliability of the entire environment. Using the TORGO database of speech samples recorded from people with dysarthria, this paper compares the accuracy of dysarthric speech recognition achieved by three speech recognition cloud platforms: IBM Watson Speech-to-Text, Google Cloud Speech, and Microsoft Azure Bing Speech. Such services are used in many virtual assistants deployed in smart environments, such as Google Home. The goal is to investigate whether such cloud platforms are usable to recognize dysarthric speech, and to understand which of them is the most suitable for people with dysarthria. Results suggest that the three platforms have comparable performance in recognizing dysarthric speech and that recognition accuracy is related to the speaker's speech intelligibility. Overall, the platforms are limited when speech intelligibility is low (word error rate of 80–90%), while the word error rate improves to 15–25% for people without abnormality in their speech intelligibility.
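
The abstract reports results as word error rate (WER). As a point of reference, the sketch below computes WER under its standard definition: the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and a recognizer's output, divided by the number of reference words. The function name `word_error_rate` and the lowercasing and whitespace tokenization are assumptions made for illustration; the scoring procedure actually used in the paper is not shown in this preview.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (substitutions + deletions + insertions) / number of reference words.

    Assumption: transcripts are lowercased and split on whitespace; the paper's
    own text normalization may differ.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] is the edit distance between the reference words processed so far
    # (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from output
                curr[j - 1] + 1,     # insertion: extra word in output
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical transcripts, not taken from TORGO.
print(word_error_rate("turn on the light", "turn the night"))
```

With these made-up transcripts, one word is deleted and one is substituted out of four reference words, giving a WER of 0.5. Because insertions are counted, WER can exceed 100% when the recognizer outputs many spurious words.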

Acknowledgements

The authors would like to thank Fabio Ballati for his contribution to the data analysis and for the software implementation to interact with each ASR cloud platform.

Author information

Authors and Affiliations

Politecnico di Torino, Dipartimento di Automatica e Informatica, Corso Duca degli Abruzzi 24, 10129 Turin, Italy
Luigi De Russis & Fulvio Corno

Corresponding author

Correspondence to Luigi De Russis.

About this article

Cite this article

De Russis, L., Corno, F. On the impact of dysarthric speech on contemporary ASR cloud platforms. J Reliable Intell Environ 5, 163–172 (2019). https://doi.org/10.1007/s40860-019-00085-y

Received: 07 February 2019
Accepted: 02 July 2019
Published: 06 July 2019
Issue Date: 01 September 2019
DOI: https://doi.org/10.1007/s40860-019-00085-y
