Abstract
Previously, we have shown that automatic speech recognition (ASR) technology can be used to evaluate pathologic speech objectively. Here we report on progress toward routine clinical use: 1) We introduce an easy-to-use recording and evaluation environment. 2) We confirm our previous results for a larger group of patients. 3) We show that telephone speech can be analyzed with the same methods, with only a small loss of agreement with human experts. 4) We show that prosodic information leads to more robust results. 5) We show that the reference text, instead of a transliteration, can be used for evaluation. Using the word accuracy of a speech recognizer and prosodic features as input features for support vector machine (SVM) regression, we achieve a correlation of 0.90 between the automatic analysis and the human experts.
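The abstract relies on two measurable quantities: the word accuracy (WA) of a speech recognizer, computed against the reference text, and the correlation between the automatic scores and the expert ratings. The sketch below shows how both could be computed. It assumes the common definition WA = 100 · (1 − (S + D + I) / N), where S, D and I are the substitutions, deletions and insertions of a minimum-cost (Levenshtein) word alignment and N is the number of reference words, and it assumes Pearson's r as the correlation measure. The abstract spells out neither choice. The function names are hypothetical, and the prosodic features and the SVM regression step are not shown.

```python
"""Hypothetical helpers illustrating word accuracy (WA) and Pearson's r.

Assumption: WA = 100 * (1 - (S + D + I) / N) from a Levenshtein word alignment.
This is a common definition, not necessarily the exact one used in the paper.
"""
from math import sqrt
from typing import Sequence


def word_accuracy(reference: Sequence[str], hypothesis: Sequence[str]) -> float:
    """Word accuracy in percent; can be negative when there are many insertions."""
    n = len(reference)
    if n == 0:
        raise ValueError("reference must contain at least one word")
    m = len(hypothesis)
    # prev[j]: edit distance between the previous reference prefix and hypothesis[:j].
    # The initial row is the empty reference prefix, i.e. j insertions.
    prev = list(range(m + 1))
    for i in range(1, n + 1):
        curr = [i] + [0] * m  # i deletions against the empty hypothesis prefix
        for j in range(1, m + 1):
            cost = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # deletion of a reference word
                          curr[j - 1] + 1,     # insertion of a hypothesis word
                          prev[j - 1] + cost)  # match (0) or substitution (1)
        prev = curr
    errors = prev[m]  # minimum S + D + I
    return 100.0 * (1.0 - errors / n)


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between automatic scores x and expert ratings y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


if __name__ == "__main__":
    ref = "the north wind and the sun".split()
    hyp = "the wind and a sun".split()
    print(round(word_accuracy(ref, hyp), 2))      # 66.67: one deletion, one substitution
    print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # 1.0
```

Insertions count as errors, so WA can fall below zero for very disfluent recordings. In a setup like the one described, the per-speaker WA values and prosodic features would be the regression inputs, and the regression outputs would then be correlated with the expert ratings. The exact feature set and SVM configuration are not given in this abstract.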
References
Likert, R.: A Technique for the Measurement of Attitudes. Archives of Psychology 140 (1932)
Schuster, M., Maier, A., Haderlein, T., Nkenke, E., Wohlleben, U., Rosanowski, F., Eysholdt, U., Nöth, E.: Evaluation of Speech Intelligibility for Children with Cleft Lip and Palate by Means of Automatic Speech Recognition. International Journal of Pediatric Otorhinolaryngology 70, 1741–1747 (2006)
Schuster, M., Haderlein, T., Nöth, E., Lohscheller, J., Eysholdt, U., Rosanowski, F.: Intelligibility of Laryngectomees’ Substitute Speech: Automatic Speech Recognition and Subjective Rating. European Archives of Oto-Rhino-Laryngology and Head & Neck 263, 188–193 (2006)
Schuster, M., Nöth, E., Haderlein, T., Steidl, S., Batliner, A., Rosanowski, F.: Can you Understand him? Let’s Look at his Word Accuracy – Automatic Evaluation of Tracheoesophageal Speech. vol. 1, pp. 61–64
Brown, D., Hilgers, F., Irish, J., Balm, A.: Postlaryngectomy Voice Rehabilitation: State of the Art at the Millennium. World J. Surg. 27(7), 824–831 (2003)
Schutte, H., Nieboer, G.: Aerodynamics of esophageal voice production with and without a Groningen voice prosthesis. Folia Phoniatrica et Logopaedica 54, 8–18 (2002)
Robbins, J., Fisher, H., Blom, E., Singer, M.: A Comparative Acoustic Study of Normal, Esophageal, and Tracheoesophageal Speech Production. Journal of Speech and Hearing Disorders 49, 202–210 (1984)
Bellandese, M., Lerman, J., Gilbert, H.: An Acoustic Analysis of Excellent Female Esophageal, Tracheoesophageal, and Laryngeal Speakers. Journal of Speech, Language, and Hearing Research 44, 1315–1320 (2001)
Stemmer, G.: Modeling Variability in Speech Recognition. Studien zur Mustererkennung, vol. 19. Logos Verlag, Berlin (2005)
Schukat-Talamazzini, E., Niemann, H., Eckert, W., Kuhn, T., Rieck, S.: Automatic Speech Recognition without Phonemes. In: Proc. European Conf. on Speech Communication and Technology, Berlin, vol. 1, pp. 111–114 (1993)
Batliner, A., Buckow, A., Niemann, H., Nöth, E., Warnke, V.: The Prosody Module. In: Wahlster, W. (ed.) Verbmobil: Foundations of Speech-to-Speech Translation, pp. 106–121. Springer, Berlin (2000)
Haderlein, T., Nöth, E., Schuster, M., Eysholdt, U., Rosanowski, F.: Evaluation of Tracheoesophageal Substitute Voices Using Prosodic Features. In: Proc. of 3rd International Conference on Speech Prosody, Dresden, pp. 701–704 (2006)
Wahlster, W. (ed.): Verbmobil: Foundations of Speech-to-Speech Translation. Springer, Berlin (2000)
Davies, M., Fleiss, J.: Measuring agreement for multinomial data. Biometrics 38(4), 1047–1051 (1982)
Cohen, J., Cohen, P.: Applied Multiple Regression/Correlation Analysis for the Behavioral Sciences. Lawrence Erlbaum Associates, Hillsdale, New Jersey (1983)
Platt, J.: Fast Training of Support Vector Machines using Sequential Minimal Optimization. In: Schölkopf, B., Burges, C., Smola, A. (eds.) Advances in Kernel Methods – Support Vector Learning, pp. 185–208. MIT Press, Cambridge (1999)
Smola, A., Schölkopf, B.: A Tutorial on Support Vector Regression. In: NeuroCOLT2 Technical Report Series, NC2-TR-1998-030 (1998)
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Nöth, E., Maier, A., Haderlein, T., Riedhammer, K., Rosanowski, F., Schuster, M. (2007). Automatic Evaluation of Pathologic Speech – from Research to Routine Clinical Use. In: Matoušek, V., Mautner, P. (eds) Text, Speech and Dialogue. TSD 2007. Lecture Notes in Computer Science(), vol 4629. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74628-7_39
DOI: https://doi.org/10.1007/978-3-540-74628-7_39
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-74627-0
Online ISBN: 978-3-540-74628-7
eBook Packages: Computer Science, Computer Science (R0)