Abstract
Human–machine interfaces are rapidly evolving, moving from traditional input methods such as keyboard and mouse to modern ones such as gestures and voice. Improving voice recognition and response is imperative, given the growing worldwide market for technologies that use this interface. The majority of English speakers around the world have accents to which speech recognition systems have had little large-scale exposure. To bridge the comprehension gap between these systems and their users, the systems need to be tuned to the accent of the user. Accent classification is therefore an important feature for increasing the comprehension accuracy of speech recognition systems. This paper distinguishes Indian and American English speakers by their accents: sequential MFCC features are constructed from the frames of each audio sample, the under-represented class is oversampled, and supervised learning techniques are applied. The accuracies of these techniques reach a maximum of 95%, with an average of 76%. Neural networks emerge as the top classifier, performing best across the evaluation metrics. The results indicate that concatenating MFCC features sequentially and applying an apposite supervised learning technique to the data provide a good solution to the problem of detecting and classifying accents.
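The pipeline the abstract describes can be sketched as follows. This is a minimal illustration, not the authors' implementation: in practice `librosa.feature.mfcc(y=audio, sr=sr, n_mfcc=13)` would supply real per-frame MFCCs, and the abstract does not state which oversampling method was used, so synthetic features stand in for MFCCs here and simple random duplication stands in for the oversampling step (imbalanced-learn's `RandomOverSampler` or SMOTE would do this more robustly). The class labels, feature dimensions, and accuracy are illustrative assumptions.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

def sequential_mfcc_features(n_samples, n_mfcc=13, n_frames=20, shift=0.0):
    """Stand-in for MFCC extraction: real code would compute a (n_mfcc, n_frames)
    array per audio sample and concatenate the frames in order into one
    sequential feature vector."""
    frames = rng.normal(loc=shift, scale=1.0, size=(n_samples, n_mfcc, n_frames))
    return frames.reshape(n_samples, n_mfcc * n_frames)

# Imbalanced toy data: 200 samples of one accent class, 50 of the other.
X = np.vstack([sequential_mfcc_features(200, shift=0.0),
               sequential_mfcc_features(50, shift=1.0)])
y = np.array([0] * 200 + [1] * 50)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# Oversample the under-represented class by random duplication until the
# training classes are balanced.
minority = np.flatnonzero(y_train == 1)
extra = rng.choice(minority, size=(y_train == 0).sum() - minority.size, replace=True)
X_bal = np.vstack([X_train, X_train[extra]])
y_bal = np.concatenate([y_train, y_train[extra]])

# Neural network classifier, the top performer reported in the abstract.
clf = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500, random_state=0)
clf.fit(X_bal, y_bal)
print(f"test accuracy: {clf.score(X_test, y_test):.2f}")
```

Concatenating the frame-wise coefficients preserves their temporal order, which is what makes the features "sequential" rather than, say, averaged over the whole clip.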
Copyright information
© 2021 Springer Nature Singapore Pte Ltd.
Cite this paper
Honnavalli, D., Shylaja, S.S. (2021). Supervised Machine Learning Model for Accent Recognition in English Speech Using Sequential MFCC Features. In: Chiplunkar, N.N., Fukao, T. (eds) Advances in Artificial Intelligence and Data Engineering. AIDE 2019. Advances in Intelligent Systems and Computing, vol 1133. Springer, Singapore. https://doi.org/10.1007/978-981-15-3514-7_5
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-3513-0
Online ISBN: 978-981-15-3514-7
eBook Packages: Intelligent Technologies and Robotics (R0)