Supervised Machine Learning Model for Accent Recognition in English Speech Using Sequential MFCC Features

  • Conference paper
Advances in Artificial Intelligence and Data Engineering (AIDE 2019)

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 1133)

Abstract

Human–machine interfaces are evolving rapidly, moving from traditional input methods such as keyboard and mouse to modern ones such as gestures and voice. Improving voice recognition and response is imperative because the worldwide market for technologies that use this interface is growing. The majority of English speakers around the world have accents to which speech recognition systems have had little exposure. To bridge the comprehension gap between these systems and their users, the systems need to be tuned to the accent of the user. Accent classification is therefore an important feature that can increase the comprehension accuracy of speech recognition systems. This paper distinguishes Indian and American English speakers by accent: it constructs sequential MFCC (Mel-frequency cepstral coefficient) features from the frames of each audio sample, oversamples the under-represented data, and applies supervised learning techniques. The accuracies of these techniques reach a maximum of 95%, with an average of 76%. Neural networks emerge as the top classifier and perform best across the evaluation metrics. The results indicate that concatenating MFCC features sequentially and applying an appropriate supervised learning technique to the data provides a good solution to the problem of detecting and classifying accents.
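The two data-preparation steps named in the abstract, concatenating per-frame MFCC vectors and oversampling the under-represented class, can be sketched roughly as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the abstract does not give the number of frames, the number of coefficients, or the oversampling method, so the helper names `flatten_frames` and `random_oversample`, the zero-padding of short clips, the choice of random duplication, and the toy values are all hypothetical. The per-frame MFCC vectors are assumed to have been computed beforehand by an audio library.

```python
import random
from collections import Counter


def flatten_frames(mfcc_frames, n_frames):
    """Concatenate the first n_frames per-frame MFCC vectors into one flat vector.

    Clips with fewer than n_frames frames are zero-padded (an assumption)
    so that every sample ends up with the same length.
    """
    n_coeffs = len(mfcc_frames[0])
    frames = list(mfcc_frames[:n_frames])
    frames += [[0.0] * n_coeffs] * (n_frames - len(frames))
    return [coeff for frame in frames for coeff in frame]


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen samples of smaller classes until all classes match the largest."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, count in counts.items():
        pool = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - count):
            X_out.append(rng.choice(pool))
            y_out.append(label)
    return X_out, y_out


# Toy data (hypothetical): each clip is a list of 2-coefficient frames plus a label.
clips = [
    ([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]], "american"),
    ([[0.5, 0.5], [1.5, 1.5]], "american"),
    ([[9.0, 8.0]], "american"),
    ([[7.0, 7.0], [6.0, 6.0]], "indian"),
]

X = [flatten_frames(frames, n_frames=2) for frames, _ in clips]
y = [label for _, label in clips]
X_bal, y_bal = random_oversample(X, y)
```

After these steps every clip is a fixed-length vector and both classes have the same number of samples, so `X_bal` and `y_bal` could be passed to any supervised classifier. In practice the oversampling should be applied only to the training split, so that duplicated samples do not leak into the evaluation data.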

Author information

Corresponding author

Correspondence to Dweepa Honnavalli.

Copyright information

© 2021 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Honnavalli, D., Shylaja, S.S. (2021). Supervised Machine Learning Model for Accent Recognition in English Speech Using Sequential MFCC Features. In: Chiplunkar, N.N., Fukao, T. (eds) Advances in Artificial Intelligence and Data Engineering. AIDE 2019. Advances in Intelligent Systems and Computing, vol 1133. Springer, Singapore. https://doi.org/10.1007/978-981-15-3514-7_5
