A comparative study for Arabic speech recognition system in noisy environments

Ouisaadane, Abdelkbir; Safi, Said

doi:10.1007/s10772-021-09847-7

A comparative study for Arabic speech recognition system in noisy environments

Published: 27 April 2021

Volume 24, pages 761–770, (2021)
Cite this article

International Journal of Speech Technology Aims and scope Submit manuscript

374 Accesses
8 Citations
Explore all metrics

Abstract

Speech recognition in noisy environments is one of the long-standing research themes but remains a very important challenge nowadays. Therefore, there is much research into all techniques and approaches to improve the performance of speech recognition systems, even in poor conditions. This paper presents a comparative study under various conditions based on two architectures (GMM-HMM and DNN-HMM), the Hybrid GMM-HMM models using the CMU Sphinx tools and the Hybrid DNN-HMM using the KALDI toolkit in noise environment. In this study, we compare the Hybrid GMM-HMM models and the Hybrid DNN-HMM models to evaluate the performance of the proposed system. The novelty of this paper is to test if the presented tools could be, with good accuracy, recognize the Arabic speech principally in noisy environment. In addition, we adopted the noisy training theory in this paper based on GMM-HMM and DNN-HMM model. We use the public Arabic Speech Corpus for Isolated Words (20 words), three noise levels, and three noise types. The implementation of our system consists of two phases: Features extraction using Mel-frequency Cepstral Coefficient (MFCC) and the classification phase will use separately the previous two models. In order to test the performance of these methods a simulation will presented for different SNR and for different district type of noise.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

An experimental framework for Arabic digits speech recognition in noisy environments

Article 03 February 2017

Biologically inspired Continuous Arabic Speech Recognition

Using geometric spectral subtraction approach for feature extraction for DSR front-end Arabic system

Article 26 June 2017

References

Abdel-Hamid, O., Mohamed, A., Jiang, H., Deng, L., Penn, G., & Yu, D. (2014). Convolutional neural networks for speech recognition. IEEE Transactions on Audio, Speech and Language Processing, 22(10), 1533–1545.
Article Google Scholar
Abdou, S.M., & Moussa, A.M. (2017). Arabic Speech Recognition: Challenges and State of the Art. In: Computational linguistics, speech and image processing for arabic language, vol. 4, (pp. 1–27), World Scientific.
Abushariah, A.A.M., Gunawan, T.S., Khalifa, O.O., & Abushariah, M.A.M. (2010). English digits speech recognition system based on Hidden Markov Models. In: International conference on computer and communication engineering (ICCCE’10), (pp. 1–5).
Alalshekmubara, A.K., & Smith, L.S. (2014). On improving the classification capability of reservoir computing for Arabic speech recognition. In: Artificial neural networks and machine learning – ICANN 2014, (pp. 225–232).
Alalshekmubarak, A., & Smith, L.S. (2014). On improving the classification capability of reservoir computing for Arabic speech recognition. In: Artificial neural networks and machine learning – ICANN 2014, Cham, (pp. 225–232).
Alotaibi, Y.A. (2004). Spoken Arabic digits recognizer using recurrent neural networks. In: Proceedings of the fourth IEEE international symposium on signal processing and information technology, (pp. 195–199).
Alsharhan, E., & Ramsay, A. (2020). Investigating the effects of gender, dialect, and training size on the performance of Arabic speech recognition. Language Resources and Evaluation, 54(4), 975–998. https://doi.org/10.1007/s10579-020-09505-5.
Article Google Scholar
Aymen, M., Abdelaziz, A., Halim, S., & Maaref, H. (2011). hidden Markov models for automatic speech recognition. In: 2011 International conference on communications, computing and control applications (CCCA), (pp. 1–6).
Barker, J., Marxer, R., Vincent, E., & Watanabe, S. (2017). The third “CHiME” speech separation and recognition challenge: Analysis and outcomes. Computer Speech & Language, 46, 605–626.
Article Google Scholar
Chien, J.-T., & Misbullah, A. (2016). Deep long short-term memory networks for speech recognition. In: 2016 10th international symposium on Chinese spoken language processing (ISCSLP), (pp. 1–5).
Elmahdy, M., Gruhn, R., & Minker, W. (2012). Novel techniques for dialectal Arabic speech recognition. New York: Springer.
Book Google Scholar
El-Ramly, S.H., Abdel-Kader, N.S., & El-Adawi, R. (2002). Neural networks used for speech recognition. In: Proceedings of the nineteenth national radio science conference, (pp. 200–207).
Eltiraifi, O., Elbasheer, E., & Nawari, M. (2018). A comparative study of MFCC and LPCC features for speech activity detection using deep belief network. In: 2018 international conference on computer, control, electrical, and electronics engineering (ICCCEEE), (pp. 1–5).
Graves, A., Mohamed, A., & Hinton, G. (2013). Speech recognition with deep recurrent neural networks. In: 2013 IEEE international conference on acoustics, speech and signal processing, (pp. 6645–6649).
Guerid, A., & Houacine, A. (2019). Recognition of isolated digits using DNN–HMM and harmonic noise model. IET Signal Processing, 13(2), 207–214.
Article Google Scholar
Haton, J.-P. (1999). Neural networks for automatic speech recognition: A review. In G. Chollet, M.-G. Di Benedetto, A. Esposito, & M. Marinaro (Eds.), Speech processing, recognition and artificial neural networks (pp. 259–280). London: Springer.
Chapter Google Scholar
Hinton, G., et al. (2012). Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups. IEEE Signal Processing Magazine, 29(6), 82–97.
Article Google Scholar
Huggins-Daines, D., Kumar, M., Chan, A., Black, A.W., Ravishankar, M., & Rudnicky, A.I. (2006). Pocketsphinx: A free, real-time continuous speech recognition system for hand-held devices. In 2006 IEEE international conference on acoustics speech and signal processing proceedings, vol. 1, (pp. I–I).
Juang, B. H., & Rabiner, L. R. (1991). Hidden Markov models for speech recognition. Technometrics, 33(3), 251–272.
Article MathSciNet Google Scholar
Kadyan, V., Mantri, A., Aggarwal, R. K., & Singh, A. (2019). A comparative study of deep neural network based Punjabi-ASR system. International Journal of Speech Technology, 22(1), 111–119.
Article Google Scholar
Khelifa, M. O. M., Elhadj, Y. M., Abdellah, Y., & Belkasmi, M. (2017). Constructing accurate and robust HMM/GMM models for an Arabic speech recognition system. International Journal of Speech Technology, 20(4), 937–949.
Article Google Scholar
Le Prell, C. G., & Clavier, O. H. (2017). Effects of noise on speech recognition: Challenges for communication by service members. Hearing Research, 349, 76–89.
Article Google Scholar
Manchanda, S., & Gupta, D. (2018). Hybrid approach of feature extraction and vector quantization in speech recognition. In: Proceedings of the second international conference on computational intelligence and informatics, (pp. 639–645). Singapore.
Novoa, J., Wuth, J., Escudero, J.P., Fredes, J., Mahu, R., & Yoma, N.B. DNN-HMM based Automatic Speech Recognition for HRI Scenarios. In Proceedings of the 2018 ACM/IEEE international conference on human–robot interaction, Chicago, USA, (pp. 150–159).
Oueslati, O., Cambria, E., HajHmida, M. B., & Ounelli, H. (2020). A review of sentiment analysis research in Arabic language. Future Generation Computer Systems, 112, 408–430. https://doi.org/10.1016/j.future.2020.05.034.
Article Google Scholar
Pearce, D., Hirsch, H., & Gmbh, E.E.D. (2000). The aurora experimental framework for the performance evaluation of speech recognition systems under noisy conditions. In: ISCA ITRW ASR2000, (pp. 29–32).
Povey, D., Ghoshal, A., Boulianne, G., Burget, L., Glembek, O., Goel, N., & Vesely, K. (2011). The Kaldi speech recognition toolkit. In IEEE 2011 workshop on automatic speech recognition and understanding (No. CONF). IEEE Signal Processing Society.
Rabiner, L. R. (1989). A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE, 77(2), 257–286.
Article Google Scholar
Roberts, W.J.J., & Willmore, J.P. (1999). Automatic speaker recognition using Gaussian mixture models. In: Proceedings 1999 information, decision and control. Data and information fusion symposium, signal processing and communications symposium and decision and control symposium. (Cat. No.99EX251), (pp. 465–470).
Senior, A., Heigold, G., Bacchiani, M., & Liao, H. (2014). GMM-free DNN acoustic model training. In 2014 IEEE international conference on acoustics, speech and signal processing (ICASSP), pp. 5602–5606.
Senior, A., Heigold, G., Bacchiani, M., & Liao, H. (2014). GMM-free DNN acoustic model training. In: 2014 IEEE international conference on acoustics, speech and signal processing (ICASSP), (pp. 5602–5606).
Shahin, M., Ahmed, B., McKechnie, J., Ballard, K., & Gutierrez-Osuna, R. (2014). Comparison of GMM-HMM and DNN-HMM based pronunciation verification techniques for use in the assessment of childhood apraxia of speech. In: 15th Annual conference of the international speech communication association (INTERSPEECH 2014): celebrating the diversity of spoken languages, (pp. 1583–1587).
Varga, A., & Steeneken, H. J. M. (1993). Assessment for automatic speech recognition: II. NOISEX-92: A database and an experiment to study the effect of additive noise on speech recognition systems. Speech Communication, 12(3), 247–251.
Article Google Scholar
Vegesna, V. V. R., Gurugubelli, K., Vydana, H. K., Pulugandla, B., Shrivastava, M., & Vuppala, A. K. (2017). DNN-HMM acoustic modeling for large vocabulary telugu speech recognition. Mining Intelligence and Knowledge Exploration. https://doi.org/10.1007/978-3-319-71928-3_19.
Article Google Scholar
Wahyuni, E.S. (2017). Arabic speech recognition using MFCC feature extraction and ANN classification. In 2017 2nd International conferences on information technology, information systems and electrical engineering, (pp. 22–25).
Yoshioka, T., Karita, S., & Nakatani, T. (2015). Far-field speech recognition using CNN-DNN-HMM with convolution in time. In: 2015 IEEE international conference on acoustics, speech and signal processing (ICASSP), (pp. 4360–4364).
Yu, D., & Deng, L. (2015). Automatic speech recognition: A deep learning approach. London: Springer.
MATH Google Scholar
Zhang, W., Zhai, M., Huang, Z., Liu, C., Li, W., & Cao, Y. (2019). Towards end-to-end speech recognition with deep multipath convolutional neural networks. In W. Zhang, M. Zhai, Z. Huang, C. Liu, W. Li, & Y. Cao (Eds.), Intelligent robotics and applications. Cham: Springer.
Google Scholar

Download references

Author information

Authors and Affiliations

Department of Mathematics and Computer Science, Polydisciplinary Faculty, Sultan Moulay Slimane University, Benimellal, Morocco
Abdelkbir Ouisaadane & Said Safi

Authors

Abdelkbir Ouisaadane
View author publications
You can also search for this author in PubMed Google Scholar
Said Safi
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Abdelkbir Ouisaadane.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Ouisaadane, A., Safi, S. A comparative study for Arabic speech recognition system in noisy environments. Int J Speech Technol 24, 761–770 (2021). https://doi.org/10.1007/s10772-021-09847-7

Download citation

Received: 29 June 2020
Accepted: 04 April 2021
Published: 27 April 2021
Issue Date: September 2021
DOI: https://doi.org/10.1007/s10772-021-09847-7

Keywords

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

A comparative study for Arabic speech recognition system in noisy environments

Abstract

Access this article

Similar content being viewed by others

An experimental framework for Arabic digits speech recognition in noisy environments

Biologically inspired Continuous Arabic Speech Recognition

Using geometric spectral subtraction approach for feature extraction for DSR front-end Arabic system

References

Author information

Authors and Affiliations

Corresponding author

Additional information

Publisher's Note

Rights and permissions

About this article

Cite this article

Keywords

Navigation

A comparative study for Arabic speech recognition system in noisy environments

Abstract

Access this article

Similar content being viewed by others

An experimental framework for Arabic digits speech recognition in noisy environments

Biologically inspired Continuous Arabic Speech Recognition

Using geometric spectral subtraction approach for feature extraction for DSR front-end Arabic system

References

Author information

Authors and Affiliations

Corresponding author

Additional information

Publisher's Note

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation