
A comparative study for Arabic speech recognition system in noisy environments

Published in: International Journal of Speech Technology

Abstract

Speech recognition in noisy environments is a long-standing research theme and remains an important challenge today. Consequently, much research investigates techniques and approaches to improve the performance of speech recognition systems, even under poor conditions. This paper presents a comparative study under various conditions based on two architectures: hybrid GMM-HMM models built with the CMU Sphinx tools and hybrid DNN-HMM models built with the Kaldi toolkit, both evaluated in noisy environments. We compare the hybrid GMM-HMM and hybrid DNN-HMM models to evaluate the performance of the proposed system. The novelty of this paper is to test whether these tools can recognize Arabic speech with good accuracy, particularly in noisy environments. In addition, we adopt noisy training for both the GMM-HMM and DNN-HMM models. We use the public Arabic Speech Corpus for Isolated Words (20 words), three noise levels, and three noise types. The implementation of our system consists of two phases: feature extraction using Mel-frequency cepstral coefficients (MFCCs), followed by a classification phase that applies each of the two models separately. To evaluate the performance of these methods, simulations are presented for different SNRs and different noise types.
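The noisy-training setup described above mixes noise into clean utterances at controlled SNR levels. The paper's exact mixing procedure is not reproduced here; the following is a minimal NumPy sketch, assuming additive noise scaled so the mixture reaches a target SNR in dB (the function name and the synthetic signals are illustrative, not from the paper):

```python
import numpy as np

def mix_at_snr(clean, noise, snr_db):
    """Add `noise` to `clean`, scaled so the mixture has the requested SNR (dB)."""
    # Tile or trim the noise to match the length of the clean signal.
    if len(noise) < len(clean):
        noise = np.tile(noise, int(np.ceil(len(clean) / len(noise))))
    noise = noise[: len(clean)]
    p_clean = np.mean(clean ** 2)
    p_noise = np.mean(noise ** 2)
    # Choose a scale so that 10*log10(p_clean / p_scaled_noise) == snr_db.
    scale = np.sqrt(p_clean / (p_noise * 10 ** (snr_db / 10)))
    return clean + scale * noise

# Demo: a sine-tone "utterance" plus white noise mixed at 10 dB SNR.
rng = np.random.default_rng(0)
clean = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)
noise = rng.standard_normal(8000)
noisy = mix_at_snr(clean, noise, 10.0)

# Verify the achieved SNR of the mixture.
achieved = 10 * np.log10(np.mean(clean ** 2) / np.mean((noisy - clean) ** 2))
print(round(achieved, 1))  # 10.0
```

The same routine can generate training and test sets at several SNR levels (e.g. 0, 5, and 10 dB) from one clean corpus, which is the essence of noisy training.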





Author information


Correspondence to Abdelkbir Ouisaadane.



About this article


Cite this article

Ouisaadane, A., Safi, S. A comparative study for Arabic speech recognition system in noisy environments. Int J Speech Technol 24, 761–770 (2021). https://doi.org/10.1007/s10772-021-09847-7

