Emergency sign language recognition from variant of convolutional neural network (CNN) and long short term memory (LSTM) models

(1) * Muhammad Amir As'ari (Department of Biomedical Engineering and Health Sciences, Faculty of Electrical Engineering, Universiti Teknologi Malaysia, Johor Bahru, 81310, Malaysia)
(2) Nur Anis Jasmin Sufri (Department of Biomedical Engineering and Health Sciences, Faculty of Electrical Engineering, Universiti Teknologi Malaysia, Malaysia)
(3) Guat Si Qi (Department of Biomedical Engineering and Health Sciences, Faculty of Electrical Engineering, Universiti Teknologi Malaysia, Malaysia)
*corresponding author

Abstract


Sign language is the primary communication tool used by the deaf community and people with speaking difficulties, especially during emergencies. Numerous deep learning models have been proposed to solve the sign language recognition problem. Recently, Bidirectional LSTM (BLSTM) has been proposed as a replacement for Long Short-Term Memory (LSTM), as it may improve the learning of long-term dependencies and increase model accuracy. However, there has been little comparison of the performance of LSTM and BLSTM within the Long-term Recurrent Convolutional Network (LRCN) architecture for sign language interpretation. Therefore, this study focused on an in-depth analysis of the LRCN model, including 1) training the CNN from scratch and 2) modeling with the pre-trained CNNs VGG-19 and ResNet50. In addition, the ConvLSTM model, a special variant of LSTM designed for video input, was modeled and compared with the LRCN for emergency sign language recognition. Within the LRCN variants, the performance of a small CNN network was compared with pre-trained VGG-19 and ResNet50V2. A dataset of emergency Indian Sign Language with eight classes was used to train the models. The best-performing model was VGG-19 + LSTM, with a testing accuracy of 96.39%. The small LRCN networks, 5 CNN subunits + LSTM and 4 CNN subunits + BLSTM, achieved 95.18% testing accuracy, on par with the best-proposed VGG-19 + LSTM model. By incorporating bidirectional LSTM (BLSTM) into deep learning models, the ability to learn long-term dependencies can be improved, which can enhance accuracy in reading sign language and lead to more effective communication during emergencies.

Keywords


Sign Language; Bidirectional Long Short Term Memory; Convolutional Neural Networks

   

DOI

https://doi.org/10.26555/ijain.v10i1.1170
      





Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

___________________________________________________________
International Journal of Advances in Intelligent Informatics
ISSN 2442-6571  (print) | 2548-3161 (online)
Organized by UAD and ASCEE Computer Society
Published by Universitas Ahmad Dahlan
W: http://ijain.org
E: info@ijain.org (paper handling issues)
   andri.pranolo.id@ieee.org (publication issues)
