Abstract
The deaf community relies on sign language as the primary means of communication. For the millions of people around the world who suffer from hearing loss, interaction with hearing people is quite difficult. The main objective of sign language recognition (SLR) is the development of automatic SLR systems to facilitate communication with the deaf community. Arabic SLR (ArSLR) specifically did not receive much attention until recent years. This work presents a comprehensive comparison between two different recognition techniques for continuous ArSLR, namely a Modified k-Nearest Neighbor which is suitable for sequential data and Hidden Markov Models (HMMs) techniques based on two different toolkits. Additionally, in this work, two new ArSL datasets composed of 40 Arabic sentences are collected using Polhemus G4 motion tracker and a camera. An existing glove-based dataset is employed in this work as well. The three datasets are made publicly available to the research community. The advantages and disadvantages of each data acquisition approach and classification technique are discussed in this paper. In the experimental results section, it is shown that classification accuracy for sign sentences acquired using a motion tracker are very similar the classification accuracy for sentences acquired using sensor gloves. The modified KNN solution is inferior to HMMs in terms of the computational time required for classification.
Similar content being viewed by others
References
Starner, T., Weaver, J., & Pentland, A. (1998). Real-time American sign language recognition using desk and wearable computer based video. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(12), 1371–1375.
Dgs-corpus. (2015). http://www.sign-lang.uni-hamburg.de/dgs-korpus/.
Dictasign project. (2016). http://www.sign-lang.uni-hamburg.de/dicta-sign.
Bsl corpus project. (2016). http://www.bslcorpusproject.org/.
Yang, R., & Sarkar, S. (2006). Detecting coarticulation in sign language using conditional random fields. In 18th international conference on pattern recognition (ICPR’06) (Vol. 2, pp. 108–112).
Yang, R., Sarkar, S., & Loeding, B. (2007). Enhanced level building algorithm for the movement epenthesis problem in sign language recognition. In IEEE conference on computer vision and pattern recognition (pp. 1–8).
Yang, R., Sarkar, S., & Loeding, B. (2010). Handling movement epenthesis and hand segmentation ambiguities in continuous sign language recognition using nested dynamic programming. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(3), 462–477.
Cooper, H., Holt, B., & Bowden, R. (2011). Sign language recognition. In Visual analysis of humans (pp. 539–562). London: Springer.
Ong, S. C., & Ranganath, S. (2005). Automatic sign language analysis: A survey and the future beyond lexical meaning. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(6), 873–891.
Dipietro, L., Sabatini, A. M., & Dario, P. (2008). A survey of glove-based systems and their applications. IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, 38(4), 461–482.
Agrawal, S. C., Jalal, A. S., & Tripathi, R. K. (2016). A survey on manual and non-manual sign language recognition for isolated and continuous sign. International Journal of Applied Pattern Recognition, 3(2), 99–134.
Al-Rousan, M., & Hussain, M. (2001). Automatic recognition of Arabic sign language finger spelling. International Journal of Computers and Their Applications, 8, 80–88.
Assaleh, K., & Al-Rousan, M. (2005). Recognition of Arabic sign language alphabet using polynomial classifiers. EURASIP Journal on Applied Signal Processing, 2005, 2136–2145.
Uebersax, D., Gall, J., den Bergh, M. V., & Gool, L. V. (2011). Real-time sign language letter and word recognition from depth data. In IEEE international conference on computer vision workshops (ICCV Workshops) (pp. 383–390).
Oz, C., & Leu, M. C. (2011). American sign language word recognition with a sensory glove using artificial neural networks. Engineering Applications of Artificial Intelligence, 24(7), 1204–1213.
Shanableh, T., Assaleh, K., & Al-Rousan, M. (2007). Spatio-temporal feature-extraction techniques for isolated gesture recognition in Arabic sign language. IEEE Transactions on Systems, Man, and Cybernetics Part B (Cybernetics), 37(3), 641–650.
Gweth, Y. L., Plahl, C., & Ney, H. (2012). Enhanced continuous sign language recognition using PCA and neural network features. In IEEE computer society conference on computer vision and pattern recognition workshop (pp. 55–60).
Forster, J., Oberdörfer, C., Koller, O., & Ney, H. (2013). Modality combination techniques for continuous sign language recognition. In Pattern recognition and image analysis. IbPRIA 2013. Lecture notes in computer science (Vol. 7887, pp. 89–99). Berlin, Heidelberg: Springer.
Koller, O., Zargaran, O., Ney, H., & Bowden, R. (2016). Deep sign: Hybrid CNN-HMM for continuous sign language recognition. In British machine vision conference.
Pu, J., Zhou, W., Zhang, J., & Li, H. (2016). Sign language recognition based on trajectory modeling with HMMs. In Multimedia modeling. MMM 2016. Lecture notes in computer science (Vol. 9516, pp. 686–697). Cham: Springer.
Kong, W., & Ranganath, S. (2014). Towards subject independent continuous sign language recognition: A segment and merge approach. Pattern Recognition, 47(3), 1294–1308.
Kong, W. W., & Ranganath, S. (2008). Automatic hand trajectory segmentation and phoneme transcription for sign language. In 8th IEEE international conference on automatic face & gesture recognition (pp. 1–6). Netherlands.
Koller, O., Forster, J., & Ney, H. (2015). Continuous sign language recognition: Towards large vocabulary statistical recognition systems handling multiple signers. Computer Vision and Image Understanding, 141, 108–125.
Gao, W., Fang, G., Zhao, D., & Chen, Y. (2004). A Chinese sign language recognition system based on SOFM/SRN/HMM. Pattern Recognition, 37(12), 2389–2402.
Fang, G., Gao, W., & Zhao, D. (2007). Large-vocabulary continuous sign language recognition based on transition-movement models. IEEE Transactions on Systems, Man, and Cybernetics-Part A: Systems and Humans, 37(1), 1–9.
Chai, X., Li, G., Lin, Y., Xu, Z., Tang, Y., Chen, X., & Zhou, M. (2013). Sign language recognition and translation with Kinect.
Chen, X., et al. (2013). Kinect sign language translator expands communication possibilities.
Zafrulla, Z., Brashear, H., Starner, T., Hamilton, H., & Presti, P. (2011). American sign language recognition with the Kinect. In Proceedings of the 13th international conference on multimodal interfaces (pp. 279–286). Spain.
Lang, S., Block, M., & Rojas, R. (2012). Sign language recognition using Kinect. In Artificial intelligence and soft computing. ICAISC 2012. Lecture notes in computer science (Vol. 7267, pp. 394–402). Berlin: Springer.
Mohandes, M., Deriche, M., & Liu, J. (2014). Image-based and sensor-based approaches to Arabic sign language recognition. IEEE Transactions on Human-Machine Systems, 44(4), 551–557.
Al-Jarrah, O., & Halawani, A. (2001). Recognition of gestures in Arabic sign language using neuro-fuzzy systems. Artificial Intelligence, 133(1–2), 117–138.
Elhenawy, I., & Khamiss, A. (2014). The design and implementation of mobile Arabic fingerspelling recognition system. International Journal of Computer Science and Network Security (IJCSNS), 14(2), 149.
Assaleh, K., Shanableh, T., Fanaswala, M., Amin, F., & Bajaj, H. (2010). Continuous Arabic sign language recognition in user dependent mode. Journal of Intelligent Learning Systems and Applications, 2(01), 19.
Tubaiz, N., Shanableh, T., & Assaleh, K. (2015). Glove-based continuous Arabic sign language recognition in user-dependent mode. IEEE Transactions on Human-Machine Systems, 45(4), 526–533.
Tuffaha, M., Shanableh, T., & Assaleh, K. (2015). Novel feature extraction and classification technique for sensor-based continuous Arabic sign language recognition, pp. 290–299.
Walker, W., Lamere, P., Kwok, P., Raj, B., Singh, R., Gouvea, E., et al. (2004). Sphinx-4: A flexible open source framework for speech recognition. Mountain View, California: Sun Microsystems, Inc.
Young, S., Evermann, G., Gales, M., Hain, T., Kershaw, D., Liu, X., et al. (2002). The HTK book (Vol. 3, p. 175). Cambridge: Cambridge University Engineering Department.
Lee, A., Kawahara, T., & Shikano, K. (2001). Julius—An open source real-time large vocabulary recognition engine. In European conference on speech communication and technology (EUROSPEECH).
Povey, D., Ghoshal, A., Boulianne, G., Burget, L., Glembek, O., Goel, N., et al. (2011). The kaldi speech recognition toolkit, no. EPFL-CONF-192584.
Rybach, D., Gollan, C., Heigold, G., Hoffmeister, B., Lööf, J., Schlüter, R., et al. (2009). The RWTH AACHEN university open source speech recognition system. In 10th annual conference of the international speech communication association (pp. 2111–2114). Brighton, UK.
Westeyn, T., Brashear, H., Atrash, A., & Starner, T. (2003). Georgia tech gesture toolkit: Supporting experiments in gesture recognition. In 5th international conference on multimodal interfaces (pp. 85–92). New York.
Dreuw, P., Rybach, D., Deselaers, T., Zahedi, M., & Ney, H. (2007). Speech recognition techniques for a sign language recognition system. In 8th annual conference of the international speech communication association (p. 80). Belgium.
Dreuw, P., Rybach, D., Heigold, G., & Ney, H. (2012). RWTH OCR: A large vocabulary optical character recognition system for Arabic scripts. In Guide to OCR for Arabic scripts (pp. 215–254). London: Springer.
Gillian, N., & Paradiso, J. A. (2014). The gesture recognition toolkit. The Journal of Machine Learning Research, 15(1), 3483–3487.
Lööf, J., Gollan, C., Hahn, S., Heigold, G., Hoffmeister, B., Plahl, C., et al. (2007). The RWTH 2007 TC-STAR evaluation system for European English and Spanish. In 8th annual conference of the international speech communication association (pp. 2145–2148). Belgium.
Rybach, D., Hahn, S., Gollan, C., Schluter, R., & Ney, H. (2007). Advances in Arabic broadcast news transcription at RWTH. In IEEE workshop on automatic speech recognition & understanding (ASRU) (pp. 449–454). Koyoto, Japan.
Sundermeyer, M., Nußbaum-Thom, M., Wiesler, S., Plahl, C., Mousa, A. E.-D., Hahn, S., et al. (2011). The RWTH 2010 Quaero ASR evaluation system for English, French, and German. In IEEE international conference on acoustics, speech and signal processing (ICASSP) (pp. 2212–2215). Prague, Czech Republic.
Plahl, C., Hoffmeister, B., Hwang, M., Lu, D., Heigold, G., Lööf, J., et al. (2008). Recent improvements of the RWTH GALE mandarin LVCSR system. In 9th annual conference of the international speech communication association (pp. 2426–2429). Brisbane, Australia.
Povey, D., & Woodland, P. C. (2002). Minimum phone error and i-smoothing for improved discriminative training. In IEEE international conference on acoustics, speech, and signal processing (pp. I-105). Orlando, FL, USA.
RASR manual. (2017). http://www.hltpr.rwth-aachen.de/rasr/manual
Acknowledgements
The authors gratefully acknowledge the American University of Sharjah for supporting this research through Grant FRG14-2-26.
Author information
Authors and Affiliations
Corresponding author
Rights and permissions
About this article
Cite this article
Hassan, M., Assaleh, K. & Shanableh, T. Multiple Proposals for Continuous Arabic Sign Language Recognition. Sens Imaging 20, 4 (2019). https://doi.org/10.1007/s11220-019-0225-3
Received:
Revised:
Published:
DOI: https://doi.org/10.1007/s11220-019-0225-3