The Korean Sign Language Dataset for Action Recognition

Yang, Seunghan; Jung, Seungjun; Kang, Heekwang; Kim, Changick

doi:10.1007/978-3-030-37731-1_43

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11961)

Included in the following conference series:

International Conference on Multimedia Modeling

Abstract

Computer vision technologies have recently shown excellent performance on complex tasks such as behavior recognition. Accordingly, several studies have proposed datasets for behavior recognition, including sign language recognition. In many countries, researchers are working to automatically recognize and interpret sign language to facilitate communication with deaf people. However, there is not yet a dataset for recognizing the sign language used in Korea, and research on the topic is insufficient. Since sign language varies from country to country, building a dataset for Korean sign language is valuable. This paper therefore proposes a dataset of videos of isolated signs from Korean sign language that can also be used for behavior recognition with deep learning. We present the Korean Sign Language (KSL) dataset, which is composed of video clips of 77 Korean sign language words performed by 20 deaf people. We train and evaluate deep learning networks that have recently achieved excellent performance in the behavior recognition task on this dataset. We also confirm, through a deconvolution-based visualization method, that the deep learning network fully understands the characteristics of the dataset.
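
As an illustration of the dataset composition described above (77 words, 20 signers), the following is a minimal sketch of how one might index such clips and hold out some signers for evaluation. Only the two counts come from the abstract. The one-clip-per-word-per-signer layout, the function names `build_index` and `split_by_signer`, and the choice of four held-out signers are assumptions made for this example, not the authors' protocol.

```python
from itertools import product

NUM_WORDS = 77    # sign vocabulary size, from the abstract
NUM_SIGNERS = 20  # number of signers, from the abstract


def build_index(num_words=NUM_WORDS, num_signers=NUM_SIGNERS):
    """List (word_id, signer_id) pairs.

    Assumption: exactly one clip per word per signer. The abstract
    does not state how many clips each signer recorded.
    """
    return list(product(range(num_words), range(num_signers)))


def split_by_signer(index, test_signers):
    """Hold out every clip from the given signers for evaluation."""
    held_out = set(test_signers)
    train = [pair for pair in index if pair[1] not in held_out]
    test = [pair for pair in index if pair[1] in held_out]
    return train, test


index = build_index()
# Hypothetical split: the last four signers are never seen in training.
train, test = split_by_signer(index, test_signers=range(16, 20))
print(len(index), len(train), len(test))
```

Splitting by signer rather than by clip keeps the evaluation from rewarding a model that merely memorizes the appearance of individual signers. Whether the paper evaluates this way is not stated in the abstract.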

Author information

Authors and Affiliations

Korea Advanced Institute of Science and Technology (KAIST), Daejeon, Korea
Seunghan Yang, Seungjun Jung & Changick Kim
Samsung Electronics, Suwon, Korea
Heekwang Kang

Corresponding author

Correspondence to Changick Kim.

Editor information

Editors and Affiliations

Korea Advanced Institute of Science and Technology, Daejeon, Korea (Republic of)
Yong Man Ro
National Chiao Tung University, Hsinchu, Taiwan
Wen-Huang Cheng
Korea Advanced Institute of Science and Technology, Daejeon, Korea (Republic of)
Junmo Kim
National Cheng Kung University, Tainan City, Taiwan
Wei-Ta Chu
Tsinghua University, Beijing, China
Peng Cui
Korea Advanced Institute of Science and Technology, Daejeon, Korea (Republic of)
Jung-Woo Choi
National Tsing Hua University, Hsinchu, Taiwan
Min-Chun Hu
Ghent University, Ghent, Belgium
Wesley De Neve

About this paper

Cite this paper

Yang, S., Jung, S., Kang, H., Kim, C. (2020). The Korean Sign Language Dataset for Action Recognition. In: Ro, Y., et al. MultiMedia Modeling. MMM 2020. Lecture Notes in Computer Science, vol 11961. Springer, Cham. https://doi.org/10.1007/978-3-030-37731-1_43

DOI: https://doi.org/10.1007/978-3-030-37731-1_43
Published: 24 December 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-37730-4
Online ISBN: 978-3-030-37731-1
eBook Packages: Computer Science, Computer Science (R0)
