Abstract
Helping people find relevant content of interest in massive collections of sports video has become an urgent need. We have designed and developed a sports video search engine based on a distributed architecture, which aims to provide users with content-based video analysis and retrieval services. The search engine focuses on event detection, highlights analysis and image retrieval. Our work has several advantages: (I) For event detection, a CNN and an RNN are used to extract features and integrate dynamic information, and a new sliding window model is used for multi-length event detection. (II) For highlights analysis, an improved method based on a self-adapting dual threshold and the dominant color percentage is used to detect shot boundaries, and an affective arousal method is used for highlights extraction. (III) For image indexing and retrieval, a hyper-spherical soft assignment method is proposed to generate image descriptors, enhanced residual vector quantization is presented to construct a multi-inverted index, and two adaptive retrieval methods based on hyper-spherical filtration are used to improve time efficiency. (IV) All of the above algorithms are implemented on a distributed platform that we developed for massive video data processing.
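Item (I) describes scoring windows of several lengths over a video. The paper's new sliding window model is not described in this abstract, so the following is only a minimal, generic sketch, assuming per-frame event probabilities have already been produced by a CNN+RNN classifier. The function name `multi_length_windows` and the parameters `lengths`, `stride_ratio`, `thresh` and `nms_iou` are hypothetical choices for illustration.

```python
def tiou(a, b):
    """Temporal IoU of two (start, end, ...) intervals; end is exclusive."""
    inter = max(0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union else 0.0


def multi_length_windows(scores, lengths=(4, 8, 16), stride_ratio=0.5,
                         thresh=0.5, nms_iou=0.5):
    """Hypothetical multi-length sliding-window event detector.

    scores: per-frame event probabilities (assumed to come from a CNN+RNN model).
    Returns (start, end, mean_score) tuples, end exclusive, best first.
    """
    # Prefix sums make every window mean an O(1) lookup.
    prefix = [0.0]
    for s in scores:
        prefix.append(prefix[-1] + s)

    proposals = []
    for length in lengths:
        if length > len(scores):
            continue  # window longer than the clip: skip this scale
        stride = max(1, int(length * stride_ratio))
        for start in range(0, len(scores) - length + 1, stride):
            mean_score = (prefix[start + length] - prefix[start]) / length
            if mean_score >= thresh:
                proposals.append((start, start + length, mean_score))

    # Temporal non-maximum suppression across all window lengths.
    proposals.sort(key=lambda p: p[2], reverse=True)
    kept = []
    for p in proposals:
        if all(tiou(p, q) < nms_iou for q in kept):
            kept.append(p)
    return kept
```

For example, with `scores = [0.0] * 8 + [1.0] * 8 + [0.0] * 16`, calling `multi_length_windows(scores, lengths=(8, 16), thresh=0.6)` keeps only the window `(8, 16, 1.0)` that exactly covers the event. Non-maximum suppression is what lets windows of different lengths compete for the same event instead of all being reported.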
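The dual-threshold shot boundary detection in item (II) resembles the classic twin-comparison approach. The sketch below is an illustration under stated assumptions, not the authors' improved method: it uses L1 distances between color histograms, derives both thresholds from the mean and standard deviation of the preceding frame differences (one possible reading of "self-adapting"), and uses a change in the dominant-color share (for example, the fraction of grass-colored pixels in a soccer frame) only to confirm gradual transitions. The function names, parameters and default values are hypothetical.

```python
from statistics import mean, pstdev


def hist_diff(h1, h2):
    """L1 distance between two normalized color histograms (range 0 to 2)."""
    return sum(abs(a - b) for a, b in zip(h1, h2))


def detect_shot_boundaries(hists, dom_ratio, window=8, k_high=3.0, k_low=1.0,
                           floor_high=0.6, floor_low=0.15, ratio_jump=0.15):
    """Hypothetical adaptive dual-threshold shot boundary detector.

    hists: one normalized color histogram per frame.
    dom_ratio: per-frame fraction of pixels in the dominant color (assumed given).
    Returns (start_frame, end_frame, kind) with kind "cut" or "gradual".
    """
    diffs = [hist_diff(hists[i], hists[i + 1]) for i in range(len(hists) - 1)]
    boundaries = []
    i = 0
    while i < len(diffs):
        # Adapt both thresholds to the statistics of the recent differences.
        recent = diffs[max(0, i - window):i] or [0.0]
        mu, sd = mean(recent), pstdev(recent)
        t_high = max(mu + k_high * sd, floor_high)
        t_low = max(mu + k_low * sd, floor_low)

        if diffs[i] >= t_high:
            # One large difference: abrupt cut between frames i and i + 1.
            boundaries.append((i, i + 1, "cut"))
            i += 1
        elif diffs[i] >= t_low:
            # Candidate gradual transition: accumulate a run of moderate differences.
            j, acc = i, 0.0
            while j < len(diffs) and t_low <= diffs[j] < t_high:
                acc += diffs[j]
                j += 1
            # Confirm with the accumulated change and a shift in dominant-color share.
            if acc >= t_high and abs(dom_ratio[j] - dom_ratio[i]) >= ratio_jump:
                boundaries.append((i, j, "gradual"))
            i = max(j, i + 1)
        else:
            i += 1
    return boundaries
```

A single threshold either misses dissolves (each step is small) or fires on ordinary motion; accumulating moderate differences under the lower threshold is what recovers gradual transitions, while the upper threshold still catches abrupt cuts in one step.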
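Item (III) constructs a multi-inverted index with residual vector quantization. The following is a minimal sketch of plain two-stage residual vector quantization with an inverted index keyed on the tuple of stage codes; it is not the paper's enhanced variant, and the codebooks are assumed to have been trained beforehand (for example by k-means on the descriptors, then again on their residuals). The function names and the `n_probe` parameter are hypothetical.

```python
from collections import defaultdict


def sq_dist(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(codebook, v):
    """Index of the codeword closest to v."""
    return min(range(len(codebook)), key=lambda k: sq_dist(codebook[k], v))


def rvq_encode(v, codebooks):
    """Quantize v stage by stage; each stage encodes the residual left by earlier stages."""
    codes, residual = [], list(v)
    for cb in codebooks:
        k = nearest(cb, residual)
        codes.append(k)
        residual = [r - c for r, c in zip(residual, cb[k])]
    return tuple(codes), residual


def reconstruct(codes, codebooks):
    """Sum of the selected codewords: the centroid of an index cell."""
    out = [0.0] * len(codebooks[0][0])
    for cb, k in zip(codebooks, codes):
        out = [o + c for o, c in zip(out, cb[k])]
    return out


def build_index(vectors, codebooks):
    """Inverted index: each tuple of stage codes is a cell listing vector ids."""
    index = defaultdict(list)
    for vid, v in enumerate(vectors):
        codes, _ = rvq_encode(v, codebooks)
        index[codes].append(vid)
    return index


def search(query, vectors, codebooks, index, n_probe=2, k=1):
    """Scan only the n_probe cells whose centroids are closest to the query."""
    cells = sorted(index, key=lambda c: sq_dist(reconstruct(c, codebooks), query))
    candidates = [vid for c in cells[:n_probe] for vid in index[c]]
    return sorted(candidates, key=lambda vid: sq_dist(vectors[vid], query))[:k]
```

Because the cells are indexed by pairs of codes, two small codebooks of sizes K1 and K2 yield up to K1 x K2 fine-grained cells. Restricting exact distance computations to the few closest cells trades a little recall for far fewer comparisons.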
Acknowledgments
We gratefully acknowledge financial support from the National Natural Science Foundation of China (Nos. 61572211, 61173114 and 61202300).
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Song, Z., Yu, J., Cai, H., Hu, Y., Chen, YP.P. (2020). Fine-Grain Level Sports Video Search Engine. In: Ro, Y., et al. MultiMedia Modeling. MMM 2020. Lecture Notes in Computer Science, vol 11961. Springer, Cham. https://doi.org/10.1007/978-3-030-37731-1_42
DOI: https://doi.org/10.1007/978-3-030-37731-1_42
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-37730-4
Online ISBN: 978-3-030-37731-1
eBook Packages: Computer Science, Computer Science (R0)