ABSTRACT
Detecting scene text in video is valuable to many content-based video applications. In this paper, we present a novel scene text detection and tracking method for videos that exploits cues from the background regions of text. Specifically, we first extract text candidates and potential text background regions from each video frame. We then model the spatial, shape, and motion correlations between text and its background region with a bipartite graph and apply the random walk algorithm to refine the text candidates for improved accuracy. We also present an effective tracking framework for text in video that uses the temporal correlation of text cues across successive frames, improving both the precision and the recall of the final text detection result. Experiments on public scene text video datasets demonstrate the state-of-the-art performance of the proposed method.
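The refinement step can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the graph weights or the update rule, so the code assumes a random walk with restart on a bipartite graph whose edge weights (`affinity`) already combine the spatial, shape, and motion correlations. The function names, the `alpha` restart parameter, and the toy scores are all hypothetical.

```python
import numpy as np


def row_normalize(w):
    """Turn non-negative weights into row-stochastic transition probabilities."""
    sums = w.sum(axis=1, keepdims=True)
    # Rows with no edges stay all-zero instead of dividing by zero.
    return np.divide(w, sums, out=np.zeros_like(w, dtype=float), where=sums > 0)


def refine_scores(affinity, text_init, bg_init, alpha=0.5, iters=50):
    """Propagate confidence between text candidates and background regions.

    affinity  : (n_text, n_bg) non-negative edge weights (assumed given).
    text_init : initial confidence of each text candidate.
    bg_init   : initial confidence of each background region.
    alpha     : weight of the propagated score versus the initial score.
    """
    affinity = np.asarray(affinity, dtype=float)
    text_init = np.asarray(text_init, dtype=float)
    bg_init = np.asarray(bg_init, dtype=float)

    p_tb = row_normalize(affinity)      # text node -> background nodes
    p_bt = row_normalize(affinity.T)    # background node -> text nodes

    text, bg = text_init.copy(), bg_init.copy()
    for _ in range(iters):
        # Alternate updates across the two sides of the bipartite graph,
        # always mixing the initial scores back in (the "restart" term).
        bg = (1 - alpha) * bg_init + alpha * (p_bt @ text)
        text = (1 - alpha) * text_init + alpha * (p_tb @ bg)
    return text, bg


# Toy example: two equally uncertain candidates, each linked to one region.
affinity = [[1.0, 0.0],
            [0.0, 1.0]]
text_scores, bg_scores = refine_scores(
    affinity, text_init=[0.5, 0.5], bg_init=[0.9, 0.1]
)
print(text_scores)  # candidate 0 ends above 0.5, candidate 1 below it
```

In this toy setting, the candidate linked to the confident background region gains score and the other loses it, which is the kind of mutual reinforcement the abstract attributes to background cues.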