ABSTRACT
This paper describes a system for multimedia event detection and recounting. The goal is to detect a high-level event class in unconstrained web videos and to generate an event-oriented summary for display to users. For this purpose, we detect informative segments and collect observations for them, leading to our ISOMER system. For event detection, we combine a large collection of low-level and semantic-level visual and audio features. For event recounting, we propose a novel approach that uses a linear SVM event classifier to identify event-oriented discriminative video segments and their descriptions. User-friendly concepts, including objects, actions, scenes, speech, and optical character recognition, are used to generate the descriptions. We also develop several mapping and filtering strategies to cope with noisy concept detectors. Our system performed competitively in the TRECVID 2013 Multimedia Event Detection task, which involved nearly 100,000 videos, and was the highest performer in the TRECVID 2013 Multimedia Event Recounting task.
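To make the recounting idea concrete, the following is a minimal sketch of how a linear event classifier could rank segments and name the concepts that support each one. It is not the ISOMER implementation. It assumes, beyond what the abstract states, that each segment is represented by a vector of concept detector scores and that the classifier weights are already trained. The function names, concept names, and numbers are all hypothetical.

```python
from typing import List, Sequence, Tuple


def segment_scores(weights: Sequence[float],
                   segments: List[Sequence[float]]) -> List[float]:
    """Score each segment as the dot product of the linear classifier
    weights with that segment's concept-score vector (assumed input)."""
    return [sum(w * x for w, x in zip(weights, seg)) for seg in segments]


def top_concepts(weights: Sequence[float],
                 segment: Sequence[float],
                 concept_names: Sequence[str],
                 k: int = 2) -> List[Tuple[str, float]]:
    """Return up to k concepts with the largest positive contribution
    (weight * detector score) to the segment's classifier score."""
    contributions = [(name, w * x)
                     for name, w, x in zip(concept_names, weights, segment)]
    positive = [c for c in contributions if c[1] > 0]
    positive.sort(key=lambda c: c[1], reverse=True)
    return positive[:k]


def recount(weights: Sequence[float],
            segments: List[Sequence[float]],
            concept_names: Sequence[str],
            n_segments: int = 1,
            k: int = 2) -> List[Tuple[int, float, List[Tuple[str, float]]]]:
    """Pick the n highest-scoring segments and describe each one by its
    most supportive concepts."""
    scores = segment_scores(weights, segments)
    ranked = sorted(range(len(segments)),
                    key=lambda i: scores[i], reverse=True)
    return [(i, scores[i],
             top_concepts(weights, segments[i], concept_names, k))
            for i in ranked[:n_segments]]


if __name__ == "__main__":
    # Hypothetical concepts, weights, and per-segment detector scores.
    names = ["dog", "bicycle", "cheering", "kitchen"]
    w = [0.1, 1.5, 0.8, -0.6]
    segs = [[0.9, 0.0, 0.1, 0.7],
            [0.1, 0.8, 0.6, 0.0],
            [0.0, 0.2, 0.1, 0.1]]
    for index, score, concepts in recount(w, segs, names):
        print(index, round(score, 2), [name for name, _ in concepts])
```

Because the classifier is linear, each segment score splits into per-concept terms, which is what allows a selected segment to be described in user-friendly concept names. Noisy detectors would still need the kind of mapping and filtering the abstract mentions, which this sketch omits.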
Index Terms
- ISOMER: Informative Segment Observations for Multimedia Event Recounting