ABSTRACT
The sheer number of videos now available makes semantic video retrieval a difficult problem. The success of query-by-concept, recently proposed to address this problem, depends greatly on the accuracy of concept-based video indexing. This paper describes a multi-cue fusion approach to improving the accuracy of semantic video indexing. The approach is based on a unified framework that explores and integrates both the contextual correlation among concepts and the temporal dependency among shots. The framework is novel in two ways. First, a recursive algorithm is proposed to learn both inter-concept and inter-shot relationships from ground-truth annotations of tens of thousands of shots for hundreds of concepts. Second, labels for all concepts and all shots are solved simultaneously by optimizing a graphical model. Experiments on the widely used TRECVID 2006 data set show that the framework is effective for semantic concept detection in video, improving inferred average precision by around 30% on two popular benchmarks, VIREO-374 and Columbia374.
Index Terms
- Multi-cue fusion for semantic video indexing