DOI: 10.1145/1459359.1459370
research-article

Multi-cue fusion for semantic video indexing

Published: 26 October 2008

ABSTRACT

The huge number of videos now available makes semantic video retrieval a difficult problem. The success of query-by-concept, recently proposed to address this problem, depends heavily on the accuracy of concept-based video indexing. This paper describes a multi-cue fusion approach for improving the accuracy of semantic video indexing. The approach is based on a unified framework that explores and integrates both the contextual correlation among concepts and the temporal dependency among shots. The framework is novel in two ways. First, a recursive algorithm is proposed to learn both inter-concept and inter-shot relationships from ground-truth annotations of tens of thousands of shots for hundreds of concepts. Second, the labels for all concepts and all shots are solved simultaneously by optimizing a graphical model. Experiments on the widely used TRECVID 2006 data set show that the framework is effective for semantic concept detection in video, achieving a gain of around 30% in inferred average precision on two popular benchmark detector sets, VIREO-374 and Columbia374.
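To make the two cues concrete, here is a minimal, hypothetical sketch. It is not the authors' recursive learning algorithm or their graphical-model optimization. Under simplifying assumptions, it (1) estimates inter-concept weights from co-occurrence in binary ground-truth annotations and (2) refines per-shot detector scores with a label-propagation-style fixed-point iteration that blends each raw score with contextual evidence (correlated concepts in the same shot) and temporal evidence (the same concept in adjacent shots). The function names `concept_weights` and `fuse_scores` and the parameters `alpha`, `beta`, and `iters` are all illustrative assumptions.

```python
from typing import List

Matrix = List[List[float]]


def concept_weights(labels: List[List[int]]) -> Matrix:
    """Hypothetical inter-concept weights: w[i][j] estimates
    P(concept j present | concept i present) from binary ground-truth
    annotations labels[shot][concept]. The diagonal is left at 0."""
    n = len(labels[0])
    counts = [sum(shot[i] for shot in labels) for i in range(n)]
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        if counts[i] == 0:
            continue  # concept never annotated: no co-occurrence evidence
        for j in range(n):
            if i != j:
                both = sum(1 for shot in labels if shot[i] and shot[j])
                w[i][j] = both / counts[i]
    return w


def fuse_scores(scores: Matrix, w: Matrix, alpha: float = 0.3,
                beta: float = 0.3, iters: int = 50) -> Matrix:
    """Refine detector scores scores[shot][concept] (assumed in [0, 1]) by
    repeatedly mixing each raw score with contextual and temporal evidence.
    A simplified stand-in for joint inference, not the paper's model."""
    if alpha < 0 or beta < 0 or alpha + beta >= 1:
        raise ValueError("need alpha, beta >= 0 and alpha + beta < 1")
    n_shots, n = len(scores), len(scores[0])
    y = [row[:] for row in scores]
    for _ in range(iters):
        new_y = []
        for t in range(n_shots):
            row = []
            for c in range(n):
                # Context: average over concepts k in the same shot,
                # weighted by P(c present | k present) = w[k][c].
                denom = sum(w[k][c] for k in range(n))
                ctx = (sum(w[k][c] * y[t][k] for k in range(n)) / denom
                       if denom > 0 else y[t][c])
                # Temporal: average of the same concept in adjacent shots.
                nbrs = [y[s][c] for s in (t - 1, t + 1) if 0 <= s < n_shots]
                tmp = sum(nbrs) / len(nbrs) if nbrs else y[t][c]
                # Convex blend of raw score, context and temporal evidence.
                row.append((1 - alpha - beta) * scores[t][c]
                           + alpha * ctx + beta * tmp)
            new_y.append(row)
        y = new_y
    return y
```

Each update is a convex combination in which the dependence on the previous iterate is scaled by `alpha + beta < 1`. The iteration is therefore a contraction, so it converges to a unique set of scores that stay in [0, 1]. Raising `alpha` or `beta` trusts the contextual or temporal cue more than the raw detector output.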

          Published in: MM '08: Proceedings of the 16th ACM international conference on Multimedia, October 2008, 1206 pages
          ISBN: 9781605583037
          DOI: 10.1145/1459359

          Copyright © 2008 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

          Publisher: Association for Computing Machinery, New York, NY, United States

          Overall acceptance rate: 995 of 4,171 submissions (24%)
