research-article

Multi-cue fusion for semantic video indexing

Authors:
Ming-Fang Weng, National Taiwan University, Taipei, Taiwan, ROC
Yung-Yu Chuang, National Taiwan University, Taipei, Taiwan, ROC

MM '08: Proceedings of the 16th ACM international conference on Multimedia, October 2008, Pages 71–80, https://doi.org/10.1145/1459359.1459370

Published: 26 October 2008

ABSTRACT

The huge amount of videos currently available poses a difficult problem in semantic video retrieval. The success of query-by-concept, recently proposed to handle this problem, depends greatly on the accuracy of concept-based video indexing. This paper describes a multi-cue fusion approach toward improving the accuracy of semantic video indexing. This approach is based on a unified framework that explores and integrates both contextual correlation among concepts and temporal dependency among shots. The framework is novel in two ways. First, a recursive algorithm is proposed to learn both inter-concept and inter-shot relationships from ground-truth annotations of tens of thousands of shots for hundreds of concepts. Second, labels for all concepts and all shots are solved simultaneously through optimizing a graphical model. Experiments on the widely used TRECVID 2006 data set show that our framework is effective for semantic concept detection in video, achieving around a 30% performance boost on two popular benchmarks, VIREO-374 and Columbia374, in inferred average precision.
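
As a rough illustration of what combining the two cues can look like, the sketch below refines per-shot detector scores by mixing each score with (a) scores of related concepts in the same shot and (b) scores of the same concept in adjacent shots. This is a simple iterative smoothing scheme, not the recursive learning algorithm or the graphical-model optimization described in the abstract. The function name `fuse_scores`, the `concept_affinity` matrix, and the weights `alpha` and `beta` are all hypothetical choices made for this example.

```python
def fuse_scores(initial, concept_affinity, alpha=0.2, beta=0.2, iterations=10):
    """Refine per-shot concept scores with contextual and temporal cues.

    initial: list of shots, each a list of detector scores in [0, 1].
    concept_affinity: row-normalized C x C matrix of non-negative weights
        (hypothetical here; a real system would learn it from annotations).
    alpha, beta: weights of the contextual and temporal cues (assumed values,
        with alpha + beta <= 1 so every update is a convex combination).
    """
    num_shots = len(initial)
    num_concepts = len(initial[0])
    current = [row[:] for row in initial]
    for _ in range(iterations):
        updated = []
        for s in range(num_shots):
            # Temporal cue: the shots immediately before and after shot s.
            neighbors = [t for t in (s - 1, s + 1) if 0 <= t < num_shots]
            row = []
            for c in range(num_concepts):
                # Contextual cue: weighted scores of all concepts in this shot.
                contextual = sum(
                    concept_affinity[c][k] * current[s][k]
                    for k in range(num_concepts)
                )
                if neighbors:
                    temporal = sum(current[t][c] for t in neighbors) / len(neighbors)
                else:
                    temporal = current[s][c]
                row.append(
                    (1 - alpha - beta) * initial[s][c]
                    + alpha * contextual
                    + beta * temporal
                )
            updated.append(row)
        current = updated
    return current
```

With three shots and two concepts, a middle shot whose detector score is low but whose neighbors score high for the same concept ends up with a higher fused score, which is the intuition behind exploiting temporal dependency among shots.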

Index Terms

1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
      1. Computer vision tasks
        Video summarization
2. Information systems
  1. Information retrieval
    1. Document representation
    2. Search engine architectures and scalability
      1. Search engine indexing

Published in
MM '08: Proceedings of the 16th ACM international conference on Multimedia
October 2008
1206 pages
ISBN:9781605583037
DOI:10.1145/1459359
General Chairs:
Abdulmotaleb El Saddik, University of Ottawa
Son Vuong, University of British Columbia
Program Chairs:
Carsten Griwodz, University of Oslo
Alberto Del Bimbo, Università degli Studi di Firenze
K. Selcuk Candan, Arizona State University
Alejandro Jaimes, Telefonica R&D, Madrid, Spain
Copyright © 2008 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
contextual correlation
semantic video indexing
temporal dependency
TRECVID