ABSTRACT
A key problem in the automatic detection of semantic concepts (like 'interview' or 'soccer') in video streams is the manual acquisition of adequate training sets. Recently, we have proposed to use online videos downloaded from portals like youtube.com for this purpose, with tags provided by users during video upload serving as ground-truth annotations.
The problem with such training data is that it is weakly labeled: annotations are provided only at the video level, and many shots of a video may be "non-relevant", i.e., not visually related to a tag. In this paper, we present a probabilistic framework for learning from such weakly annotated training videos in the presence of irrelevant content. The relevance of each keyframe is modeled as a latent random variable that is estimated during training.
In quantitative experiments on real-world online videos and TV news data, we demonstrate that the proposed model significantly increases robustness to irrelevant content and improves the generalization of the resulting concept detectors.
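The latent-relevance formulation above can be sketched as a small EM procedure: each keyframe is assumed to be generated either by a concept model (relevant) or by a background model (irrelevant), and the E-step posterior over the latent relevance variable is re-estimated jointly with the model parameters. The following is a minimal one-dimensional illustration under hypothetical Gaussian concept/background models over a scalar frame feature — the paper's actual framework operates on richer visual features, so treat every name and modeling choice here as an assumption:

```python
import numpy as np

def em_keyframe_relevance(frames, n_iter=50, sigma=1.0):
    """Toy EM sketch: estimate the posterior relevance of each keyframe.

    frames : 1-D array of scalar frame features (hypothetical stand-in
             for real visual descriptors).
    Returns r, the posterior probability that each frame is relevant.
    """
    # Crude initialization: assume relevant frames score high, background low.
    mu_c, mu_b = frames.max(), frames.min()
    pi = 0.5  # prior probability that a frame is relevant

    for _ in range(n_iter):
        # E-step: posterior over the latent relevance variable,
        # using fixed-width Gaussian likelihoods for both components.
        pc = pi * np.exp(-0.5 * ((frames - mu_c) / sigma) ** 2)
        pb = (1 - pi) * np.exp(-0.5 * ((frames - mu_b) / sigma) ** 2)
        r = pc / (pc + pb + 1e-12)

        # M-step: relevance-weighted updates of the two component means
        # and of the relevance prior.
        mu_c = (r * frames).sum() / (r.sum() + 1e-12)
        mu_b = ((1 - r) * frames).sum() / ((1 - r).sum() + 1e-12)
        pi = r.mean()

    return r
```

Given a video whose keyframes mix on-topic and off-topic content, the estimated posteriors `r` can be used to down-weight irrelevant frames when training a concept detector, rather than treating every frame of a tagged video as a positive example.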