Elsevier

Pattern Recognition

Volume 42, Issue 2, February 2009, Pages 267-282

State-of-the-art on spatio-temporal information-based video retrieval

https://doi.org/10.1016/j.patcog.2008.08.033

Abstract

Video retrieval is increasingly based on image content. A number of studies on video retrieval have used low-level pixel content related to statistical moments, shape, colour and texture. However, it is well recognised that such information is not enough to discriminate uniquely between different multimedia content. The use of semantic information, especially that derived from spatio-temporal analysis, is of great value in multimedia annotation, archiving and retrieval. In this review paper, we detail how the use of spatio-temporal semantic knowledge is changing the way in which modern research is conducted. We review a number of studies and concepts related to such analysis, and draw important conclusions on where future research is headed.

Section snippets

Spatio-temporal information for video retrieval

Content-based video retrieval is a very important area of research. Over the last decade, several practical systems have been developed with the aim of improving retrieval performance and have been tested on large-scale databases such as TRECVID (http://www-nlpir.nist.gov/projects/tvpubs/tv.pubs.org.html). Video classification and retrieval problems can be hierarchically categorised with a taxonomy, an example of which is presented by Roach et al. [1]. A key characteristic of video data is its associated

Spatial representation

Spatial information can be formulated with the following two methodologies:

  • The first approach is to use weak spatial constraints and capture spatial local information to represent low-level texture features. Examples include Gabor wavelets [22], local histograms [23], co-occurrence matrices [24], colour correlograms [25], composite region templates (CRTs) [26], etc.

  • The second approach is to represent global qualitative spatial relations that support high-level semantic textual queries.
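
As a rough illustration of the first approach, a grey-level co-occurrence matrix of the kind cited above [24] can be sketched in plain Python. The function names (`cooccurrence_matrix`, `contrast`), the single non-symmetric pixel offset and the toy 4x4 image are assumptions made for illustration; they are not taken from the reviewed systems.

```python
def cooccurrence_matrix(image, dx=1, dy=0, levels=None):
    """Count how often grey level i at (r, c) is paired with level j at (r + dy, c + dx).

    `image` is a list of rows of non-negative integer grey levels.
    The matrix is non-symmetric: only the offset (dx, dy) is counted,
    not its opposite.
    """
    rows, cols = len(image), len(image[0])
    if levels is None:
        levels = max(max(row) for row in image) + 1
    counts = [[0] * levels for _ in range(levels)]
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            # Skip pairs whose second pixel falls outside the image.
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[image[r][c]][image[r2][c2]] += 1
    return counts


def contrast(counts):
    """Contrast texture feature: sum of (i - j)^2 weighted by pair frequency."""
    total = sum(sum(row) for row in counts)
    weighted = sum(
        (i - j) ** 2 * n
        for i, row in enumerate(counts)
        for j, n in enumerate(row)
    )
    return weighted / total


if __name__ == "__main__":
    # Hypothetical 4x4 image with four grey levels (0..3).
    toy_image = [
        [0, 0, 1, 1],
        [0, 0, 1, 1],
        [0, 2, 2, 2],
        [2, 2, 3, 3],
    ]
    glcm = cooccurrence_matrix(toy_image, dx=1, dy=0)
    for row in glcm:
        print(row)
    print("contrast:", contrast(glcm))
```

The matrix records only local pixel-pair statistics, which is why such features impose weak spatial constraints: they describe texture without saying where in the frame an object lies or how objects relate to one another.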

Visual appearance-based representation and recognition

Spatio-temporal motion-based recognition has a wide spectrum of applications in surveillance, automation, health and medical systems, etc., through perceptual identification of biometrics and activity recognition. For example, motion analysis can be used in sports and athletic training: the discrimination between different tennis strokes is investigated by Yamato et al. [88]. Motion-based recognition can be employed in obstacle avoidance of moving

Spatio-temporal video indexing systems

An integrated system for spatio-temporal video retrieval is LucentVision [12], developed at the Visual Communications Research Department within Bell Labs. It was used effectively for tennis video indexing through spatio-temporal activity maps. This system analyses video from multiple cameras in real time and captures the activity of the players and the ball in the form of motion trajectories. The system stores these trajectories in a database along with video, 3D models of the

Conclusions

Video retrieval is essentially the task of finding the most similar video based on a query video. Traditionally, text-based labels attached to videos were used for matching. Since the 1980s, significant research into image analysis has opened up the possibility of extracting image content information from these videos, which could form the basis of matching, ranking and retrieving them. In recent years, it has been recognised that raw pixel information and basic statistical features of colour

References (171)

  • G. Hamarneh et al., Deformable spatio-temporal shape models: extending ASM to 2D+ time, J. Image Vision Comput. (2004)
  • E.J. Hwang et al., Querying video libraries, J. Visual Commun. Image Representation (1996)
  • P. Salembier et al., Description schemes for video programs, users and devices, Signal Process. Image Commun. (2000)
  • J.K. Aggarwal et al., Human motion analysis: a review, Comput. Vision Image Understand. (1999)
  • C.C. Chang et al., A shape recognition scheme based on relative distances of feature points from the centroid, Pattern Recognition (1991)
  • M. Roach et al., Recent trends in video analysis: a taxonomy of video classification problems
  • J. Li et al., Multi-training support vector machine for image retrieval, IEEE Trans. Image Process. (2006)
  • D. Tao et al., Asymmetric bagging and random subspace for support vector machine-based relevance feedback in image retrieval, IEEE Trans. Pattern Anal. Mach. Intell. (2006)
  • D. Tao et al., Direct kernel biased discriminant analysis: a new content-based image retrieval relevance feedback algorithm, IEEE Trans. Multimedia (2006)
  • D. Tao et al., Negative samples analysis in relevance feedback, IEEE Trans. Knowl. Data Eng. (2007)
  • D. Tao et al., Which components are important for interactive image searching?, IEEE Trans. Circuits Systems Video Technol. (2008)
  • A.D. Bimbo et al., Symbolic description and visual querying of image sequences using spatio-temporal logic, IEEE Trans. Knowl. Data Eng. (1995)
  • M. Erwig, M. Schneider, Visual specifications of spatio-temporal developments, in: Proceeding of 15th IEEE Symposium on...
  • J.K. Wu et al., CORE: a content-based retrieval engine for multimedia information systems, ACM Multimedia Systems (1995)
  • J.P. Cheylan et al., Toward a conceptual model for the analysis of spatio-temporal processes
  • C. Claramunt et al., Toward semantics for modelling spatio-temporal processes within GIS
  • G.S. Pingali et al., Instantly indexed multimedia databases of real world events, IEEE Trans. Multimedia (2002)
  • B.F. Buxton et al., Monocular depth perception from optical flow by space–time signal processing, Proc. R. Soc. London Ser. B Biol. Sci. (1983)
  • E. Adelson et al., Spatiotemporal energy models for the perception of motion, J. Opt. Soc. Am. (1985)
  • R.C. Bolles et al., Epipolar plane image analysis: an approach to determining structure from motion, Int. J. Comput. Vision (1987)
  • H.H. Baker et al., Generalizing epipolar plane image analysis on the spatiotemporal surface, Int. J. Comput. Vision (1989)
  • A. Stefanidis, P. Agouris, P. Partsinevelos, Spatio-temporal helixes for event modelling, in: Proceedings of the Third...
  • Y. Ricquebourg et al., Real-time tracking of moving persons by exploiting spatiotemporal image slices, IEEE Trans. Pattern Anal. Mach. Intell. (2000)
  • R. Wildes et al., Qualitative spatio-temporal analysis using an oriented energy representation, Proc. Eur. Conf. Comput. Vision (2000)
  • W.-S. Li et al., SEMCOG: a hybrid object-based image and video database system and its modeling, language, and query processing, Theory Prac. Object System (TAPOS) (1999)
  • J.F. Allen, Maintaining knowledge about temporal intervals, Commun. ACM (1983)
  • W.Y. Ma, B.S. Manjunath, A comparison of wavelet transform features for texture image annotation, in: Proceeding of...
  • M. Stricker, A. Dimai, Color Indexing with weak spatial constraints, in: Proceeding of Storage and Retrieval for Image...
  • R.M. Haralick et al., Texture features for image classification, IEEE Trans. Systems Man Cybernet. (1973)
  • J. Huang, R. Zabih, Combining color and spatial information for content-based image retrieval, in: Proceeding of...
  • S.K. Chang et al., Iconic indexing by 2D strings, IEEE Trans. Pattern Anal. Mach. Intell. (1987)
  • M.J. Egenhofer et al., Point-set topological spatial relations, Int. J. Geogr. Inf. System (1991)
  • V.N. Gudivada et al., Design and evaluation of algorithms for image retrieval by spatial similarity, ACM Trans. Inf. Systems (1995)
  • A. Cohn (1997)
  • S.K. Chang, E. Jungert, A spatial knowledge structure for image systems using symbolic projections, in: Proceedings of...
  • S.K. Chang et al., An intelligent image database system, IEEE Trans. Software Eng. (1988)
  • E. Jungert, Extended symbolic projections as a knowledge structure for spatial reasoning, in: Proceeding of Fourth BPRA...
  • S. Chang et al., Representation and retrieval of symbolic pictures using generalized 2D strings, Proc. Visual Commun. Image Process. IV SPIE (1989)
  • M.-C. Yang, 2D B-string representation and access methods of image database, Master Thesis, Department of Computer...
  • E.G.M. Petrakis et al., A generalized approach to image indexing and retrieval based on 2-D strings

    About the Author—SAMEER SINGH is Professor of Autonomous Systems in the Department of Computer Science, and is the Director of the Research School of Informatics, Loughborough University, UK. He also heads the Computer Vision and Autonomous Systems research group at Loughborough, which has more than 55 members. His main research focus is on the development of novel sensor data analysis and machine learning techniques that can support semi- and fully automated intelligent systems for transportation, security and surveillance, mobile phone networks, and biomedical applications. These diverse applications are complex in nature, depend heavily on advances in machine learning and sensor technology for solving problems, and can benefit enormously from automation. Over the last two decades, Prof. Singh has worked at the interface between computer science, engineering, health sciences and mathematics to develop novel algorithms in the areas of computer vision (quantitative evaluation of image enhancement, evolutionary approaches to object tracking, novelty detection in video sequences, optimisation of image analysis tools, classifying human dynamics, audio–video fusion, and handwriting recognition), and machine learning (multi-resolution pattern recognition, Pareto-evolutionary neural networks, sensor fusion, predictive systems, and multi-objective optimisation). Most of this research has been published in various IEEE Transactions and other leading journals. Altogether, Prof. Singh has published over 170 papers in his career, and currently has more than £2 million in research grant income to support his research team. His work is supported by several leading companies, for example HP Labs, Motorola, Corus Rail and QinetiQ, and by government agencies working on transport and national security. He is also highly active in serving on various conference committees and journals. Notably, he is currently serving as Editor-in-Chief of the journal Pattern Analysis and Applications (Springer), and is Associate Editor of IEEE Transactions on SMC B, IEEE Transactions on Knowledge and Data Engineering, Real Time Image Analysis, and Neural Computing and Applications.

    About the Author—MANEESHA SINGH was born in India. She received the B.S. degree in computer science from Kurukshetra University, Kurukshetra, India and the M.Phil. and Ph.D. degrees from the University of Exeter, Exeter, UK in 1999, 2001 and 2004, respectively. Her Ph.D. was in the area of machine learning for image analysis in aviation security. Her main research interests include image processing, natural scene analysis, video analysis, and neural networks. She has published more than 30 papers in the area of machine learning for image analysis in peer reviewed journals and conferences. Currently she is a Senior Research Fellow at Loughborough University leading the project on imaging for road transport applications.

    About the Author—WEI REN graduated from the University of Exeter with a Ph.D. in 2005 in the area of spatio-temporal analysis for video retrieval. Her key research interests are in the areas of image analysis, neural networks and machine learning. During the course of her Ph.D. she developed a novel framework for video retrieval and a publicly available benchmark, Minerva, for video retrieval. She is currently working as a post-doctoral researcher at Peking University.
