State-of-the-art on spatio-temporal information-based video retrieval

doi:10.1016/j.patcog.2008.08.033

Pattern Recognition

Volume 42, Issue 2, February 2009, Pages 267-282

https://doi.org/10.1016/j.patcog.2008.08.033

Abstract

Video retrieval is increasingly based on image content. A number of studies on video retrieval have used low-level pixel content related to statistical moments, shape, colour and texture. However, it is well recognised that such information is not enough to discriminate uniquely between different multimedia content. The use of semantic information, especially that derived from spatio-temporal analysis, is of great value in multimedia annotation, archiving and retrieval. In this review paper, we detail how the use of spatio-temporal semantic knowledge is changing the way in which modern research is conducted. We review a number of studies and concepts related to such analysis, and draw important conclusions on where future research is headed.

Section snippets

Spatio-temporal information for video retrieval

Content-based video retrieval is an important area of research. Several practical systems have been developed over the last decade with the aim of improving retrieval performance, and they have been tested on large-scale databases such as TRECVID (http://www-nlpir.nist.gov/projects/tvpubs/tv.pubs.org.html). Video classification and retrieval problems can be categorised hierarchically with a taxonomy, an example of which is presented by Roach et al. [1]. A key characteristic of video data is its associated

Spatial representation

Spatial information can be formulated with the following two methodologies:

• The first approach is to use weak spatial constraints and capture spatial local information to represent low-level texture features. Examples include Gabor wavelets [22], local histograms [23], co-occurrence matrices [24], colour correlograms [25], composite region templates (CRTs) [26], etc.
• The second approach is to represent global qualitative spatial relations that support high-level semantic textual queries.
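
To make the second approach more concrete, the following is a minimal sketch of a simplified 2D-string-style symbolic projection, in the spirit of the iconic indexing work listed in the references. It is an illustration under stated assumptions, not the formulation used in any cited paper: each object is reduced to a single centroid, and the names `projection_string`, `two_d_string`, `left_of` and the example scene are hypothetical.

```python
from itertools import groupby


def projection_string(objects, axis):
    """Order object labels along one axis (0 = x, 1 = y).

    '<' separates labels at different coordinates; '=' joins labels
    that share the same coordinate. Simplifying assumption: every
    object is represented by one centroid point.
    """
    # Sort by coordinate, then by label so the output is deterministic.
    ordered = sorted(objects.items(), key=lambda kv: (kv[1][axis], kv[0]))
    groups = []
    for _, group in groupby(ordered, key=lambda kv: kv[1][axis]):
        groups.append(" = ".join(label for label, _ in group))
    return " < ".join(groups)


def two_d_string(objects):
    """Return the pair (u, v): projections along x and along y."""
    return projection_string(objects, 0), projection_string(objects, 1)


def left_of(objects, a, b):
    """Qualitative query: is the centroid of a strictly left of b?"""
    return objects[a][0] < objects[b][0]


# Hypothetical scene: label -> (x, y) centroid.
scene = {"car": (2, 1), "tree": (5, 3), "house": (5, 1)}

u, v = two_d_string(scene)
print(u)  # car < house = tree
print(v)  # car = house < tree
print(left_of(scene, "car", "tree"))  # True
```

A textual query such as "car left of tree" can then be answered from the symbolic strings or from a predicate like `left_of`, without returning to pixel data. Representing objects by centroids discards extent and overlap, which is one reason richer representations are of interest.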

Visual appearance-based representation and recognition

Spatio-temporal motion-based recognition has a wide spectrum of applications in surveillance, automation, health and medical systems, etc., through perceptual identification of biometrics and activity recognition. For example, motion analysis can be used in sports and athletic training: the discrimination between different tennis strokes is investigated by Yamato et al. [88]. Motion-based recognition can be employed in obstacle avoidance of moving

Spatio-temporal video indexing systems

An integrated system for spatio-temporal video retrieval is LucentVision [12], developed at the Visual Communications Research Department within Bell Labs. It was used effectively for tennis video indexing through spatio-temporal activity maps. This system analyses video from multiple cameras in real time and captures the activity of the players and the ball in the form of motion trajectories. The system stores these trajectories in a database along with video, 3D models of the

Conclusions

Video retrieval is essentially the task of finding the videos most similar to a query video. Traditionally, text-based labels attached to videos were used for matching. Since the 1980s, significant research into image analysis has opened up the possibility of extracting image content information from these videos, which could form the basis of matching, ranking and retrieving them. In recent years, it has been recognised that raw pixel information and basic statistical features of colour

References (171)

J.R. Smith et al., Image classification and querying using composite region templates, J. Comput. Vision Image Understand. (1999)

C.C. Chang et al., Retrieval of similar pictures on pictorial databases, Pattern Recognition (1991)

S. Lee et al., Spatial reasoning and similarity retrieval of images using 2D C-string knowledge representation, Pattern Recognition (1992)

S.-Y. Lee et al., 2D C-String: a new spatial knowledge representation for image database systems, Pattern Recognition (1990)

P.W. Huang et al., Using 2D C+-string as spatial knowledge representation for image database systems, Pattern Recognition (1994)

P.W. Huang et al., Spatial reasoning and similarity retrieval for image database systems based on RS-strings, Pattern Recognition (1996)

S. Lee et al., Signature file as a spatial filter for iconic image database, J. Visual Lang. Comput. (1992)

A.J.T. Lee et al., 3D C-string: a new spatio-temporal knowledge representation for video database systems, Pattern Recognition (2002)

F.-J. Hsu et al., Video data indexing by 2D C-Trees, J. Visual Lang. Comput. (1998)

C. Freksa, Temporal reasoning based on semi-intervals, Artif. Intell. (1992)

G. Hamarneh et al., Deformable spatio-temporal shape models: extending ASM to 2D+ time, J. Image Vision Comput. (2004)

E.J. Hwang et al., Querying video libraries, J. Visual Commun. Image Representation (1996)

P. Salembier et al., Description schemes for video programs, users and devices, Signal Process. Image Commun. (2000)

J.K. Aggarwal et al., Human motion analysis: a review, Comput. Vision Image Understand. (1999)

C.C. Chang et al., A shape recognition scheme based on relative distances of feature points from the centroid, Pattern Recognition (1991)

M. Roach et al., Recent trends in video analysis: a taxonomy of video classification problems

J. Li et al., Multi-training support vector machine for image retrieval, IEEE Trans. Image Process. (2006)

D. Tao et al., Asymmetric bagging and random subspace for support vector machine-based relevance feedback in image retrieval, IEEE Trans. Pattern Anal. Mach. Intell. (2006)

D. Tao et al., Direct kernel biased discriminant analysis: a new content-based image retrieval relevance feedback algorithm, IEEE Trans. Multimedia (2006)

D. Tao et al., Negative samples analysis in relevance feedback, IEEE Trans. Knowl. Data Eng. (2007)

D. Tao et al., Which components are important for interactive image searching?, IEEE Trans. Circuits Systems Video Technol. (2008)

A.D. Bimbo et al., Symbolic description and visual querying of image sequences using spatio-temporal logic, IEEE Trans. Knowl. Data Eng. (1995)

M. Erwig, M. Schneider, Visual specifications of spatio-temporal developments, in: Proceeding of 15th IEEE Symposium on...

J.K. Wu et al., CORE: a content-based retrieval engine for multimedia information systems, ACM Multimedia Systems (1995)

J.P. Cheylan et al., Toward a conceptual model for the analysis of spatio-temporal processes

C. Claramunt et al., Toward semantics for modelling spatio-temporal processes within GIS

G.S. Pingali et al., Instantly indexed multimedia databases of real world events, IEEE Trans. Multimedia (2002)

B.F. Buxton et al., Monocular depth perception from optical flow by space–time signal processing, Proc. R. Soc. London Ser. B Biol. Sci. (1983)

E. Adelson et al., Spatiotemporal energy models for the perception of motion, J. Opt. Soc. Am. (1985)

R.C. Bolles et al., Epipolar plane image analysis: an approach to determining structure from motion, Int. J. Comput. Vision (1987)

H.H. Baker et al., Generalizing epipolar plane image analysis on the spatiotemporal surface, Int. J. Comput. Vision (1989)

A. Stefanidis, P. Agouris, P. Partsinevelos, Spatio-temporal helixes for event modelling, in: Proceedings of the Third...

Y. Ricquebourg et al., Real-time tracking of moving persons by exploiting spatiotemporal image slices, IEEE Trans. Pattern Anal. Mach. Intell. (2000)

R. Wildes et al., Qualitative spatio-temporal analysis using an oriented energy representation, Proc. Eur. Conf. Comput. Vision (2000)

W.-S. Li et al., SEMCOG: a hybrid object-based image and video database system and its modeling, language, and query processing, Theory Prac. Object System (TAPOS) (1999)

J.F. Allen, Maintaining knowledge about temporal intervals, Commun. ACM (1983)

W.Y. Ma, B.S. Manjunath, A comparison of wavelet transform features for texture image annotation, in: Proceeding of...

M. Stricker, A. Dimai, Color Indexing with weak spatial constraints, in: Proceeding of Storage and Retrieval for Image...

R.M. Haralick et al., Texture features for image classification, IEEE Trans. Systems Man Cybernet. (1973)

J. Huang, R. Zabih, Combining color and spatial information for content-based image retrieval, in: Proceeding of...

S.K. Chang et al., Iconic indexing by 2D strings, IEEE Trans. Pattern Anal. Mach. Intell. (1987)

M.J. Egenhofer et al., Point-set topological spatial relations, Int. J. Geogr. Inf. System (1991)

V.N. Gudivada et al., Design and evaluation of algorithms for image retrieval by spatial similarity, ACM Trans. Inf. Systems (1995)

A. Cohn (1997)

S.K. Chang, E. Jungert, A spatial knowledge structure for image systems using symbolic projections, in: Proceedings of...

S.K. Chang et al., An intelligent image database system, IEEE Trans. Software Eng. (1988)

E. Jungert, Extended symbolic projections as a knowledge structure for spatial reasoning, in: Proceeding of Fourth BPRA...

S. Chang et al., Representation and retrieval of symbolic pictures using generalized 2D strings, Proc. Visual Commun. Image Process. IV SPIE (1989)

M.-C. Yang, 2D B-string representation and access methods of image database, Master Thesis, Department of Computer...

E.G.M. Petrakis et al., A generalized approach to image indexing and retrieval based on 2-D strings


About the Author—SAMEER SINGH is Professor of Autonomous Systems in the Department of Computer Science, and is the Director of Research School of Informatics, Loughborough University, UK. He also heads the Computer Vision and Autonomous Systems research group at Loughborough, which has more than 55 members. His main research focus is on the development of novel sensor data analysis and machine learning techniques that can support semi- and fully automated intelligent systems for transportation, security and surveillance, mobile phone networks, and biomedical applications. These diverse applications are complex in nature, depend heavily on advances in machine learning and sensor technology for solving problems, and can benefit enormously from automation. Over the last two decades, Prof. Singh has worked at the interface between computer science, engineering, health sciences and mathematics to develop novel algorithms in the areas of computer vision (quantitative evaluation of image enhancement, evolutionary approaches to object tracking, novelty detection in video sequences, optimisation of image analysis tools, classifying human dynamics, audio–video fusion, and handwriting recognition), and machine learning (multi-resolution pattern recognition, pareto-evolutionary neural networks, sensor fusion, predictive systems, and multi-objective optimisation). Most of this research has been published in various IEEE Transactions and other leading journals. Altogether, Prof. Singh has published over 170 papers in his career, and currently has more than £2 million research grant income to support his research team. His work is supported by several leading companies, for example HP Labs, Motorola, Corus Rail and QinetiQ, and by government agencies working on transport and national security. He is also highly active in serving on various conference committees and journals. Notably, he is currently serving as Editor-in-Chief of the Pattern Analysis and Applications journal published by Springer, and is Associate Editor of IEEE Transactions on SMC B, IEEE Transactions on Knowledge and Data Engineering, Real Time Image Analysis, and the Neural Computing and Applications journal.

About the Author—MANEESHA SINGH was born in India. She received the B.S. degree in computer science from Kurukshetra University, Kurukshetra, India and the M.Phil. and Ph.D. degrees from the University of Exeter, Exeter, UK in 1999, 2001 and 2004, respectively. Her Ph.D. was in the area of machine learning for image analysis in aviation security. Her main research interests include image processing, natural scene analysis, video analysis, and neural networks. She has published more than 30 papers in the area of machine learning for image analysis in peer reviewed journals and conferences. Currently she is a Senior Research Fellow at Loughborough University leading the project on imaging for road transport applications.

About the Author—WEI REN graduated from the University of Exeter with a Ph.D. in 2005 in the area of spatio-temporal analysis for video retrieval. Her key research interests are in the areas of image analysis, neural networks and machine learning. During the course of her Ph.D. she developed a novel framework for video retrieval and a publicly available benchmark, Minerva, for video retrieval. She is currently working as a Post-Doctoral researcher at Peking University.
