Color active shape models for tracking non-rigid objects

doi:10.1016/S0167-8655(02)00330-6

Pattern Recognition Letters

Volume 24, Issue 11, July 2003, Pages 1751-1765

Abstract

Active shape models can be applied to tracking non-rigid objects in video image sequences. Traditionally these models do not include color information in their formulation. In this paper, we present a hierarchical realization of an enhanced active shape model for color video tracking and we study the performance of both hierarchical and non-hierarchical implementations in the RGB, YUV, and HSI color spaces.

Introduction

The problem of tracking people and recognizing their actions in video sequences is of increasing importance to many applications (Haritaoglu et al., 2000; McKenna et al., 1999; Plänkers and Fua, 2001). Examples include video surveillance, human-computer interaction, and motion capture for animation. Special considerations for digital image processing are required when tracking objects whose forms (and/or silhouettes) change between consecutive frames. Cyclists in a road scene and people in an airport terminal, for example, belong to this class, known as non-rigid objects. Active shape models (ASMs) can be applied to the tracking of non-rigid objects in a video sequence. Most existing ASMs do not consider color information (Pardàs and Sayrol, 2001). We present several extensions of the ASM for color images using different color-adapted objective functions.

In this paper, tracking an object means identifying the object in a video sequence and calculating its position in every image frame during the analysis of successive images. Using color information as a feature to describe a moving object or person can support these tasks. Brock-Gunn et al. (1994) suggested the use of four-dimensional templates for tracking objects in color image sequences. However, if the observation is accomplished over a long period of time and with many single objects, then both the memory requirements for the templates in the database and the time requirements for the search of a template in the database increase. Deng and Manjunath (2001) used a segmentation scheme based on color quantization to track regions. The regions represent image areas of similar colors and do not necessarily represent objects. Their approach also differs from other techniques in that it does not estimate exact object motion. In contrast to this approach, active contour models were investigated for the interactive interpretation of features in an image by Kass et al. (1988). The active contour model is an energy-minimizing spline, which is pulled toward features such as lines and edges. This model has been considered as a framework for low-level feature interpretation, such as stereo matching and motion tracking. A more comprehensive description and several applications of the active contour model are given by Blake and Isard (1998). The ASM is a compact model for which the form variety and the color distribution of an object class can both be taught in a training phase (Cootes et al., 1995). Compactness of the ASM results from principal component analysis (PCA) and a priori shape information from the training set.
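To illustrate how PCA makes a shape model compact, the following minimal Python sketch learns the mean shape and the single dominant mode of variation from a few training shapes, then constrains a new shape to that model. It is not the implementation used in this paper: the function names are hypothetical, shapes are assumed to be already aligned and stored as flat landmark vectors (x1, y1, ..., xn, yn), only one mode is extracted (by power iteration), and the limit of three standard deviations on the shape parameter is the range commonly used with ASMs.

```python
import math

def mean_shape(shapes):
    """Average each coordinate over all training shapes."""
    n = len(shapes)
    return [sum(s[i] for s in shapes) / n for i in range(len(shapes[0]))]

def covariance(shapes, mean):
    """Sample covariance matrix of the landmark coordinates."""
    n, d = len(shapes), len(mean)
    devs = [[s[i] - mean[i] for i in range(d)] for s in shapes]
    return [[sum(v[i] * v[j] for v in devs) / (n - 1) for j in range(d)]
            for i in range(d)]

def principal_mode(cov, iterations=100):
    """Dominant eigenvector and eigenvalue of cov by power iteration."""
    d = len(cov)
    vec = [float(i + 1) for i in range(d)]  # arbitrary non-zero start vector
    norm = math.sqrt(sum(v * v for v in vec))
    vec = [v / norm for v in vec]
    for _ in range(iterations):
        nxt = [sum(cov[i][j] * vec[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(v * v for v in nxt))
        if norm == 0.0:
            break  # training shapes show no variation along this vector
        vec = [v / norm for v in nxt]
    # Rayleigh quotient gives the variance explained by this mode.
    eigenvalue = sum(vec[i] * sum(cov[i][j] * vec[j] for j in range(d))
                     for i in range(d))
    return vec, eigenvalue

def fit_to_model(shape, mean, mode, eigenvalue):
    """Project a shape onto the mode and clamp the shape parameter b."""
    b = sum((shape[i] - mean[i]) * mode[i] for i in range(len(mean)))
    limit = 3.0 * math.sqrt(max(eigenvalue, 0.0))  # assumed +/- 3 std. dev.
    b = max(-limit, min(limit, b))
    return [mean[i] + b * mode[i] for i in range(len(mean))]

# Toy training set: a rectangle whose width varies between 1 and 3.
training = [[0, 0, w, 0, w, 1, 0, 1] for w in (1.0, 2.0, 3.0)]
mean = mean_shape(training)
mode, eigenvalue = principal_mode(covariance(training, mean))
plausible = fit_to_model([0, 0, 10.0, 0, 10.0, 1, 0, 1], mean, mode, eigenvalue)
```

In this toy example a rectangle of width 10, far outside the training range, is pulled back to the widest shape the model still considers plausible. A full ASM would keep several modes and align the training shapes before the analysis.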

Several systems use skin color information for tracking faces and hands (e.g. Comaniciu and Ramesh, 2000; Kim et al., 2001; Lee et al., 2001; Li et al., 2000; Marqués and Vilaplana, 2002). The basic idea is to limit the search complexity to one single color cluster (representing skin color) and to identify pixels based on their membership in this cluster. Several problems affect these approaches. First, skin colors are not easy to define for different ethnic groups under varying illumination conditions (Störring and Granum, 2002). Second, it is difficult to track individuals in a crowd of people if these individuals have similar skin colors; in addition, a person cannot be identified based on skin color when seen from behind. Tracking clothes instead of skin is more appropriate in this situation (Roh et al., 2000). Third, color distributions are sensitive to occlusions, shadows, and changing illumination. To address the problems caused by shadows and occlusions, Lu and Tan (2001) assume that the only moving objects in the scene are people. This assumption does not hold for many applications. Most of the approaches mentioned above cannot be easily extended to multi-colored objects other than people. In this paper, we present a general technique to track colored, non-rigid objects (including people).

A very efficient technique for the recognition of colored objects is color indexing (Swain and Ballard, 1991). An object in the image is assigned to an object stored in a database based on comparisons between color distributions. When applying color indexing to video tracking, the color distribution of the tracked object in frame i can be treated similarly to the data stored in the database in “classical” indexing. In this context, tracking becomes the identification and localization of the object to be tracked in frame i+1 based on comparison with its color distribution in frame i. Several modifications of the color indexing algorithm have been proposed to make this technique more robust with regard to illumination changes (e.g. Adjeroh and Lee, 2001; Berens et al., 2000; Finlayson and Xu, 2002; Finlayson et al., 1996; Funt and Finlayson, 1995; Healey and Slater, 1994). However, this technique usually requires multiple views of the object to be recognized, which cannot always be guaranteed when, for example, the people to be tracked are in a road scene. Furthermore, color indexing partly fails with partial occlusions of the object. ASMs do not need multiple views of an object, since by using energy functions they can be adapted to the silhouette of an object represented in the image. However, the outlier problem, which can occur particularly with partial object occlusion, represents a difficulty for these models.
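The use of color indexing for tracking described above can be sketched in a few lines of Python. The sketch uses histogram intersection, the similarity measure of Swain and Ballard (1991), normalized by the model histogram. The function names, the quantization into 4 bins per channel, and the representation of image regions as plain lists of 8-bit RGB tuples are assumptions made only for illustration.

```python
from collections import Counter

def color_histogram(pixels, bins_per_channel=4):
    """Quantize 8-bit RGB pixels into a coarse joint color histogram."""
    step = 256 // bins_per_channel
    return Counter((r // step, g // step, b // step) for r, g, b in pixels)

def histogram_intersection(candidate, model):
    """Sum of bin-wise minima, normalized by the model pixel count."""
    total = sum(model.values())
    if total == 0:
        return 0.0
    overlap = sum(min(candidate[key], count) for key, count in model.items())
    return overlap / total

def best_match(model_pixels, candidate_windows):
    """Pick the window of frame i+1 most similar to the object in frame i."""
    model = color_histogram(model_pixels)
    scores = [histogram_intersection(color_histogram(window), model)
              for window in candidate_windows]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores

# Object in frame i: mostly red with some blue.
tracked = [(250, 10, 10)] * 8 + [(10, 10, 250)] * 2
# Two hypothetical candidate windows in frame i+1.
window_a = [(240, 20, 20)] * 7 + [(20, 20, 240)] * 3
window_b = [(10, 250, 10)] * 10
index, scores = best_match(tracked, [window_a, window_b])
```

Here `window_a` reaches a score of 0.9 and `window_b` a score of 0.0, so the first window is selected. Because the score depends only on the color distribution, an occlusion that hides part of the object lowers the score directly, which is the weakness noted above.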

Vandenbroucke et al. (1997) presented a snake-based approach for tracking soccer players. They used a supervised scheme to learn the jersey colors of each team in a hybrid color space. Based on the results of a color classification in the images, each player who is present in the images is modeled by a snake. In our approach, we do not apply color segmentation to the images. Instead, the ASM position in the next frame is updated by minimizing energy functions in the color components. Vandenbroucke et al. (1997) assume that there is only a small change in illumination during the entire soccer game. In our approach, we assume only that the change in illumination between two successive frames is small.

In addition to earlier results presented in (Koschan et al., 2002), we study the performance of a hierarchical technique in the RGB, YUV, and HSI color spaces. The contributions of the paper are:

• an extension of the ASM to color images by incorporating color information into the minimization of the energy functions,
• a hierarchical implementation of the tracking scheme in a color image pyramid applying different color spaces, and
• an investigation of the influence of the length of the search profiles and the number of landmark points on the results.

The remainder of this paper is organized as follows. In Section 2, the fundamentals of ASMs are described. A hierarchical realization of the tracking algorithm is introduced in Section 3. In Section 4, we discuss the extension of ASMs to color images. Experimental results are provided in Section 5, and Section 6 concludes the paper.

Section snippets

Active shape models

Detecting the shape and position of the target is a fundamental task for tracking a non-rigid target in a video sequence. Two-dimensional deformable models typically use a boundary representation (deformable contour) to describe an object in the image. Within the class of deformable models, the ASM is one of the best-suited approaches in the sense of both accuracy and efficiency for applications where a priori information about the object (or more precisely about the shape of the object) in the

Hierarchical approach

Video tracking systems have to deal with variously shaped and sized input objects, which often results in a poor match of the initial model with an actual input shape. A hierarchical approach to ASMs is essential for video tracking systems to deal with such varying types of inputs. The idea of using pyramid models in image analysis was introduced by Tanimoto and Pavlidis (1975) as a solution to edge detection. One important property of the pyramid model is that it is computationally efficient
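As a rough illustration of why a pyramid is cheap to search, the Python sketch below builds a color image pyramid by averaging 2×2 pixel blocks, so that each level holds a quarter of the pixels of the level beneath it. The averaging filter, the function names, and the image representation (rows of RGB tuples) are assumptions for this sketch and are not taken from the paper.

```python
def downsample(image):
    """Halve width and height by averaging each 2x2 block per channel."""
    height, width = len(image) // 2, len(image[0]) // 2
    result = []
    for y in range(height):
        row = []
        for x in range(width):
            block = [image[2 * y + dy][2 * x + dx]
                     for dy in (0, 1) for dx in (0, 1)]
            row.append(tuple(sum(p[c] for p in block) / 4.0
                             for c in range(3)))
        result.append(row)
    return result

def build_pyramid(image, levels):
    """Return [full resolution, half resolution, ...], at most `levels` long."""
    pyramid = [image]
    for _ in range(levels - 1):
        if len(pyramid[-1]) < 2 or len(pyramid[-1][0]) < 2:
            break  # cannot halve the image any further
        pyramid.append(downsample(pyramid[-1]))
    return pyramid

def upscale_landmarks(points, factor=2):
    """Carry landmark positions found at a coarse level to the next finer one."""
    return [(x * factor, y * factor) for x, y in points]

# Synthetic 8x8 test image whose red and green channels encode x and y.
image = [[(float(x), float(y), 0.0) for x in range(8)] for y in range(8)]
pyramid = build_pyramid(image, 3)
sizes = [(len(level), len(level[0])) for level in pyramid]  # 8x8, 4x4, 2x2
```

A coarse-to-fine search would fit the model on the smallest level first and pass the result down with `upscale_landmarks`, so that only a short search is needed at full resolution.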

Extending ASMs to color image sequences

The fundamental difference between color images and gray level images is that in a color image, a color vector (which generally consists of three components) is assigned to a pixel, while a scalar gray value is assigned to a pixel of a gray level image. Thus, in color image processing vector-valued image functions are treated instead of scalar image functions (in gray level image processing). The techniques used for this can be subdivided on the basis of their principle procedures into two

Experimental results

We captured various indoor and outdoor image sequences using different cameras. Frames of various selected test image sequences are shown in Fig. 5. The sequence Man_1 was captured using a Nikon Coolpix 990 digital still camera. The original image was compressed by JPEG to a size of 640×480 pixels. We subsampled the original image to a size of 320×240 pixels in the experiments. The sequences Man_2, _6, and _9 were captured using a Sony 3-CCD DXC-930 video camera with a 7–112 mm zoom lens. The

Conclusions

A technique has been presented for recognizing and tracking a moving non-rigid object or person in a video sequence. The objective function for active shape models has been extended to color images. We have evaluated several different approaches for defining an objective function considering the information from the single components of the color image vectors. This tracking technique does not require a static camera (except to initialize the landmark points for the object to be recognized).

Acknowledgements

This work was supported by the University Research Program in Robotics under grant DOE-DE-FG02-86NE37968, by the DOD/TACOM/NAC/ARC Program, R01-1344-18, and by FAA/NSSA Program, R01-1344-48/49.

References (33)

A. Baumberg, Hierarchical shape fitting using an iterated linear filter, Image and Vision Comput. (1998)
T.F. Cootes et al., Active shape models - their training and application, Comput. Vision and Image Understanding (1995)
F. Marqués et al., Face segmentation and tracking based on connected operators and partition projection, Pattern Recognition (2002)
S.J. McKenna et al., Tracking colour objects using adaptive mixture models, Image and Vision Comput. (1999)
M. Pardàs et al., Motion estimation based tracking of active contours, Pattern Recognition Lett. (2001)
R. Plänkers et al., Tracking and modeling people in video sequences, Comput. Vision and Image Understanding (2001)
P. Sozou et al., A nonlinear generalization of point distribution models using polynomial regression, Image and Vision Comput. (1995)
S. Tanimoto et al., A hierarchical data structure for picture processing, Comput. Graphics Image Process. (1975)
D.A. Adjeroh et al., On ratio-based color indexing, IEEE Trans. Image Process. (2001)
J. Berens et al., Image indexing using compressed colour histogram, IEE Proc. Vision, Image Signal Process. (2000)
A. Blake et al., Active Contours (1998)
Brock-Gunn, S.A., Dowling, G.R., Ellis, T.J., 1994. Tracking using colour information. In: Proc. Internat. Conf. on...
Comaniciu, D., Ramesh, V., 2000. Robust detection and tracking of human faces with an active camera. In: Proc. Visual...
Y. Deng et al., Unsupervised segmentation of color-texture regions in images and video, IEEE Trans. Pattern Anal. Machine Intell. (2001)
Finlayson, G.D., Xu, R., 2002. Non-iterative comprehensive normalization. In: Proc. 1st Europ. Conf. on Color Graphics,...
Finlayson, G.D., Chatterjee, S.S., Funt, B.V., 1996. Color angular indexing. In: Proc. 4th ECCV, Cambridge, England,...

¹: Present address: The Graduate School of Advanced Imaging Science, Chung-Ang University, Seoul 156-756, South Korea.