Color active shape models for tracking non-rigid objects

https://doi.org/10.1016/S0167-8655(02)00330-6

Abstract

Active shape models can be applied to tracking non-rigid objects in video image sequences. Traditionally these models do not include color information in their formulation. In this paper, we present a hierarchical realization of an enhanced active shape model for color video tracking and we study the performance of both hierarchical and non-hierarchical implementations in the RGB, YUV, and HSI color spaces.

Introduction

The problem of tracking people and recognizing their actions in video sequences is of increasing importance to many applications (Haritaoglu et al., 2000; McKenna et al., 1999; Plänkers and Fua, 2001). Examples include video surveillance, human-computer interaction, and motion capture for animation. Special considerations for digital image processing are required when tracking objects whose forms (and/or silhouettes) change between consecutive frames. For example, cyclists in a road scene and people in an airport terminal belong to this class of objects, known as non-rigid objects. Active shape models (ASMs) can be applied to the tracking of non-rigid objects in a video sequence. Most existing ASMs do not consider color information (Pardàs and Sayrol, 2001). We present several extensions of the ASM for color images using different color-adapted objective functions.

In this paper, tracking an object means to identify the object in a video sequence and to calculate its position in every image frame during the analysis of successive images. Using color information as a feature to describe a moving object or person can support these tasks. Brock-Gunn et al. (1994) suggested the use of four-dimensional templates for tracking objects in color image sequences. However, if the observation is accomplished over a long period of time and with many single objects, then both the memory requirements for the templates in the database and the time requirements for the search of a template in the database increase. Deng and Manjunath (2001) used a segmentation scheme based on color quantization to track regions. The regions represent image areas of similar colors and they do not necessarily represent objects. Their approach also differs from other techniques in that it does not estimate exact object motion. In contrast to this approach, active contour models have been investigated for interactive interpretation of features in an image by Kass et al. (1988). The active contour model is an energy-minimizing spline, which is pulled toward features such as lines and edges. This model has been considered as a framework for low-level feature interpretation, such as stereo matching and motion tracking. A more comprehensive description and several applications of the active contour model are given by Blake and Isard (1998). The ASM is a compact model for which the form variety and the color distribution of an object class can both be taught in a training phase (Cootes et al., 1995). Compactness of the ASM results from principal component analysis (PCA) and a priori shape information from the training set.
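The compactness that PCA gives the ASM can be made concrete with the standard point distribution model of Cootes et al. (1995). The notation below is an assumption made for illustration, since the section of this paper that defines the model is not reproduced here: x is the vector of landmark coordinates of an aligned shape, x̄ is the mean of the aligned training shapes, P holds as columns the t eigenvectors of the training covariance matrix with the largest eigenvalues λ_k, and b is the vector of t shape parameters.

```latex
\mathbf{x} \approx \bar{\mathbf{x}} + \mathbf{P}\,\mathbf{b},
\qquad
\mathbf{b} = \mathbf{P}^{\mathsf{T}}\,(\mathbf{x} - \bar{\mathbf{x}}),
\qquad
|b_k| \le 3\sqrt{\lambda_k}, \quad k = 1, \dots, t
```

A shape with many landmark coordinates is thus described by only t parameters, and the bound on each b_k (a commonly used choice, not necessarily the one adopted in this paper) keeps the model within the range of shapes seen in the training set.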

Several systems use skin color information for tracking faces and hands (e.g. Comaniciu and Ramesh, 2000; Kim et al., 2001; Lee et al., 2001; Li et al., 2000; Marqués and Vilaplana, 2002). The basic idea is to limit the search complexity to one single color cluster (representing skin color) and to identify pixels based on their membership in this cluster. Several problems affect these approaches. First, skin colors are not easy to define for different ethnic groups under varying illumination conditions (Störring and Granum, 2002). Second, it is difficult to track individuals in a crowd of people if these individuals have similar skin colors; in addition, a person cannot be identified based on skin color when seen from behind. Tracking clothes instead of skin is more appropriate in this situation (Roh et al., 2000). Third, color distributions are sensitive to occlusions, shadows, and changing illumination. To address the problems caused by shadows and occlusions, Lu and Tan (2001) assume that the only moving objects in the scene are people. This assumption does not hold for many applications. Most of the approaches mentioned above cannot be easily extended to multi-colored objects other than people. In this paper, we present a general technique to track colored, non-rigid objects (including people).

A very efficient technique for the recognition of colored objects is color indexing (Swain and Ballard, 1991). An object in the image is assigned to an object stored in a database based on comparisons between color distributions. When applying color indexing to video tracking, the color distribution of the tracked object in frame i can be treated similarly to the data stored in the database in “classical” indexing. In this context, tracking becomes the identification and localization of the object to be tracked in frame i+1 based on comparison with its color distribution in frame i. Several modifications of the color indexing algorithm have been proposed to make this technique more robust with regard to illumination changes (e.g. Adjeroh and Lee, 2001; Berens et al., 2000; Finlayson and Xu, 2002; Finlayson et al., 1996; Funt and Finlayson, 1995; Healey and Slater, 1994). However, this technique usually requires multiple views of the object to be recognized, which is not always ensured when the people to be tracked are in a road scene, for example. Furthermore, color indexing partly fails with partial occlusions of the object. ASMs do not need multiple views of an object, since by using energy functions they can be adapted to the silhouette of an object represented in the image. However, the outlier problem, which can occur particularly with partial object occlusion, represents a difficulty for these models.
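The use of color indexing for tracking described above can be sketched with histogram intersection, the matching measure introduced by Swain and Ballard (1991). This is a minimal illustration and not the method of this paper: the function names, the 8 bins per channel, and the representation of a region as a list of (r, g, b) tuples are all assumptions chosen for brevity.

```python
from collections import Counter


def color_histogram(pixels, bins_per_channel=8):
    """Return a normalized, coarsely quantized RGB histogram.

    pixels: non-empty iterable of (r, g, b) tuples with 8-bit channel values.
    The bin count is an illustrative assumption.
    """
    step = 256 // bins_per_channel
    counts = Counter((r // step, g // step, b // step) for r, g, b in pixels)
    total = sum(counts.values())
    return {bin_index: n / total for bin_index, n in counts.items()}


def histogram_intersection(hist_a, hist_b):
    """Sum of bin-wise minima of two normalized histograms.

    The value is 1.0 for identical distributions and 0.0 when the two
    histograms share no occupied bin.
    """
    return sum(min(p, hist_b.get(bin_index, 0.0))
               for bin_index, p in hist_a.items())


def best_candidate(model_pixels, candidate_regions):
    """Pick the candidate region in frame i+1 whose color distribution
    best matches the distribution of the tracked object in frame i."""
    model_hist = color_histogram(model_pixels)
    scores = [histogram_intersection(model_hist, color_histogram(region))
              for region in candidate_regions]
    return max(range(len(scores)), key=scores.__getitem__)
```

In this sketch the histogram of the object in frame i plays the role of the database entry, and each candidate region in frame i+1 is scored against it. Because the score depends only on the color distribution, a partially occluded region loses part of its histogram mass, which is consistent with the partial failure under occlusion noted above.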

Vandenbroucke et al. (1997) presented a snake-based approach for tracking soccer players. They used a supervised scheme to learn the jersey colors of each team in a hybrid color space. Based on the results of a color classification in the images, each player who is present in the images is modeled by a snake. In our approach, we do not apply color segmentation to the images. Update of the ASM position in the next frame is based on the minimization of energy functions in the color components. Vandenbroucke et al. (1997) assume that there is only a small change in illumination during the entire soccer game. In our approach, we assume that there is only a small change in illumination between two successive frames.

In addition to the earlier results presented in Koschan et al. (2002), we study the performance of a hierarchical technique in the RGB, YUV, and HSI color spaces. The contributions of the paper are:

  • an extension of the ASM to color images by incorporating color information into the minimization of the energy functions,

  • a hierarchical implementation of the tracking scheme in a color image pyramid applying different color spaces, and

  • an investigation of the influence of the length of the search profiles and the number of landmark points on the results.


The remaining part of this paper is organized as follows. In Section 2, the fundamentals of ASMs are described. A hierarchical realization of the tracking algorithm is introduced in Section 3. In Section 4, we discuss the extension of ASMs to color images. Experimental results are provided in Section 5 and Section 6 concludes the paper.

Section snippets

Active shape models

Detecting the shape and position of the target is a fundamental task for tracking a non-rigid target in a video sequence. Two-dimensional deformable models typically use a boundary representation (deformable contour) to describe an object in the image. Within the class of deformable models, the ASM is one of the best-suited approaches in the sense of both accuracy and efficiency for applications where a priori information about the object (or more precisely about the shape of the object) in the

Hierarchical approach

Video tracking systems have to deal with variously shaped and sized input objects, which often results in a poor match of the initial model with an actual input shape. A hierarchical approach to ASMs is essential for video tracking systems to deal with such varying types of inputs. The idea of using pyramid models in image analysis was introduced by Tanimoto and Pavlidis (1975) as a solution to edge detection. One important property of the pyramid model is that it is computationally efficient
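The pyramid idea can be illustrated with a short sketch. It assumes a simple construction in which each level halves the resolution by averaging 2×2 blocks in every color channel; the construction actually used in this paper is not shown in this excerpt, and all function names here are hypothetical.

```python
def downsample(image):
    """Halve a color image by averaging each 2x2 block per channel.

    image: list of rows, each row a list of 3-component color tuples.
    A trailing odd row or column is dropped.
    """
    height = len(image) // 2 * 2
    width = len(image[0]) // 2 * 2
    result = []
    for y in range(0, height, 2):
        row = []
        for x in range(0, width, 2):
            block = (image[y][x], image[y][x + 1],
                     image[y + 1][x], image[y + 1][x + 1])
            row.append(tuple(sum(p[c] for p in block) / 4.0 for c in range(3)))
        result.append(row)
    return result


def build_pyramid(image, levels):
    """Return [full resolution, half, quarter, ...] with `levels` entries."""
    pyramid = [image]
    for _ in range(levels - 1):
        pyramid.append(downsample(pyramid[-1]))
    return pyramid


def scale_landmarks(landmarks, factor):
    """Carry landmark positions (x, y) from one pyramid level to another."""
    return [(x * factor, y * factor) for x, y in landmarks]
```

A coarse-to-fine search under these assumptions would fit the model at the last (coarsest) entry of the pyramid, pass the result through `scale_landmarks(..., 2)`, and refine it at the next finer level. Each coarse level contains a quarter of the pixels of the level below it, which is where the computational saving comes from.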

Extending ASMs to color image sequences

The fundamental difference between color images and gray level images is that in a color image, a color vector (which generally consists of three components) is assigned to a pixel, while a scalar gray value is assigned to a pixel of a gray level image. Thus, in color image processing vector-valued image functions are treated instead of scalar image functions (in gray level image processing). The techniques used for this can be subdivided on the basis of their principle procedures into two

Experimental results

We captured various indoor and outdoor image sequences using different cameras. Frames of various selected test image sequences are shown in Fig. 5. The sequence Man_1 was captured using a Nikon Coolpix 990 digital still camera. The original image was compressed by JPEG to a size of 640×480 pixels. We subsampled the original image to a size of 320×240 pixels in the experiments. The sequences Man_2, _6, and _9 were captured using a Sony 3-CCD DXC-930 video camera with a 7–112 mm zoom lens. The

Conclusions

A technique has been presented for recognizing and tracking a moving non-rigid object or person in a video sequence. The objective function for active shape models has been extended to color images. We have evaluated several different approaches for defining an objective function considering the information from the single components of the color image vectors. This tracking technique does not require a static camera (except to initialize the landmark points for the object to be recognized).

Acknowledgements

This work was supported by the University Research Program in Robotics under grant DOE-DE-FG02-86NE37968, by the DOD/TACOM/NAC/ARC Program, R01-1344-18, and by FAA/NSSA Program, R01-1344-48/49.

References (33)

  • Blake, A., et al., 1998. Active Contours.
  • Brock-Gunn, S.A., Dowling, G.R., Ellis, T.J., 1994. Tracking using colour information. In: Proc. Internat. Conf. on...
  • Comaniciu, D., Ramesh, V., 2000. Robust detection and tracking of human faces with an active camera. In: Proc. Visual...
  • Deng, Y., et al., 2001. Unsupervised segmentation of color-texture regions in images and video. IEEE Trans. Pattern Anal. Machine Intell.
  • Finlayson, G.D., Xu, R., 2002. Non-iterative comprehensive normalization. In: Proc. 1st Europ. Conf. on Color Graphics,...
  • Finlayson, G.D., Chatterjee, S.S., Funt, B.V., 1996. Color angular indexing. In: Proc. 4th ECCV, Cambridge, England,...
    1

    Present address: The Graduate School of Advanced Imaging Science, Chung-Ang University, Seoul 156-756, South Korea.
