
Image and Vision Computing

Volume 21, Issue 10, September 2003, Pages 913-929

A real time adaptive visual surveillance system for tracking low-resolution colour targets in dynamically changing scenes

https://doi.org/10.1016/S0262-8856(03)00076-3

Abstract

This paper presents a variety of probabilistic models for tracking small-area targets, which are common objects of interest in outdoor visual surveillance scenes. We address the problem of using appearance and motion models to classify and track objects when detailed information of the object's appearance is not available. The approach relies upon motion, shape cues and colour information to help in associating objects temporally within a video stream. Unlike previous applications of colour and complex shape in object tracking, where relatively large-size targets are tracked, our method is designed to track small colour targets commonly found in outdoor visual surveillance. Our approach uses a robust background model based around online Expectation Maximisation to segment moving objects with very low false detection rates. The system also incorporates a shadow detection algorithm which helps alleviate standard environmental problems associated with such approaches. A colour transformation derived from anthropological studies is used to model the colour distributions of low-resolution targets, along with a probabilistic method of combining colour and motion information. A data association algorithm is applied to maintain tracking of multiple objects under such circumstances. Simple shape information is employed to detect subtle interactions such as occlusion and camouflage. A novel guided search algorithm is then introduced to facilitate tracking of multiple objects during these events. This provides a robust visual tracking system which is capable of performing accurately and consistently within a real-world visual surveillance arena. The paper shows the system successfully tracking multiple people moving independently, and demonstrates the ability of the approach to maintain trajectories in the presence of occlusions and background clutter.

Introduction

There has been extensive work on the subject of tracking multiple point targets, typically in the arena of radar tracking. The main components of such systems are the tracking process itself and data association. Tracking deals with maintaining motion models of the objects being tracked, whereas data association uses the motion model, which summarises all past measurements of a target, to predict a position for the next time step; it is then responsible for matching or assigning measurements at the current time to targets. As a number of objects move independently, target observations may fall in other targets' predicted areas. False or undetected measurements further introduce ambiguity to this assignment problem. Unlike radar tracking systems, visual-based systems require preprocessing to obtain measurements of a target's motion state. Targets may be occluded by stationary objects in the scene as well as by other targets. Occlusions do occur in radar tracking systems; however, they are infrequent and do not last long, so standard tracking and data association algorithms are sufficient. In visual surveillance, targets consisting of pedestrians and vehicles can have relatively slow, non-linear motions. Occlusions tend to happen more frequently and last for longer periods. Standard tracking and data association algorithms may terminate their tracks sooner to reduce the chance of incorrect assignment (as the prediction uncertainty grows with time). In appearance-based tracking systems, a reliable model that overcomes these difficulties is required to facilitate tracking an object through background clutter.

This paper addresses the problem of using appearance and motion models in tracking low-resolution colour objects. Instead of relying solely upon a motion model and maintaining multiple hypotheses, simple shape and colour information can be useful in data association, reducing the number of hypotheses that need to be supported. However, the number of pixels constituting an object can be too small to build a reliable appearance model for either shape or colour tracking. An example of the targets used in many existing systems is shown in Fig. 1(a), while our method monitors targets similar to those in Fig. 1(b). In such cases, the colour distribution learnt from the scene is deemed unreliable due to the limited supporting evidence available; the pixels belonging to the object are too few to train a complex shape or colour model. As the model becomes more complex, the number of required training samples increases exponentially. This not only leads to overfitting, but some algorithms may also converge upon singular solutions, leading to an unstable system. Under these circumstances, most systems treat the colour information as unreliable and use motion cues and simple shape features alone to track objects [10], [25], [18]. This paper utilises both simple shape and motion along with a colour model based on transformations derived from psychophysical studies [26], [24]. This transformation provides the ability to construct a simple colour profile from a small sample data set, which overcomes colour consistency issues while providing sufficient discrimination to distinguish between different colours.

This paper is organised as follows: Section 2 provides a survey of related work. The system components are described in Section 3. Sections 4 and 5 outline the main parts of the tracker: the object detection and target tracking modules. Experimental results are presented in Section 6, along with discussions and possible further improvements. Conclusions are drawn in Section 7.

Section snippets

Related work

The topic of tracking non-rigid objects by appearance has been tackled using various image cues. Colour, motion, shape and depth are the common appearance modalities used in such work. As mentioned in Section 1, most of these methods are designed to deal with relatively large-scale objects.

Birchfield [4] used colour cues and intensity gradients to control a camera's pan and tilt to track a human head around an untextured and static room where the head is modelled as an ellipse. The attributes of the ellipse

System overview

The system consists of two main parts, an object detection module and a target tracking module, as shown in Fig. 2. The object detection module deals with detecting moving objects from a stationary scene, eliminating shadows and removing spurious objects. The target tracking module takes detected objects from the current frame and matches them to the target models maintained in the target model library. The tracking process is performed in 2D; therefore, a geometric camera calibration is not required.
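As a rough illustration of this two-stage design, the sketch below shows how the detection module could feed the tracking module frame by frame. The class and method names are hypothetical and are not taken from the authors' implementation.

```python
# Hypothetical skeleton of the two-module pipeline described above.
# Class and method names are illustrative, not the authors' code.

class SurveillanceTracker:
    def __init__(self, detector, tracker):
        self.detector = detector  # background subtraction, shadow removal, region grouping
        self.tracker = tracker    # data association against the target model library

    def process_frame(self, frame):
        # Object detection module: segment moving pixels, suppress shadows,
        # and group the surviving pixels into candidate foreground regions.
        regions = self.detector.detect(frame)
        # Target tracking module: match regions to the maintained target models.
        # Tracking is performed in 2D image coordinates, so no camera
        # calibration is needed.
        return self.tracker.update(regions)
```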

Object detection module

The module for object detection consists of three parts. First, each pixel in the input images is segmented into moving regions by a background subtraction method. The background subtraction uses a per-pixel mixture of Gaussians for the reference image to compare with the current image. The outcome is fed into a shadow detection module to eliminate shadows from moving objects. The resulting binary image is then grouped into different objects by the foreground region detection module.
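As a minimal sketch of this kind of per-pixel background test, the fragment below classifies a single pixel against its Gaussian mixture and performs one online update of the matched component. The thresholds, learning rate and update rule are illustrative assumptions and do not reproduce the paper's online Expectation Maximisation equations.

```python
import numpy as np

def classify_pixel(x, means, variances, weights, match_thresh=2.5, bg_weight=0.7):
    """Return True if intensity x is explained by the background mixture.

    weights are assumed to be normalised to sum to 1.
    """
    # Treat the heaviest (most persistent) components as background.
    order = np.argsort(weights)[::-1]
    cumulative = 0.0
    for k in order:
        cumulative += weights[k]
        if abs(x - means[k]) < match_thresh * np.sqrt(variances[k]):
            return True          # matched a background component
        if cumulative > bg_weight:
            break                # remaining components are treated as foreground modes
    return False

def update_component(x, mean, var, weight, lr=0.01):
    """One online update step for the matched component (illustrative)."""
    rho = lr
    new_mean = (1 - rho) * mean + rho * x
    new_var = (1 - rho) * var + rho * (x - new_mean) ** 2
    new_weight = (1 - lr) * weight + lr
    return new_mean, new_var, new_weight
```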

Target tracking module

The target tracking module deals with assigning foreground objects detected from the object detection module to models maintained in the target model library. It also handles situations such as new targets, lost or occluded targets, camouflaged targets and targets whose appearance merges with others. This task incorporates all available information to choose the best hypothesis to match. The following sections describe the target tracking model used in our system. It consists of data
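The sketch below illustrates one possible form of the assignment step at the heart of this module: a cost matrix between predicted target positions and detected region centroids is solved as an assignment problem and then gated. The gate value and the purely positional cost are assumptions; the paper's tracker combines motion, shape and colour evidence when scoring hypotheses.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def associate(predicted, detections, gate=50.0):
    """Match predicted target centroids to detected region centroids.

    predicted: (N, 2) array of predicted 2D positions.
    detections: (M, 2) array of detected 2D centroids.
    Returns (matches, unmatched_targets, unmatched_detections).
    """
    if len(predicted) == 0 or len(detections) == 0:
        return [], list(range(len(predicted))), list(range(len(detections)))

    # Euclidean distance cost between every prediction and every detection.
    cost = np.linalg.norm(predicted[:, None, :] - detections[None, :, :], axis=2)
    rows, cols = linear_sum_assignment(cost)

    matches = []
    unmatched_t = set(range(len(predicted)))
    unmatched_d = set(range(len(detections)))
    for r, c in zip(rows, cols):
        if cost[r, c] <= gate:      # reject implausible pairings
            matches.append((r, c))
            unmatched_t.discard(r)
            unmatched_d.discard(c)
    return matches, sorted(unmatched_t), sorted(unmatched_d)
```

Unmatched targets would then be handled as lost, occluded or camouflaged, and unmatched detections as candidate new targets.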

Motion model

The co-ordinates of the object's centroid are modelled by a discrete-time kinematic model, and Kalman filters are employed to maintain the state of the object. The centroid is modelled by a white noise acceleration model, corresponding to the assumption that objects move with near constant velocity. The object's manoeuvring ability is encoded in the process covariance matrix as described below. The state equation for the centroid x co-ordinate is derived from the piecewise-constant white noise acceleration model.
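A minimal sketch of such a constant-velocity Kalman filter for one centroid co-ordinate is given below, using the piecewise-constant white-noise acceleration process model (Bar-Shalom et al.). The noise magnitudes are illustrative values, not the parameters used in the paper.

```python
import numpy as np

def cv_kalman_matrices(T, sigma_a=1.0, sigma_z=2.0):
    """Constant-velocity model matrices for one co-ordinate; T is the frame period."""
    F = np.array([[1.0, T],
                  [0.0, 1.0]])                  # state transition for [position, velocity]
    G = np.array([[0.5 * T * T], [T]])          # acceleration enters through Gamma
    Q = sigma_a ** 2 * (G @ G.T)                # piecewise-constant white-noise acceleration
    H = np.array([[1.0, 0.0]])                  # only the centroid position is measured
    R = np.array([[sigma_z ** 2]])              # measurement noise covariance
    return F, Q, H, R

def kalman_step(x, P, z, F, Q, H, R):
    """One predict/update cycle given the measured centroid co-ordinate z."""
    # Predict
    x_pred = F @ x
    P_pred = F @ P @ F.T + Q
    # Update
    S = H @ P_pred @ H.T + R
    K = P_pred @ H.T @ np.linalg.inv(S)
    x_new = x_pred + K @ (np.atleast_1d(z) - H @ x_pred)
    P_new = (np.eye(2) - K @ H) @ P_pred
    return x_new, P_new
```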

Shape model

Simple shape information about the object is useful in identifying what type of tracking is applied to each detected foreground region in the current frame. It can be represented by the height and width of the minimum bounding box of the object. A large deviation of the shape size from its average may indicate an object under camouflage or partial occlusion.

The average-height estimate, ĥ, is maintained as follows (and similarly for the width ŵ):

ĥ(k+1) = ĥ(k) + β(k+1)(h(k+1) − ĥ(k))

where β(m
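A direct transcription of this running-average update is shown below. Since the snippet truncates the definition of β, it is treated here simply as a gain in (0, 1]; whether it is fixed or decays with the number of observations is an assumption, and the anomaly threshold in the second function is likewise illustrative.

```python
def update_average(avg, measurement, beta):
    """Running-average update: avg(k+1) = avg(k) + beta * (h(k+1) - avg(k))."""
    return avg + beta * (measurement - avg)

def shape_anomaly(avg_h, h, rel_thresh=0.4):
    """Flag possible camouflage or partial occlusion when the measured height
    deviates strongly from its running average (threshold is illustrative)."""
    return abs(h - avg_h) > rel_thresh * max(avg_h, 1e-6)
```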

Colour model

Most colour tracking systems model multi-coloured objects using a colour histogram or a mixture of Gaussian distributions. If only a small set of samples is available, most of the bins in the histogram are empty, and fitting a mixture model to the data tends to converge to a singular solution. One answer to this is to increase the bin size or limit the number of Gaussian components in the mixture. This leads to the question of how large the bin size should be or how many components
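As an illustration of how a coarse colour profile can be built from only a handful of pixels, the sketch below quantises pixels into a small, fixed set of colour categories and compares profiles by histogram intersection. The coarse HSV binning is a placeholder assumption; it is not the transformation derived from psychophysical colour-naming studies that the paper actually uses.

```python
import numpy as np

def coarse_profile(pixels_hsv, n_hue_bins=8):
    """Build a normalised colour profile from a small set of pixels.

    pixels_hsv: (N, 3) array with H in [0, 360), S and V in [0, 1].
    The last two bins collect achromatic 'black' and 'grey/white' pixels.
    """
    profile = np.zeros(n_hue_bins + 2)
    for h, s, v in pixels_hsv:
        if v < 0.2:
            profile[n_hue_bins] += 1            # very dark -> 'black'
        elif s < 0.2:
            profile[n_hue_bins + 1] += 1        # desaturated -> 'grey/white'
        else:
            profile[int(h / 360.0 * n_hue_bins) % n_hue_bins] += 1
    return profile / max(profile.sum(), 1)

def profile_similarity(p, q):
    """Histogram intersection: a cheap, robust match score for small samples."""
    return np.minimum(p, q).sum()
```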

Experimental results

Some results of applying the proposed technique for tracking low-resolution targets are presented in this section.

Fig. 13(a) and (b) show the cumulative trajectories of objects tracked during one hour of operation from two cameras. The cameras were set to monitor two different locations on the campus. The tracker runs on a 450 MHz Pentium III PC at approximately 5 frames per second. It is evident that many spurious trajectories are constructed during the period. This is mainly due to

Conclusions

We have presented a method to track low-resolution moving objects, mainly for outdoor surveillance applications, using colour, simple shape and motion information. The key strengths of this method are its robust background modelling, its colour mapping and a novel guided random search. The update equations used in the background subtraction method provide a model which can adapt to scene content and quickly converges upon a stable reference image. The addition of shadow suppression

References (28)

  • S.J. McKenna et al., Tracking groups of people, CVIU (2000)
  • Y. Bar-Shalom et al., Estimation and Tracking: Principles, Techniques and Software (1993)
  • Y. Bar-Shalom et al., Multitarget–Multisensor Tracking: Principles and Techniques (1995)
  • B. Berlin et al., Basic Color Terms: Their Universality and Evolution (1991)
  • S.T. Birchfield, Elliptical head tracking using intensity gradients and color histograms, CVPR98 (1998)
  • S. Blackman et al., Design and Analysis of Modern Tracking Systems (1999)
  • R. Bowden et al., Smart Graphics'02, Second International Symposium on Smart Graphics, ACM International Conference Proceedings Series, Hawthorn, NY, USA (2002)
  • G.R. Bradski, Computer vision face tracking for use in a perceptual user interface, Intel Technology Journal (1998)
  • G. Carpaneto et al., Solution of the assignment problem, ACM Transactions on Mathematical Software (1980)
  • A.P. Dempster et al., Maximum-likelihood from incomplete data via the EM algorithm, Journal of the Royal Statistical Society Series B (1977)
  • W.E.L. Grimson et al., Using adaptive tracking to classify and monitor activities in a site, CVPR98 (1998)
  • I. Haritaoglu et al., W4: real-time surveillance of people and their activities, PAMI (2000)
  • T. Horprasert et al., A statistical approach for real-time robust background subtraction and shadow detection, Frame-Rate99 Workshop (1999)
  • M. Isard et al., Contour tracking by stochastic propagation of conditional density, ECCV96 (1996)