A real time adaptive visual surveillance system for tracking low-resolution colour targets in dynamically changing scenes
Introduction
There has been extensive work on tracking multiple point targets, typically in the arena of radar tracking. The main components of such systems are the tracking process itself and data association. Tracking maintains a motion model for each target; the model summarises all past measurements of the target and is used to predict its position at the next time step. Data association is then responsible for matching, or assigning, measurements at the current time to targets. Because a number of objects move independently, the observation of one target may fall within another target's predicted area, and false or undetected measurements introduce further ambiguity into this assignment problem. Unlike radar tracking systems, visual systems require preprocessing to obtain measurements of a target's motion state, and targets may be occluded both by stationary objects in the scene and by other targets. Occlusions also occur in radar tracking, but they are few and short-lived, so standard tracking and data association algorithms suffice. In visual surveillance, by contrast, targets such as pedestrians and vehicles can have relatively slow, non-linear motions, and occlusions happen more frequently and last longer. Standard tracking and data association algorithms may therefore terminate their tracks prematurely to reduce the chance of incorrect assignment, since the prediction uncertainty grows with time. To overcome these difficulties, appearance-based tracking systems require a reliable model that allows an object to be tracked through background clutter.
This paper addresses the problem of using appearance and motion models to track low-resolution colour objects. Rather than relying solely upon a motion model and maintaining multiple hypotheses, simple shape and colour information can aid data association, reducing the number of hypotheses that need to be supported. However, an object may consist of too few pixels to build a reliable appearance model for either shape or colour tracking. Fig. 1(a) shows an example of the targets used in systems by many authors, whereas our method monitors targets in scenes similar to Fig. 1(b). In such cases, a colour distribution learnt from the scene is deemed unreliable because of the limited supporting evidence: the number of pixels belonging to the object is too small to train a complex shape or colour model. As a model becomes more complex, the number of training samples it requires increases exponentially. This not only leads to overfitting; some algorithms may also converge upon singular solutions, making the system unstable. Under these circumstances, most systems treat colour information as unreliable and track objects using motion cues and simple shape features alone [10], [25], [18]. This paper uses simple shape and motion cues together with a colour model based on transformations derived from psychophysical studies [26], [24]. This transformation makes it possible to construct a simple colour profile from a small sample set, overcoming colour consistency issues while still providing enough discrimination to distinguish between different colours.
This paper is organised as follows. Section 2 surveys related work. The system components are described in Section 3. Sections 4 and 5 outline the main parts of the tracker: the object detection and object tracking modules, respectively. Experimental results are presented in Section 6, along with discussion and possible further improvements, followed by conclusions in Section 7.
Related work
Tracking non-rigid objects by appearance has been tackled using various image cues; colour, motion, shape and depth are the appearance modalities most commonly used in such work. As mentioned in Section 1, most of this work is designed to deal with relatively large-scale objects.
Birchfield [4] used colour cues and intensity gradients to control a camera's pan and tilt to track a human head around an untextured and static room where the head is modelled as an ellipse. The attributes of the ellipse
System overview
The system consists of two main parts, an object detection module and a target tracking module, as shown in Fig. 2. The object detection module detects moving objects against a stationary scene, eliminates shadows and removes spurious objects. The target tracking module takes the objects detected in the current frame and matches them to the target models maintained in the target model library. The tracking process is performed in 2D; therefore, a geometric camera calibration is not
Object detection module
The object detection module consists of three parts. First, background subtraction segments the pixels of each input image into moving regions, comparing the current image with a reference image modelled by a per-pixel mixture of Gaussians. The outcome is fed into a shadow detection module, which eliminates shadows from the moving objects. The resulting binary image is then grouped into individual objects by the foreground region detection module.
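The per-pixel mixture-of-Gaussians background subtraction described above can be illustrated for a single grayscale pixel. This is a minimal Stauffer-Grimson-style sketch, not the authors' update equations, which this excerpt does not show. The component count `k`, learning rate `alpha`, background weight share `t`, initial and minimum variances, and the 2.5-sigma match test are all assumed values, and the class name `PixelGMM` is hypothetical.

```python
import math


class PixelGMM:
    """Adaptive mixture of Gaussians for one grayscale pixel (Stauffer-Grimson-style sketch)."""

    def __init__(self, k=3, alpha=0.05, t=0.7, init_var=225.0, min_var=4.0, match_sigma=2.5):
        self.k = k                    # assumed maximum number of components
        self.alpha = alpha            # assumed learning rate
        self.t = t                    # assumed share of total weight explained by background
        self.init_var = init_var      # assumed variance of a newly created component
        self.min_var = min_var        # assumed floor that stops variances collapsing to zero
        self.match_sigma = match_sigma
        self.w, self.mu, self.var = [], [], []

    def _ranked(self):
        # Most probable background components first: high weight and low spread.
        return sorted(range(len(self.w)),
                      key=lambda i: self.w[i] / math.sqrt(self.var[i]), reverse=True)

    def update(self, x):
        """Update the model with intensity x and return True if x is foreground."""
        ranked = self._ranked()
        background, total = set(), 0.0
        for i in ranked:  # smallest set of top-ranked components whose weights exceed t
            if total > self.t:
                break
            background.add(i)
            total += self.w[i]
        matched = next((i for i in ranked
                        if abs(x - self.mu[i]) <= self.match_sigma * math.sqrt(self.var[i])),
                       None)
        for i in range(len(self.w)):  # decay all weights, reinforce the matched one
            self.w[i] = (1 - self.alpha) * self.w[i] + (self.alpha if i == matched else 0.0)
        if matched is not None:
            rho = self.alpha  # simplification: fixed rate instead of alpha * likelihood
            self.mu[matched] = (1 - rho) * self.mu[matched] + rho * x
            d = x - self.mu[matched]
            self.var[matched] = max(self.min_var, (1 - rho) * self.var[matched] + rho * d * d)
        elif len(self.w) < self.k:
            self.w.append(self.alpha)
            self.mu.append(float(x))
            self.var.append(self.init_var)
        else:  # replace the least probable component with one centred on x
            j = ranked[-1]
            self.w[j], self.mu[j], self.var[j] = self.alpha, float(x), self.init_var
        s = sum(self.w)
        self.w = [wi / s for wi in self.w]
        return matched is None or matched not in background
```

Feeding a pixel a steady intensity around 100 and then a sudden value of 200 flags the new value as foreground, while a return to 100 is again explained by the dominant background component. A full detector would run one such model per pixel, and the shadow and foreground-region stages would operate on the resulting binary mask.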
Target tracking module
The target tracking module assigns the foreground objects found by the object detection module to the models maintained in the target model library. It also handles situations such as new targets, lost or occluded targets, camouflaged targets and targets whose appearance merges with that of others. This task incorporates all available information to choose the best matching hypothesis. The following sections describe the target tracking model used in our system. It consists of data
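Assigning detected foreground objects to target models can be illustrated as a gated minimum-cost assignment problem. The sketch below is an illustrative assumption, not the paper's hypothesis-selection procedure: the cost is plain Euclidean distance between a target's predicted centroid and a detection's centroid, the `gate` radius is hypothetical, and the optimum is found by exhaustive search over permutations, which is only practical for a handful of objects (Hungarian-type algorithms scale polynomially).

```python
import itertools
import math


def associate(predictions, detections, gate=20.0):
    """Return {target_index: detection_index} minimising total distance within the gate.

    A target left out of the result may be occluded or lost; a detection left out
    may be a new target. Leaving a target unassigned costs `gate`, so any in-gate
    match is preferred over no match.
    """
    n_t, n_d = len(predictions), len(detections)
    best_cost, best_pairs = math.inf, {}
    # Pad with None so that each target may also remain unassigned.
    slots = list(range(n_d)) + [None] * n_t
    for perm in itertools.permutations(slots, n_t):
        cost, pairs, valid = 0.0, {}, True
        for t, d in enumerate(perm):
            if d is None:
                cost += gate  # penalty for leaving target t unassigned
                continue
            dist = math.dist(predictions[t], detections[d])
            if dist > gate:  # detection outside this target's validation gate
                valid = False
                break
            cost += dist
            pairs[t] = d
        if valid and cost < best_cost:
            best_cost, best_pairs = cost, pairs
    return best_pairs


# Two predicted targets; the third detection is far from both and would start a new track.
print(associate([(10, 10), (30, 10)], [(31, 11), (11, 9), (100, 100)]))
```

Here each target is paired with its nearby detection, and the distant detection remains unassigned as a candidate new target.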
Motion model
The co-ordinates of the object's centroid are modelled by a discrete-time kinematic model, and Kalman filters are employed to maintain the state of the object. The centroid is modelled by a white noise acceleration model, which corresponds to the assumption that objects move with a nearly constant velocity. The object's manoeuvring ability is encoded in the process noise covariance matrix, as described below. The state equation for the centroid co-ordinate is derived from the piecewise-constant white
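The piecewise-constant white acceleration model mentioned above is the standard discrete constant-velocity model covered in tracking texts such as *Estimation and Tracking* in the reference list. A minimal per-coordinate sketch follows; the sampling interval `dt`, the acceleration noise `sigma_a` (playing the role of the manoeuvring ability), the measurement noise `sigma_z`, the initial covariance and the class name `CVKalman1D` are assumptions, not the paper's settings. Each centroid co-ordinate (x and y) would get its own instance.

```python
class CVKalman1D:
    """Constant-velocity Kalman filter for one centroid co-ordinate (sketch).

    State x = [position, velocity]. Piecewise-constant white acceleration:
    x_{k+1} = F x_k + G a_k, a_k ~ N(0, sigma_a^2), with
    F = [[1, dt], [0, 1]], G = [dt^2 / 2, dt], hence Q = sigma_a^2 * G G^T.
    """

    def __init__(self, z0, dt=1.0, sigma_a=1.0, sigma_z=2.0, p0=100.0):
        self.dt = dt
        self.x = [float(z0), 0.0]                  # first measurement, zero velocity
        self.P = [[sigma_z ** 2, 0.0], [0.0, p0]]  # assumed initial uncertainty
        g0, g1, q = dt * dt / 2.0, dt, sigma_a ** 2
        self.Q = [[q * g0 * g0, q * g0 * g1], [q * g0 * g1, q * g1 * g1]]
        self.R = sigma_z ** 2                      # measurement noise variance

    def predict(self):
        """Propagate state and covariance one step; return the predicted position."""
        dt, P = self.dt, self.P
        self.x = [self.x[0] + dt * self.x[1], self.x[1]]
        # P <- F P F^T + Q, expanded for F = [[1, dt], [0, 1]]
        p00 = P[0][0] + dt * (P[1][0] + P[0][1]) + dt * dt * P[1][1]
        p01 = P[0][1] + dt * P[1][1]
        p10 = P[1][0] + dt * P[1][1]
        p11 = P[1][1]
        self.P = [[p00 + self.Q[0][0], p01 + self.Q[0][1]],
                  [p10 + self.Q[1][0], p11 + self.Q[1][1]]]
        return self.x[0]

    def update(self, z):
        """Correct the state with a position measurement z (H = [1, 0])."""
        P = self.P
        s = P[0][0] + self.R                # innovation variance
        k0, k1 = P[0][0] / s, P[1][0] / s   # Kalman gain
        y = z - self.x[0]                   # innovation
        self.x = [self.x[0] + k0 * y, self.x[1] + k1 * y]
        # P <- (I - K H) P
        self.P = [[(1 - k0) * P[0][0], (1 - k0) * P[0][1]],
                  [P[1][0] - k1 * P[0][0], P[1][1] - k1 * P[0][1]]]
        return self.x[0]


kf = CVKalman1D(0.0)
for k in range(1, 31):      # target moving at 2 pixels per frame
    kf.predict()
    kf.update(2.0 * k)
print(kf.x)
```

Calling `predict()` repeatedly without `update()`, for example while a target is occluded, makes `P[0][0]` grow every frame; this is the growing prediction uncertainty referred to in the introduction, and the predicted position variance is also what a validation gate would be sized from.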
Shape model
Simple shape information is useful in deciding which type of tracking to apply to each foreground region detected in the current frame. It can be represented by the height and width of the object's minimum bounding box. A large deviation of the shape size from its average may indicate that the object is camouflaged or partially occluded.
The average-height estimate, ĥ is maintained as follows (and similarly for width where β(m
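The excerpt truncates the paper's update rule for ĥ and its weight β, so the following is only a hypothetical sketch of the idea: a running average of bounding-box height and width with an assumed fixed weight `beta`, and an assumed relative-deviation threshold `max_rel_change` that flags a possible camouflage or partial occlusion. As a further assumption, flagged observations are withheld from the average. Neither value nor the class name `ShapeModel` comes from the paper.

```python
class ShapeModel:
    """Running average of bounding-box height and width (hypothetical sketch)."""

    def __init__(self, height, width, beta=0.1, max_rel_change=0.3):
        self.h_avg = float(height)
        self.w_avg = float(width)
        self.beta = beta                      # assumed fixed weight, not the paper's beta
        self.max_rel_change = max_rel_change  # assumed tolerance on relative size change

    def is_abnormal(self, height, width):
        """True if either dimension deviates strongly from its running average."""
        dh = abs(height - self.h_avg) / self.h_avg
        dw = abs(width - self.w_avg) / self.w_avg
        return max(dh, dw) > self.max_rel_change

    def update(self, height, width):
        """Blend a new observation into the averages; skip it if the shape looks abnormal."""
        if self.is_abnormal(height, width):
            return False  # possible partial occlusion or camouflage: keep the old average
        self.h_avg = (1 - self.beta) * self.h_avg + self.beta * height
        self.w_avg = (1 - self.beta) * self.w_avg + self.beta * width
        return True


shape = ShapeModel(20, 10)
print(shape.update(22, 10), shape.h_avg)   # normal observation, absorbed into the average
print(shape.update(10, 10), shape.h_avg)   # height halves: flagged, average unchanged
```

Withholding abnormal observations keeps a brief occlusion from dragging the average size down, at the cost of adapting more slowly to genuine changes such as a pedestrian turning sideways.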
Colour model
Most colour tracking systems model multi-coloured objects using a colour histogram or a mixture of Gaussian distributions. If only a small set of samples is available, most of the histogram bins are empty, and fitting a mixture model to the data tends to converge to a singular solution. One answer is to increase the bin size or limit the number of Gaussian components in the mixture. This raises the question of how large the bins should be or how many components
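The sparsity argument above can be checked with a quick count. The sketch assumes a target patch of 30 pixels (an illustrative size, not a figure from the paper) and equal-width RGB histograms; random pixel values stand in for real foreground pixels. Whatever the values, at most 30 of the 16³ = 4096 bins can be occupied, so more than 99% of the histogram is empty, and even at 4 bins per channel (64 bins) at least 34 bins remain empty. The helper names are hypothetical.

```python
import random
from collections import Counter


def rgb_histogram(pixels, bins_per_channel):
    """Count pixels per RGB bin, splitting each 0-255 channel into equal-width bins."""
    width = 256 // bins_per_channel
    return Counter((r // width, g // width, b // width) for r, g, b in pixels)


def empty_fraction(pixels, bins_per_channel):
    """Fraction of histogram bins that receive no samples."""
    total_bins = bins_per_channel ** 3
    return 1.0 - len(rgb_histogram(pixels, bins_per_channel)) / total_bins


rng = random.Random(0)
# Hypothetical 30-pixel target; real pixels would come from the segmented foreground blob.
patch = [(rng.randrange(256), rng.randrange(256), rng.randrange(256)) for _ in range(30)]

for bins in (16, 8, 4):
    print(f"{bins ** 3:5d} bins: {empty_fraction(patch, bins):.1%} empty")
```

Coarsening the bins fills the histogram better but merges visibly different colours, which is exactly the trade-off the text raises about choosing the bin size.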
Experimental results
Some results of applying the proposed technique for tracking low-resolution targets are presented in this section.
Fig. 13(a) and (b) show the cumulative trajectories of objects tracked by two cameras during one hour of operation. The cameras were set to monitor two different locations on the campus. The tracker runs at approximately 5 frames per second on a 450 MHz Pentium III PC. Many spurious trajectories are evidently constructed during this period. This is mainly due to
Conclusions
We have presented a method for tracking low-resolution moving objects, mainly for outdoor surveillance applications, using colour, simple shape and motion information. The key strengths of the method are its robust background modelling, its colour mapping and a novel guided random search. The update equations used in the background subtraction method provide a model that adapts to scene content and quickly converges upon a stable reference image. The addition of shadow suppression
References (28)
- Tracking groups of people. CVIU (2000).
- Estimation and Tracking: Principles, Techniques and Software (1993).
- Multitarget–Multisensor Tracking: Principles and Techniques (1995).
- Basic Color Terms: Their Universality and Evolution (1991).
- Elliptical head tracking using intensity gradients and color histograms. CVPR98 (1998).
- Design and Analysis of Modern Tracking Systems (1999).
- Smart Graphics'02, Second International Symposium on Smart Graphics, ACM International Conference Proceedings Series, Hawthorn, NY, USA (2002).
- Computer vision face tracking for use in a perceptual user interface. Intel Technology Journal (1998).
- Solution of the assignment problem. ACM Transactions on Mathematical Software (1980).
- Maximum-likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B (1977).
- Using adaptive tracking to classify and monitor activities in a site. CVPR98.
- W4: real-time surveillance of people and their activities. PAMI.
- A statistical approach for real-time robust background subtraction and shadow detection. Frame-Rate99 Workshop.
- Contour tracking by stochastic propagation of conditional density. ECCV96.