Combining patch matching and detection for robust pedestrian tracking in monocular calibrated cameras

doi:10.1016/j.patrec.2013.08.031

Pattern Recognition Letters

Volume 39, 1 April 2014, Pages 11-20

https://doi.org/10.1016/j.patrec.2013.08.031

Highlights

• We found a way to use the output of a pedestrian detector in a weighted vector median filter (WVMF) framework.
• Motion prediction and detections can help the tracker recover from failure.
• We compare different methods for pedestrian tracking on publicly available videos.
• Our method outperforms some of the current state-of-the-art trackers.

Abstract

This paper presents a new approach for tracking multiple people in monocular calibrated cameras, combining patch matching and pedestrian detection. Initially, background removal and pedestrian detection are used in conjunction with the vertical standing hypothesis to initialize the targets with multiple patches. In the tracking step, each patch related to a given target is matched individually across frames, and the resulting translation vectors are combined robustly with pedestrian detection results in the world coordinate frame using weighted vector median filters. Additionally, the algorithm uses the camera parameters both to estimate the person scale in a straightforward manner and to limit the search region used to track each fragment. Our experimental results indicate that our tracker can deal with occlusions and video sequences with strong appearance variations, presenting results comparable to or better than existing state-of-the-art algorithms.

Introduction

Object tracking is an active research topic in the computer vision community. In particular, pedestrian tracking represents an important task in a wide range of applications, such as video analytics, surveillance and analysis of athletic performance in collective sports.

The development of a complete framework for pedestrian tracking involves a variety of challenges. In the initialization step, each new pedestrian that enters the scene must be detected. In general, background removal and/or pedestrian detection algorithms are used in this stage, and they are strongly affected by illumination changes, shadows and varying poses. The tracking phase consists of continuously localizing a pedestrian over time, and it is a complex task for several reasons: people can move fast and unpredictably, the subject’s appearance can change throughout the sequence, and the video can present noise and blur, among others. In a common surveillance scenario, the problem can be even harder because of the many occlusions that can occur (among people in a group, or between a person and the scene). Additionally, these systems often require on-the-fly execution, so non-causal methods that use future information to estimate the current state are not well suited for these applications.

Instead of considering the target as a whole, some approaches split the target into multiple fragments (Adam et al., 2006, Dihl et al., 2011, Führ and Jung, 2012). These approaches have been shown to increase robustness in the presence of partial occlusions, since non-occluded fragments can be matched correctly. This paper extends the fragments-based approach described in Führ and Jung (2012) for pedestrian tracking, by including several new features:

• Multiple people tracking: in this paper, we present an automatic procedure for target initialization based on background removal, pedestrian detection and a vertical standing hypothesis, so that several targets can be detected and tracked simultaneously.
• Use of patch matching and pedestrian detection to improve robustness: the information from a pedestrian detection algorithm is included in the tracker, increasing accuracy and helping to recover the target after occlusions.
• Inclusion of a predicted position based on the history of movement of each target, which provides smoother tracks and helps the tracker during occlusions.
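
As an illustration of how a calibrated camera and the vertical standing hypothesis can yield the person scale, the following minimal sketch projects a foot point on the ground plane and a head point directly above it, and measures their distance in the image. It is not the authors' implementation: the calibration values, the assumed person height of 1.7 m, and the helper names project and person_image_height are all hypothetical.

```python
import numpy as np

# Hypothetical calibration, chosen only for illustration (not from the paper):
# focal length 800 px, principal point (320, 240), camera 5 m above the ground,
# optical axis parallel to the ground plane and pointing along world +Y.
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
R = np.array([[1.0, 0.0, 0.0],    # camera x axis = world X
              [0.0, 0.0, -1.0],   # camera y axis (image down) = -world Z
              [0.0, 1.0, 0.0]])   # camera z axis (depth) = world Y
C = np.array([0.0, 0.0, 5.0])     # camera centre in world coordinates (metres)
P = K @ np.hstack([R, (-R @ C).reshape(3, 1)])  # 3x4 projection matrix


def project(P, point_3d):
    """Project a 3D world point (metres) to pixel coordinates."""
    x = P @ np.append(np.asarray(point_3d, dtype=float), 1.0)
    return x[:2] / x[2]


def person_image_height(P, foot_xy, height_m=1.7):
    """Image height (pixels) of a person standing vertically at foot_xy.

    Vertical standing hypothesis: the head lies directly above the feet,
    so both points share the same ground-plane coordinates.
    The default height of 1.7 m is an assumption for this sketch.
    """
    foot = project(P, [foot_xy[0], foot_xy[1], 0.0])
    head = project(P, [foot_xy[0], foot_xy[1], height_m])
    return float(np.linalg.norm(head - foot))


near = person_image_height(P, (0.0, 10.0))  # person 10 m in front of the camera
far = person_image_height(P, (0.0, 20.0))   # person 20 m in front of the camera
print(near, far)
```

With this particular camera, whose optical axis is horizontal, the projected height is inversely proportional to the distance along the optical axis, so the person at 20 m appears half as tall as the person at 10 m.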

The remainder of this paper is organized as follows. Section 2 reviews some relevant work on object tracking, focusing on pedestrians. The proposed approach is described in Section 3, and the experimental validation is presented in Section 4. Finally, conclusions are drawn in Section 5.

Section snippets

Related work

Object tracking is an active research topic in computer vision, and there is a great variety of approaches to tackle the problem. This review focuses on pedestrian and/or multi-target tracking; the reader can refer to Yilmaz et al. (2006) for a comprehensive review and taxonomy of generic tracking algorithms. Another useful reference is the survey paper by Enzweiler and Gavrila (2009), which covers the problem of pedestrian detection and tracking using monocular cameras.

A common strategy for

The proposed method

The proposed approach first detects the targets (pedestrians) and represents each target as a set of patches. The patches related to each pedestrian are then tracked individually, and their motion patterns are combined robustly in the world coordinate system (WCS) using a weighted vector median filter (WVMF). A predicted motion vector and a people detector are also included in the tracking framework to improve accuracy and to better handle occlusions. The steps of the proposed method are
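
The robust combination step described in this snippet can be illustrated with a minimal sketch of a weighted vector median filter: among the candidate vectors, it returns the one that minimizes the weighted sum of distances to all candidates. This is a generic formulation under stated assumptions, not the authors' implementation: the Euclidean norm, the example displacement values and the weights (including the larger weights given to the prediction and detection vectors) are hypothetical.

```python
import numpy as np


def weighted_vector_median(vectors, weights):
    """Return the input vector minimizing the weighted sum of L2 distances.

    vectors: array-like of shape (n, d), e.g. displacement vectors on the
             ground plane of the world coordinate system.
    weights: array-like of shape (n,), non-negative confidence per vector.
    """
    v = np.asarray(vectors, dtype=float)
    w = np.asarray(weights, dtype=float)
    if v.ndim != 2 or w.shape != (v.shape[0],):
        raise ValueError("expected vectors of shape (n, d) and weights of shape (n,)")
    # dist[j, i] = ||v_j - v_i||
    dist = np.linalg.norm(v[:, None, :] - v[None, :, :], axis=2)
    # cost[j] = sum_i w_i * ||v_j - v_i||
    cost = dist @ w
    return v[np.argmin(cost)]


# Hypothetical displacements (metres) for one target in one frame.
candidates = [
    [0.10, 0.02],   # patch 1
    [0.11, 0.01],   # patch 2
    [0.09, 0.02],   # patch 3
    [0.90, -0.70],  # patch 4, mismatched (e.g. occluded)
    [0.10, 0.00],   # predicted motion vector
    [0.12, 0.02],   # displacement suggested by a pedestrian detection
]
weights = [1.0, 1.0, 1.0, 1.0, 2.0, 3.0]  # assumed weights, for illustration

motion = weighted_vector_median(candidates, weights)
print(motion)
```

Because the output is always one of the input vectors, a single mismatched patch cannot drag the estimate away from the consensus, unlike a weighted mean.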

Experiments

In this section we present several experimental results obtained with the proposed algorithm. Validation was performed qualitatively, by visual inspection of tracking results, and also quantitatively, by computing tracking errors using ground truth data. We compared our approach to the FragTrack algorithm (Adam et al., 2006) and the TLD tracker (Kalal et al.,

Conclusions

In this work we presented a robust approach to multiple pedestrian tracking using monocular calibrated cameras. Our method exploits the motion of independently tracked patches, extracted for each pedestrian, and combines these results with a predicted motion vector and the result of a pedestrian detector in a robust manner using a weighted vector median filter.

Our experimental results indicate that the proposed tracker is able to handle short-term occlusions and scale changes. Also, it presents

Acknowledgments

The authors would like to thank Brazilian agencies CNPq and Capes for supporting this work.

References (25)

A. Adam, E. Rivlin, I. Shimshoni, Robust fragments-based tracking using the integral histogram, in: IEEE Conference on...
J. Astola et al., Vector median filters, Proceedings of the IEEE (1990)
O. Barnich et al., ViBE: a universal background subtraction algorithm for video sequences, IEEE Transactions on Image Processing (2011)
R. Benenson, M. Mathias, R. Timofte, L. Van Gool, Pedestrian detection at 100 frames per second, in: IEEE Conference on...
B. Benfold, I. Reid, Stable multi-target tracking in real-time surveillance video, in: IEEE Conference on Computer...
K. Bernardin et al., Evaluating multiple object tracking performance: the CLEAR MOT metrics, EURASIP Journal on Image and Video Processing (2008)
M. Breitenstein et al., Online multiperson tracking-by-detection from a single, uncalibrated camera, IEEE Transactions on Pattern Analysis and Machine Intelligence (2011)
W. Choi, S. Savarese, Multiple target tracking in world coordinate with single, minimally calibrated camera, in:...
N. Dalal, B. Triggs, Histograms of oriented gradients for human detection, in: IEEE Conference on Computer Vision and...
L.L. Dihl, C.R. Jung, J.C. Bins, Robust adaptive patch-based object tracking using weighted vector median filters, in:...
P. Dollár, S. Belongie, P. Perona, The fastest pedestrian detector in the west, in: British Machine Vision Conference,...
M. Enzweiler et al., Monocular pedestrian detection: survey and experiments, IEEE Transactions on Pattern Analysis and Machine Intelligence (2009)
