Elsevier

Pattern Recognition Letters

Volume 39, 1 April 2014, Pages 11-20

Combining patch matching and detection for robust pedestrian tracking in monocular calibrated cameras

https://doi.org/10.1016/j.patrec.2013.08.031

Highlights

  • We show how to use the output of a pedestrian detector within a weighted vector median filter (WVMF) framework.

  • Motion prediction and pedestrian detections help the tracker recover from failures.

  • We compare different methods for pedestrian tracking in publicly available videos.

  • Our method outperforms some of the current state-of-the-art trackers.

Abstract

This paper presents a new approach for tracking multiple people in monocular calibrated cameras by combining patch matching and pedestrian detection. Initially, background removal and pedestrian detection are used in conjunction with the vertical standing hypothesis to initialize the targets with multiple patches. In the tracking step, each patch related to a given target is matched individually across frames, and the resulting translation vectors are robustly combined with pedestrian detection results in the world coordinate frame using weighted vector median filters. Additionally, the algorithm uses the camera parameters both to estimate the person scale in a straightforward manner and to limit the search region used to track each fragment. Our experimental results indicate that our tracker can deal with occlusions and video sequences with strong appearance variations, presenting results comparable to or better than existing state-of-the-art algorithms.
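The abstract states that the camera parameters are used to estimate the person scale directly. The following is a minimal sketch of how this can work under the vertical standing hypothesis, not the authors' implementation. It assumes a 3×4 projection matrix `P`, a ground plane at Z = 0 in world coordinates, and a hypothetical average person height of 1.7 m and width-to-height ratio of 0.4. The foot pixel is back-projected to the ground through the homography formed by columns 1, 2 and 4 of `P`; the head point, placed vertically above it, is then projected into the image to obtain the expected box height. All function names are illustrative.

```python
import numpy as np

def ground_homography(P):
    # For ground-plane points (X, Y, 0, 1), only columns 1, 2 and 4 of P contribute.
    return P[:, [0, 1, 3]]

def foot_to_world(P, u, v):
    # Back-project a foot pixel (u, v) onto the Z = 0 ground plane.
    Xh = np.linalg.solve(ground_homography(P), np.array([u, v, 1.0]))
    return Xh[:2] / Xh[2]

def project(P, Xw):
    # Project a 3D world point (X, Y, Z) to pixel coordinates.
    x = P @ np.append(Xw, 1.0)
    return x[:2] / x[2]

def expected_box_size(P, u_foot, v_foot, height=1.7, aspect=0.4):
    # Vertical standing hypothesis: the head lies `height` metres above the foot point.
    # `height` and `aspect` are hypothetical defaults, not values from the paper.
    X, Y = foot_to_world(P, u_foot, v_foot)
    head = project(P, np.array([X, Y, height]))
    h_px = float(np.linalg.norm(head - np.array([u_foot, v_foot])))
    return h_px, aspect * h_px

# Example camera: 3 m above the ground, looking horizontally along the world Y axis.
K = np.array([[1000.0, 0.0, 320.0], [0.0, 1000.0, 240.0], [0.0, 0.0, 1.0]])
R = np.array([[1.0, 0.0, 0.0], [0.0, 0.0, -1.0], [0.0, 1.0, 0.0]])  # world (X right, Y forward, Z up) -> camera
t = np.array([0.0, 3.0, 0.0])                                        # t = -R @ C with C = (0, 0, 3)
P = K @ np.column_stack([R, t])
box_h, box_w = expected_box_size(P, 320, 540)  # foot point 10 m in front of the camera
```

In this example the foot pixel maps to a ground point 10 m away, so a 1.7 m person should span about 1000 × 1.7 / 10 = 170 pixels. The same projection could also bound where a fragment is searched for in the next frame, although the excerpt does not say exactly how the authors do this.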

Introduction

Object tracking is an active research topic in the computer vision community. In particular, pedestrian tracking represents an important task in a wide range of applications, such as video analytics, surveillance and analysis of athletic performance in collective sports.

The development of a complete framework for pedestrian tracking involves a variety of challenges. In the initialization step, each new pedestrian that enters the scene must be detected. Background removal and/or pedestrian detection algorithms are generally used at this stage, and they are strongly affected by illumination changes, shadows and varying poses. The tracking phase consists of continuously localizing each pedestrian over time, which is complex for several reasons: people can move quickly and unpredictably, the subject’s appearance can change throughout the sequence, and the video may contain noise and blur. In a typical surveillance scenario, the problem is even harder because of frequent occlusions, either among people in a group or between a person and objects in the scene. Additionally, these systems often require online execution, so non-causal methods that use future frames to estimate the current state are not well suited to these applications.

Instead of considering the target as a whole, some approaches split the target into multiple fragments (Adam et al., 2006, Dihl et al., 2011, Führ and Jung, 2012). These approaches have been shown to increase robustness in the presence of partial occlusions, since non-occluded fragments can be matched correctly. This paper extends the fragments-based approach described in Führ and Jung (2012) for pedestrian tracking, by including several new features:

  • Multiple people tracking: in this paper, we present an automatic procedure for target initialization based on background removal, pedestrian detection and a vertical standing hypothesis, so that several targets can be detected and tracked simultaneously.

  • Use of patch matching and pedestrian detection to improve robustness: the information from a pedestrian detection algorithm is included in the tracker, increasing accuracy and helping to recover the target after occlusions.

  • Inclusion of a predicted position based on the history of movement of each target, which provides smoother tracks and helps the tracker during occlusions.
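The third feature above predicts each target's position from its movement history. The excerpt does not specify the predictor, so the following is only a sketch assuming a constant-velocity model: the predicted motion vector is the mean of the last few ground-plane displacements of the track. The `window` parameter and both function names are hypothetical.

```python
import numpy as np

def predicted_motion(history, window=5):
    """Mean displacement over the last `window` steps of a track.

    history: sequence of (X, Y) ground-plane positions, oldest first.
    """
    pts = np.asarray(history, dtype=float)
    if len(pts) < 2:
        return np.zeros(2)  # not enough history: predict no motion
    steps = np.diff(pts[-(window + 1):], axis=0)  # per-frame displacements
    return steps.mean(axis=0)

def predicted_position(history, window=5):
    # Constant-velocity extrapolation from the last known position.
    return np.asarray(history[-1], dtype=float) + predicted_motion(history, window)

track = [(0.0, 0.0), (1.0, 0.0), (2.0, 1.0), (3.0, 1.0)]
next_pos = predicted_position(track)
```

Averaging over a short window smooths noisy per-frame displacements, and during an occlusion this vector can stand in for patch matches that are no longer reliable.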

The remainder of this paper is organized as follows. Section 2 reviews some relevant work on object tracking, focusing on pedestrians. The proposed approach is described in Section 3, and the experimental validation is presented in Section 4. Finally, conclusions are drawn in Section 5.


Related work

Object tracking is an active research topic in computer vision, and there is a great variety of approaches to tackle the problem. This review focuses on pedestrian and/or multi-target tracking; the reader can refer to Yilmaz et al. (2006) for a comprehensive review and taxonomy of generic tracking algorithms. Another useful reference is the survey paper by Enzweiler and Gavrila (2009), which covers pedestrian detection and tracking using monocular cameras.

A common strategy for

The proposed method

The proposed approach consists of initially detecting the targets (pedestrians) and representing each target as a set of patches. The patches related to each pedestrian are then tracked individually, and their motion vectors are combined robustly in the world coordinate system (WCS) using a weighted vector median filter (WVMF). A predicted motion vector and a people detector are also included in the tracking framework to improve accuracy and to better handle occlusions. The steps of the proposed method are

Experiments

In this section we present several experimental results obtained with the proposed algorithm. Validation was performed qualitatively, by visual inspection of tracking results, and quantitatively, by computing tracking errors against ground truth data. We compared our approach to the FragTrack algorithm (Adam et al., 2006) and the TLD tracker (Kalal et al.,

Conclusions

In this work we presented a robust approach to multiple pedestrian tracking using monocular calibrated cameras. Our method exploits the motion of independently tracked patches extracted for each pedestrian, and robustly combines these results with a predicted motion vector and the output of a pedestrian detector using a weighted vector median filter.
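The combination step summarized above can be illustrated with the vector median of Astola et al. (1990), extended with per-vector weights. Among a set of candidate vectors, the output is the one that minimizes the weighted sum of Euclidean distances to all the others, so a single outlier (for instance, the displacement of an occluded patch) cannot pull the result away. This is a minimal sketch: the function names are hypothetical, and so are the weights given to patches, the predicted vector and the detection, since the excerpt does not state the authors' weighting scheme.

```python
import numpy as np

def weighted_vector_median(vectors, weights):
    # Return the input vector v_i that minimizes sum_j w_j * ||v_i - v_j||.
    V = np.asarray(vectors, dtype=float)
    w = np.asarray(weights, dtype=float)
    D = np.linalg.norm(V[:, None, :] - V[None, :, :], axis=2)  # pairwise distances
    cost = D @ w  # cost[i] = sum_j D[i, j] * w[j]
    return V[np.argmin(cost)]

def combine_motion(patch_vectors, patch_weights, predicted, w_pred,
                   detection=None, w_det=0.0):
    # Pool patch displacements, the predicted motion and (optionally) the
    # displacement implied by a matched detection, all in world coordinates.
    vectors = list(patch_vectors) + [predicted]
    weights = list(patch_weights) + [w_pred]
    if detection is not None:
        vectors.append(detection)
        weights.append(w_det)
    return weighted_vector_median(vectors, weights)

# Three consistent patches plus one occluded patch with a wild displacement.
patches = [(0.10, 0.00), (0.12, 0.01), (0.11, -0.01), (1.50, 2.00)]
out_no_det = combine_motion(patches, [1, 1, 1, 1], predicted=(0.10, 0.00), w_pred=1.0)
out_det = combine_motion(patches, [1, 1, 1, 1], predicted=(0.10, 0.00), w_pred=1.0,
                         detection=(0.12, 0.01), w_det=10.0)
```

Because the output is always one of the inputs, the filter selects a plausible displacement rather than averaging in the outlier. Giving a matched detection a large weight lets it dominate the choice, which is one way a detector can help re-anchor the track.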

Our experimental results indicate that the proposed tracker is able to handle short-term occlusions and scale changes. Also, it presents

Acknowledgments

The authors would like to thank Brazilian agencies CNPq and Capes for supporting this work.

References (25)

  • A. Adam, E. Rivlin, I. Shimshoni, Robust fragments-based tracking using the integral histogram, in: IEEE Conference on...
  • J. Astola et al., Vector median filters, Proceedings of the IEEE (1990).
  • O. Barnich et al., ViBE: a universal background subtraction algorithm for video sequences, IEEE Transactions on Image Processing (2011).
  • R. Benenson, M. Mathias, R. Timofte, L. Van Gool, Pedestrian detection at 100 frames per second, in: IEEE Conference on...
  • B. Benfold, I. Reid, Stable multi-target tracking in real-time surveillance video, in: IEEE Conference on Computer...
  • K. Bernardin et al., Evaluating multiple object tracking performance: the CLEAR MOT metrics, EURASIP Journal on Image and Video Processing (2008).
  • M. Breitenstein et al., Online multiperson tracking-by-detection from a single, uncalibrated camera, IEEE Transactions on Pattern Analysis and Machine Intelligence (2011).
  • W. Choi, S. Savarese, Multiple target tracking in world coordinate with single, minimally calibrated camera, in:...
  • N. Dalal, B. Triggs, Histograms of oriented gradients for human detection, in: IEEE Conference on Computer Vision and...
  • L.L. Dihl, C.R. Jung, J.C. Bins, Robust adaptive patch-based object tracking using weighted vector median filters, in:...
  • P. Dollár, S. Belongie, P. Perona, The fastest pedestrian detector in the west, in: British Machine Vision Conference,...
  • M. Enzweiler et al., Monocular pedestrian detection: survey and experiments, IEEE Transactions on Pattern Analysis and Machine Intelligence (2009).

Cited by (26)

    • Effective multiple pedestrian tracking system in video surveillance with monocular stationary camera

      2021, Expert Systems with Applications
      Citation Excerpt :

      Datasets of single objects are provided in the visual tracker benchmark1 (Wu et al., 2013). Online multiple object tracking methods are more complex than single object tracking, with complicated target registration, model update and track termination (Führ & Jung, 2014; Joshi & Thakore, 2012; Lavee, Rivlin, & Rudzsky, 2009; Possegger, Mauthner, Roth, & Bischof, 2014; Wang, Yoon, & Park, 2017). In Xue, Liu, Cai, and He (2016), authors considered the problem of human tracking in RGBD videos filmed by sensors such as MS Kinect and Primesense, aiming to track persons where the crowd of people is known in advance or all persons in the video have appeared in the very beginning.

    • Assessment of radiation dose received by nuclear plant personnel through a video-based surveillance system

      2018, Progress in Nuclear Energy
      Citation Excerpt :

      In the proposed system, a minimum threshold of 40 percent of foreground area within a given ROI was applied. Further validations were also adopted, similarly to in (Führ and Jung, 2014): (i) aspect ratio of the ROI in the image plane should be in the range 0.2–0.6; and (ii) persons’ heights in the WCS should be in the range from 1.5 to 2.5 m. The foregrounds obtained with GMM were improved by using mathematical morphological operations (opening and closing), (González and Woods, 2007).

    • Online multi-object tracking by detection based on generative appearance models

      2016, Computer Vision and Image Understanding
      Citation Excerpt :

      We evaluate our MOT approach by a comparison to recent state-of-the-art algorithms. Among the compared approaches, a first category studied MOT with the aim of improving detection responses using model-free tracker Breitenstein et al. (2011) Milan et al. (2013), a second category aimed to ameliorate the data association technique Andriyenko and Schindler (2011) Segal and Reid (2013) Berclaz et al. (2006), and a third category aimed to improve the appearance model Yang et al. (2009a) Führ and Jung (2014) Riahi and Bilodeau (2014). The results, when available, are obtained from the authors’ papers.

    • Multiple Pedestrian Tracking with Graph Attention Map on Urban Road Scene

      2023, IEEE Transactions on Intelligent Transportation Systems
    • Multiple pedestrian tracking with occlusion handling in high-density crowds

      2019, Journal of Advanced Research in Dynamical and Control Systems
    • Low-cost multiple object tracking for embedded vision applications

      2019, Turkish Journal of Electrical Engineering and Computer Sciences