People re-identification by spectral classification of silhouettes

doi:10.1016/j.sigpro.2009.09.005

Signal Processing

Volume 90, Issue 8, August 2010, Pages 2362-2374

https://doi.org/10.1016/j.sigpro.2009.09.005

Abstract

The problem addressed in this paper is the re-identification of moving people across different sites that are completely covered by non-overlapping cameras. Our proposed framework relies on the spectral classification of appearance-based signatures extracted from the person detected in each sequence. We first propose a new feature called the “color-position” histogram, combined with several illumination-invariant methods, to characterize the silhouettes in static images. Then, we develop an algorithm based on spectral analysis and support vector machines (SVM) for the re-identification of people. The performance of our system is evaluated on real datasets collected on INRETS premises. The experimental results show that our approach provides promising results for security applications.

Introduction

Nowadays, there is no doubt that security is a major concern for all actors in public transport (travelers, staff, operating companies, governments). Each network or country has established measures according to its knowledge of these problems, local conditions, and cultural traditions (for example, attitudes and legal limits relating to private life). Timely detection and intervention are needed when security is threatened, for example by aggression against people, vandalism against property, acts of terrorism, accidents and major catastrophes such as fires. Closed-Circuit TeleVision (CCTV) coverage, which is considered an essential element by several networks of large and mid-size cities, local authorities and police forces, has expanded steadily. For instance, it was estimated that more than a million cameras are installed in public places in the United Kingdom and that, on average, an individual is “seen” by 300 cameras in a single day in London.

However, the lack of staff drastically limits the general use of CCTV, especially if these systems must be used for prevention rather than for reacting after accidents have been detected. A human operator responsible for a video surveillance system commonly has to manage 20–40 video sources simultaneously. This makes it difficult to define suitable procedures for managing the large volumes of information produced by such systems. When raw video data are available, incidents as well as dangerous and potentially dangerous situations must be identified automatically. Indeed, it is essential to avoid the visual overload to which human operators are currently exposed.

The research presented in this paper is part of the BOSS European project [1] (on BOard wireless Secured video Surveillance), which aims at developing a multi-camera vision system designed to monitor, detect and recognize abnormal events occurring on board trains. One of the important tasks of such a system is to establish correspondence between observations of people over different camera views located at different physical sites. In most cases, such a task relies on appearance-based models of moving people, which may vary depending on several factors, such as illumination conditions, camera angles and pose changes.

In this paper, we focus on one particular function involving two cameras: re-identifying a person who has appeared in the field of view of one camera and then reappears in front of another camera. Our proposed approach consists of several steps. First, we compute invariant features (also called signatures) in order to characterize the silhouettes in static images. Then, a graph-based approach is introduced to reduce the effective working space and to compare two video sequences (two passages). The performance of our system is evaluated on a real dataset containing 40 people filmed in two different environments (one indoors and one outdoors).

One original aspect of our research is the tracking of people, who are what the image processing field calls deformable shapes. The second is the set of algorithms we developed, based on spectral analysis and support vector machines (SVM), for the re-identification of people as they move from one location to another. Lastly, the third strong point is that the algorithm is fully illuminant-invariant.

The article is organized as follows. Section 2 gives a short state of the art on video sequence comparison. Section 3 describes how the invariant signature of a detected person is generated. In Section 4, after a few theoretical reminders on spectral analysis, we explain how we adapt it to our problem. The first illustrated results show good discrimination between individuals. In Section 5, we briefly describe the main concepts of SVM and their application to our problem; the SVM is an interesting step that complements spectral analysis to perform re-identification. Section 6 presents global results on the performance of our system on a real dataset. Finally, Section 7 gives conclusions and important short-term perspectives.

Section snippets

State of the art on video sequence comparison

Over the past several years, a significant amount of research has been carried out in the field of object recognition by comparing video sequences. The color-based features of a video sequence are usually described using a set of key frames that represents the entire sequence well. Several techniques for selecting key frames from video sequences have been proposed so far. Ueda et al. [2] used the first and last frames of each sequence as two key frames. Ferman et al. [3] clustered the frames

Signature generation

The first step in our system consists in extracting from each frame a robust signature characterizing the passage of a person. To do this, a detection of moving areas, by background subtraction, combined with a shadow elimination algorithm is first carried out [11], [12]. Let us assume now that each person's silhouette is located in all the frames of a video sequence. Since the appearance of people is dominated by their clothes, color features are suitable for their description. Several tools
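The signature step can be illustrated with a short sketch. The excerpt does not give the exact definition of the “color-position” histogram or say which invariant methods are used, so everything below is an assumption made for illustration: the silhouette is cut into `n_bands` horizontal bands and each band is summarized by its mean color, after a greyworld normalization (each channel divided by its mean over the silhouette). The function names `greyworld` and `color_position_signature` and the parameter `n_bands` are hypothetical.

```python
import numpy as np

def greyworld(image, mask):
    """Divide each color channel by its mean over the silhouette pixels."""
    pixels = image[mask].astype(float)       # shape (n_pixels, n_channels)
    channel_means = pixels.mean(axis=0)
    channel_means[channel_means == 0] = 1.0  # avoid division by zero
    return image.astype(float) / channel_means

def color_position_signature(image, mask, n_bands=8):
    """Mean color of each horizontal band of the silhouette, top to bottom.

    Assumes the mask contains at least one silhouette pixel.
    """
    rows = np.flatnonzero(mask.any(axis=1))  # rows that contain the silhouette
    edges = np.linspace(rows[0], rows[-1] + 1, n_bands + 1).astype(int)
    signature = np.zeros((n_bands, image.shape[2]))
    for band in range(n_bands):
        band_mask = np.zeros_like(mask)
        band_mask[edges[band]:edges[band + 1]] = mask[edges[band]:edges[band + 1]]
        if band_mask.any():                  # empty bands keep a zero color
            signature[band] = image[band_mask].mean(axis=0)
    return signature.ravel()
```

Under a diagonal illumination model, where each channel is multiplied by its own constant, the greyworld step cancels the constants, so the same silhouette seen under two such illuminants yields the same signature in this sketch.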

Overview

High-dimensional data, meaning data that require several dimensions to represent, can be difficult to interpret and process. One approach to tackle this problem is to assume that the data of interest lies on an embedded non-linear manifold within the higher dimensional space. If the manifold is of low enough dimension then the data can be visualized in the low dimensional space. Spectral methods have recently emerged as a powerful tool for non-linear dimensionality reduction and manifold
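As an illustration of spectral dimensionality reduction, here is a minimal sketch of Laplacian eigenmaps, one common spectral method. The excerpt does not say which variant the paper uses, so the Gaussian affinity, the bandwidth `sigma`, the normalized Laplacian and the function name `spectral_embedding` are all assumptions.

```python
import numpy as np

def spectral_embedding(signatures, sigma=1.0, n_components=2):
    """Map each signature to n_components coordinates with Laplacian eigenmaps."""
    X = np.asarray(signatures, dtype=float)
    # Pairwise squared Euclidean distances between signatures.
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    # Gaussian affinities; sigma is an assumed, hand-tuned bandwidth.
    W = np.exp(-sq_dists / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    degree = W.sum(axis=1)                   # assumed > 0 (no isolated signature)
    d_inv_sqrt = 1.0 / np.sqrt(degree)
    # Symmetric normalized Laplacian: I - D^{-1/2} W D^{-1/2}.
    laplacian = np.eye(len(X)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, eigvecs = np.linalg.eigh(laplacian)   # eigenvalues in ascending order
    # Skip the trivial first eigenvector, then rescale rows by D^{-1/2}.
    return d_inv_sqrt[:, None] * eigvecs[:, 1:n_components + 1]
```

With the default `n_components=2`, each signature becomes a point in a plane. When the input contains two well-separated groups of signatures, the first coordinate typically takes opposite signs on the two groups, which is what makes a gap between clusters visible.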

Application of SVM in measuring the similarity of two sequences

In Section 4, we described how an image set can be mapped into a 2D plane by using spectral dimensionality reduction. Several experimental results showed that the new coordinate system is a good representation for visualizing the image set. Moreover, it introduces a gap between two clusters that can be used to achieve our objective of re-identification. We present in this section the application of SVM [28] (see Appendix for more details) to define the gap between two clusters (two groups of
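To make the idea of a gap between two clusters concrete, the sketch below fits a linear soft-margin SVM to two groups of 2D points and reports the margin width 2/||w||. The excerpt does not show how the paper turns the SVM output into a similarity measure, so using the margin width as a separation score is an assumption, as are the plain subgradient-descent solver, the decaying step size and the name `linear_svm_margin`.

```python
import numpy as np

def linear_svm_margin(points_a, points_b, C=1.0, lr0=0.01, n_iter=2000):
    """Fit a linear SVM by subgradient descent; return (w, b, margin width)."""
    X = np.vstack([points_a, points_b]).astype(float)
    y = np.concatenate([np.ones(len(points_a)), -np.ones(len(points_b))])
    w = np.zeros(X.shape[1])
    b = 0.0
    for t in range(n_iter):
        lr = lr0 / (1.0 + lr0 * t)           # decaying step size
        violated = y * (X @ w + b) < 1.0     # points inside or beyond the margin
        # Subgradient of 0.5 * ||w||^2 + C * sum of hinge losses.
        grad_w = w - C * (y[violated][:, None] * X[violated]).sum(axis=0)
        grad_b = -C * y[violated].sum()
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b, 2.0 / np.linalg.norm(w)
```

In this sketch, a wide margin between the embedded points of two passages suggests two different people, while a narrow margin or overlapping points suggests the same person. Any decision threshold would have to be chosen on real data.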

Experimental results

As mentioned above, our research aims to set up an on-board surveillance system that is able to re-identify a person through multiple cameras with different fields of view. Before collecting a real on-board dataset, a large database containing video sequences of 40 people acquired on INRETS premises was collected for the evaluation of our algorithms. We chose two different locations (indoors in a hall near windows and outdoors with natural light) to set up the two cameras. Fig. 7

Conclusion and perspectives

In this paper, we have presented a system that is able to track moving people in different sites while observing them through multiple cameras. Our proposed approach is based on the spectral classification of the color-based signatures extracted from the detected person in each sequence. A new descriptor called “color-position” histogram combined with several invariant methods is proposed to characterize the silhouettes in static images and obtain robust signatures which are invariant to

References (31)

K. Yoon et al., Appearance-based person recognition using color/path-length profile, Journal of Visual Communication and Image Representation (2006)
G. Buchsbaum, A spatial processor model for object color perception, Journal of the Franklin Institute (1980)
G. Finlayson et al., Illuminant and device invariant colour using histogram equalisation, Pattern Recognition (2005)
...
H. Ueda, T. Miyatake, S. Yoshizawa, An interactive natural motion picture dedicated multimedia authoring system, in:...
A. Ferman, A. Tekalp, Multiscale content extraction and representation for video indexing, in: Multimedia Storage and...
X. Sun, M. Kankanhalli, Y. Zhu, J. Wu, Content-based representative frame extraction for digital video, in: IEEE...
A. Girgensohn et al., Time-constrained keyframe selection technique, Multimedia Tools and Applications (2000)
Y. Yu et al., Human appearance modeling for matching across video sequences, Machine Vision and Applications (2007)
A. Ferman, S. Krishnamachari, A. Tekalp, M. Abdel-Mottaleb, R. Mehrotra, Group-of-frames/pictures color histogram...
A. Ferman et al., Robust color histogram descriptors for video segment retrieval and identification, IEEE Transactions on Image Processing (2002)
T. Leclercq, L. Khoudour, L. Macaire, J.-G. Postaire, Compact color video signature by principal component analysis,...
N. Gheissari, T. Sebastian, R. Hartley, Person reidentification using spatiotemporal appearance, in: Proceedings of the...
F. Porikli, O. Tuzel, Human body tracking by adaptive background models and mean-shift analysis, in: IEEE International...
D. Hall, J. Nascimento, P. Ribeiro, E. Andrade, P. Moreno, S. Pesnel, T. List, R. Emonet, R. Fisher, J. Victor, J....

