People re-identification by spectral classification of silhouettes
Introduction
Nowadays, there is no doubt that security should be a major concern for all stakeholders in public transport (travelers, staff, operating companies, governments). Each network or country has established measures according to its knowledge of these problems, local conditions, and cultural traditions (for example, attitudes and legal limits relating to private life). Timely detection and intervention are needed in the case of security threats, such as aggression against people, vandalism against property, acts of terrorism, accidents, and major catastrophes such as fires. Closed-Circuit TeleVision (CCTV) coverage, which several networks of large and medium-sized cities, local authorities, and police forces consider an essential element, has grown steadily. For instance, it was estimated that more than a million cameras are installed in public places in the United Kingdom and that, on average, an individual is “seen” by 300 cameras in a single day in London.
However, staff shortages drastically limit the general use of CCTV, especially if these systems are to be used for prevention rather than for reacting after an incident has been detected. It is common for a single human operator responsible for a video surveillance system to manage 20–40 video sources simultaneously. This makes it difficult to define suitable procedures for handling the large volumes of information produced by such systems. When raw video data are available, incidents as well as dangerous and potentially dangerous situations must be identified automatically. Indeed, it is essential to avoid the visual overload to which human operators are currently exposed.
The research presented in this paper is carried out within the framework of the BOSS European project [1] (on BOard wireless Secured video Surveillance), which aims at developing a multi-camera vision system designed to monitor, detect, and recognize abnormal events occurring on board trains. One of the important tasks of such a system is to establish correspondence between observations of people across different camera views located at different physical sites. In most cases, this task relies on appearance-based models of moving people, which may vary depending on several factors, such as illumination conditions, camera angles, and pose changes.
In this paper, we propose a matching function between two cameras in order to re-identify a person who has appeared in the field of view of one camera and then reappears in front of another camera. Our approach consists of several steps. First, we compute invariant features (also called signatures) to characterize the silhouettes in static images. Then, a graph-based approach is introduced to reduce the effective working space and to compare two video sequences (two passages). The performance of our system is evaluated on a real dataset containing 40 people filmed in two different environments (one indoors and one outdoors).
One original aspect of our research is the tracking of people, who, in the image processing field, constitute what are called deformable shapes. The second is the set of algorithms we developed, based on spectral analysis and support vector machines (SVM), for re-identifying people as they move from one location to another. Lastly, the third strong point is that the algorithm is fully illuminant-invariant.
The article is organized as follows. After this introduction, Section 2 gives a short state of the art on video sequence comparison. Section 3 describes how the invariant signature of a detected person is generated. In Section 4, after a brief theoretical review of spectral analysis, we explain how we adapt it to our problem; the first results illustrated there show good discrimination between individuals. In Section 5, we briefly describe the main concepts of SVM and their application to our problem; SVM provides a useful step that complements spectral analysis in performing re-identification. Section 6 presents global results on the performance of our system on a real dataset. Finally, Section 7 gives conclusions and important short-term perspectives.
State of the art on video sequence comparison
Over the past several years, a significant amount of research has been carried out in the field of object recognition by comparing video sequences. It is common to describe the color-based features of a video sequence using a set of key frames that represents the entire sequence well. Several techniques for selecting key frames from video sequences have been proposed. Ueda et al. [2] used the first and last frames of each sequence as two key frames. Ferman et al. [3] clustered the frames ...
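Key-frame selection by clustering can be illustrated with a short sketch. Because the description of the clustering used by Ferman et al. is cut off here, the code below is only an assumed, generic variant: each frame is summarized by a normalized color histogram, the histograms are grouped with plain k-means (using farthest-point initialization), and the frame closest to each cluster center is kept as a key frame. The function `select_key_frames` and its parameters are hypothetical.

```python
import numpy as np

def select_key_frames(histograms, k, n_iter=50):
    """Pick up to k representative frames from an (n_frames, n_bins) array of
    color histograms using plain k-means (hypothetical helper)."""
    X = np.asarray(histograms, dtype=float)
    # Farthest-point initialization: start from the first frame, then add the
    # frame farthest from all centroids chosen so far (deterministic).
    idx = [0]
    for _ in range(1, k):
        dist = np.linalg.norm(X[:, None, :] - X[idx][None, :, :], axis=2).min(axis=1)
        idx.append(int(dist.argmax()))
    centroids = X[idx].copy()
    for _ in range(n_iter):
        # Assign every frame to its nearest centroid (Euclidean distance).
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Recompute centroids; keep the old one if a cluster becomes empty.
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centroids[j] for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    # The key frame of each cluster is the actual frame nearest to its centroid.
    d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return sorted(set(int(j) for j in d.argmin(axis=0)))
```

Farthest-point initialization keeps the sketch deterministic; a random or k-means++ start would serve equally well for an illustration.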
Signature generation
The first step in our system consists of extracting from each frame a robust signature characterizing the passage of a person. To do this, moving areas are first detected by background subtraction, combined with a shadow elimination algorithm [11], [12]. Let us now assume that each person's silhouette has been located in all the frames of a video sequence. Since the appearance of people is dominated by their clothes, color features are suitable for describing them. Several tools ...
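Background subtraction with shadow elimination can be sketched as follows. The methods of [11], [12] are not reproduced in this excerpt, so this is a generic stand-in under stated assumptions: a pixel is considered moving when its RGB distance to a static background image exceeds a threshold, and a moving pixel is then discarded as cast shadow when it is darker than the background while keeping roughly the same chromaticity. The function `foreground_mask` and all of its thresholds are illustrative assumptions.

```python
import numpy as np

def foreground_mask(frame, background, diff_thresh=30.0,
                    shadow_low=0.4, shadow_high=0.9, chroma_thresh=0.03):
    """Boolean mask of moving pixels with cast shadows removed (hypothetical).
    frame, background: (H, W, 3) RGB arrays with values in [0, 255].
    All thresholds are illustrative assumptions, not values from the paper."""
    f = frame.astype(float)
    b = background.astype(float)
    # 1) Background subtraction: a large color difference means motion.
    moving = np.linalg.norm(f - b, axis=2) > diff_thresh
    # 2) Shadow test: a shadow darkens the background (brightness ratio below 1)
    #    while roughly preserving its chromaticity (r, g, b) / (r + g + b).
    eps = 1e-6
    fs = f.sum(axis=2) + eps
    bs = b.sum(axis=2) + eps
    ratio = fs / bs
    chroma_dist = np.linalg.norm(f / fs[..., None] - b / bs[..., None], axis=2)
    shadow = (ratio >= shadow_low) & (ratio <= shadow_high) & (chroma_dist < chroma_thresh)
    return moving & ~shadow
```

For example, on a uniform gray background, a red clothing pixel survives the mask, while a uniformly darkened gray pixel is classified as shadow and removed.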
Overview
High-dimensional data, meaning data that require many dimensions to represent, can be difficult to interpret and process. One approach to this problem is to assume that the data of interest lie on a non-linear manifold embedded within the higher-dimensional space. If the manifold is of low enough dimension, the data can be visualized in the low-dimensional space. Spectral methods have recently emerged as a powerful tool for non-linear dimensionality reduction and manifold ...
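A minimal sketch of spectral dimensionality reduction in the Laplacian-eigenmaps style, assuming one signature vector per frame as input: a Gaussian affinity graph is built over the signatures, its symmetric normalized Laplacian is diagonalized, and the eigenvectors following the trivial one give the low-dimensional coordinates. The kernel width `sigma` and the function name are assumptions; the paper's exact graph construction is not given in this snippet.

```python
import numpy as np

def spectral_embedding(X, sigma=1.0, n_components=2):
    """Laplacian-eigenmaps-style embedding of the rows of X (e.g. one
    signature per frame) into n_components dimensions (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    # Gaussian (heat-kernel) affinity between every pair of signatures.
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2)
    W = np.exp(-sq / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    # Symmetric normalized Laplacian: L = I - D^{-1/2} W D^{-1/2}.
    d = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(d)
    L = np.eye(len(X)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    # eigh returns eigenvalues in ascending order; skip the trivial first one.
    vals, vecs = np.linalg.eigh(L)
    Y = vecs[:, 1:n_components + 1]
    # Rescale by D^{-1/2} to obtain the random-walk eigenvectors.
    return Y * d_inv_sqrt[:, None]
```

With two well-separated groups of signatures (for instance two passages of different people), the first embedding coordinate takes opposite signs on the two groups, which is the kind of gap between clusters exploited later.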
Application of SVM in measuring the similarity of two sequences
In Section 4, we described how an image set can be mapped into a 2D plane by spectral dimensionality reduction. Several experimental results showed that the new coordinate system is a good representation for visualizing the image set. Moreover, it introduces a gap between two clusters that can be exploited for re-identification. In this section, we present the application of SVM [28] (see Appendix for more details) to define the gap between two clusters (two groups of ...
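The SVM step can be sketched as follows. Since the details are truncated here, we assume for illustration that the 2D points from the two passages are labeled −1 and +1 and separated by a linear soft-margin SVM, and that the margin width 2/‖w‖ measures the gap between the clusters: a wide margin with perfect separation would suggest two different people, whereas overlapping clusters would suggest the same person. The solver is a full-batch Pegasos-style subgradient method using NumPy only; `svm_gap`, `lam` and `n_iter` are hypothetical names.

```python
import numpy as np

def svm_gap(points_a, points_b, lam=0.01, n_iter=5000):
    """Separate two 2-D point sets with a linear soft-margin SVM trained by
    full-batch subgradient steps; return (training accuracy, margin width).
    Hypothetical helper, not the paper's implementation."""
    X = np.vstack([points_a, points_b]).astype(float)
    y = np.concatenate([-np.ones(len(points_a)), np.ones(len(points_b))])
    # A constant feature acts as the bias; it is regularized along with the
    # weights, which is a common simplification.
    Xb = np.hstack([X, np.ones((len(X), 1))])
    w = np.zeros(Xb.shape[1])
    for t in range(1, n_iter + 1):
        active = y * (Xb @ w) < 1.0                  # points inside the margin
        grad_term = (y[active][:, None] * Xb[active]).sum(axis=0) / len(X)
        eta = 1.0 / (lam * t)                        # Pegasos step size
        w = (1.0 - eta * lam) * w + eta * grad_term
    accuracy = float(np.mean(np.sign(Xb @ w) == y))
    margin = 2.0 / np.linalg.norm(w[:-1])            # width of the separating gap
    return accuracy, margin
```

In practice a library solver such as scikit-learn's `SVC(kernel="linear")` would normally be used; the hand-written loop only keeps the example free of extra dependencies.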
Experimental results
As mentioned above, our research aims to set up an on-board surveillance system that is able to re-identify a person across multiple cameras with different fields of view. Before collecting a real on-board dataset, we collected a large database containing video sequences of 40 people on INRETS premises to evaluate our algorithms. We chose two different locations (indoors in a hall near windows, and outdoors under natural light) to set up the two cameras. Fig. 7 ...
Conclusion and perspectives
In this paper, we have presented a system that is able to track moving people across different sites while observing them through multiple cameras. Our approach is based on the spectral classification of color-based signatures extracted from the person detected in each sequence. A new descriptor called the “color-position” histogram, combined with several invariant methods, is proposed to characterize the silhouettes in static images and to obtain robust signatures that are invariant to ...
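The color-position descriptor and its illumination invariance can be sketched under an explicit assumption: we take the descriptor to be the mean color of the silhouette inside each of `n_bands` equal horizontal bands, computed after per-channel histogram equalisation (the invariant cited in the reference list). Because equalisation keeps only the rank of each value within its channel, any increasing per-channel change, such as a diagonal illuminant scaling, leaves the signature unchanged. The band split, `n_bands`, and both function names are hypothetical; the exact descriptor is defined in the full paper.

```python
import numpy as np

def equalize_channels(pixels):
    """Per-channel histogram equalisation of an (N, C) pixel array: each value
    is replaced by its empirical CDF (fraction of pixels <= it), in (0, 1]."""
    pixels = np.asarray(pixels, dtype=float)
    out = np.empty_like(pixels)
    n = len(pixels)
    for c in range(pixels.shape[1]):
        sorted_col = np.sort(pixels[:, c])
        out[:, c] = np.searchsorted(sorted_col, pixels[:, c], side="right") / n
    return out

def color_position_signature(image, mask, n_bands=4):
    """Hypothetical color-position descriptor: mean equalised color of the
    silhouette in each of n_bands equal horizontal bands, top to bottom."""
    rows, cols = np.nonzero(mask)
    colors = equalize_channels(image[rows, cols])   # only silhouette pixels
    edges = np.linspace(rows.min(), rows.max() + 1, n_bands + 1)
    sig = np.zeros((n_bands, image.shape[2]))
    for k in range(n_bands):
        in_band = (rows >= edges[k]) & (rows < edges[k + 1])
        if in_band.any():
            sig[k] = colors[in_band].mean(axis=0)
    return sig.ravel()
```

Scaling each RGB channel of the image by a different positive factor, as a simple model of an illuminant change, yields the same signature as the original image.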
References (31)
- ... et al., Appearance-based person recognition using color/path-length profile, Journal of Visual Communication and Image Representation (2006).
- A spatial processor model for object color perception, Journal of the Franklin Institute (1980).
- ... et al., Illuminant and device invariant colour using histogram equalisation, Pattern Recognition (2005).
- ...
- H. Ueda, T. Miyatake, S. Yoshizawa, An interactive natural motion picture dedicated multimedia authoring system, in: ...
- A. Ferman, A. Tekalp, Multiscale content extraction and representation for video indexing, in: Multimedia Storage and ...
- X. Sun, M. Kankanhalli, Y. Zhu, J. Wu, Content-based representative frame extraction for digital video, in: IEEE ...
- ... et al., Time-constrained keyframe selection technique, Multimedia Tools and Applications (2000).
- ... et al., Human appearance modeling for matching across video sequences, Machine Vision and Applications (2007).
- A. Ferman, S. Krishnamachari, A. Tekalp, M. Abdel-Mottaleb, R. Mehrotra, Group-of-frames/pictures color histogram ...
- Robust color histogram descriptors for video segment retrieval and identification, IEEE Transactions on Image Processing.