Pose Depth Volume extraction from RGB-D streams for frontal gait recognition

https://doi.org/10.1016/j.jvcir.2013.02.010

Abstract

We explore the applicability of Kinect RGB-D streams in recognizing gait patterns of individuals. Gait energy volume (GEV) is a recently proposed feature that performs gait recognition in the frontal view using only depth image frames from Kinect. Since depth frames from Kinect are inherently noisy, the corresponding silhouette shapes are inaccurate, often merging with the background. We register the depth and RGB frames from Kinect to obtain smooth silhouette shapes along with depth information. A partial volume reconstruction of the frontal surface of each silhouette is then performed, and a novel feature termed Pose Depth Volume (PDV) is derived from this volumetric model. The recognition performance of the proposed approach has been tested on a data set captured using Microsoft Kinect in an indoor environment. Experimental results clearly demonstrate the effectiveness of the approach in comparison with other existing methods.

Highlights

► We combine depth and RGB information from Kinect for frontal gait recognition.
► Key poses are extracted using depth frames registered in RGB frame coordinate system.
► A new feature named Pose Depth Volume is proposed.
► Comparative study with existing gait features has been done.

Introduction

Gait has been studied extensively as a biometric feature in recent years. An advantage of gait recognition is that, unlike other biometric methods such as fingerprint recognition, iris scanning and face recognition, the gait of a subject can be recognized from a distance without the active participation of the subject. This is because detailed texture information is not required for gait recognition. The main aim of gait recognition is to capture the position variation of human limbs during walking, and this can be done using binary silhouettes extracted from images that need not be of very high quality. Over the years, while model-based and appearance-based gait recognition have received significant attention from researchers, appearance-based gait recognition entirely from the frontal view has not been given much focus. Also, the use of depth cameras in gait recognition is quite rare.

In this paper, we concentrate on gait recognition from the frontal view (frontal gait recognition) only. An advantage of frontal gait recognition is that walking videos captured from this viewpoint do not suffer from the self-occlusion due to hand swings that prevails in the fronto-parallel view. Also, since the camera is positioned right in front of a walking person, videos can be captured in narrow corridor-like situations as well. However, a disadvantage of frontal gait recognition is that binary silhouettes extracted from RGB video frames cannot represent which limb (left/right) of a walking person is nearer to the camera and which one is behind. Thus, pose ambiguity cannot be adequately resolved, leading to incorrect gait recognition. This information deficiency is not present in depth images, where depth values indicate whether the right limb is forward and the left limb is backward or the other way round. Variation of depth in limb positions, together with variation of shape, is an important element of frontal gait recognition.
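To make the role of depth concrete, the following is a minimal sketch, not taken from the paper, of how a registered depth map could disambiguate which leg is nearer the camera; the function name, the quadrant-splitting heuristic and the array conventions are all illustrative assumptions.

    import numpy as np

    def nearer_leg(depth, mask):
        """Decide which lower-body half (in image coordinates) is closer
        to the camera by comparing mean depth values -- a distinction a
        binary RGB silhouette alone cannot make. depth: metres, 0 where
        invalid; mask: binary silhouette, both H x W arrays."""
        ys, xs = np.nonzero(mask)
        row_mid = int(ys.mean())              # rough torso/leg boundary
        col_mid = int(xs.mean())              # split silhouette into halves
        cols = np.arange(mask.shape[1])
        valid = (depth > 0) & (mask > 0)      # ignore holes in the depth map
        valid[:row_mid, :] = False            # keep the lower body only
        d_left = depth[valid & (cols < col_mid)]
        d_right = depth[valid & (cols >= col_mid)]
        return 'image-left' if d_left.mean() < d_right.mean() else 'image-right'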

Recently developed depth cameras like Kinect [1], [2] can efficiently capture the depth variation in different human body parts while walking. But the depth video frames so obtained are quite noisy, as a result of which the extracted object silhouettes are often not clean. In contrast, silhouettes extracted from the RGB video frames are much cleaner, but shape variation alone over a gait cycle is not enough for the extraction of useful gait features. In order to capture both color and shape information in a single frame, we combine information from both the RGB and the depth video streams from Kinect to derive a new gait feature. Each silhouette from the depth frame of Kinect is projected into the RGB frame coordinates using a standard registration procedure, thereby forming a silhouette in the transformed space which we term a depth registered silhouette. Previously, registration of Kinect depth and RGB frames has been used for 3D reconstruction from depth videos captured from multiple views of an object [4]. However, to the best of our knowledge, no gait recognition method exists which fuses both RGB and depth information for deriving gait features. It may be noted that there is no publicly available frontal gait database with both color and depth video frames of walking persons recorded simultaneously. So, we have built a new database using Microsoft Kinect by capturing walking sequences of 30 individuals.
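As a rough illustration of the standard registration procedure referred to above (not the paper's exact implementation), a depth map can be back-projected to 3D points using the depth camera's pinhole intrinsics, transformed by the calibrated depth-to-RGB extrinsics, and re-projected into the RGB image plane. K_d, K_rgb, R and t below are assumed calibration parameters, and the output is assumed to share the depth map's resolution for simplicity.

    import numpy as np

    def register_depth_to_rgb(depth, K_d, K_rgb, R, t):
        """Reproject a depth map (in metres) into the RGB camera's image
        plane so that silhouette and depth are expressed in the same
        coordinate system. K_d, K_rgb: 3x3 pinhole intrinsics; R, t:
        depth-to-RGB rotation and translation from calibration."""
        h, w = depth.shape
        u, v = np.meshgrid(np.arange(w), np.arange(h))
        z = depth.ravel()
        good = z > 0                                   # skip invalid pixels
        # back-project valid pixels to 3-D points in the depth camera frame
        x = (u.ravel()[good] - K_d[0, 2]) * z[good] / K_d[0, 0]
        y = (v.ravel()[good] - K_d[1, 2]) * z[good] / K_d[1, 1]
        P = R @ np.vstack([x, y, z[good]]) + t.reshape(3, 1)
        # project the transformed points into the RGB image plane
        u2 = np.round(K_rgb[0, 0] * P[0] / P[2] + K_rgb[0, 2]).astype(int)
        v2 = np.round(K_rgb[1, 1] * P[1] / P[2] + K_rgb[1, 2]).astype(int)
        registered = np.zeros_like(depth)
        ok = (u2 >= 0) & (u2 < w) & (v2 >= 0) & (v2 < h)
        registered[v2[ok], u2[ok]] = P[2, ok]          # depth in RGB frame
        return registered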

The proposed gait feature is termed the Pose Depth Volume (PDV). It is derived from a partial volumetric reconstruction of each depth registered silhouette. First, a certain number of depth key poses are estimated from a set of training samples, and each frame of an entire walking sequence of a subject is classified into the appropriate depth key pose. A PDV is constructed for each such pose by averaging the voxel volumes of all the frames which belong to that pose. Thus, the number of PDVs of each subject is the same as the number of depth key poses. Each voxel in a PDV indicates the number of times an object voxel occurs in that position for that particular depth key pose within a complete gait cycle. A classifier is trained with gait cycles of subjects in the training data set, and a different gait cycle is used for testing the accuracy of recognition.
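A minimal sketch of this averaging step, assuming each frame of a gait cycle has already been voxelised into a binary volume and assigned a key-pose label; the function and array names are illustrative, not from the paper.

    import numpy as np

    def build_pdvs(frame_volumes, pose_labels, n_poses):
        """frame_volumes: (T, X, Y, Z) binary voxel volumes, one per frame
        of a gait cycle; pose_labels: (T,) key-pose index of each frame.
        Returns (n_poses, X, Y, Z): for each key pose, the per-voxel
        occupancy frequency averaged over the frames assigned to it."""
        pdvs = np.zeros((n_poses,) + frame_volumes.shape[1:])
        for p in range(n_poses):
            sel = frame_volumes[pose_labels == p]
            if len(sel):
                pdvs[p] = sel.mean(axis=0)
        return pdvs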

The rest of the paper is organized as follows. Section 2 introduces the Kinect RGB-D camera and the basic functionality of its different parts. A brief background study on gait is also included in this section. Section 3 illustrates the sequence of steps followed in deriving our proposed gait feature. The positioning of the Kinect camera and the construction of the data set, together with experimental results, are presented in Section 4. Finally, Section 5 concludes the paper and points out the scope of future work.

Basics of RGB-D Kinect camera

RGB-D cameras [2], [3] are useful for providing depth and color information of an object simultaneously. Kinect, developed by Microsoft, is one such camera [1]. It captures depth information through its infrared projector and sensor. The infrared laser emitted by Kinect draws a structured pattern on the object surface. The infrared camera senses the depth from this pattern using a technology based on the structured light principle. Apart from the infrared projector and sensor, Kinect also houses an RGB camera that records color video frames of the scene.
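At its core, the structured light principle is triangulation: the sensor measures the disparity between the observed dot pattern and a stored reference pattern, and depth follows from the projector-camera geometry. A minimal sketch using the standard triangulation relation; the focal length and baseline below are rough, commonly quoted Kinect figures, not calibrated values.

    def disparity_to_depth(disparity_px, focal_px=580.0, baseline_m=0.075):
        """Structured-light triangulation: z = f * b / d, where d is the
        pixel shift between the observed and reference dot patterns.
        focal_px and baseline_m are assumed nominal Kinect parameters;
        disparity_px must be non-zero."""
        return focal_px * baseline_m / disparity_px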

Gait recognition using Pose Depth Volume

In this section, we describe a new feature called the Pose Depth Volume (PDV). The applicability of the feature is tested on videos captured by Microsoft Kinect. Instead of using depth videos directly, as done in the case of GEV, we combine RGB and depth information from Kinect to obtain better silhouettes along with depth information. To capture the intrinsic dynamics of gait better than GEV, we divide an entire gait cycle into a number of depth key poses. Averaging of voxel volumes is then done over all the frames assigned to the same depth key pose, yielding one PDV per pose.
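One plausible way to estimate such depth key poses from training samples is to cluster flattened frame descriptors with k-means; since this snippet does not specify the paper's actual method, the clustering choice, the number of poses and the use of scikit-learn are all illustrative assumptions.

    import numpy as np
    from sklearn.cluster import KMeans

    def estimate_key_poses(train_frames, n_poses=8):
        """Cluster flattened depth-registered silhouette frames from the
        training set into n_poses groups; the fitted model's predict()
        then assigns every frame of a walking sequence to its nearest
        depth key pose. train_frames: (N, H, W) array."""
        X = train_frames.reshape(len(train_frames), -1).astype(float)
        return KMeans(n_clusters=n_poses, n_init=10, random_state=0).fit(X)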

Experimental results

In this section, we present results from an extensive set of experiments carried out using the proposed Pose Depth Volume (PDV) feature. Experiments have been conducted in the context of biometric identification, in which the features of a test subject are matched against a gallery of previously captured and annotated feature sets. The proposed gait recognition algorithm has been implemented in MATLAB 7.12.0 (R2011a) on a 2.50 GHz Intel Core i5 processor with 4 GB RAM.
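A minimal sketch of the gallery-matching protocol described above, assuming a simple nearest-neighbour rule over flattened PDV feature vectors; the paper's actual classifier may differ, so everything here is illustrative.

    import numpy as np

    def identify(probe_pdvs, gallery_pdvs, gallery_ids):
        """Match a probe subject against the annotated gallery: flatten
        the stacked PDVs into one feature vector per subject and return
        the identity of the nearest gallery entry (Euclidean distance)."""
        p = probe_pdvs.ravel()
        dists = [np.linalg.norm(p - g.ravel()) for g in gallery_pdvs]
        return gallery_ids[int(np.argmin(dists))]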

Conclusions

In this paper, we have combined the depth and color streams from Kinect by registering the depth frames with the corresponding color frames. Next, we have introduced a novel feature called the Pose Depth Volume, computed by averaging the voxel volumes of frames belonging to the same pose. The proposed feature captures both shape and depth variations of the walking sequences of individuals over each depth key pose of a gait cycle. Experiments carried out on a data set comprising 30 individuals demonstrate the effectiveness of the proposed feature in comparison with existing methods.

Acknowledgments

This work is partially funded by project Grant No. 22(0554)/11/EMR-II sponsored by the Council of Scientific and Industrial Research, Govt. of India. The authors thank the anonymous reviewers for their constructive suggestions.

References (31)

  • A.F. Bobick et al., The recognition of human movement using temporal templates, IEEE Transactions on Pattern Analysis and Machine Intelligence (2001)
  • M.S. Nixon et al., Automated person recognition by walking and running via model-based approaches, Pattern Recognition (2004)
  • A. Kale et al., Identification of humans using gait, IEEE Transactions on Image Processing (2004)
  • J. Han et al., Individual recognition using gait energy image, IEEE Transactions on Pattern Analysis and Machine Intelligence (2006)
  • J. Liu, N. Zheng, Gait history image: a novel temporal template for gait recognition, in: Proceedings of the IEEE...