Zernike velocity moments for sequence-based description of moving features

https://doi.org/10.1016/j.imavis.2005.12.001

Abstract

The increasing interest in processing sequences of images motivates the development of techniques for sequence-based object analysis and description. Accordingly, new velocity moments have been developed to allow a statistical description of both shape and associated motion through an image sequence. Through a generic framework, motion information is determined using the established centralised moments, enabling statistical moments to be applied to motion-based time series analysis. The translation-invariant Cartesian velocity moments suffer from highly correlated descriptions due to their non-orthogonality. The new Zernike velocity moments overcome this by using orthogonal spatial descriptions through the proven orthogonal Zernike basis; further, they are translation and scale invariant. To illustrate their benefits and application, the Zernike velocity moments have been applied to gait recognition, an emergent biometric. Good recognition results have been achieved on multiple datasets using relatively few spatial and/or motion features and basic feature selection and classification techniques. The prime aim of this new technique is to allow the generation of statistical features which encode shape and motion information, with generic application capability. Performance analyses illustrate how the Zernike velocity moments exploit temporal correlation to improve a shape's description, and how this improves the descriptor's performance under more general application scenarios, including reduced-resolution imagery and occlusion.

Introduction

Moving object description is a growing area of computer vision research, traditionally an arena dominated by tracking algorithms. Developments in this area were previously limited, not least by the storage requirements of image sequences. With the advance of digital video (DV) and the explosion of storage capacities, the analysis and storage of image sequences has become viable, increasing interest in the area. Tracking algorithms [1] generally locate the region or feature of interest in the first frame and then track it throughout the remainder of the sequence. This requires good initialisation in the first image and assumes that in later images the tracked objects are not overcome by noise or occlusion. This kind of approach enables real-time performance, a major benefit of these algorithms. With the ever-increasing available computing power, alternative approaches that process the complete image sequence are appearing. For example, the velocity Hough transform for conic sections [2] and its extension for arbitrary shapes [3] process a complete image sequence, overcoming the problems of image noise and occlusion by exploiting temporal correlation and treating the image sequence as a single entity rather than as individual images. These approaches locate the perimeter of a moving shape by searching for a particular motion. However, a great deal of information can be held within a shape's perimeter, motivating techniques for holistic moving-shape description.

Statistical moments, e.g. [4], describe a shape with respect to its axes, producing holistic descriptions encoding information including mass, centroid and variation across the axes. Mukundan [5] provides descriptions of most of the current moment techniques, along with background information and applications. In general, the different types of moments fall into two categories: orthogonal and non-orthogonal. Orthogonal moments produce features that are less correlated than their non-orthogonal counterparts; further, the orthogonality property enables simple, accurate signal reconstruction from the generated moments. Non-orthogonal moments tend to be simpler to implement and computationally less expensive, and include descriptors with a range of useful properties, e.g. scale, translation and rotation invariance. However, their highly correlated features (a result of their non-orthogonal nature) make reconstruction more difficult. This correlation also demands high accuracy in the calculations when the high-frequency components of the image are of interest and/or when analysing large datasets.
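The decorrelation afforded by an orthogonal basis can be verified directly: the Zernike radial polynomials R_n^m are orthogonal over the unit disc with weight ρ. The stdlib-only sketch below is illustrative (function names are ours, not from the paper) and checks the orthogonality relation numerically for a pair of low orders:

```python
# Numerical check that the Zernike radial polynomials R_n^m are orthogonal
# over the unit disc -- the property behind the less-correlated features of
# orthogonal moments. A sketch, not a production implementation.
from math import factorial

def radial(n, m, rho):
    """Zernike radial polynomial R_n^m(rho); requires n >= |m|, n - |m| even."""
    m = abs(m)
    return sum(
        (-1) ** s * factorial(n - s)
        / (factorial(s) * factorial((n + m) // 2 - s) * factorial((n - m) // 2 - s))
        * rho ** (n - 2 * s)
        for s in range((n - m) // 2 + 1)
    )

def disc_inner(n1, n2, m, steps=20000):
    """Midpoint-rule estimate of integral_0^1 R_n1^m(r) R_n2^m(r) r dr."""
    h = 1.0 / steps
    return sum(radial(n1, m, (k + 0.5) * h) * radial(n2, m, (k + 0.5) * h)
               * (k + 0.5) * h for k in range(steps)) * h

print(disc_inner(2, 4, 0))   # ~0: distinct orders are orthogonal
print(disc_inner(2, 2, 0))   # ~1/6, matching the known norm 1/(2(n+1))
```

Monomials x^p (the Cartesian basis) fail the analogous check, which is the source of the correlated features discussed above.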

There have been many studies using two-dimensional moments for image recognition purposes; however, to date, most applications use single images. Hoey [6] used Zernike polynomials to study facial motion by generating flow fields which provided input to hidden Markov models. Little [7] used moments to characterise optical flow between images for gait recognition. These techniques still only link adjacent images, and do not consider the complete sequence. Rosales [8] described motion by producing one image that contained information from a complete sequence, building on the work of Davis [9]. Rosales's system was based on Hu's invariant moments [10] and was used to recognise types of motion, e.g. sitting down or kicking; because several images are compressed into one, subtle differences between subjects are lost through self-occlusion and overlapping of data.

For this work, we began by looking at a traditional statistical method of moments to describe the motion of a person through multiple images. Unfortunately, this does not provide a very detailed description of the motion: there is no information linking the images of the sequence, since they are treated as separate entities. Using the general theory of moments, a method has been developed that encodes not only the pixel structure of the moving object, but also how its movement flows between images. By analysing image sequences, the temporal information can be exploited and the possibility of describing deforming shapes becomes apparent. Accordingly, we describe a new technique called velocity moments, enabling the application of statistical moments to image sequences and their holistic statistical description. To aid its characterisation while demonstrating its beneficial attributes, we apply it to human gait recognition, an emergent biometric.

This paper is structured as follows. Firstly, Section 2 briefly reviews non-orthogonal and orthogonal statistical moments. Velocity moments are then introduced in Section 3. Section 4 uses human gait classification to illustrate their application. Section 5 details the performance attributes of the Zernike velocity moments analysing the effects of reduced resolution imagery and occlusion. Conclusions are then drawn.

Section snippets

Background theory

Statistical moments are applicable to many different aspects of image processing, ranging from invariant pattern recognition and image encoding to pose estimation. Moments of an image [10] describe the image content (or distribution) with respect to its axes. They are designed to capture both global and detailed geometric information about the image. In continuous form, an image can be considered as a two-dimensional Cartesian density distribution function f(x,y). With this assumption, the
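In discrete form the moment sums reduce to a few lines of code. A minimal sketch over a binary silhouette stored as a pixel dictionary (all names illustrative), demonstrating the translation invariance of the centralised moments that the velocity moments build on:

```python
# Raw geometric moments m_pq and centralised moments mu_pq of a discrete
# image f(x, y), stored here as {(x, y): intensity}. Names are illustrative.

def raw_moment(img, p, q):
    """m_pq = sum over pixels of x^p * y^q * f(x, y)."""
    return sum((x ** p) * (y ** q) * f for (x, y), f in img.items())

def central_moment(img, p, q):
    """mu_pq: moments taken about the centroid, hence translation invariant."""
    m00 = raw_moment(img, 0, 0)
    xbar = raw_moment(img, 1, 0) / m00
    ybar = raw_moment(img, 0, 1) / m00
    return sum(((x - xbar) ** p) * ((y - ybar) ** q) * f
               for (x, y), f in img.items())

# A 3x2 rectangle of unit-intensity pixels, and the same shape translated.
shape = {(x, y): 1.0 for x in range(3) for y in range(2)}
shifted = {(x + 7, y + 4): 1.0 for (x, y) in shape}

print(central_moment(shape, 2, 0), central_moment(shifted, 2, 0))  # → 4.0 4.0
```

Here m_00 is the shape's mass (area, for a binary silhouette) and (m_10/m_00, m_01/m_00) its centroid; the centralised moments are unchanged by the shift, as printed above.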

Velocity moments

One method of developing a statistical moment technique to analyse image sequences is to stack the images into a three-dimensional XYT (x,y plus time) block, and then apply a 3D descriptor to these data. Data in this form could be described using conventional 3D moments [15], treating time as the z-axis. However, this method confounds the separation of the time and space information, as they are embedded in the data and not specific to the descriptor. Time is fundamentally different from space,
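While the precise definition is given later in the paper, the structure of a Cartesian velocity moment can be illustrated: per-frame centralised spatial moments (orders p, q) are weighted by powers (μ, γ) of the inter-frame centroid displacement and accumulated over the sequence. The sketch below is our illustrative rendering of that structure; the indices, names and absence of a normalising constant are assumptions, not the paper's exact formulation:

```python
# Illustrative Cartesian velocity moment on a synthetic sequence of binary
# frames ({(x, y): intensity} dicts): shape term = centralised moment of the
# current frame; motion term = powers of the inter-frame centroid shift.

def centroid(frame):
    m00 = sum(frame.values())
    xbar = sum(x * f for (x, y), f in frame.items()) / m00
    ybar = sum(y * f for (x, y), f in frame.items()) / m00
    return xbar, ybar

def velocity_moment(frames, p, q, mu, gamma):
    total = 0.0
    prev = centroid(frames[0])
    for frame in frames[1:]:
        cx, cy = centroid(frame)
        u = ((cx - prev[0]) ** mu) * ((cy - prev[1]) ** gamma)  # motion term
        s = sum(((x - cx) ** p) * ((y - cy) ** q) * f           # shape term
                for (x, y), f in frame.items())
        total += u * s
        prev = (cx, cy)
    return total

def square(x0):  # unit-intensity 4x4 square at horizontal offset x0
    return {(x0 + x, y): 1.0 for x in range(4) for y in range(4)}

slow = [square(i) for i in range(6)]       # moves 1 pixel per frame
fast = [square(2 * i) for i in range(6)]   # moves 2 pixels per frame

# Identical shape: the purely spatial moment (mu = gamma = 0) agrees...
print(velocity_moment(slow, 0, 0, 0, 0) == velocity_moment(fast, 0, 0, 0, 0))
# ...but the motion-weighted moment separates the two sequences.
print(velocity_moment(slow, 0, 0, 1, 0), velocity_moment(fast, 0, 0, 1, 0))
```

This is the sense in which similar shapes moving with different motions can be statistically discriminated: the spatial orders describe the silhouette, the motion orders its centroid velocity.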

Human gait

Gait is defined as the ‘manner of walking or forward motion’ [19]. It is primarily determined by muscular and skeletal structure. One of the earliest documented examples of recognition by gait comes from Shakespeare, who wrote in The Tempest [Act 4, Scene 1]:

“High'st Queen of state, Great Juno comes; I know her by her gait”

An early documented example of psychological gait observations was by Johansson [20], who attached point-light displays to specific points on a subject. Johansson then showed that

Performance analysis

This section details performance evaluation of the Zernike velocity moments as applied to the complete SOTON ST database. The analysis is intended to provide an insight into the robustness of the technique under a selection of simulated application scenarios. This analysis has been applied to the SOTON STs as their chroma-key extraction provides a suitable ground truth. To characterise the velocity moments it is important to describe the performance characteristics in terms of an error-rate,

Conclusions

A new moment descriptor structure that includes spatial and temporal information is proposed. This allows the application of statistical moments to motion based time series analysis. Thus, classification of an image sequence can be based on moments describing spatial characteristics and/or motion information, while retaining both scale and translation invariance. For example, similar objects moving with different motion can be statistically discriminated. The Cartesian velocity moments are

Acknowledgements

Our thanks to Dr Michael Grant for his help in capturing and preparing the data and we gratefully acknowledge the partial support from the European Research Office of the US Army, Contract No. N68171-01-C-9002.

References (39)

  • J. Hoey, J. Little, Representation and Recognition of Complex Human Motion, Proceedings of Computer Vision and Pattern...
  • J.J. Little et al., Recognising people by their gait: the shape of motion, Videre (1998)
  • R. Rosales, Recognition of Human Actions Using Moment-based Features, Boston University Computer Science Technical...
  • J.W. Davis, A.F. Bobick, The Representation and Recognition of Action Using Temporal Templates, Proceedings of IEEE...
  • M.-K. Hu, Visual pattern recognition by moment invariants, IRE Transactions on Information Theory (1962)
  • C. Teh et al., On image analysis by the method of moments, IEEE Transactions on Pattern Analysis and Machine Intelligence (1988)
  • A.B. Bhatia et al., On the circle polynomials of Zernike and related orthogonal sets, Proceedings of Cambridge Philosophical Society (1954)
  • A. Khotanzad et al., Invariant image recognition by Zernike moments, IEEE Transactions on Pattern Analysis and Machine Intelligence (1990)
  • F.A. Sadjadi et al., Three-dimensional moment invariants, IEEE Transactions on Pattern Analysis and Machine Intelligence (1980)