Elsevier

Pattern Recognition Letters

Volume 45, 1 August 2014, Pages 217-225

A new efficient method to characterize dynamic textures based on a two-phase texture and dynamism analysis

https://doi.org/10.1016/j.patrec.2014.04.009

Highlights

  • A new efficient method is proposed based on a two-phase texture and dynamism analysis.

  • A mathematical model showing the dynamism of a Dynamic Texture (DT) is proposed.

  • Analysis and description of the DT are based on the JPEG2000 algorithm and the DTCWT.

  • Classification of DTs is based on a dictionary of visual words which is formed in the learning stage.

  • The proposed method is invariant to illumination, rotation and shift transformations.

Abstract

Dynamic texture (DT) is an extension of texture to the temporal domain. Recently, description and classification of DTs have attracted much attention. In this article, a new method for classifying and synthesizing DTs is proposed. The method is based on a two-phase texture and dynamism analysis. First, a mathematical model of the dynamism of a DT is proposed. A DT is then described in a two-step procedure: describing its texture and describing its dynamism. The Dual Tree Complex Wavelet Transform (DTCWT) is applied to the textured frames and to the models of the dynamism of the DT, which makes the algorithm robust to illumination and shift variations. The mean and standard deviation of the complex wavelet coefficients, obtained from the textured frames and the models of the dynamism, are concatenated to form the DT feature vector. Applying the Fourier transform to the feature vector additionally provides rotation invariance. A dictionary of visual words built from these feature vectors is then used to describe each DT. Together, the two phases cover a large variety of DT classification problems, including cases where classes differ in both appearance and motion, and cases where appearance is similar across classes and only motion is discriminative, or vice versa. Compared with earlier approaches, ours offers higher speed and better performance on two test databases.

Introduction

There are many visual patterns that are best characterized by the aggregate dynamics of a set of constituent elements, rather than the dynamics of the individuals. In the computer vision literature, miscellaneous terms such as turbulent flow/motion [15], temporal textures [25], time-varying textures [2], DTs [28] and textured motion [32] have been used to refer to these patterns. Typically, these terms refer to image sequences of natural processes with stochastic dynamics (e.g. fire, turbulent water and windblown vegetation).

Recognizing dynamic patterns from visual input matters for many applications, such as remote monitoring for the prevention of natural disasters (e.g. forest fires), various types of surveillance (e.g. traffic monitoring), background subtraction in challenging environments (e.g. outdoor scenes with vegetation), homeland security, and scientific studies of animal behavior. In surveillance, recognizing dynamic patterns helps isolate activities of interest (e.g. fire) from distracting background (e.g. windblown vegetation and changes in scene illumination). In decision making for intelligent agents, certain critical dynamic patterns can trigger corresponding reactive behaviors (e.g. flight and pursuit). Further, pattern dynamics can serve as cues complementary to spatial appearance for video indexing and retrieval.

The goal of the present paper is to introduce a unified approach for classifying and synthesizing a diverse set of dynamic patterns with robustness to illumination, rotation and shift transformations. Toward that end, a mathematical model of the dynamism of a DT is proposed. We then describe a DT in a two-step procedure: describing its texture and describing its dynamism. The DTCWT is applied to both the textured frames and the models of the dynamism of the DT. The mean and standard deviation of the DTCWT coefficients, obtained from the textured frames and the models of the dynamism, form powerful and discriminative descriptors for the dynamic pattern. The descriptors of all dynamic patterns belonging to a given category are then clustered, yielding visual words that represent that category; doing this for all categories produces a dictionary of visual words. The dictionary is used to derive, for each dynamic pattern, a histogram of the relative presence of particular visual words. Finally, DT recognition is performed by matching the histograms.
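The dictionary-building and histogram-matching stage described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the k-means routine, the toy descriptor dimensions, and the L1 histogram distance are our assumptions; in the paper, the descriptors come from DTCWT statistics.

```python
import numpy as np

def kmeans(X, k, iters=50, seed=0):
    """Minimal Lloyd's k-means returning the cluster centers (visual words)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(iters):
        # Assign each descriptor to its nearest center, then update centers.
        labels = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return centers

def bow_histogram(descriptors, dictionary):
    """Normalized histogram of nearest visual words for one dynamic pattern."""
    labels = np.argmin(
        ((descriptors[:, None] - dictionary[None]) ** 2).sum(-1), axis=1)
    hist = np.bincount(labels, minlength=len(dictionary)).astype(float)
    return hist / hist.sum()

# Toy run: descriptors drawn around two modes, a 2-word dictionary,
# and recognition by L1 histogram matching.
rng = np.random.default_rng(3)
train = np.concatenate([rng.normal(0, 0.1, (20, 4)),
                        rng.normal(5, 0.1, (20, 4))])
words = kmeans(train, k=2)
h_train = bow_histogram(train, words)
h_query = bow_histogram(train + rng.normal(0, 0.01, train.shape), words)
l1_dist = np.abs(h_train - h_query).sum()  # small: same underlying pattern
```

Matching then reduces to a nearest-neighbor search over histograms, with the smallest distance giving the predicted category.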

Various methods have been proposed to characterize DTs for recognition [7]. One category comprises physics-based approaches [19], in which models for specific dynamic patterns (e.g. water) are derived from a first-principles analysis of the generating process. Input imagery is used to fit these models, and the estimated parameters then support inference. This type of approach has two main disadvantages: it involves significant computational expense, and the derived models are so tightly focused on specific patterns that they fail to generalize to other classes.

The second category comprises motion analysis methods that extract motion flow field-based features, assumed to be captured by the estimated normal flow [25]. This work was followed up by several proposed variations of normal flow [3] and by optical flow-based features [21]. However, optical flow-based methods assume brightness constancy and local image smoothness, conditions that cannot be justified for stochastic dynamics. Rather than capturing dynamic information alone, other works captured the joint photometric-dynamic pattern structure using aggregate measurements of local threshold values [35]. Object tracking methods [8] also tend to be infeasible here, owing to the large number of extremely small, non-rigid moving objects with little shape stability, complex motion characteristics, and inter-object interactions.

Statistical generative models are attractive because both the spatial appearance and the dynamics of a pattern can be derived from them; in the recognition stage, the estimated model parameters are compared. Autoregressive (AR) models [10], [12], [31], [32] and multi-resolution analysis based methods [2], [15] are examples of this approach. Doretto et al. [10] proposed the joint photometric-dynamic, AR-based Linear Dynamic System (LDS) model to describe DTs; their approach has been used and improved by many other researchers [6], [4], [24], [27], [28], [33]. However, when the intensity values of a DT are modeled by an LDS, even though the model is a stable autoregressive moving average (ARMA) process, it has four main disadvantages: (i) the second-order probabilistic stationarity assumption cannot be justified for many sequences (e.g. fire); (ii) the relationship between the order of the LDS model and the extent of temporal modeling is suboptimal (an LDS of order n does not capture the most temporal variation in a DT among all models of order n); (iii) computational complexity is high, since the model is applied directly to pixel intensities regardless of spatial redundancy; and (iv) the temporal evolution of a DT is non-linear in general and is not fully captured by a linear model [36].
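For reference, the LDS model of [10] has the state-space form x_{t+1} = A x_t + v_t, y_t = C x_t + w_t, with a hidden state x_t driving the observed frame y_t. The sketch below simulates such a model with illustrative dimensions and noise scales of our own choosing, simply to show how a frame sequence is generated; it is not the fitting procedure from [10].

```python
import numpy as np

# Minimal simulation of the LDS dynamic-texture model:
#   x_{t+1} = A x_t + v_t   (hidden state, v_t Gaussian)
#   y_t     = C x_t + w_t   (observed frame, w_t Gaussian)
# Dimensions and noise scales are illustrative assumptions.
rng = np.random.default_rng(1)
n, m, T = 5, 64, 30           # state dim, pixels per frame, number of frames
A = 0.9 * np.eye(n)           # stable dynamics (spectral radius < 1)
C = rng.standard_normal((m, n))  # maps state to pixel intensities

x = rng.standard_normal(n)
frames = []
for _ in range(T):
    x = A @ x + 0.1 * rng.standard_normal(n)        # state update
    frames.append(C @ x + 0.01 * rng.standard_normal(m))  # observation
frames = np.stack(frames)     # shape (T, m): the synthesized sequence
```

Because y_t depends linearly on x_t, the sequence is second-order stationary once the state settles, which is exactly the assumption (i) above that fails for sequences like fire.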

A good descriptor must accurately and efficiently capture both the texture and the dynamism of DTs. Texture can be captured from the textured frames; to capture the dynamism, a mathematical model of the dynamism of a DT is proposed. In the proposed model, we efficiently compute the velocity of the dynamism, i.e. the transition model of the codes (frames transformed into code space), from one frame to the next. We then describe a DT in a two-step procedure: describing its texture and its dynamism using the wavelet transform. Since its emergence 20 years ago, the wavelet transform has been exploited with great success across the gamut of signal processing applications, often redefining state-of-the-art performance in the process. The dual-tree implementation of the complex wavelet transform provides a directional multi-resolution decomposition of a given DT. Thus, this paper proposes an automatic method for textural feature extraction (from the textured frames) and dynamical feature extraction (from the models of the dynamism) using the DTCWT. Characteristics of the DTCWT such as illumination and shift invariance, directional selectivity, phase information, perfect reconstruction, limited redundancy, and efficient order-N computation provide a framework in which the extracted textural and dynamical features powerfully characterize DTs. The DTCWT also overcomes the limitations of the classical Discrete Wavelet Transform (DWT), such as lack of shift invariance, poor directional selectivity, oscillation of coefficients at a singularity, and aliasing due to down-sampling [17], [30].
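A sketch of the descriptor construction described above: the mean and standard deviation of the complex subband magnitudes are concatenated, and the FFT magnitude of the resulting vector removes sensitivity to circular reordering of its entries, which is the route the paper takes to rotation invariance. The toy random "subbands" below merely stand in for a real DTCWT decomposition, which is not implemented here.

```python
import numpy as np

def dt_descriptor(subbands):
    """Concatenate per-subband (mean, std) of coefficient magnitudes, then
    return the FFT magnitude of that vector. |FFT| is unchanged by circular
    shifts of the input, so a cyclic reordering of the orientation subbands
    (as caused by rotation) yields the same descriptor."""
    stats = []
    for c in subbands:
        mag = np.abs(c)                    # magnitudes of complex coefficients
        stats.extend([mag.mean(), mag.std()])
    return np.abs(np.fft.fft(np.asarray(stats)))

# Toy check: a subband list and its cyclic rotation give equal descriptors.
rng = np.random.default_rng(0)
bands = [rng.standard_normal((8, 8)) + 1j * rng.standard_normal((8, 8))
         for _ in range(6)]                # stand-in for 6 DTCWT orientations
rotated = bands[2:] + bands[:2]            # cyclic reordering of orientations
d1, d2 = dt_descriptor(bands), dt_descriptor(rotated)
```

The invariance holds because a rotation that permutes orientation subbands cyclically shifts the stats vector by a whole number of (mean, std) pairs, and the DFT magnitude is invariant under circular shifts.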

In light of previous research, the contributions of the present work are as follows. A new efficient method based on a two-phase texture and dynamism analysis is proposed to address classification and synthesis, the most challenging DT problems. For this purpose, a mathematical model of the dynamism of a DT is proposed. The DTCWT is applied to the textured frames and the models of the dynamism of the DT, and statistical features of the DTCWT coefficients form the descriptors. The descriptors of all dynamic patterns are clustered using the k-means method, and the cluster centers produce a dictionary of visual words. Using this dictionary, each dynamic pattern is associated with a histogram representing the relative presence of particular visual words. Classification is performed by matching the histograms; synthesis is performed using the feature vector of the first frame and the model of the dynamism of the DT. To make it suitable for practical applications, the proposed approach is invariant to illumination, rotation and shift transformations. Empirical evaluation on standard data sets shows that the proposed approach achieves good speed and superior accuracy compared with state-of-the-art methods.

Section snippets

A new approach to model the dynamism of a dynamic texture

In this section, it is shown how a mathematical model is obtained to model the dynamism of a DT.

Synthesis

To evaluate the performance of our proposed synthesis algorithm, we compare the synthesized frames from [10] with the synthesized frames from our two-phase texture and dynamism analysis. The comparison for a few examples is shown in Fig. 3. As a criterion, the Root Mean Square (rms) error with respect to the original frames is calculated according to (9), as listed in Table 1:

E_{rms} = \sqrt{\frac{1}{PQ}\sum_{p=1}^{P}\sum_{q=1}^{Q}\left[I_{org}(p,q)-I_{syn}(p,q)\right]^{2}}

where P and Q are the image dimensions and I_org and I_syn are the original and synthesized frames, respectively.
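Eq. (9) translates directly into code; the short sketch below computes the rms error between two frames (the function name is ours).

```python
import numpy as np

def rms_error(I_org, I_syn):
    """Root-mean-square error between an original and a synthesized frame,
    per Eq. (9): sqrt( (1/(P*Q)) * sum over p,q of (I_org - I_syn)^2 )."""
    I_org = np.asarray(I_org, dtype=float)
    I_syn = np.asarray(I_syn, dtype=float)
    P, Q = I_org.shape
    return np.sqrt(np.sum((I_org - I_syn) ** 2) / (P * Q))
```

For instance, a synthesized frame that is off by exactly one gray level everywhere gives an rms error of 1.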

Conclusions

In this paper, based on a two-phase texture and dynamism analysis, a new efficient method was proposed to address classification and synthesis, the most challenging DT problems. For this purpose, a mathematical model of the dynamism of a DT was proposed. The DTCWT was applied to the textured frames and the models of the dynamism of the DT, and statistical features of the DTCWT coefficients formed the descriptors. The descriptors of all dynamic patterns were clustered using the k-means clustering method.

References (36)

  • N.G. Kingsbury, Complex wavelets for shift invariant analysis and filtering of signals, J. Appl. Comput. Harmon. Anal. (2001)
  • R. Nelson et al., Qualitative recognition of motion using temporal texture, CVGIP: Image Understanding (1992)
  • T. Ahonen et al., Rotation invariant image description with local binary pattern histogram Fourier features
  • Z. Bar-Joseph et al., Texture mixing and texture movie synthesis using statistical learning, IEEE Trans. Visual Comput. Graphics (2001)
  • P. Bouthemy et al., Motion characterization from temporal cooccurrences of local motion-based measures for video indexing
  • A.B. Chan et al., Probabilistic kernels for the classification of auto-regressive visual processes
  • A.B. Chan et al., Classifying video with kernel dynamic textures
  • A.B. Chan, E. Coviello, G.R.G. Lanckriet, Clustering dynamic textures with the hierarchical EM algorithm, in: IEEE...
  • D. Chetverikov et al., A brief survey of dynamic texture description and recognition
  • D. Comaniciu et al., Kernel-based object tracking, IEEE Trans. Pattern Anal. Mach. Intell. (2003)
  • K. Derpanis et al., Space time texture representation and recognition based on a spatiotemporal orientation analysis, IEEE Trans. Pattern Anal. Mach. Intell. (2012)
  • G. Doretto et al., Dynamic textures, Int. J. Comput. Vision (2003)
  • S. Fazekas et al., Normal versus complete flow in dynamic texture recognition: a comparative study
  • A. Fitzgibbon, Stochastic rigidity: image registration for nowhere-static scenes
  • B. Ghanem et al., Sparse coding of linear dynamical systems with an application to dynamic texture recognition
  • R. Gonzalez et al., Digital Image Processing Using Matlab (2009)
  • D. Heeger et al., Seeing structure through chaos
  • N.G. Kingsbury, The dual-tree complex wavelet transform: a new efficient tool for image restoration and enhancement
This paper has been recommended for acceptance by R. Davies.