A new efficient method to characterize dynamic textures based on a two-phase texture and dynamism analysis☆
Introduction
There are many visual patterns that are best characterized by the aggregate dynamics of a set of constituent elements, rather than the dynamics of the individuals. In the computer vision literature, miscellaneous terms such as turbulent flow/motion [15], temporal textures [25], time-varying textures [2], dynamic textures (DTs) [28] and textured motion [32] have been used to refer to these patterns. Typically, these terms have referred to image sequences of natural processes with stochastic dynamics (e.g. fire, turbulent water and windblown vegetation).
Recognizing dynamic patterns from visual data is significant for many applications: remote monitoring for the prevention of natural disasters (e.g. forest fires), various types of surveillance (e.g. traffic monitoring), background subtraction in challenging environments (e.g. outdoor scenes with vegetation), homeland security, and scientific studies of animal behavior. In the context of surveillance, recognizing dynamic patterns helps isolate activities of interest (e.g. fire) from distracting background (e.g. windblown vegetation and changes in scene illumination). In the context of decision making for intelligent agents, certain critical dynamic patterns can trigger corresponding reactive behaviors (e.g. flight and pursuit). Further, pattern dynamics can serve as cues complementary to spatial appearance for indexing and retrieval of video.
The goal of the present paper is to introduce a unified approach to classify and synthesize a diverse set of dynamic patterns that is robust to illumination, rotation and shift transformations. Toward that end, a mathematical model of the dynamism of a DT is proposed. We then describe a DT in a two-step procedure: describing its texture and describing its dynamism. The dual-tree complex wavelet transform (DTCWT) is applied to both the textured frames and the models of the dynamism of the DT. The mean and standard deviation of the DTCWT coefficients, obtained from the textured frames and the dynamism models, form powerful and discriminative descriptors for that dynamic pattern. The descriptors of all dynamic patterns belonging to a specific category are then clustered, yielding a set of visual words representing that category. Doing this for all categories produces a dictionary of visual words, which is used to derive a distribution (histogram) of the relative presence of particular visual words for each dynamic pattern. Finally, DT recognition is performed by matching the histograms.
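To make the descriptor stage concrete, the following is a minimal numpy sketch of computing mean/standard-deviation features over multiscale detail subbands of a frame. A plain one-level Haar split is used here as a stand-in for the DTCWT (which yields six complex oriented subbands per level); the function names and feature layout are illustrative, not the paper's implementation.

```python
import numpy as np

def haar_subbands(img):
    """One-level 2-D Haar split into approximation (LL) and detail (LH, HL, HH) subbands."""
    a = img[0::2, 0::2]; b = img[0::2, 1::2]
    c = img[1::2, 0::2]; d = img[1::2, 1::2]
    ll = (a + b + c + d) / 4.0
    lh = (a + b - c - d) / 4.0
    hl = (a - b + c - d) / 4.0
    hh = (a - b - c + d) / 4.0
    return ll, lh, hl, hh

def frame_descriptor(frame, levels=3):
    """Mean and std of detail-subband magnitudes at each level
    (the paper uses DTCWT subbands, which add orientation and phase)."""
    feats = []
    ll = frame.astype(float)
    for _ in range(levels):
        ll, lh, hl, hh = haar_subbands(ll)
        for sb in (lh, hl, hh):
            mag = np.abs(sb)
            feats += [mag.mean(), mag.std()]
    return np.array(feats)  # length = levels * 3 subbands * 2 statistics

frame = np.random.rand(64, 64)
desc = frame_descriptor(frame)
print(desc.shape)  # (18,)
```

The same routine would be applied both to textured frames and to the frames of the dynamism model, producing one descriptor vector per frame.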
Various methods have been proposed to characterize DTs for the purpose of recognition [7]. One category comprises physics-based approaches [19], in which models for specific dynamic patterns (e.g. water) are derived from a first-principles analysis of the generating process. Input imagery is used to fit the models, and the estimated parameters support inference. This type of approach has two main disadvantages: first, it involves significant computational expense; second, the derived models are highly tailored to specific patterns and thus fail to generalize to other classes.
The second category comprises motion analysis methods that extract motion flow field-based features, which are assumed to be captured by the estimated normal flow [25]. This work was followed up by several proposed variations of normal flow [3] and optical flow-based features [21]. However, optical flow-based methods assume brightness constancy and local smoothness of the image, conditions which cannot be justified for stochastic dynamics. Rather than capturing dynamic information alone, other methods capture the joint photometric-dynamic pattern structure using aggregate measurements of local threshold values [35]. Object tracking methods [8] also tend to be infeasible here due to the large number of extremely small and non-rigid moving objects with little shape stability, complex motion characteristics, and inter-object interactions.
Statistical generative models are very attractive because both the spatial appearance and the dynamics of a pattern can be derived from them. In the recognition stage, the estimated model parameters are compared and used. Autoregressive (AR) models [10], [12], [31], [32] and multi-resolution analysis based methods [2], [15] are examples of this approach. Doretto et al. [10] proposed the joint photometric-dynamic, AR-based Linear Dynamic System (LDS) model to describe DTs. Their approach has been used and improved by many other researchers [4], [6], [24], [27], [28], [33]. However, when the intensity values of a DT are modeled by an LDS, although the model is regarded as a stable autoregressive moving average (ARMA) process, it has four main disadvantages: (i) the second-order probabilistic stationarity assumption cannot be justified for many sequences (e.g. fire); (ii) the relationship between the order of the LDS model and the extent of temporal modeling is suboptimal (that is, an LDS of order n does not capture the most temporal variation in a DT among all models of order n); (iii) the computational complexity is high, since the model is applied directly to pixel intensities regardless of spatial redundancy; and (iv) the temporal evolution of a DT is non-linear in general and is not fully captured by a linear model [36].
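For context, the LDS model of [10] is commonly fitted in closed form via an SVD of the frame matrix (y_t = C x_t, x_{t+1} = A x_t + noise). The following is a minimal sketch of that standard procedure, included to illustrate point (iii) above (the model operates directly on raw pixel intensities); dimensions and names are illustrative, and this is not the present paper's method.

```python
import numpy as np

def fit_lds(frames, n=5):
    """Closed-form (suboptimal) LDS fit in the style of Doretto et al. [10]."""
    # Stack each frame as one column of the data matrix: (pixels, T)
    Y = np.stack([f.ravel() for f in frames], axis=1)
    Y = Y - Y.mean(axis=1, keepdims=True)          # remove temporal mean
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    C = U[:, :n]                                   # appearance basis (pixels, n)
    X = np.diag(s[:n]) @ Vt[:n, :]                 # hidden state trajectory (n, T)
    A = X[:, 1:] @ np.linalg.pinv(X[:, :-1])       # state transition by least squares
    return A, C, X

frames = [np.random.rand(16, 16) for _ in range(20)]
A, C, X = fit_lds(frames, n=4)
print(A.shape, C.shape, X.shape)  # (4, 4) (256, 4) (4, 20)
```

Note that the data matrix Y has one row per pixel, so the cost grows with frame resolution regardless of spatial redundancy, which is exactly the complexity issue raised in (iii).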
A good descriptor must accurately and efficiently capture both the texture and the dynamism of DTs. Texture can be captured from the textured frames; for the dynamism, a mathematical model is proposed. In the proposed model, we efficiently compute the velocity of the dynamism, i.e. the transition model of the codes (frames mapped into a code space) from one frame to the next. We then describe a DT in a two-step procedure, describing its texture and its dynamism using the wavelet transform. Since its emergence two decades ago, the wavelet transform has been exploited with great success across the gamut of signal processing applications, often redefining state-of-the-art performance in the process. The dual-tree implementation of the complex wavelet transform provides a directional multi-resolution decomposition of a given DT. Thus, in this paper, an automatic method for textural feature extraction (from the textured frames) and dynamical feature extraction (from the models of the dynamism) using the DTCWT is proposed. The characteristics of the DTCWT, such as illumination and shift invariance, directional selectivity, phase information, perfect reconstruction, limited redundancy, and efficient order-N computation, provide a framework in which the extracted textural and dynamical features powerfully characterize DTs. The DTCWT also overcomes the limitations of the classical Discrete Wavelet Transform (DWT), such as lack of shift invariance, poor directional selectivity, oscillations of the coefficients at a singularity, and aliasing due to down-sampling [17], [30].
In light of previous research, the contributions of the present work are as follows. A new efficient method based on a two-phase texture and dynamism analysis is proposed to address classification and synthesis, the most challenging problems of DTs. For this purpose, a mathematical model of the dynamism of a DT is proposed. The DTCWT is applied to the textured frames and to the models of the dynamism of the DT, and statistical features of the DTCWT coefficients form the descriptors. The descriptors of all dynamic patterns are clustered using the k-means method, and the cluster centers produce a dictionary of visual words. Using this dictionary, each dynamic pattern is associated with a histogram representing the relative presence of particular visual words. Classification is performed by matching the histograms, and synthesis is performed using the first-frame feature vector and the model of the dynamism of the DT. To make it suitable for practical applications, the proposed approach is invariant to illumination, rotation and shift transformations. Empirical evaluation on standard data sets shows that the proposed approach achieves admissible speed and superior accuracy over state-of-the-art methods.
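The bag-of-visual-words stage described above (k-means on descriptors, histogram assignment, histogram matching) can be sketched as follows. This is a minimal, self-contained illustration of the standard pipeline; the descriptor dimensionality, vocabulary size and chi-square distance are assumptions for the sketch, not values taken from the paper.

```python
import numpy as np

def kmeans(data, k, iters=20, seed=0):
    """Minimal k-means; the cluster centers become the visual words."""
    rng = np.random.default_rng(seed)
    centers = data[rng.choice(len(data), k, replace=False)]
    for _ in range(iters):
        dist = np.linalg.norm(data[:, None] - centers[None], axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = data[labels == j].mean(axis=0)
    return centers

def bow_histogram(descriptors, words):
    """Normalized histogram of nearest-visual-word assignments."""
    dist = np.linalg.norm(descriptors[:, None] - words[None], axis=2)
    h = np.bincount(dist.argmin(axis=1), minlength=len(words)).astype(float)
    return h / h.sum()

def hist_distance(h1, h2):
    """Chi-square distance, one common choice for histogram matching."""
    eps = 1e-10
    return 0.5 * np.sum((h1 - h2) ** 2 / (h1 + h2 + eps))

rng = np.random.default_rng(1)
train_descriptors = rng.random((200, 18))        # stand-in for DTCWT descriptors
words = kmeans(train_descriptors, k=8)           # dictionary of visual words
h = bow_histogram(rng.random((30, 18)), words)   # histogram for one sequence
print(h.shape, round(h.sum(), 6))
```

At classification time, a query sequence's histogram is compared against the stored histograms of each category and the nearest match decides the label.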
Section snippets
A new approach to model the dynamism of a dynamic texture
This section shows how a mathematical model of the dynamism of a DT is obtained.
Synthesis
To evaluate the performance of our proposed synthesis algorithm, we compare the frames synthesized by [10] with the frames synthesized by our two-phase texture and dynamism analysis. The comparison for a few examples is shown in Fig. 3. As a criterion, the Root Mean Square (RMS) error from the original frames is calculated according to (9) and listed in Table 1:

RMS = \sqrt{\frac{1}{PQ} \sum_{i=1}^{P} \sum_{j=1}^{Q} \left( I_{org}(i,j) - I_{syn}(i,j) \right)^2} \quad (9)

where P and Q are the image dimensions and I_org and I_syn are the original and synthesized frames.
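Assuming (9) is the standard per-pixel RMS error between the original and synthesized frames, the criterion can be computed as:

```python
import numpy as np

def rms_error(I_org, I_syn):
    """Per-pixel root mean square error between original and synthesized frames."""
    I_org = np.asarray(I_org, dtype=float)
    I_syn = np.asarray(I_syn, dtype=float)
    return np.sqrt(np.mean((I_org - I_syn) ** 2))

a = np.zeros((4, 4))
b = np.full((4, 4), 2.0)
print(rms_error(a, b))  # 2.0
```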
Conclusions
In this paper, based on a two-phase texture and dynamism analysis, a new efficient method was proposed to address classification and synthesis, the most challenging problems of DTs. For this purpose, a mathematical model of the dynamism of a DT was proposed. The DTCWT was applied to the textured frames and to the models of the dynamism of the DT. Statistical features of the DTCWT coefficients then formed the descriptors. The descriptors of all dynamic patterns were clustered using the k-means clustering method.
References (36)
- Complex wavelets for shift invariant analysis and filtering of signals, J. Appl. Comput. Harmon. Anal. (2001)
- et al., Qualitative recognition of motion using temporal texture, CVGIP: Image Understanding (1992)
- et al., Rotation invariant image description with local binary pattern histogram Fourier features
- et al., Texture mixing and texture movie synthesis using statistical learning, IEEE Trans. Visual Comput. Graphics (2001)
- et al., Motion characterization from temporal cooccurrences of local motion-based measures for video indexing
- et al., Probabilistic kernels for the classification of auto-regressive visual processes
- et al., Classifying video with kernel dynamic textures
- A.B. Chan, E. Coviello, G.R.G. Lanckriet, Clustering dynamic textures with the hierarchical EM algorithm, in: IEEE...
- et al., A brief survey of dynamic texture description and recognition
- et al., Kernel-based object tracking, IEEE Trans. Pattern Anal. Mach. Intell. (2003)
- Space time texture representation and recognition based on a spatiotemporal orientation analysis, IEEE Trans. Pattern Anal. Mach. Intell.
- Dynamic textures, Int. J. Comput. Vision
- Normal versus complete flow in dynamic texture recognition: a comparative study
- Stochastic rigidity: image registration for nowhere-static scenes
- Sparse coding of linear dynamical systems with an application to dynamic texture recognition
- Digital Image Processing Using Matlab
- Seeing structure through chaos
- The dual-tree complex wavelet transform: a new efficient tool for image restoration and enhancement
☆ This paper has been recommended for acceptance by R. Davies.