doi:10.1016/j.imavis.2003.07.002
Copyright © 2003 Elsevier B.V. All rights reserved.
Velocity adaptation of spatio-temporal receptive fields for direct recognition of activities: an experimental study
Computational Vision and Active Perception Laboratory (CVAP), Department of Numerical Analysis and Computer Science, KTH, SE-100 44, Stockholm, Sweden
Received 26 September 2002; revised 27 June 2003; accepted 2 July 2003. Available online 30 December 2003.
Abstract
This article presents an experimental study of the influence of velocity adaptation when recognizing spatio-temporal patterns using a histogram-based statistical framework. The basic idea consists of adapting the shapes of the filter kernels to the local direction of motion, so as to allow the computation of image descriptors that are invariant to the relative motion in the image plane between the camera and the objects or events that are studied. Based on a framework of recursive spatio-temporal scale-space, we first outline how a straightforward mechanism for local velocity adaptation can be expressed. Then, for a test problem of recognizing activities, we present an experimental evaluation, which shows the advantages of using velocity-adapted spatio-temporal receptive fields, compared to directional derivatives or regular partial derivatives for which the filter kernels have not been adapted to the local image motion.
Author Keywords: Motion; Spatio-temporal filtering; Scale-space; Recognition
Fig. 1. Spatio-temporal image of a walking person (a) depends on the relative motion between the person and the camera (b)–(c). If this motion is not taken into account, spatio-temporal filtering (here, the second-order spatial derivative) results in highly different responses, as illustrated in (d) and (e). Manual stabilization of the pattern in (e), shown in (f), makes the difference more explicit for comparison with (d).
Fig. 2. A pre-requisite for perfect matching of spatio-temporal receptive field responses for different amounts of motion is that the image representation is closed under motions in the image domain. The aim of the velocity adaptation mechanism is to allow for such closedness, and to permit the construction of a velocity invariant recognition scheme.
Fig. 3. The effect of global velocity adaptation for a synthetic spatio-temporal pattern in (a). (b)–(d) Convolution of (a) with spatio-temporal second-order derivative operators with $\sigma^2 = 32$, $\tau^2 = 32$ and velocity parameters $v = -1, 0, 1$, respectively. Note that, depending on the velocity parameter, global velocity adaptation emphasizes either the moving pattern (b) or the stationary pattern (c).
Fig. 4. Spatio-temporal filters $L_{xx}$ computed from a velocity-adapted spatio-temporal scale-space for a 1+1D image pattern, for different values of the velocity parameter $v$, the spatial scale $\sigma^2$ and the temporal scale $\tau^2$.
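As an illustration of the kind of kernel shown in Fig. 4, the following is a minimal sketch of a velocity-adapted second-order spatial derivative kernel in 1+1D. It assumes a non-causal Gaussian in both space and time that is skewed by the Galilean transformation $u = x - vt$. This is a simplification: the abstract refers to a recursive spatio-temporal scale-space, which is not reproduced here. The function name `velocity_adapted_gxx` and the 4-standard-deviation truncation are hypothetical choices made for this example.

```python
import numpy as np

def velocity_adapted_gxx(sigma2, tau2, v):
    """Sampled second spatial derivative of a Gaussian skewed by velocity v.

    Assumption for illustration: a plain (non-causal) Gaussian with spatial
    variance sigma2 and temporal variance tau2. Returns an array indexed [t, x].
    """
    # Truncate at about 4 standard deviations; widen x to cover the skew.
    r_t = int(np.ceil(4 * np.sqrt(tau2)))
    r_x = int(np.ceil(4 * np.sqrt(sigma2) + abs(v) * r_t))
    t = np.arange(-r_t, r_t + 1)[:, None]
    x = np.arange(-r_x, r_x + 1)[None, :]
    # Galilean transformation: spatial coordinate in the frame moving with v.
    u = x - v * t
    g = np.exp(-u**2 / (2 * sigma2) - t**2 / (2 * tau2))
    g /= 2 * np.pi * np.sqrt(sigma2 * tau2)
    # Second derivative of the Gaussian with respect to x (equivalently u).
    return (u**2 / sigma2**2 - 1 / sigma2) * g

# Kernels for the same scales and three velocity parameters.
kernels = {v: velocity_adapted_gxx(4.0, 4.0, v) for v in (-1.0, 0.0, 1.0)}
```

For $v = 0$ the kernel is separable in $x$ and $t$; for non-zero $v$ its support is sheared along the line $x = vt$, which is the qualitative change of shape that the caption describes.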
Fig. 5. Results of filtering the original patterns in (a) and (d) using the proposed local velocity adaptation are illustrated in (b) and (e), respectively. The orientation of the ellipses in (c) and (f) shows the chosen velocity at each point of the pattern. Note that filtering with local velocity adaptation preserves the details of both the moving and the stationary pattern. The similarity of the filter responses in (b) and (e) also illustrates the independence of the filtering results with respect to the amount of camera motion.
Fig. 6. Spatio-temporal filtering with local velocity adaptation applied to a gait pattern recorded with a stabilized camera (a) and a stationary camera (b) (see Fig. 1 for comparison); (c) and (d) velocity-adapted shape of filter kernels; (e) and (f) results of filtering with a second-order derivative operator; (g) warped version of (f) showing high similarity with (e).
Fig. 7. (a) Prototype spatio-temporal blob signal with velocity $v_x = 2$. (b)–(d) Responses to the $\partial_{xxt}$-derivative operator when using (b): velocity-adapted filters; (c): velocity-steered filters; (d): non-adapted filters. A correct shape of the filter response is obtained only for the case of velocity-adapted filtering.
Fig. 8. Test sequences of people walking W1–W4 and people performing an exercise E1–E4. Whereas the sequences W1, W4, E1, E3 were taken with a manually stabilized camera, the other four sequences were recorded using a stationary camera.
Fig. 9. Results of local velocity adaptation for image sequences recorded with a manually stabilized camera (a), and with a stationary camera (b). Directions of cones in (c) and (d) correspond to the velocity chosen by the proposed adaptation algorithm. The size of the cones corresponds to the value of the squared Laplacian $((\partial_{xx} + \partial_{yy}) L(x, y, t; \sigma, \tau))^2$ at the selected velocities.
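The caption of Fig. 9 relates the cone size to the squared Laplacian at the selected velocity. The sketch below illustrates one plausible reading, assumed here rather than stated in this excerpt: at each point, the velocity is chosen from a set of candidates as the one that maximizes the squared response. To stay short, the example works in 1+1D and uses $L_{xx}$ in place of the Laplacian, with a non-causal Gaussian instead of the recursive scale-space mentioned in the abstract. All function names (`adapted_gxx`, `conv_same`, `select_local_velocity`) and parameter values are hypothetical.

```python
import numpy as np

def adapted_gxx(sigma2, tau2, v, r_t, r_x):
    """Second spatial derivative of a Gaussian skewed by velocity v, indexed [t, x]."""
    t = np.arange(-r_t, r_t + 1)[:, None]
    x = np.arange(-r_x, r_x + 1)[None, :]
    u = x - v * t  # spatial coordinate in the frame moving with velocity v
    g = np.exp(-u**2 / (2 * sigma2) - t**2 / (2 * tau2))
    g /= 2 * np.pi * np.sqrt(sigma2 * tau2)
    return (u**2 / sigma2**2 - 1 / sigma2) * g

def conv_same(image, kernel):
    """Zero-padded linear 2-D convolution via FFT, cropped to the shape of image."""
    full = (image.shape[0] + kernel.shape[0] - 1,
            image.shape[1] + kernel.shape[1] - 1)
    out = np.fft.irfft2(np.fft.rfft2(image, full) * np.fft.rfft2(kernel, full), full)
    top, left = kernel.shape[0] // 2, kernel.shape[1] // 2
    return out[top:top + image.shape[0], left:left + image.shape[1]]

def select_local_velocity(image, velocities, sigma2=4.0, tau2=4.0):
    """Per-point velocity maximizing the squared L_xx response (assumed criterion)."""
    r_t = int(np.ceil(4 * np.sqrt(tau2)))
    r_x = int(np.ceil(4 * np.sqrt(sigma2) + max(abs(v) for v in velocities) * r_t))
    # One filter response per candidate velocity, stacked along axis 0.
    responses = np.stack([
        conv_same(image, adapted_gxx(sigma2, tau2, v, r_t, r_x))
        for v in velocities
    ])
    best = np.argmax(responses**2, axis=0)
    chosen_v = np.asarray(velocities)[best]
    chosen_response = np.take_along_axis(responses, best[None], axis=0)[0]
    return chosen_v, chosen_response

# Synthetic 1+1D pattern: a Gaussian ridge moving one pixel per frame.
t = np.arange(41)[:, None]
x = np.arange(81)[None, :]
moving_ridge = np.exp(-(x - 20 - t) ** 2 / 8.0)
chosen_v, chosen_response = select_local_velocity(moving_ridge, [-1.0, 0.0, 1.0])
```

The design rests on a simple observation: a filter whose velocity matches the pattern's motion does not blur the pattern along its trajectory, so its second-derivative response on the ridge is stronger than that of a mismatched filter.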
Fig. 10. Means and variances of histograms for the activities ‘walking’ (red) and ‘exercise’ (blue). (a)–(c) Histograms of velocity-adapted derivatives $L_{xxt}$, $L_{xyt}$, $L_{yyt}$; (d)–(f) histograms of velocity-steered directional derivatives $L_{xxt}$, $L_{xyt}$, $L_{yyt}$; (g)–(i) histograms of non-adapted partial derivatives $L_{xxt}$, $L_{xyt}$, $L_{yyt}$. As can be seen, the velocity-adapted filter responses discriminate the motion patterns considerably better than the velocity-steered or non-adapted filters.
Fig. 11. Distance ratios computed for different types of derivatives and for velocity-adapted (solid lines), velocity-steered (point-dashed lines) and non-adapted (dashed lines) filter responses. As can be seen, local velocity adaptation results in lower values of the distance ratio and therefore better recognition performance compared to steered or non-adapted filter responses.
Fig. 12. Evolution of the distance ratio $r$ over spatial scales (a) and temporal scales (b). Minima over scales indicate scale values with the highest discrimination ability.
Fig. 13. Values of distance ratios when averaged over all scales and at the manually selected scales that give best discrimination performance.