Abstract
Over the years, several spatio-temporal interest point detectors have been proposed. While some detectors can only extract a sparse set of scale-invariant features, others allow for the detection of a larger number of features at user-defined scales. This paper presents, for the first time, spatio-temporal interest points that are at the same time scale-invariant (both spatially and temporally) and densely cover the video content. Moreover, as opposed to earlier work, the features can be computed efficiently. Applying scale-space theory, we show that this can be achieved by using the determinant of the Hessian as the saliency measure. Computations are sped up further through the use of approximate box-filter operations on an integral video structure. A quantitative evaluation and experimental results on action recognition show the strengths of the proposed detector in terms of repeatability, accuracy and speed, in comparison with previously proposed detectors.
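The abstract combines two ingredients: the determinant of the 3D Hessian (over x, y and t) as the saliency measure, and box filters evaluated on an integral video. The Python sketch below illustrates how these fit together. It is not the authors' implementation. The filter layout (three +1/-2/+1 lobes for the pure second derivatives, four signed quadrants for the mixed ones), the lobe parameters `h` and `r`, and all function names are hypothetical choices made for illustration. The sketch also omits the scale normalization, the separate spatial and temporal scales, and the non-maximum suppression that a full detector would need. It does show the property that makes the approach fast: once the integral video is built, every box sum costs eight lookups regardless of the box size.

```python
from typing import List, Sequence

Volume = List[List[List[float]]]  # indexed as v[t][y][x]


def integral_video(v: Volume) -> Volume:
    """Zero-padded integral video: S[t][y][x] = sum of v over [0,t) x [0,y) x [0,x)."""
    T, H, W = len(v), len(v[0]), len(v[0][0])
    S = [[[0.0] * (W + 1) for _ in range(H + 1)] for _ in range(T + 1)]
    for t in range(1, T + 1):
        for y in range(1, H + 1):
            for x in range(1, W + 1):
                # 3D inclusion-exclusion over already-filled neighbours.
                S[t][y][x] = (v[t - 1][y - 1][x - 1]
                              + S[t - 1][y][x] + S[t][y - 1][x] + S[t][y][x - 1]
                              - S[t - 1][y - 1][x] - S[t - 1][y][x - 1] - S[t][y - 1][x - 1]
                              + S[t - 1][y - 1][x - 1])
    return S


def box_sum(S: Volume, t0: int, t1: int, y0: int, y1: int, x0: int, x1: int) -> float:
    """Sum of v over the half-open box [t0,t1) x [y0,y1) x [x0,x1) in 8 lookups."""
    return (S[t1][y1][x1] - S[t0][y1][x1] - S[t1][y0][x1] - S[t1][y1][x0]
            + S[t0][y0][x1] + S[t0][y1][x0] + S[t1][y0][x0] - S[t0][y0][x0])


def _box(S: Volume, c: Sequence[int], lo: Sequence[int], hi: Sequence[int]) -> float:
    """Box given by inclusive per-axis offsets lo..hi around centre c = (t, y, x)."""
    t, y, x = c
    return box_sum(S, t + lo[0], t + hi[0] + 1, y + lo[1], y + hi[1] + 1,
                   x + lo[2], x + hi[2] + 1)


def second_deriv(S: Volume, c: Sequence[int], axis: int, h: int, r: int) -> float:
    """Hypothetical box approximation of d2/d(axis)2: lobes weighted +1, -2, +1,
    each 2h+1 voxels long along `axis` and 2r+1 wide along the other two axes."""
    l = 2 * h + 1

    def lobe(a0: int, a1: int) -> float:
        lo, hi = [-r] * 3, [r] * 3
        lo[axis], hi[axis] = a0, a1
        return _box(S, c, lo, hi)

    return lobe(-h - l, -h - 1) - 2.0 * lobe(-h, h) + lobe(h + 1, h + l)


def mixed_deriv(S: Volume, c: Sequence[int], a: int, b: int, l: int, r: int) -> float:
    """Hypothetical box approximation of d2/(da db): four l x l quadrants
    (centre row/column excluded) with signs + + - -, extent 2r+1 on the third axis."""
    def quad(sa: int, sb: int) -> float:
        lo, hi = [-r] * 3, [r] * 3
        lo[a], hi[a] = (1, l) if sa > 0 else (-l, -1)
        lo[b], hi[b] = (1, l) if sb > 0 else (-l, -1)
        return _box(S, c, lo, hi)

    return quad(1, 1) + quad(-1, -1) - quad(1, -1) - quad(-1, 1)


def hessian_det(S: Volume, c: Sequence[int], h: int = 1, r: int = 1) -> float:
    """Determinant of the box-approximated 3x3 Hessian at c (axes: 0=t, 1=y, 2=x).
    The caller must keep every lobe inside the volume; no scale normalization."""
    l = 2 * h + 1
    Dtt, Dyy, Dxx = (second_deriv(S, c, k, h, r) for k in range(3))
    Dty = mixed_deriv(S, c, 0, 1, l, r)
    Dtx = mixed_deriv(S, c, 0, 2, l, r)
    Dyx = mixed_deriv(S, c, 1, 2, l, r)
    # Cofactor expansion of [[Dxx, Dyx, Dtx], [Dyx, Dyy, Dty], [Dtx, Dty, Dtt]].
    return (Dxx * (Dyy * Dtt - Dty * Dty)
            - Dyx * (Dyx * Dtt - Dty * Dtx)
            + Dtx * (Dyx * Dty - Dyy * Dtx))


if __name__ == "__main__":
    vol = [[[float(x * x + y * y + t * t) for x in range(5)] for y in range(5)]
           for t in range(5)]
    print(hessian_det(integral_video(vol), (2, 2, 2), h=0, r=1))  # prints 5832.0
```

For f = x² + y² + t² with `h=0` and `r=1`, each pure second-derivative response is 9 · 2 = 18 (a 3×3 slab times the second difference 2) and the mixed responses cancel, so the determinant is 18³ = 5832. Growing `h` and `r` enlarges the filters without changing the per-voxel cost, which is what lets a detector of this kind evaluate many scales cheaply.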
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Willems, G., Tuytelaars, T., Van Gool, L. (2008). An Efficient Dense and Scale-Invariant Spatio-Temporal Interest Point Detector. In: Forsyth, D., Torr, P., Zisserman, A. (eds) Computer Vision – ECCV 2008. ECCV 2008. Lecture Notes in Computer Science, vol 5303. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-88688-4_48
DOI: https://doi.org/10.1007/978-3-540-88688-4_48
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-88685-3
Online ISBN: 978-3-540-88688-4
eBook Packages: Computer Science, Computer Science (R0)