Abstract
While image alignment has been studied in different areas of computer vision for decades, aligning images depicting different scenes remains a challenging problem. Analogous to optical flow where an image is aligned to its temporally adjacent frame, we propose scale-invariant feature transform (SIFT) flow, a method to align an image to its nearest neighbors in a large image corpus containing a variety of scenes. The SIFT flow algorithm consists of matching densely sampled, pixel-wise SIFT features between two images while preserving spatial discontinuities. The SIFT features allow robust matching across different scene/object appearances, whereas the discontinuity-preserving spatial model allows matching of objects located at different parts of the scene. Experiments show that the proposed approach robustly aligns complex scene pairs containing significant spatial differences. Based on SIFT flow, we propose an alignment-based large database framework for image analysis and synthesis, where image information is transferred from the nearest neighbors to a query image according to the dense scene correspondence. This framework is demonstrated through concrete applications, such as motion field prediction from a single image, motion synthesis via object transfer, satellite image registration, and face recognition.
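The matching objective described above can be illustrated with a minimal NumPy sketch. It assumes an energy with three parts: a truncated L1 data term on per-pixel descriptors, a small-displacement penalty, and a truncated L1 smoothness term over 4-connected neighbors. The truncation is what lets neighboring pixels take very different displacements at a bounded cost, which is one common way to preserve spatial discontinuities. The function name `flow_energy`, the parameter names, and the default values are hypothetical choices for illustration, not the chapter's implementation, and the sketch only evaluates the energy of a given flow field rather than optimizing it.

```python
import numpy as np

def flow_energy(s1, s2, u, v, t=1.0, eta=0.005, alpha=2.0, d=40.0):
    """Evaluate a SIFT-flow-style energy for an integer flow field (u, v).

    s1, s2 : (H, W, C) dense per-pixel descriptors (e.g. SIFT) of two images.
    u, v   : (H, W) integer horizontal / vertical displacements.
    t, eta, alpha, d : illustrative placeholder parameters (assumptions).
    """
    h, w, _ = s1.shape
    ys, xs = np.mgrid[0:h, 0:w]
    # Target coordinates, clamped to the image (a simplification for this sketch).
    ty = np.clip(ys + v, 0, h - 1)
    tx = np.clip(xs + u, 0, w - 1)
    # Data term: L1 descriptor distance, truncated at t for robustness.
    data = np.minimum(np.abs(s1 - s2[ty, tx]).sum(axis=2), t).sum()
    # Small-displacement term: prefer short flow vectors when nothing else decides.
    small = eta * (np.abs(u) + np.abs(v)).sum()
    # Smoothness term: truncated L1 between 4-connected neighbors, so a large
    # jump in displacement costs at most d (discontinuity preserving).
    smooth = 0.0
    for f in (u, v):
        smooth += np.minimum(alpha * np.abs(np.diff(f, axis=0)), d).sum()
        smooth += np.minimum(alpha * np.abs(np.diff(f, axis=1)), d).sum()
    return float(data + small + smooth)
```

In practice an objective of this kind is minimized over the flow field with a discrete optimizer such as belief propagation or graph cuts; this sketch leaves that step out.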
© 2016 Springer International Publishing Switzerland
About this chapter
Cite this chapter
Liu, C., Yuen, J., Torralba, A. (2016). SIFT Flow: Dense Correspondence Across Scenes and Its Applications. In: Hassner, T., Liu, C. (eds) Dense Image Correspondences for Computer Vision. Springer, Cham. https://doi.org/10.1007/978-3-319-23048-1_2
DOI: https://doi.org/10.1007/978-3-319-23048-1_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-23047-4
Online ISBN: 978-3-319-23048-1
eBook Packages: Engineering, Engineering (R0)