Abstract
We propose a theoretical construct coined nested pictorial structure to represent an object by parts that are recursively nested. Three innovative ideas are proposed: First, the nested pictorial structure finds a part configuration that is allowed to be deformed in geometric arrangement, while being confined to be topologically nested. Second, we define nested features which lend themselves to better, more detailed accounting of pixel data cost and describe occlusion in a principled way. Third, we develop the concept of constrained distance transform, a variation of the generalized distance transform, to guarantee the topological nesting relations and to further enforce that parts have no overlap with each other. We show that matching an optimal nested pictorial structure of K parts on an image of N pixels takes O(NK) time using dynamic programming and constrained distance transform. In our MATLAB/C++ implementation, it takes less than 0.1 seconds to do the global optimal matching when K = 10 and N = 400 ×400. We demonstrate the usefulness of nested pictorial structures in the matching of objects of nested patterns, objects in occlusion, and objects that live in a context.
Chapter PDF
Similar content being viewed by others
Keywords
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.
References
Fischler, M., Elschlager, R.: The representation and matching of pictorial structures 22, 67–92 (1973)
Felzenszwalb, P., Huttenlocher, D.: Pictorial structures for object recognition. IJCV 61, 55–79 (2005)
Felzenszwalb, P., Huttenlocher, D.: Distance transforms of sampled functions. Technical Report TR2004-1963, Cornell Computing and Information Science (2004)
Yuille, A., Hallinan, P., Cohen, D.: Feature extraction from faces using deformable templates. IJCV 8, 99–111 (1992)
Burl, M.C., Weber, M., Perona, P.: A Probabilistic Approach to Object Recognition Using Local Photometry and Global Geometry. In: Burkhardt, H., Neumann, B. (eds.) ECCV 1998. LNCS, vol. 1407, pp. 628–641. Springer, Heidelberg (1998)
Weber, M., Welling, M., Perona, P.: Towards automatic discovery of object categories. In: CVPR, pp. 2101–2108 (2000)
Coughlan, J., Yuille, A., English, C., Snow, D.: Efficient deformable template detection and localization without user initialization. Computer Vision and Image Understanding 78, 303–319 (2000)
Cootes, T., Edwards, G., Taylor, C.: Active appearance models. IEEE PAMI 23, 681–685 (2001)
Fergus, R., Perona, P., Zisserman, A.: Object class recognition by unsupervised scale-invariant learning. In: IEEE CVPR, pp. 264–271 (2003)
Crandall, D., Felzenszwalb, P., Huttenlocher, D.: Spatial priors for part-based recognition using statistical models. In: IEEE CVPR, pp. 10–17 (2005)
Amit, Y., Trouvé, A.: Pop: Patchwork of parts models for object recognition. IJCV 75, 267–282 (2007)
Leibe, B., Leonardis, A., Schiele, B.: Robust object detection with interleaved categorization and segmentation. IJCV 77, 259–289 (2008)
Felzenszwalb, P., Girshick, R., McAllester, D., Ramanan, D.: Object detection with discriminatively trained part-based models. IEEE PAMI 32, 1627–1645 (2010)
Zhu, L., Chen, Y., Yuille, A., Freeman, W.: Latent hierarchical structural learning for object detection. In: IEEE CVPR, pp. 1062–1069 (2010)
Pandey, M., Lazebnik, S.: Scene recognition and weakly supervised object localization with deformable part-based models. In: ICCV (2011)
Felzenszwalb, P., Huttenlocher, D.: Efficient matching of pictorial structures. In: IEEE CVPR, pp. 2066–2073 (2000)
Dalal, N., Triggs, B.: Histograms of oriented gradients for human detection. In: IEEE CVPR, pp. 886–893 (2005)
Everingham, M., Gool, L.V., Williams, C., Winn, J., Zisserman, A.: The PASCAL Visual Object Classes Challenge 2007 (VOC 2007) Results (2007), http://www.pascal-network.org/challenges/VOC/voc2007/workshop/index.html
Gao, T., Packer, B., Koller, D.: A segmentation-aware object detection model with occlusion handling. In: IEEE CVPR, pp. 1361–1368 (2011)
Sundberg, P., Brox, T., Maire, M., Arbelaez, P., Malik, J.: Occlusion boundary detection and figure/ground assignment from optical flow. In: IEEE CVPR, pp. 2233–2240 (2011)
Humayun, A., Aodha, O., Brostow, G.: Learning to find occlusion regions. In: IEEE CVPR, pp. 2161–2168 (2011)
Woodcock, C., Harward, V.: Nested-hierarchical scene models and image segmentation. Internal Journal of Remote Sensing 13 (1992)
Oliva, A., Torralba, A.: Modeling the shape of the scene: A holistic representation of the spatial envelope. IJCV 42, 145–175 (2001)
Li, L., Socher, R., Fei-Fei, L.: Towards total scene understanding: Classification, annotation and segmentation in an automatic framework. In: IEEE CVPR, pp. 2036–2043 (2009)
Lampert, C., Blaschko, M., Hofmann, T.: Efficient subwindow search: A branch and bound framework for object localization. IEEE PAMI 31, 2129–2142 (2009)
Griffin, G., Holub, A., Perona, P.: Caltech-256 object category dataset. Technical Report 7694, California Institute of Technology (2007)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Gu, S., Zheng, Y., Tomasi, C. (2012). Nested Pictorial Structures. In: Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., Schmid, C. (eds) Computer Vision – ECCV 2012. ECCV 2012. Lecture Notes in Computer Science, vol 7573. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33709-3_58
Download citation
DOI: https://doi.org/10.1007/978-3-642-33709-3_58
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-33708-6
Online ISBN: 978-3-642-33709-3
eBook Packages: Computer ScienceComputer Science (R0)