
Abstract

In this paper we propose an object recognition approach based on shape masks, which are generalizations of segmentation masks. Because shape masks carry information about the extent (outline) of objects, they provide a convenient tool for exploiting object geometry. We apply our ideas to two common object class recognition tasks: classification and localization. For classification, we extend the orderless bag-of-features image representation. In the proposed setup, shape masks can be seen as weak geometrical constraints over bag-of-features; these constraints can be used to reduce background clutter and help recognition. For localization, we propose a new recognition scheme based on high-dimensional hypothesis clustering. Shape masks make it possible to go beyond bounding boxes and determine the outline (approximate segmentation) of the object during localization. Furthermore, the method easily learns and detects possible object viewpoints and articulations, which are often well characterized by the object outline. Our experiments reveal that shape masks can improve the recognition accuracy of state-of-the-art methods while returning richer recognition answers at the same time. We evaluate the proposed approach on the challenging natural-scene Graz-02 object classes dataset.
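
To make the idea of a shape mask acting as a weak geometrical constraint over bag-of-features more concrete, the following is a minimal illustrative sketch and not the authors' implementation. It assumes that each local feature has already been quantized to a visual word and has a pixel position, and that the mask is a 2D array of weights in [0, 1]. The function name `masked_bof_histogram` and all parameter names are hypothetical.

```python
import numpy as np

def masked_bof_histogram(word_ids, positions, mask, vocab_size):
    """Bag-of-features histogram with each feature weighted by the mask.

    Hypothetical helper for illustration only:
    word_ids   -- visual word index of each local feature
    positions  -- (row, col) pixel position of each local feature
    mask       -- 2D array of weights in [0, 1] (1 = object, 0 = background)
    vocab_size -- number of visual words in the vocabulary
    """
    word_ids = np.asarray(word_ids, dtype=int)
    positions = np.asarray(positions, dtype=int).reshape(-1, 2)
    # Look up the mask weight at each feature location.
    weights = mask[positions[:, 0], positions[:, 1]].astype(float)
    # Accumulate weighted votes per visual word.
    hist = np.bincount(word_ids, weights=weights, minlength=vocab_size)
    total = hist.sum()
    # L1-normalize unless every feature fell on the background.
    return hist / total if total > 0 else hist

# Toy example: a 4x4 image whose central 2x2 block is marked as object.
mask = np.zeros((4, 4))
mask[1:3, 1:3] = 1.0
word_ids = [0, 1, 1, 2]
positions = [(0, 0), (1, 1), (2, 2), (3, 3)]
hist = masked_bof_histogram(word_ids, positions, mask, vocab_size=3)
```

In this toy case the two features lying outside the mask contribute nothing, so background clutter is suppressed in the resulting histogram. A soft mask with fractional weights would down-weight features instead of discarding them.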



Author information

Corresponding author

Correspondence to Marcin Marszałek.

About this article

Cite this article

Marszałek, M., Schmid, C. Accurate Object Recognition with Shape Masks. Int J Comput Vis 97, 191–209 (2012). https://doi.org/10.1007/s11263-011-0479-2

  • DOI: https://doi.org/10.1007/s11263-011-0479-2