Putting Objects in Perspective

International Journal of Computer Vision

Abstract

Image understanding requires not only individually estimating elements of the visual world but also capturing the interplay among them. In this paper, we provide a framework for placing local object detection in the context of the overall 3D scene by modeling the interdependence of objects, surface orientations, and camera viewpoint. Most object detection methods consider all scales and locations in the image as equally likely. We show that with probabilistic estimates of 3D geometry, both in terms of surfaces and world coordinates, we can put objects into perspective and model the scale and location variance in the image. Our approach reflects the cyclical nature of the problem by allowing probabilistic object hypotheses to refine geometry and vice versa. Our framework allows painless substitution of almost any object detector and is easily extended to include other aspects of image understanding. Our results confirm the benefits of our integrated approach.
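To make concrete why viewpoint constrains object scale and location, the following minimal sketch uses standard pinhole geometry. It is an illustration under stated assumptions, not the model described in the paper: the camera's optical axis is taken to be parallel to a flat ground plane, objects rest on that plane, and image rows are measured upward from the bottom of the image. The function and parameter names are hypothetical.

```python
def expected_image_height(world_height, camera_height, horizon_v, bottom_v):
    """Image height of an object of known world height standing on the ground.

    Assumes a pinhole camera with zero tilt above a flat ground plane.
    horizon_v and bottom_v are image rows measured upward from the image
    bottom, so an object on the ground has bottom_v < horizon_v.
    """
    gap = horizon_v - bottom_v
    if gap <= 0:
        raise ValueError("object base must lie below the horizon")
    # Similar triangles: image_height / gap == world_height / camera_height.
    return world_height * gap / camera_height


def world_height_from_box(image_height, camera_height, horizon_v, bottom_v):
    """Inverse relation: recover world height from a detection's box height."""
    gap = horizon_v - bottom_v
    if gap <= 0:
        raise ValueError("object base must lie below the horizon")
    return camera_height * image_height / gap
```

Under these assumptions, a candidate detection whose box height is far from the value predicted for its image position is geometrically implausible for its object class, which is the intuition behind not treating all scales and locations as equally likely.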


Corresponding author

Correspondence to Derek Hoiem.

About this article

Cite this article

Hoiem, D., Efros, A.A. & Hebert, M. Putting Objects in Perspective. Int J Comput Vis 80, 3–15 (2008). https://doi.org/10.1007/s11263-008-0137-5

  • DOI: https://doi.org/10.1007/s11263-008-0137-5
