
Abstract

This paper shows (i) improvements over state-of-the-art local feature recognition systems, (ii) how to formulate principled models for automatic local feature selection in object class recognition when little supervised data is available, and (iii) how to formulate sensible spatial image context models, using a conditional random field that integrates local features with segmentation cues (superpixels). By adopting sparse kernel methods, Bayesian learning techniques, and data association with constraints, the proposed model identifies the most relevant sets of local features for recognizing object classes, achieves performance comparable to the fully supervised setting, and obtains excellent results for image classification.
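The third contribution couples local-feature evidence with segmentation cues in a conditional random field. As a rough illustration only, the sketch below decodes a two-label pairwise CRF over superpixels with iterated conditional modes (ICM). The per-superpixel "object" scores, the Potts smoothing weight `beta`, and the function names `unary_energy`, `total_energy` and `icm` are hypothetical assumptions; ICM is a simple greedy stand-in and is not presented as the inference procedure used in the paper.

```python
"""Toy pairwise CRF over superpixels, decoded with iterated conditional modes.

Hypothetical assumptions (not taken from the paper): each superpixel carries a
score in [0, 1] for the label "object" (e.g. pooled local-feature classifier
outputs), and adjacent superpixels share a Potts penalty of weight `beta`.
"""
import math
from typing import Dict, List, Sequence, Tuple

Edge = Tuple[int, int]


def unary_energy(p_object: float, label: int, eps: float = 1e-9) -> float:
    # Negative log-probability of `label` under the local-feature score.
    p = p_object if label == 1 else 1.0 - p_object
    return -math.log(max(p, eps))


def total_energy(labels: Sequence[int], scores: Sequence[float],
                 edges: Sequence[Edge], beta: float) -> float:
    # Unary terms plus a Potts cost for every edge whose endpoints disagree.
    e = sum(unary_energy(s, y) for s, y in zip(scores, labels))
    e += beta * sum(1 for i, j in edges if labels[i] != labels[j])
    return e


def icm(scores: Sequence[float], edges: Sequence[Edge],
        beta: float = 1.0, max_sweeps: int = 20) -> List[int]:
    n = len(scores)
    nbrs: Dict[int, List[int]] = {i: [] for i in range(n)}
    for i, j in edges:
        nbrs[i].append(j)
        nbrs[j].append(i)
    # Start from independent per-superpixel decisions (unary terms only).
    labels = [1 if s >= 0.5 else 0 for s in scores]
    for _ in range(max_sweeps):
        changed = False
        for i in range(n):
            # Local energy of each label given the current neighbour labels.
            costs = {
                y: unary_energy(scores[i], y)
                + beta * sum(1 for j in nbrs[i] if labels[j] != y)
                for y in (0, 1)
            }
            # Pick the cheaper label; on a tie, keep the current one.
            best = min((0, 1), key=lambda y: (costs[y], y != labels[i]))
            if best != labels[i]:
                labels[i] = best
                changed = True
        if not changed:  # local minimum of the energy reached
            break
    return labels


if __name__ == "__main__":
    scores = [0.9, 0.8, 0.3, 0.85, 0.9]          # a chain of five superpixels
    edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
    print(icm(scores, edges, beta=0.0))           # no context: [1, 1, 0, 1, 1]
    print(icm(scores, edges, beta=1.0))           # with context: [1, 1, 1, 1, 1]
```

With `beta=0` each superpixel is labelled from its own score; with `beta=1` the weakly scored middle superpixel is pulled towards its neighbours' label. ICM only guarantees a local minimum of the energy, which is acceptable for a sketch of this size.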

Author information

Correspondence to Peter Carbonetto.

About this article

Cite this article

Carbonetto, P., Dorkó, G., Schmid, C. et al. Learning to Recognize Objects with Little Supervision. Int J Comput Vis 77, 219–237 (2008). https://doi.org/10.1007/s11263-007-0067-7

  • DOI: https://doi.org/10.1007/s11263-007-0067-7
