Understanding bag-of-words model: a statistical framework

Zhang, Yin; Jin, Rong; Zhou, Zhi-Hua

doi:10.1007/s13042-010-0001-0

Original Article
Published: 28 August 2010

Volume 1, pages 43–52, (2010)

International Journal of Machine Learning and Cybernetics

Yin Zhang¹, Rong Jin² & Zhi-Hua Zhou¹

Abstract

The bag-of-words model is one of the most popular representation methods for object categorization. The key idea is to quantize each extracted key point into one of a set of visual words, and then represent each image by a histogram of the visual words. For this purpose, a clustering algorithm (e.g., K-means) is generally used to generate the visual words. Although a number of studies have shown encouraging results for the bag-of-words representation in object categorization, theoretical study of the properties of the bag-of-words model remains almost untouched, possibly due to the difficulty introduced by the heuristic clustering process. In this paper, we present a statistical framework that generalizes the bag-of-words representation. In this framework, the visual words are generated by a statistical process rather than by a clustering algorithm, while the empirical performance is competitive with that of clustering-based methods. A theoretical analysis based on statistical consistency is presented for the proposed framework. Moreover, based on the framework, we develop two algorithms that do not rely on clustering yet achieve competitive performance in object categorization when compared with clustering-based bag-of-words representations.
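
The conventional clustering-based pipeline that the abstract describes can be illustrated with a minimal Python sketch. It is not the statistical framework or either of the two algorithms proposed in the paper, whose details are not given here. It assumes key-point descriptors are plain lists of floats, and the function names (kmeans_codebook, bow_histogram, nearest_word) and the toy K-means loop are hypothetical choices made for illustration only.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two descriptors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_word(descriptor, codebook):
    """Index of the visual word (cluster center) closest to the descriptor."""
    return min(range(len(codebook)),
               key=lambda k: squared_distance(descriptor, codebook[k]))


def kmeans_codebook(descriptors, num_words, iterations=20, seed=0):
    """Build a visual vocabulary with a plain Lloyd-style K-means loop.

    Illustrative only: initial centers are sampled from the data, and a
    center is left unchanged if no descriptor is assigned to it.
    """
    rng = random.Random(seed)
    codebook = [list(d) for d in rng.sample(descriptors, num_words)]
    for _ in range(iterations):
        clusters = [[] for _ in range(num_words)]
        for d in descriptors:
            clusters[nearest_word(d, codebook)].append(d)
        for k, members in enumerate(clusters):
            if members:
                codebook[k] = [sum(col) / len(members) for col in zip(*members)]
    return codebook


def bow_histogram(image_descriptors, codebook):
    """Quantize each key point to its nearest visual word and return the
    normalized histogram of word counts for one image."""
    counts = [0] * len(codebook)
    for d in image_descriptors:
        counts[nearest_word(d, codebook)] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else counts
```

In this sketch, kmeans_codebook would be run once on descriptors pooled from the training images, and bow_histogram would then be called per image to obtain the fixed-length vector passed to a classifier. The heuristic clustering step inside kmeans_codebook is the part the abstract identifies as hard to analyze theoretically.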

Acknowledgments

We want to thank the reviewers for helpful comments and suggestions. This research is partially supported by the National Fundamental Research Program of China (2010CB327903), the Jiangsu 333 High-Level Talent Cultivation Program and the National Science Foundation (IIS-0643494). Any opinions, findings and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the funding agencies.

Author information

Authors and Affiliations

National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, 210093, China
Yin Zhang & Zhi-Hua Zhou
Department of Computer Science & Engineering, Michigan State University, East Lansing, MI, 48824, USA
Rong Jin

Corresponding author

Correspondence to Zhi-Hua Zhou.

About this article

Cite this article

Zhang, Y., Jin, R. & Zhou, ZH. Understanding bag-of-words model: a statistical framework. Int. J. Mach. Learn. & Cyber. 1, 43–52 (2010). https://doi.org/10.1007/s13042-010-0001-0

Received: 27 February 2010
Accepted: 02 July 2010
Published: 28 August 2010
Issue Date: December 2010
DOI: https://doi.org/10.1007/s13042-010-0001-0
