
Learning visual codebooks for image classification using spectral clustering

  • Foundations
Soft Computing

Abstract

This study explores learning visual codebooks with spectral clustering, an approach we call spectral visual codebook learning (SVCL). Although spectral clustering has been widely applied to unsupervised segmentation, clustering, and manifold learning, its use for learning codebooks on standard image benchmark datasets has not been thoroughly studied. We show how codebooks learned by SVCL can be used for scene classification, texture recognition, and image categorization. We describe several implementations for constructing the similarity graph and for handling the very large number of local image patches. We show that our approach captures nonlinear manifolds of semantic image patches. A further advantage is that both label and spatial information can be incorporated without increasing model complexity. We validate SVCL on the KTH-TIPS, Scene-15, Graz-02, and Caltech-101 datasets.
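As a rough illustration of the general idea, the sketch below learns a small codebook by clustering patch descriptors in a spectral embedding. It is not the authors' implementation: the fully connected Gaussian similarity graph, the Ng-Jordan-Weiss style normalization, the farthest-point k-means initialization, and the names `spectral_codebook`, `bow_histogram`, and `sigma` are all assumptions made for this example. The dense similarity matrix also ignores the large-scale patch problem mentioned in the abstract.

```python
import numpy as np

def spectral_codebook(descriptors, n_words, sigma=1.0, n_iter=50):
    """Cluster descriptors in a spectral embedding; return (codebook, labels)."""
    X = np.asarray(descriptors, dtype=float)
    # Fully connected similarity graph with a Gaussian kernel (assumed choice).
    sq_norms = (X ** 2).sum(axis=1)
    sq_dists = np.maximum(sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T, 0.0)
    W = np.exp(-sq_dists / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    # Symmetrically normalized affinity D^{-1/2} W D^{-1/2}.
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1) + 1e-12)
    M = W * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    # Embed with the n_words leading eigenvectors (eigh sorts ascending),
    # then scale each row to unit length.
    _, eigvecs = np.linalg.eigh(M)
    U = eigvecs[:, -n_words:]
    U /= np.linalg.norm(U, axis=1, keepdims=True) + 1e-12
    # Deterministic farthest-point initialization for k-means.
    centers = [U[0]]
    for _ in range(1, n_words):
        d = np.min([((U - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(U[np.argmax(d)])
    centers = np.array(centers)
    # Plain Lloyd iterations in the embedded space.
    for _ in range(n_iter):
        dists = ((U[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = np.argmin(dists, axis=1)
        for k in range(n_words):
            if np.any(labels == k):
                centers[k] = U[labels == k].mean(axis=0)
    # One codeword per cluster: the mean member descriptor in the original
    # space. Empty clusters stay as zero vectors (a simplification).
    codebook = np.zeros((n_words, X.shape[1]))
    for k in range(n_words):
        members = X[labels == k]
        if len(members):
            codebook[k] = members.mean(axis=0)
    return codebook, labels

def bow_histogram(descriptors, codebook):
    """Normalized bag-of-words histogram via nearest codeword (Euclidean)."""
    X = np.asarray(descriptors, dtype=float)
    d = ((X[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    counts = np.bincount(np.argmin(d, axis=1), minlength=len(codebook))
    return counts / counts.sum()
```

Assigning new descriptors to the nearest codeword in the original Euclidean space is a shortcut taken here for brevity; it discards the manifold structure that the embedding was meant to capture, so a faithful pipeline would need some out-of-sample extension instead.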

Acknowledgements

This research is supported in part by the Outstanding Young Academic Talents Start-up Funds of Wuhan University No. 216-410100004, the Fundamental Research Funds for the Central Universities of China No. 2042015kf0042, the National Natural Science Foundation of China No. 61502351, and the Natural Science Foundation of Hubei, China No. 2015CFB340.

Author information

Corresponding author

Correspondence to Weiping Zhu.

Ethics declarations

Conflict of interest

Author Yi Hong declares that he has no conflict of interest. Author Weiping Zhu declares that he has no conflict of interest.

Ethical approval

This article does not contain any studies with human participants performed by any of the authors.

Additional information

Communicated by A. Di Nola.

About this article

Cite this article

Hong, Y., Zhu, W. Learning visual codebooks for image classification using spectral clustering. Soft Comput 22, 6077–6086 (2018). https://doi.org/10.1007/s00500-017-2937-4

  • DOI: https://doi.org/10.1007/s00500-017-2937-4
