
A multimedia information fusion framework for web image categorization

Multimedia Tools and Applications

Abstract

With the rapid development of fast Internet access and the popularization of digital cameras, an enormous number of digital images are posted and shared online every day. Web images are usually organized by topic and are often assigned topic-related textual descriptions. Given a large set of images along with the corresponding texts, a challenging problem is how to use the available information to perform image retrieval tasks, such as image classification and image clustering, efficiently and effectively. Previous approaches to image categorization either adopt text or image features alone or simply combine the two types of information. In this paper, we improve our two previously reported multi-view classification approaches, Dynamic Weighting and Region-based Semantic Concept Integration, which categorize images under the “supervision” of topic-related textual descriptions. We do so by proposing a novel multimedia information fusion framework in which these two methods are seamlessly integrated by analyzing the special characteristics of different images. Note that the proposed framework is generic: it is not limited to our two previously reported approaches and can also be used to integrate other existing multi-view classification methods or models. The framework is also capable of handling large-scale image categorization. Specifically, it can automatically choose an appropriate classification model for each test image according to the image's special characteristics, and consequently achieves better classification performance with relatively less computation time on large-scale datasets. Moreover, it is able to categorize images that have no textual description, as occurs in real-world applications. Empirical experiments on two different types of web image datasets demonstrate the efficacy and efficiency of the proposed classification framework.
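
The per-image model selection described in the abstract can be pictured with a small sketch. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not say which image characteristics drive the choice, so the `toy_selector` rule (routing on how many words of description are available), the `WebImage` type, the model keys, and the stub classifiers are all hypothetical names introduced here.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Sequence


@dataclass
class WebImage:
    visual: Sequence[float]      # visual feature vector (placeholder values)
    text: Optional[str] = None   # topic-related description; None if absent


Classifier = Callable[[WebImage], str]


def toy_selector(image: WebImage, min_words: int = 3) -> str:
    """Hypothetical rule, not from the paper: route on available text."""
    words = (image.text or "").split()
    if not words:
        return "visual_only"          # no description at test time
    if len(words) < min_words:
        return "dynamic_weighting"    # sparse description
    return "region_semantic"          # richer description


def categorize(image: WebImage,
               models: Dict[str, Classifier],
               selector: Callable[[WebImage], str] = toy_selector) -> str:
    """Pick one model per image, then let only that model classify it."""
    return models[selector(image)](image)


# Stub classifiers standing in for trained models (assumed, for illustration).
models: Dict[str, Classifier] = {
    "visual_only": lambda img: "label-from-visual",
    "dynamic_weighting": lambda img: "label-from-dw",
    "region_semantic": lambda img: "label-from-rsci",
}

no_text = WebImage(visual=[0.1, 0.9])
rich_text = WebImage(visual=[0.4, 0.2], text="sunset over the harbor at dusk")
print(categorize(no_text, models))
print(categorize(rich_text, models))
```

Because only the selected model runs for each image, the cost per test image is that of one classifier plus the selector, which is one plausible reading of the abstract's claim of reduced computation time. Passing the selector as a parameter reflects the abstract's statement that the framework is generic and can integrate other multi-view models.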





Acknowledgements

This work is partially supported by the Army Research Office under grant number W911NF-10-1-0366, the National Natural Science Foundation of China under grant number 61175011, and the China Scholarship Council.

Author information


Corresponding author

Correspondence to Tao Li.


About this article

Cite this article

Lu, W., Li, L., Li, J. et al. A multimedia information fusion framework for web image categorization. Multimed Tools Appl 70, 1453–1486 (2014). https://doi.org/10.1007/s11042-012-1165-2


  • DOI: https://doi.org/10.1007/s11042-012-1165-2
