research-article

Descriptive visual words and visual phrases for image applications

Published: 19 October 2009

ABSTRACT

The Bag-of-visual Words (BoW) image representation has been applied to various problems in multimedia and computer vision. The basic idea is to represent images as visual documents composed of repeatable and distinctive visual elements, which are comparable to words in text. However, extensive experiments show that the commonly used visual words are not as expressive as text words, which hinders their effectiveness in various applications. In this paper, Descriptive Visual Words (DVWs) and Descriptive Visual Phrases (DVPs) are proposed as the visual counterparts of text words and phrases, where visual phrases refer to frequently co-occurring visual word pairs. Since images are the carriers of visual objects and scenes, a novel descriptive visual element set can be composed of the visual words and their combinations that are effective in representing certain visual objects or scenes. Based on this idea, a general framework is proposed for generating DVWs and DVPs from classic visual words for various applications. In a large-scale image database containing 1506 object and scene categories, the visual words and visual word pairs that are descriptive of certain scenes or objects are identified as the DVWs and DVPs. Experiments show that the DVWs and DVPs are compact and descriptive, and thus more comparable to text words than the classic visual words are. We apply the identified DVWs and DVPs in several applications, including image retrieval, image re-ranking, and object recognition. The DVW and DVP combination outperforms the classic visual words by 19.5% and 80% in image retrieval and object recognition tasks, respectively. The DVW- and DVP-based image re-ranking algorithm, DWPRank, outperforms the state-of-the-art VisualRank by 12.4% in accuracy and is about 11 times faster.
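As a rough illustration of the two building blocks named in the abstract, the sketch below builds a BoW histogram for each image and counts visual word pairs that co-occur in several images. It is a minimal sketch under stated assumptions, not the paper's framework: each image is assumed to be already quantized into a list of integer visual word IDs, co-occurrence is counted at the whole-image level, and the names `bow_histogram`, `frequent_word_pairs`, and `min_support` are hypothetical. The abstract does not specify how DVWs and DVPs are actually selected, so no selection step is shown.

```python
from collections import Counter
from itertools import combinations


def bow_histogram(word_ids, vocab_size):
    """Count how often each visual word ID occurs in one image."""
    hist = [0] * vocab_size
    for w in word_ids:
        hist[w] += 1
    return hist


def frequent_word_pairs(images, min_support):
    """Return visual word pairs that co-occur in at least `min_support` images.

    Assumption: co-occurrence means "both words appear somewhere in the
    same image"; this is an illustrative simplification only.
    """
    pair_counts = Counter()
    for word_ids in images:
        unique = sorted(set(word_ids))  # count each pair once per image
        pair_counts.update(combinations(unique, 2))
    return {pair: n for pair, n in pair_counts.items() if n >= min_support}


# Toy data: three images over a hypothetical 4-word vocabulary.
images = [[0, 1, 1, 3], [0, 1, 2], [1, 3, 3]]
histograms = [bow_histogram(img, vocab_size=4) for img in images]
pairs = frequent_word_pairs(images, min_support=2)
print(histograms)
print(pairs)
```

Raising `min_support` keeps only pairs seen in more images, which trades coverage for compactness of the candidate phrase set.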


    • Published in

      MM '09: Proceedings of the 17th ACM international conference on Multimedia
      October 2009
      1202 pages
      ISBN: 9781605586083
      DOI: 10.1145/1631272

      Copyright © 2009 ACM


      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Acceptance Rates

      Overall Acceptance Rate: 995 of 4,171 submissions, 24%
