DOI: 10.1145/2671188.2749290
research-article

Describing Images with Hierarchical Concepts and Object Class Localization

Published: 22 June 2015

ABSTRACT

Current research into the automatic generation of semantic descriptions centers mainly on improving annotation accuracy for individual tags or attributes. In this paper, we focus on generating more informative descriptions for images. We propose to generate layered, semantically meaningful descriptions and to create summaries of key aspects of the data from the component detectors. In particular, the output descriptions include the superclass, the class, the attributes, and the location of the object region that may interest users. We propose to integrate ROI (Region of Interest) identification and hierarchical semantic element detection into a joint framework. Jointly optimizing the ROI localizer and the hierarchical concept detectors makes the two mutually beneficial. In this way, we create a discriminative image description generation framework based on a tightly coupled multi-layer optimization. The output descriptions carry richer information about the image content, with layered contextual information, thereby enabling better management and use of image data. Experiments on two public benchmark datasets demonstrate that the proposed method obtains state-of-the-art performance.
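The layered output that the abstract lists (superclass, class, attributes, and object location) can be pictured as a small record. The following is only an illustrative sketch, not the authors' implementation: the `ImageDescription` name, its field names, the `(x, y, width, height)` box convention, and the example values are all assumptions introduced here.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class ImageDescription:
    """Hypothetical container for one layered description (not from the paper)."""

    superclass: str                                   # coarse category, e.g. "animal"
    object_class: str                                 # fine category, e.g. "dog"
    attributes: List[str] = field(default_factory=list)
    roi: Tuple[int, int, int, int] = (0, 0, 0, 0)     # assumed (x, y, width, height)

    def summarize(self) -> str:
        """Render the four layers as one short sentence."""
        attrs = ", ".join(self.attributes) if self.attributes else "no listed attributes"
        x, y, w, h = self.roi
        return (
            f"{self.object_class} ({self.superclass}) with {attrs}, "
            f"located at x={x}, y={y}, w={w}, h={h}"
        )


# Made-up example values, for illustration only.
example = ImageDescription("animal", "dog", ["furry", "four-legged"], (40, 30, 120, 90))
print(example.summarize())
```

Keeping the layers in one record reflects the abstract's point that the description is produced as a whole rather than as independent tags; how the framework actually couples the layers during optimization is not specified in this abstract.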


      • Published in

        ICMR '15: Proceedings of the 5th ACM on International Conference on Multimedia Retrieval
        June 2015
        700 pages
        ISBN: 9781450332743
        DOI: 10.1145/2671188

        Copyright © 2015 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States


        Acceptance Rates

        ICMR '15 Paper Acceptance Rate: 48 of 127 submissions, 38%
        Overall Acceptance Rate: 254 of 830 submissions, 31%
