Content-Based Image Retrieval and the Semantic Gap in the Deep Learning Era

  • Conference paper
  • In: Pattern Recognition. ICPR International Workshops and Challenges (ICPR 2021)

Abstract

Content-based image retrieval has seen astonishing progress over the past decade, especially for the task of retrieving images of the same object that is depicted in the query image. This scenario is called instance or object retrieval and requires matching fine-grained visual patterns between images. Semantics, however, do not play a crucial role. This raises the question: Do the recent advances in instance retrieval transfer to more generic image retrieval scenarios?

To answer this question, we first provide a brief overview of the most relevant milestones of instance retrieval. We then apply them to a semantic image retrieval task and find that they perform worse than much less sophisticated and more generic methods in a setting that requires image understanding. Following this, we review existing approaches to closing this so-called semantic gap by integrating prior world knowledge. We conclude that the key obstacle to further progress in semantic image retrieval is the lack of a standardized task definition and an appropriate benchmark dataset.
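
To make the comparison above concrete, the following is a minimal sketch, assuming a recent PyTorch/torchvision setup, of the kind of generic baseline the abstract alludes to: activations of an off-the-shelf CNN used as a global image descriptor, with database images ranked by cosine similarity to the query. The backbone choice (ResNet-50), the preprocessing, and the helper names embed and rank are illustrative assumptions, not the paper's exact experimental setup.

```python
# Hypothetical generic retrieval baseline: off-the-shelf CNN activations as
# global descriptors, ranked by cosine similarity (not the authors' exact setup).
import torch
import torchvision.transforms as T
from torchvision import models
from PIL import Image

# Standard ImageNet preprocessing (assumption, not taken from the paper).
preprocess = T.Compose([
    T.Resize(256), T.CenterCrop(224), T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

backbone = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
backbone.fc = torch.nn.Identity()  # drop the classifier, keep the pooled features
backbone.eval()

@torch.no_grad()
def embed(path: str) -> torch.Tensor:
    """Map an image file to an L2-normalised global descriptor."""
    x = preprocess(Image.open(path).convert("RGB")).unsqueeze(0)
    f = backbone(x).squeeze(0)
    return f / f.norm()

def rank(query_path: str, database_paths: list[str]) -> list[tuple[str, float]]:
    """Rank database images by cosine similarity to the query descriptor."""
    q = embed(query_path)
    scored = [(p, float(q @ embed(p))) for p in database_paths]
    return sorted(scored, key=lambda s: s[1], reverse=True)
```

Approaches that target the semantic gap typically replace or augment such purely visual descriptors with embeddings that incorporate prior knowledge, for example class taxonomies, while the ranking step stays the same.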

Author information

Corresponding author

Correspondence to Björn Barz.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Barz, B., Denzler, J. (2021). Content-Based Image Retrieval and the Semantic Gap in the Deep Learning Era. In: Del Bimbo, A., et al. (eds.) Pattern Recognition. ICPR International Workshops and Challenges. ICPR 2021. Lecture Notes in Computer Science, vol. 12662. Springer, Cham. https://doi.org/10.1007/978-3-030-68790-8_20

  • DOI: https://doi.org/10.1007/978-3-030-68790-8_20

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-68789-2

  • Online ISBN: 978-3-030-68790-8

  • eBook Packages: Computer Science, Computer Science (R0)
