Content-Based Image Retrieval and the Semantic Gap in the Deep Learning Era

Barz, Björn; Denzler, Joachim

doi:10.1007/978-3-030-68790-8_20

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 12662)

Included in the following conference series:

International Conference on Pattern Recognition

Abstract

Content-based image retrieval has seen astonishing progress over the past decade, especially for the task of retrieving images of the same object that is depicted in the query image. This scenario is called instance or object retrieval and requires matching fine-grained visual patterns between images. Semantics, however, do not play a crucial role. This gives rise to the question: Do the recent advances in instance retrieval transfer to more generic image retrieval scenarios?

To answer this question, we first provide a brief overview of the most relevant milestones of instance retrieval. We then apply them to a semantic image retrieval task and find that they perform worse than much less sophisticated and more generic methods in a setting that requires image understanding. Following this, we review existing approaches to closing this so-called semantic gap by integrating prior world knowledge. We conclude that the key problem for the further advancement of semantic image retrieval lies in the lack of a standardized task definition and an appropriate benchmark dataset.

Author information

Authors and Affiliations

Computer Vision Group, Friedrich Schiller University Jena, Jena, Germany
Björn Barz & Joachim Denzler

Corresponding author

Correspondence to Björn Barz.

Editor information

Editors and Affiliations

Dipartimento di Ingegneria dell’Informazione, University of Firenze, Firenze, Italy
Alberto Del Bimbo
Dipartimento di Ingegneria “Enzo Ferrari”, Università di Modena e Reggio Emilia, Modena, Italy
Rita Cucchiara
Department of Computer Science, Boston University, Boston, MA, USA
Stan Sclaroff
Dipartimento di Matematica e Informatica, University of Catania, Catania, Italy
Giovanni Maria Farinella
Cloud & AI, JD.COM, Beijing, China
Tao Mei
Dipartimento di Ingegneria dell’Informazione, Università di Firenze, Firenze, Italy
Marco Bertini
Computational Sciences Department, National Institute of Astrophysics, Optics and Electronics (INAOE), Tonantzintla, Puebla, Mexico
Hugo Jair Escalante
Dipartimento di Ingegneria “Enzo Ferrari”, Università di Modena e Reggio Emilia, Modena, Italy
Roberto Vezzani

About this paper

Cite this paper

Barz, B., Denzler, J. (2021). Content-Based Image Retrieval and the Semantic Gap in the Deep Learning Era. In: Del Bimbo, A., et al. Pattern Recognition. ICPR International Workshops and Challenges. ICPR 2021. Lecture Notes in Computer Science, vol 12662. Springer, Cham. https://doi.org/10.1007/978-3-030-68790-8_20

DOI: https://doi.org/10.1007/978-3-030-68790-8_20
Published: 23 February 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-68789-2
Online ISBN: 978-3-030-68790-8
eBook Packages: Computer Science, Computer Science (R0)