Contribution of Low, Mid and High-Level Image Features of Indoor Scenes in Predicting Human Similarity Judgements

Mikhailova, Anastasiia; Santos-Victor, José; Coco, Moreno I.

doi:10.1007/978-3-031-04881-4_40

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13256)

Included in the following conference series:

Iberian Conference on Pattern Recognition and Image Analysis

Abstract

Human judgements can still be considered the gold standard in the assessment of image similarity, but they are expensive and time-consuming to acquire. Although most existing computational models rely almost exclusively on low-level information to evaluate the similarity between images, human similarity judgements are known to draw on both high-level semantic and low-level visual image information. The current study evaluates the impact of different types of image features on predicting human similarity judgements. We investigated how low-level (colour differences), mid-level (spatial envelope) and high-level (distributional semantics) information predicts within-category human judgements of 400 indoor scenes across 4 categories in a Four-Alternative Forced Choice task, in which participants had to select the most distinctive scene among four scenes presented on the screen. Linear regression analysis showed that low-level (t = 4.14, p < 0.001), mid-level (t = 3.22, p < 0.01) and high-level (t = 2.07, p < 0.04) scene information significantly predicted the probability of a scene being selected. Additionally, an SVM model that incorporates low-, mid- and high-level properties reached 56% accuracy in predicting human similarity judgements. Our results point to: 1) the importance of including mid- and high-level image properties in computational models of similarity to better characterise the cognitive mechanisms underlying human judgements, and 2) the need for further research into how human similarity judgements are made, as there is sizeable variability in our data that is not accounted for by the metrics we investigated.
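The analysis outlined in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the trial records, scene identifiers, feature values and the helper names `selection_probability` and `ols` are all hypothetical, and a plain ordinary least-squares fit stands in for whatever regression specification the chapter actually uses. The sketch first turns forced-choice trials into a per-scene selection probability, then regresses an outcome on three predictors standing in for the low-, mid- and high-level features.

```python
from collections import Counter


def selection_probability(trials):
    """Share of appearances in which each scene was chosen.

    trials: iterable of (shown_scene_ids, chosen_scene_id) pairs,
    a hypothetical record format for a four-alternative task.
    """
    shown, chosen = Counter(), Counter()
    for scene_ids, choice in trials:
        shown.update(scene_ids)
        chosen[choice] += 1
    return {scene: chosen[scene] / shown[scene] for scene in shown}


def ols(X, y):
    """Ordinary least squares with an intercept, via the normal equations.

    Returns [intercept, b1, b2, ...]. Assumes the predictors are not
    collinear, so that X'X is invertible.
    """
    rows = [[1.0] + list(r) for r in X]
    k = len(rows[0])
    # Augmented matrix [X'X | X'y].
    A = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        p = A[col][col]
        A[col] = [v / p for v in A[col]]
        for r in range(k):
            if r != col:
                f = A[r][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][k] for i in range(k)]


# Toy trials: four scenes shown, one picked as most distinctive.
trials = [
    (("a", "b", "c", "d"), "a"),
    (("a", "b", "c", "e"), "a"),
    (("a", "b", "d", "e"), "b"),
]
probs = selection_probability(trials)

# Toy predictors (low, mid, high) and an outcome built from known weights,
# so the fit can be checked against them.
features = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 1)]
outcome = [0.1 + 0.2 * lo + 0.3 * mid + 0.4 * hi for lo, mid, hi in features]
coef = ols(features, outcome)
```

In this toy setup `coef` recovers the weights used to build `outcome`. With real data the coefficients would be estimated from noisy selection probabilities, and the t and p values quoted in the abstract would additionally require standard errors, which this sketch does not compute.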

This research was supported by Fundação para a Ciência e Tecnologia with a PhD scholarship to AM [SFRH/BD/144453/2019] and Grant [PTDC/PSI-ESP/30958/2017] to MIC.

Author information

Authors and Affiliations

Instituto Superior Técnico, Universidade de Lisboa, Lisbon, Portugal
Anastasiia Mikhailova & José Santos-Victor
Sapienza, University of Rome, Rome, Italy
Moreno I. Coco

Corresponding author

Correspondence to Anastasiia Mikhailova.

Editor information

Editors and Affiliations

University of Aveiro, Aveiro, Portugal
Armando J. Pinho
University of Aveiro, Aveiro, Portugal
Petia Georgieva
University of Porto, Porto, Portugal
Luís F. Teixeira
Universitat Politècnica de València, Valencia, Spain
Joan Andreu Sánchez

About this paper

Cite this paper

Mikhailova, A., Santos-Victor, J., Coco, M.I. (2022). Contribution of Low, Mid and High-Level Image Features of Indoor Scenes in Predicting Human Similarity Judgements. In: Pinho, A.J., Georgieva, P., Teixeira, L.F., Sánchez, J.A. (eds) Pattern Recognition and Image Analysis. IbPRIA 2022. Lecture Notes in Computer Science, vol 13256. Springer, Cham. https://doi.org/10.1007/978-3-031-04881-4_40

DOI: https://doi.org/10.1007/978-3-031-04881-4_40
Published: 26 April 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-04880-7
Online ISBN: 978-3-031-04881-4
eBook Packages: Computer Science, Computer Science (R0)
