Abstract
Human judgements can still be considered the gold standard in the assessment of image similarity, but they are expensive and time-consuming to acquire. Although most existing computational models rely almost exclusively on low-level information to evaluate the similarity between images, human similarity judgements are known to draw on both high-level semantic and low-level visual image information. The current study aims to evaluate the impact of different types of image features on predicting human similarity judgements. We investigated how low-level (colour differences), mid-level (spatial envelope) and high-level (distributional semantics) information predict within-category human judgements of 400 indoor scenes across 4 categories in a Four-Alternative Forced Choice task, in which participants had to select the most distinctive of four scenes presented on the screen. Linear regression analysis showed that low-level (t = 4.14, p < 0.001), mid-level (t = 3.22, p < 0.01) and high-level (t = 2.07, p < 0.04) scene information significantly predicted the probability of a scene being selected. Additionally, an SVM model incorporating low-, mid- and high-level properties had 56% accuracy in predicting human similarity judgements. Our results point out: 1) the importance of including mid- and high-level image properties in computational models of similarity to better characterise the cognitive mechanisms underlying human judgements, and 2) the need for further research into how human similarity judgements are made, as there is sizeable variability in our data that is not accounted for by the metrics we investigated.
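The task structure described above (score each scene on several feature levels, then predict which of four scenes is picked as most distinctive) can be illustrated with a small sketch. This is not the authors' method, which used linear regression and an SVM; it is a simplified stand-in under stated assumptions. It assumes each scene is already summarised as one numeric vector per level (for example, mean colour values for the low level, a GIST-like spatial-envelope descriptor for the mid level, and an averaged word embedding of the scene's object labels for the high level). It also assumes that a scene's distinctiveness is its mean distance to the other three scenes in the trial, and that levels are z-scored within a trial and summed with placeholder weights. The names `DISTANCES`, `distinctiveness`, `predict_odd_one_out`, `accuracy` and the toy vectors are hypothetical. Since a four-alternative task has a 25% chance level, the reported 56% accuracy is well above guessing.

```python
import math
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance, e.g. between colour or spatial-envelope vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a: Vector, b: Vector) -> float:
    """1 - cosine similarity, e.g. between averaged word embeddings."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


# Hypothetical pairing of feature level and distance function (an assumption).
DISTANCES: Dict[str, Callable[[Vector, Vector], float]] = {
    "low": euclidean,         # colour statistics
    "mid": euclidean,         # spatial-envelope (GIST-like) descriptor
    "high": cosine_distance,  # distributional-semantic embedding
}


def distinctiveness(scenes: List[Vector], dist: Callable[[Vector, Vector], float]) -> List[float]:
    """Mean distance from each scene to the other scenes shown on the same trial."""
    n = len(scenes)
    return [
        sum(dist(scenes[i], scenes[j]) for j in range(n) if j != i) / (n - 1)
        for i in range(n)
    ]


def zscore(values: List[float]) -> List[float]:
    """Standardise within a trial so levels on different scales can be summed."""
    mean = sum(values) / len(values)
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    if sd == 0.0:
        return [0.0] * len(values)
    return [(v - mean) / sd for v in values]


def predict_odd_one_out(trial: Dict[str, List[Vector]], weights: Dict[str, float]) -> int:
    """Index of the scene with the highest weighted, standardised distinctiveness."""
    n = len(next(iter(trial.values())))
    score = [0.0] * n
    for level, scenes in trial.items():
        z = zscore(distinctiveness(scenes, DISTANCES[level]))
        for i in range(n):
            score[i] += weights.get(level, 1.0) * z[i]  # unlisted levels default to weight 1
    return max(range(n), key=lambda i: score[i])


def accuracy(trials: List[Dict[str, List[Vector]]], choices: List[int], weights: Dict[str, float]) -> float:
    """Proportion of trials where the model picks the scene the participant chose."""
    hits = sum(predict_odd_one_out(t, weights) == c for t, c in zip(trials, choices))
    return hits / len(trials)


# Toy trial (invented values): scene 2 differs from the others at every level.
toy_trial = {
    "low": [[50, 0, 0], [52, 1, 0], [80, 20, 30], [51, 0, 1]],
    "mid": [[0.1, 0.2], [0.1, 0.25], [0.9, 0.8], [0.12, 0.2]],
    "high": [[1, 0], [0.9, 0.1], [0, 1], [1, 0.05]],
}
print(predict_odd_one_out(toy_trial, {"low": 1.0, "mid": 1.0, "high": 1.0}))  # → 2
```

The equal weights are placeholders; in a fuller analysis they would be estimated from the behavioural data (for instance by regression), and a classifier such as an SVM could replace the weighted sum.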
This research was supported by Fundação para a Ciência e Tecnologia with a PhD scholarship to AM [SFRH/BD/144453/2019] and Grant [PTDC/PSI-ESP/30958/2017] to MIC.
Copyright information
© 2022 Springer Nature Switzerland AG
About this paper
Cite this paper
Mikhailova, A., Santos-Victor, J., Coco, M.I. (2022). Contribution of Low, Mid and High-Level Image Features of Indoor Scenes in Predicting Human Similarity Judgements. In: Pinho, A.J., Georgieva, P., Teixeira, L.F., Sánchez, J.A. (eds) Pattern Recognition and Image Analysis. IbPRIA 2022. Lecture Notes in Computer Science, vol 13256. Springer, Cham. https://doi.org/10.1007/978-3-031-04881-4_40
DOI: https://doi.org/10.1007/978-3-031-04881-4_40
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-04880-7
Online ISBN: 978-3-031-04881-4
eBook Packages: Computer Science; Computer Science (R0)