Abstract
Human perceptual judgment of image similarity relies on rich internal representations, ranging from low-level features to high-level concepts, scene properties, and even cultural associations. However, existing methods and datasets that attempt to explain perceived similarity, even those geared toward this goal, use stimuli that arguably do not cover the full breadth of factors affecting human similarity judgments. We introduce a new dataset, dubbed Totally-Looks-Like (TLL) after a popular entertainment website, which contains images paired by humans as visually similar. The dataset comprises 6016 image pairs from the wild, shedding light on the rich and diverse set of criteria that humans employ. We conduct experiments that attempt to reproduce the pairings using features extracted from state-of-the-art deep convolutional neural networks, as well as additional human experiments to verify the consistency of the collected data. Even under conditions that make the matching task artificially easier, machine-extracted representations perform very poorly at reproducing the matches selected by humans. We discuss and analyze these results and suggest future directions for improving learned image representations.
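The matching experiment described above reduces to a simple retrieval baseline: embed both halves of every pair with a pretrained CNN and check how often the true partner is the nearest neighbor under cosine similarity. The sketch below is a minimal illustration of that idea, not the authors' exact protocol; the ResNet-50 backbone, the `left_paths`/`right_paths` lists, and the `recall_at_k` helper are all assumptions made for the example.

```python
# Minimal sketch: score how often a pretrained CNN's features recover
# human-chosen image pairings (hypothetical setup, not the paper's code).
import torch
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image

# Pretrained ResNet-50 with the classification head removed, so the
# forward pass yields a 2048-d global feature vector per image.
model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
model.fc = torch.nn.Identity()
model.eval()

preprocess = T.Compose([
    T.Resize(256), T.CenterCrop(224), T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

@torch.no_grad()
def embed(paths):
    # For a full 6016-pair dataset this should be batched; one batch
    # suffices for a small illustration.
    batch = torch.stack(
        [preprocess(Image.open(p).convert("RGB")) for p in paths])
    feats = model(batch)
    return torch.nn.functional.normalize(feats, dim=1)  # unit norm -> cosine sim

def recall_at_k(left_paths, right_paths, k=1):
    """Fraction of left images whose true partner (same index on the
    right) ranks among the top-k right images by cosine similarity."""
    L, R = embed(left_paths), embed(right_paths)
    sims = L @ R.T                       # all pairwise cosine similarities
    topk = sims.topk(k, dim=1).indices   # best-matching right images per left
    truth = torch.arange(len(left_paths)).unsqueeze(1)
    return (topk == truth).any(dim=1).float().mean().item()
```

Under a baseline of this kind, a low recall@1 across the candidate pool would mirror the paper's finding that off-the-shelf deep features fail to reproduce the pairings humans select.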
This research was supported through grants to the senior author, for which all authors are grateful: the Air Force Office of Scientific Research (FA9550-18-1-0054), the Canada Research Chairs Program (950-219525), the Natural Sciences and Engineering Research Council of Canada (RGPIN-2016-05352), and the NSERC Canadian Network on Field Robotics (NETGP417354-11).
© 2019 Springer Nature Switzerland AG

Cite this paper
Rosenfeld, A., Solbach, M.D., Tsotsos, J.K. (2019). Totally Looks Like - How Humans Compare, Compared to Machines. In: Jawahar, C., Li, H., Mori, G., Schindler, K. (eds.) Computer Vision – ACCV 2018. Lecture Notes in Computer Science, vol. 11361. Springer, Cham. https://doi.org/10.1007/978-3-030-20887-5_18
Print ISBN: 978-3-030-20886-8
Online ISBN: 978-3-030-20887-5