

Deep convolutional neural networks in the face of caricature

A preprint version of the article is available at arXiv.

Abstract

Real-world face recognition requires us to perceive the uniqueness of a face across variable images. Deep convolutional neural networks (DCNNs) accomplish this feat by generating robust face representations that can be analysed in a multidimensional ‘face space’. We examined the organization of viewpoint, illumination, gender and identity in this space. We found that DCNNs create a highly organized face similarity structure in which identities and images coexist. Natural image variation is organized hierarchically, with face identity nested under gender, and illumination and viewpoint nested under identity. To examine identity, we caricatured faces and found that identification accuracy increased with the strength of identity information in a face, and caricature representations ‘resembled’ their veridical counterparts—mimicking human perception. DCNNs therefore offer a theoretical framework for reconciling decades of behavioural and neural results that emphasized either the image or the face in representations, without understanding how a neural code could seamlessly accommodate both.
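The analyses summarized here operate on the network's top-level feature vectors, comparing images by cosine similarity in the resulting face space. The following is a minimal, hypothetical sketch, not the authors' released code (that is on OSF): it uses random vectors as stand-ins for real DCNN descriptors, and the classic face-space definition of a caricature as an exaggerated deviation from the average face.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two face descriptors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def caricature(face: np.ndarray, mean_face: np.ndarray, alpha: float) -> np.ndarray:
    """Classic face-space caricature: exaggerate the deviation of a face
    from the average face by a factor alpha (alpha > 1 caricatures,
    alpha < 1 anti-caricatures, alpha == 1 is the veridical face)."""
    return mean_face + alpha * (face - mean_face)

# Hypothetical stand-ins for real top-level DCNN descriptors:
# rows are images, columns are network features.
rng = np.random.default_rng(0)
faces = rng.normal(size=(100, 512))
mean_face = faces.mean(axis=0)            # the 'norm' of the face space

veridical = faces[0]
strong = caricature(veridical, mean_face, alpha=1.5)  # exaggerated identity

# A caricature should still 'resemble' its veridical counterpart,
# i.e. score high in cosine similarity against it.
print(f"caricature vs veridical: {cosine_similarity(veridical, strong):.3f}")
```

The final print line probes the relation reported in the abstract: caricatured representations remain most similar to their veridical counterparts even as identity information is strengthened.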


Fig. 1: Examples of face images.
Fig. 2: Visualization of the top-level DCNN similarity space for all images.
Fig. 3: Visualization of top-level similarity space with identity strength variation.
Fig. 4: Caricature effects.
Fig. 5: Density curves of face image-pair cosine similarity scores.
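Figures 2 and 3 are two-dimensional visualizations of the high-dimensional descriptor space, and Fig. 5 summarizes pairwise cosine similarities as density curves. As a rough, hypothetical illustration of how such embedding plots are commonly produced, the sketch below runs scikit-learn's t-SNE on synthetic stand-in descriptors; it is not the authors' data or code.

```python
import numpy as np
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt

# Synthetic stand-in for the real top-level DCNN descriptors:
# n_images x n_features, with one identity label per image.
rng = np.random.default_rng(1)
n_ids, imgs_per_id, dim = 10, 20, 512
centers = rng.normal(size=(n_ids, dim))
descriptors = np.vstack([c + 0.3 * rng.normal(size=(imgs_per_id, dim))
                         for c in centers])
labels = np.repeat(np.arange(n_ids), imgs_per_id)

# Project the high-dimensional similarity space to 2D for plotting.
xy = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(descriptors)

plt.scatter(xy[:, 0], xy[:, 1], c=labels, cmap="tab10", s=10)
plt.title("t-SNE of face descriptors (synthetic stand-in)")
plt.show()
```

With well-separated identities, images of the same person cluster together in the 2D projection, the qualitative structure the figure captions describe.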


Data availability

All data used for analysis are available via the Open Science Framework at https://osf.io/ebvys/.

Code availability

All of the code used for plotting and analysis is available via the Open Science Framework at https://osf.io/ebvys/.


Acknowledgements

This research is based on work supported by the Office of the Director of National Intelligence (ODNI), Intelligence Advanced Research Projects Activity (IARPA), via R&D contract no. 2014-14071600012. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of the ODNI, IARPA or the US Government.

Author information


Contributions

All authors were involved in the conceptualization and design of the methodology of the study. M.Q.H., C.D.C., R.R. and J.-C.C. handled software. The original draft of the manuscript was prepared by M.Q.H. and A.J.O. Review and editing were carried out by M.Q.H., C.J.P., Y.I.C., C.D.C., V.B. and A.J.O. Formal analysis, investigation and visualization were done by M.Q.H. and C.J.P., with validation by M.Q.H., Y.I.C., C.J.P. and C.D.C. Supervision and funding acquisition were handled by C.D.C. and A.J.O., with project administration by A.J.O.

Corresponding author

Correspondence to Matthew Q. Hill.

Ethics declarations

Competing interests

University of Maryland has filed a US patent application that covers portions of network A. R.R. and C.D.C. are co-inventors on this patent.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information


Supplementary figures, Supplementary tables, Supplementary methods and Supplementary discussion.


About this article


Cite this article

Hill, M.Q., Parde, C.J., Castillo, C.D. et al. Deep convolutional neural networks in the face of caricature. Nat Mach Intell 1, 522–529 (2019). https://doi.org/10.1038/s42256-019-0111-7

