Introduction

In one of the first attempts to formulate a model of the social brain, Brothers (1990) considered a few regions, focusing primarily on the amygdala, orbitofrontal cortex, and the superior temporal sulcus. Since then, the number of regions implicated in social cognition has rapidly proliferated (Adolphs 2009; Lieberman 2010; Todorov et al. 2011a). The two major reasons for this proliferation are methodological advances in functional neuroimaging research and the introduction of multiple experimental paradigms tapping diverse aspects of social cognition, ranging from the perception of emotional expressions to the representation of others’ mental states and actions. At the same time, various proposals have been made about the core region(s) underlying social cognition. Depending on one’s interests, the seat of social cognition is either in the medial prefrontal cortex (Amodio and Frith 2006), the temporoparietal junction (Saxe and Wexler 2005), or the inferior frontal gyrus and the inferior parietal lobule (Gallese et al. 2004). Although these proposals have great merit, they have been derived by narrowing social cognition down to specific tasks such as understanding beliefs and understanding goal-directed actions. Ultimately, multiple functional brain networks underlie the complexity of social cognition.

Arguably, a good starting point for building a comprehensive model of social cognition is the ability to represent others as distinct individuals. Understanding actions, beliefs, and intentions presupposes the ability to perceive and represent other people as agents. For most people, face perception and memory are critical for representing others, and people are extremely adept at this task. Decades of computer science research have yet to produce a computer model that approximates human performance in face recognition (Bowyer et al. 2006; Sinha et al. 2006). Moreover, faces are not only used to represent and track individuals over time; they also provide a wealth of social information, ranging from an individual’s membership in social categories (e.g., age, sex, race) to his or her mental and emotional states (e.g., bored, anxious). Not surprisingly, after extremely brief exposures or from highly degraded visual input, people can identify faces (Grill-Spector and Kanwisher 2005; Yip and Sinha 2002), categorize their race and gender (Cloutier et al. 2005; Martin and Macrae 2007), recognize their emotional expressions (Esteves and Öhman 1993), and make a variety of social judgments such as aggressiveness (Bar et al. 2006), trustworthiness (Todorov et al. 2009), and sexual orientation (Rule and Ambady 2008). Perception of faces is inherently imbued with affect (Todorov et al. 2008).

Yet, until recently, face perception was generally considered a cognitive area of research, with forays into other areas only when emotional expressions or affective associations with faces were the focus of research. Standard cognitive models (Bruce and Young 1986) and their corresponding neural equivalents (Haxby et al. 2000) are not framed in social terms, and regions that are dedicated to face processing are rarely framed as “social regions.” To a large extent, this probably reflects disciplinary divisions and interests. The neural underpinnings of face perception have been studied primarily by vision scientists, for whom faces are a well-defined category of complex stimuli that can be contrasted with other categories of complex stimuli such as houses.

Face selectivity in the brain

Vision scientists have had tremendous success in mapping the regions responsible for face perception. Until the middle of the twentieth century, it was not even established that the inferior temporal (IT) cortex is involved in vision and, in particular, object recognition (Gross 1994). Face selective neurons were discovered in the IT cortex of the macaque brain in the 1970s (Bruce et al. 1981; Desimone 1991; Perrett et al. 1982). Consistent with these findings, Positron Emission Tomography (PET) studies of humans in the early 1990s reported face responsive regions in fusiform and inferior temporal regions (Haxby et al. 1993; Sergent et al. 1992). Electrophysiological studies recording from the same regions in epileptic patients found negative potentials (N200) evoked by faces (Allison et al. 1994). Subsequent functional Magnetic Resonance Imaging (fMRI) studies using a variety of categories established a face selective region in the fusiform gyrus (Kanwisher et al. 1997; McCarthy et al. 1997). This region—labeled the fusiform face area (FFA; Kanwisher et al. 1997)—can be reliably identified in individual subjects, and its location is robust with respect to task demands (Berman et al. 2010). Two other regions can be consistently identified across most subjects: a region in the posterior Superior Temporal Sulcus (pSTS) and a region in the occipital gyrus—labeled the occipital face area (OFA). These regions are usually referred to as comprising the core system for the perceptual analysis of faces (Haxby et al. 2000; Said et al. 2011).

Two of the most exciting recent developments in the field are the combination of fMRI and single cell recordings in macaques (Tsao et al. 2006) and the use of Transcranial Magnetic Stimulation (TMS) in humans (Pitcher et al. 2007). Tsao and her colleagues used fMRI to identify face selective patches in the macaque brain and then recorded from these patches. They identified a stunning number of face selective neurons: whereas previous studies had rarely reported more than 20% face selective neurons among those recorded, Tsao and her colleagues found that more than 90% of the neurons in some of the patches were face selective. Pitcher and his colleagues used TMS to transiently disrupt the activity of the right OFA (it is not possible to target the FFA) and found that this disruption impaired performance on face perception tasks.

Undoubtedly, we have accumulated rich evidence for the importance of the “core” regions in face perception. However, given the affective and social significance of faces, the question is whether the core regions are sufficient to describe face perception. Of course, researchers have acknowledged the participation of other regions, both subcortical and prefrontal, but these regions are usually considered part of the “extended” rather than the “core” system of face processing (Haxby et al. 2000).

In the rest of the paper, I argue that faces automatically evoke responses not only in the core regions but also in regions in the medial temporal lobe (MTL). In particular, I focus on the amygdala and argue that it is an integral part of the functional network dedicated to face processing. In the next section of the paper, I review evidence consistent with a general role of the amygdala in face processing. This evidence comes from (a) single cell recording studies in both monkeys and humans; (b) human functional localizer studies; and (c) meta-analyses of neuroimaging studies involving faces. In the last section of the paper, I propose a hypothesis that the key property of faces represented in the amygdala is their typicality. I also attempt to place this hypothesis into an overall framework that accommodates not only face findings but also findings from other stimuli and other modalities.

Face selective responses in the primate amygdala

The importance of the amygdala for perception, learning, memory, and behavior is well established (Aggleton 2000). In almost all cases, the role of the amygdala is related to the affective significance of stimuli. In this context, it is not surprising that the first functional neuroimaging studies that targeted the amygdala and face perception used faces expressing emotions (Breiter et al. 1996; Morris et al. 1996). However, it is unlikely that the role of the amygdala in face processing is limited to processing of emotional expressions.

At about the time of the discovery of face selective neurons in IT cortex, it was also discovered that there are visually responsive neurons in the macaque amygdala and that some of these neurons respond to faces (Sanghera et al. 1979). A number of subsequent neurophysiology studies reported face responsive neurons in the amygdala (Perrett et al. 1982; Leonard et al. 1985; Rolls 1984; Wilson and Rolls 1993; for a review, see Rolls 2000). Later studies confirmed and extended these findings. Nakamura et al. (1992) showed that the amygdala responds to visual stimuli that are not relevant to the immediate task and that a high proportion of its visual neurons are category selective, with some neurons preferring monkey faces and a smaller proportion preferring human faces. Other studies have found selective responses to emotional expressions and identity (Gothard et al. 2007) and supramodal neurons responding to both visual (faces) and auditory (sounds) social cues (Kuraoka and Nakamura 2007).

Importantly, the monkey neurophysiology findings have been confirmed in human studies (Fried et al. 1997; Kreiman et al. 2000). Fried and his colleagues recorded from neurons in the MTL of patients undergoing treatment for epilepsy. They found face selective neurons in the amygdala, hippocampus, and entorhinal cortex. Subsequent studies have shown that the responses of some of these neurons are modulated by face familiarity (Quiroga et al. 2005; Viskontas et al. 2009). These findings are consistent with findings from patients with amygdala lesions, who show impairments in face recognition (Aggleton and Shaw 1996), although the most studied patient with bilateral amygdala damage, SM (Adolphs and Tranel 2000), seems to be impaired primarily in the recognition of fearful expressions.

The logic of neurophysiology studies on category selectivity is to present stimuli representing different categories (e.g., faces, everyday objects, novel objects) and look for neurons that show a preference for one or more categories. The same logic underlies neuroimaging studies that use functional localizers. In such studies, human subjects are presented with faces and a number of other categories such as houses, hands, chairs, and flowers. Such studies identified the FFA (Kanwisher et al. 1997; McCarthy et al. 1997; Tong et al. 2000), the OFA (Gauthier et al. 2000; Puce et al. 1996), and face selective regions in the pSTS (Allison et al. 2000; Puce et al. 1996). Despite some controversy about the value of functional localizers (Friston et al. 2006; Saxe et al. 2006), they are an excellent tool for identifying category selective regions and then probing the response properties of these regions. A recent meta-analysis also shows that, at least in the case of localizing the FFA, the results are robust with respect to task demands and control categories (Berman et al. 2010).

If there are face selective neurons in the amygdala, as suggested by neurophysiology studies, why do fMRI studies that use functional localizers fail to detect face selective voxels in the amygdala? There are at least two reasons. First, the amygdala is a very small structure that is difficult to image, not only because of its size but also because of its location (LaBar et al. 2001; Zald 2003). Moreover, in almost all neurophysiology studies, the number of face selective neurons is small, rarely exceeding 10% of the recorded neurons. This suggests that only a few voxels in the amygdala would be face selective. Second, given the expected small size of face selective clusters, it would be difficult to find these clusters unless one is looking for them. In fact, there is large variation across individual subjects in functional localizer studies. The typical approach in such studies is to threshold the statistical map of the contrast between faces and the control category (or categories) at a specified probability value (e.g., p = .005) and then to record the locations of face selective regions for each subject. However, the number of observed peaks can vary from none or a few in some subjects to a few dozen in others. Researchers typically record peaks from the fusiform gyri, occasionally from the occipital gyri and pSTS, and rarely from other regions. Some of this individual variation in observed peaks is due to measurement error, which can be reduced by averaging across subjects. However, functional localizers were specifically introduced to map category selective regions in individual brains and, hence, to avoid the need to conduct group analyses (Kanwisher et al. 1997). The rationale for using localizers is that brains are individually different and, hence, group alignment can distort the data.
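
For readers who want the localizer procedure in concrete terms, the following Python sketch shows one way the individual-subject step could be implemented: threshold a faces-versus-control t-map at an uncorrected p value and record the peak of each surviving cluster. The file name, degrees of freedom, and threshold are illustrative assumptions, not values taken from any specific study.

```python
# A minimal sketch of the individual-subject localizer step described above:
# threshold a faces-vs-control t-map at an uncorrected p value and report
# the surviving cluster peaks. File name and degrees of freedom are
# hypothetical placeholders.
import numpy as np
import nibabel as nib
from scipy import ndimage, stats

df = 120                      # assumed degrees of freedom of the t-contrast
p_thresh = 0.005              # uncorrected threshold mentioned in the text
t_cut = stats.t.ppf(1 - p_thresh, df)

tmap = nib.load("sub01_faces_gt_control_tmap.nii.gz")  # hypothetical file
data = tmap.get_fdata()

# Label contiguous supra-threshold clusters and record each cluster's peak.
labels, n_clusters = ndimage.label(data > t_cut)
for i in range(1, n_clusters + 1):
    cluster = np.where(labels == i, data, -np.inf)
    peak_ijk = np.unravel_index(np.argmax(cluster), data.shape)
    peak_xyz = nib.affines.apply_affine(tmap.affine, peak_ijk)  # voxel -> mm
    print(f"cluster {i}: peak t = {data[peak_ijk]:.2f} at {np.round(peak_xyz)}")
```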

Not surprisingly, researchers who use face localizers rarely report group analyses, although these analyses can be informative and more reliable than individual level analyses (Poldrack et al. 2009). In a recent meta-analysis of studies that used functional localizers to localize the FFA, Berman and colleagues (2010) selected 49 out of 339 papers. These were studies of healthy adults that reported both the coordinates of the localized FFA and the localization task. Of these papers, only nine reported the group analysis from the face localizer (Chen et al. 2007; Downing et al. 2006; Eger et al. 2004, 2005; Henson and Mouchlianitis 2007; Kesler-West et al. 2001; Maurer et al. 2007; Pourtois et al. 2005; Zhang et al. 2008). Four of the nine studies reported amygdala activation (see Table 1). Another study did not report a group analysis but reported that face selective voxels in the amygdala were identified by anatomical location and a contrast between intact and scrambled faces (Ganel et al. 2005). Occasionally, researchers report observing amygdala activation in face localizer contrasts but do not investigate it further or report the coordinates (Berman et al. 2010, p. 69; Jiang et al. 2009, p. 1085). It should be noted that the opposite is also true: emotion researchers interested in the amygdala may compare faces with another category of stimuli but not report group analyses or activations in posterior areas (Goossens et al. 2009; Hariri et al. 2002). In other cases, researchers perform a group analysis but not individual level analyses (Fitzgerald et al. 2006; Wright and Liu 2006).

Table 1 Coordinates of face selective voxels in the amygdala from fMRI studies that compared activation to faces with activation to other categories

Comparing the studies that found amygdala activation in response to faces with those that did not shows that the former had greater statistical power to detect such activations. First, studies that found amygdala activation tended to have larger samples (mean n = 14.8 vs. 12.4). Second, these studies used less stringent statistical criteria in the group analysis (their most stringent threshold was p < .001 uncorrected, which was the minimum criterion in the other studies). To take two extreme examples, Kesler-West et al. (2001) and Chen et al. (2007) used the same contrast (faces vs. scrambled faces), but only Kesler-West et al. reported amygdala activation in the group analysis. However, whereas Kesler-West et al.’s study had 21 subjects and used an uncorrected p < .001, Chen et al.’s study had 5 subjects and used a Bonferroni corrected p < .001 across all voxels. In principle, it is better to be statistically conservative, but conservative procedures penalize small regions, particularly when the sample size of the study is small. As shown in Table 1, many human studies report amygdala activation in functional localizer tasks. This is consistent with high resolution fMRI studies of monkeys that also find face selective voxels in the amygdala (Hoffman et al. 2007; Logothetis et al. 1999).
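
The power argument can be made concrete with a small calculation. The sketch below, under assumed values for the degrees of freedom and the whole-brain voxel count, compares the t cutoff implied by an uncorrected p < .001 with the cutoff implied by a Bonferroni correction across all voxels; the corrected cutoff is far higher, and small structures such as the amygdala are unlikely to clear it.

```python
# A back-of-the-envelope illustration of the power point made above. The
# voxel count and degrees of freedom are illustrative assumptions, not
# values from the cited studies.
from scipy import stats

df = 20                  # assumed degrees of freedom
n_voxels = 50_000        # assumed whole-brain voxel count

t_uncorrected = stats.t.ppf(1 - 0.001, df)
t_bonferroni = stats.t.ppf(1 - 0.001 / n_voxels, df)

print(f"uncorrected p < .001 : t > {t_uncorrected:.2f}")
print(f"Bonferroni  p < .001 : t > {t_bonferroni:.2f}")
# Under these assumptions the corrected cutoff is far higher, so a weak
# signal in a small region is much less likely to survive.
```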

In one of our recent studies (Said et al. 2010), we used a face localizer and, following standard practices, recorded the peaks in the fusiform gyri, occipital gyri, and pSTS. These data are revisited here. In the localizer task, subjects were presented with blocks of faces and chairs and asked to press a button when an image was repeated (a one back task). As shown in Fig. 1, in addition to the clusters in the fusiform gyri (Fig. 1a), the group analysis showed large clusters in the bilateral amygdala that were more active for faces than chairs (Fig. 1b, c). An analysis of individual subjects’ data showed that 30 out of 37 subjects had face responsive voxels in the amygdala. For this analysis, the map of the faces-greater-than-chairs contrast was liberally thresholded at p < .05 for each individual and then intersected with an anatomical mask of the amygdala. As with the FFA, there was individual variation across subjects with respect to the size and location of the clusters of face selective voxels (Fig. 2).
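
As a rough illustration of this analysis, the sketch below thresholds each subject's faces-greater-than-chairs map at a liberal p < .05 and intersects it with an anatomical amygdala mask, counting the subjects with surviving voxels. File names and degrees of freedom are hypothetical placeholders, not the actual materials of Said et al. (2010).

```python
# A minimal sketch of the individual-subject analysis described above:
# liberally threshold each subject's faces-greater-than-chairs map and
# intersect it with an anatomical amygdala mask.
import nibabel as nib
from scipy import stats

df = 100
t_cut = stats.t.ppf(1 - 0.05, df)                 # liberal p < .05 threshold
amy_mask = nib.load("amygdala_mask.nii.gz").get_fdata() > 0

n_with_voxels = 0
subjects = [f"sub{i:02d}" for i in range(1, 38)]  # 37 subjects
for sub in subjects:
    tmap = nib.load(f"{sub}_faces_gt_chairs_tmap.nii.gz").get_fdata()
    face_voxels = (tmap > t_cut) & amy_mask       # the intersection step
    if face_voxels.any():
        n_with_voxels += 1

print(f"{n_with_voxels} of {len(subjects)} subjects show face responsive "
      "voxels in the amygdala")
```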

Fig. 1 Brain regions responding more strongly to faces than to chairs: bilateral fusiform gyri (a) and bilateral amygdala (b, c). The regions were identified in a group analysis (n = 37), p < .001 (uncorrected)

Fig. 2 Clusters of voxels in the amygdala of individual subjects responding more strongly to faces than to chairs. The statistical maps for individual subjects were thresholded at p < .05 and intersected with an anatomical mask of the amygdala. Different colors indicate different clusters within the amygdala. The clusters are shown on a standardized brain image

These findings suggest that standard functional localizers can be used to identify face selective voxels in the amygdala. However, the conclusions may be limited, given that we used a single control category. At the same time, using a single control category (e.g., scrambled faces, houses) to localize the posterior face selective network is common practice, and the type of category does not seem to seriously affect the localization (Berman et al. 2010; Downing et al. 2006). Nevertheless, we need more targeted studies that use multiple categories to test for face selectivity in the amygdala.

In addition to data from single unit recordings and functional localizer studies, data from meta-analyses of functional neuroimaging studies also support a general role of the amygdala in face processing. Two large meta-analyses of PET and fMRI studies on emotional processing showed that faces are among the classes of stimuli that most consistently elicit responses in the amygdala (Costafreda et al. 2008; Sergerie et al. 2008). The only stimuli more potent in eliciting amygdala responses were gustatory and olfactory stimuli (Costafreda et al. 2008).

Two other meta-analyses (Bzdok et al. in press; Mende-Siedlecki, Said, and Todorov, under review) analyzed fMRI studies on face evaluation. These studies typically presented emotionally neutral faces that varied either on attractiveness or perceived trustworthiness. Using an Activation Likelihood Estimation approach, Bzdok and colleagues analyzed 16 studies. Using a Multi-level Kernel Density Analysis (MKDA) approach, which treats contrast maps rather than individual activation peaks as the unit of analysis (Wager et al. 2008), Mende-Siedlecki and colleagues analyzed 30 studies. In both meta-analyses, one of the most consistently activated regions across studies was the amygdala (see Table 2).
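
To give a flavor of what such coordinate-based meta-analyses compute, the toy sketch below smooths reported activation peaks with a spherical indicator kernel and sums the resulting maps. This is a deliberately simplified caricature: real ALE and MKDA implementations use calibrated kernels, weight studies, treat contrast maps as the unit of analysis (MKDA), and assess significance by permutation. All coordinates here are made up.

```python
# A highly simplified caricature of kernel-based meta-analysis: place a
# spherical indicator kernel at each reported peak and sum across studies.
import numpy as np

shape = (40, 48, 40)                              # toy brain grid (voxels)
peaks = [(20, 30, 12), (21, 29, 13), (8, 10, 25)] # hypothetical peaks
radius = 3                                        # kernel radius in voxels

density = np.zeros(shape)
grid = np.indices(shape)
for peak in peaks:
    dist = np.sqrt(sum((g - p) ** 2 for g, p in zip(grid, peak)))
    density += (dist <= radius).astype(float)     # spherical indicator kernel

hotspot = np.unravel_index(np.argmax(density), shape)
print(f"highest peak density at voxel {hotspot}: {density[hotspot]:.0f} studies")
```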

Table 2 Coordinates of voxels in the amygdala identified in (a) meta-analyses of fMRI studies on face evaluation; (b) face localization studies (see Table 1); and (c) meta-analyses of studies on emotion processing irrespective of faces

To sum up, both single unit recording data and neuroimaging data suggest that the primate amygdala contains neurons that respond to faces.

The role of the amygdala in face processing

The question about the computational role of the amygdala in face processing is much harder than the question of establishing face selectivity in the amygdala. Although initial fMRI studies focused on the role of the amygdala in the processing of fearful expressions (Morris et al. 1996; Whalen et al. 1998), subsequent studies supported a much broader role in face processing. First, many studies have observed amygdala responses not only to fearful but also to other emotional expressions, including positive expressions (e.g., Pessoa et al. 2006; Sergerie et al. 2008; Winston et al. 2003; Yang et al. 2002). Second, as described above, meta-analyses of fMRI studies on face evaluation, which typically use emotionally neutral faces, show that the amygdala is one of the most consistently activated regions in these studies (Bzdok et al. in press; Mende-Siedlecki et al. under review). Moreover, many studies have observed non-linear amygdala responses, with stronger responses to both negative and positive faces than to faces in the middle of the continuum (Said et al. 2009, 2010; Todorov et al. 2011b; Winston et al. 2007). Third, amygdala responses have been observed to bizarre faces (faces with inverted features; Rotshtein et al. 2001) and to novel faces (Kosaka et al. 2003; Schwartz et al. 2003).

To start answering the question about the computational role of the amygdala in face processing, one needs a working model of how faces are represented. According to the idea of face space (Valentine 1991), faces are represented as points in a multi-dimensional face space (MDFS). Face space is a high dimensional space in which every face can be approximated as a point defined by its coordinates on the face dimensions. These dimensions define abstract, global properties of faces. Valentine (1991) used this idea to account for a number of face recognition findings, including effects of distinctiveness (a recognition advantage for distinctive faces) and race (a recognition advantage for own-race faces). Subsequently, face space models have been successfully used to account for a number of other face perception findings (Rhodes and Jeffery 2006; Tsao and Freiwald 2006) and to model social perception of faces (Oosterhof and Todorov 2008; Todorov and Oosterhof 2011; Walker and Vetter 2009). Finally, both single unit recording and fMRI studies have shown increased responses in face selective regions as a function of the distance from the average face (Leopold et al. 2006; Loffler et al. 2005).
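
The face space idea is easy to state computationally: each face is a vector of coordinates, and its typicality can be summarized by its distance from the mean of those vectors. The toy sketch below illustrates this; the dimensionality and the random sample of "faces" are arbitrary illustrative choices, not parameters of any published model.

```python
# A toy sketch of face space: each face is a point in a high-dimensional
# space, and typicality is the (negative) distance from the average face.
import numpy as np

rng = np.random.default_rng(0)
n_faces, n_dims = 200, 50                    # assumed dimensionality
faces = rng.normal(size=(n_faces, n_dims))   # each row: one face's coordinates

average_face = faces.mean(axis=0)            # the norm learned from experience
distances = np.linalg.norm(faces - average_face, axis=1)

# Distinctive (atypical) faces are those far from the average face.
most_typical = int(np.argmin(distances))
most_distinctive = int(np.argmax(distances))
print(f"most typical face: #{most_typical} (distance {distances[most_typical]:.2f})")
print(f"most distinctive:  #{most_distinctive} (distance {distances[most_distinctive]:.2f})")
```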

Recently, using a MDFS model, we studied whether the amygdala and the FFA respond to social properties of faces or to more general properties related to the distance of faces from the average face in the model (Said et al. 2010). In terms of perception, the distance from the average face can be described as indicating the typicality of the face: more distant faces are less typical. We used a parametric face model (Oosterhof and Todorov 2008) to generate faces that varied substantially on valence and faces that varied on valence to a much smaller extent. Importantly, both types of faces were matched on their distance from the average face. Behavioral studies also confirmed that the faces were matched on their perceived typicality.
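
The logic of this stimulus matching can be sketched abstractly. In the toy example below, faces are generated so that they differ strongly or weakly along a designated "valence" axis while all sitting at the same fixed distance (radius) from the average face; the axis, dimensionality, and radius are stand-ins for the parametric model's actual dimensions, not its published parameters.

```python
# A hedged sketch of the stimulus-matching logic: vary the valence
# component while holding the distance from the average face constant.
import numpy as np

rng = np.random.default_rng(1)
n_dims, radius = 50, 4.0                    # fixed distance from average face
valence_axis = np.zeros(n_dims)
valence_axis[0] = 1.0                       # hypothetical valence direction

def make_face(valence_component, rng):
    """Return a face at `radius` from the origin (average face) whose
    projection on the valence axis equals `valence_component`."""
    resid = rng.normal(size=n_dims)
    resid -= resid @ valence_axis * valence_axis        # orthogonalize
    resid *= np.sqrt(radius**2 - valence_component**2) / np.linalg.norm(resid)
    return valence_component * valence_axis + resid

high_valence = [make_face(v, rng) for v in np.linspace(-3.5, 3.5, 8)]
low_valence = [make_face(v, rng) for v in np.linspace(-0.5, 0.5, 8)]

# Both sets span different valence ranges but sit on the same radius,
# i.e., they are matched on (model-defined) typicality.
print({float(np.round(np.linalg.norm(f), 2)) for f in high_valence + low_valence})
```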

Contrary to our initial expectation, we found that both the FFA and the amygdala responded to the distance from the average face rather than to changes in valence. In hindsight, coding faces according to their typicality is more parsimonious than coding faces according to their social value, because the former requires only statistical learning that extracts the average and variance of the faces encountered in one’s life. Further, in real life, social attributions from facial appearance and face typicality are highly correlated (Fig. 3). Finally, the typicality explanation resolved a previous puzzle in the literature on face evaluation. Whereas some studies have observed linear responses to face valence, with stronger responses to negative faces (Engell et al. 2007; Todorov and Engell 2008; Winston et al. 2002), others have observed non-linear responses, with stronger responses to both positive and negative faces than to faces in the middle of the continuum (Todorov et al. 2011b). It turned out that in studies that observed linear responses, face typicality was linearly related to face valence (more negative faces were perceived as less typical). In studies that observed non-linear responses, face typicality was non-linearly related to face valence (both more negative and more positive faces were perceived as less typical). Both patterns of responses can be explained by the hypothesis that the amygdala responds more strongly to less typical faces.
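
One simple way to separate the two response profiles is to regress the measured response on valence with both linear and quadratic terms: a reliably positive quadratic coefficient indicates the U-shaped, typicality-like profile. The sketch below does this on simulated data; all numbers are fabricated for illustration and do not reproduce any reported analysis.

```python
# A minimal sketch of the analysis logic that separates linear from
# non-linear (U-shaped) responses to face valence. Data are simulated.
import numpy as np

rng = np.random.default_rng(2)
valence = np.linspace(-3, 3, 60)

# Typicality account: response tracks distance from the typical (middle) face.
response = np.abs(valence) + rng.normal(scale=0.3, size=valence.size)

X = np.column_stack([np.ones_like(valence), valence, valence**2])
beta, *_ = np.linalg.lstsq(X, response, rcond=None)
print(f"intercept {beta[0]:.2f}, linear {beta[1]:.2f}, quadratic {beta[2]:.2f}")
# A reliably positive quadratic coefficient indicates the non-linear,
# U-shaped profile reported for valence-extreme faces.
```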

Fig. 3 Scatter plots of judgments of face typicality (“How likely would you be to see a person who looks like this walking down the street?”) and judgments of face weirdness (a), attractiveness (b), trustworthiness (c), and dominance (d). Each point represents a face. Judgments are in standardized units. Typicality judgments were correlated with 13 out of 14 social judgments (Said et al. 2010). The only exception was judgments of dominance (panel d)

What is the functional value of coding face typicality? Atypical faces, by definition, are less likely to be encountered and, as such, are less predictable. That is, they are associated with higher uncertainty and may require the deployment of additional attentional resources to resolve this uncertainty. The amygdala, which receives input from IT cortex and projects back not only to IT but also to striate and extrastriate cortex (Amaral et al. 2003), is perfectly positioned to modulate attention to infrequent, unexpected stimuli that have motivational significance. In other words, salient, unexpected stimuli can trigger amygdala responses, which in turn can bias attention to these stimuli (Vuilleumier 2005). A large body of animal work shows that the amygdala is critical for the regulation of attention (Davis and Whalen 2001; Gallagher 2000; Holland and Gallagher 1999). Recent work also shows that unpredictable sound sequences evoke sustained activity in the amygdala in both mice and humans (Herry et al. 2007).

The typicality findings suggest that, in the context of face perception, one of the functions of the amygdala is to regulate attention. This is consistent with several other proposals about the role of the amygdala in maintaining vigilance (Whalen 2007) and in detecting salient or motivationally relevant stimuli (Adolphs 2010; Sander et al. 2003). This hypothesis can account for stronger responses to bizarre faces (Rotshtein et al. 2001), novel faces (Kosaka et al. 2003; Schwartz et al. 2003), and emotional expressions (Whalen et al. 2009). It is important to note that both expressions and differences in identity can be represented within the same MDFS model (Calder and Young 2005). Finally, this hypothesis is also consistent with findings about the importance of individual differences in amygdala functioning (Aleman et al. 2008; Bishop 2008; Hariri 2009). According to the MDFS model, the typicality of faces and emotional expressions can vary across individuals, and such differences can result in different amygdala responses to the same face stimuli. This is an important research question to pursue in future studies.

In this framework, face information processed in face selective regions (e.g., the FFA) is further processed in the amygdala, where atypical or unexpected faces augment the amygdala’s responses, which in turn augment responses in face selective regions via feedback projections. These general principles can also account for a variety of non-face findings, including stronger responses to both highly positive and highly negative visual stimuli (Sabatinelli et al. 2005), to high intensity positive and negative odors (Anderson et al. 2003) and tastes (Small et al. 2003), to loud sounds (Bach et al. 2008), and to unpredictable sound sequences (Herry et al. 2007).

Conclusions

Although this article started with the proliferation of neural systems involved in social cognition, I focused on one specific region, the amygdala, and one category of stimuli, faces. A justification for this choice is that both the amygdala and perception of faces are at the intersection of cognition, affect, and motivation. I argued that faces robustly activate the amygdala and that one of its functions is to regulate attention to salient, atypical faces.

Undoubtedly, this proposal is an oversimplification. The amygdala consists of several nuclei with different structures, connectivity, and functions (Aggleton 2000; Amaral et al. 2003) that may play different roles in face processing. In fact, it is likely that the population of neurons that are face selective differs from the population of neurons that participate in the regulation of attention. Face selective neurons are usually located in the basolateral amygdala, whereas neurons involved in attention are located in the central nucleus. Unfortunately, current fMRI techniques do not have sufficient spatial resolution to study subdivisions of the amygdala. It should be noted that although the activation peaks from our meta-analysis of face evaluation studies (Mende-Siedlecki et al. under review) and the face selective peaks were different (Table 2), they were in close proximity (about 3 mm apart).

At a larger scale, the network underlying face processing comprises a number of regions in addition to the regions in IT cortex and the amygdala. In fact, studies have shown face selectivity in lateral orbitofrontal cortex (Ó Scalaidhe et al. 1997; Rolls et al. 2006; Tsao et al. 2008). In our meta-analysis of face evaluation studies, in addition to the amygdala, we observed consistent activations across studies in ventromedial prefrontal cortex, pregenual anterior cingulate cortex, and left caudate/NAcc. Understanding face perception will require understanding the cognitive functions of all these regions and how they interact in the context of perceiving (and evaluating) faces.