The ability to individuate faces is an important skill. Because faces are composed of features that do not vary much and are organized in similar configurations, subtle differences in features and in their spacing become critical. Faces are generally thought to be processed more holistically than other objects (Farah, Wilson, Drain, & Tanaka, 1998; Tanaka & Farah, 1993; Young, Hellawell, & Hay, 1987). Specifically, recognition of a facial feature is better within a whole face than when the feature is shown alone (Tanaka & Farah, 1993). Also, naming one half of a face is more difficult when the task-irrelevant half is from a different face (Young et al., 1987), revealing an inability to selectively attend to parts in the context of a face. This holistic processing is sensitive to changes in configuration and is reduced for inverted faces or misaligned face parts (see Maurer, Le Grand, & Mondloch, 2002, for a review).

Holistic processing occurs rapidly for upright intact faces (e.g., 50 ms after the onset of a face; Richler, Mack, Gauthier, & Palmeri, 2009). It has been suggested that holistic processing supports integration when face parts are separated briefly in time (Anaki, Boyd, & Moscovitch, 2007; Anaki & Moscovitch, 2007; Singer & Sheinberg, 2006). In particular, failures of selective attention to parts in the context of a face persist when the face parts are temporally separated by up to approximately 120 ms (Singer & Sheinberg, 2006). Recognition is more successful for upright than for inverted faces when sequentially presented face parts are shown within a brief time (up to 450 ms; Anaki et al., 2007; Anaki & Moscovitch, 2007). Such temporal integration is consistent with the idea that facial features become diagnostic over time (Vinette, Gosselin, & Schyns, 2004). One account suggests that facial features separated in time can be stored and integrated into a holistic percept in a short-term visual buffer; in other words, holistic processing might not require the simultaneous presentation of facial features (Anaki et al., 2007; Anaki & Moscovitch, 2007).

Here, we ask whether integration of temporally separated face parts is indeed of the same nature as integration in an intact face. We seek to distinguish processes that are more specific to faces from those that may be more general to any object category. When all parts are shown at once, holistic processing is more important for faces than for other objects (Farah et al., 1998; Tanaka & Farah, 1993). It is less clear that temporal integration of face parts shows the same advantage, since these effects obey temporal constraints that are strikingly similar to those governing the integration of nonface visual stimuli. For instance, the interference effect for faces is strongest when the target face part is presented up to 80 ms before the irrelevant face part (Singer & Sheinberg, 2006). Likewise, brief temporal intervals between incompatible target and distractor information in the Stroop task also impair performance (Glaser & Glaser, 1982; Taylor, 1977). Stroop interference, which is at least partly due to conflicts at the response stage (MacLeod, 1991), peaks when the target precedes the distractor by up to 100 ms under randomized conditions (Schooler, Neumann, Caplan, & Roberts, 1997). Temporal integration also occurs in visual word recognition: Letters presented in alternation are perceived as a whole word when the temporal gap between two frames is no longer than 80 ms (Forget, Buiatti, & Dehaene, 2010). Because similar temporal integration effects arise for other types of visual stimuli (e.g., words, colors), we ask whether temporal integration of face parts truly reflects holistic processing.

To differentiate the sources of various types of integration of face parts, we adopted a variation of the composite task. In most composite paradigms (e.g., Richler, Gauthier, Wenger, & Palmeri, 2008; Singer & Sheinberg, 2006; Young et al., 1987), a composite face is made from pairing top and bottom halves from different individuals. In the naming version of this task, observers have to name either the top or the bottom half of a composite while ignoring the other half (Young et al., 1987). However, since a different face half is typically associated with a different name or response, the interference observed in this task could arise either from holistic processing of face halves or from response conflicts like those observed in Stroop tasks. To dissociate the potential sources of interactions arising at perceptual or response stages, additional conditions can be implemented in the composite task. In the version we use here (Richler, Cheung, Wong, & Gauthier, 2009), participants first learn names for four faces. Two faces are assigned the name “Bob,” and two others are assigned the name “Fred.” At test, a target face half (e.g., the top) from one of the four learned faces is paired with an irrelevant half (e.g., the bottom) from the same face or a different face. The critical manipulation is the face–name relation between the target and irrelevant halves at test (Fig. 1). While the irrelevant half from the same face also has the same name as the target half (same face/same name: SFSN), the irrelevant half from a different face may either have the same name as the target half (different face/same name: DFSN) or a different name (different face/different name: DFDN).

Fig. 1

Sample face composites used in Experiments 1 and 2. During learning (Phases 1 and 2), participants learned names for four face composites, two “Bob” and two “Fred.” During testing (Phase 3), the irrelevant halves were recombined with the target halves to create composites. In the same-face/same-name condition, both the target and irrelevant halves were from the same studied face. In the different-face/same-name condition, the irrelevant half was from a different face that shared the same name with the target half. In the different-face/different-name condition, the irrelevant half was from a different face that was assigned a different name from the target half

Interference in the composite task is often measured by comparing a condition in which the irrelevant half differs in both percept and name from the target half with a condition in which the halves are misaligned or both halves come from the same face. However, if holistic processing and response interference both operate in the composite task and their effects are additive, overall interference for aligned composites may reflect both perceptual interference (i.e., holistic processing) and response interference. The irrelevant-half conditions described above allow these two types of interference to be dissociated. Holistic processing can be inferred from longer response times (RTs) to name the target half when the irrelevant half comes from a different face that has the same name as the target half than when it comes from the same face (DFSN vs. SFSN). In this comparison, response interference is minimized because the irrelevant half has the same name as the target half in both conditions (and therefore calls for the same response key). Any interference observed can thus be attributed to perceptual differences between irrelevant halves from the same face versus a different face. Response interference, on the other hand, is revealed by longer RTs to name the target half when the irrelevant half is associated with a different name, and thus with a competing response, than when it is associated with the same name; because the irrelevant half comes from a different face in both conditions (DFDN vs. DFSN), perceptual differences are held constant.
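Under the additivity assumption above, the usual composite contrast decomposes exactly into the two components: DFDN − SFSN = (DFSN − SFSN) + (DFDN − DFSN). The Python sketch below illustrates this bookkeeping for a single alignment × SOA cell. The function name and the RT values in the usage example are hypothetical, and the same identity holds on the log10 RT scale used in the analyses reported here.

```python
from typing import Dict


def interference_contrasts(rt: Dict[str, float]) -> Dict[str, float]:
    """Split overall composite interference into holistic and response components.

    rt maps the irrelevant-half conditions ("SFSN", "DFSN", "DFDN") to mean
    correct RTs for one alignment x SOA cell (hypothetical helper).
    """
    # Same name in both conditions, so only the percept of the irrelevant half differs.
    holistic = rt["DFSN"] - rt["SFSN"]
    # Different face in both conditions, so only the associated name (response) differs.
    response = rt["DFDN"] - rt["DFSN"]
    # The conventional contrast, which mixes both sources of interference.
    overall = rt["DFDN"] - rt["SFSN"]
    return {"holistic": holistic, "response": response, "overall": overall}


# Usage with made-up mean RTs (ms):
example = interference_contrasts({"SFSN": 700.0, "DFSN": 720.0, "DFDN": 750.0})
```

Because `overall` is always the sum of `holistic` and `response`, a reliable overall effect on its own cannot tell the two sources apart; only the two component contrasts can.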

Using this design, Richler, Cheung, et al. (2009) found that the interference for intact upright faces arises from holistic processing and not response interference: Longer RTs were observed whenever the irrelevant half was from a different face rather than from the same face, but the names associated with the face halves did not influence the effect. Here, we ask whether temporal integration of face halves reflects holistic processing or response interference. If holistic processing is a cumulative process, information from different face halves maintained in a short-term visual buffer may become integrated into a holistic percept across time (Anaki & Moscovitch, 2007). In contrast, if facial information stored in the visual buffer is not integrated perceptually, temporal integration may instead arise during the response stage.

To examine temporal integration, the target or irrelevant face half was presented either 50 or 200 ms before the other half. Our first goal was to replicate the temporal integration effects found by Singer and Sheinberg (2006), where temporal integration was revealed by longer RTs for DFDN trials than for SFSN trials, and this effect was larger for aligned than for misaligned composites. Next, we divided this effect into the contributions of holistic processing and response interference. If holistic processing (i.e., longer RTs for DFSN than for SFSN stimuli) is the source of temporal integration between face halves, this effect should also be disrupted by misalignment, consistent with the finding that misaligning face composites disrupts holistic processing (e.g., Richler et al., 2008; Young et al., 1987). In contrast, response interference (i.e., longer RTs for DFDN than for DFSN stimuli) may not be sensitive to misalignment (e.g., as in the Stroop task; Schooler et al., 1997).

Experiment 1

Method

Participants

Fifty members of Vanderbilt University (27 female; mean age = 22.5 years, SD = 4.5; normal/corrected-to-normal vision) were compensated $12 for participation. All participants reached at least 90% accuracy at the end of each training phase. The data from 2 participants whose performance was below chance in several test conditions were excluded from further analyses.

Stimuli

Five face tops and five face bottoms from the Max Planck Institute face database were randomly combined into five composite faces. Name assignment was counterbalanced across participants, with two of the composites assigned the name “Bob,” and two the name “Fred.” The fifth composite was not assigned a name and was only used during testing.

Aligned composites subtended 4° × 3° of visual angle, and a white line 2 mm thick separated the top and bottom halves. Face halves were presented on a gray background. For misaligned composites, the top half was shifted leftward and the bottom half rightward, such that the edge of one face half fell at the middle of the other face half.

Procedure

The experiment was conducted using MATLAB on Mac minis with 19-in. CRT monitors with 1,024 × 768 pixel resolution.

In Phase 1 (whole-face learning), participants learned the names of four whole composite faces. All four faces and their assigned names were first displayed on the screen for participants to study for as long as they wanted. Training trials began when participants terminated this study screen. On each trial, a fixation cross (500 ms) was followed by a face. Participants were told to press “1” if the face was assigned the name “Bob” and “2” if the face was assigned the name “Fred.” All participants completed two blocks of 40 trials. If accuracy was 90% or higher, participants moved on to Phase 2; otherwise, they completed additional 40-trial blocks, up to a maximum of four, until this criterion was reached.
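The train-to-criterion rule can be sketched as a small decision function. This is a hypothetical helper, not the authors' MATLAB code, and it assumes that the criterion is evaluated on the accuracy of the most recent 40-trial block; the text does not say whether accuracy was pooled across blocks.

```python
from typing import Sequence, Tuple


def blocks_to_criterion(block_accuracies: Sequence[float],
                        criterion: float = 0.90,
                        mandatory_blocks: int = 2,
                        max_extra_blocks: int = 4) -> Tuple[int, bool]:
    """Return (number of blocks run, whether the criterion was met) for one phase.

    block_accuracies lists the proportion correct in each successive block.
    Assumption: the criterion is checked on the most recent block only.
    """
    max_blocks = mandatory_blocks + max_extra_blocks
    for n_run in range(mandatory_blocks, max_blocks + 1):
        if n_run > len(block_accuracies):
            raise ValueError("not enough blocks supplied to reach a decision")
        # Every participant completes the mandatory blocks before the first check.
        if block_accuracies[n_run - 1] >= criterion:
            return n_run, True
    return max_blocks, False
```

With the Experiment 2 settings (criterion=0.95, mandatory_blocks=3), a participant above criterion from the first block still completes three blocks. The maximum number of extra blocks in Experiment 2 is not stated, so the default of four is an assumption there.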

In Phase 2 (half-face learning), participants were trained to name face halves. The training was identical to Phase 1, except that a face half was presented in isolation on each trial. Participants named the top halves until criterion (90% accuracy) was reached, then repeated the training with the bottom halves. This training was included to ensure that names were strongly associated with each learned half (Richler, Cheung, et al., 2009).

In Phase 3 (testing), the four faces were first presented again on the screen with their assigned names. Test trials began when the participants terminated the study screen. On each trial, a fixation cross was presented (500 ms), followed by an isolated target face-half or a composite face with one half cued as the target. Each composite consisted of a target half and one of the irrelevant halves defined relative to that target half. Notably, either the target or the irrelevant half was presented 50 or 200 ms before the other half. The response cue appeared at the onset of the first face-half, even if the target half itself would not appear for another 50 or 200 ms. Participants were told to indicate the name of the target half as quickly and accurately as possible, while ignoring the irrelevant half. They were not asked to wait for the irrelevant half, in order to encourage them to ignore it if possible. Face composites were either spatially aligned or misaligned and were presented until a response was made, to a maximum of 5 s. RTs were measured from the onset of the target face-half.
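Because the response cue appears with the first face half but RTs are measured from the target half, the timing bookkeeping depends on which half leads. The Python sketch below assumes the SOA sign convention used in this study (negative when the irrelevant half leads, positive when the target half leads, 0 for simultaneous onset). The names `timeline` and `rt_from_target_onset` are hypothetical and do not come from the original experiment code.

```python
from dataclasses import dataclass


@dataclass
class TrialTimeline:
    # Onsets in ms, relative to the first face half (which coincides with the response cue).
    target_onset_ms: int
    irrelevant_onset_ms: int


def timeline(soa_ms: int) -> TrialTimeline:
    """Onsets of both halves; positive SOA = target first, negative = irrelevant first."""
    if soa_ms >= 0:
        return TrialTimeline(target_onset_ms=0, irrelevant_onset_ms=soa_ms)
    return TrialTimeline(target_onset_ms=-soa_ms, irrelevant_onset_ms=0)


def rt_from_target_onset(response_time_ms: int, soa_ms: int) -> int:
    """Re-reference a response time measured from first-half onset to target onset."""
    return response_time_ms - timeline(soa_ms).target_onset_ms
```

For example, a key press 900 ms after the first half appears counts as a 700-ms RT at an SOA of –200 ms but as a 900-ms RT at +50 ms, since in the latter case the target half was already on screen.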

Alignment conditions (aligned/misaligned) were blocked, with the presentation order counterbalanced across participants. There were eight blocks of trials within each alignment condition, with alternating top-naming and bottom-naming blocks (four blocks each). There were four stimulus onset asynchrony (SOA) conditions (–200 or –50 ms, where the irrelevant half preceded the target half by 200 or 50 ms, and 50 or 200 ms, where the target half preceded the irrelevant half by 50 or 200 ms) and four irrelevant-half conditions (SFSN/DFSN/DFDN/unfamiliar face; see Footnote 1). The SOA and irrelevant-half conditions were randomized. Note that because the halves were separated both temporally and spatially, we used blocking and response cues to ensure that participants would not be confused about which half they should respond to. There were a total of 544 test trials.
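The block structure and within-block randomization described above can be sketched as follows. This covers composite trials only (isolated target-half trials are omitted), and `reps_per_cell` is a hypothetical parameter, since the number of trials per SOA × irrelevant-half cell within a block is not given here. Whether top-naming or bottom-naming blocks came first is likewise an assumption exposed as `top_first`.

```python
import itertools
import random
from typing import List, Tuple

SOAS_MS = (-200, -50, 50, 200)  # negative: irrelevant half first
IRRELEVANT_HALF = ("SFSN", "DFSN", "DFDN", "unfamiliar")


def make_block(reps_per_cell: int, seed: int) -> List[Tuple[int, str]]:
    """Fully crossed SOA x irrelevant-half trials, shuffled within one block."""
    trials = list(itertools.product(SOAS_MS, IRRELEVANT_HALF)) * reps_per_cell
    random.Random(seed).shuffle(trials)  # seeded so a block can be reproduced
    return trials


def block_sequence(aligned_first: bool, top_first: bool = True) -> List[Tuple[str, str]]:
    """Alignment blocked (order counterbalanced); 8 alternating top/bottom blocks each."""
    alignments = ["aligned", "misaligned"] if aligned_first else ["misaligned", "aligned"]
    halves = ["top", "bottom"] if top_first else ["bottom", "top"]
    return [(a, halves[i % 2]) for a in alignments for i in range(8)]
```

Shuffling within each block keeps SOA and irrelevant-half condition unpredictable, while alignment and the to-be-named half stay fixed within a block, matching the rationale given above for using blocking and response cues.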

Results

Training performance in Phases 1 and 2 is reported in Table 1. Correct RTs in Phase 3 were log10-transformed and analyzed with extreme RTs excluded (<200 ms or >3 s; 1.26% of trials). Mean correct RTs for Phase 3 are shown in Fig. 2a.
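The RT preprocessing amounts to a range filter followed by a log transform. In the minimal Python sketch below, `trim_and_log` is a hypothetical helper; it keeps RTs from 200 ms to 3 s inclusive, and it computes the excluded proportion over the correct trials passed in, which is an assumption about the denominator behind the reported percentage.

```python
import math
from typing import List, Sequence, Tuple


def trim_and_log(correct_rts_ms: Sequence[float],
                 lower_ms: float = 200.0,
                 upper_ms: float = 3000.0) -> Tuple[List[float], float]:
    """Drop RTs below lower_ms or above upper_ms, then log10-transform the rest.

    Returns the transformed RTs and the proportion of input trials excluded.
    """
    if not correct_rts_ms:
        raise ValueError("no RTs supplied")
    kept = [rt for rt in correct_rts_ms if lower_ms <= rt <= upper_ms]
    excluded = 1 - len(kept) / len(correct_rts_ms)
    return [math.log10(rt) for rt in kept], excluded
```

The log10 transform reduces the positive skew typical of RT distributions before the ANOVAs; condition means on this scale can be back-transformed with 10**x for plotting.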

Table 1 Mean accuracy and correct RTs in the last blocks for Phases 1 (whole-face learning) and 2 (half-face learning) in Experiments 1 and 2
Fig. 2

a Mean RTs in Phase 3 (testing) in all irrelevant-half conditions across different stimulus onset asynchrony (SOA) conditions for aligned and misaligned composites in Experiment 1. Negative SOAs indicate that the irrelevant half was presented first; positive SOAs indicate that the target half was presented first. b To emphasize the effect of holistic processing, the differences between different-face/same-name (DFSN) and same-face/same-name (SFSN) in all SOA conditions for aligned and misaligned trials are plotted. c To emphasize the effect of response interference, the differences between different-face/different-name (DFDN) and DFSN in all SOA conditions for aligned and misaligned trials are plotted. The asterisks indicate significant effects (with corrections for multiple comparisons). Error bars represent standard errors of the means

To separately examine the effects of temporal integration, holistic processing, and response interference, three 4 × 2 × 2 repeated measures ANOVAs were conducted on the correct RTs in Phase 3. Each ANOVA involved the factors SOA (–200/–50/50/200 ms), Alignment (aligned/misaligned), and Irrelevant-Half Condition (SFSN vs. DFDN for temporal integration; SFSN vs. DFSN for holistic processing; DFSN vs. DFDN for response interference). Scheffé’s tests were used to follow up significant interaction effects.
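With 48 analyzed participants, a within-subject effect whose factors have k1, k2, … levels has df_effect = (k1 − 1)(k2 − 1)… and df_error = df_effect × (n − 1), assuming no sphericity correction (which the integer dfs reported below suggest). Partial eta squared can then be recovered from F as F·df_effect / (F·df_effect + df_error). The helpers below are a hypothetical sketch of this arithmetic. For example, the SOA main effect in the temporal integration analysis (F = 31.55) has df = (3, 141) and gives η²p ≈ .402.

```python
from typing import Tuple


def rm_anova_df(n_subjects: int, levels: Tuple[int, ...]) -> Tuple[int, int]:
    """Effect and error df for a within-subject effect (no sphericity correction).

    levels holds the number of levels of each factor in the effect,
    e.g. (4,) for SOA or (4, 2) for SOA x Alignment.
    """
    df_effect = 1
    for k in levels:
        df_effect *= k - 1
    return df_effect, df_effect * (n_subjects - 1)


def partial_eta_squared(f: float, df_effect: int, df_error: int) -> float:
    """eta_p^2 = SS_effect / (SS_effect + SS_error), rewritten in terms of F and df."""
    return f * df_effect / (f * df_effect + df_error)
```

The same arithmetic applies to Experiment 2, where 49 participants and two-level factors give df = (1, 48) for every effect.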

Temporal integration (SFSN vs. DFDN)

Replicating Singer and Sheinberg (2006), this ANOVA revealed a main effect of SOA, F(3, 141) = 31.55, MSE = .0011, \( \eta_{\rm{p}}^2 \) = .402, p < .0001, with shorter RTs when the irrelevant half preceded the target half (–200/–50 ms) than when the target half was shown first (50/200 ms) (Scheffé’s tests, ps < .05). RTs were also shorter for misaligned than for aligned composites, F(1, 47) = 4.40, MSE = .0079, \( \eta_{\rm{p}}^2 \) = .086, p = .04. The difference between SFSN and DFDN approached significance, F(1, 47) = 3.29, MSE = .0015, \( \eta_{\rm{p}}^2 \) = .066, p = .076. Critically, there was an interaction between irrelevant-half condition and alignment, F(1, 47) = 6.84, MSE = .127, \( \eta_{\rm{p}}^2 \) = .127, p = .012: Longer RTs for DFDN than for SFSN were found for aligned (Scheffé’s test, p < .002) but not for misaligned (Scheffé’s test, p > .54) composites. Also, the interaction between SOA and alignment was significant, F(3, 141) = 3.20, MSE = .001, \( \eta_{\rm{p}}^2 \) = .064, p = .025, revealing larger SOA differences for aligned than for misaligned composites. No other results were significant (Fs < 1.6, ps > .19).

Holistic processing (SFSN vs. DFSN)

For holistic processing, a significant main effect of SOA, F(3, 141) = 32.34, MSE = .0011, \( \eta_{\rm{p}}^2 \) = .408, p < .0001, revealed shorter RTs when the irrelevant half came first or when the target half led by 50 ms than when the target half led by 200 ms (Scheffé’s tests, ps < .05). The interaction between irrelevant-half condition and alignment was significant, F(1, 47) = 4.75, MSE = .0014, \( \eta_{\rm{p}}^2 \) = .092, p = .034. Surprisingly, there was no difference between SFSN and DFSN for aligned composites (Scheffé’s test, p > .46), but overall RTs were shorter for DFSN than for SFSN for misaligned composites (Scheffé’s test, p = .027). The significant interaction between SOA and alignment, F(3, 141) = 3.81, MSE = .0012, \( \eta_{\rm{p}}^2 \) = .075, p = .012, revealed larger SOA differences for aligned than for misaligned composites. No other results were significant (Fs < 1.40, ps > .24).

Response interference (DFSN vs. DFDN)

For response interference, a significant main effect of SOA, F(3, 141) = 28.33, MSE = .0013, \( \eta_{\rm{p}}^2 \) = .376, p < .0001, revealed shorter RTs when the irrelevant half appeared first (Scheffé’s tests, ps < .05). RTs were also shorter for misaligned than for aligned composites, F(1, 47) = 9.82, MSE = .0073, \( \eta_{\rm{p}}^2 \) = .173, p = .003, and for DFSN than for DFDN, F(1, 47) = 13.59, MSE = .0009, \( \eta_{\rm{p}}^2 \) = .224, p = .0006. Critically, the interaction between SOA and irrelevant-half condition, F(3, 141) = 2.99, MSE = .0011, \( \eta_{\rm{p}}^2 \) = .06, p = .033, revealed that response interference was significant when the target preceded the irrelevant half by 50 ms (Scheffé’s test, p < .001), but not at other SOAs (Scheffé’s tests, ps > .54). No other results were significant (Fs < .97, ps > .41).

Discussion

We replicated the temporal integration effect for aligned composites (Singer & Sheinberg, 2006). This effect was reduced for misaligned composites. While holistic processing and response interference may both contribute to the effect at different SOAs, our results indicate that the integration for aligned composites cannot be accounted for by holistic processing: When irrelevant face halves shared the same name as the target, there was no significant disadvantage for a face half from a different face. However, a reversed holistic effect was found for misaligned composites, presumably because the temporally and spatially separated face halves were assigned to different tokens rather than integrated into a unified whole.

In contrast, response interference was observed when the target half was presented 50 ms before the irrelevant half, regardless of alignment. This pattern contrasts sharply with that for intact faces, for which interference reflects holistic processing rather than response interference (Richler, Cheung, et al., 2009). Experiment 2 directly examines the possibility of a double dissociation between holistic processing and response interference for simultaneously presented (0-ms SOA) versus temporally separated (50-ms SOA) face halves.

Experiment 2

Method

Participants

Fifty-five members of Vanderbilt University (31 female; mean age = 25.3 years, SD = 6.4; normal/corrected-to-normal vision) were compensated $6 for participation. The data from 6 participants who did not reach the training criterion (95% accuracy; see below) were discarded. All remaining participants performed above chance in all conditions.

Stimuli and procedures

All stimuli and procedures were identical to those of Experiment 1, except for the following changes. During learning, the training criterion was raised to 95% accuracy with a minimum of three training blocks for each phase, to match the procedure in Richler, Cheung, et al. (2009; see Footnote 2). During test, the unfamiliar-face condition was not included, and the two SOA conditions (0 and 50 ms) were blocked and counterbalanced across participants to prevent potential contextual influences.

Results and discussion

Training performance is reported in Table 1. Correct RTs in Phase 3 were log10-transformed and analyzed with trials excluded according to the same criterion as in Experiment 1 (0.6% of trials). Mean correct RTs for Phase 3 are illustrated in Fig. 3a.

Fig. 3

a Mean RTs in Phase 3 (testing) in all irrelevant-half conditions across the two SOA conditions for aligned and misaligned composites in Experiment 2. b To emphasize the effect of holistic processing, the differences between different-face/same-name (DFSN) and same-face/same-name (SFSN) in the two SOA conditions for aligned and misaligned trials are plotted. c To emphasize the effect of response interference, the differences between different-face/different-name (DFDN) and DFSN in the two SOA conditions for aligned and misaligned trials are plotted. The asterisks indicate significant effects (with corrections for multiple comparisons). Error bars represent standard errors of the means

Three 2 × 2 × 2 repeated measures ANOVAs were conducted on correct RTs in Phase 3. Each ANOVA involved the factors SOA (0/50 ms), Alignment (aligned/misaligned), and Irrelevant-Half Condition (composite effect, SFSN vs. DFDN; holistic processing, SFSN vs. DFSN; or response interference, DFSN vs. DFDN). Bonferroni-corrected planned comparisons were conducted to examine the effects of holistic processing and response interference for aligned and misaligned composites at 0 and 50 ms.
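Read literally, the planned comparisons form a family of 2 effects × 2 alignments × 2 SOAs = 8 tests. That family size is an assumption; the text does not state it explicitly. A minimal sketch of a Bonferroni adjustment over such a family, with hypothetical raw p values, multiplies each p by the family size and caps the result at 1:

```python
import itertools
from typing import Dict, Tuple

EFFECTS = ("holistic (SFSN vs. DFSN)", "response (DFSN vs. DFDN)")
ALIGNMENTS = ("aligned", "misaligned")
SOAS_MS = (0, 50)

Key = Tuple[str, str, int]

# Assumed family of planned comparisons: every effect x alignment x SOA cell.
PLANNED = list(itertools.product(EFFECTS, ALIGNMENTS, SOAS_MS))


def bonferroni(raw_p: Dict[Key, float]) -> Dict[Key, float]:
    """Bonferroni-adjusted p values: p * m, capped at 1, where m is the family size."""
    m = len(raw_p)
    return {key: min(1.0, p * m) for key, p in raw_p.items()}
```

Equivalently, each raw p can be compared with α/m (here .05/8 = .00625) instead of adjusting the p values themselves.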

In all three ANOVAs, the main effects of alignment and irrelevant-half condition and the interaction between these two factors were significant, Fs(1, 48) ≥ 3.95, ps ≤ .05. The three-way interaction between SOA, alignment, and irrelevant-half condition was significant for holistic processing, F(1, 48) = 5.18, MSE = .019, \( \eta_{\rm{p}}^2 \) = .097, p = .027, but not for response interference, F(1, 48) = 2.21, p = .14. No other results in the omnibus ANOVAs were significant (Fs < 2.11, ps > .15). Planned comparisons revealed that holistic processing (SFSN vs. DFSN) was only observed for aligned composites at 0 ms (p < .02), and response interference (DFSN vs. DFDN) was only observed for aligned composites at 50 ms (p < .04). No other comparisons were significant (ps > .34).

These results confirm a double dissociation for aligned faces: Holistic processing was observed only when the face halves were presented simultaneously, not when they were temporally separated, whereas response interference was observed only for temporally separated halves. For misaligned composites, unlike in Experiment 1, neither a reversed holistic effect nor response interference was observed, suggesting that interference is less reliable for misaligned than for aligned composites.

General discussion

Integrative processing is thought to be stronger for faces than for nonface objects (see, e.g., Farah et al., 1998), but the contributions of different types of integration have rarely been closely examined. Here we distinguished contributions from holistic processing and response interference. Our results suggest that holistic processing is mainly engaged when all parts of a test face are shown simultaneously and in the familiar configuration. The interaction between temporally separated face parts instead arises at the response stage. This is consistent with findings in temporal Stroop tasks (Glaser & Glaser, 1982) and word recognition tasks (Forget et al., 2010), suggesting domain-general mechanisms in response interference.

Our finding that holistic processing fails to operate when parts are presented separately in time may be due to the fact that our presentation conditions do not support natural eye movements. During free viewing of a face, eye movements can play an important role in the encoding of facial features (Henderson, Williams, & Falk, 2005). However, faces can be processed holistically in the absence of eye movements (Richler, Mack, et al., 2009), and extensive eye movements may be necessary only when faces are relatively close to the observer. Interestingly, recent work has suggested that holistic processing drops sharply with increasing size at such near distances (McKone, 2009). Our results suggest a possible reason for this: To the extent that large faces require several fixations, the need for temporal integration may limit holistic processing.

Our findings may help explain the temporal integration observed in other paradigms. For instance, Anaki et al. (2007) presented parts of a face in a brief sequence and found better performance for upright than for inverted orientation. Because the percept and response are confounded in those studies, the integration effects may instead be accounted for by response facilitation, since all parts led to the same response. Note also that even with unfamiliar faces, some response processes may still be engaged if a response is required (e.g., “this face is different from the target face”). Although our methods do not apply directly to these other designs, our results emphasize the importance of investigating the locus of temporal integration in such cases.

Although there is debate about whether holistic processing occurs during encoding (Farah et al., 1998; Tanaka & Farah, 1993) or arises because face parts are not treated independently during perceptual decisions (Richler et al., 2008; Wenger & Ingvalson, 2002), according to both hypotheses holistic processing refers to an integrative process operating during perception, prior to response selection or execution. Our findings with intact faces are consistent with this assumption and provide important temporal constraints for models of holistic processing.