A man enters a coffee shop to meet a friend. His next action is to search for the friend, who is already waiting among the customers. He spots his friend, his friend notices him, and immediately after eye contact is established, the friend smiles. Most people have had this kind of experience, and they usually feel pleased when they see a smiling face. “Feeling pleasant” can be regarded both as a highly reinforcing condition and as one’s verbal response to reinforcing stimuli. Researchers have compared the reinforcing values of stimuli, as measured by patterns of choosing (Martin, Yu, Martin, & Fazzio, 2006), using the paired-stimulus procedure (Fisher et al., 1992) and the multiple-stimulus presentation procedure (Windsor, Piché, & Locke, 1994).

Human facial expressions provide critical social cues and shape behaviors that allow individuals to adapt to social communities. Learning from facial expressions from an early age is crucial for developing social communication. Evidence suggests that facial expressions serve as antecedents for various types of social behaviors, such as social referencing and empathic responses during infancy (e.g., Feinman & Lewis, 1983; Zahn-Waxler, Robinson, & Emde, 1992). For example, 10-month-olds will typically reach out to strangers in the presence of their mothers’ positive facial expressions (Feinman & Lewis, 1983), and 14-month-olds exhibit apparent concern in response to an experimenter’s hurt facial expression (Zahn-Waxler et al., 1992). Such social behaviors induced by facial expressions cannot be established without visual attention toward the face as a prerequisite for discriminating facial expressions.

Children’s visual preferences for faces have been widely documented using automatic eye-tracking technologies. For example, infants’ visual fixation on faces increases throughout the first year of life (Frank, Vul, & Johnson, 2009), and 6-month-old infants fixate more on upright faces than on inverted or scrambled faces or objects (Gliga, Elsabbagh, Andravizou, & Johnson, 2009). The question that follows is: How do infants develop visual preferences for faces? Some researchers have proposed that faces may acquire reinforcement value through reinforcement learning (Behrens, Hunt, Woolrich, & Rushworth, 2008; Vernetti, Smith, & Senju, 2017).

Previous studies have demonstrated that the allocation of visual attention can be modified through reinforcement by nonsocial stimuli. Specifically, these studies have employed reinforcers such as the deflection of a needle on a meter (Rosenberger, 1973; Schroeder & Holland, 1968a, 1968b, 1969), gratings (Paeye, Schütz, & Gegenfurtner, 2016), tones (Paeye & Madelain, 2011), animal pictures (Wang et al., 2012), and money (Anderson, Laurent, & Yantis, 2013; Anderson & Yantis, 2013).

By contrast, few studies have examined whether social stimuli modify the allocation of visual attention through reinforcement learning. In one recent study, Vernetti et al. (2018) compared the reinforcement values of different social videos using gaze-contingent eye tracking and demonstrated that toddlers initially looked more often toward the face than toward toys. This finding suggests that, for gaze fixation, the reinforcement values of social stimuli are stronger than those of nonsocial stimuli.

When considering the face as a social stimulus, we must distinguish several variables that affect visual attention, such as speech (spoken or unspoken), movement (static or dynamic), and facial expression. A growing body of evidence, particularly in autism research, has reported that facial expressions function as discriminative stimuli for visual fixation, demonstrating that gaze fixation on certain facial parts (i.e., eyes, nose, mouth) is differentiated by facial expression (Eisenbarth & Alpers, 2011; Falck-Ytter, Bölte, & Gredebäck, 2013; Matsuda, Minagawa, & Yamamoto, 2015; de Wit, Falck-Ytter, & von Hofsten, 2008). Facial expressions can function as both the antecedents and the consequences of gaze fixation, yet previous studies have focused only on the former. Facial expressions have been shown to function as reinforcers of key-press behavior: smiles (happy facial expressions) reinforce key presses (Averbeck & Duchaine, 2009; Furl, Gallagher, & Averbeck, 2012; Heerey, 2014), whereas angry facial expressions punish them (Blair, 2003; Kringelbach & Rolls, 2003). Thus, facial expressions could also function as reinforcers of gaze fixation.

To create an experimental paradigm that permits the comparison of the reinforcement values of facial expressions in visual behavior, we combined a gaze-contingent paradigm (Vernetti et al., 2018; Vernetti et al., 2017) with a concurrent reinforcement schedule (Ferster & Skinner, 1957). In this paradigm, participants could animate one of two faces through their gaze behavior under concurrent random-ratio (RR) schedules. This contrasts with preferential looking paradigms, which have been used to examine preferences for visual stimuli in infants and toddlers (Fantz, 1965; Pierce et al., 2016; Teller, 1979): in typical preferential looking procedures, facial expressions are displayed on the monitor for several seconds and the fixation duration on the display is measured, whereas in our paradigm the faces were animated only when the participant looked at them. Two neutral expressions modeled by the same person are presented, and their positions (right/left) serve as conditioned stimuli. When the participant looks at one face (e.g., the right one), its expression changes from neutral to happy; when the participant looks at the other, its expression changes from neutral to angry. We also added bell sounds as auditory feedback to enhance participants’ attention to changes in facial expressions. Based on the extant literature, we expected that the gaze behavior of the majority of participants would be reinforced more by happy faces than by angry ones.

Method

Participants

In total, 20 Japanese volunteers (11 female, 9 male; aged 20–28 years) were recruited from the Department of Psychology at Keio University (undergraduate and graduate students) to participate in this study. All participants had normal or corrected-to-normal vision, and all provided written informed consent using a form approved by the Keio University Institutional Review Board.

Apparatus

Fixations were noninvasively recorded using an infrared eye tracker (Tobii X120, Tobii Technology Japan, Ltd., Minato-ku Takanawa, Japan) operating at 60 Hz. The velocity threshold was 35 pixels per window, and the distance threshold was 35 pixels. The characteristics of the eye tracker were as follows: accuracy, 0.5°; spatial resolution, 0.2°; drift, 0.3°. The Tobii X120 allows 30 × 22 × 30 cm of freedom of head movement. The eye tracker was placed in front of a 27-inch monitor (1920 × 1080 pixels), which subtended approximately 43.5° horizontally and 25.3° vertically. The experimental program was built with the Tobii SDK, and the raw eye-tracking data were used without a fixation detection algorithm, such as a sliding-window averaging method.

Stimuli

Sixteen stimuli (576 × 576 pixels, 12.0° × 12.0°) were produced by filming eight trained adults (four females, four males) as they demonstrated dynamic (changing from neutral to emotional) expressions of happiness and anger (eight expressions of happiness and eight of anger). The amount of reinforcement obtained during the experiment showed no distinctive difference between models (see Appendix Table 3). Neutral static faces were also presented; these were the first frames of the videos. All videos were edited using Adobe Premiere Pro CS6 and Adobe After Effects CS6 to ensure uniformity of luminance and facial size. The changes in expression were also slightly sped up or slowed down to compensate for differences in timing between the expressions produced by the different individuals. Each video began with a neutral expression, which was followed by an emotional (happy or angry) expression. Each clip was 2 s long.
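As a rough check on these dimensions, the angular size of a stimulus can be recovered from its pixel size, the screen geometry, and the viewing distance. The sketch below assumes a 27-inch 16:9 panel (approximately 59.8 cm wide at 1920 pixels) and the 85-cm viewing distance reported in the Procedure; these geometric details are our assumptions, not values taken from the original experimental program.

```python
import math

def visual_angle_deg(size_px: float, px_per_cm: float, distance_cm: float) -> float:
    """Visual angle (degrees) subtended by a stimulus of size_px pixels."""
    size_cm = size_px / px_per_cm
    return 2 * math.degrees(math.atan(size_cm / (2 * distance_cm)))

# Assumed geometry: 27-inch 16:9 monitor, ~59.8 cm wide at 1920 pixels.
px_per_cm = 1920 / 59.8

# A 576-pixel stimulus at an 85-cm viewing distance subtends ~12 degrees,
# matching the reported 12.0 x 12.0 degree stimulus size.
print(visual_angle_deg(576, px_per_cm, 85))
```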

Procedure

Every participant was tested individually in a quiet room, seated without a chin rest in a chair placed approximately 85 cm from the monitor. Before the experiment began, a 9-point calibration was conducted. The experiment was then introduced to the participants; their only task was to view the monitor. Between trials, each participant was instructed to focus their gaze on a fixation cross at the center of the monitor.

The experimental session included 128 trials during which each of the eight models was presented 16 times in random order. Figure 1 illustrates the sequence of events for a single trial. For each participant, the happy and angry facial expressions were always displayed on the same sides, and the positions of the two facial expressions were counterbalanced across the 20 participants. Each trial began with a fixation cross placed at the center of the screen. Once the participant had fixed their gaze on the cross for 500 ms, two neutral expressions (the first frames of the videos of emotional expressions) of one model were displayed side by side on the screen. Fixation on either the right or the left face for 200 ms triggered a 2,000-ms display of an animated or nonanimated face; the probability of reinforcement with an animated (emotional) face (RR 2.3; p = 0.43) was the same on both sides of the display. Both emotional faces were presented along with a bell sound to enhance participants’ attention to changes in facial expressions. In a pilot experiment under a concurrent FR 1 schedule without sound feedback, participants showed a strong side bias, such as repeatedly looking at the side where the first reinforcement had been obtained, and they also reported that it was challenging to maintain attention on the screen without sound feedback. Therefore, we added sound feedback and reduced the reinforcement rate. The intertrial interval was 500 ms. When a participant fixated on the face on either side of the display, the face on the other side disappeared. Thus, playback of the video clips was triggered by the participant’s fixation. Fixation data were not collected while the video was playing.
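To make the contingency concrete, the following is a minimal sketch of the trial logic in Python. All names, the gaze-sample representation, and the simplifications (e.g., no stimulus presentation or sound playback) are our own illustration; the original program was written with the Tobii SDK.

```python
import random

RR_PROBABILITY = 1 / 2.3   # RR 2.3 schedule: reinforcement on ~43% of choices
SAMPLES_FOR_CHOICE = 12    # 200 ms of dwell at 60 Hz ~ 12 consecutive samples

def run_trial(happy_side, gaze_samples):
    """Run one trial on a stream of gaze labels ('left', 'right', or None).

    A 200-ms fixation on either face registers a choice; the other face
    disappears, and an independent RR 2.3 draw decides whether the chosen
    face animates from neutral to its emotional expression.
    """
    run_side, run_length = None, 0
    for sample in gaze_samples:
        if sample is not None and sample == run_side:
            run_length += 1
        else:
            run_side, run_length = sample, 1
        if run_side is not None and run_length >= SAMPLES_FOR_CHOICE:
            reinforced = random.random() < RR_PROBABILITY  # same p on both sides
            expression = "happy" if run_side == happy_side else "angry"
            # On reinforced trials, the 2-s video and bell sound would play here.
            return run_side, expression if reinforced else "neutral"
    return None  # no choice registered on this trial
```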

Fig. 1

Sequence of events for a single trial. The side where the happy facial expressions were displayed was fixed for each participant but counterbalanced across participants. When participants fixated on the face on either side of the display, the face on the other side disappeared. The sound icon indicates the presence or absence of the bell sound.

Data Analysis

The frequency of first fixations (> 200 ms) on the side followed by the happy face was converted to a proportion, calculated by dividing the number of first fixations on the side followed by a happy face by the total number of trials. Thus, a proportion above .5 would indicate that fixation was reinforced more by happy faces, whereas a proportion below .5 would indicate that fixation was reinforced more by angry faces. The mean proportion of trials fixated on the side followed by happy faces was compared to chance (p = 0.5) using a one-sample t-test with an arcsine transformation. The arcsine transformation was applied to the proportion of trials fixated on the side followed by happy faces to equalize variance across different performance levels; the statistical conclusions based on the transformed and untransformed proportions did not differ. In addition, we conducted a univariate repeated-measures ANOVA to examine differences between quartiles of trials (trials 1–32, 33–64, 65–96, and 97–128). We also analyzed the proportion of trials fixated on each side using a binomial distribution for each participant.
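A minimal sketch of this analysis in Python (NumPy/SciPy) follows. The per-participant proportions are hypothetical placeholders, not the study's data, and the effect-size line assumes a one-sample Cohen's d (mean difference divided by the sample standard deviation).

```python
import numpy as np
from scipy import stats

# Hypothetical per-participant proportions of first fixations (> 200 ms)
# on the side followed by a happy face (placeholders, not the study's data).
props = np.array([0.58, 0.63, 0.44, 0.70, 0.55, 0.61, 0.49, 0.66, 0.52, 0.59,
                  0.72, 0.47, 0.60, 0.56, 0.64, 0.53, 0.68, 0.51, 0.62, 0.57])

# Arcsine (angular) transformation to equalize variance across proportions.
transformed = np.arcsin(np.sqrt(props))
chance = np.arcsin(np.sqrt(0.5))  # chance level (p = .5) on the same scale

# One-sample t-test of the transformed proportions against chance.
t, p = stats.ttest_1samp(transformed, popmean=chance)

# One-sample Cohen's d: mean difference divided by the sample SD.
d = (transformed.mean() - chance) / transformed.std(ddof=1)
print(f"t({len(props) - 1}) = {t:.3f}, p = {p:.3f}, d = {d:.3f}")
```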

Results

Table 1 shows the number and percentage of responses, the number of reinforcements obtained, and the first reinforcement by specific facial expression for each participant. Figure 2 illustrates the frequency of first fixations on the side followed by happy faces as a percentage. Overall, 80% of participants (n = 16) fixated first more often on the side followed by a happy face, whereas 20% of participants (n = 4) fixated first more often on the side followed by an angry face (M = 0.586, SD = 0.107, range = 0.438–0.844, 95% CI [0.536, 0.642]). A one-sample t-test with an arcsine transformation showed that fixation behaviors were reinforced more by the happy face, t(19) = 3.509, p = .002, d = .785.

Table 1 Number and percentage of responses, number of reinforcements obtained, and first reinforcement by type of facial expression for each participant
Fig. 2

A box plot with bee swarm plots for each participant, illustrating the proportion of first fixations (> 200 ms) on the side followed by a happy facial expression (128 trials). The broken line indicates the chance level (p = 0.5).

Table 2 shows the proportion of first fixations (> 200 ms) on the side followed by a happy facial expression for quartiles of trials (1–32, 33–64, 65–96, and 97–128) for each participant. The ANOVA indicated a significant effect of quartile, F(3, 57) = 3.02, p = .037, ηp² = 0.14, but post hoc tests with a Bonferroni correction revealed no significant differences between quartiles.

Table 2 The proportion of first fixations (> 200 ms) on the side followed by a happy facial expression for quartiles of trials (1–32, 33–64, 65–96, and 97–128) for each participant

Finally, for a binomial distribution with n = 128 and p = 0.5, proportions above 0.587 are significant at the .05 level. Using this criterion, 7 of the 20 participants fixated significantly more often on the side followed by a happy face, and no participant was more likely to fixate on the side followed by an angry face (Table 1).
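For reference, the 0.587 criterion is consistent with a two-tailed normal approximation to the binomial, 0.5 + 1.96 × √(0.25/128) ≈ 0.587. The sketch below, which also computes the exact binomial cutoff, reflects our reading of the criterion rather than a derivation stated in the original analysis.

```python
import math
from scipy import stats

n, p, alpha = 128, 0.5, 0.05  # trials per participant, chance probability

# Two-tailed normal approximation to Binomial(128, 0.5):
# critical proportion = 0.5 + 1.96 * sqrt(p * (1 - p) / n) ~ 0.587.
crit = p + stats.norm.ppf(1 - alpha / 2) * math.sqrt(p * (1 - p) / n)
print(f"approximate critical proportion: {crit:.3f}")

# Exact cutoff: the smallest count k with P(X >= k) <= alpha / 2.
k = next(k for k in range(n + 1) if stats.binom.sf(k - 1, n, p) <= alpha / 2)
print(f"exact cutoff: {k} of {n} trials ({k / n:.3f})")
```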

Discussion

This study found that visual fixations were reinforced more by happy faces than by angry ones. In particular, participants’ first fixations (> 200 ms) on one side or the other of the monitor occurred more frequently when a happy face followed, suggesting that, in visual fixation behavior, the reinforcement value of happy facial expressions is greater than that of angry facial expressions. However, happy faces were not strong reinforcers: across the 128 trials, only seven participants fixated on the side followed by a happy face significantly more often than chance. Furthermore, no significant increase across trials was observed.

According to Michael (1982), we should consider at least two effects of happy faces: one is evoking particular responses in their presence, and the other is strengthening responses that have been followed by them in the past. In our study, happy faces presented as consequences induced more visual fixation behavior; however, it is conceivable that happy faces were not strong reinforcers, because no learning effect was observed across trials. This result might relate to the fact that happy faces attract infants’ visual attention through biological preparedness, whereas subsequent learning might change the function of happy faces in adults’ gaze behaviors. Although infants fixate more on happy faces relative to angry or neutral faces (LaBarbera, Izard, Vietze, & Parisi, 1976), adults show preferential attention to fearful facial expressions compared to happy or neutral expressions (Pourtois, Grandjean, Sander, & Vuilleumier, 2004).

Previous studies have found that smiles (happy facial expressions) reinforce key-press behavior (Averbeck & Duchaine, 2009; Furl et al., 2012; Heerey, 2014), whereas angry facial expressions punish it (Blair, 2003; Kringelbach & Rolls, 2003). Our results provide an important extension of these findings by demonstrating that visual fixation behavior is also reinforced more by happy faces than by angry faces. Our results did not allow us to determine the absolute reinforcement values of all facial expressions, only the reinforcement values of happy faces relative to angry ones. Thus, we should consider the possibility that the effects observed in this study were due to the avoidance of angry faces.

Our results demonstrated that a gaze-contingent paradigm (Vernetti et al., 2018; Vernetti et al., 2017) combined with concurrent reinforcement schedules (Ferster & Skinner, 1957) can be used for a quantitative comparison of the reinforcement values of visual stimuli during visual fixation. Thus, this study succeeded in examining the functions of facial expressions as consequences, independent of antecedents, in contrast with previous studies that used the preferential looking paradigm (Fantz, 1965; Pierce et al., 2016; Teller, 1979). Advances in this experimental paradigm might make it easier for behavior analysts to conduct experiments using automatic eye tracking. In recent years, some studies in behavior analysis have used automatic eye tracking, focusing on visual fixations during discrimination learning (Dube et al., 2010; Perez, Endemann, Pessôa, & Tomanari, 2015; Steingrimsdottir & Arntzen, 2016). Researchers have examined the relationships between gaze fixation and error rates in matching-to-sample tasks (Dube et al., 2010; Steingrimsdottir & Arntzen, 2016) and between gaze fixation and chosen components of compound stimuli (Perez et al., 2015). The current findings provide opportunities for using automatic eye tracking not only for discrimination learning but also for preference assessment. Several studies have reported that eye gaze measured by behavioral observation can be used to identify reinforcing stimuli (Cannella-Malone, Sabielny, & Tullis, 2015; Fleming et al., 2010). Thus, our new experimental paradigm suggests that eye-tracking technology could be used to identify reinforcing stimuli automatically.

Our results may also expand upon the findings of studies that have attempted to conduct preference assessment during social interactions (Call, Shillingsburg, Bowen, Reavis, & Findley, 2013; Kelly, Roscoe, Hanley, & Schlichenmeyer, 2014). Such studies have used the selection of pictorial cards (Kelly et al., 2014) or time spent on a particular side of the room (Call et al., 2013) to determine whether social interactions functioned as reinforcers. The procedure used in our study requires only that participants view a display monitor, which would simplify the comparison of reinforcement values for individuals with limited response repertoires (e.g., infants, toddlers). This study, however, tested only visual fixation to compare the reinforcement values of social stimuli. Future studies are necessary to test whether the eye-tracking procedure used here can predict the function of social stimuli for different response topographies.

There were several limitations to this study. First, we used concurrent RR schedules, which limit control over confounding variables, such as differences in obtained reinforcement ratios. Future studies should use a concurrent-chains schedule (Mazur, 1991) to equate reinforcement ratios. Second, we did not implement a changeover delay (COD; Herrnstein, 1961) in the current procedure. Future studies should include a COD to avoid the accidental reinforcement of switching behavior. Third, we used only an RR 2.3 schedule. Future studies should compare different schedules or reinforcement ratios, including concurrent FR 1. Using other schedules may also help explain why there was no significant difference between quartiles of trials in this study.

Only about one-third of participants were significantly more likely to fixate on the side followed by a happy facial expression. This weak effect of happy faces might be due to the use of videotaped models in this study. For example, Laidlaw, Foulsham, Kuhn, and Kingstone (2011) reported that gaze behavior toward a physically present confederate differed greatly from gaze behavior toward a videotaped confederate. More naturalistic tasks might be crucial for examining the functions of facial expressions as consequences in everyday life.

Nevertheless, the current method serves to bridge the gap between developmental psychology and human operant studies by establishing a behavior-analytic procedure for understanding visual fixation responses to social stimuli, one that clarifies reinforcement contingencies in a manner not discernible through traditional developmental psychology approaches. In particular, this study suggests that social visual engagement, which has been studied in the context of autism research (Constantino et al., 2017; Jones & Klin, 2013; Klin, Jones, Schultz, Volkmar, & Cohen, 2002), can be studied in the context of behavior analysis. Furthermore, this study may also provide insights for intervention studies exploring empathic responses (Schrandt, Townsend, & Poulson, 2009) or social referencing (Pelaez, Virues-Ortega, & Gewirtz, 2012), which would regard visual fixation on facial expressions as a prerequisite for such social behavior.