The search for reliable physiological manifestations of specific emotions is a long-established and prolific research direction within affective psychology (Cacioppo, Klein, Berntson, & Hatfield, 1993), psychophysiology (Cacioppo & Tassinary, 1990), and affective computing (Picard, 1997). This work was largely inspired by the famous dictum by William James (1884) that the "feeling" of "bodily changes [that] follow directly the perception of the exciting fact ... is the emotion" (pp. 189–190). James envisioned that "as physiology advances," research would "begin to discern" that "the bodily affections characteristic of any one of the standard emotions" are "almost infinitely numerous and subtle" (p. 191). These statements were interpreted (Friedman, 2010) or perhaps misinterpreted (Ellsworth, 1994; Gendron & Feldman Barrett, 2009) as suggesting that each emotion manifests itself via a unique configuration of perceptible physiological changes, a bodily "signature" that characterizes each emotion. In the decades that followed, several claims of reliable patterning in cardiovascular, electromyographic, and neuroendocrine responses associated with distinct emotions emerged (e.g., Ax, 1953; Ekman, Levenson, & Friesen, 1983). The contemporary literature on this topic remains replete with assertions that, when several psychophysiological indices are used as predictors to distinguish between two or more induced emotions, various combinations of them can yield a satisfactory degree of differentiation. A review of 134 studies concluded that there is "considerable [autonomic nervous system] response specificity in emotion" (Kreibig, 2010, p. 394).

As the field of affective psychology developed, authors from a variety of theoretical backgrounds proposed diverse conceptual schemes for distinguishing between constructs such as "emotion" and "affect." Despite considerable differences, several of these conceptualizations converged on the notion of a hierarchy, with "core affect" (Feldman Barrett & Bliss-Moreau, 2009; Russell, 2003) or feelings generated by primal "affective processing" (Walla, 2018; Walla & Panksepp, 2013) providing an experiential substrate upon which more nuanced experiences, including particular emotions, are superimposed. These elemental constructs, namely "core affect" and feelings resulting from primal "affective processing," have been defined by a small set of dimensions, primary among them being valence (pleasure-displeasure) and perceived activation (or arousal). Similar to claims of specificity in indices manifesting distinct emotions, claims of specificity were also made about psychophysiological indices reflecting these basic affective dimensions (e.g., Frankenhaeuser, 1991).

In the last two decades, this line of investigation experienced a resurgence of interest, following the growth of the field of affective computing and the application of multivariate pattern classification algorithms (Calvo & D'Mello, 2010; Egger, Ley, & Hanke, 2019). When working with data from one sample of participants and fixed sets of emotional stimuli and psychophysiological indices, automated classifiers have reached high levels of differentiation, leading researchers to conclude that distinct emotions or affective dimensions are characterized by specific patterns of physiological responses (e.g., Christie & Friedman, 2004; Kolodyazhniy, Kreibig, Gross, Roth, & Wilhelm, 2011; Kragel & LaBar, 2014). Moreover, consistent with conceptual views that distinct emotions are subserved by distinct brain regions or networks of brain regions (e.g., Hamann, 2012; Panksepp, 2007), the application of pattern classification algorithms to neuroimaging data has supported claims that emotion specificity may also be discernible in the brain (Kragel & LaBar, 2014, 2016).

Such claims have always faced considerable conceptual and methodological challenges. Soon after the death of William James in 1910, Walter Cannon (1915) raised the first objections to the notion of specificity in the patterns of physiological responses that accompany different emotions. The crux of his argument was that most organs that respond to emotional excitation are jointly innervated by the sympathetic and parasympathetic branches of the autonomic nervous system and thus tend to respond in a "diffuse," undifferentiated manner, indicative of general arousal: "The visceral changes which accompany fear and rage are the result of discharges by way of sympathetic neurones. It will be recalled that these neurones are arranged for diffuse rather than for narrowly directed effects" (p. 276). From this, Cannon deduced that a meaningful degree of differentiation across emotions is unlikely: "In terror and rage and intense elation, for example, the responses in the viscera seem too uniform to offer a satisfactory means of distinguishing states which, in man at least, are very different in subjective quality" (p. 280). The suspected "uniformity" (i.e., nonspecificity) of physiological responses during diverse emotions became not only the central point of Cannon's (1927) formal critique of James' theory of emotion but also a major conceptual pillar of cognitive theories of emotion that emerged in the 1960s and beyond (e.g., Lazarus, 1966; Schachter, 1964).

Furthermore, the search for distinct patterns in physiological responses across different emotions ran into methodological complications, some of which remain unaddressed. Early psychophysiological investigations demonstrated that, besides any possible specificity in physiological responses associated with an emotion, researchers should also anticipate at least some degree of specificity associated with individuals (i.e., idiosyncratic patterns of responding across physiological channels) and specificity associated with the situations in which the emotion is embedded (Engel, 1960; Lacey, 1950; Lacey, Bateman, & Vanlehn, 1953; Lacey & Lacey, 1958). These complications not only make the detection of reliable patterns in physiological responses to different emotions more difficult, but they also reduce the likelihood that a pattern found in one sample of participants tested under one set of experimental conditions could be replicated in different samples and under different conditions.

At present, while numerous studies have reported promising pattern-classification results that are described as "subject-independent" and "stimulus-independent," no known model has been successfully cross-validated in a fully independent sample (i.e., different from the one used to train the classifier). This void underscores both the conceptual and technical challenges involved in this undertaking and places the reliability of the many seemingly promising results obtained with pattern classification algorithms in doubt (Quigley & Feldman Barrett, 2014). Indeed, an updated meta-analysis of indices of autonomic nervous system responses to emotion-induction procedures found no reliable evidence of specificity in response patterns (Siegel et al., 2018).

Besides the lack of consistent evidence of specificity among autonomic variables, emerging evidence from basic neuroscience and human neuroimaging is also casting doubt on the notion that distinct emotions are controlled by distinct and circumscribable brain areas or networks (Lindquist, Wager, Kober, Bliss-Moreau, & Feldman Barrett, 2012; Pessoa, 2017). Moreover, it seems unlikely that there are distinct brain areas or networks specializing in only pleasant or only unpleasant states (Berridge, 2019; Lindquist, Satpute, Wager, Weber, & Feldman Barrett, 2016). Instead, there is mounting evidence of valence-general distributed networks that dynamically switch to positive or negative "modes" (Berridge, 2019).

For example, the amygdala, once considered the "fear center" of the brain (LeDoux, 2014) or the central component of a system that evolved to preferentially or exclusively deal with negative stimuli (Carretié, Albert, López-Martín, & Tapia, 2009), has been found to respond to both pleasant and unpleasant stimuli in both basic animal research (Murray, 2007) and human neuroimaging studies (Costafreda, Brammer, David, & Fu, 2008; Sergerie, Chochol, & Armony, 2008). Consequently, in contemporary reconceptualizations of its role, the amygdala is described as a detector and encoder of the biological "relevance" or "value" of both positive and negative multimodal stimuli (Pessoa & Adolphs, 2010; Sander, Grafman, & Zalla, 2003; Scharpf, Wendt, Lotze, & Hamm, 2010; Weierich, Wright, Negreira, Dickerson, & Feldman Barrett, 2010; Zald, 2003). In turn, the amygdala is embedded within a broader valence-general network (including, but not limited to, the insula, the anterior cingulate, the medial, dorsal, and orbital divisions of the prefrontal cortex, and the ventral striatum) that has been described as an "affective neural reference space" or "affective workspace" (Feldman Barrett & Bliss-Moreau, 2009; Lindquist et al., 2016). Nevertheless, within the amygdala itself, studies involving single-unit recordings have identified distinct subpopulations of neurons that respond preferentially to (and thus may encode for) either positive or negative stimuli (Maren, 2016; O'Neill, Gore, & Salzman, 2018), as well as the positive or negative current state of the organism (Belova, Paton, & Salzman, 2008). While there is no obvious anatomical separation between each neuronal type, the identity of the valence-specific neurons can be distinguished by examining their synaptic properties and projection sites (Fadok, Markovic, Tovote, & Lüthi, 2018; Pignatelli & Beyeler, 2019).

The present state of the evidence suggests that the long search for psychophysiological "signatures" of distinct emotions has not produced models that have been independently and reliably cross-validated, appears increasingly incompatible with emerging neuroscientific data, and at this point may benefit from a realignment. Arguably, a more conceptually and empirically justified research direction would be to revert to a search for a reliable and practical marker of central processes that can reflect differences in core affective valence, such as activity in the central nucleus of the amygdala (Berridge, 2019). A measure that fits this description is the acoustic eyeblink startle reflex. The eyeblink startle exhibits a pattern of affective modulation, such that the amplitude of the electromyographic response of the orbicularis oculi to short bursts of white noise has been found to be larger in the presence of negative stimuli than positive ones (Boecker & Pauli, 2019; Bradley, Cuthbert, & Lang, 1999; Grillon & Baas, 2003; Lang & Davis, 2006). Startle responses have also been shown to reflect affect-regulation efforts, such that instructions to participants to suppress negative emotion have been found to reduce startle amplitude, whereas instructions to enhance negative emotion have been found to increase it (Eippert et al., 2007; Jackson, Malmstadt, Larson, & Davidson, 2000).

The rationale for the use of the acoustic eyeblink startle as an index of valence encoding in the brain relies on the fact that its underlying neuroanatomy and neurophysiology have been extensively investigated. The acoustic startle reflex involves only a few synapses (Yeomans & Frankland, 1995): the nucleus cochlearis receives the auditory stimulus from the ear and projects it to the nucleus reticularis pontis caudalis, which in turn projects to the facial nerve (cranial nerve VII). The temporal branch of the facial nerve innervates the frontalis and corrugator supercilii muscles as well as the orbicularis oculi, the muscle that surrounds the eye and causes the eyelids to close. This simple pathway is intersected at the nucleus reticularis pontis caudalis by a direct projection from the central nucleus of the amygdala and the bed nucleus of the stria terminalis (Koch & Schnitzler, 1997; Lang & Davis, 2006).

The potentiation of startle associated with negative or unpleasant sensory stimulation is abolished by disruptions of the projection from the central nucleus of the amygdala (e.g., Hitchcock & Davis, 1991; Rosen, Hitchcock, Sananes, Miserendino, & Davis, 1991). In humans, brain lesions that include the amygdala abolish the potentiation of startle in the presence of negative affective stimuli (Angrilli et al., 1996; Buchanan, Tranel, & Adolphs, 2004; Funayama, Grillon, Davis, & Phelps, 2001). In addition, neuroimaging investigations in which startle was assessed concurrently with brain scanning and the analysis was based on a hypothesis-driven region-of-interest approach have shown an association between amygdala activation and startle potentiation in the presence of negative stimuli (Anders, Lotze, Erb, Grodd, & Birbaumer, 2004a; Kuhn et al., 2020; Pissiota et al., 2003; van Well, Visser, Scholte, & Kindt, 2012). On the other hand, the neural basis of the attenuation of startle due to positive or pleasant sensory stimulation is not as clearly understood. Nevertheless, it has been shown that lesions of the nucleus accumbens prevent the attenuation of startle in the presence of rewarding stimuli (Koch, Schmid, & Schnitzler, 1996), suggesting that the attenuating effect may be mediated by the interconnections of the amygdala and the bed nucleus of the stria terminalis with the nucleus accumbens (Salgado & Kaplitt, 2015; Zorrilla & Koob, 2013).

Although most studies on the affective modulation of startle have been conducted with pleasant and unpleasant pictorial stimuli, affective modulation has also been found when the manipulation of affective valence involved other sensory modalities. Experimental stimuli have included film snippets (e.g., Bos, Jentgens, Beckers, & Kindt, 2013; Jansen & Frijda, 1994; Kumari, Kaviani, Raven, Gray, & Checkley, 2001), mental imagery (e.g., Vrana & Lang, 1990), music (e.g., Roy, Mailhot, Gosselin, Paquette, & Peretz, 2009), sounds (Bradley & Lang, 2000), odors (e.g., Miltner, Matjak, Braun, Diekmann, & Brody, 1994), pain (e.g., Crombez, Baeyens, Vansteenwegen, & Eelen, 1997), and threat of pain (e.g., Naliboff et al., 2008; Twiss et al., 2009). These findings suggest that the affective modulation of startle is not limited to pictorial or visual stimuli.

The eyeblink startle response has traditionally been assessed with electromyography of the orbicularis oculi (Blumenthal et al., 2005), based mainly on the argument that electrical activity in the orbicularis oculi is the most sensitive indicator of startle since it precedes (by as much as 20–60 ms) the movement of the eyelid (given that muscular contraction has to overcome the inertia of the eyelid; Blumenthal et al., 2005). However, a promising alternative is infrared reflectance oculography (irROG). This approach uses a small infrared light-emitting diode and a phototransistor positioned in front of the eye. As the eye closes during a blink, the nature of the surface upon which the infrared beam is reflected changes (i.e., when the eyelid is open, the beam is reflected on the cornea, iris, sclera, and conjunctiva, whereas when the eyelid closes, the beam is reflected on the skin of the eyelid). As a result, the amount of light detected by the phototransistor changes due to differences in the reflectance of the materials and the fact that the eyelid is closer to the diode, resulting in changes in the voltage output of the phototransistor. The amplitude of the electromyographic response has been found to correlate significantly with the speed of eyelid closure measured with irROG (r = .58 to .81; Anders, Weiskopf, Lule, & Birbaumer, 2004b), but the irROG signal is delayed compared to the electromyographic one (Lovelace, Elmore, & Filion, 2006).

This basic design has been described by several authors over the years (e.g., Anders et al., 2004a; Flaten, Vaksdal, & Hugdahl, 1989; Hoffman, Cohen, & English, 1985; Lovelace et al., 2006) but has remained underutilized, perhaps due to the absence of commercially available devices in conjunction with the wide availability of electromyography instruments in psychophysiological laboratories. However, irROG may offer additional practical advantages over electromyography, which could expand the application of eyeblink startle to more research contexts. First, unlike electromyography, irROG is a contactless method, thus eliminating the need for skin preparation to reduce impedance (i.e., skin abrasion, electroconductive gel), the placement of electrodes on the skin with adhesives, and therefore the potential for skin irritation. Second, irROG is not affected by ambient electromagnetic radiation, thus eliminating the need for shielding. Third, while the electromyographic signal requires multistage preprocessing before startle responses can be quantified (i.e., amplification, filtering, rectification, integration; see Blumenthal et al., 2005), the signal resulting from irROG is comparatively simpler to process. Finally, irROG may be less susceptible to artifacts associated with head and bodily movement, as well as activity in facial muscles (e.g., extraneous grimacing or squinting).

In the only known application of irROG to the study of affective modulation of startle, Anders, Eippert, Weiskopf, and Veit (2008) manipulated affective valence through either visual (n = 12; pictures from the International Affective Picture System; Lang, Bradley, & Cuthbert, 2008) or auditory (n = 22; non-linguistic human, animal, and environmental sounds) stimuli and found that (a) startle was stronger for negative than positive stimuli, with no difference for high-arousal versus low-arousal stimuli (with participants exposed to visual and auditory stimuli pooled together in the same analysis), and (b) startle was more strongly correlated with subjective valence ratings than arousal ratings. However, this study not only combined visual and auditory stimuli but was also conducted inside a magnetic resonance imaging scanner.

In the present study, we aimed to investigate the ability of contactless, irROG-based assessment of acoustic startle to differentiate between exposures to pleasant, neutral, and unpleasant pictorial stimuli, replicating the electromyography-based methodological approach of earlier studies by Vrana, Spence, and Lang (1988), and Bradley, Cuthbert, and Lang (1993). Specifically, we examined the amplitude of the eyeblink response to acoustic startle while viewing pictures with normatively pleasant, neutral, or unpleasant content using images from the International Affective Picture System (IAPS; Lang et al., 2008). Our hypothesis was that the amplitude of the startle eyeblink would be larger while viewing unpleasant than pleasant images, with responses while viewing affectively neutral images positioned between them. In addition, we examined the effect of different lengths of exposure to the affect-laden images within a time frame (1 to 5 s) that has been shown to manifest affective modulation (Bradley et al., 1993), and thus expected that affective modulation would be present throughout this window.

Methods

Participants

Electromyography-based studies examining the affective modulation of the startle eyeblink amplitude have reported medium to large effect sizes (Bradley, Codispoti, Cuthbert, & Lang, 2001; Bradley, Cuthbert, & Lang, 1996a; Cook, Hawk, Davis, & Stevenson, 1991; Vrana et al., 1988). Thus, power calculations for a three-condition (pleasant, neutral, unpleasant) within-subjects analysis of variance (ANOVA) were based on a medium effect (f = .25), α = 0.05, 1-β = 0.80, correlated repeated measurements (r = 0.50), and a violation of the assumption of sphericity (ε = 0.70), yielding a required sample size of 35 participants.

Participants were eligible if they (a) were between the ages of 18 and 45 years, (b) had normal vision or wore corrective contact lenses (but not eyeglasses), (c) had no history of mental health problems (e.g., anxiety, depression, bipolar disorder, panic episodes, posttraumatic stress disorder, history of traumatic life events), (d) had no history of hearing problems (e.g., hyperacusis, hearing loss, tinnitus), (e) had no history of neurological problems (e.g., concussion, epilepsy), (f) had no history of migraine headaches, and (g) reported no objection to viewing gruesome or erotic images. All study procedures and stimuli were preapproved by the Institutional Review Board. Participants were informed in advance of all experimental methods and possible risks before providing written informed consent.

The initial sample included 43 volunteers who responded to an electronic message sent to the members of a large university community, passed the initial eligibility screening, and underwent testing in the laboratory. However, the data obtained from seven participants did not include the minimum number of viable eyeblinks (i.e., at least 33% within each valence category; see Data Processing) and were, therefore, excluded from further analyses. This resulted in a final sample size of 36 (14 women, 22 men, Mage ± SD = 24 ± 8 years).

Apparatus

Eyeblink data were collected with a commercially available startle response system equipped with a photoelectric cell (model SR-HLAB, San Diego Instruments, Inc., San Diego, CA). Positioned at the participant's eye level, the detector measures the amount of infrared light reflected from the eye.

Stimuli

The IAPS (Lang et al., 2008) contains over 600 standardized images of emotionally evocative content across a wide range of semantic categories. The images have been rated by a normative sample by level of affective valence (pleasant-unpleasant), arousal (low-high), and dominance (low-high), using the 9-point rating scales of the Self-Assessment Manikin (SAM; Bradley & Lang, 1994). Following methods used in Vrana et al. (1988) and Bradley et al. (1993), the following 36 IAPS images were used in the present study: (1) 12 unpleasant images (9410, 3000, 3010, 3053, 3060, 3080, 3130, 3150, 6260, 6313, 6350, 6570), (2) 12 neutral images (7004, 7010, 7020, 7060, 7020, 7060, 7080, 7090, 7110, 7175, 7491, 7950), and (3) 12 pleasant images (8030, 8180, 8370, 8420, 8470, 1710, 4608, 4611, 4659, 4669, 4672, 4810). To maximize the amplitude differences between valence conditions, we selected images with (a) the highest normative valence ratings (most pleasant) for the pleasant content (7–9 on SAM-Valence), (b) the lowest normative valence ratings (most unpleasant) for the unpleasant content (1–3 on SAM-Valence), and (c) high normative arousal ratings (7–9 on SAM-Arousal) for both pleasant and unpleasant images. Accordingly, the selected pleasant images averaged 7.52 ± 0.33 for valence and 6.79 ± 0.29 for arousal, whereas the selected unpleasant images averaged 1.63 ± 0.20 for valence and 7.71 ± 0.14 for arousal. Neutral images were selected at random from images rated in the middle of the valence scale (i.e., 4–6 on SAM-Valence) and averaged 4.89 ± 0.20 for valence and 2.17 ± 0.23 for arousal.

Procedure

A photoelectric cell probe was mounted on a headset with a bendable arm and was pointed at the sclera of the left eye. The distance between the front edge of the probe and the eye was just enough to avoid the eyelashes touching the probe during a blink. The gain of the device was adjusted for each participant individually, to prevent clipping (range = 0.5–3.5), and was kept constant throughout each session. Headphones were worn over the headset (see Fig. 1).

Fig. 1. Illustration of the placement of the infrared reflectance oculography probe

Participants sat 1 m away from a 40-inch monitor displaying a large cross centered on the screen. Participants were instructed that they would be viewing a series of images varying in content and that, although some images might be difficult to look at, it was important to continue to attend to the images during the entire viewing period. Participants were told they would hear occasional noises over the headphones but to ignore the noises and attend to the images.

The experimental session began with a 5-min acclimation period of 65-dB(A) broadband noise played through the headphones. Following the 5-min acclimation period, the slide presentation and startle protocol began. The IAPS images were displayed for 6 s each, with an interstimulus interval (cross display) varying between 16 and 24 s. The images were grouped in three blocks of 12 images each. Each block consisted of four pleasant images (e.g., erotica, pets, sports), four neutral images (e.g., household objects, geometric shapes), and four unpleasant images (e.g., mutilations, burn victims, threatening weapons). The order of the images was randomized for each block.
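
The block structure described above can be sketched as follows. This is only an illustration under stated assumptions, not the presentation software used in the study: the function name `build_session`, the placeholder image IDs, the random assignment of images to blocks, and the uniform sampling of the interstimulus interval are all hypothetical choices that the text does not specify.

```python
import random

VALENCES = ("pleasant", "neutral", "unpleasant")


def build_session(images_by_valence, n_blocks=3, per_valence=4,
                  isi_range_s=(16.0, 24.0), seed=None):
    """Return a list of blocks; each block is a list of (image, valence, isi_s)."""
    rng = random.Random(seed)
    # Assumption: images are dealt to blocks at random within each valence.
    pools = {v: rng.sample(list(images_by_valence[v]), len(images_by_valence[v]))
             for v in VALENCES}
    blocks = []
    for b in range(n_blocks):
        trials = []
        for v in VALENCES:
            start = b * per_valence
            trials.extend((img, v) for img in pools[v][start:start + per_valence])
        rng.shuffle(trials)  # image order randomized within each block
        # Assumption: the interstimulus interval is drawn uniformly from 16-24 s.
        blocks.append([(img, v, rng.uniform(*isi_range_s)) for img, v in trials])
    return blocks


# Hypothetical placeholder IDs, not the IAPS numbers used in the study.
example_images = {v: [f"{v[:3]}_{i:02d}" for i in range(12)] for v in VALENCES}
session = build_session(example_images, seed=1)
```

Each of the three resulting blocks contains four images per valence category in a shuffled order, and every image appears exactly once in the session.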

Eyeblinks were elicited by a white-noise startle stimulus (50 ms, with instantaneous rise time) presented binaurally through wired headphones. The sound pressure level was calibrated to 105 dB(A) at the headphones with a digital impulse sound level meter (model CEL-254, Casella, Buffalo, NY, USA), which itself had been calibrated prior to each session with an acoustic calibrator (model CEL-284/2, Casella, Buffalo, NY, USA).

Following the appearance of each image, an acoustic startle probe was either not presented (image with no startle) or was presented after a 1-, 3-, or 5-s time delay (image viewing time). Each of these four possibilities (the three time-delay periods and an image unaccompanied by a startle stimulus) occurred once within each block. Additionally, for any single block, two of the three valence categories contained all three possible time delays (1, 3, or 5 s) and an image presented without a startle stimulus. Accordingly, the occurrence of all three time delays and the no-startle image throughout the pleasant, neutral, and unpleasant conditions was balanced across the three blocks. To reduce the predictability of the startle probes, each block included two startle stimuli that occurred randomly between images (interstimulus startles, not included in the analysis). Background 65-dB(A) broadband noise (same as during the acclimation period) was present during the entire experimental session.

Data processing

The signal collected from the photoelectric cell sensor was sampled at 1000 Hz and was segmented into 500-ms epochs following each startle probe. The data were then imported into MATLAB (version 2019a, MathWorks, Natick, MA, USA) for analysis.
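
The segmentation step could look roughly like the sketch below, written in Python rather than MATLAB for illustration. The function name `extract_epochs` and the representation of probe onsets as sample indices are assumptions; the study's own scripts are not described at this level of detail.

```python
def extract_epochs(signal, probe_onsets, fs_hz=1000, epoch_ms=500):
    """Cut one fixed-length epoch from `signal` after each probe onset.

    `probe_onsets` holds sample indices (an assumed representation).
    At 1000 Hz, a 500-ms epoch spans 500 samples.
    """
    n_samples = fs_hz * epoch_ms // 1000
    epochs = []
    for onset in probe_onsets:
        if onset < 0 or onset + n_samples > len(signal):
            raise ValueError(f"epoch starting at sample {onset} exceeds the recording")
        epochs.append(signal[onset:onset + n_samples])
    return epochs


# Hypothetical recording: 3 s of samples with probes at 100 ms and 2000 ms.
recording = list(range(3000))
epochs = extract_epochs(recording, [100, 2000])
```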

Given the delayed response of the eyelid compared to the electromyographic response of the orbicularis oculi, we extended the window for a valid response from the 21–120 ms recommended for electromyography-based studies (Blumenthal et al., 2005) to 21–200 ms. The amplitude of each eyeblink startle response was the highest peak of the signal detected within this window. According to Blumenthal et al. (2005), "the optimal method of onset latency determination is still an open question" (p. 10), with a variety of approaches having been implemented in the literature. Consistent with most prior approaches, we followed a piecewise linear regression approach but, to reduce the susceptibility of the solution to the idiosyncrasies of each dataset, we defined the onset as the mean of several criteria. Specifically, for all possible pairs of regressions from stimulus presentation to peak amplitude, we found the time points that (a) maximized the difference between the sum of squared residuals of the first and second regressions (because the baseline signal is characteristically noisy whereas the response signal is a smooth parabola), (b) maximized the difference between the slopes of the first and second regressions, (c) yielded the highest ratio of maximum slope difference divided by the total sum of squared residuals, (d) minimized the total sum of squared residuals, and (e) maximized the slope of the second regression (see Fig. 2). We also implemented several quality-control criteria as suggested by guidelines (Berg & Balaban, 1999; Blumenthal et al., 2005), rejecting responses that exhibited increasing or decreasing signal prior to response onset (as opposed to a stable baseline), absence of a peak or presence of multiple peaks within the 21–200-ms window, and absence of a return towards baseline following peak amplitude. All automatic solutions (peak amplitude, onset latency) were visually inspected.
For each participant, eyeblink peak amplitudes were transformed into z-scores prior to analysis.
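
A simplified sketch of these scoring steps is given below. It is not the MATLAB routine used in the study: it implements only criterion (d), the breakpoint that minimizes the total sum of squared residuals of two adjoining regression lines, whereas the onset described above is the mean of five criteria. The function names, the minimum of three points per segment, and the use of the sample standard deviation for the z-scores are assumptions.

```python
from statistics import mean, stdev


def peak_in_window(epoch, fs_hz=1000, window_ms=(21, 200)):
    """Return (sample index, value) of the highest point in the scoring window."""
    first = window_ms[0] * fs_hz // 1000
    last = window_ms[1] * fs_hz // 1000
    idx = max(range(first, last + 1), key=lambda i: epoch[i])
    return idx, epoch[idx]


def _line_ssr(xs, ys):
    """Sum of squared residuals of an ordinary least-squares line."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    intercept = my - slope * mx
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


def onset_min_total_ssr(epoch, peak_idx, min_pts=3):
    """Breakpoint between probe (sample 0) and peak that minimizes total SSR."""
    best_k, best_ssr = None, float("inf")
    for k in range(min_pts - 1, peak_idx - min_pts + 2):
        ssr = (_line_ssr(list(range(0, k + 1)), epoch[:k + 1])
               + _line_ssr(list(range(k, peak_idx + 1)), epoch[k:peak_idx + 1]))
        if ssr < best_ssr:
            best_k, best_ssr = k, ssr
    return best_k


def zscore(amplitudes):
    """Standardize one participant's peak amplitudes (sample SD assumed)."""
    m, s = mean(amplitudes), stdev(amplitudes)
    return [(a - m) / s for a in amplitudes]


# Synthetic epoch: flat baseline, linear rise from 60 ms to a peak at 100 ms,
# then a gradual return to baseline by 180 ms.
epoch = [0.0] * 500
for i in range(61, 101):
    epoch[i] = float(i - 60)
for i in range(101, 181):
    epoch[i] = 40.0 - 0.5 * (i - 100)

peak_idx, peak_val = peak_in_window(epoch)
onset_idx = onset_min_total_ssr(epoch, peak_idx)
```

On this noise-free synthetic epoch, the breakpoint falls where the baseline ends and the rise begins. With real, noisy signals a single criterion can be unstable, which is presumably why several criteria were averaged.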

Fig. 2. Example of the time-locked signal produced by the phototransistor and the results from the algorithm used to detect the peak amplitude and latency of the eyeblink startle response to the acoustic startle probe

We also conducted a preliminary screening to determine the percentage of viable eyeblinks. As noted earlier (see section on Participants), seven participants who had fewer than 33% viable eyeblinks for one or more valence categories were excluded from further analysis.
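
The exclusion rule can be expressed as a short check, sketched here under the assumption that each probed trial has already been flagged as viable or not. The names `viable_percent` and `meets_viability_criterion` and the example data are hypothetical.

```python
def viable_percent(flags):
    """Percentage of probed trials flagged as viable (True)."""
    return 100.0 * sum(flags) / len(flags) if flags else 0.0


def meets_viability_criterion(flags_by_valence, threshold_pct=33.0):
    """True only if every valence category reaches the viability threshold."""
    return all(viable_percent(flags) >= threshold_pct
               for flags in flags_by_valence.values())


# Hypothetical participants with nine probed trials per valence category.
retained = {
    "pleasant": [True] * 3 + [False] * 6,    # 33.3% viable
    "neutral": [True] * 7 + [False] * 2,
    "unpleasant": [True] * 6 + [False] * 3,
}
excluded = dict(retained, pleasant=[True] * 2 + [False] * 7)  # 22.2% viable
```

A participant is dropped as soon as any single category falls below the threshold, even if the overall percentage of viable eyeblinks is high.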

Statistical analyses

To determine if affective valence modulated startle eyeblink amplitude, a repeated-measures analysis of variance (ANOVA) was used, with image valence (pleasant, neutral, unpleasant) as the within-subjects factor. To quantify the habituation of startle, the change in eyeblink amplitude over the course of the testing session was calculated for each participant.

The data were checked for sphericity, but no violation was found, and therefore no adjustment was applied to the degrees of freedom. Results are reported as means (M) ± standard deviations (SD). Effect sizes are reported as partial eta-squared (ηp²) for the condition main effect and Cohen's d for pairwise comparisons.

Results

Among the 36 participants who met the minimum percentage of viable eyeblinks within each valence category (≥ 33%), 69.4% of all eyeblinks satisfied the quality criteria and could be scored. Eyeblink viability was 63.6% while viewing pleasant images, 72.2% while viewing neutral images, and 68.8% while viewing unpleasant images.

The repeated-measures ANOVA showed that mean eyeblink amplitude differed significantly between the image valence categories, F(2, 70) = 20.75, p < .001, ηp² = .37 (see Fig. 3). Post hoc tests with the Bonferroni correction showed that eyeblink amplitudes were significantly smaller when elicited during pleasant images (M = −0.42, SD = 0.45) compared to those elicited during unpleasant images (M = 0.20, SD = 0.34, p < .001, d = 1.56) and neutral images (M = 0.12, SD = 0.32, p < .001, d = 1.36). Eyeblink amplitudes elicited during neutral images were not significantly different from those elicited during unpleasant images (p = .37, d = 0.26).
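
As a consistency check, partial eta-squared can be recovered from an F ratio and its degrees of freedom through the identity F × df_effect / (F × df_effect + df_error). The helper below is an illustrative sketch (the function name is hypothetical), applied to the omnibus test reported above.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta-squared recovered from an F ratio and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Omnibus valence effect reported above: F(2, 70) = 20.75
eta_valence = partial_eta_squared(20.75, 2, 70)
print(round(eta_valence, 2))  # 0.37
```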

Fig. 3. Overall differences in the amplitude of the acoustic startle eyeblink response while viewing normatively pleasant, neutral, and unpleasant images

A smaller sample of 25 participants had data that satisfied the 33% viability threshold for all nine valence and time-delay cells. For these participants, a 3 (valence: pleasant, neutral, unpleasant) by 3 (time delay: 1 s, 3 s, 5 s) repeated-measures ANOVA showed a significant valence by time-delay interaction, F(4, 96) = 2.92, p = .025, ηp² = .11, in addition to a significant main effect of valence, F(2, 48) = 31.45, p < .001, ηp² = .57. Post hoc tests with the Bonferroni correction (see Fig. 4) showed that, after a 1-s delay, startle amplitudes while viewing pleasant images were significantly smaller than those while viewing neutral (p < .001, d = 1.97) and unpleasant images (p < .001, d = 1.32). After a 3-s delay, startle amplitudes were significantly smaller while viewing pleasant images than while viewing unpleasant images (p < .001, d = 1.19). In addition, startle amplitudes while viewing neutral images were significantly smaller than while viewing unpleasant images (p < .001, d = 0.70). After a 5-s delay, startle amplitudes were significantly smaller while viewing pleasant images than while viewing neutral (p = .009, d = 0.82) and unpleasant images (p = .001, d = 0.95).

Fig. 4. Differences in the amplitude of the acoustic startle eyeblink response while viewing normatively pleasant, neutral, and unpleasant images presented for 1, 3, and 5 s
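The partial eta squared values reported with the ANOVAs above can be recovered from each F ratio and its degrees of freedom, because partial eta squared equals SS_effect / (SS_effect + SS_error), which is algebraically the same as (F × df_effect) / (F × df_effect + df_error). The following minimal sketch applies that identity to the three reported tests; the function name and the labels are illustrative and do not come from the original analysis.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Recover partial eta squared from an F ratio and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# (label, F, df_effect, df_error, reported partial eta squared)
reported_tests = [
    ("valence, full sample", 20.75, 2, 70, 0.37),
    ("valence x time delay", 2.92, 4, 96, 0.11),
    ("valence, 25-participant subsample", 31.45, 2, 48, 0.57),
]

for label, f_value, df_effect, df_error, reported in reported_tests:
    recovered = partial_eta_squared(f_value, df_effect, df_error)
    print(f"{label}: recovered = {recovered:.2f}, reported = {reported:.2f}")
```

For all three tests, the recovered value rounds to the reported one, indicating that the reported F ratios, degrees of freedom, and effect sizes are mutually consistent.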

Discussion

The search for peripheral and central physiological markers of distinct emotions has yielded results characterized as promising but, despite more than a century of research, no model has emerged that can reliably distinguish between discrete emotions across individuals, situations, and sensory modalities. While there may be additional avenues that can be explored (e.g., focusing on teaching automated classifiers to distinguish individual- and situation-specific response patterns), it is clear that, for now, this line of research faces a multitude of conceptual and methodological challenges that will likely prevent it from delivering models that can be implemented in practical applications in the immediate future. In this context, it is somewhat surprising that an extensively researched index, namely the eyeblink startle response, which can reliably distinguish pleasant and unpleasant affective valence, remains underutilized in applied fields (e.g., affective computing, human–computer interface design). In the present study, we investigated a contactless method to assess the eyeblink response using infrared reflectance oculography (i.e., irROG). The irROG approach has been used in only one prior study investigating the affective modulation of startle (Anders et al., 2008). Although the results of that study were consistent with affective modulation, the generalizability of the findings was hampered by methodological limitations (i.e., due to low statistical power, participants exposed to visual and auditory stimuli were pooled together in the same analysis, and the experiment took place in the loud and confined space of a magnetic resonance imaging scanner).

In the present study, consistent with our hypothesis, startle eyeblinks elicited during the viewing of unpleasant images had larger amplitudes than those elicited during the viewing of pleasant images. The difference was robust overall (d = 1.56) and remained so regardless of the length of exposure to the pleasant and unpleasant images within a window from 1 to 5 s (with effect sizes from d = 0.95 to d = 1.32). Although startle responses were ordered as expected (i.e., smallest for pleasant, intermediate for neutral, and largest for unpleasant images), the difference between unpleasant and neutral was smaller (d = 0.26) than the difference between pleasant and neutral (d = 1.36).

In investigations based on electromyographic assessment of startle, responses while viewing neutral images generally fall between responses while viewing pleasant and unpleasant images (e.g., Bradley, Cuthbert, & Lang, 1991; Vrana et al., 1988). However, responses to neutral images rarely differ significantly from both (e.g., Balaban & Taussig, 1994). In some studies, the pattern has been similar to the one found in the present study, with startle responses while viewing neutral images being different from those while viewing pleasant but not from those while viewing unpleasant images (e.g., Bradley et al., 1993, 2001). In other studies, however, the pattern was reversed: startle responses while viewing neutral images were found to be different from those while viewing unpleasant images but not from those while viewing pleasant images (e.g., Bradley et al., 1996b; Corr, Kumari, Wilson, Checkley, & Gray, 1997). In yet other studies, the neutral category was not significantly different from either the pleasant or the unpleasant category (Bradley et al., 1996a). There are also studies that have not reported pairwise comparisons (e.g., Bernat, Patrick, Benning, & Tellegen, 2006; Cuthbert, Bradley, & Lang, 1996). These inconsistencies may be due to various reasons, including the possibility that, while normatively pleasant and unpleasant images tend to be unambiguous in their valence content, the objects depicted in normatively "neutral" images are not necessarily devoid of valence content. They may still be imbued with pleasant or unpleasant significance for certain individuals. This possibility is supported by the observation that eyeblink responses elicited 1, 3, or 5 s after the presentation of different sets of normatively neutral images were inconsistent (see Fig. 4). As a peer reviewer suggested, affectively neutral images tend to be uninteresting or "boring," and boredom is unpleasant (recall that eyeblink amplitudes during neutral images did not differ significantly from those elicited during unpleasant images in the present study).

In addition, the inconsistency may be attributable to the lack of standardization of the arousal content of the images, both within and across studies. In the present study, the average arousal rating for pleasant images was 6.79, for unpleasant images 7.71, and for neutral images 2.17. This imbalance in arousal stems from an inherent limitation of the International Affective Picture System (IAPS). As Bradley and Lang (2007) write: "as pictures are rated as more pleasant or more unpleasant, arousal ratings increase as well, and pictures that are rated as neutral tend to be rated low in arousal" (p. 32). In other words, researchers wishing to select unequivocally pleasant and unequivocally unpleasant images, in order to increase the statistical power of their experimental manipulation, must inevitably select images that are also highly rated in arousal content. On the other hand, images rated as neutrally valenced are also rated as low in arousal (there are no images that are rated as neutrally valenced and, at the same time, high in arousal).

The present study can be considered a proof-of-concept investigation demonstrating that distinguishing between pleasant and unpleasant affective valence through a peripheral physiological index that can be assessed in a contactless manner is possible. The important implications of this point come into sharper relief in the context of recent meta-analytic investigations showing that differentiating among distinct emotional states (e.g., anger, fear, shame, love) or subtle varieties of affect (e.g., boredom, confusion, frustration, enjoyment, interest) may not be possible at levels that are consistently above chance via the study of facial movements (Feldman Barrett, Adolphs, Marsella, Martinez, & Pollak, 2019) or autonomic indices (Siegel et al., 2018).

As noted in the introduction, the great challenge for affective computing and the design of affect-aware user interfaces is not building a model that achieves satisfactory rates of accurate classification in one sample but rather building a model that is valid across individuals, populations, and cultural contexts (D'Mello, Kappas, & Gratch, 2018). However, given the great variability in individual life-course experiences (Hoemann, Xu, & Feldman Barrett, 2019) and enculturation histories (Mesquita, Boiger, & De Leersnyder, 2017), the search for human universals in physiological or facial manifestations beyond the essential dipole of pleasure-displeasure will likely run into problems of individual response specificity (Engel, 1960; Lacey, 1950; Lacey et al., 1953; Lacey & Lacey, 1958). The ability to distinguish between discrete emotions of the same valence (e.g., fear, sadness, regret) can reasonably be expected to improve models of judgment and decision-making in such fields as consumer behavior (Lerner, Li, Valdesolo, & Kassam, 2015; Raghunathan & Pham, 1999). However, improvements over valence-only models may be relatively modest and limited to only some emotions (Kranzbühler, Zerres, Kleijnen, & Verlegh, 2020). Therefore, while obtaining information about valence rather than discrete emotions may be suboptimal for some applications, the ability to obtain signals that offer robust discrimination between pleasantly and unpleasantly valenced states may make this a reasonable compromise.

From a technical standpoint, the approach used in the present study to assess the startle eyeblink response via infrared oculography is clearly restricted to laboratory settings. Although the method is contactless, it is also relatively intrusive, requiring the placement of instruments on the head and the induction of startle responses via loud noises. However, conceivably, these practical challenges can be overcome. Assessment of the eyeblink response can be achieved with high-speed video and appropriate contour detection and monitoring algorithms (Bernard, Deuter, Gemmar, & Schachinger, 2013; Derakhshani & Lovelace, 2011; Essex et al., 2003), and the induction of startle can be achieved with less intrusive methods, such as low-intensity acoustic stimuli (Blumenthal & Goode, 1991), air-puffs (Haerich, 1998; Lissek et al., 2005), or perhaps the vibration of a mobile device or a steering wheel. With appropriate modifications to improve comfort and practicality, we can foresee that contactless methods of assessing the affective modulation of startle will find a wide array of practical applications, including, but not limited to, the evaluation of television programming (Bradley, 2007), the design of computer games and user–computer interfaces (Nesbitt, Blackmore, Hookham, Kay-Lambkin, & Walla, 2015), the study of marketing and consumer behavior (Bradley, Angelini, & Lee, 2007; Koller & Walla, 2015; Walla, Brenner, & Koller, 2011), and the analysis of the processes underlying economic decisions (Phelps, 2009).

The methodology of the present study adhered to the guidelines set forth for the study of startle eyeblink responses in general (Blumenthal et al., 2005) and followed the standard approach used in electromyography-based investigations of the affective modulation of startle (Bradley et al., 1993; Vrana et al., 1988). This should increase confidence in the central finding, namely that the acoustic startle eyeblink response is subject to affective modulation. At the same time, in evaluating the present results, readers should note that, while the reliability of affective modulation has been demonstrated in electromyography-based investigations, the present study was limited to pictorial affective stimuli and acoustic startle probes. As noted in the introduction, electromyography-based investigations have used a variety of affective stimuli (e.g., visual, auditory, somatosensory) and startle probes (e.g., visual, auditory, tactile), as well as a variety of healthy and clinical samples of participants. While it may seem reasonable to assume that affective modulation would still be detectable via infrared oculography in other combinations of affective stimuli, startle probes, and participants, empirical evidence is presently lacking.

In summary, the present study demonstrated that the phenomenon of affective modulation of the acoustic startle eyeblink response can be detected with a contactless method, namely infrared reflectance oculography, with a large effect size between responses while viewing normatively pleasant and unpleasant images. This finding suggests that, while researchers continue to pursue the "holy grail" of distinguishing between discrete emotions via autonomic or facial indices, applied fields such as affective computing and human–computer interface design can benefit from a readily available and relatively simple method that distinguishes between pleasant and unpleasant core affective valence.

Open Practices Statement

The data and materials for the experiment reported in this manuscript are available upon request from the corresponding author. The experiment was not preregistered.