During their everyday activities, people make rapid eye movements known as saccades. Saccades may occur several times a second, serving to align the high-acuity area of central vision with different points in the visual field (Rayner, 1998, 2009). Saccades are ballistic in nature in that once the eye movement has launched, the endpoint of the saccade is not influenced by new information from the visual field (Chen-Harris, Joiner, Ethier, Zee & Shadmehr, 2008). Rather, visual information that is used to determine saccadic targets is extracted during the periods in between saccades, known as fixations, when the eye is relatively still and a stable image is cast on the retina. Hence, prior to each saccadic eye movement, the oculomotor system must determine the target location for the next fixation (i.e., the “where” decision), as well as the time at which to initiate the saccade to that location (i.e., the “when” decision).

Within the domain of scene perception, a considerable body of research has focused on factors that influence the decision regarding fixation location (for reviews, see Henderson & Ferreira, 2004; Henderson & Hollingworth, 1999). More recently, there has been a growing interest in the oculomotor decision of when to move the eyes and the corresponding distribution of fixation durations. For example, fixation durations have been shown to differ depending on the requirements of the scene-viewing task (Henderson, Weeks & Hollingworth, 1999; Mills, Hollingworth, Van der Stigchel, Hoffman & Dodd, 2011; Võ & Henderson, 2009; but for a null finding, see Castelhano, Mack & Henderson, 2009; see also Nuthmann, Smith, Engbert & Henderson, 2010, for a model that captures task effects). It has also been demonstrated that when viewing an array of scenes, fixation durations depend on the task relevance of individual scene stimuli (Glaholt & Reingold, 2012). Taken together, these findings indicate that fixation durations are strongly influenced by top-down factors.

However, and of central interest to the present study, other findings point to a possible bottom-up effect on fixation durations based on low-level scene characteristics. For example, it has been shown that restricting or degrading the availability of stimulus information has an effect on fixation durations (Groner, Groner & von Mühlenen, 2008; Loftus, 1985; Loschky, McConkie, Yang & Miller, 2005; Mannan, Ruddock & Wooding, 1995; Parkhurst, Culurciello & Niebur, 2000). More specifically, when spatial frequency information is filtered from scenes (e.g., using a band-pass filter), fixation durations tend to increase. These findings are consistent with a bottom-up interpretation where the oculomotor system adjusts fixation durations depending on low-level spatial frequency characteristics of the scene. However, it might also be the case that the effect of spatial frequency filtering on fixation durations is mediated by higher-order scene processing. For example, the filtering of spatial frequency information might impair the recognition of objects or the processing of scene layout and gist, which might, in turn, cause a lengthening of fixation durations. Accordingly, a central interest of the present study was the locus of the effect of spatial frequency filtering on fixation durations during scene viewing.

In addition to whether changes in fixation durations result from low-level cues or from higher scene processing, it is important to consider three possible manifestations of changes to fixation durations: (1) changes that occur on average (e.g., over the whole viewing epoch), (2) changes that occur over segments of the viewing period (e.g., early vs. late fixations), or (3) changes that occur immediately for individual fixations. These patterns may be used to differentiate between models of oculomotor control over fixation duration. The last case (3) is predicted by a hypothesized mechanism known as direct cognitive control (e.g., Henderson & Pierce, 2008; Henderson & Smith, 2009; Nuthmann et al., 2010; for reviews, see Rayner, 1998, 2009; Reingold, Reichle, Glaholt & Sheridan, 2012), according to which the processing of information extracted during a fixation influences the timing of the saccade terminating that fixation. The direct-control hypothesis posits a very tight time window for the influence of cognitive processing on fixation duration. Fixation durations are typically brief (e.g., 200 – 300 ms), and due to the latency for oculomotor programming, the decision about when to initiate the eye movement must be made prior to the end of the fixation (see Reingold et al., 2012, for a detailed discussion of this issue). This places limits on the extent to which information processing can influence fixation durations via direct control. While certain scene information might be available rapidly (e.g., spatial frequency information), other information that will emerge from higher-level processing (e.g., the task relevance of the fixated material) might take longer to extract and, hence, be unavailable in time to influence the majority of fixations (for an example of this, see Glaholt & Reingold, 2012).

With regard to the effect of spatial frequency manipulations on fixation durations, it has been shown that filtering of spatial frequency information can lengthen fixation durations on average (Mannan et al., 1995), but it was not clear whether individual fixation durations were affected. The work by Loschky et al. (2005) suggests that it is indeed the case that individual fixations are lengthened when scenes are spatial frequency filtered. Loschky et al. introduced a paradigm where, for occasional fixations during scene viewing, the area in the visual periphery was low-pass spatial frequency filtered. They found that if the filtering of spatial frequency information in the periphery removed information that would normally be available at that retinal eccentricity (given that spatial frequency sensitivity, or contrast sensitivity, drops off with retinal eccentricity), subjects were likely to become aware of the display change, and fixation durations markedly increased. Hence, under certain circumstances, the filtering of spatial frequency information appears to have an effect on the duration of individual fixations, consistent with the operation of a direct-control mechanism.

The present study builds upon these findings and seeks to answer two basic questions about the effect of spatial frequency filtering on fixation durations during scene viewing. First, we asked whether the effect of spatial frequency filtering on fixation durations in scene viewing occurs via a direct-control mechanism. Second, we asked whether this effect is driven primarily by low-level stimulus characteristics or by an interaction with higher-order scene processing. To answer these questions, we employed a saccade-contingent change manipulation where, during randomly chosen saccades, the scene was changed to a version that had been spatial frequency filtered. This paradigm is similar to a paradigm used in research on eye movements during reading, where the text is degraded for certain fixations (Yang & McConkie, 2001, 2004). Within the current version of the paradigm, the presentation of the filtered scene was carried out during a randomly chosen saccade, and hence, it could not be predicted in advance of the critical fixation. Under these conditions, the finding of changes in the duration of the critical fixation as a function of spatial frequency filtering implies that the effect is mediated by a mechanism sensitive to the information extracted during that fixation (i.e., a direct-control mechanism). To anticipate one of the main findings, in Experiment 1a, we observed that fixation durations were affected on a fixation-by-fixation basis depending on the spatial frequency content of the fixated material.

In later experiments, we employed manipulations designed to tease apart potential loci for this effect within the information-processing stream. For example, it could be that the effect is based on a simple lower-level cue related to the quality of the fixated scene information, or alternatively, it may be caused by difficulty in higher scene processing. To address this, we compared cases where the filtered scene was presented in its correct orientation with cases where the scene was presented upside-down (Experiment 1b) or reversed horizontally (Experiment 1c). The upside-down and horizontally reversed scenes are identical to their correctly oriented counterparts in terms of low-level visual information (e.g., spatial frequency content) but are incongruent with the ongoing scene representation, as well as prior knowledge about scenes. If the influence of scene filtering on fixation durations is based on low-level visual information only, the effect of scene filtering should not depend on whether, during a critical fixation, the scene is presented in the correct orientation or not. However, if the effect of scene filtering is only present for scenes that are presented in the correct orientation, it would indicate that the effect of spatial frequency filtering is mediated by a higher-level scene representation. In Experiment 2, we employed control conditions to explore the boundaries of the effects that emerged in Experiment 1. We examined the effect of changing the scene orientation without any spatial frequency filtering in Experiment 2a, and in Experiment 2b, we asked whether the lengthening of fixation durations observed in Experiment 1 was specific to the case of degraded scene information (i.e., scene filtering) or whether it would also occur when scene information was added.

Finally, in addition to examining mean fixation duration, for each experiment, we conducted an analysis of the distributions of fixation durations for each condition. These distributions were modeled with the ex-Gaussian distribution, which reflects the convolution of a normal distribution and an exponential distribution. Prior research has used this approach to distinguish “early” effects that influence the central tendency of the ex-Gaussian distribution from “late” effects that influence the tail of the distribution (e.g., Luke, Nuthmann & Henderson, 2013; Staub, White, Drieghe, Hollway & Rayner, 2010; White & Staub, 2012). Modeling of the fixation duration distributions in this way was expected to provide converging evidence regarding the time course of the impact of the spatial frequency manipulation. In particular, we asked whether the spatial frequency manipulation would be manifested in an effect on the mode or the tail of the distribution of fixation durations.
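For reference, the ex-Gaussian density can be written out explicitly. The form below is the standard textbook parameterization, given here as a reading aid rather than as the authors' own notation: μ and σ are the mean and standard deviation of the normal component, τ is the mean of the exponential component, and Φ is the standard normal cumulative distribution function.

```latex
f(x \mid \mu, \sigma, \tau) = \frac{1}{\tau}
  \exp\!\left(\frac{\mu - x}{\tau} + \frac{\sigma^{2}}{2\tau^{2}}\right)
  \Phi\!\left(\frac{x - \mu}{\sigma} - \frac{\sigma}{\tau}\right),
\qquad
\mathrm{E}[X] = \mu + \tau, \quad \mathrm{Var}[X] = \sigma^{2} + \tau^{2}.
```

Because the mean equals μ + τ, a difference in mean fixation duration between conditions can reflect a shift of the normal component, a heavier exponential tail, or both, which is why the parameters are examined separately.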

Experiment 1

In Experiment 1, we investigated the effect of spatial frequency filtering on individual fixation durations during scene viewing. We employed a saccade-contingent change paradigm where subjects viewed scenes under instructions to commit them to memory, and during certain pseudorandomly selected saccades, the scene was changed to a spatial frequency filtered version. Our analyses were focused on the critical fixations that followed these changes. In Experiment 1a, we contrasted critical fixations for conditions in which the scene was high-pass or low-pass filtered with those in a no-change condition. The high-pass and low-pass versions of the stimuli were expected, a priori, to have a different impact on scene processing. The high-pass filtering preserves the detail information, such as object shapes and outlines, but filters out the low-frequency contrast information (see Fig. 1), and the resulting images resemble line drawings. The low-pass filtering has the effect of blurring the image, which removes object detail but preserves the coarse scene layout information (e.g., the horizon). The low-pass filter was expected to have a greater effect on scene processing because it removes much of the detail. In Experiment 1b, the filtered scenes were rotated upside-down, and in Experiment 1c, they were reflected in the vertical axis (a horizontal flip). We reasoned that if the effects of spatial frequency filtering depended on the orientation of the scene, it would indicate that the effect is not strictly a low-level effect of stimulus quality but, instead, is occurring at a later point where the stimulus information is integrated with the scene representation in memory.

Fig. 1

The saccade-contingent change paradigm employed in Experiment 1. On pseudorandomly selected saccades, the scene was changed to a high-pass or low-pass filtered version. The change was accomplished while the eye was in motion. The red arrow represents a saccade
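The change schedule in this paradigm can be sketched in a few lines of Python. This is a minimal illustration, not the authors' Experiment Builder implementation: the spacing of three to five saccades and the equal odds of high-pass and low-pass versions come from the Materials and design section, while the function name `change_schedule`, the action labels, and the assumption that the count restarts at each changed saccade are hypothetical.

```python
import random


def change_schedule(n_saccades, rng):
    """Return one display action per saccade (hypothetical labels).

    "filter_high" / "filter_low": swap in a filtered scene during this saccade.
    "restore": redraw the original scene (the saccade after a critical fixation).
    None: leave the display unchanged.
    """
    actions = [None] * n_saccades
    # Assumption: the first change falls on the 3rd, 4th, or 5th saccade.
    i = rng.choice((3, 4, 5)) - 1
    while i < n_saccades:
        actions[i] = rng.choice(("filter_high", "filter_low"))
        if i + 1 < n_saccades:
            actions[i + 1] = "restore"
        # Assumption: the count to the next change restarts at this saccade.
        i += rng.choice((3, 4, 5))
    return actions


actions = change_schedule(200, random.Random(0))
print(actions[:12])
```

Because consecutive changes are at least three saccades apart, the fixation preceding each critical fixation is always on the original scene, which is one of the inclusion criteria used in the analyses.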

Method

Subjects

For each of Experiments 1a, 1b, and 1c, a separate group of 16 undergraduate students at the University of California, San Diego participated for credit as part of an introductory psychology course. All subjects had normal or corrected-to-normal vision and provided informed consent to participate.

Apparatus

Eye movements were measured with an SR Research EyeLink 1000 system with high spatial resolution and a sampling rate of 1000 Hz. Viewing was binocular, but only the right eye was monitored. Following calibration, gaze-position error was less than 0.5°. The stimuli were presented on a 19-in. monitor with a refresh rate of 150 Hz and a screen resolution of 1,024 × 768 pixels (38° × 28.5°). Subjects were seated 65 cm from the display, and a chinrest with a head support was used to minimize head movement. The experiments were implemented in SR Research Experiment Builder, which allowed precise timing control of saccade-contingent display changes. Saccade onsets were detected online by the EyeLink 1000, and display changes were executed when saccade velocity was found to exceed 100° of visual angle per second. This value was chosen to ensure that there would be sufficient time to change the display while the eye was still in motion. On the basis of this method, the average interval to initiate a display update following detection of a saccade that exceeded the velocity threshold was 5.3 ms. The display update (one screen refresh) took 6.7 ms, and the average interval from the end of the display change to the onset of the following fixation event was 14.5 ms.
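The timing figures above can be checked with simple arithmetic. The sketch below uses only values reported in this section; the summed detection-to-fixation interval and the pixels-per-degree figure are derived here and are not reported by the authors, and the variable and function names are illustrative.

```python
# Values reported in the Apparatus section.
REFRESH_HZ = 150.0             # monitor refresh rate
DETECT_TO_UPDATE_MS = 5.3      # mean delay from threshold crossing to start of update
UPDATE_TO_FIXATION_MS = 14.5   # mean gap from end of update to fixation onset
SCREEN_WIDTH_PX = 1024
SCREEN_WIDTH_DEG = 38.0


def refresh_period_ms(refresh_hz: float) -> float:
    """Duration of one screen refresh in milliseconds."""
    return 1000.0 / refresh_hz


def pixels_per_degree(width_px: int, width_deg: float) -> float:
    """Average pixels per degree, treating the mapping as linear across the screen."""
    return width_px / width_deg


refresh_ms = refresh_period_ms(REFRESH_HZ)
# Derived here, not reported in the text: mean time from saccade detection
# to the onset of the following fixation.
detection_to_fixation_ms = DETECT_TO_UPDATE_MS + refresh_ms + UPDATE_TO_FIXATION_MS
ppd = pixels_per_degree(SCREEN_WIDTH_PX, SCREEN_WIDTH_DEG)

print(f"one refresh: {refresh_ms:.1f} ms")
print(f"detection to fixation onset: {detection_to_fixation_ms:.1f} ms")
print(f"pixels per degree: {ppd:.1f}")
```

One refresh at 150 Hz lasts about 6.7 ms, which matches the reported display update time.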

Materials and design

Stimuli were images from the Corel Image Database. We selected 204 images that depicted scenes of outdoor landscapes, rural and urban environments. All images were 1,024 × 683 pixels (38° × 25.3°) and were transformed from RGB color to grayscale and then centered on the screen horizontally and vertically over a black background.

In Experiment 1a, we examined fixations on scenes that had been high-pass or low-pass filtered, and hence, in addition to the original grayscale scene, two alternate versions of each image were created by applying a Butterworth band-pass spatial frequency filter in MATLAB. The high-pass version of the image included spatial frequencies in the band 1.5 – 15 cycles/degree, and the low-pass version included frequencies in the band 0.02 – 0.3 cycles/degree. In Experiments 1b and 1c, the filtered images underwent additional transformations: We created vertically and horizontally flipped versions where the high-pass and low-pass filtered images were rotated 180° (Experiment 1b) and horizontally flipped versions where they were reflected about the vertical axis (Experiment 1c). While subjects viewed the scenes (see the Procedure section), the scene was replaced with one of the filtered versions during pseudorandomly chosen saccades (every third, fourth, or fifth saccade). Within each experiment, it was equally likely that the scene would be replaced with either the high-pass or the low-pass filtered version. During the saccade following each critical fixation on the filtered scene, the original scene was redrawn (see Fig. 1).
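To make the two pass bands concrete, the following sketch evaluates a Butterworth-style band-pass gain at a few spatial frequencies. It is an illustration under stated assumptions, not the authors' MATLAB code: the filter order (2), the construction of the band-pass as a low-pass multiplied by a high-pass, and all function names are hypothetical, since the text reports only the filter type and the band limits.

```python
def butterworth_lowpass_gain(f, cutoff, order):
    """Butterworth low-pass gain at radial frequency f (same units as cutoff)."""
    return 1.0 / (1.0 + (f / cutoff) ** (2 * order))


def butterworth_bandpass_gain(f, low_cut, high_cut, order=2):
    """Band-pass gain: a high-pass at low_cut multiplied by a low-pass at high_cut.

    The order is an assumption; the paper does not report it.
    """
    highpass = 1.0 - butterworth_lowpass_gain(f, low_cut, order)
    lowpass = butterworth_lowpass_gain(f, high_cut, order)
    return highpass * lowpass


def cpd_to_cycles_per_pixel(f_cpd, pixels_per_degree):
    """Convert cycles/degree to cycles/pixel for use on an image frequency grid."""
    return f_cpd / pixels_per_degree


HIGH_PASS_BAND = (1.5, 15.0)     # cycles/degree, from the text
LOW_PASS_BAND = (0.02, 0.3)      # cycles/degree, from the text
PIXELS_PER_DEGREE = 1024 / 38.0  # image width in pixels over width in degrees

for f in (0.1, 1.0, 5.0):
    hp = butterworth_bandpass_gain(f, *HIGH_PASS_BAND)
    lp = butterworth_bandpass_gain(f, *LOW_PASS_BAND)
    cpp = cpd_to_cycles_per_pixel(f, PIXELS_PER_DEGREE)
    print(f"{f:4.1f} c/deg ({cpp:.4f} c/px): high-pass gain {hp:.4f}, low-pass gain {lp:.4f}")
```

The two bands do not overlap, so a frequency passed nearly intact by one version is almost entirely removed by the other.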

Procedure

Within each experiment (1a, 1b, and 1c), subjects carried out a scene memory task in six blocks, with each block consisting of two phases: an encoding phase in which subjects viewed a set of 30 scene images to be remembered, followed by a recognition phase where subjects made eight forced-choice yes/no recognition decisions. Of the set of 204 images, 180 were “old” images presented once during an encoding phase. A subset of 24 of the “old” images was presented again during a recognition phase (4 images per block). The remaining 24 images were used as “new” images in the recognition phase only. The same set of images was used for Experiments 1a, 1b, and 1c, but the assignment of images to these two roles was randomized for each subject, as was the order of images within each experiment. For each experiment, eye movements were recorded during the encoding phase, and it was during this phase that the saccade-contingent display manipulations took place. At the beginning of each experiment, a 9-point calibration procedure was performed, followed by a 9-point calibration accuracy test. Calibration was repeated if any point was in error by more than 1° or if the average error for all points was greater than 0.5°. Subjects were told that while they viewed the images during the encoding phase, certain changes would be made to the display (e.g., “the image might become blurry”) but that, regardless of these changes, they should do their best to remember the images. During the encoding phase, each of the 30 images was viewed for 6 s, during which time the display change manipulation occurred (see the Materials and design section). During the recognition phase, the subject was presented with a sequence of images, and for each image, they were required to decide whether the image had been presented during the prior encoding phase. All of the images in the recognition phase were the original unfiltered grayscale images. Subjects had to indicate whether they recognized the image (“yes” or “no”), using a gamepad. Between blocks, the subject was given the opportunity to take a short break, and recalibration of the eyetracker was carried out, if necessary. The entire procedure for a single experiment (i.e., Experiment 1a, 1b, or 1c) lasted approximately 1 h.
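The allocation of images to blocks follows directly from the counts above (6 × 30 = 180 old images, plus 24 new images, for 204 in total). The sketch below shows one way such a per-subject randomization could be built. It is a hypothetical reconstruction: the function name and data layout are invented, and it assumes that the four old images tested in a block are drawn from that block's own encoding set.

```python
import random

N_IMAGES = 204
N_BLOCKS = 6
ENCODE_PER_BLOCK = 30
OLD_TESTED_PER_BLOCK = 4
NEW_PER_BLOCK = 4


def build_blocks(rng):
    """Randomly assign image indices to encoding and recognition sets per block."""
    images = list(range(N_IMAGES))
    rng.shuffle(images)
    n_old = N_BLOCKS * ENCODE_PER_BLOCK          # 180 "old" images
    old, new = images[:n_old], images[n_old:]    # remaining 24 are "new"
    blocks = []
    for b in range(N_BLOCKS):
        encode = old[b * ENCODE_PER_BLOCK:(b + 1) * ENCODE_PER_BLOCK]
        new_items = new[b * NEW_PER_BLOCK:(b + 1) * NEW_PER_BLOCK]
        # Assumption: tested old images come from the same block's encoding set.
        old_tested = rng.sample(encode, OLD_TESTED_PER_BLOCK)
        recognition = old_tested + new_items
        rng.shuffle(recognition)
        blocks.append({"encode": encode, "recognition": recognition})
    return blocks


blocks = build_blocks(random.Random(0))
print(len(blocks), len(blocks[0]["encode"]), len(blocks[0]["recognition"]))
```

Each block then yields 30 encoding trials and eight recognition decisions, four on old images and four on new ones.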

Results

Of central interest was to determine whether there was an effect of spatial frequency filtering on the duration of eye fixations during the encoding phase of each block. Accordingly, for Experiments 1a, 1b, and 1c, we focused our analyses on the critical fixations for which the scene was changed to a filtered version during the saccade prior to the fixation onset. We considered only critical fixations that began following a saccade and that terminated with a saccade (rather than a blink, for instance) and for which the prior fixation was a fixation on the original unfiltered grayscale scene. We excluded cases where the display change during the saccade prior was not completed before the onset of the critical fixation (90.8 % of critical fixations included). As a baseline for comparison, we identified fixations on the original unfiltered scene image (no-change fixations), with the constraint that these fixations must also begin following a saccade and be terminated by a saccade and that the preceding fixation must be a fixation on the original unfiltered scene image. This yielded a large number of observations for each condition per subject (average number of low-pass fixations per subject = 239.9; high-pass = 244.8; no change = 1,247.4), and hence, in addition to analyzing the mean fixation duration across conditions, we were also able to examine the distribution of fixation durations. To quantitatively compare these distributions, we modeled them with the ex-Gaussian function. The ex-Gaussian function has three parameters: mu (μ) and sigma (σ) from the Gaussian distribution function and a third parameter, tau (τ), from the exponential distribution function. In order to fit these parameters to our data, we employed cumulative maximum probability estimation (CMPE; see Reingold et al., 2012; Staub, 2011; Staub et al., 2010; White & Staub, 2012). For one subject in one condition (Experiment 1b, no-change fixation distribution), the CMPE algorithm failed to converge on a stable solution, and hence, that data point was replaced by the average across other subjects for that condition. Overall, the model fits were good (the average R² for the model across all conditions and subjects was .92, and the minimum was .54). The mean fixation durations and ex-Gaussian parameter fits for Experiments 1a, 1b, and 1c are presented in Table 1. To analyze mean fixation duration and each of the parameters of the ex-Gaussian fits for Experiments 1a, 1b, and 1c, we conducted one-way repeated measures ANOVAs on the filtering factor, which had three levels: low-pass, high-pass, and no change. Note that the low-pass and high-pass filtered images were rotated 180° (horizontal and vertical flip) or reflected in the vertical axis (horizontal flip), for Experiments 1b and 1c, respectively. To qualify significant effects of the filtering factor we conducted planned t-tests comparing the low-pass and high-pass filtering conditions with the no-change condition and also comparing the low-pass and high-pass conditions with one another.
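As a rough illustration of how the three ex-Gaussian parameters relate to a sample of fixation durations, the sketch below simulates durations and recovers the parameters by the method of moments. This is deliberately not the CMPE procedure used here; it is a simpler stand-in that relies on the identities mean = μ + τ, variance = σ² + τ², and third central moment = 2τ³, and it is less robust than likelihood-based fitting. The function names and the simulated parameter values (200, 30, 80 ms) are hypothetical.

```python
import random
import statistics


def sample_ex_gaussian(mu, sigma, tau, n, rng):
    """Draw n values as the sum of a normal and an exponential variate."""
    return [rng.gauss(mu, sigma) + rng.expovariate(1.0 / tau) for _ in range(n)]


def fit_ex_gaussian_moments(durations):
    """Method-of-moments estimates of (mu, sigma, tau); not the CMPE method."""
    n = len(durations)
    mean = statistics.fmean(durations)
    var = statistics.pvariance(durations, mean)
    third = sum((d - mean) ** 3 for d in durations) / n
    # The third central moment of an ex-Gaussian equals 2 * tau**3.
    tau = (max(third, 0.0) / 2.0) ** (1.0 / 3.0)
    # Clamp at zero so sampling noise cannot produce a negative variance.
    sigma = max(var - tau ** 2, 0.0) ** 0.5
    return mean - tau, sigma, tau


rng = random.Random(1)
simulated = sample_ex_gaussian(mu=200.0, sigma=30.0, tau=80.0, n=200_000, rng=rng)
mu_hat, sigma_hat, tau_hat = fit_ex_gaussian_moments(simulated)
print(f"mu = {mu_hat:.1f}, sigma = {sigma_hat:.1f}, tau = {tau_hat:.1f}")
```

Moment estimates of this kind are sensitive to outliers and need large samples, which is one reason quantile- or likelihood-based estimators are generally preferred for a few hundred fixations per condition.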

Table 1 Mean fixation duration and ex-Gaussian parameter estimates for Experiment 1

We begin by discussing the findings from Experiment 1a, where our primary interest was whether or not individual fixation durations would be sensitive to spatial frequency filtering and, in particular, whether or not there were differences in the effects of high-pass and low-pass filtering. We then contrast this pattern of results with those from Experiments 1b and 1c, in which the filtered images had also been rotated upside-down (1b) or reflected in the vertical plane (1c). To reiterate the logic of these manipulations, we argue that any differences observed in the effect of spatial frequency filtering between Experiments 1a, 1b, and 1c cannot be attributed to low-level stimulus characteristics, since these are maintained across experiments but, rather, can be explained at the level of higher-order scene processing.

Examination of the means for Experiment 1a (see Table 1) revealed that fixation durations depended on filtering condition, F(2, 30) = 117.56, MSE = 4.50 × 10², p < .001, where the low-pass and high-pass conditions produced longer fixation durations than did the no-change condition. Because the changes to a spatial frequency filtered version of the scene were unpredictable on a given fixation, fixation durations were evidently determined on a fixation-by-fixation basis, consistent with a direct-control mechanism. In further support of this hypothesis, we found that the low-pass condition produced longer fixation durations than did the high-pass condition, t(15) = 5.47, p < .001. Hence, we can conclude that fixation durations were determined by ongoing processing that occurred within individual fixations and was dependent on the spatial frequency content of the fixated scene stimulus. These effects are visible in the distribution of fixation durations, as can be seen in Fig. 2a and in the corresponding ex-Gaussian parameter fits in Table 1. Both the high-pass and low-pass fixations exhibit a distribution that is shifted toward longer durations, as compared with no-change fixations, and this shift is reflected in an effect of filtering in the μ parameter, F(2, 30) = 28.34, MSE = 3.51 × 10², p < .001, as well as the τ parameter, F(2, 30) = 33.10, MSE = 9.68 × 10², p < .001, while the σ parameter was unaffected, F(2, 30) = 2.28, MSE = 2.59 × 10², p > .1. These findings indicate that the effects of frequency filtering affect both the central tendency of the distribution (μ) and the tail of the distribution (τ). Interestingly, as can be seen from the distribution plots (Fig. 2a), the difference in mean fixation duration between high-pass and low-pass distributions is not related to differences in the μ and σ parameters (see Table 1 for parameter summary) but, rather, is related to a significant difference in the τ parameter, t(15) = 4.58, p < .001, where the low-pass distribution has a larger proportion of fixations in the tail of the distribution, as compared with the high-pass distribution.

Fig. 2

Distributions of fixation duration for Experiment 1a (a), Experiment 1b (b), and Experiment 1c (c). In Experiment 1a, the filtered scenes were presented in their normal orientation, while in Experiment 1b, they were flipped vertically and horizontally, and in Experiment 1c, they were flipped horizontally

We now turn to the results of Experiments 1b and 1c in which the familiarity of the scene information during critical fixations was manipulated in two ways. To reiterate, during the critical fixations, the filtered scenes (high pass and low pass) were flipped vertically and horizontally in Experiment 1b, and in Experiment 1c, they were reflected in the vertical axis (i.e., flipped horizontally). These manipulations were intended to test the hypothesis that the effect of scene filtering on fixation durations observed in Experiment 1a was due solely to low-level scene characteristics—namely, the spatial frequency content of the images. If so, then Experiments 1b and 1c should produce the same pattern of results as Experiment 1a. In contrast, if the lengthening of fixation durations observed in Experiment 1a was due to the interaction between low-level scene characteristics and higher level scene processing, the manipulations in Experiments 1b and 1c, which were expected to disrupt higher scene processing, should produce a different pattern of effects.

As can be seen in Table 1, in Experiment 1b, there was a significant effect of scene filtering on mean fixation duration, F(2, 30) = 21.31, MSE = 1.87 × 10³, p < .001, as well as the ex-Gaussian μ parameter, F(2, 30) = 11.32, MSE = 3.96 × 10², p < .01, reflecting a shift in the central tendency of the distribution of fixation durations for the filtering conditions. This pattern replicates the main effect of filtering in Experiment 1a, indicating that fixations were lengthened on a fixation-by-fixation basis depending on whether or not the scene had been filtered. Interestingly, however, in contrast to the clear differences between high-pass and low-pass filtering conditions in Experiment 1a, for Experiment 1b there was no difference between the high-pass and low-pass filtering conditions in mean fixation duration (t < 1) or in any of the ex-Gaussian parameter estimates (all ts < 1.39, all ps > .183). This indicates that the difference between high-pass and low-pass filtering observed in Experiment 1a was not due exclusively to low-level spatial frequency information in the scene but, rather, was likely to be due to the way in which that information interacts with higher scene processing. A nearly identical pattern of findings was obtained in Experiment 1c as in Experiment 1b, namely, a main effect of filtering on mean fixation duration, F(2, 30) = 39.78, MSE = 8.34 × 10², p < .001, and the μ parameter, F(2, 30) = 27.26, MSE = 2.71 × 10², p < .001, but no difference between the high-pass and low-pass conditions in mean fixation duration (t < 1) or any of the ex-Gaussian parameter estimates (all ts < 1). Accordingly, for Experiments 1b and 1c, the high-pass and low-pass fixation duration distributions are largely overlapping (see Fig. 2b, c), and as can be seen in Table 1, the difference that was observed in Experiment 1a between the high-pass and low-pass conditions in the τ parameter was absent in Experiments 1b and 1c. We confirmed this statistically by analysis of the τ parameter in a 2 × 3 mixed ANOVA crossing filtering condition (high pass, low pass) and experiment (1a, 1b, 1c), which revealed a significant interaction, F(2, 45) = 5.08, MSE = 1.01 × 10³, p < .05.
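The degrees of freedom reported for these tests follow from the design (16 subjects per experiment, 48 in total). The short sketch below works through the arithmetic using the usual textbook formulas; the function names are illustrative and the code does not reproduce the authors' analysis.

```python
def rm_anova_df(n_subjects, n_levels):
    """(effect df, error df) for a one-way repeated measures ANOVA."""
    effect = n_levels - 1
    error = (n_subjects - 1) * (n_levels - 1)
    return effect, error


def mixed_interaction_df(n_subjects_total, n_groups, n_within_levels):
    """(effect df, error df) for the group x within-factor interaction."""
    effect = (n_groups - 1) * (n_within_levels - 1)
    error = (n_subjects_total - n_groups) * (n_within_levels - 1)
    return effect, error


# Filtering factor (low-pass, high-pass, no change) within one experiment.
print(rm_anova_df(n_subjects=16, n_levels=3))
# Filtering (high pass, low pass) x experiment (1a, 1b, 1c) interaction.
print(mixed_interaction_df(n_subjects_total=48, n_groups=3, n_within_levels=2))
```

These give (2, 30) for the within-experiment tests and (2, 45) for the interaction, in agreement with the F statistics reported above.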

Discussion

The findings from Experiments 1a, 1b, and 1c provided clear answers to several important questions. First of all, consistent with prior findings (Groner et al., 2008; Loschky et al., 2005; Mannan et al., 1995), we found that across all three experiments, fixation durations were lengthened when scenes were band-pass filtered for spatial frequencies. More important, because the filtered scene information was presented during the saccade prior to the critical fixation and could not be predicted in advance, we can infer that the duration of the critical fixation was influenced by processing of the filtered scene information that occurred during the critical fixation. This is consistent with prior research suggesting that some proportion of fixation durations are under direct control during scene viewing (e.g., Glaholt & Reingold, 2012; Henderson & Pierce, 2008; Henderson & Smith, 2009). More specifically, we found a shift in the central tendency of the distribution of fixation durations on filtered scenes, as compared with the no-change condition, indicating that the direct control signal can operate very rapidly in response to sudden change in the available scene information and lengthen the majority of fixations.

Second, there were clear differences between filtering conditions in Experiment 1a, with the low-pass condition producing a larger effect on mean fixation duration than did the high-pass condition. This is consistent with an earlier report by Mannan et al. (1995), who observed this difference on the average fixation duration over a viewing epoch. The present paradigm demonstrates that this difference can occur immediately on a fixation-by-fixation basis. In addition, this finding supports the intuition that the low-pass filtering manipulation introduced more processing difficulty than did the high-pass filtering condition. Consistent with an interpretation of a difference in cognitive processing in the two conditions, the difference in the distributions of fixation durations for high-pass and low-pass fixations was limited to the tail of the distribution (an effect on the τ parameter), indicating that only relatively long duration fixations were sensitive to the high-pass/low-pass difference.

Interestingly, in Experiment 1b, the difference between the high-pass and low-pass filtering conditions was not present, suggesting that when the filtered scenes were vertically flipped, there was no longer any savings in processing fluency for the high-pass filtering condition, relative to the low-pass condition. The vertical flipping of filtered scenes in Experiment 1b preserved the low-level characteristics of the scenes (i.e., the spatial frequency content was unaffected) but was expected to interfere with higher order scene processing. In particular, the vertical flipping of scenes disrupts familiarity with the scene orientation (e.g., the sky is on the bottom and the ground is on the top) and is likely to make the scene more difficult to encode (Walther, Caddigan, Fei-Fei & Beck, 2009). Beyond disrupting preexisting knowledge of scene layout and content, the flipping of scenes vertically would also disrupt the processing of the scene with respect to the ongoing scene representation that has been developed in that particular viewing epoch. For example, if the viewer has extracted the gist of a given scene and constructed an approximate layout of the scene in working memory, the sudden vertical flipping of the scene would produce a conflict with that working memory representation. It is in this regard that the results of Experiment 1c are particularly informative. In Experiment 1c, the filtered scenes were flipped horizontally (i.e., reflected about the vertical axis), which should not affect general familiarity with, or knowledge of, the information in the scene (i.e., it has a normal orientation) but, rather, should conflict only with the ongoing scene representation that has been developed in working memory during the viewing epoch. Under these conditions, we observed that fixation durations on high-pass and low-pass scenes did not differ, and hence we can infer that the difference in cognitive processing observed between these filtering conditions in Experiment 1a is not necessarily related to high-level scene familiarity but, perhaps, closer to the point where scene information is combined with the ongoing working memory representation.

Experiment 2

The results of Experiment 1 suggested that when scene information is suddenly degraded via spatial frequency filtering for individual fixations, there is a rapid effect that shifts the distribution of fixations and also a later effect on the tail of the distribution that is sensitive to the extent to which the filtering introduces a difficulty in scene processing. Experiment 2 further examined the boundary conditions for these effects. In Experiment 2a, we tested control conditions for the orientation manipulations used in Experiments 1b and 1c. Specifically, we asked whether flipping the scene vertically or horizontally, without any spatial frequency filtering, would impact upon fixation durations. In Experiment 2b, we asked whether lengthening of fixation durations would be restricted to cases where the image was degraded (e.g., information removed as in the case of band-pass spatial frequency filtering) or whether it would also hold when information was added. Accordingly, in Experiment 2b, we contrasted a Gaussian blur condition to a condition where color was added to the grayscale scene. Prior research on scene perception has suggested that color is processed rapidly and directly contributes to the extraction of scene “gist” (Castelhano & Henderson, 2008). In the context of the present paradigm, the addition of color during the critical fixation might be expected to facilitate scene processing because of an increase in familiarity with the scene information and a corresponding increase in the fluency with which it is processed (e.g., a blue sky may be easier to parse as a sky than a gray sky). The Gaussian blur manipulation was similar to the low-pass filter manipulation used in Experiment 1, except that it produced more severe blurring of the scene.
Importantly, the color manipulation does not entail spatial frequency filtering of scene information but does constitute a change in the quantity and quality of information available and, hence, provided a control for Experiment 1 with regard to the relationship between scene processing and fixation durations.

Method

Subjects

In Experiment 2a, 16 undergraduate students at the University of Toronto, Mississauga participated for credit as part of an introductory psychology course. In Experiment 2b, 16 undergraduate students at the University of California, San Diego participated for course credit. All subjects had normal or corrected-to-normal vision and were naïve with respect to the purpose of the experiment. None of the subjects from Experiment 1 participated in Experiment 2.

Apparatus

The apparatus used was equivalent to that used for Experiment 1; Experiments 2a and 2b were conducted at different laboratories, but the hardware used had the same specifications.

Materials and design

We used the 204 scene images from Experiment 1, but for each experiment, we created two alternate versions to be drawn just prior to critical fixations. For Experiment 2a, the images were either reflected in the vertical axis (horizontal flip condition) or rotated 180° (horizontal and vertical flip condition). For Experiment 2b, one version was created through the application of a Gaussian blur effect (10-pixel radius) in Adobe Photoshop CS4 (blur condition). The blurring effect obscured scene details in a way that was similar to, but more severe than, the low-pass filter applied in Experiment 1. The second alternate version of each scene was the original RGB colored scene image taken from the Corel database prior to being transformed to grayscale (color condition). As in Experiment 1, for Experiments 2a and 2b, during pseudorandomly chosen saccades (every third, fourth, or fifth saccade), the scene was replaced with one of the alternate versions. It was equally likely that the scene would be replaced with the image from one or the other experimental condition. During the saccade following the critical fixation on the altered scene, the original grayscale scene was redrawn.
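The orientation manipulations and the scheduling of critical fixations described above can be illustrated with a minimal sketch. It is not the software used in the experiments: the function names, the toy image, and the reading of "every third, fourth, or fifth saccade" as a gap of 3, 4, or 5 saccades between successive display changes are all assumptions made for illustration.

```python
import random

def horizontal_flip(image):
    """Mirror left-right (reflection in the vertical axis)."""
    return [row[::-1] for row in image]

def vertical_flip(image):
    """Mirror top-bottom (reflection in the horizontal axis)."""
    return [row[:] for row in image[::-1]]

def rotate_180(image):
    """Rotate by 180 degrees, which equals flipping on both axes."""
    return vertical_flip(horizontal_flip(image))

def critical_saccade_schedule(n_saccades, conditions, rng):
    """Map saccade index -> alternate version to draw.

    Assumption: successive display changes are separated by 3, 4, or 5
    saccades, and each alternate version is equally likely.
    """
    schedule = {}
    index = rng.choice([3, 4, 5])
    while index <= n_saccades:
        schedule[index] = rng.choice(conditions)
        index += rng.choice([3, 4, 5])
    return schedule

# Toy 2 x 3 "image" of gray levels (illustrative values only).
image = [[1, 2, 3],
         [4, 5, 6]]
flipped = horizontal_flip(image)   # [[3, 2, 1], [6, 5, 4]]
rotated = rotate_180(image)        # [[6, 5, 4], [3, 2, 1]]

schedule = critical_saccade_schedule(
    n_saccades=40,
    conditions=["horizontal flip", "horizontal and vertical flip"],
    rng=random.Random(0),
)
```

The sketch makes explicit why the 180° rotation is labeled the horizontal and vertical flip condition: rotating by 180° gives the same image as applying both flips in sequence.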

Procedure

The 1-h procedure for Experiment 2 was the same as that for Experiment 1.

Results

The analysis strategy for Experiment 2 was analogous to that for Experiment 1. In particular, we computed mean fixation duration for each of the display change conditions (Experiment 2a, horizontal flip vs. horizontal and vertical flip; Experiment 2b, color vs. blur), as well as the no-change condition, and we also fitted each distribution of fixation durations (see Fig. 3) with the ex-Gaussian function. The model fits for Experiment 2 were good (the average R² for the model across all conditions and subjects was .93, and the minimum was .62). Effects of the display change manipulation for each experiment were evaluated via a one-way ANOVA, and direct comparisons between conditions were carried out via paired t-tests, the results of which are summarized in Table 2 (Experiment 2a) and Table 3 (Experiment 2b). The results for Experiments 2a and 2b will be discussed in turn.
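To make the roles of the three ex-Gaussian parameters concrete, the following sketch simulates fixation durations as the sum of a normal component (μ, σ) and an exponential component (τ) and then recovers the parameters. The recovery step uses a simple method-of-moments estimator, which is an assumption chosen for brevity and is not the fitting procedure used in the present study; the parameter values are arbitrary illustrative numbers, not estimates from the experiments.

```python
import math
import random
import statistics

def sample_ex_gaussian(mu, sigma, tau, n, rng):
    """Draw n values as Normal(mu, sigma) + Exponential(mean tau)."""
    return [rng.gauss(mu, sigma) + rng.expovariate(1.0 / tau)
            for _ in range(n)]

def fit_ex_gaussian_moments(durations):
    """Method-of-moments estimates of (mu, sigma, tau).

    Relies on three properties of the ex-Gaussian:
      mean                 = mu + tau
      variance             = sigma^2 + tau^2
      third central moment = 2 * tau^3
    """
    n = len(durations)
    mean = statistics.fmean(durations)
    m2 = sum((x - mean) ** 2 for x in durations) / n
    m3 = sum((x - mean) ** 3 for x in durations) / n
    tau = (max(m3, 0.0) / 2.0) ** (1.0 / 3.0)
    sigma = math.sqrt(max(m2 - tau ** 2, 0.0))
    mu = mean - tau
    return mu, sigma, tau

# Arbitrary illustrative parameters in milliseconds (not study estimates).
rng = random.Random(1)
simulated = sample_ex_gaussian(mu=200.0, sigma=30.0, tau=80.0,
                               n=200_000, rng=rng)
mu_hat, sigma_hat, tau_hat = fit_ex_gaussian_moments(simulated)
```

Because the mean equals μ + τ, a condition can lengthen mean fixation duration either by shifting the normal component (μ) or by stretching the tail (τ), which is the distinction the parameter comparisons rely on.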

Fig. 3 Distributions of fixation duration for Experiment 2a (a) and Experiment 2b (b)

Table 2 Mean fixation duration and ex-Gaussian parameter estimates for Experiment 2a
Table 3 Mean fixation duration and ex-Gaussian parameter estimates for Experiment 2b

Both the horizontal flip and horizontal and vertical flip conditions in Experiment 2a caused a lengthening of fixation durations. This was reflected in a significant ANOVA main effect of display change on mean fixation duration, F(2, 30) = 3.89, MSE = 726.66, p < .05, which was driven by a significant effect on the μ parameter, F(2, 30) = 10.14, MSE = 545.27, p < .001. There was no effect on the σ and τ parameters (both Fs < 1). Interestingly, although there was a significant shift in the μ parameter for the horizontal flip condition, it did not differ from the horizontal and vertical flip condition in the mean or any of the ex-Gaussian parameters (all ts < 1). This indicates that the horizontal flip condition was sufficient to cause a shift in central tendency of the distribution, and the addition of vertical flipping of the scene did not appear to impose any additional lengthening of fixation durations.
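The degrees of freedom reported above follow from the within-subjects design: with 16 subjects and three display conditions, the effect has 3 − 1 = 2 degrees of freedom and the error term has (3 − 1) × (16 − 1) = 30. The sketch below shows one way a one-way repeated-measures ANOVA could be computed. The function name is hypothetical and the data are simulated placeholders, not the study's fixation durations.

```python
import random

def repeated_measures_anova(scores):
    """One-way repeated-measures ANOVA.

    scores[s][c] is the value for subject s in condition c.
    Returns (F, df_effect, df_error, mse).
    """
    n_subjects = len(scores)
    n_conditions = len(scores[0])
    grand = sum(sum(row) for row in scores) / (n_subjects * n_conditions)
    subject_means = [sum(row) / n_conditions for row in scores]
    condition_means = [sum(row[c] for row in scores) / n_subjects
                       for c in range(n_conditions)]

    ss_effect = n_subjects * sum((m - grand) ** 2 for m in condition_means)
    # Residual: what remains after removing subject and condition means.
    ss_error = sum(
        (scores[s][c] - subject_means[s] - condition_means[c] + grand) ** 2
        for s in range(n_subjects)
        for c in range(n_conditions)
    )
    df_effect = n_conditions - 1
    df_error = (n_conditions - 1) * (n_subjects - 1)
    mse = ss_error / df_error
    f_value = (ss_effect / df_effect) / mse
    return f_value, df_effect, df_error, mse

# Hand-checkable case: 3 subjects x 2 conditions gives F = 12.0, df (1, 2).
tiny_result = repeated_measures_anova([[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]])

# Simulated placeholder data: 16 subjects x 3 conditions (not study data).
rng = random.Random(2)
toy = []
for _ in range(16):
    baseline = rng.gauss(280.0, 30.0)
    toy.append([baseline + shift + rng.gauss(0.0, 15.0)
                for shift in (0.0, 20.0, 20.0)])
f_value, df_effect, df_error, mse = repeated_measures_anova(toy)
```

Removing each subject's own mean before computing the error term is what distinguishes this analysis from a between-subjects ANOVA, and it is why stable individual differences in overall fixation duration do not inflate the MSE.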

The findings from Experiment 2b are summarized in Table 3. The ANOVA revealed that display change had a significant effect on mean fixation duration, F(2, 30) = 70.67, MSE = 7.88 × 10², p < .001, which was associated with changes in the μ, F(2, 30) = 21.21, MSE = 2.55 × 10², p < .001, σ, F(2, 30) = 5.75, MSE = 1.36 × 10³, p < .01, and τ, F(2, 30) = 40.66, MSE = 1.12 × 10³, p < .001, parameters of the distributions (see Fig. 3 for distributions of fixation durations for Experiment 2). The blur condition produced longer fixation durations than did the no-change condition, t(15) = 10.18, p < .001, reflected in shifts in the μ, t(15) = 4.73, p < .001, σ, t(15) = 3.38, p < .01, and τ, t(15) = 8.05, p < .001, parameters. These strong effects on the distribution of fixation durations in both the central tendency and the tail reproduce the pattern observed for the low-pass critical fixations in Experiment 1a, indicating that the effect of removing high spatial frequency content on fixation durations in this paradigm is robust to differences in the method of filtering. Of particular interest for Experiment 2b was whether or not fixations in the color condition differed from those in the no-change condition. As can be seen in Table 3, the mean fixation duration was significantly longer in the color condition than in the no-change condition, t(15) = 6.47, p < .001, and this was related to a shift in the distribution of fixation durations in the μ parameter, t(15) = 7.91, p < .001, and the σ parameter, t(15) = 2.31, p < .05, but not the τ parameter, t < 1. This demonstrates that fixation durations may become lengthened even for cases where information is added on the critical fixation.
In addition, the blur condition produced an even greater lengthening of mean fixation durations than did the color condition, t(15) = 6.83, p < .001, and this effect was related to an increase in the proportion of fixations falling in the tail of the distribution (shown in a significant effect on the τ parameter, t(15) = 6.26, p < .001), but note that the fixation distributions for the color and blur conditions did not differ significantly for the μ, t(15) = 1.54, p > .14, and σ, t(15) = 1.37, p > .18, parameters.
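The pairwise comparisons reported here are paired t-tests across subjects, so each has n − 1 = 15 degrees of freedom for 16 subjects. A minimal sketch of the statistic follows. The helper name is hypothetical and the three-subject values are toy numbers chosen so the result can be checked by hand; they are not data from the study.

```python
import math
import statistics

def paired_t(condition_a, condition_b):
    """Paired t statistic and degrees of freedom.

    Each position holds one subject's value (e.g., a mean fixation
    duration or an ex-Gaussian parameter estimate) in two conditions.
    """
    diffs = [a - b for a, b in zip(condition_a, condition_b)]
    n = len(diffs)
    standard_error = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.mean(diffs) / standard_error, n - 1

# Toy values: differences are 1, 2, 3, so t = 2 / (1 / sqrt(3)).
t_value, df = paired_t([2.0, 4.0, 6.0], [1.0, 2.0, 3.0])
```

Working on within-subject differences means that the test is sensitive to a consistent shift across subjects even when subjects differ widely in their overall fixation durations.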

Discussion

Experiment 2 provided important points of comparison against which to interpret the findings of Experiment 1. First, Experiment 2a demonstrated that changes to the scene orientation were sufficient to cause a shift in the distribution of fixation durations. Perhaps most surprisingly, the horizontal flip condition and the horizontal and vertical flip condition produced distributions that were nearly identical, and hence the critical factor causing a shift in the central tendency of the distribution appears to be present in both cases. Once again, the vertical flipping manipulation was expected to produce a scene that was more difficult to encode, while the horizontal flipping was expected to produce a scene that is not difficult to process per se but, rather, is difficult to reconcile with prior views of the scene. The latter manipulation was sufficient to produce an effect on the μ ex-Gaussian parameter, and consequently, we suggest that the lengthening of fixation durations that occurred in Experiment 2a was due to a mismatch between the scene information extracted during the fixation and the ongoing representation of that scene in memory. Furthermore, it was particularly clear in Experiment 2a that neither of the orientation manipulations produced an effect on the tail of the distribution (the τ parameter), although fixations in this portion of the distribution were strongly affected in the spatial frequency filtering conditions in Experiments 1a, 1b, and 1c. This finding is instructive because it isolates the spatial frequency filtering manipulations as being the cause of fixation lengthening in the tail of the distribution.

The findings from Experiment 2b provide additional support for these conclusions. The blur condition replicated the finding from Experiment 1 that low-pass spatial frequency filtering induces a strong lengthening of fixation durations and shifts the distribution of fixation durations both in central tendency and for the tail. More important, we found that when the scene was changed from grayscale to color for the critical fixation (the color condition), fixation durations also increased. These findings are important because they demonstrate that the lengthening of fixation durations that occurs on a fixation-by-fixation basis in this paradigm is not restricted to cases where spatial frequency information is removed for the critical fixation but also holds for cases where information is added or changed. Interestingly, however, when compared against the no-change condition, the color condition affected only the μ and σ distribution parameters, while the blur condition affected the μ, σ, and τ parameters. Considered together with the findings from Experiment 2a, it seems that the central tendency of the distribution of fixation durations (i.e., μ and σ) is affected when there is an unexpected change in the quality of the scene information from one fixation to the next, while the effect on the tail of the distribution (i.e., the τ effect) occurs when scene information has been spatial frequency filtered.

General discussion

In the present study, we investigated the effect of spatial frequency filtering on fixation durations during scene viewing. In particular, we tested the hypothesis that lengthening of fixation durations due to spatial frequency filtering of scene information can occur on a fixation-by-fixation basis via a direct-control mechanism. To do this, we employed a saccade-contingent change paradigm where, during a randomly chosen saccade, the scene was changed from a grayscale standard to one of two versions that had been band-pass filtered for spatial frequencies. Under these conditions, any differences in duration of the critical fixations that occurred as a function of the scene content displayed during that fixation would be attributable to the action of a direct-control mechanism.

In Experiment 1a, we contrasted high-pass and low-pass spatial frequency filtering conditions and found that fixation durations were increased in both conditions. Furthermore, the low-pass filtering condition produced a larger effect on fixation durations than did the high-pass condition, which replicated prior findings by Mannan et al. (1995) and also confirmed our expectation that the low-pass filtering manipulation removed information that was important for scene processing and that this information was at least partly preserved in the high-pass condition. Moreover, the difference between the high-pass and low-pass conditions in the present paradigm demonstrates that individual fixation durations were indeed determined by a direct-control mechanism that is sensitive to the spatial frequency content of the scene. These findings replicate the results of Loschky et al. (2005), where fixation durations were lengthened when information in the periphery was occasionally blurred, and also complement earlier demonstrations of direct control of fixation durations during scene viewing, such as direct influences on fixation durations as a result of scene masking (Henderson & Pierce, 2008; Henderson & Smith, 2009; Luke et al., 2013) and also due to the task relevance of scene information (Glaholt & Reingold, 2012).

The findings of Experiments 1b and 1c helped to further identify the nature of the processing differences between filtering conditions in Experiment 1a. In particular, if the difference between filtering conditions in Experiment 1a depended on a low-level cue regarding the spatial frequency characteristics of the scene information, then it should not depend on the orientation of the scene stimulus. In Experiment 1b, we presented the filtered scenes upside-down and found that while the high-pass and low-pass conditions differed strongly from the no-change condition, the difference between high-pass and low-pass filtering was absent. This tended to suggest that the relative savings in processing fluency for the high-pass condition were not driven by the low-level spatial frequency content of the scene per se but, rather, by the integration of this information with higher-order scene knowledge. However, in Experiment 1c, we flipped the filtered scenes horizontally and found, somewhat surprisingly, that, as in Experiment 1b, the two filtering conditions did not differ. This indicated that the difference between high-pass and low-pass fixation durations seen in Experiment 1a was not due to processing difficulty at the level of scene familiarity and knowledge but, rather, was likely to be driven by the mismatch between the information extracted for the critical fixation and the ongoing scene representation in memory.

The findings from Experiment 2 were especially informative with regard to the apparent processing differences that occur for critical fixations in this paradigm. In particular, we included control conditions where the unfiltered scene was rotated or flipped horizontally (Experiment 2a) and found that both manipulations were sufficient to induce a lengthening of fixation durations. We also tested a condition where, for critical fixations, color was added to the grayscale scene (Experiment 2b), and this also produced a lengthening of fixation durations. These findings are important because they indicate that the lengthening of fixation durations in this paradigm is not restricted to cases of spatial frequency filtering but appears to occur whenever information is changed from one fixation to the next. More specifically, it seems that a sudden mismatch between scene information extracted during the critical fixation and the ongoing scene representation in memory is sufficient to produce a lengthening of fixation durations.

Additional insight into the influence of direct control on fixation durations in this paradigm emerged from the ex-Gaussian modeling of the fixation distributions. This analysis allowed us to dissociate measures of the central tendency of the distribution (i.e., the mode; μ and σ parameters) from measures of the tail of the distribution (τ parameter). We found that the spatial frequency filtering manipulations affected a large portion of the distribution, including both the mode (central tendency) and the tail. In Experiment 1a, we found that the high-pass and low-pass filtering manipulations affected both portions of the distribution. In addition, and consistent with the interpretation of a difference in cognitive processing, we found that the increase in mean fixation duration in the low-pass condition, as compared with the high-pass condition, was driven exclusively by an increase in the proportion of fixations in the tail of the distribution. In analyzing the fixation duration distributions for Experiment 2, we found that the increases in fixation durations that occurred in orientation change conditions (Experiment 2a) and the color condition (Experiment 2b) were due to a shift in the mode of the distribution but that there was no effect on the tail. Taken together, these findings suggest that there may be two separate direct-control influences on fixation durations in this paradigm. The first factor is akin to a “surprise” effect that was present in all the display change conditions in Experiments 1 and 2. This effect is fast-acting and results in a shift in the central tendency of the distribution and, hence, is likely to operate on the basis of the detection of transsaccadic changes in stimulus content, regardless of whether or not the change in scene information impacts upon scene processing. 
This effect was present even in the horizontal flip and color control conditions in Experiment 2, in which there were scene changes from fixation to fixation but where the extraction of scene information during the critical fixation was not expected to be impaired.

Beyond the early effect of sudden display changes on the mode of the distribution, we observed a later effect on the tail of the distribution, and this was associated with the spatial frequency filtering manipulations. We speculate that this disruption has to do with difficulty in extracting detailed scene information, and because it occurs relatively late in the fixation interval (as compared with the factors that impact the mode of the distribution), we expect that this disruption occurs at the point where the scene information is integrated into the ongoing scene representation. Such a late processing disruption could explain the difference between the high-pass and low-pass conditions found in Experiment 1a and between the blur and color conditions in Experiment 2. These findings expand upon prior findings in which fixation durations were shown to increase when scene viewing was interrupted by a mask that was deployed at fixation onset (the stimulus onset delay paradigm; Henderson & Pierce, 2008; Henderson & Smith, 2009; Luke et al., 2013; see Rayner & Pollatsek, 1981, for the original development of the paradigm). In the stimulus onset delay paradigm, a noise mask is presented at the onset of certain randomly chosen fixations during scene viewing, and this noise mask is removed after a certain interval. Under these conditions, a subset of fixations are delayed and seem to “wait” for the mask to be removed. The present findings indicate that such a fixation delay might depend upon the extent to which processing is disrupted by the change in information content within a fixation. In particular, while all changes in scene information appear to cause a shift in the central tendency of the distribution of fixation durations, influences on the tail of the distribution appear to depend on the extent to which the change impacts upon the perceptual processing of scene information.

The present finding of a late effect of spatial frequency filtering is also broadly consistent with the results of a recent study that investigated the time course of the impact of scene relevance on fixation durations (Glaholt & Reingold, 2012). In this study, the task relevance of scenes was operationalized through a manipulation of scene category, whereby subjects were asked to select scenes from a particular category (e.g., nature scenes), while ignoring another category (e.g., buildings). The very first fixation on a scene was shown to have a longer duration if it was a relevant scene, and this effect was manifested in the later portion of the distribution of fixation durations. It was hypothesized that the extraction of scene gist and category information becomes available rapidly within a single fixation and may influence the duration of that ongoing fixation via direct control. Likewise, in the present study, the effect of spatial frequency filtering might impact scene processing at a similar point in processing, resulting in an effect that emerges on the tail of the distribution. These late scene-processing-related effects might be contrasted with effects on the mode of the distribution driven by the early detection of transsaccadic changes to the stimulus.

In summary, the present study used a saccade-contingent change paradigm to explore the influence of perceptual and cognitive factors on fixation durations during scene viewing. In Experiment 1, we confirmed the hypothesis that spatial frequency filtering of scene information can induce a lengthening of fixation durations on a fixation-by-fixation basis via a direct-control mechanism. In addition, through the control manipulations undertaken in Experiment 2, we were able to separate early and late influences in the distribution of fixation durations. While all the saccade-contingent changes employed in the present study had an early impact on the distribution of fixation durations (i.e., on the central tendency), the effect on the later portion of the distribution was limited to changes that impair the perceptual processing of scene information. We encourage further research to explore this dichotomy and to further identify the factors that distinguish early and late effects on the distribution of fixation durations in scene viewing.