How does the visual system mediate between the needs to parse incoming sensory input into updated representations of objects, events, and scenes and to maintain our perception of a stable and continuous visual environment? Given that objects are of central importance for cognition and action, much vision research has focused on the problem of how objects are processed and recognized. However, object individuation is limited to, at most, only a few items in each glance. Not surprisingly, in addition to this form of detailed individual object processing, the visual system also relies on more global representations of the surrounding environment based on the average statistics of scenes and sets, or ensembles of objects (Ariely, 2001; Oliva & Torralba, 2001). Our naïve impression of a complete, detailed, continuously updated visual experience is likely based on a combination of these two types of visual processing. Together, the rich, foveal representation of one or two objects, memory for a handful of previously fixated items, and rapid impressions of the ensemble statistics of the rest of the scene may underlie the illusion of stable and complete visual perception (Ariely, 2001; Bar, 2004; Hochberg, 1978; Hock & Schmelzkopf, 1980; Melcher & Colby, 2008; Rensink, 2000). This feeling of perceiving everything in detail would be heightened by the ability to quickly fixate items in the periphery via saccadic eye movements.

Despite renewed interest in ensemble statistics, the spatiotemporal nature of these representations and their actual purpose in guiding cognition and action remain largely unknown. Because ensemble representations do not depend on any single element being in view at a particular time, it is unclear how we combine information about elements across space and over time during realistic viewing conditions. Previous studies have briefly flashed groups of elements on the screen, but natural viewing involves moving our eyes and bodies. The spatial and temporal attributes of ensemble statistics are critical, as it has been suggested that the relative stability of ensemble statistics across separate views might play an important role in our sense of visual stability (Corbett & Melcher, 2013; Melcher & Colby, 2008).

Beyond object-based representations: Ensemble statistics

As we described above, growing evidence indicates that our perception of the world combines two separate mechanisms: one involved in individuating a small number of objects, and the other representing the statistical properties of many objects. Visual input is highly redundant, allowing the brain to efficiently encode these regularities. Gombrich (1979) coined the term “etcetera principle” to describe our perception of the overall pattern and meaning of a scene. He noted that global patterns enable us to predict what we will see, given the order and redundancy in the environment (Gombrich, 1979; Melcher & Colby, 2008).

Moving beyond descriptive terms such as “gist” or “etcetera,” Ariely (2001) first demonstrated that the visual system represents the statistical properties of sets of redundant and similar items. He presented observers with sets of heterogeneously sized dots for 500 ms, followed by a single test dot for 500 ms, and then asked them to perform a mean judgment task and a member identification task. In the mean judgment task, observers were required to determine whether the test dot was larger or smaller than the average size of the previously presented display of dots. In the member identification task, they were asked to determine whether the test dot had been present in the previous set. Performance was almost three times more precise in the mean judgment task than in the member identification task, suggesting that observers constructed a fairly accurate representation of the mean size of the set but did not retain information about the individual sizes of the elements comprising the sets. Chong and Treisman (2003) further demonstrated that observers could determine which of two side-by-side displays of heterogeneously sized dots had the larger mean size as accurately as they could determine which of two side-by-side patches of homogeneously sized dots had the larger size or which of two single circles was larger.

Several findings support Ariely’s (2001, 2008) proposal that the average properties of sets of similar objects are represented without retaining information about the individual items comprising the sets, even when the individual elements cannot be consciously perceived (e.g., Alvarez & Oliva, 2008; Choo & Franconeri, 2010; Corbett & Oriet, 2011; Demeyere, Rzeskiewicz, Humphreys, & Humphreys, 2008; Joo, Shin, Chong, & Blake, 2009; Yamanashi Leib, Landau, Baek, Chong, & Robertson, 2012; Parkes, Lund, Angelucci, & Morgan, 2001). Along these lines, recent work by Im and Halberda (2013) has suggested that ensemble statistics are represented more efficiently than properties of individual objects. Evidence even suggests that higher-level abstract properties, such as average numeric value (Corbett, Oriet, & Rensink, 2006) and average emotion (Haberman & Whitney, 2007), are statistically represented under conditions in which individual items cannot be explicitly identified. In fact, recent findings have suggested that even when observers are explicitly instructed to ignore a distractor set of items and only average the lengths of a target set, the average of the entire set is calculated automatically and influences perceptual judgments (Oriet & Brand, 2013).

It is interesting to note that the processing of ensemble statistics is also subject to adaptation. Previous reports have demonstrated evidence of perceptual adaptation to average orientation (e.g., Gibson & Radner, 1937), average direction of motion (e.g., Anstis, Verstraten, & Mather, 1998), average texture density (Durgin, 1995, 2008; Durgin & Huk, 1997), and numerosity (Burr & Ross, 2008). Recently, we established that mean size is also an adaptable property of visual scenes (Corbett, Wurnitsch, Schwartz, & Whitney, 2012). When participants adapted to two sets of dots with different mean sizes, a test dot appeared larger when presented in a region adapted to a set of dots with a smaller mean size than when it was presented in a region adapted to a set with a comparably larger average diameter. We also demonstrated that this aftereffect could not be attributed to the density of the adapting dot patches or the spatial-frequency energies of the displays, but was modulated by the variance of the adapting dot sets. Taken together, our results suggest that mean size is an adaptable, and therefore explicitly encoded, dimension of visual scenes.

The role of ensemble statistics in visual stability

Since the rejection of the theory that perception involves stitching together different views of the scene (Irwin, Yantis, & Jonides, 1983), most theories of visual stability have trended toward an anchoring hypothesis, in which visual stability depends on matching the saccadic target or some other salient visual stimulus across a saccade (Bridgeman, Van der Heijden, & Velichkovsky, 1994; Cavanagh, Hunt, Afraz, & Rolfs, 2010; McConkie & Currie, 1996; Wurtz, 2008). This object would have to be matched across saccades on the basis of memory (Irwin, 1992), or by an active mechanism that takes saccadic planning into account (Cavanagh et al., 2010; Melcher & Colby, 2008; Wurtz, 2008).

One possibility is that the richness of perception across saccades is “based mainly on a few salient objects and a visual field full of ‘etcetera’ ” (Melcher & Colby, 2008). Indeed, scene statistics would seem to be a natural explanation for the experience of visual stability. If these ensemble representations were to remain relatively constant across a saccade, this would provide important evidence for visual stability. Support for this idea is given by findings from change blindness studies, in which changes that violate the gist and layout of a scene are noticed more than other types of changes (for reviews, see O’Regan & Noe, 2001; Simons & Rensink, 2005).

However, in order to support visual stability, ensemble representations would need to be matched across a single saccade and to persist over time. Along these lines, findings by Whiting and Oriet (2011) suggest that ensemble statistics are processed over a period of at least 200 ms. In terms of space, it is well known that sensory information is represented in multiple reference frames, including retinal, head-centered, and allocentric coordinate systems. Moreover, many studies have demonstrated a flexible use of multiple reference frames in visual experiments, even within the same task (e.g., Golomb, Chun, & Mazer, 2008; Malkinson, McKyton, & Zohary, 2012; Zimmermann, Morrone, Fink, & Burr, 2013), corresponding to allocentric/spatiotopic and retinotopic coordinates, but also in some cases to locations that are not matched retinotopically or spatiotopically (Biber & Ilg, 2011; Ezzati, Golzar, & Afraz, 2008; Melcher, 2007). This raises the question of whether statistical representations are strictly retinotopic, in which case their use across saccades would be limited to smaller saccades within a set of items. In contrast, representing ensemble statistics across different levels of the visual hierarchy with increasing abstraction and temporal persistence (Melcher & Colby, 2008) would allow statistical representations to play a greater role in visual stability.

Goal of the present study

To investigate the role of statistical representations in visual stability, we examined whether summary representations of mean size are formed in multiple reference frames. Specifically, we tested whether mean-size adaptation transferred across saccades relative to retinotopic, spatiotopic, and hemispheric coordinates, and whether mean-size adaptation transferred interocularly.

Experiment 1: Retinotopic, spatiotopic, and hemispheric mean-size adaptation

We began by comparing the effects of mean-size adaptation in four conditions. We adapted participants to two side-by-side displays of heterogeneously sized dots, then tested their perceptions of the sizes of single test dots presented in retinotopic or spatiotopic coordinates, relative to test dots presented in a baseline, no-saccade condition (i.e., both retinotopic and spatiotopic coordinates were matched), and a control condition in which the location of the test dots matched neither the retinotopic nor spatiotopic positions (“neither” condition). If mean-size adaptation occurs in a particular reference frame, participants’ perceptions of the sizes of the individual test dots presented in the corresponding spatial coordinates should be modulated as a function of the average sizes of the adapting patches. The methods and stimuli of all experiments were similar to those of Corbett et al. (2012), who first demonstrated adaptation to mean size.

Method

Participants

A group of ten students at the University of Trento, all right-handed with normal or corrected-to-normal vision with contact lenses, participated in a two-and-a-half-hour session for either course credit or payment. A mandatory 10-min break was imposed approximately halfway through each experimental session. All of the procedures and protocols were in accordance with the University of Trento’s Ethics Committee.

Task

On each trial, after adapting to two side-by-side displays of heterogeneously sized dots, participants shifted fixation vertically on the screen either up or down, or remained fixating centrally (see Fig. 1). Then two test dots appeared on the right and left sides of the screen, and participants were to indicate which dot appeared larger. If the test dot on the left was larger, participants were instructed to press the “z” key on a computer keyboard, and if the right test dot was larger, they pressed the “x” key.

Fig. 1

Order of events in Experiment 1. Each block began with an initial adaptation phase, during which participants fixated while viewing a display of the two side-by-side adapting patches for 1 min. After this initial adaptation, trials in each block cycled between a top-up adapting display presented for 2 s, a single fixation cross, and a test display consisting of the two single dots and the fixation cross for 250 ms. The screen remained blank until the keypress response, or for 3 s, whichever came first, before the onset of the next top-up display. We manipulated the reference frame in which the test dots were presented relative to the adapting patches by keeping fixation in the center of the display or shifting it up or down along the vertical meridian by 8°. In the baseline condition, fixation remained central, and the test dots appeared on either side of it. In the retinotopic condition, fixation shifted up or down, and the test dots appeared on either side of the shifted fixation. In the spatiotopic condition, fixation shifted up or down, but the test dots were presented in the spatial locations previously occupied by the adapting patches. In the “neither” condition, fixation shifted up or down, and the two test dots appeared 4° above or below the shifted fixation.

Apparatus

We used a Dell PC to present black dots against a gray (midway between black and white) background on a 19-in. monitor with a vertical refresh rate of 60 Hz (1,024 pixel × 768 pixel resolution), and responses were recorded using the computer keyboard. MATLAB software (version 2009a), in conjunction with the Psychophysics Toolbox (Brainard, 1997; Pelli, 1997), controlled all display, timing, and response functions. An EyeLink 1000 Desktop Mount (SR Research, Ontario, Canada), sampling at 1000 Hz, monitored right eye position in order to ensure proper fixation during the adaptation and test displays. The tracking system was calibrated for five points on the screen (top right, top left, center, bottom right, and bottom left) at the start of each experimental block.

Stimuli

In Experiments 1 and 2 of the present study, the adapting stimulus consisted of two sets of 14 dots. Each set of 14 dots was composed of two concentric rings, an inner ring of six dots subtending 2° of visual angle, and an outer ring of eight dots subtending 4° of visual angle. The outer eight dots were initially positioned at one of eight cardinal or 45° intercardinal locations around the outer ring. Each dot was jittered independently in the x- and y-directions by a random factor between –0.5° and +0.5° of visual angle on each trial. The six inner dots were initially positioned around the inner ring at the 30°, 90°, 150°, 210°, 270°, and 330° positions, and then were jittered in the same manner as the outer dots. Within each 14-dot patch, we restricted the positions of the dots such that no individual dot was within 0.125° of any other dot in either the x- or the y-direction.

The smaller adapting set contained 14 dots ranging in diameter from 0.5° to 1.15° in 0.05° steps, with a mean size of 0.825° of visual angle. The larger adapting set always contained 14 dots ranging in diameter from 0.85° to 1.5°, also in 0.05° steps, with a mean size of 1.175° of visual angle. Note that half of the dots in the small and large adapting sets were the same seven sizes. The positions of the 14 dots in each set were randomized on every trial, such that no location within either adapting patch consistently contained a dot that was larger or smaller than any other dot in the set; only the difference in mean dot size (diameter) between the two adapting sets was constant over the course of the experiment.
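The set construction described above can be checked with a minimal Python sketch. It rebuilds the two adapting sets from the stated ranges and step size and confirms the reported means and the seven shared sizes. The variable names are illustrative assumptions (the original stimuli were generated in MATLAB), and diameters are held as integer hundredths of a degree only to avoid floating-point drift.

```python
from statistics import fmean

STEP = 5  # 0.05 deg, expressed in hundredths of a degree

# Diameters in hundredths of a degree of visual angle, taken from the text.
small_set = list(range(50, 115 + STEP, STEP))   # 0.50 ... 1.15 deg
large_set = list(range(85, 150 + STEP, STEP))   # 0.85 ... 1.50 deg

small_mean = fmean(small_set) / 100   # mean diameter in degrees
large_mean = fmean(large_set) / 100
shared = sorted(set(small_set) & set(large_set))  # sizes present in both sets

print(len(small_set), len(large_set))   # number of dots per set
print(small_mean, large_mean)           # mean diameters (deg)
print(len(shared))                      # number of sizes common to both sets
```

Under these assumptions, each set has 14 dots, the means come out at 0.825° and 1.175°, and seven sizes (0.85° to 1.15°) are shared between the sets.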

Also, in both Experiments 1 and 2, the test displays consisted of two single dots, one on each side of the display. Unknown to the participants, the dot on one side of the screen always served as a standard (left in Exp. 1, top in Exp. 2), and was the same size as the mean size of all 28 dots comprising the adapting displays (1°). The opposite test dot was ±0, 0.12, 0.25, or 0.53 standard deviations (of the whole set of 28 adapting dots) larger or smaller than the standard, resulting in seven possible test dots, respectively subtending 0.86°, 0.93°, 0.97°, 1°, 1.03°, 1.07°, and 1.14° of visual angle. We randomized the positions of the test dots within the two adapted regions from trial to trial, so that no given location in either adapted region was consistently probed, making it more likely that the mean size of the entire display of adapting dots would be responsible for any observed effects on perceived size (e.g., Chong, Joo, Emmanouil, & Treisman, 2008; Corbett & Oriet, 2011; Corbett et al., 2012).
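The seven test-dot sizes follow from the standard (the mean of all 28 adapting dots) plus or minus a fraction of the standard deviation of the whole set. The Python sketch below illustrates this arithmetic. It assumes a population standard deviation, because the text does not state whether the population or sample formula was used; both give the same sizes after rounding to 0.01°. All names are hypothetical.

```python
from statistics import fmean, pstdev

# All 28 adapting-dot diameters (deg), rebuilt from the ranges given in the text.
small_set = [s / 100 for s in range(50, 120, 5)]   # 0.50 ... 1.15
large_set = [s / 100 for s in range(85, 155, 5)]   # 0.85 ... 1.50
all_dots = small_set + large_set

standard = fmean(all_dots)   # size of the standard test dot
sd = pstdev(all_dots)        # ASSUMPTION: population SD; the text does not say which

# Test-standard differences in SD units, as listed in the text.
sd_steps = [-0.53, -0.25, -0.12, 0.0, 0.12, 0.25, 0.53]
test_sizes = [round(standard + z * sd, 2) for z in sd_steps]

print(round(standard, 2), round(sd, 3))
print(test_sizes)
```

With these inputs, the rounded sizes match the seven values reported above (0.86° through 1.14°).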

In Experiment 1, each two-ringed adapting dot set was 6° of eccentricity from the center of the monitor, relative to the horizontal meridian. The two test dots were presented side by side, with the standard on the left side of the screen and the other test on the right side.

Procedure

Participants were tested individually in a semidarkened room. They were positioned 57 cm in front of the center of the monitor and restrained in a combination chin-and-head rest. Each participant performed one practice block of 16 trials, followed by eight experimental blocks of 112 trials each, for a total of 896 trials: 16 repetitions of each of the seven possible test-dot standard deviation differences per adapting condition (big on left [BoL] or big on right [BoR]) in each of the reference frame conditions (retinotopic, spatiotopic, neither, and baseline). Each participant completed four blocks of each adapting condition. The order of adapting conditions was counterbalanced over observers. The reference frame in which the test dots were presented was pseudorandomized on each trial, such that equal numbers of trials were presented in each of the four reference frames in each block. No information was provided about the relative locations or sizes of the adapting displays.
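The trial counts above follow from a fully crossed design. As an illustration only, the Python sketch below builds one hypothetical session: each block holds a single adapting condition and contains every reference frame by test-size cell four times, shuffled. The function name, the fixed seed, and the simple shuffle are assumptions; the counterbalancing of block order across observers is not modeled.

```python
from collections import Counter
from itertools import product
import random

SD_STEPS = [-0.53, -0.25, -0.12, 0.0, 0.12, 0.25, 0.53]
FRAMES = ["retinotopic", "spatiotopic", "neither", "baseline"]
ADAPT = ["BoL", "BoR"]
BLOCKS_PER_ADAPT = 4
TRIALS_PER_BLOCK = 112


def make_block(adapt, rng):
    """One block: every frame x test-size cell repeated equally, then shuffled."""
    cells = list(product(FRAMES, SD_STEPS))   # 28 cells
    reps = TRIALS_PER_BLOCK // len(cells)     # repetitions per cell within a block
    trials = [(adapt, frame, step) for frame, step in cells for _ in range(reps)]
    rng.shuffle(trials)
    return trials


rng = random.Random(0)  # fixed seed only so the sketch is reproducible
session = [make_block(a, rng) for a in ADAPT for _ in range(BLOCKS_PER_ADAPT)]
all_trials = [t for block in session for t in block]
per_cell = Counter(all_trials)

print(len(session), len(all_trials))   # number of blocks and total trials
print(set(per_cell.values()))          # repetitions per adapt x frame x size cell
```

The sketch yields 8 blocks, 896 trials, and 16 repetitions in each of the 56 cells, matching the counts in the text.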

As is shown in Fig. 1, each block began with an initial adaptation phase, during which participants fixated while viewing a display of the two side-by-side adapting patches for 1 min. After this initial adaptation period, the trials in each block cycled between a top-up adapting display presented for 2 s, a single fixation cross, and a test display consisting of the two single dots and the fixation cross for 250 ms, signaling for participants to respond which dot appeared larger. The screen remained blank until the keypress response, or for 3 s, whichever came first, before the onset of the next top-up display. We excluded from further analysis responses made later than 3 s after the display offset (<3 % of trials for all participants). Importantly, the positions of the individual dots comprising each adapting patch and the positions of the individual test display dots were randomized on every trial (e.g., Chong et al., 2008; Corbett & Oriet, 2011; Corbett et al., 2012).

The eyetracker continuously monitored that participants were fixating the cross during each adaptation and test display. In the initial practice block, if fixation was broken, the computer emitted a 500-Hz tone every 200 ms until fixation was reestablished. In this manner, the test displays were never shown until fixation was maintained, and participants were trained that fixation was necessary to advance from trial to trial. In the experimental blocks, the warning sound was no longer used, but the eyetracker still ensured fixation in the same manner.

As is also shown in Fig. 1, we manipulated the reference frame in which the test dots were presented relative to the adapting patches by keeping fixation in the center of the display or shifting it up or down along the vertical meridian by 8° (twice the diameter of the adapting displays) during the delay period between adaptation and the test probe. In the baseline condition, fixation remained central, and the test dots appeared on either side of it. In the retinotopic condition, fixation shifted up or down (randomly determined on each trial), and the test dots appeared on either side of the new fixation position. In the spatiotopic condition, fixation shifted up or down, but the test dots were presented in the spatial locations previously occupied by the adapting patches. In the “neither” condition, fixation shifted up or down, and the two test dots appeared 4° (the diameter of the adapting displays) above or below the shifted fixation, well outside retinotopic or spatiotopic coordinates. Data were collapsed across the up/down fixation shift directions for all subsequent analyses.
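The logic of the four reference frame conditions can be made concrete with a small geometric sketch. The Python code below uses screen coordinates in degrees with the origin at the screen center, places the adapting patch centers at ±6° on the horizontal meridian, and works out where the test dots fall on the screen and on the retina (screen position minus fixation) in each condition. It is an illustration under stated assumptions, not the experimental code: it uses region centers and ignores the trial-to-trial randomization of dot positions, and the function and parameter names are hypothetical.

```python
ADAPT_ECC = 6.0       # horizontal eccentricity of each adapting patch (deg)
SHIFT = 8.0           # size of the vertical fixation shift (deg)
NEITHER_OFFSET = 4.0  # vertical distance of the tests from shifted fixation (deg)

ADAPT_FIX = (0.0, 0.0)                              # fixation during adaptation
ADAPTERS = [(-ADAPT_ECC, 0.0), (ADAPT_ECC, 0.0)]    # patch centers on the screen


def trial_geometry(condition, shift_dir=1, offset_dir=1):
    """Return (test fixation, test-dot centers) in screen coordinates (deg).

    shift_dir: +1 = fixation moves up, -1 = down.
    offset_dir: +1 / -1 = tests above / below shifted fixation ("neither" only).
    """
    if condition == "baseline":
        return ADAPT_FIX, list(ADAPTERS)
    fix = (0.0, shift_dir * SHIFT)
    if condition == "retinotopic":
        test_y = fix[1]
    elif condition == "spatiotopic":
        test_y = 0.0
    elif condition == "neither":
        test_y = fix[1] + offset_dir * NEITHER_OFFSET
    else:
        raise ValueError(f"unknown condition: {condition}")
    return fix, [(x, test_y) for x, _ in ADAPTERS]


def frame_matches(condition, **kwargs):
    """Return (retinotopic match, spatiotopic match) for a condition."""
    fix, tests = trial_geometry(condition, **kwargs)
    # Retinal position = screen position minus current fixation.
    adapt_retinal = [(x - ADAPT_FIX[0], y - ADAPT_FIX[1]) for x, y in ADAPTERS]
    test_retinal = [(x - fix[0], y - fix[1]) for x, y in tests]
    return test_retinal == adapt_retinal, tests == ADAPTERS


for cond in ["baseline", "retinotopic", "spatiotopic", "neither"]:
    print(cond, frame_matches(cond))
```

In this sketch the baseline condition matches both frames, the retinotopic and spatiotopic conditions each match exactly one, and the "neither" condition matches none, whichever way fixation shifts and whether the tests fall above or below it.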

Analysis

For each participant, in each of the eight combinations of the adapting (BoL/BoR) and reference frame (retinotopic, spatiotopic, neither, and baseline) conditions, we computed the average proportion of responses that the test dot on the right appeared larger than the test dot on the left. Using maximum likelihood estimation, we next fitted each participant’s averaged responses over the seven test–standard differences in each reference frame condition to two separate logistic functions (one for BoL, one for BoR) with lower and upper bounds of 0 and 1, respectively, for a total of eight logistic fits per participant. We evaluated goodness of fit by using deviance scores calculated as the log-likelihood ratio between a fully saturated, zero-residual model and the data model. A score above the critical chi-square value indicated a significant deviation between the fit and the data (Wichmann & Hill, 2001). All of the fits were acceptable, because all deviance scores were below the critical chi-square value, χ²(7, 0.95) = 14.067. We next defined the magnitude of an individual participant’s aftereffect for each adapting and reference frame condition as the point of subjective equality (PSE), the 50 % inflection point on the corresponding psychometric function. The PSE quantified the physical difference in dot size required for the two test dots to appear equal in diameter (see Footnote 1).
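To make the fitting procedure concrete, the following Python sketch fits a two-parameter logistic function (asymptotes fixed at 0 and 1) to binomial response counts by maximum likelihood, computes the deviance against the saturated model, and reads off the PSE as the 50 % point. It is a sketch under assumptions rather than the analysis code used here: the optimizer (Newton's method), the function names, the use of SD units on the x-axis, and the response counts are all hypothetical. Only the critical value of 14.067 is taken from the text.

```python
import math


def logistic(x, a, b):
    """Two-parameter logistic with asymptotes fixed at 0 and 1."""
    return 1.0 / (1.0 + math.exp(-(a + b * x)))


def fit_logistic(xs, ks, ns, iters=100, tol=1e-10):
    """Maximum-likelihood fit of (a, b) to binomial counts via Newton's method."""
    a = b = 0.0
    for _ in range(iters):
        ga = gb = haa = hab = hbb = 0.0
        for x, k, n in zip(xs, ks, ns):
            p = logistic(x, a, b)
            w = n * p * (1.0 - p)
            ga += k - n * p          # gradient of the log-likelihood
            gb += (k - n * p) * x
            haa += w                 # entries of the negative Hessian
            hab += w * x
            hbb += w * x * x
        det = haa * hbb - hab * hab
        da = (hbb * ga - hab * gb) / det
        db = (haa * gb - hab * ga) / det
        a, b = a + da, b + db
        if abs(da) + abs(db) < tol:
            break
    return a, b


def deviance(xs, ks, ns, a, b):
    """Twice the log-likelihood ratio between the saturated and fitted models."""
    dev = 0.0
    for x, k, n in zip(xs, ks, ns):
        p = logistic(x, a, b)
        if k > 0:
            dev += 2.0 * k * math.log(k / (n * p))
        if k < n:
            dev += 2.0 * (n - k) * math.log((n - k) / (n * (1.0 - p)))
    return dev


# HYPOTHETICAL data: "right appears larger" counts out of 16 trials per level.
sd_steps = [-0.53, -0.25, -0.12, 0.0, 0.12, 0.25, 0.53]
n_trials = [16] * 7
n_right = [1, 3, 6, 8, 11, 13, 15]

a_hat, b_hat = fit_logistic(sd_steps, n_right, n_trials)
pse = -a_hat / b_hat                 # x at which the fitted curve crosses 0.5
dev = deviance(sd_steps, n_right, n_trials, a_hat, b_hat)
CRITICAL = 14.067                    # critical chi-square value quoted in the text
print(round(pse, 3), round(dev, 3), dev < CRITICAL)
```

A fit is treated as acceptable when its deviance falls below the critical value. Repeating the fit for the BoL and BoR conditions in each reference frame would give the eight PSEs per participant described above.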

Results

Mean-size adaptation was apparent in multiple reference frames. A 2 × 4 repeated measures within-subjects analysis of variance (ANOVA) on participants’ grand-averaged PSEs in each Adapting (BoL, BoR) × Reference Frame (retinotopic, spatiotopic, neither, baseline) condition revealed a significant main effect of adapting condition [F(1, 9) = 135.923, MSE = 0.123, p < .001, η² = .938], and an interaction between adapting condition and reference frame [F(3, 27) = 12.380, MSE = 0.002, p < .001, η² = .579]. As is illustrated in Fig. 2a, planned comparisons between the two BoL/BoR adapting conditions in each of the four reference frames (with α set to .0125 using the Bonferroni correction for four multiple comparisons) further confirmed that a significant aftereffect was present, such that the PSEs in the BoL and BoR conditions were significantly different for each reference frame. A negative PSE in the BoR condition indicated a leftward shift in the logistic fit to the proportion of “Right appears larger” responses, and a positive PSE in the BoL condition represented a rightward shift, indicating that test dots presented to the region adapted to the larger mean size appeared smaller than when they were presented to the region adapted to the smaller mean size.

Fig. 2

Experiment 1 results. (a) Participants’ grand-averaged points of subjective equality (PSEs) for the big-on-left (BoL) and big-on-right (BoR) adapting conditions were significantly different (with α set to .0125 using the Bonferroni correction for multiple comparisons) in each of the four reference frame conditions, indicating that mean-size adaptation occurred in all four sets of coordinates. (b) Magnitude of the mean-size adaptation effect in Experiment 1 for each condition. Participants’ grand-averaged PSE differences between the BoL and BoR conditions were significantly different between the “neither” reference frame and each of the three other frames, illustrating that mean-size adaptation was greatest in the “neither” condition of Experiment 1. (Asterisks represent significant differences, with α set to .0083 using the Bonferroni correction for multiple comparisons.) All error bars represent 95 % within-subjects confidence intervals, calculated using the method of Loftus and Masson (1994).

The results of Experiment 1 confirmed our predictions of a significant adaptation aftereffect in each of the four reference frames. We had no a priori predictions about whether the magnitudes of adaptation would differ between the reference frame conditions, except that we expected no adaptation or attenuated adaptation when we placed the test dots well outside retinotopic and spatiotopic coordinates in the “neither” reference frame. Contrary to this expectation, visual inspection of Fig. 2a clearly shows a greater magnitude of adaptation, or difference in the PSEs for BoL and BoR, in the “neither” condition than in the other three conditions. Given the significant main effect of adapting condition, the interaction between adapting condition and reference frame, and the lack of a significant main effect of reference frame, we decided to further examine these unexpected differences between the magnitudes of the aftereffect over reference frames. For each participant, we defined the magnitude of the aftereffect within each reference frame condition as the difference between the individual’s BoL and BoR PSEs (BoL PSE – BoR PSE). We then compared participants’ grand-averaged PSE differences between each pair of the four reference frame conditions (with α set to .0083 using the Bonferroni correction for six multiple comparisons). As is illustrated in Fig. 2b, the magnitude of mean-size adaptation was significantly greater in the “neither” reference frame than in each of the other three reference frames.
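The two corrected α levels and the magnitude measure can be illustrated with a short Python sketch. The PSE values below are hypothetical placeholders for a single participant, not data from this experiment, and the familywise α of .05 is an assumption implied by the reported corrected values (.0125 for four comparisons, .0083 for six).

```python
from itertools import combinations

FRAMES = ["retinotopic", "spatiotopic", "neither", "baseline"]
FAMILY_ALPHA = 0.05  # ASSUMPTION: familywise alpha implied by the corrected values

# HYPOTHETICAL PSEs for one participant: frame -> (BoL PSE, BoR PSE).
pses = {
    "retinotopic": (0.10, -0.08),
    "spatiotopic": (0.09, -0.07),
    "neither": (0.15, -0.14),
    "baseline": (0.11, -0.09),
}

# Aftereffect magnitude per frame: BoL PSE minus BoR PSE.
magnitude = {frame: bol - bor for frame, (bol, bor) in pses.items()}

# Planned BoL-vs-BoR comparisons: one per reference frame.
alpha_planned = FAMILY_ALPHA / len(FRAMES)

# Follow-up comparisons of magnitude: every pair of reference frames.
pairs = list(combinations(FRAMES, 2))
alpha_pairwise = FAMILY_ALPHA / len(pairs)

print(magnitude)
print(alpha_planned, len(pairs), round(alpha_pairwise, 4))
```

Four reference frames give six unordered pairs, which is why the follow-up comparisons use the stricter criterion of .05 / 6 ≈ .0083.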

Discussion

The results of Experiment 1 demonstrate that mean-size adaptation occurs in multiple spatial frames of reference. Specifically, we found a significant difference between participants’ grand-averaged PSEs in the BoL versus BoR conditions within each of the four reference frame conditions, such that they perceived test dots as being larger when presented in the corresponding regions adapted to the smaller- versus the larger-mean-size set of dots. These results support our proposal that mean size and similar ensemble statistics are represented across different spatial reference frames, persisting across saccades to facilitate visual stability.

When participants made an eye movement after adapting to the patches of dots, the adaptation effect was still apparent for stimuli presented in the original spatiotopic locations, even though the retinotopic coordinates were shifted. Even more surprisingly, we also observed an aftereffect in the “neither” condition, in which the test stimuli were presented well outside of regions corresponding to retinotopic or spatiotopic locations occupied by the adapting displays. Transfer of adaptation to locations that do not match retinotopically or spatiotopically has been reported in previous studies (Biber & Ilg, 2011; Ezzati et al., 2008; Melcher, 2007), but the nature of this transfer is not well understood. Because we had not predicted any adaptation in this condition, we conducted a control Experiment 1b to test for an adaptation aftereffect outside of retinotopic or spatiotopic coordinates when no eye movement was made between adaptation and test. The methods were exactly the same as in the main Experiment 1, except that we tested a new group of ten participants; the adapting stimuli were presented on the left and right sides of the screen, centered at 4° of vertical eccentricity in the upper visual field, and the test stimuli were presented on the left and right, centered at 4° of vertical eccentricity in the lower visual field; and the participants maintained constant central fixation during both the adaptation and test phases. Each participant completed one block of 112 trials in the BoL and BoR adapting conditions (the order of which was counterbalanced over participants), for a total of 16 repetitions of each of the seven possible test dot standard deviation differences in each adapting condition.

The results of this control experiment also showed adaptation in coordinates that were neither retinotopic nor spatiotopic. The participants’ PSEs in the BoL and BoR adapting conditions were significantly different [t(9) = 2.266, SEM = 0.014, p = .05], indexing an adaptation aftereffect of mean size outside retinotopic or spatiotopic regions when no eye movement was made. Given that the nonlocalized adaptation aftereffect persisted even in the absence of an eye movement, we conducted a second experiment with the adapting and test displays lateralized to the top and bottom of a single hemisphere, to examine whether this finding in Experiment 1 might have been due to adapting the entire left/right hemisphere (e.g., Nicholls, Bradshaw, & Mattingley, 1999). In addition, we tested a new, monocular condition to examine whether adaptation transferred between eyes.

Experiment 2: Hemispheric and interocularly transferred mean-size adaptation

To examine whether the unexpected finding of an aftereffect in the “neither” condition of Experiment 1 was due to presenting the adapting patches in separate hemispheres, in Experiment 2 we controlled for the possibility of left/right hemispheric adaptation by presenting the adapting dots in the upper and lower regions of a single hemisphere. If the adaptation that we observed in the “neither” condition of Experiment 1 was due to mean-size adaptation across the entire left and right hemispheres, then we should no longer observe an aftereffect when the stimuli were lateralized to one hemisphere in the “neither” condition of Experiment 2. Given the aftereffect that we uncovered in multiple spatial reference frames, we further tested whether mean-size adaptation transferred interocularly when one eye was adapted and the other eye was tested. If there is interocular transfer of mean-size adaptation, then we should find an aftereffect when we adapted one eye and presented the test dots to the opposite eye.

Method

Participants

A group of ten students at the University of Trento, all right-handed with normal or corrected-to-normal vision with contact lenses, participated in a two-and-a-half-hour session for either payment or course credit. A mandatory 10-min break was imposed approximately halfway through each experimental session. All of the procedures and protocols were in accordance with the Ethics Committee of the University of Trento.

Task

The participants’ task in Experiment 2 was identical to that in Experiment 1, except that in Experiment 2, after adapting to two side-by-side displays of heterogeneously sized dots, participants shifted fixation to either the left or right, or remained fixating centrally, and indicated which of two test dots presented at the top and bottom of the screen appeared larger. In other words, we replaced the vertical saccade in the first experiment with a horizontal one in the second experiment. If the test dot on the top was larger, participants were instructed to press the “z” key on a computer keyboard, and if the bottom test dot was larger, they pressed the “x” key.

Apparatus

In addition to the experimental apparatus of Experiment 1, we used a ScreenScope stereoscope (Stereo Aids, Albany, Australia) mounted on an adjustable platform to manipulate the presentation of the adapting and test dots to individual eyes in the monocular condition of Experiment 2. The stereoscope was calibrated individually for each participant before the beginning of each block by having the participant adjust two knobs that controlled the left and right mirror angles in four separate adjustment tasks. They first adjusted the mirrors until a white cross subtending 4° of visual angle (the size of the individual adapting-stimulus arrays), presented 12° of eccentricity along the horizontal meridian on the left of the screen, appeared directly in the center of a 4° black square presented 12° of visual angle along the horizontal meridian on the right of the screen. They next repeated this adjustment with a black cross on the right of the screen and a white square on the left of the screen. In the third adjustment task, the black square flickered at 4 Hz on the left, and the white square flickered in antiphase at 4 Hz on the right (when the black square was visible, the white square was not, and vice versa), and participants aligned them to centrally overlap. Finally, they aligned two similarly flickering fixation crosses, one 12° on the left along the horizontal meridian and one 12° on the right along the horizontal meridian. Because the experimental displays contained the same fixation crosses in the same locations, this final adjustment helped to ensure that the experimental displays would appear to be presented in the center, even though they were physically presented 12° to the left or right of the screen, separately to each eye.

Stimuli

In Experiment 2, each two-ringed adapting dot set was 6° of eccentricity from the center of the monitor, relative to the vertical meridian—one set 6° of eccentricity above the horizontal meridian and one 6° below the horizontal meridian. Importantly, in Experiment 2, the two adapting patches were always presented in the top and bottom of the same (left or right) hemisphere. The two test dots were also presented at the top and bottom of the same hemisphere, with the standard in the top half of the screen and the other test in the bottom half.

Procedure

The procedure in Experiment 2 was identical to that of Experiment 1, except for the hemisphere-specific display presentation, the manner in which fixation was manipulated, and the absence of the eyetracker in the monocular condition. Specifically, each participant performed one practice block of 16 trials, followed by eight experimental blocks of 112 trials each, for a total of 896 trials—16 repetitions of each of the seven possible test dot standard deviation differences per big-on-top (BoT)/big-on-bottom (BoB) adapting condition in each reference frame: retinotopic, spatiotopic, neither, and baseline. Each participant completed two blocks of each combination of adapting condition (BoT/BoB) and adapted side (left/right). The order of conditions was counterbalanced over observers. The reference frame in which the test dots were presented was pseudorandomized on each trial, such that equal numbers of trials were presented with each of the four reference frames in each block.

The participants in Experiment 2 also completed an additional four blocks of 56 trials each in the monocular condition, for a total of 224 trials—again with 16 repetitions of each of the seven possible test dot standard deviation differences per BoT/BoB adapting condition. Each participant completed one block of each combination of adapting condition (BoT/BoB) and adapted eye (left/right). The order of the conditions was counterbalanced over observers. The monocular condition was always run after participants had completed the initial eight blocks of 896 retinotopic, spatiotopic, neither, and baseline trials. This was done to ensure that participants understood the task and were well practiced at maintaining fixation when they performed the monocular portion of the experiment. The eyetracker was again used to ensure proper fixation in the first eight blocks of 896 trials, but not in the four blocks of subsequent monocular trials because the stereoscope blocked the participants’ eyes from recordable view.
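The trial totals reported for the main and monocular portions follow directly from the factorial design. The short check below is only a sketch of that arithmetic; the constant names are illustrative and do not come from the original experimental code.

```python
# Trial-count arithmetic for Experiment 2 (constant names are illustrative).
REPETITIONS = 16          # repetitions per design cell
TEST_DIFFERENCES = 7      # test dot standard deviation differences
ADAPTING_CONDITIONS = 2   # BoT and BoB
REFERENCE_FRAMES = 4      # retinotopic, spatiotopic, neither, baseline

# Main portion: every factor is crossed.
main_trials = (REPETITIONS * TEST_DIFFERENCES
               * ADAPTING_CONDITIONS * REFERENCE_FRAMES)

# Monocular portion: a single test arrangement, so no reference-frame factor.
monocular_trials = REPETITIONS * TEST_DIFFERENCES * ADAPTING_CONDITIONS

# Cross-check against the block structure given in the text.
assert main_trials == 8 * 112       # eight blocks of 112 trials
assert monocular_trials == 4 * 56   # four blocks of 56 trials

print(main_trials, monocular_trials)  # prints 896 224
```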

As is shown in Fig. 3, we manipulated the reference frame in which the test dots were presented relative to the adapting patches by keeping fixation in the center of the display or shifting it left or right along the horizontal meridian. In the baseline condition, fixation remained central and the test dots appeared on the top and bottom of the display in the adapted locations (left or right). In the retinotopic condition, when the adapting patches were presented on the left side of the display, fixation shifted 6° of visual angle rightward along the horizontal meridian, and the test dots appeared 6° above and below the previous location of the central fixation. When the adapting patches were presented on the right side of the display in the retinotopic condition, fixation shifted 6° leftward and the test dots appeared above and below the previous location of central fixation. In the spatiotopic condition, when the left side was adapted, fixation shifted 12° to the left, and when the right side was adapted, fixation shifted 12° to the right (i.e., twice the eccentricity of the adapting patches). The test dots always remained in the spatial locations previously occupied by the adapting displays. In the “neither” condition, when the left hemisphere was adapted, fixation shifted 6° to the left along the horizontal meridian, and when the right hemisphere was adapted, fixation shifted 6° to the right (i.e., the respective original locations of the adapting displays). The test displays always shifted to the top and bottom of the vertical meridian. In the monocular condition, when the adapting patches were presented at 12° of horizontal eccentricity and 6° of vertical eccentricity and the test dots were presented in the adapted regions, they appeared to be centrally aligned 6° above and below the horizontal meridian. When the adapting dots were presented to the left eye, the test dots were presented to the right eye, and vice versa. 
The data in all reference frame conditions were collapsed across left–right fixation shift directions and left–right adapting sides/eyes for all of the subsequent analyses.
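The horizontal geometry of the four saccade conditions can be made explicit with a small sketch. It assumes a sign convention that is not part of the original description: positions are in degrees of visual angle, with negative values to the left of screen center and positive values to the right. The function names are hypothetical. A test location counts as a retinotopic match when its position relative to fixation equals that of the adapting patches during adaptation (when fixation was central), and as a spatiotopic match when its screen position equals that of the adapting patches.

```python
# Horizontal positions in degrees; negative = left of screen center.
# The sign convention and all names here are assumptions for illustration.
ADAPT_ECC = 6  # horizontal eccentricity of the adapting patches

def layout(condition, side):
    """Return (fixation_x, test_x) at test; side is -1 (left) or +1 (right)."""
    adapt_x = ADAPT_ECC * side
    table = {
        "baseline":    (0,           adapt_x),  # no shift, tests at adapted spot
        "retinotopic": (-adapt_x,    0),        # shift 6 deg away, tests at old fixation
        "spatiotopic": (2 * adapt_x, adapt_x),  # shift 12 deg toward, tests stay put
        "neither":     (adapt_x,     0),        # shift 6 deg toward, tests on midline
    }
    return table[condition]

def frames_matched(condition, side):
    """Return (retinotopic_match, spatiotopic_match) for the test dots."""
    adapt_x = ADAPT_ECC * side
    fixation_x, test_x = layout(condition, side)
    # Fixation was at 0 during adaptation, so the adapted retinal
    # position is simply adapt_x.
    retinal_match = (test_x - fixation_x) == adapt_x
    spatial_match = test_x == adapt_x
    return retinal_match, spatial_match

for condition in ("baseline", "retinotopic", "spatiotopic", "neither"):
    for side in (-1, +1):
        print(condition, side, frames_matched(condition, side))
```

Under these assumptions, the baseline tests match both frames, the retinotopic and spatiotopic tests match exactly one frame each, and the "neither" tests match none, for either adapted side.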

Fig. 3

Order of events in Experiment 2. The methods of Experiment 2 were identical to those of Experiment 1, except that we manipulated the reference frame in which the test dots were presented relative to the adapting patches by keeping fixation in the center of the display or shifting it to the left or right along the horizontal meridian. In the baseline condition, fixation remained central, and the test dots appeared at the top and bottom of the locations previously occupied by the adapting patches. When the adapting patches were presented on the right side of the displays in the retinotopic condition, fixation shifted 6° leftward and the test dots appeared above and below the previous location of central fixation, and when the left side was adapted, fixation shifted 6° rightward and the test dots still appeared above and below the central fixation. In the spatiotopic condition, when the left side was adapted, fixation shifted 12° to the left, and when the right side was adapted, fixation shifted 12° to the right. The test dots always remained in the spatial locations previously occupied by the adapting displays. In the “neither” condition, when the left hemisphere was adapted, fixation shifted 6° to the left along the horizontal meridian, and when the right hemisphere was adapted, fixation shifted 6° to the right. The test displays always shifted to the top and bottom of the vertical meridian. In the monocular condition, when the adapting patches were presented at 12° of horizontal eccentricity and 6° of vertical eccentricity and the test dots were presented in the adapted regions, they appeared to be centrally aligned 6° above and below the horizontal meridian. When the adapting dots were presented to the left eye, the test dots were presented to the right eye, and vice versa.

Analysis

The analysis of Experiment 2 was identical to that of Experiment 1, except that we included the additional monocular condition and analyzed all responses as a function of the probability of a response that the test dot on the bottom appeared larger than the test dot on the top. All logistic fits were again significant [χ²(7, 0.95) = 14.067], and we defined the magnitude of an individual’s aftereffect in each adapting and reference frame condition as the PSE.
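To illustrate how a PSE is read off a logistic psychometric function, the sketch below fits p = 1 / (1 + exp(−(b0 + b1·x))) to grouped "bottom larger" responses and takes the PSE as the test difference at which p = .5, that is, −b0/b1. This is a minimal sketch under stated assumptions, not the authors' fitting procedure: the Newton–Raphson fit, the function names, the test levels, and the response counts are all hypothetical.

```python
import math

def fit_logistic(levels, n_bigger, n_trials, iterations=25):
    """Maximum-likelihood logistic fit to grouped data via Newton-Raphson.

    levels   : test differences (x)
    n_bigger : number of "bottom larger" responses at each level
    n_trials : number of trials at each level
    Returns the intercept b0 and slope b1.
    """
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, k, n in zip(levels, n_bigger, n_trials):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            w = n * p * (1.0 - p)          # binomial weight
            g0 += k - n * p                # gradient w.r.t. b0
            g1 += (k - n * p) * x          # gradient w.r.t. b1
            h00 += w                       # information matrix entries
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2 x 2 system (information matrix) * step = gradient.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

def pse(b0, b1):
    """Point of subjective equality: the x at which p = 0.5."""
    return -b0 / b1

# Hypothetical data: seven test differences, 16 trials each (made-up counts).
levels = [-3, -2, -1, 0, 1, 2, 3]
n_trials = [16] * 7
n_bigger = [1, 2, 4, 7, 11, 14, 15]
b0, b1 = fit_logistic(levels, n_bigger, n_trials)
print(round(pse(b0, b1), 3))
```

The aftereffect magnitude analyzed in the Results is then the difference between the PSEs obtained in the BoT and BoB adapting conditions.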

Results

Mean-size adaptation was again apparent in all reference frames. A 2 × 5 repeated measures within-subjects ANOVA on the grand-averaged PSEs in each Adapting Condition (BoT, BoB) × Reference Frame (retinotopic, spatiotopic, neither, baseline, monocular) again revealed a significant main effect of adapting condition [F(1, 9) = 126.227, MSE = 0.067, p < .001, η² = .933], and an interaction between adapting condition and reference frame [F(4, 36) = 6.059, MSE = 0.002, p = .001, η² = .402]. Planned comparisons between the two BoT/BoB adapting conditions in each of the five reference frames (with α set to .01 using the Bonferroni correction for five multiple comparisons) again confirmed a significant aftereffect, such that the PSEs in BoT and BoB were significantly different for each reference frame.
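The reported effect sizes can be recovered from the F ratios and their degrees of freedom, assuming that the reported η² is partial eta squared (the text does not say which variant was used). The function name below is illustrative.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F ratio and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)

main_effect = partial_eta_squared(126.227, 1, 9)   # adapting condition
interaction = partial_eta_squared(6.059, 4, 36)    # adapting condition x frame

print(round(main_effect, 3), round(interaction, 3))  # prints 0.933 0.402
```

Both values agree with those reported, which is consistent with the partial eta squared assumption.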

We again compared the differences between grand-averaged PSEs in the BoT–BoB conditions within each reference frame condition. As is shown in Fig. 4, although we still observed a significant aftereffect in the “neither” condition of Experiment 2 when we controlled for left–right hemispheric differences, we found a marked decrease in this frame of reference, as in the spatiotopic frame. The differences in the magnitude of the mean-size aftereffect between the reference frames also departed from the pattern found in the first experiment. Unlike the significant differences between the “neither” condition and all other reference frames found for Experiment 1, the only significant difference in adaptation magnitude observed in Experiment 2 was between the retinotopic and “neither” conditions (with α set to .005 using the Bonferroni correction for ten multiple comparisons).

Fig. 4

Experiment 2 results. Mean-size adaptation was again observed in all reference frames tested in Experiment 2. The magnitudes of the mean-size adaptation effects, quantified by participants’ grand-averaged point-of-subjective-equality (PSE) differences between the big-on-top (BoT) and big-on-bottom (BoB) adapting conditions in each reference frame, were significantly different between the “neither” and retinotopic reference frames. (The asterisk represents a significant difference, with α set to .005 using the Bonferroni correction for multiple comparisons.) All error bars represent 95% within-subjects confidence intervals, calculated using the method of Loftus and Masson (1994).

On the basis of this initial evidence of a reduction of mean-size adaptation in the “neither” condition of Experiment 2, we next directly compared the magnitude of the aftereffect in each reference frame across experiments. Table 1 lists the magnitudes of mean-size adaptation in each frame of reference in each experiment, as well as the differences between the effects in the four corresponding frames over experiments. Independent-samples t tests between each of the four corresponding reference frames in the two experiments revealed highly significant differences in the magnitudes of mean-size adaptation within the spatiotopic and “neither” reference frames between Experiments 1 and 2 (with α set to .0125 using the Bonferroni correction for four multiple comparisons). Whereas one adapting patch was presented to the right hemisphere and one to the left hemisphere in Experiment 1, when both adapting patches were lateralized to the top and bottom of a single left/right hemisphere in Experiment 2, we found a significant decrease in the magnitude of mean-size adaptation in the spatiotopic and “neither” reference frame conditions, suggesting that the effects observed in these coordinates in Experiment 1 were at least partially due to left/right hemispheric adaptation.
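The Bonferroni-corrected thresholds quoted in the Results of Experiment 2 can be checked with a few lines of arithmetic. The sketch assumes a familywise α of .05, which is implied by the reported thresholds but not stated, and it assumes that the ten comparisons are the pairwise comparisons among the five reference frames. The function name is illustrative.

```python
from math import comb

FAMILYWISE_ALPHA = 0.05  # assumed; implied by the reported thresholds

def bonferroni_alpha(n_comparisons, familywise_alpha=FAMILYWISE_ALPHA):
    """Per-comparison alpha under the Bonferroni correction."""
    return familywise_alpha / n_comparisons

n_frames = 5                             # retinotopic, spatiotopic, neither, baseline, monocular
pairwise_frames = comb(n_frames, 2)      # presumably the source of the ten comparisons

per_frame = bonferroni_alpha(n_frames)           # BoT vs. BoB within each frame
between_frames = bonferroni_alpha(pairwise_frames)
across_experiments = bonferroni_alpha(4)         # four frames shared by both experiments

print(per_frame, between_frames, across_experiments)
```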

Table 1 Experiment 1 and 2 point-of-subjective-equality (PSE) differences (degrees of visual angle) in each reference frame

Discussion

When the large- and small-mean adapting displays were lateralized to the top and bottom of a single hemisphere in Experiment 2, we again observed a significant mean-size aftereffect in the retinotopic, spatiotopic, neither, and baseline reference frame conditions. In addition, we observed a significant aftereffect in the monocular condition, providing evidence that mean-size adaptation transferred interocularly when one eye was adapted and the other eye was tested. Given that the inputs from the eyes converge in V1, the interocular transfer of mean-size adaptation observed in Experiment 2, together with the retinotopic mean-size adaptation observed in Experiments 1 and 2, suggests that mean size is encoded across multiple spatial reference frames.

As in Experiment 1, we had not predicted an effect in the “neither” condition of Experiment 2 because the test dots were placed well outside any regions corresponding to adapted retinotopic or spatiotopic coordinates. However, the significant decrease in the magnitude of the aftereffect in the spatiotopic and “neither” reference frame conditions in Experiment 2 implies that the stronger effects observed in Experiment 1 were at least partially attributable to adapting the entire left and right hemispheres to the big- and small-mean-sized dot patches. When the displays were lateralized to the top and bottom of a single left/right hemisphere in Experiment 2, these effects were markedly attenuated, suggesting that mean-size adaptation transferred to the upper/lower hemispheres to a lesser extent than to the left/right hemispheres.

General discussion

The present results are the first to empirically demonstrate that statistical ensemble representations are not strictly retinotopic, but instead persist across saccades and different spatial reference coordinates. In Experiment 1, when we adapted participants to two side-by-side displays of heterogeneously sized dots on the left and right of the screen, they perceived the sizes of two test dots as a function of the average sizes of the adapting patches, such that when individual test dots were presented in regions adapted to the patch with the larger average size, they appeared smaller than when they were presented in regions adapted to the smaller average size. Importantly, when participants made a saccadic eye movement between the adaptation and test displays, mean-size adaptation transferred to test dots presented at retinotopic and spatiotopic coordinates, as well as at coordinates within the same hemisphere but well outside regions corresponding to either the retinotopic or the spatiotopic coordinates of the adapting patches. Furthermore, when no saccadic eye movement was made in the control Experiment 1b, an adaptation aftereffect was still observed in locations that did not correspond to retinotopically or spatiotopically adapted regions. In Experiment 2, when participants viewed the adapting displays at the top and bottom of a single right/left hemisphere, the effects of mean-size adaptation in both the spatiotopic and “neither” conditions were greatly attenuated relative to those observed in Experiment 1. This attenuation confirms that the effects observed in these conditions in Experiment 1 were at least partially attributable to adapting the entire left and right hemispheres to the mean sizes of the adapting patches, and suggests that mean-size representations were more persistent across the left/right hemispheres than across the upper/lower hemispheres.
Additionally, the results of Experiment 2 demonstrated that mean-size adaptation transferred interocularly, because the size of adapting patches presented exclusively to one eye affected the perceived sizes of test dots presented exclusively to the opposite eye. Taken together, these results demonstrate that mean size is encoded in multiple spatial reference frames, with increasing levels of abstraction.

It is clear from behavioral and neurophysiological studies that the brain uses multiple spatial reference frames. Which frame or frames are used at a particular moment may depend on the task and the temporal scale (e.g., Golomb et al., 2008; Malkinson et al., 2012; Zimmermann et al., 2013). Behavioral effects have been reported in locations that do not match either retinotopic or spatiotopic locations (Biber & Ilg, 2011; Ezzati et al., 2008; Melcher, 2007), consistent with our results. In the present study, we found strong adaptation effects in multiple frames of reference, such that no single reference frame can account for all of our findings. One possible explanation for our pattern of results is that it reflects the particular way in which saccades and global statistics may interact, both in the laboratory and in the real world. The visual system seeks correspondences across eye movements and is willing to accept some imprecision in this matching, since saccades are not particularly precise or accurate, as has been demonstrated in studies of saccadic suppression of displacement (Bridgeman, Hendry, & Stark, 1975) and saccadic adaptation (Semmlow, Gauthier, & Vercher, 1989; Straube & Deubel, 1995). In contrast to the case of single objects, for global statistics the location of the ensemble is defined on the basis of a potentially large spatial-pooling process whose features are not yet understood but that likely involves a measure of central tendency and dispersion rather than exact boundaries (McGowan, Kowler, Sharma, & Chubb, 1998; Melcher & Kowler, 1999).

Previous studies of spatial updating across saccades have not directly answered how such a global representation might be updated. In fact, some neurophysiological studies suggest an absence of exact spatial updating of individual items when the number of stimuli exceeds a few items (Joiner, Cavanaugh, & Wurtz, 2011). In the real world, global statistics are unlikely to change dramatically across most saccadic eye movements. Therefore, in order to take advantage of stable global statistics across saccades in an optimal way, the visual system would have to allow for a less-stringent criterion for matching pre- and postsaccadic input.

Not surprisingly, no study has succeeded in localizing the mechanisms responsible for the representation of mean size and other ensemble statistics. In line with our findings that these representations are preserved across multiple spatial reference frames, we found evidence to suggest that mechanisms throughout the visual hierarchy may give rise to ensemble representations. For example, retinal neurons that rapidly adapt to the spatial correlations within scenes (Smirnakis, Berry, Warland, Bialek, & Meister, 1997), neurons in early visual cortex that adapt to the statistical properties of scenes (e.g., Olshausen & Field, 1996; Simoncelli & Olshausen, 2001), and population coding in early cortical neurons that represent perceived size could all act to remove the statistical redundancy in incoming visual information. Behavioral evidence also suggests that multiple neural mechanisms are sensitive to ensemble representations. Although here we reported an adaptation aftereffect in purely retinal coordinates, mean-size adaptation also occurs when the density and spatial frequency of stimulus displays are controlled (Corbett et al., 2012). Therefore, these representations cannot arise only from the pooling of retinal cells tuned to density or spatial frequency. Furthermore, Im and Chong (2009) demonstrated that mean size can also be represented as a function of the perceived versus physical sizes of the constituent items. Taken together, evidence that multiple mechanisms throughout the hierarchy of visual information processing may be capable of ensemble representation provides further support for our present findings of mean-size adaptation in multiple spatial coordinates.

The present finding that mean-size representations persist across saccades in different coordinate reference frames supports our proposal that ensemble statistics may facilitate visual stability (Corbett & Melcher, 2013). In contrast to previous theories of visual stability that have centered around a small number of explicitly perceived objects anchoring perception across saccades (e.g., Currie, McConkie, Carlson-Radvansky, & Irwin, 2000; Deubel, Schneider, & Bridgeman, 2002; Irwin, 1992; Prime, Tsotsos, Keith, & Crawford, 2007), the relative regularity and constancy of ensemble statistics allow for more efficient representation of more abstract, global scene properties as we interact within the surrounding environment. Relying on such summary representations would allow the visual system to integrate consistent information across saccades while freeing the resources necessary to detect changes to individual salient objects. Evidence in support of this proposal is given by a combination of findings from change blindness studies, in which observers often miss large and salient events right in front of them (e.g., Rensink, O’Regan, & Clark, 1997; Simons & Levin, 1997), and studies of rapid scene perception, in which observers almost immediately extract the gist of scenes (e.g., Oliva & Torralba, 2001; Potter & Levy, 1969). Even though the visual system is limited to representing only a handful of objects in detail, ensemble encoding allows for an efficient means of combining information over glances, without the need to explicitly encode the individual items comprising the sets or for any single element to be in view at a particular time.
As was originally posited by Ariely (2001), ensemble statistics are likely used in combination with occasional detailed samples of salient objects and an overall interpretation of scene gist to construct a unitary and stable percept of the visual environment, despite large disruptions in the retinal image as we move our eyes and bodies. In this manner, ensemble statistics provide an efficient, predictive visual mechanism capitalizing on the statistical redundancies over successive fixations and mediating between the needs to detect salient changes and to maintain stable perception.