Feature-specificity in visual statistical summary processing

Yörük, Harun; Boduroglu, Aysecan

doi:10.3758/s13414-019-01942-x

40 Years of Feature Integration: Special Issue in Memory of Anne Treisman
Published: 03 January 2020

Volume 82, pages 852–864, (2020)
Attention, Perception, & Psychophysics

Abstract

Visual statistical summary processing enables people to extract the average feature of a set of items rapidly and accurately. Previous studies have demonstrated independent mechanisms for summarizing low-level (e.g. color, orientation) and high-level (e.g. facial identity, emotion) visual information. However, no study to date has conclusively determined whether there are feature-specific summarization mechanisms for low-level features or whether there is a single low-level, feature-agnostic summarization mechanism. To address this issue, we asked participants to report either the average orientation or the average size of a set of lines in which both features varied. Participants completed these tasks in either single-task or mixed-task conditions; in the latter, successful performance required extracting both summaries concurrently. If there were feature-specific summarization mechanisms that could operate in parallel, then errors in the mean size and mean orientation tasks should be independent in both single-task and mixed-task conditions. On the other hand, a central domain-general mechanism for low-level summarization would imply a correlation between errors for the two features and greater error in mixed-task than in single-task trials. In Experiment 1, we found no correlation between mean size and mean orientation errors, and performance was similar across single-task and mixed-task conditions, suggesting that there may be independent summarization mechanisms for size and orientation. To further test the feature-specificity account, in Experiments 2 and 3 (the latter with a mask), we manipulated display duration to determine whether there were any differences in the summarization of earlier (orientation) vs. later (size) features. These experiments replicated the pattern of results observed in Experiment 1, and no differences emerged across features at shorter display durations. We argue that our data are consistent with independent, multi-level, feature-specific statistical summary mechanisms for low-level visual features.

People can extract average information from object sets by visual statistical summary processing (also called ensemble perception) (for a review, see Alvarez, 2011; Whitney & Yamanashi Leib, 2018). Complementing the foveal high-resolution representation, ensemble perception allows viewers to experience the world in a holistic and detail-rich manner (Cohen, Dennett & Kanwisher, 2016). Previous research has shown that viewers can efficiently summarize various low-level features such as orientation (Dakin, 2001; Parkes, Lund, Angelucci, Solomon & Morgan, 2001), brightness (Bauer, 2009), size (Ariely, 2001; Yıldırım, Öğreden & Boduroglu, 2018), color (Maule, Witzel & Franklin, 2014), position (Alvarez & Oliva, 2008), as well as higher-level features like facial identity (de Fockert & Wolfenstein, 2009) and facial emotion (Haberman & Whitney, 2007). Viewers do not only extract summaries of static visual displays, they can also summarize sequentially presented visual (Albrecht & Scholl, 2010; Hubert-Wallander & Boynton, 2015) and auditory information (Albrecht, Scholl, & Chun, 2012; Piazza, Sweeny, Wessel, Silver & Whitney, 2013). In addition, viewers can represent variance (Morgan, Chubb & Solomon, 2008; Khayat & Hochstein, 2018; Semizer & Boduroglu, 2013) and numerosity information (Utochkin & Vostrikov, 2017), as statistical summary information. Despite demonstrations of statistical summarizing across various domains, interestingly, there is no consensus on the mechanisms underlying this ability (e.g. Dubé & Sekuler, 2015; Whitney & Yamanashi Leib, 2018). While Haberman and colleagues suggested separate summarization mechanisms for higher level (e.g. facial identity) vs. lower level (e.g. size) features, their findings were inconclusive regarding the mechanisms supporting summarization of lower level features (Haberman, Brady & Alvarez, 2015). 
Specifically, these findings ruled out a central domain-general summarization mechanism; nevertheless, they did not directly address the possibility of multiple, feature-specific summarization mechanisms. In fact, three separate findings, discussed in detail below, are suggestive of multiple feature-specific summarization mechanisms. First, individual differences data emphasize the independence of errors for visual and spatial summarization tasks (Uner, Mutlutürk & Boduroglu, 2014). Second, sequentially presented items are weighted differently when viewers estimate mean orientation versus mean size (Hubert-Wallander & Boynton, 2015). Finally, there is evidence that concurrently summarizing separate feature distributions of spatially segregated sets carries no cost (e.g. Attarha & Moore, 2015).

Across three experiments, we specifically investigated whether there are multiple, domain-specific, feature-based perceptual summarization mechanisms for low-level features. To do this, we took an experimental approach complemented by correlational analyses. Past research has shown that complementing behavioral experiments with an individual differences/correlational approach can be useful in identifying dissociations between constructs or mechanisms (e.g. Awh, Barton & Vogel, 2007; for reviews see Vogel & Awh, 2008; Wilmer, 2008; Tulver, 2019). In all three experiments, we asked participants to extract the mean length (size) and/or mean orientation of a set of lines and to report one of these features. We compared error across blocks in which we manipulated whether participants had to extract the summary of one feature (single-task block) or of both features (mixed-task block); we also computed the correlations between mean length and mean orientation errors. As we explain in detail below, we predicted that if there are independently functioning domain-specific summarization mechanisms for low-level features, then they should be able to work in parallel, resulting in efficient concurrent summarization of different features and independence of errors across feature domains.

Domain-general vs. domain-specific mechanisms

Haberman, Brady and Alvarez (2015) proposed two possible mechanisms for how viewers extract visual summary information. One possibility is that there is a central, domain-general summary processor in the visual system that is responsible for averaging all types of information. The alternative possibility is that there are multi-level, domain-specific mechanisms for summarizing different visual properties, possibly at different cortical levels (Whitney, Haberman & Sweeny, 2014). While the former view would require a significant correlation between the errors in summarizing different types of information, the latter mechanism would allow for independence of errors.

To determine whether there is a domain-general or domain-specific visual summarizing mechanism, Haberman, Brady and Alvarez (2015) compared summarizing performance using an individual differences approach across various visual properties. In their first experiment, they found that there was no correlation between orientation averaging and facial averaging tasks suggesting that statistical summary representations of high-level and low-level features may be governed by separate mechanisms. Subsequent experiments compared performance across various high-level (e.g. facial identity and facial emotion) and low-level (e.g. orientation and color) visual features. When viewers completed either two low-level or two high-level tasks, then there was a significant correlation in performance on the two tasks. On the other hand, when one task was from a low-level and the other one was from a high-level domain, there was no correlation between the two tasks. Thus, they concluded that statistical summary processing is not a uniform process, and that there are at least two separate and independent domain-specific summarizing mechanisms, one for low-level and another one for high-level visual information, specifically.

These findings were in line with claims made by Whitney, Haberman and Sweeny (2014), who argued that there might be multilevel processing mechanisms in the ventral and dorsal pathways for summarizing information rather than a single cortical area responsible for summarizing all types of visual properties. They argued that orientation, color, and brightness features might be summarized in the early cortical stages, while motion, position information, size and shape features might be summarized by separate mechanisms in the dorsal and ventral pathways, respectively. In addition, they argued that complex higher-level face and biological motion summaries might be processed later, after the convergence of ventral and dorsal pathways. Hubert-Wallander & Boynton (2015) provided some empirical support for these claims based on their comparison of how observers summarized sequentially presented sets. In separate experiments, observers reported the mean location or mean size of dots, the average motion direction of moving dot fields, or the mean facial expression. The differences in the temporal summarization profiles (i.e. the differential weighting of first as opposed to last items in the sequence in the reported mean summary) led them to conclude that there may be distinct mechanisms involved in the summarization of location-based versus non-location-based features.

Recent findings from our lab also support the possibility that there may be separate mechanisms to summarize spatial (location-based) and visual information (Uner, Mutlutürk & Boduroglu, 2014).^{Footnote 1} In two separate tasks, we asked participants to report the mean position of a set of colored squares (i.e. the centroid) and the mean length of a set of randomly oriented lines. In both tasks, participants adjusted a probe to give their response (i.e. dragging it to the centroid or extending its length), allowing us to compute error on a continuous scale. We found no relationship between error on the spatial centroid task and error on the visual length task (r = .16, p = .16).

In sum, the empirical research to date has provided some evidence that there may be separate mechanisms responsible for the summarization of lower-level and higher-level complex visual features; some preliminary evidence suggests that this separation might extend to visual and spatial domains. However, with the exception of Hubert-Wallander & Boynton (2015), viewers in these studies summarized different dimensions in separate tasks/blocks (e.g. Haberman et al., 2015; Uner et al., 2014), and researchers then computed the correlation between the errors for the different summarization tasks. For instance, Haberman et al. (2015) presented participants with four features/items in each display and asked them to summarize either higher-level (e.g. facial identity and emotional expression), lower-level (e.g. Gabor orientation and color of dots) or mixed (e.g. facial identity and Gabor orientation) feature pairs. While they found clear evidence for independence between higher-level and lower-level summarization errors, they also found moderate, significant correlations (ranging from .54 to .73) between errors when tasks required summarizing two lower-level features (for details see Table 1 in Haberman et al., 2015). It was not clear whether the correlation between lower-level features emerged because of a low-level, domain-general pooling mechanism or was an artifact of shared perceptual noise.

Table 1. Bayesian ANOVA for size averaging errors

To address this issue, one possibility is to use designs in which participants need to summarize two low-level features concurrently from the same display, as opposed to designs in which summarization tasks are blocked by feature. Specifically, participants could be notified about the to-be-reported dimension after the offset of the study display via a post-cue. A significant correlation across errors for different features under such a design could be interpreted in several ways. A negative correlation would be suggestive of a trade-off between features, with summarization of one feature possibly being prioritized over the other; a positive correlation would suggest that summarization relies on, or is impacted by, some “shared” resource/noise. On the other hand, if the errors under such a design were found to be independent (i.e. not correlated), this would suggest that the two summarization mechanisms are likely to be operating in parallel. Recent evidence demonstrating that there are no costs to concurrently summarizing separate feature dimensions of spatially segregated sets suggests that there may indeed be feature-specific summarization mechanisms (e.g. for size and orientation: Attarha & Moore, 2015; for color and numerosity: Poltoratski & Xu, 2013; for size and numerosity: Utochkin & Vostrikov, 2017; but also see Emmanouil & Treisman, 2008).

In the present study, we specifically took the approach suggested above. To further test the domain-specificity argument, we asked viewers to summarize sets that consisted of heterogeneously sized lines presented at different orientations; depending on the condition, viewers either summarized length or orientation information. We chose these two dimensions for two reasons. One, there is evidence to suggest that both size and orientation may be summarized by mid-level texture processing mechanisms (e.g. Cain & Cain, 2018; Parkes et al., 2001); any evidence of independence of errors for the two summarization tasks would be a strong test for the domain-specific hypothesis. Two, using lines and asking about either mean size or mean orientation allowed us to use a single set and to equate physical properties of the presented stimuli for both summarization tasks and eliminate the need to present separate sets for each condition in a spatially segregated or a sequential fashion (e.g. Attarha & Moore, 2015). In this regard, the approach taken in the present study is different from earlier research because the to-be-summarized features belonged to the same items in a given set and participants reported the summary of either one of the features via a post-cue (mixed-task blocks). Successful performance on these mixed-task trials would require viewers to attend concurrently to both feature dimensions. If there is a domain-general summarization mechanism, then there may be costs to concurrent summarization, and consequently size and orientation errors may be correlated. On the other hand, if there are feature-specific summarization mechanisms, then summarization errors for these two low-level features should be independent.

In this study, in addition to the mixed-task trials, we also included single-task blocks to investigate further the possible mechanisms of summarization. In these blocks, participants always reported either the mean size or the mean orientation of the viewed line set. Including the single-task blocks enabled us to compare error between single-task and mixed-task conditions to test the domain-specificity account. If there is a domain-general mechanism, then greater interference during concurrent summarization in the mixed-task trials than in the single-task trials is likely, possibly resulting in larger error in mixed-task than in single-task conditions. On the other hand, if there are independent mechanisms of summarization for different low-level features, errors may be similar across single-task and mixed-task conditions for each feature.

Experiment 1

Method

Participants

We determined the necessary sample size for this and subsequent experiments by reviewing the sample sizes of recent experimental studies investigating the attentional modulation of ensemble perception (e.g. Attarha & Moore, 2015; Poltoratski & Xu, 2013). These studies had relatively small samples (approximately 10 participants per experiment), with a similar number of trials to our experiments. We also reviewed recent studies that complement their behavioral findings with correlational analyses (e.g. Awh, Barton & Vogel, 2007; Scolari, Vogel & Awh, 2008; Vogel & Machizawa, 2004); in these studies, sample sizes were 20, 11, and 12, respectively. Based on these observations, in the first two experiments we collected data from 25 and 26 participants; in the third experiment, we used a slightly larger sample (40). Our post-hoc power analyses with G*Power revealed only a minimal increase in power (from .12 to .26) when the sample size was increased to 100. In addition, given that we were interested in showing support for a theoretically meaningful null hypothesis, we wanted to make sure that our experiments were not underpowered and that our findings were not due to a Type II error. Therefore, we also ran Bayesian analyses to determine the support for the null hypotheses.

Twenty-five Bogazici University undergraduate students participated in the experiment in return for course credit. We excluded data from four participants: two had size and orientation averaging errors that were more than 3 standard deviations away from the group mean, one was color blind, and one responded randomly throughout the experiment. Therefore, the following analyses were conducted on the data from 21 participants.

Materials and stimuli

We programmed the experiment in E-Prime 2.0 (Psychology Software Tools, Pittsburgh, PA) and we ran it on a 17” monitor with screen resolution set to 1024x768 pixels (32x24 cm). Participants were sitting approximately 57 centimeters away from the computer screen. From that viewing distance, 1 cm was equal to 32 pixels and 1° visual angle.
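The pixel-to-visual-angle conversion implied by this setup can be sketched as follows. This is only an illustration under the stated viewing geometry (57 cm distance, 32 pixels per cm); the function names are hypothetical and are not part of the original E-Prime script.

```python
import math

VIEWING_DISTANCE_CM = 57.0  # approximate viewing distance reported in the text
PIXELS_PER_CM = 32.0        # 1024 pixels across 32 cm


def pixels_to_degrees(pixels, distance_cm=VIEWING_DISTANCE_CM,
                      px_per_cm=PIXELS_PER_CM):
    """Exact visual angle (in degrees) of a centered extent given in pixels."""
    size_cm = pixels / px_per_cm
    return math.degrees(2 * math.atan(size_cm / (2 * distance_cm)))


def pixels_to_degrees_approx(pixels, px_per_cm=PIXELS_PER_CM):
    """Rule of thumb used in the text: at 57 cm, 1 cm is about 1 degree."""
    return pixels / px_per_cm


if __name__ == "__main__":
    for px in (24, 32, 152):
        print(px, round(pixels_to_degrees(px), 3), pixels_to_degrees_approx(px))
```

The rule of thumb and the exact formula agree closely for the stimulus sizes used here, which is why a 57 cm viewing distance is a convenient choice.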

Each trial (see Fig. 1) began with a green fixation cross, presented for 1500 ms, which then turned red for 500 ms to indicate the beginning of the trial. We instructed the participants to fixate on the cross when it turned red. The study display consisted of 12 uniquely sized, randomly oriented lines and was presented for 200 ms. Then, we presented the response screen. Participants adjusted the length of the response probe on size averaging trials, or rotated the response probe on orientation averaging trials, using the left mouse button. When they had finalized their response, they completed the trial by clicking the right mouse button. In single-task blocks, we identified the relevant feature at the beginning of the block in the task instructions. In mixed-task blocks, with the onset of the response probe, participants heard an auditory verbal cue through their headphones indicating the feature dimension to be averaged (either orientation or size).

In each display, we presented 12 white lines on a gray background. To determine the orientation of the lines, we pseudo-randomly generated distributions within 60- and 90-degree intervals. For example, from a 60-degree interval ranging between 25 and 85 degrees, we determined 12 different orientations.^{Footnote 2} The length of each line was determined randomly from a range of 24 to 152 pixels (.75° – 4.75° of visual angle) with the following constraints: no line had the same length or orientation as the set mean, and there were no repetitions of length or angular orientation within sets. We positioned the lines on a 5x4 invisible grid (760x608 pixels), with 3 locations randomly chosen from each row. We shifted the middle two rows of the grid by ±16 pixels in order to prevent the endpoints of the lines from touching the center of the display.
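The sampling constraints described above can be illustrated with a short sketch. It is not the authors' stimulus code: integer-valued lengths and orientations, uniform sampling, the way the orientation interval is placed, and all function names are assumptions made for illustration, and the ±16 pixel row shift is omitted.

```python
import random


def sample_unique(low, high, n, rng):
    """Draw n distinct integers from [low, high], none equal to the set mean."""
    while True:
        values = rng.sample(range(low, high + 1), n)
        set_mean = sum(values) / n
        if all(v != set_mean for v in values):
            return values


def make_display(rng, n_rows=4, n_cols=5, per_row=3):
    """Build one 12-line display: lengths, orientations, and grid cells."""
    n_lines = n_rows * per_row
    lengths = sample_unique(24, 152, n_lines, rng)       # pixel range from the text
    width = rng.choice([60, 90])                         # interval widths from the text
    start = rng.randrange(0, 180 - width)                # assumption: interval placement
    orientations = sample_unique(start, start + width, n_lines, rng)
    # Three randomly chosen columns in each of the four rows of the 5x4 grid.
    cells = [(row, col)
             for row in range(n_rows)
             for col in rng.sample(range(n_cols), per_row)]
    rng.shuffle(cells)
    return [{"length_px": length, "orientation_deg": angle, "cell": cell}
            for length, angle, cell in zip(lengths, orientations, cells)]


if __name__ == "__main__":
    for line in make_display(random.Random(1)):
        print(line)
```

Rejection sampling keeps the sketch simple: a candidate set is redrawn whenever one of its members coincides with the set mean.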

On the response screen, we presented viewers with either a blue or a green response probe to distinguish the two types of trials, size averaging and orientation averaging. For the size adjustment task, the initial length of the response probe was determined by adding a random value between 21 and 27 pixels to, or subtracting it from, half of the mean length of the set of lines. For the orientation rotation task, we determined the initial angle of the probe line by adding a random number between 21 and 27 degrees to, or subtracting it from, the mean angle of the set of lines. In mixed-task blocks, we presented an auditory verbal cue along with the color-coded response probe (green for orientation, blue for size). In both types of trials, the red fixation cross indicated the beginning of a trial.

Procedure

Before the experimental trials, we instructed the participants about the procedure, and they completed a training session. To illustrate what the average size of a set of lines looked like, we presented participants with example displays consisting of 2, 3, 4, 6, 8, 10, and 12 lines, sequentially. Each example display was followed by a display with a red line indicating the mean length of the lines in the previous set. After that, participants completed 10 practice trials for the size-averaging task. During practice trials, we presented the displays for 1000 ms and provided visual feedback by showing the correct response line in red. We followed the same procedure for the orientation-averaging task. After these training trials, participants also received a practice block of 16 trials, which were identical to the actual experimental trials. There were 360 trials in the experimental phase, presented in four equal blocks. Half of the trials were single-task trials, evenly split across size averaging and orientation averaging blocks (90 trials per feature, per block). The remaining 180 trials were mixed-task trials in which we randomly presented orientation-averaging and size-averaging tasks across two blocks. We counterbalanced the order of tasks and blocks across participants.

Results

We conducted the paired-samples t-tests and Pearson’s correlation analyses using SPSS Version 25.0 (IBM Corp, 2017). For the Bayes factor analyses, we used the open source statistical program JASP (JASP Team, 2018). For each participant, we calculated the mean errors on the size and orientation tasks in the single and mixed blocks. To determine whether there was an attentional cost of attending to both featural dimensions, for each summarization task, we first compared performance across single and mixed blocks. Paired-samples t-tests showed that there was no difference in errors between single (M = 19.61, SD = 7.76) and mixed size trials (M = 18.51, SD = 6.25), (t(20) = .70, p = .14, Cohen’s d = .15). Results from Bayesian t-tests provided further support for this null finding; we found moderate evidence in favor of the null hypothesis (BF₀₁ = 3.52). There was also no difference between averaging errors for single (M = 15.22, SD = 3.73) and mixed orientation averaging tasks (M = 14.88, SD = 2.83), (t(20) = .808, p = .43, Cohen’s d = 0.10). Bayesian analyses showed moderate evidence in favor of the null model (BF₀₁ = 3.28), suggesting that observing no difference between the single and mixed orientation conditions was more likely than observing a difference (see Fig. 2).
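The single-versus-mixed comparison can be sketched as a paired-samples t statistic on per-participant mean errors. This is an illustrative sketch, not the SPSS/JASP analysis reported here: the function names are hypothetical, the example values are made up, and computing Cohen's d from the pooled standard deviation of the two conditions is an assumption, since the text does not state which variant was used.

```python
import math
from statistics import mean, stdev


def paired_t(single, mixed):
    """Paired-samples t statistic and degrees of freedom.

    `single` and `mixed` hold one mean error per participant, in the same order.
    """
    diffs = [s - m for s, m in zip(single, mixed)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1


def cohens_d_pooled(single, mixed):
    """Cohen's d using the pooled SD of the two conditions (an assumed variant)."""
    pooled_sd = math.sqrt((stdev(single) ** 2 + stdev(mixed) ** 2) / 2)
    return (mean(single) - mean(mixed)) / pooled_sd


if __name__ == "__main__":
    # Made-up errors for four hypothetical participants, for illustration only.
    single_errors = [10.0, 12.0, 14.0, 16.0]
    mixed_errors = [9.0, 12.0, 12.0, 15.0]
    print(paired_t(single_errors, mixed_errors))
    print(cohens_d_pooled(single_errors, mixed_errors))
```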

To test for the independence of size and orientation averaging, we computed the correlations between the errors on each of these tasks, separately for the single and mixed-task blocks. There was no significant correlation between size and orientation averaging errors in either the single or the mixed blocks (r = -.08, p = .75 and r = -.11, p = .64, respectively). Also, according to Bayesian Correlation Pairs analyses, there was moderate evidence in favor of the null hypothesis (BF₀₁ = 3.52 and BF₀₁ = 3.35 for single and mixed task conditions, respectively), suggesting that finding independence between size and orientation errors was more likely than finding a relationship between them, for both single and mixed task conditions (see Fig. 3).
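The independence test amounts to a Pearson correlation between per-participant size and orientation errors within a block type. The following is a minimal sketch with hypothetical function names; it reproduces neither the SPSS output nor the Bayes factors from JASP, and only shows how r and its associated t statistic (with n - 2 degrees of freedom) are obtained.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of mean errors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def r_to_t(r, n):
    """t statistic (df = n - 2) for testing a correlation against zero."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))


if __name__ == "__main__":
    # With 21 participants, a correlation of -.08 gives a t value close to zero.
    print(round(r_to_t(-0.08, 21), 3))
```

A t value this close to zero is what one would expect if size and orientation errors were unrelated across participants.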

Discussion

There were two main findings from Experiment 1. First, there was no significant correlation between size and orientation errors in either the single or the mixed task conditions. Second, Experiment 1 revealed similar levels of error for both tasks across single and mixed-task conditions, suggesting that concurrent averaging of multiple summaries may be somewhat automatic. These findings suggest that there may be independent mechanisms that support ensemble perception of these low-level features. Our finding that participants can extract mean size and mean orientation information similarly across single-task and mixed-task trials is consistent with evidence showing that participants can extract separate feature summaries of spatially segregated sets in parallel (e.g. Attarha & Moore, 2015; Utochkin & Vostrikov, 2017; but also see Emmanouil & Treisman, 2008). However, we extend these findings by showing that viewers can concurrently summarize features that covary within a set (size and orientation of lines) without a cost. Utochkin & Vostrikov (2017) similarly reported that two features (size and numerosity) were averaged concurrently from a single set; however, in their study, the display contained two spatially intermixed subsets denoted by color. Thus, viewers could have globally attended to the central region and utilized color-based grouping cues to extract the summaries of the two subsets concurrently. In that regard, our study differs because the two features we asked participants to summarize belonged to the same items, and the fact that viewers could summarize these featural distributions independently suggests that the summarization most likely happens before features are bound in object-file type representations (Treisman, 1996). The fact that line length variability did not influence errors in orientation summary further strengthens this possibility. Specifically, we categorized displays as having low, medium and high variance in line length and showed that orientation averaging errors did not vary as a function of variability in line length (F(2, 180) = 2.82, p = .06, \( \eta_p^2 \) = .03). These results, which suggest that independent summarization mechanisms for low-level features operate prior to object-file formation, may be driven by the visual system’s tendency to represent incoming information as “loose bundles of features” as opposed to object files. Specifically, when the visual system encounters objects, the pattern of activations across a population of neurons may not be selective enough to ensure independent coding of each object. This may be particularly true when the objects share visual features and are presented simultaneously, as in our experimental displays (Treisman, 1999).

Experiment 2

Experiment 1 demonstrated that viewers could independently summarize size and orientation distributions. This finding is in line with the idea of feature-specific and domain-specific mechanisms for ensemble perception. However, the relatively long display duration in Experiment 1 (200 ms, unmasked) may have allowed viewers to utilize different mechanisms across the two conditions, possibly a feature-based mechanism for summarizing orientation and a separate system operating at the item level for summarizing size. It is known that orientation is processed early in the visual processing hierarchy and that size is processed further along the ventral stream (Whitney, Haberman & Sweeny, 2014), with dedicated groups of receptors processing different orientations but not size (Myczek & Simons, 2008). Instead, there may be a separate, focused attention-based pooling mechanism that supports the summarizing of size information, as opposed to more feature-based mechanisms supporting summarization of orientation, motion and spatial frequency information (Simons & Myczek, 2008; but also see Ariely, 2008; Chong, Joo, Emmanouil & Treisman, 2008). If this is indeed the case, when viewers summarize sets having viewed them only for brief durations, pre-attentive processes (or processes driven by global attention) may support the summarization of orientation information but not size information. Consequently, for briefly presented displays, errors for mean size trials may increase while errors for orientation summary trials may be immune to this manipulation. Furthermore, in mixed-task blocks, viewers may unknowingly prioritize orientation information during shorter presentation trials, at the cost of errors in size averaging. To test these possibilities, in Experiment 2, we examined the temporal independence of size and orientation summarization processes by presenting displays for 50, 100 or 200 ms.