Ensemble perception and focused attention: Two different modes of visual processing to cope with limited capacity

Baek, Jongsoo; Chong, Sang Chul

doi:10.3758/s13423-020-01718-7

Theoretical Review
Published: 03 March 2020

Volume 27, pages 602–606, (2020)

Psychonomic Bulletin & Review

Jongsoo Baek¹ &
Sang Chul Chong²

Abstract

The visual system has a limited capacity for dealing with complex and redundant information in a scene. Here, we propose that a distributed attention mode of processing is necessary for coping with this limit, together with a focused attention mode of processing. The distributed attention mode provides a statistical summary of a scene, whereas the focused attention mode provides relevant information for object recognition. In this paper, we claim that a distributed mode of processing is necessary because (1) averaging performance improves with increased set-sizes, (2) even unselected items are likely to contribute to averaging, and (3) the assumption of variable capacity limits in averaging over different set-sizes is not plausible. We then propose how the averaging process can access multiple items over the capacity limit of focused attention. The visual system can represent multiple items as population responses and read out relevant information using the two modes of attention. It can summarize population responses with a broad application of a Gaussian profile (i.e., distributed attention) and represent its peak as the mean. It can focus on relevant population responses with a narrow application of a Gaussian profile (i.e., focused attention) and select important information for object recognition. The two attention modes of processing provide a framework for integrating two seemingly opposing fields of study (ensemble perception and selective attention) and a unified account of how the visual system copes with its limited capacity.

Introduction

The visual system has limited capacity (Broadbent, 1958; Luck & Vogel, 1997). One way of using this limited capacity efficiently is through ensemble representation. By summarizing redundant and complex information in a scene, the visual system can rapidly assess the overall properties of the scene and extract its gist (Cavanagh, 2001; Chong & Treisman, 2003), despite the limits of focused attention (Cavanagh & Alvarez, 2005; Dux & Marois, 2009; Simons & Levin, 1997). Another way of coping with limited capacity is focused attention, which selects and processes relevant information (Carrasco, 2011; Chun, Golomb, & Turk-Browne, 2011), thereby reducing the load on the visual system.

Two separate modes of processing for dealing with our capacity limit have been proposed before (Chong & Evans, 2011; Treisman, 2006), on the basis of the following differences. First, the two modes serve different purposes: Ensemble perception is used for extracting the gist of a scene, while focused attention is used for recognizing one or a few relevant objects. Second, they deal with the capacity limitation in different ways: Ensemble perception summarizes complex and redundant information, whereas focused attention filters out irrelevant information. Third, they have been suggested to use functionally different pathways: a non-selective pathway for ensemble perception versus a selective pathway for focused attention (Wolfe, Võ, Evans, & Greene, 2011).

Nevertheless, there have been some doubts about whether the ensemble processing mode, separate from focused attention, is necessary. For example, Myczek and Simons (2008) have shown that findings attributed to ensemble processing could be explained by a focused attention mode of sampling a few items. The noise-and-selection model of averaging also showed that observers’ averaging performance could well be described by averaging a few selected items within the limited capacity of focused attention (Allik, Toom, Raidvee, Averin, & Kreegipuu, 2013). Some studies have found that there are limits to the number of averages computed using the simultaneous-sequential paradigm (Attarha, Moore, & Vecera, 2014) and the pre-cueing method (Huang, 2015). Other studies have found that there are also limits to the number of items included in an average using the ideal observer analysis (Maule & Franklin, 2016) and the set-size manipulation (Ji & Pourtois, 2018).
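The subsampling account mentioned above can be illustrated with a minimal simulation. This is only a sketch under stated assumptions, not the procedure of Myczek and Simons (2008) or the noise-and-selection model of Allik et al. (2013): the function name `subsample_error_sd`, the Gaussian distribution of item values, and all parameter values are hypothetical choices made here for illustration. It estimates how far the mean of a few randomly sampled items deviates from the true mean of the display.

```python
import random
import statistics


def subsample_error_sd(set_size, n_sampled, n_trials=4000, seed=1):
    """SD of (subsample mean - display mean) across simulated displays.

    Assumption: item values are drawn from a Gaussian (mean 10, SD 2) and
    the observer averages `n_sampled` items chosen at random without
    replacement. No perceptual noise is modeled in this sketch.
    """
    rng = random.Random(seed)
    errors = []
    for _ in range(n_trials):
        display = [rng.gauss(10.0, 2.0) for _ in range(set_size)]
        sampled = rng.sample(display, n_sampled)
        errors.append(statistics.fmean(sampled) - statistics.fmean(display))
    return statistics.pstdev(errors)


if __name__ == "__main__":
    for k in (2, 4, 8, 16):
        print(f"sampled {k:2d} of 16 items: error SD = {subsample_error_sd(16, k):.3f}")
```

In this toy setting the error shrinks as more items are sampled and vanishes when the whole display is averaged, which is why estimates of "how many items" can be derived from observed precision.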

These findings of capacity-limited ensemble processing led many researchers to investigate how many items contribute to averaging (Allik et al., 2013; Im & Halberda, 2013; Solomon, 2010; Solomon, Morgan, & Chubb, 2011; for a review, see Whitney & Yamanashi Leib, 2018). The estimated number of items varied widely depending on the types of stimuli to be averaged and other variations across studies. It was three in the case of size averaging (Allik et al., 2013; Solomon et al., 2011), but nearly 40 sizes in Lee, Baek, and Chong (2016); more than four for facial expressions (Haberman & Whitney, 2010); and up to 90 orientations (Dakin, 2001). To find a trend in these widely varied estimations, Whitney and Yamanashi Leib (2018) plotted the estimated number of items from 21 studies and found that observers average approximately the square root of the number of items in a display.
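As a rough worked illustration of the square-root trend reported by Whitney and Yamanashi Leib (2018), let N be the set-size and n_eff the estimated number of items contributing to the average. Both symbols are introduced here for illustration only and are not taken from the original studies.

```latex
n_{\mathrm{eff}} \approx \sqrt{N},
\qquad
\frac{n_{\mathrm{eff}}}{N} \approx \frac{1}{\sqrt{N}}
```

Under this approximation, a display of 16 items would correspond to about 4 effective items (25% of the display), and a display of 100 items to about 10 (10% of the display).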

Thus, the visual system seems not to use all the available information for averaging. This could be because (1) only selected items contribute to averaging, (2) the averaging process is inaccurate or imprecise, or (3) both. Note that inaccurate averaging could be due to noise in both the individual items and the averaging process. Since most studies aimed to determine the capacity limit of ensemble processing, they focused on finding the maximum number of items included in averaging. This led previous studies to conclude that only some items contribute to averaging, whereas others do not contribute at all (Allik et al., 2013; Myczek & Simons, 2008; Solomon, 2010; Solomon et al., 2011). If we assume that attention selects the items that contribute to averaging (Allik et al., 2013), there is no need to assume two modes of coping with our limited capacity (ensemble perception and focused attention); focused attention alone would suffice.

However, there are reasons to believe that the visual system has an ensemble processing mode, separate from focused attention. First, as we mentioned before, ensemble perception and focused attention serve different purposes, using different methods to cope with limited capacity. Second, some studies have found an improvement in the precision of averaging with increased set-sizes (Allik et al., 2013, see Notes; Baek & Chong, 2020; Haberman & Whitney, 2010; Lee et al., 2016; Parkes, Lund, Angelucci, Solomon, & Morgan, 2001; Robitaille & Harris, 2011; but see also Ji & Pourtois, 2018). This is presumably because the noise of individual items could be cancelled out during the averaging task (Baek & Chong, 2020; Galton, 1907; Parkes et al., 2001; Sun & Chong, 2019). On the other hand, performance usually dropped with increased set-sizes in other tasks that required focused attention (e.g., conjunction search: Treisman & Gelade, 1980; visual working memory: Luck & Vogel, 1997), if a set-size exceeded the capacity limit. Indeed, using the same display, Robitaille and Harris (2011) found that averaging performance improved with increased set-sizes, whereas search performance deteriorated. This opposite trend of the set-size effect suggests a separate mode of processing (i.e., ensemble perception).
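The noise-cancellation argument can be made concrete with a small simulation. It is a sketch under simplifying assumptions that are ours rather than those of the cited studies: every item has the same true value, each item is corrupted by independent Gaussian noise, and the observer averages all items with no additional noise in the averaging stage. The function name `mean_estimate_sd` and the parameter values are hypothetical.

```python
import random
import statistics


def mean_estimate_sd(set_size, item_noise_sd=1.0, n_trials=5000, seed=0):
    """SD of the estimated mean across trials when every item is averaged.

    Assumption: independent Gaussian noise on each item, no late noise.
    """
    rng = random.Random(seed)
    true_value = 10.0  # hypothetical feature value shared by all items
    estimates = []
    for _ in range(n_trials):
        noisy_items = [true_value + rng.gauss(0.0, item_noise_sd)
                       for _ in range(set_size)]
        estimates.append(statistics.fmean(noisy_items))
    return statistics.pstdev(estimates)


if __name__ == "__main__":
    for n in (1, 4, 16):
        simulated = mean_estimate_sd(n)
        predicted = 1.0 / n ** 0.5  # noise SD divided by sqrt(set-size)
        print(f"set-size {n:2d}: simulated SD = {simulated:.3f}, "
              f"predicted = {predicted:.3f}")
```

With independent item noise, the variability of the estimated mean falls roughly with the square root of the set-size, so precision improves as more items are included. Noise that is shared across items or added after averaging would not cancel in this way.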

Third, we think that the visual system is not likely to use only selected items for averaging because even unselected information can contribute to visual processing (Treisman, 1969; Wolford & Morrison, 1980). Consistent with this claim, previous studies have shown that nearly all items contributed to averaging. Chong and his colleagues (Chong, Joo, Emmanouil, & Treisman, 2008) showed that averaging performance dropped when small samples, rather than the entire display, were given. The averaging performance depended on the number of visible items (Joo, Shin, Chong, & Blake, 2009) and on the quality of to-be-averaged items (Jacoby, Kamke, & Mattingley, 2013; Sun & Chong, 2019), and improved with the number of items included (Allik et al., 2013; Baek & Chong, 2020; Haberman & Whitney, 2010; Lee et al., 2016; Parkes et al., 2001; Robitaille & Harris, 2011). Alvarez and Oliva (2008) even found that all presented items had to be included in their average to achieve the observed precision of observers’ averaging performance. Finally, to-be-ignored items also contributed to averaging (Oriet & Brand, 2013). These results suggest that the visual system includes far more items for averaging than the limit of focused attention.

Finally, some averaging models (Allik et al., 2013; Dakin et al., 2005; Solomon, 2010; Solomon et al., 2011) and a capacity-estimation method (Rodriguez-Cintron, Wright, Chubb, & Sperling, 2019) assumed a variable capacity when estimating how many items contribute to averaging. In other words, the number of items contributing to averaging can vary across different set-sizes in these studies. We think that observers’ intrinsic capacity limit should not vary depending on set-sizes. Baek and Chong (2020) showed that observers’ averaging performance can be well described by a model with a fixed attention limit across different set-sizes. In this distributed attention model of averaging, each item contributes to averaging evenly, but its contribution decreases with increasing set-sizes owing to the fixed limits of capacity. This model (Baek & Chong, 2020) outperformed the noise-and-selection model with the assumption of variable capacity (Allik et al., 2013) in predicting observers’ performance of averaging. Thus, the averaging process is likely to consider all items evenly, rather than only a few selected items.

Thus, it seems that the visual system has two different modes of visual processing to cope with its limited capacity: ensemble perception and selective attention. How then does the visual system access so many items (i.e., over the limit of focused attention) for averaging, given its limited capacity? Hierarchically organized receptive fields in visual processing (Ungerleider & Bell, 2011) and population coding of individual items (Georgopoulos, Schwartz, & Kettner, 1986) may provide an answer to this question. If individual items are represented as a population code in a relevant stage of visual processing, population responses can be easily summarized as a Gaussian-shaped activity profile over them, and we can take the peak value as representing the mean (Fig. 1 bottom left). At the same time, if selection of individual object(s) is necessary for object recognition, the visual system can narrow a Gaussian profile down to relevant responses among population responses (Fig. 1 bottom right).

This idea is schematically demonstrated in Fig. 1. Incoming visual inputs can be represented as population responses that reflect their magnitude and quality depending on locations (Fig. 1 top). If the visual system requires a statistical summary, it can use the distributed attention mode to read it out from population responses (Fig. 1 bottom left). If attention is applied to a broader region in a wide Gaussian profile, responses within a region can be summarized. Likewise, the visual system can use the focused attention mode to select important responses for object recognition (Fig. 1 bottom right). If attention is applied to a specific region in a narrow Gaussian profile, selected responses will increase and thus be distinguished from others. Previous studies have also suggested the use of a population code to represent statistical summaries (Chong & Treisman, 2003; Hochstein, Pavlovskaya, Bonneh, & Soroker, 2018).
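The contrast between the two readout modes can be sketched in a few lines of code. This is a deliberately simplified illustration, not the authors' model: it assumes that each item is reduced to a single feature value at a one-dimensional location, and that the readout is a weighted average with weights given by a Gaussian attention profile. The function name `gaussian_readout`, the locations, the values, and the widths are all hypothetical.

```python
import math


def gaussian_readout(locations, values, center, width):
    """Attention-weighted average of item values.

    Assumption: attention is a Gaussian profile over space with the given
    center and width (SD), and the readout is a normalized weighted sum.
    """
    weights = [math.exp(-((x - center) ** 2) / (2.0 * width ** 2))
               for x in locations]
    total = sum(weights)
    return sum(w * v for w, v in zip(weights, values)) / total


# Hypothetical display: five items at different locations.
locations = [-4.0, -2.0, 0.0, 2.0, 4.0]
values = [1.0, 3.0, 2.0, 5.0, 4.0]  # arithmetic mean is 3.0

# Distributed attention: a very wide profile weights all items almost equally.
distributed = gaussian_readout(locations, values, center=0.0, width=100.0)

# Focused attention: a narrow profile centered on the item at location 2.0.
focused = gaussian_readout(locations, values, center=2.0, width=0.3)

if __name__ == "__main__":
    print(f"distributed readout: {distributed:.4f}")
    print(f"focused readout:     {focused:.4f}")
```

In this sketch a single parameter, the width of the profile, moves the readout between a summary of the whole display and the value of one selected item, which mirrors the idea that the two modes are manifestations of one attention system.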

The two different readout mechanisms (distributed and focused attention modes) are different manifestations of a single attention system and can be hierarchically organized. Depending on the purpose of an ongoing task, the visual system flexibly deploys attention in two modes: distributed attention is used for extracting the gist of a scene, while focused attention is used for recognizing specific objects. Consistent with this idea, Cha and Chong (2018) found that observers averaged only relevant orientations for surface perception, suggesting utilization of a different mode of attention to access relevant information. Since attention is involved with multiple stages of visual processing (Kastner & Ungerleider, 2000), these two modes can still interact with each other. For example, an attended object contributes to averaging more than unattended objects (Choi & Chong, 2019; De Fockert & Marchant, 2008).

In conclusion, we propose that the visual system has two separate modes of efficiently managing its limited capacity. Ensemble representation provides a summary of complex and redundant information of a scene, whereas focused attention selects important information from a scene to recognize a few objects. Based on population responses of individual objects, the visual system can read out either a statistical summary for gist perception or crucial information for object recognition.

Notes

1. In Allik et al. (2013), thresholds decreased after set-size 2.

References

Allik, J., Toom, M., Raidvee, A., Averin, K., & Kreegipuu, K. (2013). An almost general theory of mean size perception. Vision Research, 83, 25-39. https://doi.org/10.1016/j.visres.2013.02.018
Alvarez, G. A., & Oliva, A. (2008). The representation of simple ensemble visual features outside the focus of attention. Psychological Science, 19(4), 392-398. https://doi.org/10.1111/j.1467-9280.2008.02098.x
Attarha, M., Moore, C. M., & Vecera, S. P. (2014). Summary statistics of size: Fixed processing capacity for multiple ensembles but unlimited processing capacity for single ensembles. Journal of Experimental Psychology: Human Perception and Performance, 40(4), 1440-1449. https://doi.org/10.1037/a0036206
Baek, J., & Chong, S. C. (2020). Distributed attention model of perceptual averaging. Attention, Perception, & Psychophysics, 82(1), 63-79. https://doi.org/10.3758/s13414-019-01827-z
Broadbent, D. E. (1958). Perception and communication. London: Pergamon Press.
Carrasco, M. (2011). Visual attention: The past 25 years. Vision Research, 51(13), 1484-1525. https://doi.org/10.1016/j.visres.2011.04.012
Cavanagh, P. (2001). Seeing the forest but not the trees. Nature Neuroscience, 4(7), 673-674. https://doi.org/10.1038/89436
Cavanagh, P., & Alvarez, G. A. (2005). Tracking multiple targets with multifocal attention. Trends in Cognitive Sciences, 9(7), 349-354. https://doi.org/10.1016/j.tics.2005.05.009
Cha, O., & Chong, S. C. (2018). Perceived average orientation reflects effective gist of the surface. Psychological Science, 29(3), 319-327. https://doi.org/10.1177/0956797617735533
Choi, Y. M., & Chong, S. C. (2019). Attending to individual size modulates mean size computation. Journal of Vision, 19(10), 99b. https://doi.org/10.1167/19.10.99b
Chong, S. C., & Evans, K. K. (2011). Distributed versus focused attention (count vs estimate). Wiley Interdisciplinary Reviews: Cognitive Science, 2(6), 634-638. https://doi.org/10.1002/wcs.136
Chong, S. C., Joo, S. J., Emmanouil, T. A., & Treisman, A. (2008). Statistical processing: Not so implausible after all. Perception & Psychophysics, 70(7), 1327-1334. https://doi.org/10.3758/PP.70.7.1327
Chong, S. C., & Treisman, A. (2003). Representation of statistical properties. Vision Research, 43(4), 393-404. https://doi.org/10.1016/S0042-6989(02)00596-5
Chun, M. M., Golomb, J. D., & Turk-Browne, N. B. (2011). A taxonomy of external and internal attention. Annual Review of Psychology, 62, 73-101. https://doi.org/10.1146/annurev.psych.093008.100427
Dakin, S. C. (2001). Information limit on the spatial integration of local orientation signals. Journal of the Optical Society of America A, 18(5), 1016-1026. https://doi.org/10.1364/JOSAA.18.001016
Dakin, S. C., Mareschal, I., & Bex, P. J. (2005). Local and global limitations on direction integration assessed using equivalent noise analysis. Vision Research, 45(24), 3027-3049. https://doi.org/10.1016/j.visres.2005.07.037
De Fockert, J. W., & Marchant, A. P. (2008). Attention modulates set representation by statistical properties. Perception & Psychophysics, 70(5), 789-794. https://doi.org/10.3758/PP.70.5.789
Dux, P. E., & Marois, R. (2009). The attentional blink: A review of data and theory. Attention, Perception, & Psychophysics, 71(8), 1683-1700. https://doi.org/10.3758/APP.71.8.1683
Galton, F. (1907). One vote, one value. Nature, 75(1948), 414. https://doi.org/10.1038/075414a0
Georgopoulos, A. P., Schwartz, A. B., & Kettner, R. E. (1986). Neuronal population coding of movement direction. Science, 233(4771), 1416-1419. https://doi.org/10.1126/science.3749885
Haberman, J., & Whitney, D. (2010). The visual system discounts emotional deviants when extracting average expression. Attention, Perception, & Psychophysics, 72(7), 1825-1838. https://doi.org/10.3758/APP.72.7.1825
Hochstein, S., Pavlovskaya, M., Bonneh, Y. S., & Soroker, N. (2018). Comparing set summary statistics and outlier pop out in vision. Journal of Vision, 18(13), 1-13. https://doi.org/10.1167/18.13.12
Huang, L. (2015). Statistical properties demand as much attention as object features. PLoS ONE, 10(8), e0131191. https://doi.org/10.1371/journal.pone.0131191
Im, H. Y., & Halberda, J. (2013). The effects of sampling and internal noise on the representation of ensemble average size. Attention, Perception, & Psychophysics, 75(2), 278-286. https://doi.org/10.3758/s13414-012-0399-4
Jacoby, O., Kamke, M. R., & Mattingley, J. B. (2013). Is the whole really more than the sum of its parts? Estimates of average size and orientation are susceptible to object substitution masking. Journal of Experimental Psychology: Human Perception and Performance, 39(1), 233–244. https://doi.org/10.1037/a0028762
Ji, L., & Pourtois, G. (2018). Capacity limitations to extract the mean emotion from multiple facial expressions depend on emotion variance. Vision Research, 145, 39-48. https://doi.org/10.1016/j.visres.2018.03.007
Joo, S. J., Shin, K., Chong, S. C., & Blake, R. (2009). On the nature of the stimulus information necessary for estimating mean size of visual arrays. Journal of Vision, 9(9), 1-12.
Kastner, S., & Ungerleider, L. G. (2000). Mechanisms of visual attention in the human cortex. Annual Review of Neuroscience, 23, 315-341.
Lee, H., Baek, J., & Chong, S. C. (2016). Perceived magnitude of visual displays: Area, numerosity, and mean size. Journal of Vision, 16(3), 1-11. https://doi.org/10.1167/16.3.12
Luck, S. J., & Vogel, E. K. (1997). The capacity of visual working memory for features and conjunctions. Nature, 390(6657), 279-281. https://doi.org/10.1038/36846
Maule, J., & Franklin, A. (2016). Accurate rapid averaging of multihue ensembles is due to a limited capacity subsampling mechanism. Journal of the Optical Society of America A, 33(3), A22-A29. https://doi.org/10.1364/JOSAA.33.000A22
Myczek, K., & Simons, D. J. (2008). Better than average: Alternatives to statistical summary representations for rapid judgments of average size. Perception & Psychophysics, 70(5), 772-788. https://doi.org/10.3758/PP.70.5.772
Oriet, C., & Brand, J. (2013). Size averaging of irrelevant stimuli cannot be prevented. Vision Research, 79, 8–16. https://doi.org/10.1016/j.visres.2012.12.004
Parkes, L., Lund, J., Angelucci, A., Solomon, J. A., & Morgan, M. (2001). Compulsory averaging of crowded orientation signals in human vision. Nature Neuroscience, 4(7), 739-744. https://doi.org/10.1038/89532
Robitaille, N., & Harris, I. M. (2011). When more is less: Extraction of summary statistics benefits from larger sets. Journal of Vision, 11(12), 1-8. https://doi.org/10.1167/11.12.18
Rodriguez-Cintron, L. M., Wright, C. E., Chubb, C., & Sperling, G. (2019). How can observers use perceived size? Centroid versus mean-size judgments. Journal of Vision, 19(3), 1-14. https://doi.org/10.1167/19.3.3
Simons, D. J., & Levin, D. T. (1997). Change blindness. Trends in Cognitive Sciences, 1(7), 261-267. https://doi.org/10.1016/S1364-6613(97)01080-2
Solomon, J. A. (2010). Visual discrimination of orientation statistics in crowded and uncrowded arrays. Journal of Vision, 10(14), 1-16. https://doi.org/10.1167/10.14.19
Solomon, J. A., Morgan, M., & Chubb, C. (2011). Efficiencies for the statistics of size discrimination. Journal of Vision, 11(12), 1-11. https://doi.org/10.1167/11.12.13
Sun, J., & Chong, S. C. (2019). Power of averaging: Noise reduction by ensemble coding of multiple faces. Journal of Experimental Psychology: General. Advance online publication. https://doi.org/10.1037/xge0000667
Treisman, A. M. (1969). Strategies and models of selective attention. Psychological Review, 76(3), 282-299. https://doi.org/10.1037/h0027242
Treisman, A. (2006). How the deployment of attention determines what we see. Visual Cognition, 14(4-8), 411-443. https://doi.org/10.1080/13506280500195250
Treisman, A. M., & Gelade, G. (1980). A feature-integration theory of attention. Cognitive Psychology, 12(1), 97-136. https://doi.org/10.1016/0010-0285(80)90005-5
Ungerleider, L. G., & Bell, A. H. (2011). Uncovering the visual “alphabet”: Advances in our understanding of object perception. Vision Research, 51(7), 782-799. https://doi.org/10.1016/j.visres.2010.10.002
Whitney, D., & Yamanashi Leib, A. (2018). Ensemble perception. Annual Review of Psychology, 69, 105-129. https://doi.org/10.1146/annurev-psych-010416-044232
Wolfe, J. M., Võ, M. L. H., Evans, K. K., & Greene, M. R. (2011). Visual search in scenes involves selective and nonselective pathways. Trends in Cognitive Sciences, 15(2), 77-84. https://doi.org/10.1016/j.tics.2010.12.001
Wolford, G., & Morrison, F. (1980). Processing of unattended visual information. Memory & Cognition, 8(6), 521-527. https://doi.org/10.3758/BF03213771

Acknowledgements

This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (NRF-2019R1A2B5B01070038).

For helpful comments and discussion of this manuscript, we thank Min-Suk Kang and our lab members.

Open Practices Statement

There are no data, program code, or experiment associated with the current paper.

Author information

Authors and Affiliations

Yonsei Institute of Convergence Technology, Yonsei University, Seoul, Korea
Jongsoo Baek
Graduate Program in Cognitive Science and Department of Psychology, Yonsei University, 50 Yonsei-ro Seodaemun-gu, Seoul, 03722, Korea
Sang Chul Chong

Corresponding author

Correspondence to Sang Chul Chong.

About this article

Cite this article

Baek, J., Chong, S.C. Ensemble perception and focused attention: Two different modes of visual processing to cope with limited capacity. Psychon Bull Rev 27, 602–606 (2020). https://doi.org/10.3758/s13423-020-01718-7

Published: 03 March 2020
Issue Date: August 2020
DOI: https://doi.org/10.3758/s13423-020-01718-7
