Abstract
Although perceptual grouping has been widely studied, its mechanisms remain poorly understood. We propose a neural model of grouping that, through top-down control of its circuits, implements a grouping strategy involving both a connection strategy (which elements to connect) and a selection strategy (that defines spatiotemporal properties of a selection signal to segment target elements and facilitate identification). We apply the model to a letter discrimination task that investigated relationships among uniform connectedness and the grouping principles of proximity and shape similarity. Participants reported whether small circles formed a global letter E or H, and these circles could be connected by a line or be embedded in a matrix of squares. In the model, a good grouping strategy for this task consists of a connection strategy that connects circles but not squares for all conditions and a selection strategy that uses a selection signal of varying size, depending on whether squares were present. Consistent with empirical results, which were verified in two replication studies, model performance is worse with distractor squares, and line connectors improve performance only in the condition with squares. Rather than relying on abstract grouping principles, we show how the empirical results can be explained in terms of observers implementing a task-dependent grouping strategy that promotes overall performance.
Introduction
Spatially disconnected visual elements can appear to form a perceptual group, and much effort over the last century has gone into investigating how groups form. For example, Köhler (1929) proposed that any observer who “looks passively” at Fig. 1 will see two groups of patches. The example was used to argue against the possibility that observers group elements when they have previous experience of these elements behaving as a unit; e.g., we perceive a pencil as a unit because it behaves as a single unit when we use it. Since an observer looking at the image has not seen the left three patches behaving as a unit, Köhler argued that grouping was not learnt from experience. As a member of the Gestalt School, Köhler proposed that these groupings can be accounted for by rules, e.g., grouping by proximity, that generalize to other cases.
Köhler’s argument includes the following assumption: for a given stimulus, perceptual groups are, in some sense, primitives. In other words, it is assumed that there is one way to group the six patches, and everyone will perceive them as forming a group of three patches on the left and another group of three patches on the right. A similar assumption underlies the demonstrative examples throughout Wertheimer’s (1923/1950) seminal paper on perceptual organization (although Wertheimer did identify some possible roles of experience; for discussion see Wagemans, 2018), and it continues to be presupposed in modern experimental work on grouping (Palmer & Beck, 2007; Trick & Enns, 1997; Vickery, 2008) and in examples of grouping rules given in textbooks (e.g., Palmer, 1999).
However, Köhler’s argument involves a subtle qualification: everyone who looks passively at Fig. 1 perceives two groups of patches. This statement does not rule out the possibility that perceived groups are task-dependent, i.e., that the way an observer groups stimulus elements may depend on the particular task at hand. Other Gestalt psychologists were less flexible about the role of experience in perceptual grouping, a role that was vigorously debated (Braly, 1933; Gottschaldt, 1926/1950; Koffka, 1935/1963; Moore, 1930; Wertheimer, 1923/1950). For example, experiments by Gottschaldt led him to conclude that experience has a negligible effect on perceived organization: repeatedly seeing a figure has little impact on whether an observer reports seeing this figure when it is embedded in a larger figure. His conclusion continued to be contested through the 1950s (for overviews, see Bevan, 1961; Bevan & Zener, 1952). The issue of whether past experience influences perceived groupings continues to be investigated experimentally (e.g., Kimchi & Hadad, 2002; Vickery & Jiang, 2009; Zemel et al., 2002; for a review see Peterson & Kimchi, 2013).
Our view is motivated by what is readily achieved in a computational model of visual perception. We recently (Kon & Francis, 2022) proposed a flexible model of visual perception that uses top-down control to modulate neural activity corresponding to perceptual groups. Here, we show that such top-down control can be applied in a way that helps observers segment and identify elements in a visual scene. Thus, we propose that observers develop task-dependent grouping strategies to promote performance. Importantly, this view suggests that perceptual groups will vary across tasks and stimuli.
This paper describes the first application of the new model to empirical data about grouping. We start by describing data from Han et al. (1999a) and, given the small sample size of the original study, from two replication experiments. We then briefly describe the model and explain how it can be applied to the experimental task of Han et al. (1999a). Simulation results show that the model, with an appropriate grouping strategy, captures the observed pattern of data well. Additionally, we show that alternative grouping strategies do not perform as well, which helps explain why observers would apply the particular grouping strategy that we identify.
Han et al.’s (1999a) Aims and Task
Han et al. (1999a) investigated relationships among uniform connectedness and the grouping principles of proximity and shape similarity. Uniform connectedness is a characteristic of perceptual organization, introduced by Palmer and Rock (1994), in which a connected region of uniform visual properties tends to be organized as a single perceptual unit. Han et al. use the example of circles that are joined by a line of the same color as the circles’ borders (see Fig. 2, stimulus B). Han et al. (1999a) wanted to investigate the relative order of visual processing for uniform connectedness compared to classic Gestalt principles. Their Experiment 1 measured response times in a letter discrimination task where each letter was made up of small, spatially separated circles (stimuli are shown in Fig. 2, top row). Han et al. (1999a) explored three main hypotheses by comparing mean reaction times across stimulus conditions. First, because previous work indicated that grouping by proximity occurs earlier than grouping by shape similarity (Ben-Av & Sagi, 1995; Han et al., 1999b), they expected (1) response times for stimuli in which target circles were grouped solely by proximity (Fig. 2, stimulus A) to be faster than those for stimuli in which target circles were grouped solely by shape similarity (Fig. 2, stimulus C). Based on Palmer and Rock’s (1994) theoretically motivated claim that uniform connectedness occurs prior to classic grouping principles, they also expected (2) mean response times for stimuli where target circles were connected by lines and grouped by proximity (Fig. 2, stimulus B) to be faster than those for stimuli in which target circles must be grouped solely by proximity (Fig. 2, stimulus A) and (3) response times for stimuli where target circles were connected by lines and grouped by shape similarity (Fig. 2, stimulus D) to be faster than those for stimuli in which target circles must be grouped solely by shape similarity (Fig. 2, stimulus C).
The leftmost column of Fig. 3 shows the mean response time (top) and error rate (bottom) for each stimulus condition, averaged across the n = 17 observers. As expected, grouping by proximity was much faster than grouping by shape (compare conditions A and C). Likewise, when grouping was based on shape, uniform connectedness led to much faster responses and fewer errors (compare conditions C and D). However, contrary to expectations, uniform connectedness did not have much impact when grouping could be done by proximity (compare conditions A and B). The last result was somewhat surprising because Palmer and Rock’s theory indicates that uniform connectedness should dominate the grouping principles. We will return to this finding in the General Discussion section below.
These conclusions were supported by three additional experiments in Han et al. (1999a) that varied the stimuli and task. Before attempting to model the experimental results, we wanted to be sure that the empirical findings were solid. We were a bit worried about the small sample size of the original study, so we replicated key parts of Experiment 1 in Han et al. (1999a). As it turns out, our worries were largely unfounded; however, since future scientists may have similar concerns, we share our experimental details and findings.
Experiment 1: Direct Replication of Han et al. (1999a)
Our first experiment is a replication of Experiment 1 in Han et al. (1999a). One deviation is that we reversed the contrast of the stimuli and background. The original experiment used black line figures on a white background, but for the model simulations it was better to use white line figures on a black background, so we used the same stimuli for the experiment. An additional change is the removal of variation in the ISI between stimulus offset and mask onset. Han et al. (1999a) did not find an effect of ISI, so to simplify the experiment we used only ISI = 80 ms. A side effect of this choice is that the warning tone at the start of a trial in the original experiment no longer seemed necessary, so it was omitted. Finally, although the original experiment counterbalanced the response assignment of two unnamed keyboard keys, we used the E-key and H-key to register responses and did not counterbalance them across participants.
Method
Participants.
To identify an appropriate sample size, we first decided that we wanted to measure each mean response time with a standard error of about 10 milliseconds. Standard deviations of identification response times across observers tend to be around 100 milliseconds (for the findings in Han et al., 1999a, the standard deviations are around 106 milliseconds), so we planned for a sample size of around 100 observers because this would give a standard error of \(100/\sqrt{100} = 10\) ms. Due to excess sign-ups for the experiment, we ended up with a total of 125 observers. Participants were naïve undergraduates from Purdue University who took part in exchange for course credit. All participants provided informed consent in accordance with Purdue’s Institutional Review Board.
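The planning arithmetic above follows from the standard error of a mean, SE = SD/√n, which rearranges to n = (SD/SE)². The short Python sketch below reproduces that calculation; the function names are our own, chosen for illustration.

```python
import math


def standard_error(sd_ms: float, n: int) -> float:
    """Standard error of a mean response time: SD / sqrt(n)."""
    return sd_ms / math.sqrt(n)


def required_sample_size(sd_ms: float, target_se_ms: float) -> int:
    """Smallest whole number of observers n with SD / sqrt(n) <= target SE."""
    # Rearranging SE = SD / sqrt(n) gives n = (SD / SE) ** 2; round up.
    return math.ceil((sd_ms / target_se_ms) ** 2)


print(required_sample_size(100, 10))       # 100 observers when SD = 100 ms
print(round(standard_error(100, 125), 2))  # 8.94 ms if SD = 100 ms and n = 125
```

Rounding up keeps the achieved standard error at or below the target, so the excess sign-ups only tightened the planned precision.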
Apparatus.
This study was conducted online using bespoke JavaScript and HTML code (Francis & Neath, 2015). Participants used a computer to take the experiment, and those who attempted to use a tablet or phone were prompted to switch to a laptop or desktop computer. Participants used the E-key and H-key on their keyboard to register responses for E and H target letters, respectively. We provide a local version of the code for the experiment at the Open Science Framework (https://osf.io/zrkue/).
Stimuli.
The stimuli used in the experiment are shown in Fig. 2, top row. Since the study was conducted online, a participant’s distance from the monitor and the monitor’s size are unknown, and thus the visual angle subtended by the stimuli is also unknown. To provide some sense of the size of stimulus elements, consider a hypothetical participant who uses a laptop on a desk with the monitor 18 inches away. With a 13.3-inch (diagonal) monitor with a width of around 11.25 inches and a resolution of 2560 × 1600 (a pixel density of about 227 pixels per inch), a circle diameter or square side would subtend approximately 0.5° of visual angle. The width of the middle row of circles, i.e., the distance from the leftmost circle to the rightmost circle in the center row, would subtend approximately 5.23°. The height of the leftmost column of circles would subtend approximately 6.72°.
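The visual-angle estimates above rest on the standard relation θ = 2·atan(s / 2d) for an extent s viewed at distance d. As a minimal sketch, the Python below converts between on-screen pixels and degrees for the hypothetical laptop setup; because the stimulus pixel dimensions are not stated here, it computes the extent that would subtend 0.5° rather than assuming a pixel size.

```python
import math


def pixels_per_inch(width_px: int, width_in: float) -> float:
    """Horizontal pixel density of a display."""
    return width_px / width_in


def visual_angle_deg(size_px: float, ppi: float, distance_in: float) -> float:
    """Visual angle (degrees) subtended by an on-screen extent of size_px pixels."""
    size_in = size_px / ppi
    return math.degrees(2 * math.atan(size_in / (2 * distance_in)))


def size_px_for_angle(angle_deg: float, ppi: float, distance_in: float) -> float:
    """Inverse: on-screen extent (pixels) that subtends angle_deg at distance_in."""
    size_in = 2 * distance_in * math.tan(math.radians(angle_deg) / 2)
    return size_in * ppi


ppi = pixels_per_inch(2560, 11.25)
print(round(ppi, 1))                              # 227.6
print(round(size_px_for_angle(0.5, ppi, 18), 1))  # about 35.7 pixels for 0.5 deg
```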
Procedure.
After reading instructions that explained the task, encouraged them to look at the fixation cross for the duration of each trial, and prompted them to respond as quickly and as accurately as possible, participants scrolled to the bottom of the webpage, where the experiment took place. Figure 4 schematizes what occurred on a trial. A participant initiated the first trial by pressing the B-key on their keyboard. The stimulus appeared 2000 milliseconds after this key press and was shown for 160 ms. This was followed by a fixed inter-stimulus interval of 80 ms and then a random dot mask for 80 ms. After mask offset, the screen was blank except for a fixation cross, which remained onscreen for the full duration of each trial. The participant responded by pressing the E-key if the target letter was an E or the H-key if the target letter was an H. Participants were instructed to rest their left index finger on the E-key, right index finger on the H-key, and thumb on the B-key for the duration of the experiment. After completing 24 practice trials in which all conditions were experienced, the participant completed two blocks of 72 experimental trials each. Over the course of the experiment, all stimulus conditions were randomly interleaved. The total number of experimental trials (144) matched the number of experimental trials that participants in the original experiment completed for each ISI condition. Participants received feedback at the end of a trial if the response was incorrect, too fast (response time less than 100 ms), or too slow (response time greater than 2000 ms). After each trial, participants were prompted to press the B-key to initiate the next trial. Trials with response times less than 100 ms or greater than 2000 ms were omitted from analysis, and mean response times were computed from correct trials only.
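The trial-exclusion and scoring rules above can be sketched as follows for one participant and one condition. This is an illustrative reconstruction, not the authors' analysis script (which is on the OSF). One assumption is labeled in the code: error rates are computed over the trials that survive the response-time cutoffs, because the text does not say whether excluded trials count toward the error rate.

```python
from statistics import mean


def summarize_condition(trials, min_rt=100, max_rt=2000):
    """Summarize one participant's trials in one condition.

    trials: iterable of (rt_ms, correct) pairs.
    Returns (mean RT of correct trials, error rate) after dropping trials
    with RT < min_rt or RT > max_rt.
    """
    kept = [(rt, correct) for rt, correct in trials if min_rt <= rt <= max_rt]
    if not kept:
        return float("nan"), float("nan")
    correct_rts = [rt for rt, correct in kept if correct]
    # Assumption: the error rate is taken over retained trials only.
    error_rate = 1 - len(correct_rts) / len(kept)
    mean_rt = mean(correct_rts) if correct_rts else float("nan")
    return mean_rt, error_rate


# Hypothetical trials: the 80 ms and 2100 ms responses are excluded.
demo = [(450, True), (520, True), (610, False), (80, True), (2100, False)]
print(summarize_condition(demo))  # mean RT 485 ms, error rate 1/3
```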
Results and Discussion
Mean response times for correct trials and error rates for Experiment 1 are shown in the second column of Fig. 3 and in Table 1. We ran an ANOVA model in R (version 4.0.2; R Core Team, 2020) using the ez package (Lawrence, 2016). (The data and R script for the analysis can be found on the OSF at https://osf.io/zrkue/)
A repeated measures ANOVA showed condition type had a significant effect on response time, F(2.32,276.02) = 277.05, p < .001 (since Mauchly’s test indicated a violation of sphericity, ε = .77, Huynh-Feldt corrected results are reported). Planned contrasts indicated that response times were significantly lower for the condition with similarity grouping and uniform connectedness (condition D) compared to the condition with similarity grouping only (condition C), t(372) = 11.27, p < .001, and for the condition with proximity grouping only (condition A) compared to the condition with shape grouping only (condition C), t(372) = − 24.28, p < .001. There was no significant difference in response times for the condition with proximity grouping only (condition A) compared to the condition with proximity grouping and uniform connectedness (condition B), t(372) = 0.23, p = .815.
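A sphericity correction multiplies both ANOVA degrees of freedom by an estimated ε. The NumPy sketch below computes the Greenhouse-Geisser ε from the double-centred covariance of the condition means and then applies the classic Huynh-Feldt adjustment for a single-group design. Statistical packages may use a slightly different variant of the Huynh-Feldt formula, so treat this as an illustration of the idea rather than a re-implementation of what ez reports.

```python
import numpy as np


def sphericity_epsilons(data):
    """Greenhouse-Geisser and Huynh-Feldt epsilons for a one-way repeated-measures design.

    data: array of shape (n_subjects, k_conditions), one mean per subject and condition.
    """
    data = np.asarray(data, dtype=float)
    n, k = data.shape
    cov = np.cov(data, rowvar=False)          # k x k covariance across subjects
    center = np.eye(k) - np.ones((k, k)) / k  # centering matrix
    cov_c = center @ cov @ center             # double-centred covariance
    gg = np.trace(cov_c) ** 2 / ((k - 1) * np.sum(cov_c * cov_c))
    # Classic Huynh-Feldt adjustment of the GG estimate, capped at 1.
    hf = (n * (k - 1) * gg - 2) / ((k - 1) * (n - 1 - (k - 1) * gg))
    return float(gg), float(min(hf, 1.0))


def corrected_df(eps, n, k):
    """Scale both ANOVA degrees of freedom by epsilon."""
    return eps * (k - 1), eps * (k - 1) * (n - 1)


# With 120 participants, 4 conditions, and epsilon rounded to .75:
print(corrected_df(0.75, 120, 4))  # (2.25, 267.75)
```

With ε rounded to .75, the corrected degrees of freedom come close to those reported for Experiment 2, F(2.26, 269.29); the small gap reflects rounding of ε.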
Following Han et al., the error rate for each observer was transformed with an arcsine square-root function prior to analysis. Results for error rates mirrored those for response times. A repeated measures ANOVA with a Huynh-Feldt correction (ε = .96) showed condition type had a significant effect on error rates, F(2.88,342.51) = 126.61, p < .001. Planned contrasts indicated that error rates were significantly lower for the condition with similarity grouping and uniform connectedness (condition D) compared to the condition with similarity grouping only (condition C), t(372) = 11.21, p < .001, and for the condition with proximity grouping only (condition A) compared to the condition with shape grouping only (condition C), t(372) = − 17.47, p < .001. There was no significant difference in error rates for the condition with proximity grouping only (condition A) compared to the condition with proximity grouping and uniform connectedness (condition B), t(372) = − 1.25, p = .214.
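The arcsine square-root transform stabilizes the variance of proportions such as error rates before they enter an ANOVA. A minimal Python version follows. Some authors use 2·asin(√p) instead; the text does not specify the scaling, and a constant factor does not change F or t statistics.

```python
import math


def arcsine_sqrt(p: float) -> float:
    """Arcsine square-root transform of a proportion p in [0, 1] (result in radians)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a proportion in [0, 1]")
    return math.asin(math.sqrt(p))


# Hypothetical per-observer error rates in one condition
error_rates = [0.0, 0.05, 0.25]
print([round(arcsine_sqrt(p), 3) for p in error_rates])
```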
The results of this experiment largely replicate the pattern of results in Han et al., (1999a). Response times are longer for our experiment, which probably reflects differences in equipment, context, and training. Our results also do not show as big a difference between conditions C and D as in the original findings, but the pattern is still convincing.
Experiment 2: Slightly Modified Stimuli
Our Experiment 1 used stimuli with the same spacing as those of Han et al. (1999a), in which the vertical spacing of shapes is greater than the horizontal spacing. As we started to run model simulations with these stimuli, we realized that the difference between the horizontal and vertical spacing might quantitatively affect the required top-down connection control settings in the model. The model could handle such stimuli, but we thought it would be prudent to check empirically whether the spacing difference had a measurable impact on behavior, as this would guide model development. Thus, our second experiment was identical to the first except that the vertical spacing was equal to the horizontal spacing (see Fig. 2, bottom row). We also felt that equidistant elements provide a better way to measure the relative impact of grouping principles.
Method
Participants.
We again planned to gather enough data to measure mean response times with a standard error of around 10 milliseconds. Experiment 1 found standard deviations across participants of around 113 milliseconds, so we again planned for approximately 100 participants. Due to excess sign-ups, we ended up with 120 naïve undergraduates from Purdue University, who took part in the experiment in exchange for course credit. All participants provided informed consent in accordance with Purdue’s Institutional Review Board.
Apparatus, Stimuli, and Procedure.
All aspects were identical to those of Experiment 1 except for the vertical spacing of elements, which was the same as the horizontal spacing of elements in Experiment 1. Using the same hypothetical observer as for Experiment 1, the height of the leftmost column of circles would subtend approximately 6.1°.
Results and Discussion
Mean response times and error rates for Experiment 2 are shown in the third column of Fig. 3 and in Table 2, and they largely match the results from Experiment 1 and the results from Han et al. (1999a).
A repeated measures ANOVA with a Huynh-Feldt correction (ε = .75) showed condition type had a significant effect on response time, F(2.26,269.29) = 321.41, p < .001. Planned contrasts indicated that response times were significantly faster for the condition with similarity grouping and uniform connectedness (condition D) compared to the condition with similarity grouping only (condition C), t(357) = 12.44, p < .001, and for the condition with proximity grouping only (condition A) compared to the condition with shape grouping only (condition C), t(357) = − 26.13, p < .001. There was no significant difference in response times for the condition with proximity grouping only (condition A) compared to the condition with proximity grouping and uniform connectedness (condition B), t(357) = 0.41, p = .685.
Similar to Experiment 1, condition type had a significant effect on transformed error rates, F(3,357) = 107.08, p < .001. Planned contrasts indicated that error rates were significantly lower for the condition with similarity grouping and uniform connectedness (condition D) compared to the condition with similarity grouping only (condition C), t(357) = 7.60, p < .001, and for the condition with proximity grouping only (condition A) compared to the condition with shape grouping only (condition C), t(357) = − 15.89, p < .001. There was no significant difference in error rates for the condition with proximity grouping only (condition A) compared to the condition with proximity grouping and uniform connectedness (condition B), t(357) = − 1.30, p = .194.
Comparing the results of the two replication experiments indicates that whether or not stimulus elements are equidistant hardly affects the pattern of mean response times or error rates. One small difference is that mean response times were approximately 20 milliseconds faster in every condition for the equidistant stimuli of Experiment 2.
Model Simulations
Simulated Grouping Strategy
We propose that human observers promote performance on a particular task and stimulus set by implementing a grouping strategy, which consists of separate connection and selection strategies that are chosen in tandem. The model is a version of the LAMINART cortical neural network (Grossberg & Raizada, 2000; Raizada & Grossberg, 2001). The version of the model used here includes the connection circuit introduced in Kon and Francis (2022), which we use to implement connection strategies, and the segmentation circuit introduced by Francis et al. (2017), which we use to implement selection strategies.
Connections in the Model
A connection strategy concerns which stimulus elements to connect. According to the model, these connections occur among orientation-sensitive complex cells in cortical area V2 and spread between active cells responding to stimulus edges. For example, the second and third columns in Fig. 5 show model V2 activity. As in prior work (Francis et al., 2017; Kon & Francis, 2022), this activity is color coded: red indicates that the vertically tuned cell at a pixel is active, green that the horizontally tuned cell is active, and blue that a diagonally tuned cell is active. Notice that some of the V2 activity in Fig. 5 corresponds to oriented edges in the stimulus, but other activity (the connections) is generated by the connection circuits (Grossberg & Mingolla, 1985a).
The spread of connections can be altered by tuning the parameters of three circuits: (a) the spread controller circuit enables the spread of connections from detected edges, (b) the long controller circuit reduces the spread of connections from long edges, and (c) the short controller circuit reduces the spread of connections from short edges (for details about the connection circuits, see Kon & Francis, 2022). If these connection circuits are off, then no connections form. For example, if the stimulus shown in Fig. 5 is input to the model for 500 milliseconds and if the connection circuits are off, then the V2 activity only represents stimulus edges (Fig. 5, image labelled “No connections”).
By top-down control of connection parameters, the connection circuits can be tuned to produce various connection patterns (see the Appendix for details about the connection circuit parameter values used for the simulations reported here). For example, they can be tuned so that connections form only between nearby circles (Fig. 5, image labelled “Circles only”), between nearby circles and between nearby squares but not between circles and squares (Fig. 5, image labelled “Circles and squares”), or between all shapes (Fig. 5, image labelled “All connected”). These connected elements may be regarded as forming groups, so the model can group this set of stimulus elements in several ways via connections.
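To make the idea that one stimulus can be grouped in several ways more concrete, the Python sketch below abstracts away from the neural circuits entirely. A "connection strategy" is a hypothetical predicate on pairs of elements (here, a distance threshold plus a rule about which element kinds may link), and groups are the connected components of that relation. The three strategy names mirror the labels in Fig. 5. The threshold, coordinates, and kind-based rule are illustrative assumptions; they are not the spread, long, and short controller circuits of the model.

```python
import math


def make_connection_strategy(max_dist, allowed_pairs):
    """Hypothetical stand-in for one setting of the top-down connection parameters.

    Two elements (x, y, kind) connect if they are within max_dist of each other
    and their kinds form an allowed (unordered) pair.
    """
    def connects(a, b):
        close = math.dist(a[:2], b[:2]) <= max_dist
        kinds = (a[2], b[2])
        return close and (kinds in allowed_pairs or kinds[::-1] in allowed_pairs)
    return connects


def find_groups(elements, connects):
    """Groups = connected components of the 'connects' relation (union-find)."""
    parent = list(range(len(elements)))

    def root(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(elements)):
        for j in range(i + 1, len(elements)):
            if connects(elements[i], elements[j]):
                parent[root(i)] = root(j)
    groups = {}
    for i in range(len(elements)):
        groups.setdefault(root(i), []).append(i)
    return sorted(groups.values())


CIRCLES_ONLY = make_connection_strategy(1.0, {("circle", "circle")})
CIRCLES_AND_SQUARES = make_connection_strategy(
    1.0, {("circle", "circle"), ("square", "square")})
ALL_CONNECTED = make_connection_strategy(
    1.0, {("circle", "circle"), ("square", "square"), ("circle", "square")})

row = [(0, 0, "circle"), (1, 0, "circle"), (2, 0, "square"), (3, 0, "square")]
for strategy in (CIRCLES_ONLY, CIRCLES_AND_SQUARES, ALL_CONNECTED):
    print(find_groups(row, strategy))
# [[0, 1], [2], [3]]
# [[0, 1], [2, 3]]
# [[0, 1, 2, 3]]
```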
Building on claims made by Francis et al. (2017) and Kon and Francis (2022), we assume that observers use top-down control to tune the connection parameters to promote performance on a given task and stimulus set, e.g., so that target and/or distractor elements link together. We refer to this type of tuning as a connection strategy. To see how connections allow for faster identification of targets, we first need a sketch of how the segmentation circuit functions.
Selection in the Model
The model uses segmentation layers and a selection signal to perform a kind of figure-ground processing. Selected contours are transferred from a default image plane (called Segmentation Layer 0) to a separate image plane (Segmentation Layer 1) (Francis et al., 2017). Fig. 6 demonstrates this segmentation process as it unfolds in time. The bottom row shows how the contours corresponding to the letter formed by the connected circles are transferred to Segmentation Layer 1.
In Fig. 6, a single selection signal is represented by a gray circle in Segmentation Layer 0, and it acts like an attentional spotlight. The selection signal has two key traits. First, contours “selected” by the signal, i.e., contours that are at the same location as the signal, are transferred to Segmentation Layer 1, as can be seen at time 250-300 ms in Fig. 6. Second, the signal itself spreads across contours that are connected to any selected contour. Thus, even though the selection signal in Fig. 6 is placed on a single circle, the contours of that circle are connected to its neighboring circles, so the signal spreads to those neighbors as well. The signal keeps spreading among connected contours so that, given enough time, it selects and segments all the elements that form the global shape.
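The spreading behavior described above resembles a breadth-first search over the connection graph. The sketch below is a discrete simplification of the continuous neural dynamics: each "step" is a hypothetical unit of spreading time, and the element names and layout are invented for illustration. Elements under the signal are seeds, the signal reaches connected neighbors one hop per step, and unconnected distractors are never reached. A larger signal that covers more elements simply supplies more seeds, which shortens the time needed to reach the whole group.

```python
from collections import deque


def spread_selection(adjacency, seeds, max_steps):
    """Spread a selection signal over connected elements, one hop per step.

    adjacency: dict mapping each element to the elements it is connected to.
    seeds: elements initially under the selection signal.
    Returns a dict mapping each selected element to the step at which it was reached.
    """
    reached = {s: 0 for s in seeds}
    frontier = deque(seeds)
    while frontier:
        node = frontier.popleft()
        if reached[node] >= max_steps:
            continue
        for neighbor in adjacency.get(node, []):
            if neighbor not in reached:
                reached[neighbor] = reached[node] + 1
                frontier.append(neighbor)
    return reached


# Hypothetical layout: four connected circles and two unconnected squares.
adjacency = {
    "c0": ["c1"], "c1": ["c0", "c2"], "c2": ["c1", "c3"], "c3": ["c2"],
    "s0": [], "s1": [],
}
print(spread_selection(adjacency, ["c1"], max_steps=1))   # c1, c0, c2 only
print(spread_selection(adjacency, ["c1"], max_steps=10))  # all circles, no squares
```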
If only the target contours are segmented to the separate image plane, then identification is improved because the surrounding distractors do not interfere with identification. A selection strategy concerns the number, placement, size, and timing of a selection signal, and we propose that these properties are subject to top-down control. As is obvious from the example in Fig. 6, what gets selected depends very much on what elements form connections. Thus, it is the combination of a connection strategy and a selection strategy that defines a grouping strategy. A good grouping strategy promotes good identification, so we describe a plausible identification mechanism before describing a grouping strategy that is easy to implement and produces results that match the experimental data of Han et al. (1999a).
Model Evidence, Template Matching, and Stopping Rule
The observer’s task was to identify the global target letter, which could be modeled with a number of different mechanisms. We chose to use a pair of templates to calculate a model evidence score and to apply a stopping rule that gives an indication of confidence in the identity of the letter. After every 50 milliseconds of model time, a model evidence score was calculated from activity in Segmentation Layer 1. Neural responses within the areas covered by the H and E templates, shown in Fig. 7, were summed to produce \(V_H\) and \(V_E\), respectively, and activity across the entire layer was summed to produce \(\sum V\). Model evidence is
\[ \text{model evidence} = \frac{V_E}{\sum V} - \frac{V_H}{\sum V} = \frac{V_E - V_H}{\sum V}. \]
Since the proportion of activity corresponding to an H is subtracted from that corresponding to an E, a model evidence score greater than zero is considered evidence that the letter is an E, while a score less than zero is evidence that the segmented responses form an H. Because it takes time for the selection signal to spread and for the selected boundary activity to be segmented into Segmentation Layer 1, the model often lacks enough information to make an accurate judgment about the letter early in a trial. So, a stopping rule was implemented as a measure of the model’s confidence in its judgment about letter type. According to the stopping rule, if the model evidence score was greater than zero (or less than zero) for three consecutive 50-millisecond time steps, then the trial ended and the response corresponding to the final model evidence score was taken as the response. Otherwise, the trial continued up to a maximum of 750 milliseconds after stimulus onset, at which point a letter was randomly chosen as a “guess.” This stopping rule was chosen because pilot simulations indicated that the model was relatively fast and accurate with it.
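The evidence score and stopping rule can be expressed compactly, as in the sketch below. It assumes Segmentation Layer 1 activity is available as a 2-D array and the E and H templates as boolean masks of the same shape; the mask contents, the zero-activity convention, and the function names are hypothetical. Evidence is the share of layer activity inside the E template minus the share inside the H template. A response is made once three consecutive 50 ms scores share a sign, and the reported time is the time of the third score.

```python
import random

import numpy as np

STEP_MS = 50   # evidence is computed after every 50 ms of model time
MAX_MS = 750   # latest time at which a decision can be made


def model_evidence(layer1, mask_e, mask_h):
    """(V_E - V_H) / sum(V) for Segmentation Layer 1 activity.

    layer1: 2-D activity array; mask_e, mask_h: boolean template masks (same shape).
    """
    total = layer1.sum()
    if total == 0:
        return 0.0  # assumption: no segmented activity counts as no evidence
    return float((layer1[mask_e].sum() - layer1[mask_h].sum()) / total)


def apply_stopping_rule(scores, n_consecutive=3, rng=random):
    """scores[i] is the evidence at (i + 1) * STEP_MS ms after stimulus onset.

    Returns (response, time_ms, guessed).
    """
    run_sign, run_len = 0, 0
    for i, score in enumerate(scores):
        t = (i + 1) * STEP_MS
        if t > MAX_MS:
            break
        # +1 favours E, -1 favours H, 0 favours neither
        sign = 1 if score > 0 else (-1 if score < 0 else 0)
        if sign != 0 and sign == run_sign:
            run_len += 1
        else:
            run_sign, run_len = sign, int(sign != 0)
        if run_len >= n_consecutive:
            return ("E" if run_sign > 0 else "H"), t, False
    return rng.choice(["E", "H"]), MAX_MS, True  # no decision: guess


layer = np.zeros((2, 2))
layer[0, 0], layer[1, 1] = 3.0, 1.0
mask_e = np.array([[True, False], [False, False]])
mask_h = np.array([[False, False], [False, True]])
print(model_evidence(layer, mask_e, mask_h))      # 0.5
print(apply_stopping_rule([0.0, 0.1, 0.2, 0.3]))  # ('E', 200, False)
```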
Grouping Strategy 1
Now that we have summarized the main functions of the connection and segmentation circuits, we can identify a grouping strategy that could be used for this task and stimulus set, which we will call “Grouping Strategy 1.” Recall that a grouping strategy consists of a connection strategy and a selection strategy. A simple connection strategy would be to connect only target circles for each stimulus condition, which is shown in row 2 of Fig. 8. With this connection strategy, if a selection signal falls on part of any circle or their connections, then the entire letter will eventually be selected and segmented, which allows for easy identification of the target letter.
Following ideas in Francis et al. (2017), a plausible selection strategy involves two selection signals placed at locations that help to quickly identify the global letter shape. Specifically, the left selection signal aimed for a location that was centered on the second circle from the left in the row with the fixation cross, and the right selection signal aimed for a location on the second circle from the right in the same row. Because it is unlikely that an observer would precisely place a selection signal at the same location across trials, noise was added to the x and y coordinates of the aimed-for location by adding a value that was randomly drawn (for each coordinate on each trial) from a normal distribution with a mean of zero and standard deviation of 10 (see Fig. 8, bottom row, for examples of selection signal locations with noise added). These locations were chosen because they are near the fixation cross and are likely to result in the selection and segmentation of the target letter.
An additional aspect of the selection strategy is that the size of the selection signals is varied across conditions. For conditions A and B, larger selection signals with a diameter of 67 pixels were used, while selection signals of a diameter of 19 pixels were used for conditions C and D. For conditions C and D, the small size was chosen because it could select only a target circle and not select any surrounding squares. For conditions A and B, larger selection signals were used because there was no risk of selecting a distractor square. Since it takes time for the selection signal to spread along edges/connections and segment them (e.g., Fig. 9), a larger selection signal will lead to faster segmentation of the target elements. We propose that an observer uses gist information from the image to know which selection signal size to use on a given trial. Each selection signal was initiated at 50 milliseconds after stimulus onset and remained at the same location until the end of the trial. The V2 activity selected by the pair of selection signals was segmented into a single layer (Segmentation Layer 1). Figure 9 provides some sample trials that used this selection strategy.
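The selection strategy of Grouping Strategy 1 reduces to a few numbers, and the NumPy sketch below shows how they combine. It uses two jittered aim points, a condition-dependent diameter (67 pixels for conditions A and B, 19 pixels for C and D), and a fixed onset at 50 ms. The aim-point coordinates are hypothetical placeholders for the second circles from the left and right in the fixation row. The noise SD of 10 is taken from the text and assumed to be in pixels. Representing each signal as a filled disk is also an assumption, based on the gray circle in Fig. 6.

```python
import numpy as np

# Selection-signal diameters (pixels) per condition, as given in the text.
SIGNAL_DIAMETER = {"A": 67, "B": 67, "C": 19, "D": 19}
NOISE_SD = 10.0       # SD of the jitter on each coordinate (assumed to be pixels)
SIGNAL_ONSET_MS = 50  # signals start 50 ms after stimulus onset and then stay put


def place_selection_signals(left_target, right_target, condition, rng):
    """Return [(x, y, diameter), ...] for the left and right selection signals.

    left_target/right_target: hypothetical (x, y) aim points. Each coordinate is
    jittered by an independent draw from a normal distribution (mean 0, SD NOISE_SD).
    """
    diameter = SIGNAL_DIAMETER[condition]
    signals = []
    for x, y in (left_target, right_target):
        dx, dy = rng.normal(0.0, NOISE_SD, size=2)
        signals.append((x + dx, y + dy, diameter))
    return signals


def signal_mask(shape, x, y, diameter):
    """Boolean image of the pixels covered by a disk-shaped selection signal."""
    rows, cols = np.ogrid[:shape[0], :shape[1]]
    return (cols - x) ** 2 + (rows - y) ** 2 <= (diameter / 2) ** 2


rng = np.random.default_rng(0)
signals = place_selection_signals((100, 200), (300, 200), "C", rng)
# Both signals segment into the same layer, so their masks are combined.
combined = np.zeros((400, 400), dtype=bool)
for x, y, d in signals:
    combined |= signal_mask(combined.shape, x, y, d)
```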
Simulation Stimuli, Method and Procedure
For each simulation, 120 simulated trials were run for each stimulus condition (i.e., 60 trials for each target letter for each condition). As depicted in Fig. 10, on each simulated trial the stimulus image was presented for 150 ms followed by a black image for 100 ms, which served as a 100 ms ISI. Then a random dot mask was shown for 100 ms that was followed by a black image until enough model evidence accumulated for the model to indicate whether the letter was an E or H (see Fig. 11 for an example trial that shows model activity given a random dot mask). If there was not enough evidence for either letter after a total of 750 ms after stimulus onset, then the trial terminated and a guess was made about the identity of the letter, i.e., each letter had a probability of 0.5 of being selected. The time it took on each trial for the model evidence score to go above (or below) zero and trigger the stopping rule provides a measure of response time that can be compared with that of human observers. Additionally, whether the model chose the correct letter on a given trial contributed to an error rate that can be compared with that of human observers.
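The simulated trial structure can be summarized as a timeline plus a trial list. The sketch below encodes the 150 ms stimulus, 100 ms black ISI, and 100 ms random dot mask, followed by a black screen until a decision or the 750 ms limit. It also builds the 60 E and 60 H trials per condition and scores responses. Treating each interval as half-open (a frame ends exactly when the next begins) is our assumption. The helper names are hypothetical, and the actual simulations used NEST rather than this code.

```python
STIMULUS_MS, ISI_MS, MASK_MS, MAX_MS = 150, 100, 100, 750


def image_at(t_ms):
    """Which image the model receives t_ms after stimulus onset."""
    if t_ms < STIMULUS_MS:
        return "stimulus"
    if t_ms < STIMULUS_MS + ISI_MS:
        return "black"  # 100 ms ISI
    if t_ms < STIMULUS_MS + ISI_MS + MASK_MS:
        return "mask"   # a fresh random dot mask on every trial
    return "black"      # until a decision or MAX_MS


def make_trial_list(conditions=("A", "B", "C", "D"), per_letter=60):
    """120 trials per condition: 60 with target E and 60 with target H."""
    return [(c, letter) for c in conditions for letter in ("E", "H")
            for _ in range(per_letter)]


def error_rate(responses, targets):
    """Proportion of trials whose response does not match the target letter."""
    return sum(r != t for r, t in zip(responses, targets)) / len(targets)


trials = make_trial_list()
print(len(trials), [image_at(t) for t in (0, 200, 300, 400)])
```

Because a guess picks each letter with probability 0.5, each trial that reaches the 750 ms limit contributes an expected 0.5 errors to the simulated error rate.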
All simulations were programmed as Python 2 scripts using the package NEST 2.14.0 (Peyser et al., 2017) for creating the cells and synapses and for managing network dynamics. A single cell type (iaf_psc_alpha, a leaky integrate-and-fire neuron model with alpha-function-shaped synaptic currents) and a single synapse type (static_synapse) were used, and synapse weights were set manually to implement the various circuits (for orientation detectors and connection formation). Each trial took approximately 11–17 minutes (real time) to run, depending on when the trial was terminated due to a target letter being identified. Stimuli were made using a custom Python 3 script with standard packages (numpy, random) and were written to .bmp files using the package ImageIO (Klein et al., 2018). As in the experiment, a different random dot mask was generated for each simulated trial. The simulations were run in batches on two computers in parallel to reduce overall run time: a 2019 MacBook Pro (32 GB RAM, 8 cores) and a 2018 Linux machine (16 GB RAM, 6 cores, running Debian). Checks indicated that the two computing systems give the same results. (All code and stimuli can be found on the OSF at https://osf.io/zrkue/)
Model Results and Discussion
The grouping strategy summarized in Fig. 8, i.e., Grouping Strategy 1, was implemented in the model and produced the simulated results shown in the last column of Fig. 3. Recall that with this grouping strategy, circle elements connect to each other, but square elements do not connect with anything. The circle group is then selected/segmented to layer 1 and separated from unselected elements. Activity in layer 1 is then interpreted by the templates. The simulated results are highly correlated with our Experiment 2 data (r = .994 for mean response times, r = .998 for error rates). Some discrepancies with the data are easily explained. Response times are much faster for the simulations than for the experimental results, but this is expected because, among other things, the model lacks a motor component. Additionally, model error rates are zero for conditions A and B, whereas the experimental error rates are non-zero. We suspect that participants were performing near ceiling in these conditions and that their non-zero error rates largely reflect noise, e.g., accidentally pressing the wrong key, which does not happen in the model. Overall, the model using Grouping Strategy 1 produces results similar to those measured in Experiment 2.
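The reported r values compare simulated and observed means across conditions. A minimal sketch of that computation, assuming a plain Pearson correlation over the condition means; the function name is ours and no experimental values are reproduced here.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences of condition means."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)
```

Because r is unchanged by adding a constant to, or rescaling, either sequence, a model whose response times are uniformly faster than human ones (e.g., because it lacks a motor component) can still correlate almost perfectly with the data. With only four condition means, however, r depends heavily on each individual condition.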
Performance for Other Connection Strategies
It is encouraging that model performance with Grouping Strategy 1 seems to closely mimic human behavior. However, we do not propose that this grouping strategy is the best simply because it matches human behavior. To justify why observers might use a given grouping strategy, we need some other motivation, such as good performance or easy implementation. Grouping Strategy 1 appears easy to implement: the connection strategy uses a fixed set of top-down control parameters for all conditions, and the selection strategy places selection signals consistently, varying only the selection signal size between sets of conditions.
To investigate how well Grouping Strategy 1 does with regard to overall performance, we consider two alternative approaches that differ in their connection strategies. In Grouping Strategy 0, no stimulus elements were connected. In Grouping Strategy 2, circles connected with other circles and squares connected with other squares (yet circles and squares did not connect with each other). Simulation response times and error rates for all three strategies are shown in Fig. 12.
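The three connection strategies differ only in which element types may link to each other. A hypothetical abstraction (not the neural circuits, where connections are governed by the controllers described in the Appendix) encodes each strategy as a set of allowed type pairs and links neighboring elements on a grid; the names and the distance criterion are assumptions for illustration.

```python
from itertools import combinations

# Hypothetical encoding: each strategy lists the element-type pairs allowed to connect.
CONNECTION_STRATEGIES = {
    0: set(),                                          # no elements connect
    1: {("circle", "circle")},                         # circles only
    2: {("circle", "circle"), ("square", "square")},   # like with like, never mixed
}


def may_connect(strategy, type_a, type_b):
    """True if elements of these two types may connect under the strategy."""
    return tuple(sorted((type_a, type_b))) in CONNECTION_STRATEGIES[strategy]


def connections(strategy, elements, max_dist):
    """Index pairs of elements that are close enough and allowed to connect.

    `elements` is a list of (type, (x, y)). With unit grid spacing and
    max_dist = 1, only horizontal and vertical neighbors qualify.
    """
    pairs = []
    for i, j in combinations(range(len(elements)), 2):
        (ta, (xa, ya)), (tb, (xb, yb)) = elements[i], elements[j]
        close = (xa - xb) ** 2 + (ya - yb) ** 2 <= max_dist ** 2
        if close and may_connect(strategy, ta, tb):
            pairs.append((i, j))
    return pairs
```

For a row of two circles followed by two squares, Strategy 0 yields no links, Strategy 1 links only the circles, and Strategy 2 also links the squares, while the adjacent circle and square remain unlinked in every case.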
Comparing Grouping Strategies 0 and 1 for conditions A and C, where the target circles are not joined by stimulus lines, the connections generated with Grouping Strategy 1 clearly lead to better performance: response times and error rates for Grouping Strategy 0 were quite high compared to Grouping Strategy 1. Notice, however, that the lack of connections increases the response time less for condition A than for condition C. This is due to the relatively large selection signal used for condition A; a large selection signal selects target elements relatively quickly, effectively grouping elements by selection, so the observer does not need to rely on the spread of connections before making a decision (see Fig. 9). However, this selection strategy comes at the cost of a high error rate: if the target letter is an E, the selection signal is unlikely to cover the top or bottom row of target circles, so the observer tends to respond incorrectly that the target letter is an H. An extremely large selection signal could be used in tandem with no connections for condition A, which would group all stimulus elements by selection and produce the same fast and accurate performance as Grouping Strategy 1. However, the use of connections allows the observer to achieve fast, accurate performance for condition A across a broader range of selection signal sizes. In sum, connections are not necessary for good performance in condition A if a very large selection signal is used, but they allow a broad range of selection signal sizes to promote good task performance. In contrast, connections are needed for good performance in condition C, because increasing the selection signal size there results in the selection of surrounding squares, which interferes with template matching.
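The contrast between grouping by selection and grouping by the spread of selection along connections can be illustrated with a graph abstraction. In this hypothetical sketch, elements inside a circular selection signal are selected at once, and selection then spreads breadth-first along links; the hop count is only a rough stand-in for the time the model's neural dynamics need to spread, and the function name is ours.

```python
from collections import deque


def selection_hops(positions, links, center, radius):
    """Map each selected element to the number of link hops needed to reach it.

    Elements inside the selection disc are selected immediately (hop 0);
    selection then spreads along `links`. Unreached elements are absent.
    """
    cx, cy = center
    hops = {i: 0 for i, (x, y) in enumerate(positions)
            if (x - cx) ** 2 + (y - cy) ** 2 <= radius ** 2}
    neighbors = {i: [] for i in range(len(positions))}
    for a, b in links:
        neighbors[a].append(b)
        neighbors[b].append(a)
    queue = deque(hops)
    while queue:
        i = queue.popleft()
        for j in neighbors[i]:
            if j not in hops:
                hops[j] = hops[i] + 1
                queue.append(j)
    return hops
```

For a row of five connected circles, a small signal on the middle circle reaches the ends only after two hops, a large signal covers all five at hop 0, and without links a small signal selects just the circle it covers. This mirrors the trade-off discussed above: large signals group by selection quickly, while small signals depend on connections and take longer.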
Given that connections need to be created for some trials and generally do not hurt performance, we suspect that observers elect to form connections between target elements in all conditions. Keeping a consistent grouping strategy also simplifies the task for observers, which probably reduces response times overall.
Adding connections among distractor squares (Grouping Strategy 2) makes performance worse for conditions C and D. Connecting only target circles in Grouping Strategy 1 has the advantage of not being too costly if a square is mistakenly selected. If this square is connected to other squares as in Grouping Strategy 2, then the observer will select and segment a large number of distractors, which interferes with the template calculations.
For conditions B and D, Grouping Strategy 0 (with no connections) leads to similar performance as Grouping Strategy 1, which is unsurprising given that their target circles are joined by lines in the stimulus. For these conditions the selection signal spreads across the target even without connections.
Overall, Grouping Strategy 1 leads to the best performance of the three strategies (assuming the selection signal size strategy described in section 5.5, which was the same in Grouping Strategies 0, 1, and 2). Thus, the observed empirical pattern of results is arguably due to observers implementing something similar to Grouping Strategy 1, because it is easy to do and does a good job on the task.
Performance for Different Selection Strategies
To assess the role of the selection strategy on performance, a simulation was conducted that implemented Grouping Strategy 3, which had the same connection strategy as Grouping Strategy 1 but the opposite selection size strategy. The strategies and results are summarized in Fig. 13.
Compared with Grouping Strategy 1, all mean response times for Grouping Strategy 3 are higher. For conditions A and B, this is due to the small selection signals missing a target circle or being at locations, e.g., the middle of the middle row, that required more time for the selection signal to spread across the target elements and, thus, took longer to reach a decision. Therefore, larger selection signals, like those implemented in Grouping Strategy 1, lead to better performance for conditions A and B because they rarely miss a target element and because they allow the selection signal to spread quickly, which leads to faster segmentation and identification of the target.
For conditions C and D, response times are slow for Grouping Strategy 3 because the larger selection signals segment more square distractors. For reasons similar to why Grouping Strategy 2 was slower than Grouping Strategy 1, the segmented distractors often interfere with the model evidence calculations that might favor identification of the target. In turn, the model often has to wait for selection signals to spread across the entire target before making a decision. Although Grouping Strategy 3 results in few errors, it comes at the cost of higher response times.
Thus, Grouping Strategy 1 results in better overall performance than Grouping Strategy 3. A smaller selection signal size reduces response time for conditions C and D yet hinders performance for conditions A and B. In turn, a selection strategy with small selection signals for conditions C and D and large selection signals for conditions A and B will result in good performance (assuming that the selection strategy is coupled with a connection strategy in which only circles connect).
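The selection size rules compared above can be written as a small lookup. This is a hypothetical sketch that records only the qualitative "large" versus "small" choice (the actual sizes are not reproduced here); the `Condition` type and field names are ours, and the condition definitions follow the stimulus descriptions (lines join the circles in B and D; squares surround them in C and D).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Condition:
    name: str
    lines: bool    # target circles joined by stimulus lines
    squares: bool  # target circles embedded among distractor squares


CONDITIONS = {
    "A": Condition("A", lines=False, squares=False),
    "B": Condition("B", lines=True, squares=False),
    "C": Condition("C", lines=False, squares=True),
    "D": Condition("D", lines=True, squares=True),
}


def selection_size(strategy, condition):
    """Return 'large' or 'small' for the selection signal in a condition.

    Grouping Strategies 0-2 share the size rule of Strategy 1;
    Grouping Strategy 3 reverses it.
    """
    if strategy in (0, 1, 2):
        return "small" if condition.squares else "large"
    if strategy == 3:
        return "large" if condition.squares else "small"
    raise ValueError(f"unknown grouping strategy: {strategy}")
```

The rule reads only whether squares are present, never whether lines are present, so an observer would need just this coarse property of the scene to choose a selection strategy.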
Exploratory Analyses for Task-Set Switching
A reviewer of an earlier version of this manuscript suggested exploring the empirical data for evidence of “task-set switching” (Kiesel et al., 2010; Rogers & Monsell, 1995; Schneider & Logan, 2014). As described above, our model suggests that observers use different selection strategies depending on whether the target elements are embedded among distractors (conditions C and D) or not (conditions A and B). We suppose that observers select which strategy to use for a given trial by extracting gist information from the scene. Setting up a selection strategy might take some time, so responses might be slightly slower if the preceding trial used a different selection strategy. By looking at the response times for sequential trial pairs, we can check whether response times for the second trial are slower when the previous trial would involve a different selection strategy. The experiment was not designed to test for task-set switching effects (e.g., the numbers of repetition and switch trials are not necessarily balanced), and we do not have a theoretically motivated estimate of the size of such an effect. If we assume that the grouping strategy of the previous trial “carries over” to the next trial, we expect a small advantage for repetition trials, because modifying the grouping strategy only involves changing the size of the selection signal. If, on the other hand, we assume that the grouping strategy is effectively reset for each trial, then we would not expect an effect, as both repetition and switch trials require some time to set up the grouping strategy. For simplicity, the simulations reported above were run in accord with the second assumption; however, the model provides no reason to prefer one assumption over the other. For these reasons, we consider this analysis to be an exploratory investigation of possible model properties rather than a test of model predictions.
Table 3 shows the mean response times for the second trial of each pair of consecutive trials in Experiment 1, depending on whether the model predicts that observers use the same (repeated) or a different (switched) selection strategy for the two trials. Response times increase slightly (by around 6 ms) when the previous trial would use a different model-predicted grouping strategy than the current trial. Although this effect is small, an ANOVA indicates a significant effect of grouping strategy sequence on response time, F(1,124) = 9.26, p = .003.
Table 4 shows the statistics corresponding to the same analysis for Experiment 2. Here, the effect of grouping strategy sequence among trial pairs is 7 ms and significant, F(1,119) = 10.75, p = .001.
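The classification of consecutive trial pairs can be sketched for one participant as follows. This is a hypothetical illustration: the function and constant names are ours, and whether error trials, block-initial trials, or outliers were excluded is not specified here, so no exclusions are applied. Per-participant means like these would then enter a within-participant test such as the ANOVA reported above.

```python
from statistics import mean

# Conditions in which the model uses small selection signals (squares present).
SMALL_SIGNAL_CONDITIONS = {"C", "D"}


def repeat_switch_means(trials):
    """Mean RT of the second trial in each consecutive pair, by sequence type.

    `trials` is one participant's list of (condition, rt_ms) in presentation
    order. A pair is a repetition if both trials call for the same selection
    strategy and a switch otherwise. Returns (repeat_mean, switch_mean);
    statistics.mean raises an error if either category is empty.
    """
    repeat, switch = [], []
    for (prev_cond, _), (cond, rt) in zip(trials, trials[1:]):
        same = (prev_cond in SMALL_SIGNAL_CONDITIONS) == (cond in SMALL_SIGNAL_CONDITIONS)
        (repeat if same else switch).append(rt)
    return mean(repeat), mean(switch)
```

For instance, in the sequence A, B, C, D, A, the pairs A to B and C to D are repetitions, while B to C and D to A are switches.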
General Discussion
The close match between the experimental results and the simulated results with Grouping Strategy 1 supports our claim that human observers may be using this kind of grouping strategy. This grouping strategy has two key components. First, the same connection strategy was used for all conditions, so that only target circles formed connections among themselves. This connection strategy allows a range of selection signal sizes to promote performance for conditions A and B. Second, a condition-dependent selection strategy, with small selection signals for conditions with surrounding squares and large selection signals for conditions without surrounding squares, produces fast letter identification with few errors. In our simulations, Grouping Strategy 1 not only best resembled the pattern of responses from the experiments but also produced the best overall performance.
Recall from Section 1 that Han et al. designed their experiment to explore three main hypotheses: (1) grouping by proximity occurs earlier than grouping by shape, (2) uniform connectedness occurs prior to grouping by proximity, and (3) uniform connectedness occurs prior to grouping by shape similarity.
Han et al. regard the difference in performance for conditions A and C as support for hypothesis (1) that grouping by proximity occurs earlier than grouping by shape. In the model, this difference in performance is largely due to the selection strategy, which is chosen to avoid interference from the distractors in condition C. While all conditions have the same connection strategy of connecting only target circles, the selection strategy involves using a larger selection signal for condition A than for condition C. A large selection signal quickly segments elements even without spreading across connected elements. Due to the risk of selecting nearby square distractors in condition C, smaller selection signals are used, which comes at the cost of taking more time to segment enough of the target to identify the global letter. Errors in condition C typically occur when a selected square falls in the template of the other letter or when one of the small selection signals does not land on a circle due to misplacement, either of which can give the observer evidence in favor of the incorrect letter. For condition A, there is no risk of selecting a distractor and, thus, no such errors.
Han et al. took the difference in performance between conditions C and D as support for hypothesis (3) that uniform connectedness occurs before grouping by shape similarity. In the model, this difference in performance is due to the time it takes for connections to form. In condition D, the circles are already joined by physical lines, so the selection signal can spread across these lines as soon as the signal is initiated. In condition C, the target elements must be joined by connections before the selection signal can spread across them. The formation of connections takes time, which produces slower responses for condition C than for condition D. Errors occur in both conditions due to randomness in the placement of selection signals and inadvertent selection of distractors. Such errors occur more frequently in condition C because it takes time for connections to spread, so the observer may have information indicating the incorrect letter for a longer period of time, thereby increasing the chance of making a fast but incorrect response. Thus, the empirical data do not necessarily indicate an ordering of processes; the model detects and manipulates edges, but no edges are more basic than (or necessarily prior to) other edges. Likewise, the model suggests that uniform connectedness is an emergent property of selection/segmentation rather than the product of a specialized mechanism that yields basic units that are then grouped at a later stage of processing.
Han et al. regard the similarity in performance for conditions A and B as evidence against hypothesis (2) that uniform connectedness occurs prior to grouping by proximity. In the model, this similarity in performance is due to the combination of a large selection signal for both conditions and a connection strategy in which target elements are connected in all conditions. The similar performance for these conditions is quite robust in the model because the lack of distractors makes it easy to select target elements with large selection signals, provided that the target circles are connected, which, as argued above, is an easy connection strategy to implement for this stimulus set. There is a range of grouping strategies that all use relatively large selection signals to promote performance for condition A, and these strategies also perform well for condition B. Thus, rather than indicating that uniform connectedness does not occur prior to grouping by proximity, the similarity in performance for conditions A and B reflects the observer's ability to use relatively large selection signals for these conditions, which results in fast, accurate identification of the target letter regardless of whether the circles are joined by lines (condition B) or not (condition A).
Overall, the results in Han et al. (1999a) support our hypothesis that observers use a grouping strategy, which involves both a connection strategy and a selection strategy, that promotes performance on a given task.
Conclusions
According to the model, a grouping strategy consists of a connection strategy and a selection strategy, both of which are subject to top-down control. A connection strategy is implemented by tuning the connection parameters to create connections among stimulus elements that will promote performance on a given task. A selection strategy, which concerns the number, placement, size, and timing of a selection signal, is chosen in conjunction with a connection strategy with the aim of promoting performance on a particular task by guiding selection signals to areas that will separate targets and distractors, thereby making the target(s) easier to identify.
For the Han et al. task, the simulation results closely resemble those of human observers when the model implements a grouping strategy that is easy to apply and promotes good performance. The model works with low-level mechanisms instead of relying on Gestalt grouping principles (Kon & Francis, 2022). We anticipate that many empirical measures of grouping reflect different grouping strategies that observers implement to complete the specified task easily and efficiently. Given that we claim grouping strategies are task-dependent, an implication of this hypothesis is that it will be uncommon for strategies to generalize from one situation to another. Instead, observers will create and implement novel strategies for each situation and task. From this perspective, grouping is rarely the passive process suggested by Köhler (1929), and comparisons across conditions might need to consider how various grouping strategies could be brought to bear.
The model’s interpretation of the empirical data highlights that perceptual grouping is neither a well-defined concept nor a process that has a single mechanism or even a series of mechanisms. Rather, we propose that what is referred to as “perceptual grouping” reflects many different model behaviors that together achieve a given task. What is described as a type of grouping may involve many different mechanisms depending on the task and stimulus set. For example, in the model, differences in grouping can arise from different connections between elements or from different selection approaches. For conditions C and D, the surrounding squares prevent the use of large selection signals, so small selection signals are used that spread along connected stimulus edges. In these conditions, one might say that the target circles were grouped by the formed connections. However, such connections were not required for conditions A and B. Here, large selection signals can be used to quickly segment the target elements. In these conditions, one might say that grouping was done by selection rather than by the formed connections (although such connections do not hurt the process). The complexity of the mechanisms involved, which operate in parallel, makes it challenging to empirically isolate one mechanism from the others, a point that has been apparent in the empirical literature for quite some time (Wagemans, 2018). In the model, however, top-down control of these mechanisms can be directly manipulated, allowing a better understanding of how each mechanism contributes to performing the task at hand.
Importantly, although the top-down grouping strategy implemented by the model was chosen to promote performance on the task, the model mechanisms were not designed to emulate performance for the specific task of Han et al. (1999a). The selection/segmentation mechanisms used here were proposed to play an important role in “uncrowding” (Francis et al., 2017). Likewise, the connection circuits used here were originally proposed to explain the general flexibility of perceptual grouping (Kon & Francis, 2022). The application of the same mechanisms to the task of Han et al. (1999a) demonstrates how a few basic circuits can be flexibly combined to solve novel visual tasks.
References
Ben-Av, M. B., & Sagi, D. (1995). Perceptual grouping by similarity and proximity: Experimental results can be predicted by intensity autocorrelations. Vision Research, 35, 853–866.
Bevan, W. (1961). Perceptual learning: An overview. The Journal of General Psychology, 64(1), 69–99.
Bevan, W., & Zener, K. (1952). Some influences of past experience upon the perceptual thresholds of visual form. The American Journal of Psychology, 65(3), 434–442.
Braly, K. W. (1933). The influence of past experience in visual perception. Journal of Experimental Psychology, 16(5), 613–643.
Francis, G., Manassi, M., & Herzog, M. H. (2017). Neural dynamics of grouping and segmentation explain properties of visual crowding. Psychological Review, 124(4), 483–504.
Francis, G., & Neath, I. (2015). CogLab 5. Cengage Publishing. https://coglab.cengage.com.
Gottschaldt, K. (1926/1950). Gestalt factors and repetition. In W. D. Ellis (Ed.) A sourcebook of Gestalt psychology (pp. 109–122). New York: Humanities Press.
Grossberg, S., & Mingolla, E. (1985a). Neural dynamics of perceptual grouping: Textures, boundaries, and emergent segmentations. Perception & Psychophysics, 38, 141–171.
Grossberg, S., & Mingolla, E. (1985b). Neural dynamics of form perception: Boundary completion, illusory figures, and neon color spreading. Psychological Review, 92, 173–211.
Grossberg, S., & Raizada, R. D. S. (2000). Contrast-sensitive perceptual grouping and object-based attention in the laminar circuits of primary visual cortex. Vision Research, 40, 1413–1432.
Han, S., Humphreys, G. W., & Chen, L. (1999a). Uniform connectedness and classical Gestalt grouping principles of perceptual grouping. Perception & Psychophysics, 61(4), 661–674.
Han, S., Humphreys, G. W., & Chen, L. (1999b). Parallel and competitive processes in hierarchical analysis: Perceptual grouping and encoding of closure. Journal of Experimental Psychology: Human Perception & Performance, 25(5), 1411–1432.
Kiesel, A., Steinhauser, M., Wendt, M., Falkenstein, M., Jost, K., Philipp, A. M., & Koch, I. (2010). Control and interference in task switching—A review. Psychological Bulletin, 136(5), 849–874.
Kimchi, R., & Hadad, B. S. (2002). Influence of past experience on perceptual grouping. Psychological Science, 13, 41–47.
Klein, A., Silvester, S., Tanbakuchi, A., Müller, P., Nunez-Iglesias, J., Harfouche, M., ..., Elliott, A. (2018). imageio/imageio: V2.4.1 (Version v2.4.1). Zenodo. http://doi.org/10.5281/zenodo.1488562.
Koffka, K. (1935/1963). Principles of Gestalt psychology. New York: Harcourt, Brace & World.
Köhler, W. (1929). Gestalt psychology. New York: Horace Liveright.
Kon, M., & Francis, G. (2022). Cortical circuits for top-down control of perceptual grouping. Neural Networks, 151, 190–210.
Lawrence, M. A. (2016). EZ: Easy analysis and visualization of factorial experiments (R Package Version 4.4-0) [Computer software]. Retrieved from https://CRAN.R-project.org/package=ez.
Moore, M. G. (1930). Gestalt vs. experience. The American Journal of Psychology, 42(3), 453–455.
Palmer, S. E. (1999). Vision science: Photons to phenomenology. Cambridge: MIT Press.
Palmer, S. E., & Beck, D. M. (2007). The repetition discrimination task: An objective method for studying perceptual grouping. Attention, Perception & Psychophysics, 69(1), 68–78.
Palmer, S., & Rock, I. (1994). Rethinking perceptual organization: The role of uniform connectedness. Psychonomic Bulletin & Review, 1(1), 29–55.
Peterson, M. A., & Kimchi, R. (2013). Perceptual organization in vision. In D. Reisberg (Ed.) The Oxford Handbook of Cognitive Psychology. New York: Oxford University Press.
Peyser, A., Sinha, A., Vennemo, S. B., Ippen, T., Jordan, J., Graber, S., ..., Plesser, H. E. (2017). NEST 2.14.0. Zenodo. http://doi.org/10.5281/zenodo.882971.
R Core Team (2020). R: A language and environment for statistical computing [Computer software]. Vienna, Austria: R Foundation for Statistical Computing. Retrieved from https://www.R-project.org/.
Raizada, R., & Grossberg, S. (2001). Context-sensitive bindings by the laminar circuits of V1 and V2: A unified model of perceptual grouping, attention, and orientation contrast. Visual Cognition, 8, 431–466.
Rogers, R. D., & Monsell, S. (1995). Costs of a predictable switch between simple cognitive tasks. Journal of Experimental Psychology: General, 124(2), 207–231.
Schneider, D. W., & Logan, G. D. (2014). Tasks, task sets, and the mapping between them. In G. Houghton, & J. Grange (Eds.) Task switching and cognitive control. Oxford: Oxford University Press.
Trick, L. M., & Enns, J. T. (1997). Clusters precede shapes in perceptual organization. Psychological Science, 8(2), 124–129.
Vickery, T. J. (2008). Induced perceptual grouping. Psychological Science, 19(7), 693–701.
Vickery, T. J., & Jiang, Y. V. (2009). Associative grouping: Perceptual grouping of shapes by association. Attention, Perception, & Psychophysics, 71(4), 896–909.
Wagemans, J. (2018). Perceptual organization. In J. T. Wixted, & J. Serences (Eds.) The Stevens’ handbook of experimental psychology and cognitive neuroscience, Sensation, perception & attention, (Vol. 2, pp. 803–872). Hoboken: Wiley.
Wertheimer, M. (1923/1950). Laws of organization in perceptual forms. In W. D. Ellis (Ed.) A sourcebook of Gestalt psychology (pp. 71–81). New York: Humanities Press.
Zemel, R. S., Behrmann, M., Mozer, M. C., & Bavelier, D. (2002). Experience-dependent perceptual grouping and object-based attention. Journal of Experimental Psychology: Human Perception & Performance, 28, 202–217.
Funding
GF was supported by the European Union’s Horizon 2020 Framework Programme for Research and Innovation under the Specific Grant Agreement No. 945539 (Human Brain Project SGA3) and by a Visiting Scientist Grant from the Swiss National Science Foundation.
Additional information
Disclosures
All authors contributed in a significant way to the manuscript and have read and approved the final manuscript. All authors report that there are no conflicts for this work. Some of these results were presented at the 2021 annual meeting of the Vision Sciences Society and the 2021 European Conference on Visual Perception.
Data Availability
The data and materials for all experiments are available at https://osf.io/zrkue/, and none of the experiments were preregistered.
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Part of this research was performed while GF was a visiting professor at École Polytechnique Fédérale de Lausanne, Switzerland.
Appendix
The connection circuits of the model are presented in depth in Kon and Francis (2022), but we here provide a sketch of how the connection circuits operate so that the connection parameters are somewhat interpretable. There are three connection circuits: the spread controller, the long controller, and the short controller. They are subject to top-down control via the tuning of four parameters. The spread controller regulates the spread of connections from contours corresponding to stimulus edges. Top-down control of the spread controller consists of tuning the onset of top-down input to the controller and the duration of this input. The longer the duration, the longer the spread controller functions and, thus, the farther connections spread. In contrast, the long and short controllers reduce the spread from long and short edges, respectively. Kon and Francis (2022) showed that tuning these four parameters enables the connection circuits to connect stimulus elements in various ways, some of which reflect classic Gestalt grouping principles.
Top-down control of the long controller consists of tuning long controller input, which defines a “long” stimulus edge. When this parameter is very high, the circuit is effectively off and does not restrict spreading of any edges. At moderate values, the long controller prevents connections between long edges but allows connections to form between short edges.
Like the long controller, the short controller is subject to top-down control through a short controller input, which defines a “short” stimulus edge. The short controller is off when its input is zero, in which case it does not restrict the spread of connections from any edges. For higher input values, the short controller prevents short edges from connecting while allowing long edges to connect.
For the Han simulations, it is also important to note that each connection circuit is orientation-specific, i.e., each circuit can affect the spread of active V2 cells tuned to a specific orientation independently of active cells tuned to other orientations. For example, for Grouping Strategy 1, where only circles are connected by horizontal and vertical connections, the spread of horizontally and vertically tuned cells was enabled by setting the duration parameter to a positive value (20 ms) for these orientations, while the duration parameter was set to 0 ms for diagonally tuned cells.
Table 5 provides the connection parameter values (for horizontally and vertically tuned cells) for the connection strategies shown in Fig. 5. The first three connection strategies in the table were also used in Grouping Strategies 0, 1, and 2, respectively.
In turn, connections in Fig. 5 were formed using the connection circuits as follows. For the “No connections” simulation, the connection circuits were off. For the “Circles only” simulation, the spread controller duration for horizontal and vertical orientations was long enough (20 ms) to allow connections to form among nearest neighbors, and the control parameters for the horizontal and vertical long controllers were set at a value (2.0) that allowed the small horizontal/vertical contours of a circle to connect but prevented the long contours of a square from connecting. The short controller did not contribute in this case. For the “Circles and squares” simulation, connection parameters were the same as for the “Circles only” simulation except that the horizontal and vertical long controller input was reduced to 1.4, which allowed squares to connect with other squares but prevented connections between squares and circles. For the “All connected” simulation, connection parameters were the same as for the “Circles only” simulation except that the long controller circuit did not contribute, so all contours could form connections.
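The parameter settings described above can be collected into a small configuration table. This sketch transcribes only the values stated in this Appendix; the spread-onset parameter is omitted because its value is given in Table 5 rather than in the text, the 0 ms diagonal duration is stated for Grouping Strategy 1 and assumed here for the other strategies, and `None` stands for a long controller that does not contribute (the text notes that a very high input effectively turns it off). All class, field, and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ControllerSettings:
    spread_duration_ms: float       # duration of top-down input to the spread controller
    long_input: Optional[float]     # None: long controller does not contribute
    short_input: float = 0.0        # 0.0: short controller is off


OFF = ControllerSettings(spread_duration_ms=0.0, long_input=None)


def hv_only(settings):
    """Apply `settings` to horizontal/vertical cells; diagonal cells do not spread."""
    return {"horizontal": settings, "vertical": settings, "diagonal": OFF}


FIG5_CONNECTION_PARAMS = {
    "No connections": hv_only(OFF),
    "Circles only": hv_only(ControllerSettings(20.0, 2.0)),
    "Circles and squares": hv_only(ControllerSettings(20.0, 1.4)),
    "All connected": hv_only(ControllerSettings(20.0, None)),
}


def spreading_orientations(strategy):
    """Orientations whose spread controller runs for a positive duration."""
    return {orientation for orientation, s in FIG5_CONNECTION_PARAMS[strategy].items()
            if s.spread_duration_ms > 0}
```

Keeping the settings per orientation mirrors the orientation-specific circuits: the same strategy can enable spreading for horizontal and vertical cells while leaving diagonal cells inactive.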
Cite this article
Kon, M., Francis, G. Perceptual Grouping Strategies in a Letter Identification Task: Strategic Connections, Selection, and Segmentation. Atten Percept Psychophys 84, 1944–1963 (2022). https://doi.org/10.3758/s13414-022-02515-1
DOI: https://doi.org/10.3758/s13414-022-02515-1