Research Report

Interactions between higher and lower visual areas improve shape selectivity of higher level neurons—Explaining crowding phenomena
Introduction
Within 40 ms after the light of an image hits the retina, cells in the primary visual cortex (V1) start to fire. The very first spikes already express orientation and spatial-frequency selectivity. The same applies to cells in extrastriate areas that, only 10 ms later, instantaneously code for color, motion, stereo depth, etc. Even the highest levels of selectivity, such as face versus non-face in inferotemporal cortex, appear to be expressed as early as 80–120 ms after the image is presented (Oram and Perrett, 1992). A large part of object recognition is thus already established during the initial, fast feedforward sweep of information transfer.
From anatomical studies it is clear, however, that cortico-cortical connections not only run from lower areas to higher ones, but also in the reverse direction (Salin and Bullier, 1995). These fibers provide feedback signals that may play a role in object recognition as well. For example, it has been hypothesized that these feedback signals are necessary to process spatially detailed information (Hochstein and Ahissar, 2002). In this framework, feedforward processing involves rapid and automatic processes that provide basic object categorizations, but incorporate only a limited amount of detail. For a detailed and complete representation, higher areas would need to reach back to the low-level areas by means of cortical feedback mechanisms. Low-level areas contain neurons with smaller receptive fields than neurons in higher areas and are in that respect more suitable for signaling spatial detail (Hochstein and Ahissar, 2002; Lee et al., 1998; Roelfsema et al., 2000).
Recordings from face-selective cells in the inferotemporal cortex are consistent with such an idea: these neurons convey two different modes of information in their firing patterns, starting at different latencies. Global information (“is it a face or a non-face?”) is transmitted in the initial part of the response, and more detailed information (“whose face is it?”) is transmitted later, beginning on average about 50 ms after the global information (Sugase et al., 1999). Such dynamic changes in the tuning properties of IT neurons seem compatible with feedback interactions with lower visual areas, but whether this is the case has not yet been firmly established. The two types of information could equally well be carried by parallel feedforward streams with different latencies (such as the magno- and parvocellular inputs), one carrying low-resolution and the other high-resolution information.
To determine which of these hypotheses is correct, the contributions of feedforward and feedback signals to the selectivity of cortical neurons would have to be separated, which is notoriously difficult in in-vivo preparations. Some attempts have been made (Hupé et al., 1998; Lamme, 1995; Roelfsema et al., 1998). An example, in the context of scene segmentation, is shown in Fig. 1. Scene segmentation may benefit from interactions between higher and lower visual areas when a figure must be segregated from the background at high spatial resolution (Roelfsema et al., 2000); effects of these interactions would, among others, manifest themselves in lower visual areas. Indeed, when presented with a texture-defined square, neurons in V1 respond more strongly when their receptive fields lie within the square than when they are stimulated by identical background elements, even when the square is much larger than the classical receptive field (Lamme, 1995). This contextual modulation typically occurs at some delay with respect to the visual response itself and is abolished when V1 is isolated from feedback from higher areas (Lamme et al., 1998a). Other studies confirm that feedback from extrastriate areas is necessary for V1 cells to signal figure–ground differences (Hupé et al., 1998).
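The delayed contextual modulation described above can be illustrated with a toy simulation. The traces below are synthetic; all rates, latencies, and time constants are invented for illustration and are not taken from the recordings. The feedforward transient is identical inside and outside the figure, and a feedback-driven enhancement is added to the figure response only after an extra delay.

```python
import numpy as np

# Synthetic firing-rate traces (spikes/s, 1 ms bins). Illustrative
# assumptions: the feedforward transient peaks at 50 ms and is the
# same for figure and ground; a feedback-driven enhancement is added
# to the figure response from 80 ms onward.
t = np.arange(300)                                        # ms after stimulus onset
transient = 60.0 * np.exp(-((t - 50.0) ** 2) / (2 * 15.0 ** 2))
ground = transient.copy()
enhancement = np.where(t >= 80, 15.0 * (1 - np.exp(-(t - 80.0) / 30.0)), 0.0)
figure = transient + enhancement

modulation = figure - ground                              # contextual modulation
onset = int(np.argmax(modulation > 1.0))                  # first bin > 1 spike/s
print(onset)                                              # lags the visual transient
```

Because the enhancement term starts only at 80 ms, the onset of the figure–ground difference necessarily lags the 50 ms visual transient, mimicking the delayed modulation reported by Lamme (1995).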
To isolate the potential contribution of feedforward and feedback signals to the receptive field tuning properties of cortical neurons, computational neural network models provide a useful tool. Here, we study these contributions in a simulation that is modeled after the hierarchical feedforward–feedback organization of cortical areas with increasing receptive field size, as is found in the ventral stream of primates. We ask the following questions:
1. In the recurrent models, the higher areas ‘reach back’ to lower areas to find high-resolution information that was not provided by the feedforward sweep. Given the absence of direct connections between low-level areas such as V1 and motor areas (Felleman and Van Essen, 1991), the only way to report these details would be via the higher areas. These areas will therefore have to express the high-resolution information in some way, by firing differentially to patterns that differ in their details. How do these high-level neurons in the end obtain their tuning to spatial detail? Is this possible for cells with large receptive fields (but see DiCarlo and Maunsell, 2003)? Does this not disturb their original tuning properties?
2. Detailed spatial features of an individual item can be more difficult to identify when other shapes are near it, a phenomenon known as crowding (Bouma, 1970; Toet and Levi, 1992). Global features of the stimulus set as a whole, such as the average orientation of a group of tilted patches, can nonetheless be reported under crowding conditions (Parkes et al., 2001). If top–down feedback interactions process spatial detail, this suggests that they do not come about for individual items when other shapes are in the vicinity. Paradoxically, however, spatial detail that is lost in crowding is nevertheless able to evoke specific adaptation effects (He et al., 1996). Can we explain both the loss of spatial detail in crowding and the paradoxical adaptation to these details within the same framework?
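The compulsory-averaging account of crowding (Parkes et al., 2001) can be sketched in a few lines: local orientation signals within an integration field are pooled before reaching report, so the ensemble mean remains available while the target's individual orientation is lost. The stimulus tilts, noise level, and trial count below are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

target = 12.0                                            # target tilt (deg)
flankers = np.array([-8.0, 5.0, -3.0, 9.0, -6.0, 2.0])   # flanker tilts (deg)
patch = np.concatenate(([target], flankers))

errors_mean, errors_target = [], []
for _ in range(2000):
    noisy = patch + rng.normal(0.0, 3.0, size=patch.size)  # noisy local signals
    pooled = noisy.mean()               # only the pooled value reaches report
    errors_mean.append(pooled - patch.mean())
    errors_target.append(pooled - target)

bias_mean = float(np.mean(errors_mean))      # ~0: the ensemble mean survives
bias_target = float(np.mean(errors_target))  # large: the target tilt is
print(bias_mean, bias_target)                # pulled toward the flanker average
```

The same pooled signal that supports accurate report of the average orientation is systematically biased as an estimate of the target, which is the signature of crowding in this account.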
Here, we study these questions in the context of texture segregation, for which top–down feedback processes strongly related to a conscious report have been observed (Lamme et al., 1998b; Lamme et al., 2002; Supèr et al., 2001). We use a recurrent neural network model that faithfully simulates these texture-segregation-related processes in V1, both in the normal situation and when the area is isolated from feedback, and compare the firing rates of the model's temporal cortex neurons in these two conditions. We simplify the object recognition aspect of the temporal cortex neurons to the extent that we consider a neuron's response indicative of object discrimination whenever its responses reflect differences between objects. This is not the strong type of object selectivity traditionally observed in temporal cortex neurons, but we argue that the selectivity of our neurons could easily be amplified into strong selectivity if further processing stages were added to the model. We furthermore consider two very basic aspects of shape selectivity, namely selectivity for the overall length of the contour that encloses an object, and for the surface area that is covered by an object. These are abstractions of the tuning properties usually found for cells in TE, though it is not unlikely that responses of TE neurons do in fact reflect such object parameters. For example, Sáry et al. (1993) studied cue-invariant shape tuning of TE neurons using stimuli such as squares, crosses, and stars, which differed along various dimensions. It is not clear which feature dimensions were critical in mediating the strong selectivity of TE cells found in that study; one of them, however, could have been the contour–surface ratio. Hence, we use three stimuli (a bar, a square and a cross), two of which (the bar versus the other two) differ in the first aspect (contour length) and two of which (the square and the cross) differ only in the second aspect (figure area).
We study to what extent selectivity for contour length and figure area depends on feedforward versus re-entrant processing.
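The two shape parameters can be made concrete with rasterized versions of the three stimuli. The pixel dimensions below are illustrative choices, not the stimulus sizes used in the model; they are picked so that, as in the design described above, the square and the cross share the same contour length but differ in figure area, while the bar differs from both in contour length.

```python
import numpy as np

def perimeter(mask):
    """Contour length: number of pixel edges where a filled pixel
    borders an empty one (4-connectivity)."""
    p = np.pad(mask.astype(int), 1)
    edges = 0
    for shift in [(-1, 0), (1, 0), (0, -1), (0, 1)]:
        neighbour = np.roll(p, shift, axis=(0, 1))
        edges += int(np.sum((p == 1) & (neighbour == 0)))
    return edges

blank = np.zeros((20, 20), dtype=int)

square = blank.copy(); square[4:16, 4:16] = 1            # 12 x 12 square
cross = blank.copy()
cross[8:12, 4:16] = 1                                    # horizontal arm
cross[4:16, 8:12] = 1                                    # vertical arm
bar = blank.copy(); bar[8:12, 2:18] = 1                  # 4 x 16 bar

for name, shape in [("square", square), ("cross", cross), ("bar", bar)]:
    print(name, int(shape.sum()), perimeter(shape))
# square: area 144, contour 48; cross: area 80, contour 48; bar: area 64, contour 40
```

With these dimensions the square and cross form the area-only contrast and the bar provides the contour-length contrast.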
A re-entrant model for texture segregation
We use as starting point the model of Roelfsema et al. (2002) and its main principles for texture segregation. The model was first motivated to resolve the ‘grouping–segmentation paradox’ (Roelfsema et al., 2002), which refers to conflicting constraints posed on neural architecture by grouping (similar image elements should support each other to allow grouping of coherent image regions) and segmentation (similar edge elements should inhibit one another to allow boundary detection and pop out).
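The segmentation half of this constraint (inhibition between similar edge elements yielding boundary pop-out) can be sketched in one dimension. The connection scheme and all parameters below are invented for illustration and are far simpler than the actual model of Roelfsema et al. (2002).

```python
import numpy as np

# One-dimensional caricature of the boundary-detection constraint.
# A texture of left-tilted elements (0) switches to right-tilted
# elements (1) between positions 9 and 10; the texture is treated as
# continuing unchanged beyond both ends.
orient = np.array([0] * 10 + [1] * 10)
same_left = np.r_[1, (orient[1:] == orient[:-1]).astype(int)]
same_right = np.r_[(orient[:-1] == orient[1:]).astype(int), 1]

r = np.ones(orient.size)                 # feedforward responses
for _ in range(5):
    nb_left = np.r_[1.0, r[:-1]]         # neighbour responses (edge = 1.0)
    nb_right = np.r_[r[1:], 1.0]
    # iso-orientation surround suppression: inhibition acts only between
    # neighbours sharing the SAME orientation, so uniform regions are
    # suppressed while the orientation boundary pops out
    inhibition = 0.2 * (same_left * nb_left + same_right * nb_right)
    r = np.clip(1.0 - inhibition, 0.0, None)

boundary = int(np.argmax(r))             # strongest response marks the edge
print(boundary)
```

Elements inside a uniform region receive same-orientation inhibition from both sides and are suppressed, while the two elements flanking the orientation change receive it from one side only, so their responses remain highest and flag the texture boundary.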
Dynamical tuning properties of TE neurons
We have shown that single neurons in model TE reveal two different modes of information in their firing patterns, starting at different latencies, one of which results from feedback interactions with lower-level areas. With the initial feedforward sweep, the TE neurons can detect a segregating texture-defined stimulus. However, only after additional feedback–feedforward passes do the neurons also distinguish between stimuli of different shape. More precisely, the feature that is detected during …
Model architecture
The model (Fig. 2) is composed of five areas, each of which is subdivided into feedforward and feedback layers. The first area (model V1) contains 64 × 64 units tuned to a left-tilted orientation and the same number of units tuned to a right-tilted orientation. Units are selective to orientation, but the neuronal mechanisms leading to orientation selectivity are not modeled explicitly (see Olshausen and Field, 1996; Rao and Ballard, 1999; Somers et al., 1995 for models of orientation selectivity).
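The receptive-field scaling in such a hierarchy can be sketched as below, assuming (for illustration only) that each area average-pools 2 × 2 blocks of the area beneath it; the actual model's connectivity is richer than this.

```python
import numpy as np

def pool2x2(a):
    """Average-pool a square map by a factor of two per side."""
    n = a.shape[0] // 2
    return a.reshape(n, 2, n, 2).mean(axis=(1, 3))

# model V1: one 64 x 64 orientation map (random activity placeholder)
v1 = np.random.default_rng(1).random((64, 64))

areas = [v1]
for _ in range(4):                        # four successively higher areas
    areas.append(pool2x2(areas[-1]))

sizes = [a.shape[0] for a in areas]       # units per side in each area
rf_sizes = [64 // s for s in sizes]       # receptive-field width in V1 pixels
print(sizes, rf_sizes)                    # [64, 32, 16, 8, 4] [1, 2, 4, 8, 16]
```

Unit count falls and receptive-field size doubles at every step, so the highest area's units cover a 16 × 16 patch of the V1 map, mirroring the growth of receptive fields along the ventral stream.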
References (50)

- The link between brain learning, attention, and consciousness. Conscious. Cogn. (1999)
- View from the top: hierarchies and reverse hierarchies in the visual system. Neuron (2002)
- Feedforward, horizontal, and feedback processing in the visual cortex. Curr. Opin. Neurobiol. (1998)
- The implementation of visual routines. Vis. Res. (2000)
- Time as coding space? Curr. Opin. Neurobiol. (1999)
- The two-dimensional shape of spatial interaction zones in the parafovea. Vis. Res. (1992)
- Modeling visual attention via selective tuning. Artif. Intell. (1995)
- Invariant face and object recognition in the visual system. Prog. Neurobiol. (1997)
- Interaction effects in parafoveal letter recognition. Nature (1970)
- The analysis of visual motion: a comparison of neuronal and psychophysical performance. J. Neurosci. (1992)
- The Laplacian pyramid as a compact image code. IEEE Trans. Commun.
- ART2: self-organization of stable category recognition codes for analog input patterns. Appl. Opt.
- Neural dynamics of motion grouping: from aperture ambiguity to object speed and direction. J. Opt. Soc. Am. A, Opt. Image Sci. Vis.
- Anterior inferotemporal neurons of monkeys engaged in object recognition can be highly sensitive to object retinal position. J. Neurophysiol.
- Distributed hierarchical processing in the primate cerebral cortex. Cereb. Cortex
- Integration of distributed cortical systems by reentry: a computer simulation of interactive functionally segregated visual areas. J. Neurosci.
- A neural network for visual pattern recognition. IEEE Comput.
- Neural dynamics of form perception: boundary completion, illusory figures, and neon color spreading. Psychol. Rev.
- Feedback interactions between neuronal pointers and maps for attentional processing. Nat. Neurosci.
- Attentional resolution and the locus of visual awareness. Nature
- Cortical feedback improves discrimination between figure and background by V1, V2 and V3 neurons. Nature
- The neurophysiology of figure-ground segregation in primary visual cortex. J. Neurosci.
- Separate processing dynamics for texture elements, boundaries and surfaces in primary visual cortex of the macaque monkey. Cereb. Cortex
- Figure-ground activity in primary visual cortex is suppressed by anesthesia. Proc. Natl. Acad. Sci. U. S. A.
- Masking interrupts figure-ground signals in V1. J. Cogn. Neurosci.