Multi-scale lines and edges in V1 and beyond: Brightness, object categorization and recognition, and consciousness
Introduction
The visual cortex detects and recognizes objects by means of the ventral “what” and dorsal “where” subsystems. The “bandwidth” of these systems is limited: only one object can be attended at any time (Rensink, 2000). In a current model by Deco and Rolls (2004) the ventral what system receives input from area V1 which proceeds through V2 and V4 to IT (inferior temporal cortex). The dorsal where system connects V1 and V2 through MT (medial temporal) to area PP (posterior parietal). Both systems are controlled, top-down, by attention and short-term memory with object representations in PF (prefrontal) cortex, i.e., a what component from PF46v to IT and a where component from PF46d to PP. The bottom-up (visual input code) and top-down (expected object and position) data streams are necessary for obtaining size, rotation and translation invariance, assuming that object views are normalized in visual memory.
Signal propagation from the retinas through the LGN (lateral geniculate nucleus) and areas V1, V2 etc., including feature extractions in V1 and groupings in higher areas, takes time. Object recognition is achieved in 150–200 ms, and category-specific activation of PF cortex starts after about 100 ms (Bar, 2004). In addition, IT cortex first receives coarse-scale information and only later fine-scale information. Apparently, one very brief glance is sufficient for the system to develop a gist of the contents of an image (Oliva and Torralba, 2006). This implies that some information propagates very rapidly and directly to “attention” in PF cortex in order to pre-select possible object-group templates and positions, which then propagate down the what and where systems. We call this process object categorization. It cannot be obtained by the CBF (Categorical Basis Functions) model of Riesenhuber and Poggio (2000), in which categorization (e.g. a cat) is obtained by grouping outputs of identification cells (cat-1, cat-2, cat-3); in other words, categorization would occur after recognition. In contrast, the LF (Low Frequency) model (Oliva et al., 2003; Bar, 2004) assumes that categorization precedes recognition, based on low-frequency information that passes directly from V1/V2 to PF cortex. The LF information actually proposed, however, consists of lowpass-filtered images, not of, e.g., outputs of simple and complex cells in V1 which are tuned to low spatial frequencies. The latter option will be explored in this paper.
After object categorization on the basis of coarse-scale information has narrowed the set of objects to be tested, the recognition process can start by also applying fine-scale information. We will focus on how such processes can be embedded in the architecture referred to above, with special focus on face recognition. Despite the impressive number and variety of computer-vision methods devised for faces and facial landmarks, see e.g. Yang et al. (2002), we show that very promising results with a cortical model can be obtained, even in the case of some classical complications involving changes of pose (frontal vs. 3/4), facial expression, some lighting and noise conditions, and the wearing of spectacles.
In computer vision there exists a vast literature, from basic feature extraction to object segregation, categorization and recognition, and from image reconstruction (coding and decoding) to scale stabilization and disparity; much less exists in biological vision. We therefore continue with a very brief summary of approaches related to this paper, with special focus on biological methods.
In addition to a few general overviews, see e.g. Hubel (1995), Bruce et al. (2000), Rasche (2005) and Miikkulainen et al. (2005), there are also detailed and quantitative models of simple and complex cells (Heitger et al., 1992; Petkov and Kruizinga, 1997), plus various models for inhibitions (Heitger et al., 1992; Petkov et al., 1993b; Barth et al., 1998; Rodrigues and du Buf, 2006a), edge detection (Smith and Brady, 1997; Elder and Zucker, 1998; Kovesi, 1999; Grigorescu et al., 2003) and combined line and edge detection (Verbeek and van Vliet, 1992; van Deemter and du Buf, 2000; Rodrigues and du Buf, 2004, 2006a). Other models address saliency maps and Focus-of-Attention (Itti and Koch, 2001; Parkhurst et al., 2002; Deco and Rolls, 2004; Rodrigues and du Buf, 2006d), figure-ground segregation (Heitger and von der Heydt, 1993; Hupe et al., 2001; Zhaoping, 2003; Rodrigues and du Buf, 2006a) and object categorization (Riesenhuber and Poggio, 2000; Leibe and Schiele, 2003; Csurka et al., 2004; Rodrigues and du Buf, 2006a). Concerning faces, various approaches have been proposed, from detecting faces and facial landmarks to the influence of different factors such as race, gender and age (Delorme and Thorpe, 2001; Yang et al., 2002; Ban et al., 2003; Rodrigues and du Buf, 2005b), including final face recognition (Kruizinga and Petkov, 1995; Zhao et al., 2003; Rodrigues and du Buf, 2006c, 2006d). Yet other models have been devised for disparity (Fleet et al., 1991; Ohzawa et al., 1997; Qian, 1997; Rodrigues and du Buf, 2004), automatic scale selection (Lindeberg, 1994), visual reconstruction (Rodrigues and du Buf, 2006b) and brightness perception (du Buf, 2001). In this paper we show that one basic process, namely line and edge detection in V1 (and possibly V2), can be linked to most if not all of the topics mentioned above, even to consciousness.
We present an improved scheme for multi-scale line/edge extraction in V1, which is truly multi-scale and has no free parameters. We illustrate the line/edge interpretation (coding and representation) for automatic scale selection and explore the importance of this interpretation in object reconstruction, segregation, categorization and recognition. Experiments with possible Low-Frequency models based on lowpass-filtered images, following Bar (2004), gave rather disappointing results, because lowpass filtering reduces objects to smeared blobs that lack any structure; we therefore propose that categorization is based on coarse-scale line/edge coding, and that recognition involves all scales. Processing schemes are discussed in the framework of a complete cortical architecture. The multi-scale keypoint information also extracted in V1, which was shown to be very important for the detection of facial landmarks and entire faces (Rodrigues and du Buf, 2006d), will not be employed here, nor will other important features such as the texture information that can be retrieved from bar and grating cells (du Buf, 2007), because we want to focus completely on the multi-scale line/edge information in V1 and beyond. This paper therefore complements the previous one dedicated to keypoints (Rodrigues and du Buf, 2006d).
In Section 2 we present line/edge detection and classification in single- and multi-scale contexts, plus the application of non-classical receptive field (NCRF) inhibition. Section 3 illustrates the visual reconstruction model in relation to brightness perception. Section 4 deals with object segregation, Section 5 with automatic scale selection, and Section 6 with object categorization. This is followed by face recognition in Section 7 and consciousness in Section 8. We conclude with a final discussion in Section 9.
Line and Edge Detection and Classification
In many models it is assumed that Gabor quadrature filters provide a good model of receptive fields (RFs) of cortical simple cells. In the spatial domain they consist of a real cosine and an imaginary sine, both with a Gaussian envelope (Lee, 1996; Grigorescu et al., 2003; Rodrigues and du Buf, 2006d). As in Rodrigues and du Buf (2006d), an RF is given by
$g_{\lambda,\sigma,\theta,\varphi}(x,y)=\exp\!\left(-\frac{\tilde{x}^{2}+\gamma^{2}\tilde{y}^{2}}{2\sigma^{2}}\right)\cos\!\left(2\pi\frac{\tilde{x}}{\lambda}+\varphi\right)$,
with $\tilde{x}=x\cos\theta+y\sin\theta$ and $\tilde{y}=y\cos\theta-x\sin\theta$, where $1/\lambda$ is the spatial frequency, $\sigma$ being the
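As an illustration of such a quadrature pair, the sketch below builds even (cosine) and odd (sine) Gabor RFs and computes simple- and complex-cell responses. It is a minimal numerical example with assumed parameter names (lam, sigma, theta, gamma), not the implementation used in this paper:

```python
import numpy as np

def gabor_pair(size, lam, sigma, theta, gamma=0.5):
    """Even (cosine) and odd (sine) Gabor RFs with a Gaussian envelope."""
    r = size // 2
    y, x = np.mgrid[-r:r + 1, -r:r + 1]
    xt = x * np.cos(theta) + y * np.sin(theta)   # rotated coordinates
    yt = y * np.cos(theta) - x * np.sin(theta)
    env = np.exp(-(xt**2 + (gamma * yt)**2) / (2 * sigma**2))
    return env * np.cos(2 * np.pi * xt / lam), env * np.sin(2 * np.pi * xt / lam)

def cell_responses(img, even, odd):
    """Simple-cell responses via circular FFT convolution; the complex-cell
    response is the modulus of the quadrature pair."""
    def conv(kern):
        pad = np.zeros_like(img, dtype=float)
        k = kern.shape[0]
        pad[:k, :k] = kern
        pad = np.roll(pad, (-(k // 2), -(k // 2)), axis=(0, 1))  # center kernel
        return np.real(np.fft.ifft2(np.fft.fft2(img) * np.fft.fft2(pad)))
    re, im = conv(even), conv(odd)
    return re, im, np.hypot(re, im)   # even, odd, complex-cell (modulus)
```

A vertical step edge then yields a ridge in the modulus at the edge position; line vs. edge classification uses the relative pattern of even and odd responses.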
Visual Reconstruction and Brightness Perception
Image reconstruction can be obtained by assuming one lowpass filter plus a complete set of (Gabor) bandpass filters that cover the entire frequency domain, such that together they form an allpass filter; this concept is exploited in wavelet image compression and coding. The goal of our visual system, however, is to detect objects, with neither the need nor the capacity to reconstruct a complete image of our visual environment; see change blindness and the limited “bandwidth” of the what and where subsystems (Rensink, 2000).
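The allpass property can be checked numerically. The sketch below uses Gaussian lowpass channels whose successive differences act as bandpass filters, a simplified stand-in for a calibrated Gabor bank (all cutoff values are illustrative assumptions); the telescoping sum of all transfer functions is nearly constant over the usable frequency range:

```python
import numpy as np

# Frequency axis (cycles/pixel) and octave-spaced cutoffs; illustrative values.
f = np.linspace(0.0, 0.5, 501)
cutoffs = 0.02 * 2.0 ** np.arange(10)          # highest cutoff far above Nyquist

def lowpass(fc):
    """Gaussian lowpass transfer function with cutoff fc."""
    return np.exp(-(f / fc) ** 2)

channels = [lowpass(cutoffs[0])]               # one residual lowpass channel
for fc0, fc1 in zip(cutoffs[:-1], cutoffs[1:]):
    channels.append(lowpass(fc1) - lowpass(fc0))   # bandpass = difference

total = np.sum(channels, axis=0)   # telescopes to lowpass(cutoffs[-1]) ~= 1
```

Because the sum telescopes, the deviation from unity is set only by the highest-cutoff channel, which is the design freedom a reconstruction scheme must fix.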
Object Segregation
So far we have illustrated multi-scale line and edge detection in area V1 and the symbolic interpretation for visual reconstruction in brightness perception, but another goal of the visual cortex is to detect and recognize objects by means of the what and where systems. Object detection and recognition pose a typical chicken-and-egg problem: without having some idea about the type and characteristics of the object it is not possible to separate the object from its background
Automatic Scale Selection
Apart from object segregation, other processes may play an important role in the fast where and slower what systems. Concentrating on lines and edges – ignoring other features extracted in V1 – there may be many scales, and the tremendous amount of information may not propagate in parallel and all at once to IT and PF cortex. It might be useful to extract the lines and edges which are most characteristic of an object, and to let these propagate first, for example for a first but coarse object
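One simple way to operationalize "most characteristic" is stability over scales: events that survive across many neighboring scales propagate first. The sketch below implements such a crude persistence criterion; it is our own illustrative rule, not necessarily the scale-selection scheme developed in this paper:

```python
import numpy as np

def stable_events(event_maps, min_scales=3):
    """Keep line/edge events (boolean maps, one per scale, fine to coarse)
    that persist over at least `min_scales` consecutive scales."""
    stack = np.stack(event_maps).astype(bool)      # shape: (scales, H, W)
    run = np.zeros(stack.shape[1:], dtype=int)     # current consecutive run
    best = np.zeros_like(run)                      # longest run seen so far
    for m in stack:
        run = np.where(m, run + 1, 0)
        best = np.maximum(best, run)
    return best >= min_scales
```

Events passing this test would be the ones propagated first, yielding a coarse but characteristic object sketch before the remaining scales arrive.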
Object Categorization
Object recognition is a clearly defined task: a certain cat, like the neighbors’ red tabby called Toby, is recognized or not. Categorization is more difficult to define because there are different levels, for example (a) an animal, (b) one with four legs, (c) a cat and (d) a red tabby, before deciding between our own red tabby called Tom and his brother Toby living next door. It resembles the way very young children develop categorization: once they are familiar with the family’s cat,
Face Recognition
The final goal in vision is object recognition, but here we focus on face recognition by means of the multi-scale line and edge representation. This completes face detection as presented in the previous paper (Rodrigues and du Buf, 2006d), in which saliency maps and the multi-scale keypoint representation were used for detecting facial landmarks and thus entire faces. In addition, it was also shown that keypoints can be used for Focus-of-Attention, i.e., to “gate” detected keypoints in associated
Consciousness
The fact that object recognition and brightness perception can be based on the same image representation has interesting consequences for consciousness or at least visual awareness. As Crick and Koch (2003) pointed out in their framework, the brain is divided into front and back parts, roughly at the central sulcus, with the front “looking at” the back with most sensory systems, including the visual cortex (in contrast to visual reconstruction in one neural map, as discussed in Section 3, this
Discussion
Computer vision for real-time applications requires tremendous computational power because entire images must be processed from the first to the last pixel. Probing specific objects on the basis of already acquired context may lead to a significant reduction of processing. This idea is based on a few concepts from our visual cortex (Rensink, 2000): (1) our physical surroundings can be seen as external memory, i.e., there is no need to construct detailed and complete maps, (2) the bandwidth of the what
Acknowledgments
The authors thank the anonymous reviewers for their comments which helped to improve the manuscript. This research is partly supported by the Foundation for Science and Technology FCT (ISR/IST pluri-annual funding) through the POS-Conhecimento Program which includes FEDER funds, and by the FCT project PTDC/EIA/73633/2006 – SmartVision: active vision for the blind. The orange image is available at http://marathon.csee.usf/edge/edge_detection.html; the elephant image is available at http://www.cs.rug.nl/~imaging/databases/contour_database/contour_2.html.
References (77)
Strange vision: ganglion cells as circadian photoreceptors. Trends Neurosci. (2003).
A neurodynamical cortical model of visual attention and invariant object recognition. Vision Res. (2004).
Improved grating and bar cell models in cortical area V1 and texture coding. Image Vision Comput. (2007).
Contour integration by the human visual system: evidence for a local “association field”. Vision Res. (1993).
Phase-based disparity measurement. Image Understand. (1991).
Simulation of neural contour mechanisms: from simple to end-stopped cells. Vision Res. (1992).
A deficit in strabismic amblyopia for global shape detection. Vision Res. (1999).
Global contour processing in amblyopia. Vision Res. (2007).
White’s effect: a dual mechanism. Vision Res. (1989).
Modelling the role of salience in the allocation of overt visual attention. Vision Res. (2002).
Mach bands: how many models are possible? Recent experimental findings and modeling attempts. Vision Res.
Binocular disparity and the perception of depth. Neuron.
Multi-scale keypoints in V1 and beyond: object segregation, scale selection, saliency maps and face detection. BioSystems.
V1 mechanisms and some figure-ground and border effects. J. Physiol.
Face detection using biologically motivated saliency map model. Proc. Int. Joint Conf. Neural Netw.
Endstopped operators based on iterated nonlinear center-surround inhibition. Human Vision and Electronic Imaging, SPIE.
A cortical mechanism for triggering top-down facilitation in visual object recognition. J. Cogn. Neurosci.
Visual objects in context. Nature Rev. Neurosci.
Neurocomputational bases of object and face recognition. Philos. Trans. R. Soc.: Biol. Sci.
Visual perception: Physiology, Psychology and Ecology.
A framework for consciousness. Nature Neurosci.
Visual categorization with bags of keypoints.
Face identification using one spike per neuron: resistance to image degradations. Neural Netw.
The role of the neocortical laminar microcircuitry in perception, cognition, and consciousness.
Modeling brightness perception and syntactical image coding. Optical Eng.
Responses of simple cells: events, interferences, and ambiguities. Biol. Cybern.
Ramp edges, Mach bands, and the functional significance of simple cell assembly. Biol. Cybern.
Modeling brightness perception.
Multiresolution face recognition. Image Vision Comput.
Local scale control for edge detection and blur estimation. IEEE Tr. PAMI.
Contour detection based on nonclassical receptive field inhibition. IEEE Tr. IP.
The reentry hypothesis: the putative interaction of the frontal eye field, ventrolateral prefrontal cortex, and areas V4, IT for attention and eye movement. Cerebral Cortex.
A recurrent model of contour integration in primary visual cortex. J. Vision.
A robust visual method for assessing the relative performance of edge-detection algorithms. IEEE Tr. PAMI.
A computational model of neural contour processing: figure-ground segregation and illusory contours.
Face matching through information theoretical attention points and its applications to face detection and classification.
Feedback connections act on the early part of the responses in monkey visual cortex. J. Neurophysiol.