Multi-scale lines and edges in V1 and beyond: Brightness, object categorization and recognition, and consciousness
Introduction
The visual cortex detects and recognizes objects by means of the ventral “what” and dorsal “where” subsystems. The “bandwidth” of these systems is limited: only one object can be attended at any time (Rensink, 2000). In a current model by Deco and Rolls (2004) the ventral what system receives input from area V1 which proceeds through V2 and V4 to IT (inferior temporal cortex). The dorsal where system connects V1 and V2 through MT (medial temporal) to area PP (posterior parietal). Both systems are controlled, top-down, by attention and short-term memory with object representations in PF (prefrontal) cortex, i.e., a what component from PF46v to IT and a where component from PF46d to PP. The bottom-up (visual input code) and top-down (expected object and position) data streams are necessary for obtaining size, rotation and translation invariance, assuming that object views are normalized in visual memory.
Signal propagation from the retinas through the LGN (lateral geniculate nucleus) and areas V1, V2 etc., including feature extractions in V1 and groupings in higher areas, takes time. Object recognition is achieved in 150–200 ms, and category-specific activation of PF cortex starts after about 100 ms (Bar, 2004). In addition, IT cortex first receives coarse-scale information and only later fine-scale information. Apparently, one very brief glance is sufficient for the system to develop a gist of the contents of an image (Oliva and Torralba, 2006). This implies that some information propagates very rapidly and directly to “attention” in PF cortex in order to pre-select possible object-group templates and positions, which then propagate down the what and where systems. We call this process object categorization. It cannot be obtained by the CBF (Categorical Basis Functions) model of Riesenhuber and Poggio (2000), in which categorization (e.g. a cat) is obtained by grouping outputs of identification cells (cat-1, cat-2, cat-3); in other words, categorization would occur after recognition. In contrast, the LF (Low Frequency) model (Oliva et al., 2003; Bar, 2004) assumes that categorization precedes recognition, based on low-frequency information that passes directly from V1/V2 to PF cortex. The LF information actually proposed, however, consists of lowpass-filtered images, not of, e.g., outputs of simple and complex cells in V1 which are tuned to low spatial frequencies. The latter option will be explored in this paper.
After object categorization on the basis of coarse-scale information has narrowed the set of objects to be tested, the recognition process can start by also applying fine-scale information. We will focus on how such processes can be embedded in the architecture referred to above, with special focus on face recognition. Despite the impressive number and variety of computer-vision methods devised for faces and facial landmarks, see e.g. Yang et al. (2002), we show that very promising results with a cortical model can be obtained, even in the case of some classical complications involving changes of pose (frontal vs. 3/4), facial expression, some lighting and noise conditions, and the wearing of spectacles.
In computer vision there exists a vast literature, from basic feature extraction to object segregation, categorization and recognition, and from image reconstruction (coding and decoding) to scale stabilization and disparity; much less exists in biological vision. We therefore continue with a very brief summary of approaches related to this paper, with special focus on biological methods.
In addition to a few general overviews, see e.g. Hubel (1995), Bruce et al. (2000), Rasche (2005) and Miikkulainen et al. (2005), there are also detailed and quantitative models of simple and complex cells (Heitger et al., 1992; Petkov and Kruizinga, 1997), plus various models for inhibitions (Heitger et al., 1992; Petkov et al., 1993b; Barth et al., 1998; Rodrigues and du Buf, 2006a), edge detection (Smith and Brady, 1997; Elder and Zucker, 1998; Kovesi, 1999; Grigorescu et al., 2003) and combined line and edge detection (Verbeek and van Vliet, 1992; van Deemter and du Buf, 2000; Rodrigues and du Buf, 2004, 2006a). Other models address saliency maps and Focus-of-Attention (Itti and Koch, 2001; Parkhurst et al., 2002; Deco and Rolls, 2004; Rodrigues and du Buf, 2006d), figure-ground segregation (Heitger and von der Heydt, 1993; Hupe et al., 2001; Zhaoping, 2003; Rodrigues and du Buf, 2006a) and object categorization (Riesenhuber and Poggio, 2000; Leibe and Schiele, 2003; Csurka et al., 2004; Rodrigues and du Buf, 2006a). Concerning faces, various approaches have been proposed, from detecting faces and facial landmarks to the influence of different factors such as race, gender and age (Delorme and Thorpe, 2001; Yang et al., 2002; Ban et al., 2003; Rodrigues and du Buf, 2005b), including final face recognition (Kruizinga and Petkov, 1995; Zhao et al., 2003; Rodrigues and du Buf, 2006c, 2006d). Yet other models have been devised for disparity (Fleet et al., 1991; Ohzawa et al., 1997; Qian, 1997; Rodrigues and du Buf, 2004), automatic scale selection (Lindeberg, 1994), visual reconstruction (Rodrigues and du Buf, 2006b) and brightness perception (du Buf, 2001). In this paper we show that one basic process, namely line and edge detection in V1 (and possibly V2), can be linked to most if not all of the topics mentioned above, even to consciousness.
We present an improved scheme for multi-scale line/edge extraction in V1, which is truly multi-scale and has no free parameters. We illustrate the line/edge interpretation (coding and representation) for automatic scale selection and explore the importance of this interpretation in object reconstruction, segregation, categorization and recognition. Experiments with possible Low-Frequency models based on lowpass-filtered images, following Bar (2004), gave rather disappointing results, because lowpass filtering reduces objects to smeared blobs that lack any structure; we therefore propose that categorization is based on coarse-scale line/edge coding, and that recognition involves all scales. Processing schemes are discussed in the framework of a complete cortical architecture. The multi-scale keypoint information also extracted in V1, which was shown to be very important for the detection of facial landmarks and entire faces (Rodrigues and du Buf, 2006d), will not be employed here, nor will other important features such as the texture information that can be retrieved from bar and grating cells (du Buf, 2007), because we want to focus completely on the multi-scale line/edge information in V1 and beyond. This paper therefore complements the previous one dedicated to keypoints (Rodrigues and du Buf, 2006d).
In Section 2 we present line/edge detection and classification in single- and multi-scale contexts, plus the application of non-classical receptive field (NCRF) inhibition. Section 3 illustrates the visual reconstruction model in relation to brightness perception. Section 4 deals with object segregation, Section 5 with automatic scale selection, and Section 6 with object categorization. This is followed by face recognition in Section 7 and consciousness in Section 8. We conclude with a final discussion in Section 9.
Line and Edge Detection and Classification
In many models it is assumed that Gabor quadrature filters provide a good model of receptive fields (RFs) of cortical simple cells. In the spatial domain they consist of a real cosine and an imaginary sine, both with a Gaussian envelope (Lee, 1996; Grigorescu et al., 2003; Rodrigues and du Buf, 2006d). As in Rodrigues and du Buf (2006d), an RF is given by
$g_{\lambda,\sigma,\theta,\varphi}(x,y)=\exp\!\left(-\frac{\tilde{x}^{2}+\gamma^{2}\tilde{y}^{2}}{2\sigma^{2}}\right)\cos\!\left(2\pi\frac{\tilde{x}}{\lambda}+\varphi\right)$,
with $\tilde{x}=x\cos\theta+y\sin\theta$ and $\tilde{y}=y\cos\theta-x\sin\theta$, where $1/\lambda$ is the spatial frequency, $\sigma$ being the
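As an illustration of such a quadrature pair, the sketch below builds even (cosine) and odd (sine) Gabor RFs and computes simple- and complex-cell responses. It is a minimal numerical example with assumed parameter names (lam, sigma, theta, gamma), not the implementation used in this paper:

```python
import numpy as np

def gabor_pair(size, lam, sigma, theta, gamma=0.5):
    """Even (cosine) and odd (sine) Gabor RFs with a Gaussian envelope."""
    r = size // 2
    y, x = np.mgrid[-r:r + 1, -r:r + 1]
    xt = x * np.cos(theta) + y * np.sin(theta)   # rotated coordinates
    yt = y * np.cos(theta) - x * np.sin(theta)
    env = np.exp(-(xt**2 + (gamma * yt)**2) / (2 * sigma**2))
    return env * np.cos(2 * np.pi * xt / lam), env * np.sin(2 * np.pi * xt / lam)

def cell_responses(img, even, odd):
    """Simple-cell responses via circular FFT convolution; the complex-cell
    response is the modulus of the quadrature pair."""
    def conv(kern):
        pad = np.zeros_like(img, dtype=float)
        k = kern.shape[0]
        pad[:k, :k] = kern
        pad = np.roll(pad, (-(k // 2), -(k // 2)), axis=(0, 1))  # center kernel
        return np.real(np.fft.ifft2(np.fft.fft2(img) * np.fft.fft2(pad)))
    re, im = conv(even), conv(odd)
    return re, im, np.hypot(re, im)   # even, odd, complex-cell (modulus)
```

A vertical step edge then yields a ridge in the modulus at the edge position; line vs. edge classification uses the relative pattern of even and odd responses.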
Visual Reconstruction and Brightness Perception
Image reconstruction can be obtained by assuming one lowpass filter plus a complete set of (Gabor) bandpass filters that cover the entire frequency domain, such that together they form an allpass filter; this concept is exploited in wavelet image compression and coding. The goal of our visual system, however, is to detect objects, with neither the need nor the capacity to reconstruct a complete image of our visual environment; see change blindness and the limited “bandwidth” of the what and where subsystems (Rensink, 2000).
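The allpass property can be checked numerically. The sketch below uses Gaussian lowpass channels whose successive differences act as bandpass filters, a simplified stand-in for a calibrated Gabor bank (all cutoff values are illustrative assumptions); the telescoping sum of all transfer functions is nearly constant over the usable frequency range:

```python
import numpy as np

# Frequency axis (cycles/pixel) and octave-spaced cutoffs; illustrative values.
f = np.linspace(0.0, 0.5, 501)
cutoffs = 0.02 * 2.0 ** np.arange(10)          # highest cutoff far above Nyquist

def lowpass(fc):
    """Gaussian lowpass transfer function with cutoff fc."""
    return np.exp(-(f / fc) ** 2)

channels = [lowpass(cutoffs[0])]               # one residual lowpass channel
for fc0, fc1 in zip(cutoffs[:-1], cutoffs[1:]):
    channels.append(lowpass(fc1) - lowpass(fc0))   # bandpass = difference

total = np.sum(channels, axis=0)   # telescopes to lowpass(cutoffs[-1]) ~= 1
```

Because the sum telescopes, the deviation from unity is set only by the highest-cutoff channel, which is the design freedom a reconstruction scheme must fix.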
Object Segregation
So far we have illustrated multi-scale line and edge detection in area V1 and the symbolic interpretation for visual reconstruction in brightness perception, but another goal of the visual cortex is to detect and recognize objects by means of the what and where systems. Object detection and recognition pose a typical chicken-and-egg problem: without having some idea about the type and characteristics of the object it is not possible to separate the object from its background
Automatic Scale Selection
Apart from object segregation, other processes may play an important role in the fast where and slower what systems. Concentrating on lines and edges – ignoring other features extracted in V1 – there may be many scales, and the tremendous amount of information may not propagate in parallel and all at once to IT and PF cortex. It might be useful to extract the lines and edges which are most characteristic of an object, and to let these propagate first, for example for a first but coarse object
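One simple way to operationalize "most characteristic" is stability over scales: events that survive across many neighboring scales propagate first. The sketch below implements such a crude persistence criterion; it is our own illustrative rule, not necessarily the scale-selection scheme developed in this paper:

```python
import numpy as np

def stable_events(event_maps, min_scales=3):
    """Keep line/edge events (boolean maps, one per scale, fine to coarse)
    that persist over at least `min_scales` consecutive scales."""
    stack = np.stack(event_maps).astype(bool)      # shape: (scales, H, W)
    run = np.zeros(stack.shape[1:], dtype=int)     # current consecutive run
    best = np.zeros_like(run)                      # longest run seen so far
    for m in stack:
        run = np.where(m, run + 1, 0)
        best = np.maximum(best, run)
    return best >= min_scales
```

Events passing this test would be the ones propagated first, yielding a coarse but characteristic object sketch before the remaining scales arrive.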
Object Categorization
Object recognition is a clearly defined task: a certain cat, like the neighbors’ red tabby called Toby, is recognized or not. Categorization is more difficult to define because there are different levels, for example (a) an animal, (b) one with four legs, (c) a cat and (d) a red tabby, before deciding between our own red tabby called Tom and his brother Toby living next door. It resembles the way very young children develop categorization: once they are familiar with the family’s cat,
Face Recognition
The final goal in vision is object recognition, but here we focus on face recognition by means of the multi-scale line and edge representation. This completes face detection as presented in the previous paper (Rodrigues and du Buf, 2006d), in which saliency maps and the multi-scale keypoint representation were used for detecting facial landmarks and thus entire faces. In addition, it was also shown that keypoints can be used for Focus-of-Attention, i.e., to “gate” detected keypoints in associated
Consciousness
The fact that object recognition and brightness perception can be based on the same image representation has interesting consequences for consciousness or at least visual awareness. As Crick and Koch (2003) pointed out in their framework, the brain is divided into front and back parts, roughly at the central sulcus, with the front “looking at” the back with most sensory systems, including the visual cortex (in contrast to visual reconstruction in one neural map, as discussed in Section 3, this
Discussion
Computer vision for real-time applications requires tremendous computational power because entire images must be processed from the first to the last pixel. Probing specific objects on the basis of already acquired context may lead to a significant reduction of processing. This idea is based on a few concepts from our visual cortex (Rensink, 2000): (1) our physical surroundings can be seen as external memory, i.e., there is no need to construct detailed and complete maps, (2) the bandwidth of the what
Acknowledgments
The authors thank the anonymous reviewers for their comments which helped to improve the manuscript. This research is partly supported by the Foundation for Science and Technology FCT (ISR/IST pluri-annual funding) through the POS-Conhecimento Program which includes FEDER funds, and by the FCT project PTDC/EIA/73633/2006 – SmartVision: active vision for the blind. The orange image is available at http://marathon.csee.usf/edge/edge_detection.html; the elephant image is available at http://www.cs.rug.nl/~imaging/databases/contour_database/contour_2.html.
References (77)
Strange vision: ganglion cells as circadian photoreceptors. Trends Neurosci. (2003).
A neurodynamical cortical model of visual attention and invariant object recognition. Vision Res. (2004).
Improved grating and bar cell models in cortical area V1 and texture coding. Image Vision Comput. (2007).
Contour integration by the human visual system: evidence for a local “association field”. Vision Res. (1993).
Phase-based disparity measurement. Image Understand. (1991).
Simulation of neural contour mechanisms: from simple to end-stopped cells. Vision Res. (1992).
A deficit in strabismic amblyopia for global shape detection. Vision Res. (1999).
Global contour processing in amblyopia. Vision Res. (2007).
White’s effect: a dual mechanism. Vision Res. (1989).
Modelling the role of salience in the allocation of overt visual attention. Vision Res. (2002).
Mach bands: how many models are possible? Recent experimental findings and modeling attempts. Vision Res.
Binocular disparity and the perception of depth. Neuron.
Multi-scale keypoints in V1 and beyond: object segregation, scale selection, saliency maps and face detection. BioSystems.
V1 mechanisms and some figure-ground and border effects. J. Physiol.
Face detection using biologically motivated saliency map model. Proc. Int. Joint Conf. Neural Netw.
Endstopped operators based on iterated nonlinear center-surround inhibition. Human Vision and Electronic Imaging, SPIE.
A cortical mechanism for triggering top-down facilitation in visual object recognition. J. Cogn. Neurosci.
Visual objects in context. Nature Rev. Neurosci.
Neurocomputational bases of object and face recognition. Philos. Trans. R. Soc.: Biol. Sci.
Visual perception: Physiology, Psychology and Ecology.
A framework for consciousness. Nature Neurosci.
Visual categorization with bags of keypoints.
Face identification using one spike per neuron: resistance to image degradations. Neural Netw.
The role of the neocortical laminar microcircuitry in perception, cognition, and consciousness.
Modeling brightness perception and syntactical image coding. Optical Eng.
Responses of simple cells: events, interferences, and ambiguities. Biol. Cybern.
Ramp edges, Mach bands, and the functional significance of simple cell assembly. Biol. Cybern.
Modeling brightness perception.
Multiresolution face recognition. Image Vision Comput.
Local scale control for edge detection and blur estimation. IEEE Tr. PAMI.
Contour detection based on nonclassical receptive field inhibition. IEEE Tr. IP.
The reentry hypothesis: the putative interaction of the frontal eye field, ventrolateral prefrontal cortex, and areas V4, IT for attention and eye movement. Cerebral Cortex.
A recurrent model of contour integration in primary visual cortex. J. Vision.
A robust visual method for assessing the relative performance of edge-detection algorithms. IEEE Tr. PAMI.
A computational model of neural contour processing: figure-ground segregation and illusory contours.
Face matching through information theoretical attention points and its applications to face detection and classification.
Feedback connections act on the early part of the responses in monkey visual cortex. J. Neurophysiol.