Biosystems

Volume 95, Issue 3, March 2009, Pages 206-226

Multi-scale lines and edges in V1 and beyond: Brightness, object categorization and recognition, and consciousness

https://doi.org/10.1016/j.biosystems.2008.10.006

Abstract

In this paper we present an improved model for line and edge detection in cortical area V1. This model is based on responses of simple and complex cells, and it is multi-scale with no free parameters. We illustrate the use of the multi-scale line/edge representation in different processes: visual reconstruction or brightness perception, automatic scale selection and object segregation. A two-level object categorization scenario is tested in which pre-categorization is based on coarse scales only and final categorization on coarse plus fine scales. We also present a multi-scale object and face recognition model. Processing schemes are discussed in the framework of a complete cortical architecture. The fact that brightness perception and object recognition may be based on the same symbolic image representation is an indication that the entire (visual) cortex is involved in consciousness.

Introduction

The visual cortex detects and recognizes objects by means of the ventral “what” and dorsal “where” subsystems. The “bandwidth” of these systems is limited: only one object can be attended at any time (Rensink, 2000). In a current model by Deco and Rolls (2004) the ventral what system receives input from area V1 which proceeds through V2 and V4 to IT (inferior temporal cortex). The dorsal where system connects V1 and V2 through MT (middle temporal) to area PP (posterior parietal). Both systems are controlled, top-down, by attention and short-term memory with object representations in PF (prefrontal) cortex, i.e., a what component from PF46v to IT and a where component from PF46d to PP. The bottom-up (visual input code) and top-down (expected object and position) data streams are necessary for obtaining size, rotation and translation invariance, assuming that object views are normalized in visual memory.

Signal propagation from the retinas through the LGN (lateral geniculate nucleus) and areas V1, V2 etc., including feature extraction in V1 and groupings in higher areas, takes time. Object recognition is achieved in 150–200 ms, and category-specific activation of PF cortex starts after about 100 ms (Bar, 2004). In addition, IT cortex first receives coarse-scale information and only later fine-scale information. Apparently, one very brief glance is sufficient for the system to develop a gist of an image’s contents (Oliva and Torralba, 2006). This implies that some information propagates very rapidly and directly to “attention” in PF cortex in order to pre-select possible object-group templates and positions, which then propagate down the what and where systems. We call this process object categorization. It cannot be achieved by the CBF (Categorical Basis Functions) model of Riesenhuber and Poggio (2000), because there categorization (e.g. a cat) is obtained by grouping the outputs of identification cells (cat-1, cat-2, cat-3); in other words, categorization would come after recognition. In contrast, the LF (Low Frequency) model (Oliva et al., 2003; Bar, 2004) assumes that categorization is obtained before recognition, based on low-frequency information that passes directly from V1/V2 to PF cortex. However, the LF information actually proposed consists of lowpass-filtered images, not of, e.g., the outputs of simple and complex cells in V1 which are tuned to low spatial frequencies. The latter option will be explored in this paper.

After object categorization on the basis of coarse-scale information has narrowed the set of objects to be tested, the recognition process can start by also applying fine-scale information. We will focus on how such processes can be embedded in the architecture referred to above, with special emphasis on face recognition. Despite the impressive number and variety of computer-vision methods devised for faces and facial landmarks, see e.g. Yang et al. (2002), we show that a cortical model can obtain very promising results, even in the case of some classical complications: changes of pose (frontal vs. 3/4), facial expression, some lighting and noise conditions, and the wearing of spectacles.

In computer vision there exists a vast literature, from basic feature extraction to object segregation, categorization and recognition, and from image reconstruction (coding and decoding) to scale stabilization and disparity; much less exists in biological vision. We therefore continue with a very brief summary of approaches related to this paper, with special focus on biological methods.

In addition to a few general overviews, see e.g. Hubel (1995), Bruce et al. (2000), Rasche (2005) and Miikkulainen et al. (2005), there are also detailed and quantitative models of simple and complex cells (Heitger et al., 1992; Petkov and Kruizinga, 1997), plus various models for inhibition (Heitger et al., 1992; Petkov et al., 1993b; Barth et al., 1998; Rodrigues and du Buf, 2006a), edge detection (Smith and Brady, 1997; Elder and Zucker, 1998; Kovesi, 1999; Grigorescu et al., 2003) and combined line and edge detection (Verbeek and van Vliet, 1992; van Deemter and du Buf, 2000; Rodrigues and du Buf, 2004; Rodrigues and du Buf, 2006a). Other models address saliency maps and Focus-of-Attention (Itti and Koch, 2001; Parkhurst et al., 2002; Deco and Rolls, 2004; Rodrigues and du Buf, 2006d), figure-ground segregation (Heitger and von der Heydt, 1993; Hupe et al., 2001; Zhaoping, 2003; Rodrigues and du Buf, 2006a) and object categorization (Riesenhuber and Poggio, 2000; Leibe and Schiele, 2003; Csurka et al., 2004; Rodrigues and du Buf, 2006a). Concerning faces, various approaches have been proposed, from detecting faces and facial landmarks to the influence of factors such as race, gender and age (Delorme and Thorpe, 2001; Yang et al., 2002; Ban et al., 2003; Rodrigues and du Buf, 2005b), including final face recognition (Kruizinga and Petkov, 1995; Zhao et al., 2003; Rodrigues and du Buf, 2006c; Rodrigues and du Buf, 2006d). Yet other models have been devised for disparity (Fleet et al., 1991; Ohzawa et al., 1997; Qian, 1997; Rodrigues and du Buf, 2004), automatic scale selection (Lindeberg, 1994), visual reconstruction (Rodrigues and du Buf, 2006b) and brightness perception (du Buf, 2001). In this paper we show that one basic process, namely line and edge detection in V1 (and possibly V2), can be linked to most if not all of the topics mentioned above, even to consciousness.
We present an improved scheme for multi-scale line/edge extraction in V1, which is truly multi-scale with no free parameters. We illustrate the line/edge interpretation (coding and representation) for automatic scale selection and explore the importance of this interpretation in object reconstruction, segregation, categorization and recognition. Experiments with possible Low-Frequency models based on lowpass-filtered images, following Bar (2004), gave rather disappointing results, because objects are reduced to smeared blobs that lack any structure; we therefore propose that categorization is based on coarse-scale line/edge coding, and that recognition involves all scales. Processing schemes are discussed in the framework of a complete cortical architecture. The multi-scale keypoint information that is also extracted in V1, which was shown to be very important for the detection of facial landmarks and entire faces (Rodrigues and du Buf, 2006d), and other important features such as the texture information that can be retrieved from bar and grating cells (du Buf, 2007), will not be employed here, because we want to focus entirely on the multi-scale line/edge information in V1 and beyond. This paper therefore complements the previous one dedicated to keypoints (Rodrigues and du Buf, 2006d).

In Section 2 we present line/edge detection and classification in single- and multi-scale contexts, plus the application of non-classical receptive field (NCRF) inhibition. Section 3 illustrates the visual reconstruction model in relation to brightness perception. Section 4 deals with object segregation, Section 5 with automatic scale selection, and Section 6 with object categorization. This is followed by face recognition in Section 7 and consciousness in Section 8. We conclude with a final discussion in Section 9.

Section snippets

Line and Edge Detection and Classification

In many models it is assumed that Gabor quadrature filters provide a good model of the receptive fields (RFs) of cortical simple cells. In the spatial domain (x, y) they consist of a real cosine and an imaginary sine, both with a Gaussian envelope (Lee, 1996; Grigorescu et al., 2003; Rodrigues and du Buf, 2006d). As in Rodrigues and du Buf (2006d), an RF is given by
$$G_{\lambda,\sigma,\theta,\phi}(x,y)=\exp\left(-\frac{\tilde{x}^{2}+\gamma\tilde{y}^{2}}{2\sigma^{2}}\right)\cos\left(\frac{2\pi\tilde{x}}{\lambda}+\phi\right),$$
with $\tilde{x}=x\cos\theta+y\sin\theta$ and $\tilde{y}=y\cos\theta-x\sin\theta$, where $1/\lambda$ is the spatial frequency, $\lambda$ being the wavelength.
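As a rough illustration of the RF equation above, the following NumPy sketch samples an even/odd quadrature pair and applies it to a 1-D signal in the spirit of combined line/edge classification: at a peak of the complex-cell response (the modulus of the pair), the event is labelled a line if the even response dominates and an edge if the odd one dominates. The parameter values (λ = 10, σ = 5, γ = 0.5, a 31×31 grid) are illustrative assumptions, not the paper's filter bank.

```python
import numpy as np

def gabor_rf(size, lam, sigma, theta, phi, gamma=0.5):
    """Sample G_{lambda,sigma,theta,phi}(x, y) on a size x size grid."""
    r = size // 2
    y, x = np.mgrid[-r:r + 1, -r:r + 1].astype(float)
    xt = x * np.cos(theta) + y * np.sin(theta)      # rotated coordinate x~
    yt = y * np.cos(theta) - x * np.sin(theta)      # rotated coordinate y~
    env = np.exp(-(xt**2 + gamma * yt**2) / (2 * sigma**2))
    return env * np.cos(2 * np.pi * xt / lam + phi)

# Quadrature pair of simple cells: even (cosine) and odd (sine) RFs.
even = gabor_rf(31, lam=10.0, sigma=5.0, theta=0.0, phi=0.0)
odd = gabor_rf(31, lam=10.0, sigma=5.0, theta=0.0, phi=-np.pi / 2)

# 1-D demo on the RFs' centre row: a step edge should be labelled 'edge'
# because the odd response dominates at the complex-cell peak.
sig = np.zeros(128)
sig[64:] = 1.0                                  # ideal step edge at x = 64
re = np.convolve(sig, even[15], mode='same')    # even simple-cell response
ro = np.convolve(sig, odd[15], mode='same')     # odd simple-cell response
c = np.hypot(re, ro)                            # complex cell: modulus of pair
i = 20 + int(np.argmax(c[20:-20]))              # strongest event, away from borders
label = 'edge' if abs(ro[i]) > abs(re[i]) else 'line'
```

A bar stimulus instead of a step would make the even response dominate at the peak, yielding the label 'line'.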

Visual Reconstruction and Brightness Perception

Image reconstruction can be obtained by assuming one lowpass filter plus a complete set of (Gabor) bandpass filters that cover the entire frequency domain, such that an allpass filter is formed; this concept is exploited in wavelet image compression and coding. However, the goal of our visual system is to detect objects, with no need, nor the capacity, to reconstruct a complete image of our visual environment; cf. change blindness and the limited “bandwidth” of the what and where subsystems (Rensink, 2000).
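The lowpass-plus-bandpass idea can be sketched in the frequency domain. Assuming, for illustration only, a 1-D bank of Gaussian lowpass transfer functions spaced one octave apart, the difference-of-Gaussians bandpass channels telescope so that lowpass plus all bandpass channels sum to an approximately allpass response over the covered band; this is not the paper's Gabor bank, only the underlying completeness argument:

```python
import numpy as np

f = np.linspace(0.0, 0.5, 256)              # normalised spatial frequency axis

def gauss_lp(f, s):
    """Gaussian lowpass transfer function with cutoff parameter s."""
    return np.exp(-f**2 / (2 * s**2))

cutoffs = [0.02 * 2**k for k in range(6)]   # one octave apart: 0.02 ... 0.64
lowpass = gauss_lp(f, cutoffs[0])
bandpass = [gauss_lp(f, s_hi) - gauss_lp(f, s_lo)   # DoG bandpass channels
            for s_lo, s_hi in zip(cutoffs, cutoffs[1:])]

# The sum telescopes to the widest lowpass, i.e. close to 1 (allpass)
# over the frequency band covered by the bank.
total = lowpass + sum(bandpass)
```

Reconstruction then amounts to summing the channel outputs, which recovers the input up to the residual attenuation of `total` at the highest frequencies.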

Object Segregation

Up to this point we have illustrated multi-scale line and edge detection in area V1 and its symbolic interpretation for visual reconstruction in brightness perception, but another goal of the visual cortex is to detect and recognize objects by means of the what and where systems. Object detection and recognition pose a typical chicken-and-egg problem: without having some idea about the type and characteristics of the object it is not possible to separate the object from its background

Automatic Scale Selection

Apart from object segregation, other processes may play an important role in the fast where and slower what systems. Concentrating on lines and edges – ignoring other features extracted in V1 – there may be many scales, and the tremendous amount of information may not propagate in parallel and all at once to IT and PF cortex. It might be useful if the lines and edges that are most characteristic of an object were extracted and propagated first, for example for a first but coarse object
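One plausible criterion for deciding which events should propagate first is stability over scale: events that persist across many scales are the most characteristic ones. A toy sketch with hypothetical binary event maps (the positions, scales and probabilities below are made up, not the paper's scale-selection algorithm):

```python
import numpy as np

rng = np.random.default_rng(0)
n_scales, n_pos = 8, 64
# events[s, x] = 1 if a line/edge event is detected at position x at scale s;
# here random clutter plus one event that is stable over all scales (x = 10).
events = (rng.random((n_scales, n_pos)) < 0.15).astype(int)
events[:, 10] = 1

lifetime = events.sum(axis=0)        # over how many scales each event persists
order = np.argsort(-lifetime)        # most scale-stable events propagate first
```

Ranking by `lifetime` puts the stable event at the front of the queue, ahead of the short-lived clutter.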

Object Categorization

Object recognition is a clearly defined task: a certain cat, like the neighbors’ red tabby called Toby, is recognized or not. Categorization is more difficult to define because there are different levels, for example (a) an animal, (b) one with four legs, (c) a cat and (d) a red tabby, before deciding between our own red tabby called Tom and his brother Toby living next door. It is as if we retraced the development of categorization in very young children: once they are familiar with the family’s cat,
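The two-level scenario can be caricatured with a toy nearest-template matcher: pre-categorization uses only the coarse-scale components of a feature vector to prune the template set, after which final recognition uses coarse plus fine scales. All feature vectors below are invented for illustration; only the names Tom, Toby and the cat/dog contrast come from the text:

```python
import numpy as np

# Hypothetical templates: components 0-1 stand for coarse-scale line/edge
# features, components 2-5 for fine-scale ones (all values made up).
templates = {
    'cat-Tom':  np.array([1.0, 0.9, 0.2, 0.8, 0.1, 0.5]),
    'cat-Toby': np.array([1.0, 0.9, 0.7, 0.1, 0.9, 0.4]),
    'dog-Rex':  np.array([0.1, 0.2, 0.5, 0.5, 0.5, 0.5]),
}
probe = np.array([0.95, 0.88, 0.65, 0.15, 0.85, 0.45])

# Level 1: pre-categorization on coarse scales only prunes the candidates.
coarse = {k: float(np.linalg.norm(v[:2] - probe[:2]))
          for k, v in templates.items()}
short_list = sorted(coarse, key=coarse.get)[:2]

# Level 2: final recognition on coarse plus fine scales.
best = min(short_list, key=lambda k: float(np.linalg.norm(templates[k] - probe)))
```

The coarse stage cannot separate the two cats (their coarse components are identical here), but it discards the dog; the fine scales then decide between Tom and Toby.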

Face Recognition

The final goal in vision is object recognition, but here we focus on face recognition by means of the multi-scale line and edge representation. This complements face detection as presented in the previous paper (Rodrigues and du Buf, 2006d), in which saliency maps and the multi-scale keypoint representation were used for detecting facial landmarks and thus entire faces. In addition, it was also shown that keypoints can be used for Focus-of-Attention, i.e., to “gate” detected keypoints in associated

Consciousness

The fact that object recognition and brightness perception can be based on the same image representation has interesting consequences for consciousness, or at least visual awareness. As Crick and Koch (2003) pointed out in their framework, the brain is divided into front and back parts, roughly at the central sulcus, with the front “looking at” the back, which contains most sensory systems, including the visual cortex (in contrast to visual reconstruction in one neural map, as discussed in Section 3, this

Discussion

Computer vision for real-time applications requires tremendous computational power because all images must be processed from the first to the last pixel. Probing specific objects on the basis of already acquired context may lead to a significant reduction of processing. This idea is based on a few concepts from our visual cortex (Rensink, 2000): (1) our physical surround can be seen as external memory, i.e., there is no need to construct detailed and complete maps, (2) the bandwidth of the what

Acknowledgments

The authors thank the anonymous reviewers for their comments, which helped to improve the manuscript. This research is partly supported by the Foundation for Science and Technology FCT (ISR/IST pluri-annual funding) through the POS-Conhecimento Program, which includes FEDER funds, and by the FCT project PTDC/EIA/73633/2006 – SmartVision: active vision for the blind. The orange image is available at http://marathon.csee.usf/edge/edge_detection.html; the elephant image is available at http://www.cs.rug.nl/~imaging/databases/contour_database/contour_2.html.

References (77)

• L. Pessoa, Mach bands: how many models are possible? Recent experimental findings and modeling attempts, Vision Res. (1996)
• N. Qian, Binocular disparity and the perception of depth, Neuron (1997)
• J. Rodrigues et al., Multi-scale keypoints in V1 and beyond: object segregation, scale selection, saliency maps and face detection, BioSystems (2006)
• L. Zhaoping, V1 mechanisms and some figure-ground and border effects, J. Physiol. (2003)
• S. Ban et al., Face detection using biologically motivated saliency map model, Proc. Int. Joint Conf. Neural Netw. (2003)
• E. Barth et al., Endstopped operators based on iterated nonlinear center-surround inhibition, Human Vision and Electronic Imaging, SPIE (1998)
• M. Bar, A cortical mechanism for triggering top-down facilitation in visual object recognition, J. Cogn. Neurosci. (2003)
• M. Bar, Visual objects in context, Nature Rev. Neurosci. (2004)
• I. Biederman et al., Neurocomputational bases of object and face recognition, Philosoph. Trans. R. Soc.: Biol. Sci. (1997)
• V. Bruce et al., Visual Perception: Physiology, Psychology and Ecology (2000)
• F. Crick et al., A framework for consciousness, Nature Neurosci. (2003)
• G. Csurka et al., Visual categorization with bags of keypoints
• A. Delorme et al., Face identification using one spike per neuron: resistance to image degradations, Neural Netw. (2001)
• M. Denham, The role of the neocortical laminar microcircuitry in perception, cognition, and consciousness
• J. du Buf et al., Modeling brightness perception and syntactical image coding, Optical Eng. (1995)
• J. du Buf, Responses of simple cells: events, interferences, and ambiguities, Biol. Cybern. (1993)
• J. du Buf, Ramp edges, Mach bands, and the functional significance of simple cell assembly, Biol. Cybern. (1994)
• J. du Buf, Modeling brightness perception
• H. Ekenel et al., Multiresolution face recognition, Image Vision Comput. (2005)
• J. Elder et al., Local scale control for edge detection and blur estimation, IEEE Tr. PAMI (1998)
• C. Grigorescu et al., Contour detection based on nonclassical receptive field inhibition, IEEE Tr. IP (2003)
• F. Hamker, The reentry hypothesis: the putative interaction of the frontal eye field, ventrolateral prefrontal cortex, and areas V4, IT for attention and eye movement, Cerebral Cortex (2005)
• T. Hansen et al., A recurrent model of contour integration in primary visual cortex, J. Vision (2008)
• M. Heath et al., A robust visual method for assessing the relative performance of edge-detection algorithms, IEEE Tr. PAMI (2000)
• F. Heitger et al., A computational model of neural contour processing: figure-ground segregation and illusory contours
• K. Hotta et al., Face matching through information theoretical attention points and its applications to face detection and classification
• D. Hubel, Eye, Brain and Vision, Scientific American (1995)
• J. Hupe et al., Feedback connections act on the early part of the responses in monkey visual cortex, J. Neurophysiol. (2001)