Neural Networks

Volume 131, November 2020, Pages 103-114

Improved object recognition using neural networks trained to mimic the brain’s statistical properties

https://doi.org/10.1016/j.neunet.2020.07.013

Abstract

The current state-of-the-art object recognition algorithms, deep convolutional neural networks (DCNNs), are inspired by the architecture of the mammalian visual system, and are capable of human-level performance on many tasks. As they are trained for object recognition tasks, DCNNs have been shown to develop hidden representations that resemble those observed in the mammalian visual system (Razavi and Kriegeskorte, 2014; Yamins and DiCarlo, 2016; Güçlü and van Gerven, 2015; McClure and Kriegeskorte, 2016). Moreover, DCNNs trained on object recognition tasks are currently among the best models we have of the mammalian visual system. This led us to hypothesize that teaching DCNNs to achieve even more brain-like representations could improve their performance. To test this, we trained DCNNs on a composite task, wherein networks were trained to: (a) classify images of objects; while (b) having intermediate representations that resemble those observed in neural recordings from monkey visual cortex. Compared with DCNNs trained purely for object categorization, DCNNs trained on the composite task had better object recognition performance and were more robust to label corruption. Interestingly, we found that real neural data were not required for this benefit: randomized data with the same statistical properties as the neural data also boosted performance. While the performance gains we observed when training on the composite task vs. the “pure” object recognition task were modest, they were remarkably robust. Notably, we observed these performance gains across all network variations we studied, including: smaller (CORnet-Z) vs. larger (VGG-16) architectures; variations in optimizers (Adam vs. gradient descent); variations in activation function (ReLU vs. ELU); and variations in network initialization. Our results demonstrate the potential utility of a new approach to training object recognition networks, using strategies in which the brain – or at least the statistical properties of its activation patterns – serves as a teacher signal for training DCNNs.

Introduction

Deep convolutional neural networks (DCNNs) have recently led to rapid advances in state-of-the-art object recognition systems (LeCun, Bengio, & Hinton, 2015). At the same time, there remain critical shortcomings in these systems (Rajalingham et al., 2018). We asked whether training DCNNs to respond to images in a more brain-like manner could lead to better performance. DCNN architectures are directly inspired by that of the mammalian visual system (MVS) (Hubel & Wiesel, 1968), and as DCNNs improve at object recognition tasks, they learn representations that are increasingly similar to those found in the MVS (Güçlü and van Gerven, 2015, McClure and Kriegeskorte, 2016, Razavi and Kriegeskorte, 2014, Yamins and DiCarlo, 2016). Consequently, we expected that forcing DCNNs to have image representations that were even more similar to those found in the MVS could lead to better performance.

Previous work showed that the performance of smaller “student” DCNNs could be improved by training them to match the image representations of larger “teacher” DCNNs (Hinton et al., 2015, McClure and Kriegeskorte, 2016, Romero et al., 2015), and that DCNNs could be directly trained to reproduce image representations formed by the V1 area of monkey visual cortex (Kindel, Christensen, & Zylberberg, 2019). These studies provide a foundation for the current work, in which we used monkey V1 as a teacher network for training DCNNs to categorize images. We then tested the hypothesis that DCNNs trained with monkey V1 as a teacher would outperform those trained without this teacher signal. By several relevant metrics (including accuracy), we found that performance increased when monkey V1 was used as a teacher. Importantly, the monkey V1 data were collected in response to different images than the ones in the object recognition task. As a result, our approach of using the brain as a teacher signal can leverage pre-existing, publicly available neural data, without necessarily requiring new neuroscience experiments for each new machine learning task. Moreover, we also trained DCNNs with random teacher signals that matched the statistics of monkey V1 neural activations, and found that those networks outperformed ones trained without a teacher signal. This emphasizes a potential role for the statistical properties of neural activations as a form of regularizer for training DCNNs.
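To make the “teacher” idea concrete, the following is a minimal PyTorch sketch, in the spirit of the feature-matching distillation of Romero et al. (2015); the function name and the mean-squared-error form are illustrative assumptions, not the exact cost used in the studies cited above.

import torch
import torch.nn.functional as F

def teacher_matching_loss(student_feats: torch.Tensor,
                          teacher_feats: torch.Tensor) -> torch.Tensor:
    # Penalize the distance between the student's intermediate-layer
    # features and a fixed (detached) teacher representation of the
    # same images; adding this term to the task loss pushes the student
    # toward the teacher's representations.
    return F.mse_loss(student_feats, teacher_feats.detach())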

Related recent work has demonstrated success in using neural data to train machine learning models. One study found that fMRI measurements of human brain activations from subjects viewing images could guide the decision boundaries of support vector machines (SVMs) (Fong, Scheirer, & Cox, 2018). In that study, the authors weighted the training data based on how easy it was for the human brain to recognize each example as a member of its class (Fong et al., 2018). This differs from our work, in which we train deep convolutional neural networks with a two-part cost function that explicitly rewards matching the neural representations, rather than reweighting the cost of specific training examples. Moreover, in our work, the images shown to the animal during the collection of neural data were not category-labeled images from a machine learning benchmark task; by contrast, in the Fong et al. (2018) study, the neural data had to be collected for the same image set used in the categorization task. Previous work from Peterson, Battleday, Griffiths, and Russakovsky (2020) demonstrated that training deep convolutional neural networks with human perceptual uncertainty makes classification more robust to variations in the test set and to adversarial examples. In that study, human guidance was used to change the training labels, incorporating these uncertainties. We instead focus on changing the cost function – incorporating neural data into the network evaluation – without considering behavioral reports of uncertainty. Finally, previous work from Linsley, Shiebler, Eberhardt, and Serre (2019) demonstrated that using human behavioral data to add supervisory attention guidance improved object recognition performance. This is again quite a different approach from ours: while they focused on behavioral data, we instead used signals recorded from visual cortical neurons.

Notably, we did not aim to achieve state-of-the-art classification performance in this work: we instead sought to test whether the use of neural data as a “teacher” signal could robustly improve DCNN performance. For that reason, we studied a wide variety of network properties: different architectures and network sizes; different activation functions; and different optimizers. Our results indicate that, over all of these variations, (1) DCNNs trained to mimic monkey V1 (or surrogate data matching the statistics of monkey V1) have better object recognition performance; (2) DCNNs trained to mimic monkey V1 make fewer errors, and the errors they do make are more often within the correct superclass; and (3) DCNNs trained to mimic monkey V1 are more robust to label corruption. While the performance gains we observed were somewhat modest, given the robustness of those gains, we anticipate that future work could productively apply our new training method to other networks, potentially improving on the current state-of-the-art object recognition systems.

Section snippets

Monkey visual cortex data

Our monkey V1 teacher signal comes from publicly available recordings in which anesthetized monkeys were presented with a series of images while experimenters recorded the spiking activity of neurons in primary visual cortex (V1) with a multielectrode array (Coen-Cagli, Kohn, & Schwartz, 2015). These recordings were conducted in 10 experimental sessions with 3 different animals, resulting in recordings from 392 neurons. The monkeys were shown 270 static natural images as well as various…
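One natural way to summarize such recordings for use as a teacher signal, following the representational similarity analysis of Kriegeskorte et al. (2008) cited in the references, is an image-by-image similarity matrix over the population responses. The sketch below is illustrative: the array shapes mirror the dataset described above, but the function is an assumption, not the authors' exact pipeline.

import numpy as np

def v1_similarity_matrix(responses: np.ndarray) -> np.ndarray:
    # responses: (n_images, n_neurons) trial-averaged spike counts.
    # Returns an (n_images, n_images) matrix of Pearson correlations
    # between the population response vectors for each pair of images.
    return np.corrcoef(responses)

# e.g. 270 images x 392 neurons, matching the recordings described above
responses = np.random.rand(270, 392)  # placeholder for real spike counts
target_rsm = v1_similarity_matrix(responses)  # shape (270, 270)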

Results

We trained neural networks on the composite cost (Eq. (1)), with varying ratios r describing the trade-off between the representational similarity cost and the categorization cost. We evaluated the trained networks based on categorization accuracy achieved on held-out data (not used in training) from the CIFAR100 dataset. We ran each experiment 10 times with different random initializations, to demonstrate that differences in accuracy are not due to initial conditions (Mehrer et al., 2020). In…
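Eq. (1) itself is not reproduced in this excerpt, so the PyTorch sketch below shows one plausible form of such a composite cost; the trade-off ratio r, the correlation-based similarity term, and the function signature are our own illustrative assumptions.

import torch
import torch.nn.functional as F

def composite_loss(logits, labels, layer_acts, target_rsm, r=0.1):
    # Categorization term: standard cross-entropy on the CIFAR100 labels.
    cls = F.cross_entropy(logits, labels)
    # Representational similarity term: correlate an intermediate layer's
    # activations across the probe images, then match the resulting
    # similarity matrix to the monkey V1 target matrix.
    z = layer_acts - layer_acts.mean(dim=1, keepdim=True)
    z = z / (z.norm(dim=1, keepdim=True) + 1e-8)
    net_rsm = z @ z.t()
    sim = F.mse_loss(net_rsm, target_rsm)
    # Trade off the two terms with the ratio r described above.
    return r * sim + (1.0 - r) * cls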

Discussion

Training the early layers of convolutional neural networks to mimic the image representations from monkey V1 improves those networks’ ability to categorize previously-unseen images. Moreover, networks trained using monkey V1 as a representation “teacher” made errors that were more often within the correct superclass than did networks without the “teacher” signal. While the performance gains were modest, they were remarkably robust: we observed similar performance gains on large and small…

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

JZ is an Associate Fellow of CIFAR, in the Learning in Machines and Brains Program. JZ further acknowledges the following funding sources: Sloan Fellowship, Canada Research Chairs Program, and Natural Sciences and Engineering Research Council of Canada (NSERC). CF was supported by an NSF Graduate Research Fellowship, Award # 1553798. AF is a Fellow of the CIFAR program for Learning in Machines and Brains, and holds a Canada CIFAR AI Chair. AF and HX are funded through CIFAR and an NSERC Discovery…

References (32)

  • Barron, J. T. (2019). A general and adaptive robust loss function. In Proc IEEE comput soc conf comput vis...
  • Cadena, S. A., et al. (2017). Deep convolutional models improve predictions of macaque V1 responses to natural images.
  • Clevert, D., et al. Fast and accurate deep network learning by exponential linear units (ELUs).
  • Coen-Cagli, R., et al. (2015). Flexible gating of contextual influences in natural vision. Nature Neuroscience.
  • Fong, R. C., et al. (2018). Using human brain activity to guide machine learning. Scientific Reports.
  • Glorot, X., et al. (2010). Understanding the difficulty of training deep feedforward neural networks. Journal of Machine Learning Research.
  • Güçlü, U., et al. (2015). Deep neural networks reveal a gradient in the complexity. The Journal of Neuroscience.
  • Hinton, G., et al. (2015). Distilling the knowledge in a neural network.
  • Hubel, D. H., et al. (1968). Receptive fields and functional architecture of monkey striate cortex. Journal of Physiology.
  • Kietzmann, T. C., et al. (2019). Recurrence is required to capture the representational dynamics of the human visual system. Proceedings of the National Academy of Sciences of the United States of America.
  • Kindel, W., et al. (2019). Using deep learning to probe the neural code for images in primary visual cortex. Journal of Vision.
  • Kingma, D. P., et al. Adam: A method for stochastic optimization.
  • Kriegeskorte, N., et al. (2008). Representational similarity analysis – connecting the branches of systems neuroscience. Frontiers in Systems Neuroscience.
  • Krizhevsky, A. (2009). Learning multiple layers of features from tiny images. Technical Report TR-2009.
  • Krizhevsky, A., et al. ImageNet classification with deep convolutional neural networks.
  • Kubilius, J., et al. (2018). CORnet: Modeling the neural mechanisms of core object recognition.