Visualized layer-wise visual features in deep residual neural network

Haiguang Wen; Junxing Shi; Wei Chen; Zhongming Liu

doi:10.4231/R7PR7T1G

Description

The real-life visual world is highly diverse and complex, but it can be decomposed into a large yet finite number of visual features (LeCun et al., 2015). In general, these features span several levels of abstraction, from the low level (e.g. orientation and color) through the mid level (e.g. shapes and textures) to the high level (e.g. objects and actions). So far, deep learning provides the most comprehensive computational models for encoding and extracting hierarchical visual features from natural pictures or videos. Importantly, deep-learning models, as a class of deep artificial neural networks, are built and trained with organizational and coding principles similar to those of biological neuronal networks in the brain. Recent studies demonstrate that such models are well aligned with, and predictive of, the cascaded cortical processes underlying visual perception. For these reasons, deep learning offers a fully observable and computable model of the human visual system, and it may thus facilitate the understanding of cortical processes underlying natural vision. In our study (Wen et al., 2017), the deep residual neural network (ResNet) (He et al., 2016), a specific type of deep neural network, was able to predict cortical responses to novel natural movie stimuli across nearly the entire visual cortex. To further understand the internal representations in ResNet, we visualized the features encoded at distinct layers as images in pixel space for intuitive interpretation. Briefly, we visualize the feature encoded at a computational unit by optimizing a visual input to maximize that unit's activation, as proposed by Yosinski et al. (2015). The results were obtained using the Caffe deep learning framework (Jia et al., 2014).
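The optimization-based visualization described above can be illustrated with a toy example. The sketch below is not the Caffe/ResNet code used to produce these files. It assumes a single hypothetical linear unit with made-up weights and uses L2 decay as a simple regularizer; the regularization actually used for this dataset is not stated in this record. The function names and parameter values are illustrative only.

```python
import random


def unit_activation(weights, image):
    """Activation of a toy linear unit: dot product of weights and pixel values."""
    return sum(w * p for w, p in zip(weights, image))


def maximize_activation(weights, steps=200, lr=0.1, l2_decay=0.5, seed=0):
    """Gradient ascent on the input of a toy linear unit, with L2 decay.

    Objective (an assumption for this sketch): w . x - l2_decay * ||x||^2
    """
    rng = random.Random(seed)
    # Start from a small random "image" (a flat list of pixel values).
    image = [rng.uniform(-0.01, 0.01) for _ in weights]
    for _ in range(steps):
        # Gradient of the objective with respect to each pixel:
        # d/dx [w . x - l2_decay * ||x||^2] = w - 2 * l2_decay * x
        image = [p + lr * (w - 2.0 * l2_decay * p)
                 for w, p in zip(weights, image)]
    return image


# Hypothetical weights standing in for one unit's preferred pattern.
weights = [0.5, -1.0, 0.25, 2.0]
optimized = maximize_activation(weights)
print(unit_activation(weights, optimized))
```

For this toy unit the optimized input converges to the weight pattern scaled by 1 / (2 * l2_decay), which is why the resulting image can be read as the pattern the unit prefers. In a deep network the gradient is obtained by backpropagation through the layers rather than in closed form.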

Cite this work

Researchers should cite this work as follows:

Wen, H.; Shi, J.; Chen, W.; Liu, Z. (2017). Visualized layer-wise visual features in deep residual neural network. Purdue University Research Repository. doi:10.4231/R7PR7T1G

Notes

Visualization of the visual features encoded in ResNet-50, using the optimization-based method proposed by Yosinski et al. (2015).

Version 1.0

References:
Wen, H., Shi, J., Chen, W., & Liu, Z. (2017). Deep Residual Neural Network Reveals a Nested Hierarchy of Distributed Cortical Representation for Visual Categorization.

Yosinski, J., Clune, J., Nguyen, A., Fuchs, T., & Lipson, H. (2015). Understanding neural networks through deep visualization. arXiv preprint arXiv:1506.06579.

He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 770-778).

LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep learning. Nature, 521(7553), 436-444.

Jia, Y., Shelhamer, E., Donahue, J., Karayev, S., Long, J., Girshick, R., ... & Darrell, T. (2014, November). Caffe: Convolutional architecture for fast feature embedding. In Proceedings of the 22nd ACM international conference on Multimedia (pp. 675-678). ACM.

File list:

ResNet_layer31.pdf contains 1024 visual features encoded in layer 31.

ResNet_layer50.pdf contains 2048 visual features encoded in layer 50.

Laboratory of Integrated Brain Imaging

This publication belongs to the Laboratory of Integrated Brain Imaging group.
