Abstract
Pre-trained convolutional neural network (CNN) models are widely used in computer vision, especially in transfer learning, where the target domain may lie in a different feature space or follow a different data distribution than the source domain. In CNN transfer tasks, visual representations are typically transferred from a source domain (e.g., ImageNet) to target domains that have fewer training images or different image properties. It is therefore natural to ask which CNN model transfers visual representations best. Through visualization analyses and extensive experiments, we show that when either the image properties or the task objective of the target domain differs substantially from those of the source domain, retaining the fully connected layers of the source-domain pre-trained model is essential for achieving high accuracy after transfer.
This work was supported by the Collaborative Innovation Center of Novel Software Technology and Industrialization.
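The two transfer strategies the abstract contrasts can be sketched in code. The snippet below is a minimal illustration, not the paper's actual setup: `SourceCNN` is a hypothetical stand-in for a pre-trained network such as VGG, and the layer sizes and class counts are arbitrary. Strategy A keeps the pre-trained fully connected (FC) layers and replaces only the final classifier; strategy B discards the FC layers and trains new ones from scratch, keeping only the convolutional features.

```python
import torch
import torch.nn as nn

# A tiny stand-in for a source-domain pre-trained CNN (hypothetical; a real
# setup would load e.g. VGG weights trained on ImageNet).
class SourceCNN(nn.Module):
    def __init__(self, num_source_classes=1000):
        super().__init__()
        self.features = nn.Sequential(        # convolutional layers
            nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((4, 4)),
        )
        self.fc = nn.Sequential(              # the FC layers under discussion
            nn.Linear(8 * 4 * 4, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
        )
        self.classifier = nn.Linear(64, num_source_classes)

    def forward(self, x):
        x = self.features(x).flatten(1)
        return self.classifier(self.fc(x))

pretrained = SourceCNN()  # pretend these weights come from source-domain training

# Strategy A: keep conv AND pre-trained FC layers; replace only the
# final classifier for the target task (e.g., 200 bird classes).
model_a = SourceCNN()
model_a.load_state_dict(pretrained.state_dict())
model_a.classifier = nn.Linear(64, 200)

# Strategy B: keep only the conv layers; FC layers start from scratch.
model_b = SourceCNN()
model_b.load_state_dict(pretrained.state_dict())
model_b.fc = nn.Sequential(nn.Linear(8 * 4 * 4, 64), nn.ReLU())
model_b.classifier = nn.Linear(64, 200)

x = torch.randn(2, 3, 32, 32)
print(model_a(x).shape, model_b(x).shape)  # both: torch.Size([2, 200])
```

Both models produce target-task logits of the same shape; the paper's claim concerns which strategy reaches higher accuracy when the target domain is far from the source.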
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
Cite this paper
Zhang, CL., Luo, JH., Wei, XS., Wu, J. (2018). In Defense of Fully Connected Layers in Visual Representation Transfer. In: Zeng, B., Huang, Q., El Saddik, A., Li, H., Jiang, S., Fan, X. (eds) Advances in Multimedia Information Processing – PCM 2017. PCM 2017. Lecture Notes in Computer Science(), vol 10736. Springer, Cham. https://doi.org/10.1007/978-3-319-77383-4_79
DOI: https://doi.org/10.1007/978-3-319-77383-4_79
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-77382-7
Online ISBN: 978-3-319-77383-4
eBook Packages: Computer Science, Computer Science (R0)