
Deep Learning-Based Weight Initialization on Multi-layer Perceptron for Image Recognition

  • Conference paper
Recent Trends in Intelligence Enabled Research (DoSIER 2022)

Abstract

The performance of a multi-layer perceptron (MLP) degrades when working with high-resolution images due to the vanishing gradient and overfitting problems. However, the performance of an MLP can be improved with efficient weight initialization techniques. This paper puts forward a systematic deep neural network (DNN)-based weight initialization strategy for an MLP to enhance its classification accuracy. Moreover, the training of an MLP may not converge due to the presence of many local minima. Local minima can be avoided by initializing the weights properly rather than randomly. In this paper, a restricted Boltzmann machine (RBM) is used to pre-train the MLP: the network is pre-trained layer by layer, learning the weights between each pair of neighboring layers up to the penultimate layer. After pre-training, the whole network is fine-tuned by minimizing the mean square error (MSE). To compare the proposed weight initialization against random weight initialization, two standard image classification data sets, (i) CIFAR-10 and (ii) STL-10, are used. The simulation results show that the proposed initialization improves the classification performance of the MLP. Furthermore, the proposed method converges faster than a standard MLP.
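As a rough illustration of the pipeline the abstract describes, the NumPy sketch below pre-trains each weight matrix of an MLP as a Bernoulli RBM using one-step contrastive divergence (CD-1) and then fine-tunes the whole stack by gradient descent on the MSE. The layer sizes, learning rates, epoch counts, and helper names here are illustrative assumptions, not values or code from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1(W, a, b, v0, lr=0.05):
    """One contrastive-divergence (CD-1) update for a Bernoulli-Bernoulli RBM.
    W: visible-to-hidden weights; a, b: visible/hidden biases; v0: data batch."""
    h0 = sigmoid(v0 @ W + b)                        # positive phase: p(h|v0)
    hs = (rng.random(h0.shape) < h0).astype(float)  # sample hidden states
    v1 = sigmoid(hs @ W.T + a)                      # reconstruct visibles
    h1 = sigmoid(v1 @ W + b)                        # negative phase: p(h|v1)
    n = len(v0)
    W += lr * (v0.T @ h0 - v1.T @ h1) / n           # CD approximation of the gradient
    a += lr * (v0 - v1).mean(axis=0)
    b += lr * (h0 - h1).mean(axis=0)

def pretrain(x, sizes, epochs=10, batch=64):
    """Greedy layer-wise pre-training: one RBM per weight matrix up to the
    penultimate layer. Returns the learned (W, b) pair for each hidden layer."""
    layers, inp = [], x
    for nv, nh in zip(sizes[:-1], sizes[1:]):
        W = rng.normal(0.0, 0.01, (nv, nh))
        a, b = np.zeros(nv), np.zeros(nh)
        for _ in range(epochs):
            for i in range(0, len(inp), batch):
                cd1(W, a, b, inp[i:i + batch])
        layers.append((W, b))
        inp = sigmoid(inp @ W + b)  # hidden activities become the next RBM's input
    return layers

def finetune_step(layers, Wo, bo, x, y, lr=0.1):
    """One full-network backprop step that reduces the MSE between the sigmoid
    output layer (Wo, bo) and the one-hot targets y."""
    acts = [x]                                      # forward pass, caching activations
    for W, b in layers:
        acts.append(sigmoid(acts[-1] @ W + b))
    out = sigmoid(acts[-1] @ Wo + bo)
    n = len(x)
    delta = (out - y) * out * (1.0 - out)           # MSE gradient at the output (up to a constant)
    dWo, dbo = acts[-1].T @ delta / n, delta.mean(axis=0)
    W_next, grads = Wo, []
    for i in range(len(layers) - 1, -1, -1):        # backpropagate through the stack
        h = acts[i + 1]
        delta = (delta @ W_next.T) * h * (1.0 - h)
        grads.append((acts[i].T @ delta / n, delta.mean(axis=0)))
        W_next = layers[i][0]
    Wo -= lr * dWo
    bo -= lr * dbo
    for (W, b), (dW, db) in zip(reversed(layers), grads):
        W -= lr * dW
        b -= lr * db

# Toy run on random data shaped like flattened 32x32 RGB images (3072 inputs,
# 10 classes); the architecture [3072, 512, 128] is a made-up example.
x = rng.random((256, 3072))
y = np.eye(10)[rng.integers(0, 10, 256)]
layers = pretrain(x, [3072, 512, 128], epochs=2)
Wo, bo = rng.normal(0.0, 0.01, (128, 10)), np.zeros(10)
for _ in range(50):
    finetune_step(layers, Wo, bo, x, y)
```

Note that only the layers up to the penultimate one receive pre-trained weights, matching the abstract: the output layer starts from small random values and is learned entirely during fine-tuning.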



Author information

Correspondence to Prasenjit Dey.


Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper


Cite this paper

Mukherjee, S., Dey, P. (2023). Deep Learning-Based Weight Initialization on Multi-layer Perceptron for Image Recognition. In: Bhattacharyya, S., Das, G., De, S., Mrsic, L. (eds) Recent Trends in Intelligence Enabled Research. DoSIER 2022. Advances in Intelligent Systems and Computing, vol 1446. Springer, Singapore. https://doi.org/10.1007/978-981-99-1472-2_17
