
Deep Learning-Based Weight Initialization on Multi-layer Perceptron for Image Recognition

  • Conference paper
Recent Trends in Intelligence Enabled Research (DoSIER 2022)

Abstract

The performance of a multi-layer perceptron (MLP) degrades when working with high-resolution images due to the vanishing gradient and overfitting problems. However, the performance of an MLP can be improved with efficient weight initialization techniques. This paper puts forward a systematic deep neural network (DNN)-based weight initialization strategy for an MLP to enhance its classification accuracy. Moreover, the training of an MLP may not converge due to the presence of many local minima. Local minima can be avoided by initializing the weights properly rather than randomly. In this paper, a restricted Boltzmann machine (RBM) is used to pre-train the MLP: the network is pre-trained layer by layer, learning the weights between each pair of neighboring layers up to the penultimate layer. After pre-training, the whole network is fine-tuned by minimizing the mean square error (MSE). To compare the proposed weight initialization against random weight initialization, two standard image classification data sets, (i) CIFAR-10 and (ii) STL-10, are used. The simulation results show that the proposed initialization improves the classification performance of the MLP. Furthermore, the proposed method converges faster than a standard MLP.
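As a rough illustration of the pipeline the abstract describes, the NumPy sketch below pre-trains each weight matrix of an MLP as a Bernoulli RBM using one-step contrastive divergence (CD-1) and then fine-tunes the whole stack by gradient descent on the MSE. The layer sizes, learning rates, epoch counts, and helper names here are illustrative assumptions, not values or code from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1(W, a, b, v0, lr=0.05):
    """One contrastive-divergence (CD-1) update for a Bernoulli-Bernoulli RBM.
    W: visible-to-hidden weights; a, b: visible/hidden biases; v0: data batch."""
    h0 = sigmoid(v0 @ W + b)                        # positive phase: p(h|v0)
    hs = (rng.random(h0.shape) < h0).astype(float)  # sample hidden states
    v1 = sigmoid(hs @ W.T + a)                      # reconstruct visibles
    h1 = sigmoid(v1 @ W + b)                        # negative phase: p(h|v1)
    n = len(v0)
    W += lr * (v0.T @ h0 - v1.T @ h1) / n           # CD approximation of the gradient
    a += lr * (v0 - v1).mean(axis=0)
    b += lr * (h0 - h1).mean(axis=0)

def pretrain(x, sizes, epochs=10, batch=64):
    """Greedy layer-wise pre-training: one RBM per weight matrix up to the
    penultimate layer. Returns the learned (W, b) pair for each hidden layer."""
    layers, inp = [], x
    for nv, nh in zip(sizes[:-1], sizes[1:]):
        W = rng.normal(0.0, 0.01, (nv, nh))
        a, b = np.zeros(nv), np.zeros(nh)
        for _ in range(epochs):
            for i in range(0, len(inp), batch):
                cd1(W, a, b, inp[i:i + batch])
        layers.append((W, b))
        inp = sigmoid(inp @ W + b)  # hidden activities become the next RBM's input
    return layers

def finetune_step(layers, Wo, bo, x, y, lr=0.1):
    """One full-network backprop step that reduces the MSE between the sigmoid
    output layer (Wo, bo) and the one-hot targets y."""
    acts = [x]                                      # forward pass, caching activations
    for W, b in layers:
        acts.append(sigmoid(acts[-1] @ W + b))
    out = sigmoid(acts[-1] @ Wo + bo)
    n = len(x)
    delta = (out - y) * out * (1.0 - out)           # MSE gradient at the output (up to a constant)
    dWo, dbo = acts[-1].T @ delta / n, delta.mean(axis=0)
    W_next, grads = Wo, []
    for i in range(len(layers) - 1, -1, -1):        # backpropagate through the stack
        h = acts[i + 1]
        delta = (delta @ W_next.T) * h * (1.0 - h)
        grads.append((acts[i].T @ delta / n, delta.mean(axis=0)))
        W_next = layers[i][0]
    Wo -= lr * dWo
    bo -= lr * dbo
    for (W, b), (dW, db) in zip(reversed(layers), grads):
        W -= lr * dW
        b -= lr * db

# Toy run on random data shaped like flattened 32x32 RGB images (3072 inputs,
# 10 classes); the architecture [3072, 512, 128] is a made-up example.
x = rng.random((256, 3072))
y = np.eye(10)[rng.integers(0, 10, 256)]
layers = pretrain(x, [3072, 512, 128], epochs=2)
Wo, bo = rng.normal(0.0, 0.01, (128, 10)), np.zeros(10)
for _ in range(50):
    finetune_step(layers, Wo, bo, x, y)
```

Note that only the layers up to the penultimate one receive pre-trained weights, matching the abstract: the output layer starts from small random values and is learned entirely during fine-tuning.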



Author information

Correspondence to Prasenjit Dey.


Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper


Cite this paper

Mukherjee, S., Dey, P. (2023). Deep Learning-Based Weight Initialization on Multi-layer Perceptron for Image Recognition. In: Bhattacharyya, S., Das, G., De, S., Mrsic, L. (eds) Recent Trends in Intelligence Enabled Research. DoSIER 2022. Advances in Intelligent Systems and Computing, vol 1446. Springer, Singapore. https://doi.org/10.1007/978-981-99-1472-2_17
