Abstract
The performance of the multi-layer perceptron (MLP) degrades when working with high-resolution images because of vanishing gradients and overfitting. However, the performance of an MLP can be improved with efficient weight initialization techniques. This paper puts forward a systematic deep neural network (DNN)-based weight initialization strategy for an MLP to enhance its classification accuracy. Moreover, the training of an MLP may fail to converge because of the presence of many local minima. It is possible to avoid local minima by initializing the weights properly rather than randomly. In this paper, a restricted Boltzmann machine (RBM) is used to pre-train the MLP. The MLP is pre-trained layer by layer, with the weights between each pair of neighboring layers learned up to the penultimate layer. After pre-training, the whole network is fine-tuned by minimizing the mean squared error (MSE). To compare the proposed weight initialization with random weight initialization, two standard image classification data sets, (i) CIFAR-10 and (ii) STL-10, are used. Simulation results show that the proposed weight initialization improves the performance of the MLP. Furthermore, the proposed method converges faster than a standard MLP.
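The abstract does not specify layer sizes, learning rates, or the exact rule used to train each RBM, so the following is only a minimal NumPy sketch of the general pipeline it describes, not the authors' implementation. It assumes Bernoulli RBMs trained with one-step contrastive divergence (CD-1), sigmoid units, full-batch gradient descent on the MSE, and a small synthetic data set standing in for CIFAR-10/STL-10. Each RBM initializes one hidden layer (up to the penultimate layer), the output layer stays random, and the whole stack is then fine-tuned. All function names, sizes, and hyperparameters (`train_rbm`, `pretrain`, `fine_tune`, `run_demo`, `[20, 16, 8, 2]`, `lr`) are hypothetical.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def train_rbm(data, n_hidden, epochs=20, lr=0.1, rng=None):
    """Bernoulli RBM trained with CD-1 (assumed rule); data values lie in [0, 1]."""
    rng = np.random.default_rng(0) if rng is None else rng
    n, n_visible = data.shape
    W = rng.normal(0.0, 0.01, size=(n_visible, n_hidden))
    b_vis, b_hid = np.zeros(n_visible), np.zeros(n_hidden)
    for _ in range(epochs):
        # Positive phase: hidden probabilities driven by the data.
        h_prob = sigmoid(data @ W + b_hid)
        h_sample = (rng.random(h_prob.shape) < h_prob).astype(float)
        # Negative phase: one Gibbs step down to the visible layer and back up.
        v_recon = sigmoid(h_sample @ W.T + b_vis)
        h_recon = sigmoid(v_recon @ W + b_hid)
        W += lr * (data.T @ h_prob - v_recon.T @ h_recon) / n
        b_vis += lr * (data - v_recon).mean(axis=0)
        b_hid += lr * (h_prob - h_recon).mean(axis=0)
    return W, b_hid


def pretrain(X, hidden_sizes, rng):
    """Greedy layer-wise pre-training: each RBM's hidden activations feed the next RBM."""
    weights, biases, layer_input = [], [], X
    for n_hidden in hidden_sizes:
        W, b = train_rbm(layer_input, n_hidden, rng=rng)
        weights.append(W)
        biases.append(b)
        layer_input = sigmoid(layer_input @ W + b)
    return weights, biases


def init_random(sizes, rng):
    """Baseline: small random weights and zero biases for every layer."""
    weights = [rng.normal(0.0, 0.1, (a, b)) for a, b in zip(sizes[:-1], sizes[1:])]
    return weights, [np.zeros(b) for b in sizes[1:]]


def forward(X, weights, biases):
    acts = [X]
    for W, b in zip(weights, biases):
        acts.append(sigmoid(acts[-1] @ W + b))
    return acts


def fine_tune(X, Y, weights, biases, epochs=200, lr=0.5):
    """Backpropagation on the MSE; constant factors of the gradient are folded into lr."""
    losses = []
    for _ in range(epochs):
        acts = forward(X, weights, biases)
        err = acts[-1] - Y
        losses.append(float(np.mean(err ** 2)))
        delta = err * acts[-1] * (1.0 - acts[-1])
        for i in reversed(range(len(weights))):
            grad_W = acts[i].T @ delta / X.shape[0]
            grad_b = delta.mean(axis=0)
            if i > 0:
                # Propagate the error with the weights from before this update.
                delta = (delta @ weights[i].T) * acts[i] * (1.0 - acts[i])
            weights[i] -= lr * grad_W
            biases[i] -= lr * grad_b
    return losses


def run_demo(seed=0, epochs=200):
    rng = np.random.default_rng(seed)
    X = rng.random((200, 20))  # toy stand-in for pixel intensities scaled to [0, 1]
    y = (X[:, :10].sum(axis=1) > X[:, 10:].sum(axis=1)).astype(int)
    Y = np.eye(2)[y]  # one-hot targets for the MSE loss
    sizes = [20, 16, 8, 2]
    w_rand, b_rand = init_random(sizes, rng)
    loss_rand = fine_tune(X, Y, w_rand, b_rand, epochs=epochs)
    # RBM pre-training up to the penultimate layer; the output layer stays random.
    w_pre, b_pre = pretrain(X, sizes[1:-1], rng)
    w_out, b_out = init_random(sizes[-2:], rng)
    loss_pre = fine_tune(X, Y, w_pre + w_out, b_pre + b_out, epochs=epochs)
    return loss_rand, loss_pre
```

Calling `run_demo()` returns the per-epoch training MSE for the randomly initialized and the RBM-initialized networks, so the two convergence curves can be compared. On a toy problem like this, no particular ordering of the two curves should be assumed.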
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Mukherjee, S., Dey, P. (2023). Deep Learning-Based Weight Initialization on Multi-layer Perceptron for Image Recognition. In: Bhattacharyya, S., Das, G., De, S., Mrsic, L. (eds) Recent Trends in Intelligence Enabled Research. DoSIER 2022. Advances in Intelligent Systems and Computing, vol 1446. Springer, Singapore. https://doi.org/10.1007/978-981-99-1472-2_17
DOI: https://doi.org/10.1007/978-981-99-1472-2_17
Published:
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-1471-5
Online ISBN: 978-981-99-1472-2
eBook Packages: Intelligent Technologies and Robotics, Intelligent Technologies and Robotics (R0)