
How to Pretrain Deep Boltzmann Machines in Two Stages

Conference paper

Part of the book Artificial Neural Networks, Springer Series in Bio-/Neuroinformatics (SSBN, volume 4)


Abstract

A deep Boltzmann machine (DBM) is a recently introduced Markov random field model with multiple layers of hidden units. Unlike its simpler special case, the restricted Boltzmann machine (RBM), a DBM has been shown empirically to be difficult to train with approximate maximum-likelihood learning using the stochastic gradient. In this paper, we propose a novel pretraining algorithm that consists of two stages: first, obtaining approximate posterior distributions over the hidden units from a simpler model, and second, maximizing the variational lower bound given the fixed hidden posterior distributions. We show empirically that the proposed method overcomes the difficulty of training DBMs from randomly initialized parameters and yields a generative model that is better than, or comparable to, the one obtained with the conventional pretraining algorithm.
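For context, the variational lower bound referred to in the abstract is the standard one for a Markov random field with hidden units: for any distribution $Q$ over the hidden units,

$$\log p(\mathbf{v}; \theta) \;\ge\; \mathbb{E}_{Q(\mathbf{h}\mid\mathbf{v})}\!\left[\log p(\mathbf{v}, \mathbf{h}; \theta)\right] + \mathcal{H}\!\left(Q(\mathbf{h}\mid\mathbf{v})\right),$$

where $\mathbf{h}$ collects all hidden layers and $\mathcal{H}$ denotes entropy. The two stages split the optimization of this bound: stage one fixes $Q$ using a simpler model, and stage two maximizes the right-hand side over the DBM parameters $\theta$ with $Q$ held fixed, so the data-dependent (positive-phase) statistics become constants and only the partition function still requires sampling.

The sketch below illustrates this idea in NumPy for a two-layer binary DBM. It is a minimal illustration only, assuming a stack of two CD-1-trained RBMs as one possible choice of "simpler model"; bias terms, the paper's exact first-stage model, and the final joint finetuning are omitted, and all names and hyperparameters here are illustrative rather than the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_rbm(data, n_hid, epochs=5, lr=0.01, batch=100):
    """CD-1 training of a binary RBM (bias terms omitted for brevity)."""
    W = 0.01 * rng.standard_normal((data.shape[1], n_hid))
    for _ in range(epochs):
        for v0 in np.array_split(data, max(1, len(data) // batch)):
            ph0 = sigmoid(v0 @ W)                      # P(h = 1 | v0)
            h0 = (rng.random(ph0.shape) < ph0) * 1.0   # sample hidden units
            pv1 = sigmoid(h0 @ W.T)                    # one-step reconstruction
            ph1 = sigmoid(pv1 @ W)
            W += lr * (v0.T @ ph0 - pv1.T @ ph1) / len(v0)
    return W

# Stage 1: a stack of two RBMs supplies approximate posteriors Q over
# both hidden layers of the DBM (one possible "simpler model").
X = (rng.random((1000, 784)) < 0.5) * 1.0   # stand-in for real binary data
W1 = train_rbm(X, 500)
Q1 = sigmoid(X @ W1)        # Q(h1 = 1 | v)
W2 = train_rbm(Q1, 200)
Q2 = sigmoid(Q1 @ W2)       # Q(h2 = 1 | h1)

# Stage 2: maximize the variational lower bound over the DBM weights with
# Q held fixed. The positive-phase statistics come from Q and are constant;
# the negative phase uses persistent Gibbs chains in the full DBM.
pos1 = X.T @ Q1 / len(X)    # fixed E_Q[v h1^T]
pos2 = Q1.T @ Q2 / len(X)   # fixed E_Q[h1 h2^T]
v_f, h1_f, h2_f = X[:100].copy(), Q1[:100].copy(), Q2[:100].copy()
lr = 0.005
for _ in range(50):
    # one Gibbs sweep of the persistent (fantasy) chains
    h1_f = (rng.random(h1_f.shape) < sigmoid(v_f @ W1 + h2_f @ W2.T)) * 1.0
    v_f = (rng.random(v_f.shape) < sigmoid(h1_f @ W1.T)) * 1.0
    h2_f = (rng.random(h2_f.shape) < sigmoid(h1_f @ W2)) * 1.0
    # gradient of the bound: fixed data statistics minus model statistics
    W1 += lr * (pos1 - v_f.T @ h1_f / len(v_f))
    W2 += lr * (pos2 - h1_f.T @ h2_f / len(v_f))
```

With $Q$ fixed, the positive-phase statistics (`pos1`, `pos2`) are computed once from the data, which is what makes the second stage no harder than stochastic maximum-likelihood training of a single RBM.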




Author information


Corresponding author

Correspondence to Kyunghyun Cho.


Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Cho, K., Raiko, T., Ilin, A., Karhunen, J. (2015). How to Pretrain Deep Boltzmann Machines in Two Stages. In: Koprinkova-Hristova, P., Mladenov, V., Kasabov, N. (eds) Artificial Neural Networks. Springer Series in Bio-/Neuroinformatics, vol 4. Springer, Cham. https://doi.org/10.1007/978-3-319-09903-3_10


  • DOI: https://doi.org/10.1007/978-3-319-09903-3_10

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-09902-6

  • Online ISBN: 978-3-319-09903-3

  • eBook Packages: Engineering, Engineering (R0)
