Deep Neural Networks

Automatic Speech Recognition

Part of the book series: Signals and Communication Technology (SCT)

Abstract

In this chapter, we introduce deep neural networks (DNNs), multilayer perceptrons with many hidden layers. DNNs play an important role in modern speech recognition systems and are the focus of the rest of the book. We depict the architecture of DNNs, describe the popular activation functions and training criteria, illustrate the famous backpropagation algorithm for learning DNN model parameters, and introduce practical tricks that make the training process robust.
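
As a concrete illustration of the kind of model the abstract describes, here is a minimal sketch of a multilayer perceptron with several hidden layers, assuming sigmoid hidden units and a softmax output; the layer sizes, initialization, and NumPy implementation are illustrative choices, not the book's reference implementation.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def softmax(z):
        e = np.exp(z - z.max())              # subtract the max for numerical stability
        return e / e.sum()

    # Illustrative sizes: input feature dimension, three hidden layers, output classes.
    layer_sizes = [40, 512, 512, 512, 10]
    rng = np.random.default_rng(0)
    weights = [rng.normal(0.0, 0.1, (n_out, n_in))
               for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]
    biases = [np.zeros(n_out) for n_out in layer_sizes[1:]]

    def forward(x):
        """Propagate one input vector through the hidden layers and the softmax output."""
        a = x
        for W, b in zip(weights[:-1], biases[:-1]):
            a = sigmoid(W @ a + b)                        # sigmoid hidden layers
        return softmax(weights[-1] @ a + biases[-1])      # output layer: class posteriors

    posteriors = forward(rng.normal(size=layer_sizes[0]))
    print(posteriors.shape, posteriors.sum())             # (10,) and 1.0, up to rounding

Training such a network with a cross-entropy criterion via backpropagation, along with the practical tricks mentioned above, is covered in the chapter itself.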


Notes

  1. The term deep neural network first appeared in [21] in the context of speech recognition, but was coined in [5], which replaced the term deep belief network used in earlier studies [4, 17, 24] with the more appropriate term deep neural network. The term originally denoted multilayer perceptrons with many hidden layers, but was later extended to cover any neural network with a deep structure.

  2. The output of the sigmoid function can be very close to 0 but cannot reach 0, whereas the output of the ReLU function can be exactly 0; a quick numerical check is sketched after these notes.

  3. Although the name backpropagation was coined in 1986 [19], the algorithm itself can be traced back at least to 1969 [3] as a multistage dynamic system optimization method.

  4. In practice, we have found that we may achieve a slightly better result if we apply momentum only after the first epoch; a sketch of such a schedule follows these notes.
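
The sigmoid/ReLU distinction in note 2 can be checked numerically. The following is a minimal sketch using straightforward NumPy definitions of the two activations; the specific test values are arbitrary.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def relu(z):
        return np.maximum(0.0, z)

    z = np.array([-50.0, -5.0, 0.0, 5.0])
    print(sigmoid(z))                # strictly positive, even for very negative inputs
    print(relu(z))                   # negative inputs map to exactly 0.0
    print(sigmoid(-50.0) == 0.0)     # False: close to 0 but never exactly 0
    print(relu(-50.0) == 0.0)        # True: exact zeros, hence sparse activations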
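
The schedule in note 4, momentum disabled for the first epoch and enabled afterwards, can be sketched as below. This assumes plain mini-batch SGD with a classical momentum term; the function name, hyperparameter values, and toy objective are illustrative, not the exact recipe used in the book's experiments.

    import numpy as np

    def sgd_with_delayed_momentum(grad_fn, theta, batches, epochs=5,
                                  lr=0.1, momentum=0.9):
        """Mini-batch SGD in which momentum is switched on only after the first epoch.

        grad_fn(theta, batch) must return the gradient of the training criterion
        with respect to theta; all names and values here are illustrative.
        """
        velocity = np.zeros_like(theta)
        for epoch in range(epochs):
            mu = 0.0 if epoch == 0 else momentum    # plain SGD during the first epoch
            for batch in batches:
                g = grad_fn(theta, batch)
                velocity = mu * velocity - lr * g   # momentum-smoothed update direction
                theta = theta + velocity
        return theta

    # Toy check: minimize 0.5 * ||theta||^2, whose gradient is theta itself.
    theta0 = np.array([5.0, -3.0])
    dummy_batches = [None] * 20                     # the toy gradient ignores the batch
    print(sgd_with_delayed_momentum(lambda th, b: th, theta0, dummy_batches))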

References

  1. Bengio, Y.: Practical recommendations for gradient-based training of deep architectures. In: Neural Networks: Tricks of the Trade, pp. 437–478. Springer (2012)

  2. Bottou, L.: Online learning and stochastic approximations. On-line Learn. Neural Netw. 17, 9 (1998)

  3. Bryson, A.E., Ho, Y.C.: Applied Optimal Control: Optimization, Estimation, and Control. Blaisdell Publishing Company, US (1969)

  4. Dahl, G.E., Yu, D., Deng, L., Acero, A.: Large vocabulary continuous speech recognition with context-dependent DBN-HMMs. In: Proceedings of International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 4688–4691 (2011)

  5. Dahl, G.E., Yu, D., Deng, L., Acero, A.: Context-dependent pre-trained deep neural networks for large-vocabulary speech recognition. IEEE Trans. Audio, Speech Lang. Process. 20(1), 30–42 (2012)

  6. Duchi, J., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. (JMLR) 12, 2121–2159 (2011)

  7. Glorot, X., Bordes, A., Bengio, Y.: Deep sparse rectifier neural networks. In: Proceedings of the International Conference on Artificial Intelligence and Statistics (AISTATS), pp. 315–323 (2011)

  8. Guenter, B., Yu, D., Eversole, A., Kuchaiev, O., Seltzer, M.L.: Stochastic gradient descent algorithm in the computational network toolkit. In: OPT2013: NIPS 2013 Workshop on Optimization for Machine Learning (2013)

  9. Hestenes, M.R., Stiefel, E.: Methods of conjugate gradients for solving linear systems (1952)

  10. Hinton, G.E., Srivastava, N., Krizhevsky, A., Sutskever, I., Salakhutdinov, R.R.: Improving neural networks by preventing co-adaptation of feature detectors. arXiv preprint arXiv:1207.0580 (2012)

  11. Hinton, G., Deng, L., Yu, D., Dahl, G.E., Mohamed, A.R., Jaitly, N., Senior, A., Vanhoucke, V., Nguyen, P., Sainath, T.N., et al.: Deep neural networks for acoustic modeling in speech recognition: the shared views of four research groups. IEEE Signal Process. Mag. 29(6), 82–97 (2012)

  12. Hornik, K., Stinchcombe, M., White, H.: Multilayer feedforward networks are universal approximators. Neural Netw. 2(5), 359–366 (1989)

  13. Kirkpatrick, S., Gelatt Jr, C.D., Vecchi, M.P.: Optimization by simulated annealing. Science 220(4598), 671–680 (1983)

  14. LeCun, Y., Bottou, L., Orr, G.B., Müller, K.R.: Efficient backprop. Neural Networks: Tricks of The Trade, pp. 9–50. Springer, Berlin (1998)

  15. Liu, F.H., Stern, R.M., Huang, X., Acero, A.: Efficient cepstral normalization for robust speech recognition. In: Proceedings of ACL Workshop on Human Language Technologies (ACL-HLT), pp. 69–74 (1993)

  16. Liu, D.C., Nocedal, J.: On the limited memory BFGS method for large scale optimization. Math. Program. 45(1–3), 503–528 (1989)

  17. Mohamed, A., Dahl, G.E., Hinton, G.E.: Deep belief networks for phone recognition. In: NIPS Workshop on Deep Learning for Speech Recognition and Related Applications (2009)

  18. Nesterov, Y.: A method of solving a convex programming problem with convergence rate O(1/k²). Sov. Math. Dokl. 27, 372–376 (1983)

  19. Rumelhart, D.E., Hinton, G.E., Williams, R.J.: Learning representations by back-propagating errors. Nature 323(6088), 533–536 (1986)

  20. Seide, F., Fu, H., Droppo, J., Li, G., Yu, D.: On parallelizability of stochastic gradient descent for speech DNNs. In: Proceedings of International Conference on Acoustics, Speech and Signal Processing (ICASSP) (2014)

  21. Seide, F., Li, G., Yu, D.: Conversational speech transcription using context-dependent deep neural networks. In: Proceedings of Annual Conference of International Speech Communication Association (INTERSPEECH), pp. 437–440 (2011)

  22. Snoek, J., Larochelle, H., Adams, R.P.: Practical Bayesian optimization of machine learning algorithms. arXiv preprint arXiv:1206.2944 (2012)

  23. Wang, S., Manning, C.: Fast dropout training. In: Proceedings of the 30th International Conference on Machine Learning (ICML-13), pp. 118–126 (2013)

  24. Yu, D., Deng, L., Dahl, G.: Roles of pre-training and fine-tuning in context-dependent DBN-HMMs for real-world speech recognition. In: Proceedings of Neural Information Processing Systems (NIPS) Workshop on Deep Learning and Unsupervised Feature Learning (2010)

Author information

Corresponding author

Correspondence to Dong Yu.


Copyright information

© 2015 Springer-Verlag London

About this chapter

Cite this chapter

Yu, D., Deng, L. (2015). Deep Neural Networks. In: Automatic Speech Recognition. Signals and Communication Technology. Springer, London. https://doi.org/10.1007/978-1-4471-5779-3_4

  • DOI: https://doi.org/10.1007/978-1-4471-5779-3_4

  • Publisher Name: Springer, London

  • Print ISBN: 978-1-4471-5778-6

  • Online ISBN: 978-1-4471-5779-3

  • eBook Packages: Engineering, Engineering (R0)
