Recurrent Neural Networks

A chapter in Deep Learning for NLP and Speech Recognition

Abstract

In the previous chapter, CNNs provided a way for neural networks to learn a hierarchy of weights, resembling n-gram classification over the text. This approach proved very effective for sentiment analysis and, more broadly, for text classification. One disadvantage of CNNs, however, is their inability to model contextual information over long sequences.
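
To make this limitation concrete before the chapter develops it formally, the sketch below shows the recurrent alternative: a hidden state is carried from step to step, so information from early tokens can, in principle, influence the representation of arbitrarily later ones. This is only an illustrative NumPy sketch; the names (`rnn_step`, `hidden_size`, the toy dimensions) are our assumptions, and the exact update rule may differ from the chapter's Eq. (7.3), though the split into U and W follows the notes below.

```python
import numpy as np

# A minimal Elman-style recurrence: the hidden state h_t summarizes the
# entire prefix x_1..x_t, whereas a CNN filter only sees a fixed-width window.
rng = np.random.default_rng(0)
input_size, hidden_size = 4, 8                              # toy dimensions
U = rng.normal(scale=0.1, size=(hidden_size, input_size))   # input-to-hidden weights
W = rng.normal(scale=0.1, size=(hidden_size, hidden_size))  # hidden-to-hidden weights
b = np.zeros(hidden_size)

def rnn_step(h_prev, x_t):
    """One recurrence step: h_t = tanh(U x_t + W h_{t-1} + b)."""
    return np.tanh(U @ x_t + W @ h_prev + b)

# Run over a toy sequence of 20 "word" vectors.
sequence = rng.normal(size=(20, input_size))
h = np.zeros(hidden_size)
for x_t in sequence:
    h = rnn_step(h, x_t)

print(h)  # a fixed-size summary of the whole sequence, however long it is
```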


Notes

  1. This statement is made in the basic context of CNNs and RNNs. The CNN vs. RNN superiority debate in sequential contexts is an active area of research.

  2. This history vector will be called the hidden state later on, for reasons that will become obvious.

  3. It is common to split the single weight matrix W of an RNN in Eq. (7.3) into two separate weight matrices, here U and W. Doing so allows for a lower computational cost and forces a separation between the hidden state and the input in the early stages of training (see the sketch after these notes).

  4. The derivative of the \(\tanh \) activation function is bounded between 0 and 1, so repeated multiplication by it shrinks the gradient in these circumstances (see the sketch after these notes).

  5. This is not to say that academic benchmarks are not relevant, but rather to point out the importance of domain and technological understanding for domain adaptation.
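
Eq. (7.3) itself is not reproduced on this page, so the following is a sketch under the assumption of the standard Elman-style recurrence; it illustrates the weight-matrix split described in note 3 and the gradient-shrinking effect described in note 4, not the book's exact formulation.

```latex
% Note 3: one matrix acting on the concatenation [h_{t-1}; x_t] is split
% into a hidden-to-hidden matrix W and an input-to-hidden matrix U.
\[
  h_t = \tanh\!\big( W\,[\,h_{t-1};\, x_t\,] + b \big)
  \quad\longrightarrow\quad
  h_t = \tanh\!\big( W h_{t-1} + U x_t + b \big)
\]
% Note 4: the tanh derivative never exceeds 1, and backpropagation through
% time multiplies one such factor per step, shrinking the gradient.
\[
  \frac{d}{dz}\tanh(z) = 1 - \tanh^{2}(z) \in (0,\,1],
  \qquad
  \frac{\partial h_t}{\partial h_{t-1}}
    = \operatorname{diag}\!\big(1 - \tanh^{2}(a_t)\big)\, W,
  \quad a_t = W h_{t-1} + U x_t + b .
\]
```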

Copyright information

© 2019 Springer Nature Switzerland AG

About this chapter

Cite this chapter

Kamath, U., Liu, J., Whitaker, J. (2019). Recurrent Neural Networks. In: Deep Learning for NLP and Speech Recognition. Springer, Cham. https://doi.org/10.1007/978-3-030-14596-5_7

  • DOI: https://doi.org/10.1007/978-3-030-14596-5_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-14595-8

  • Online ISBN: 978-3-030-14596-5

  • eBook Packages: Computer Science, Computer Science (R0)
