Recurrent Neural Networks

A chapter in Deep Learning for NLP and Speech Recognition

Abstract

In the previous chapter, CNNs provided a way for neural networks to learn a hierarchy of weights, resembling n-gram classification over the text. This approach proved very effective for sentiment analysis and, more broadly, for text classification. One disadvantage of CNNs, however, is their inability to model contextual information over long sequences.
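
To make this limitation concrete before the chapter develops it formally, the sketch below shows the recurrent alternative: a hidden state is carried from step to step, so information from early tokens can, in principle, influence the representation of arbitrarily later ones. This is only an illustrative NumPy sketch; the names (`rnn_step`, `hidden_size`, the toy dimensions) are our assumptions, and the exact update rule may differ from the chapter's Eq. (7.3), though the split into U and W follows the notes below.

```python
import numpy as np

# A minimal Elman-style recurrence: the hidden state h_t summarizes the
# entire prefix x_1..x_t, whereas a CNN filter only sees a fixed-width window.
rng = np.random.default_rng(0)
input_size, hidden_size = 4, 8                              # toy dimensions
U = rng.normal(scale=0.1, size=(hidden_size, input_size))   # input-to-hidden weights
W = rng.normal(scale=0.1, size=(hidden_size, hidden_size))  # hidden-to-hidden weights
b = np.zeros(hidden_size)

def rnn_step(h_prev, x_t):
    """One recurrence step: h_t = tanh(U x_t + W h_{t-1} + b)."""
    return np.tanh(U @ x_t + W @ h_prev + b)

# Run over a toy sequence of 20 "word" vectors.
sequence = rng.normal(size=(20, input_size))
h = np.zeros(hidden_size)
for x_t in sequence:
    h = rnn_step(h, x_t)

print(h)  # a fixed-size summary of the whole sequence, however long it is
```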


Notes

  1. This statement is made in the basic context of CNNs and RNNs. The CNN vs. RNN superiority debate in sequential contexts is an active area of research.

  2. This history vector will be called the hidden state later on, for reasons that will become obvious.

  3. It is common to split the single weight matrix W of an RNN in Eq. (7.3) into two separate weight matrices, here U and W. Doing so allows for a lower computational cost and forces a separation between the hidden state and the input in the early stages of training (see the sketch after these notes).

  4. The derivative of the \(\tanh \) activation function is bounded between 0 and 1, so repeated multiplication by it shrinks the gradient in these circumstances (see the sketch after these notes).

  5. This is not to say that academic benchmarks are not relevant, but rather to point out the importance of domain and technological understanding for domain adaptation.
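
Eq. (7.3) itself is not reproduced on this page, so the following is a sketch under the assumption of the standard Elman-style recurrence; it illustrates the weight-matrix split described in note 3 and the gradient-shrinking effect described in note 4, not the book's exact formulation.

```latex
% Note 3: one matrix acting on the concatenation [h_{t-1}; x_t] is split
% into a hidden-to-hidden matrix W and an input-to-hidden matrix U.
\[
  h_t = \tanh\!\big( W\,[\,h_{t-1};\, x_t\,] + b \big)
  \quad\longrightarrow\quad
  h_t = \tanh\!\big( W h_{t-1} + U x_t + b \big)
\]
% Note 4: the tanh derivative never exceeds 1, and backpropagation through
% time multiplies one such factor per step, shrinking the gradient.
\[
  \frac{d}{dz}\tanh(z) = 1 - \tanh^{2}(z) \in (0,\,1],
  \qquad
  \frac{\partial h_t}{\partial h_{t-1}}
    = \operatorname{diag}\!\big(1 - \tanh^{2}(a_t)\big)\, W,
  \quad a_t = W h_{t-1} + U x_t + b .
\]
```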

Copyright information

© 2019 Springer Nature Switzerland AG

About this chapter

Cite this chapter

Kamath, U., Liu, J., Whitaker, J. (2019). Recurrent Neural Networks. In: Deep Learning for NLP and Speech Recognition. Springer, Cham. https://doi.org/10.1007/978-3-030-14596-5_7

  • DOI: https://doi.org/10.1007/978-3-030-14596-5_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-14595-8

  • Online ISBN: 978-3-030-14596-5

  • eBook Packages: Computer Science, Computer Science (R0)
