Abstract
In the previous chapter, CNNs provided a way for neural networks to learn a hierarchy of features over text, much like n-gram classification. This approach proved very effective for sentiment analysis and, more broadly, for text classification. One disadvantage of CNNs, however, is their inability to model contextual information over long sequences.
Notes
- 1. This statement is made in the basic context of CNNs and RNNs; whether CNNs or RNNs are superior for sequential tasks remains an active area of research.
- 2. This history vector will be called the hidden state later on, for reasons that will become obvious.
- 3. It is common to split the single weight matrix W of an RNN in Eq. (7.3) into two separate weight matrices, here U and W. Doing this lowers the computational cost and enforces a separation between the hidden state and the input in the early stages of training (a minimal sketch of this formulation follows these notes).
- 4. The derivative of the \(\tanh\) activation function is bounded between 0 and 1, which shrinks the gradient as it is propagated back through many time steps.
- 5. This is not to say that academic benchmarks are irrelevant, but rather to point out the importance of domain and technological understanding for domain adaptation.
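The following minimal NumPy sketch illustrates the formulation the notes above refer to: the previous hidden state acts as the history vector, the recurrent weights are split into U (input-to-hidden) and W (hidden-to-hidden), and \(\tanh\) squashes the result, \(h_t = \tanh(U x_t + W h_{t-1} + b)\). All variable names and dimensions here are illustrative assumptions, not code from the chapter.

```python
import numpy as np

def rnn_step(x_t, h_prev, U, W, b):
    """One step of a vanilla RNN: h_t = tanh(U x_t + W h_prev + b)."""
    # U maps the input into the hidden space, W carries the previous
    # hidden state (the "history vector") forward; tanh keeps the
    # activations in (-1, 1), and its derivative lies in (0, 1],
    # which is why gradients shrink over long sequences.
    return np.tanh(U @ x_t + W @ h_prev + b)

rng = np.random.default_rng(0)
input_size, hidden_size = 4, 3                 # illustrative sizes
U = rng.normal(scale=0.1, size=(hidden_size, input_size))
W = rng.normal(scale=0.1, size=(hidden_size, hidden_size))
b = np.zeros(hidden_size)

h = np.zeros(hidden_size)                      # initial hidden state
for x_t in rng.normal(size=(5, input_size)):   # toy sequence of 5 inputs
    h = rnn_step(x_t, h, U, W, b)
print(h)                                       # final state summarizes the sequence
```

Because each backward step multiplies the error signal by W and by the tanh derivative, gradients tend to shrink (or explode) over long sequences, which is the limitation motivating gated architectures such as the LSTM.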
Copyright information
© 2019 Springer Nature Switzerland AG
About this chapter
Cite this chapter
Kamath, U., Liu, J., Whitaker, J. (2019). Recurrent Neural Networks. In: Deep Learning for NLP and Speech Recognition. Springer, Cham. https://doi.org/10.1007/978-3-030-14596-5_7
DOI: https://doi.org/10.1007/978-3-030-14596-5_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-14595-8
Online ISBN: 978-3-030-14596-5