Minimal gated unit for recurrent neural networks

  • Research Article
International Journal of Automation and Computing

Abstract

Recurrent neural networks (RNN) have been very successful in handling sequence data. However, understanding RNN and finding the best practices for RNN learning are difficult tasks, partly because there are many competing and complex hidden units, such as the long short-term memory (LSTM) and the gated recurrent unit (GRU). We propose a gated unit for RNN, named the minimal gated unit (MGU), which contains only one gate and is therefore a minimal design among all gated hidden units. The design of MGU benefits from evaluation results on LSTM and GRU in the literature. Experiments on various sequence data show that MGU has accuracy comparable to GRU, but with a simpler structure, fewer parameters, and faster training. Hence, MGU is suitable for RNN applications. Its simple architecture also means that it is easier to evaluate and tune, and in principle it should be easier to study MGU's properties theoretically and empirically.
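To make the single-gate design concrete, the following NumPy sketch implements one MGU time step, assuming the usual MGU formulation in which a single forget gate both resets the previous state inside the candidate computation and interpolates between the old state and the candidate. The function name mgu_step and the weight names (W_f, U_f, b_f, W_h, U_h, b_h) are illustrative choices for this sketch, not code released by the authors.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def mgu_step(x_t, h_prev, p):
    """One MGU time step with a single forget gate f_t."""
    # Forget gate (the only gate in MGU).
    f_t = sigmoid(p["W_f"] @ x_t + p["U_f"] @ h_prev + p["b_f"])
    # Candidate state, computed from the gated (reset) previous state.
    h_tilde = np.tanh(p["W_h"] @ x_t + p["U_h"] @ (f_t * h_prev) + p["b_h"])
    # New hidden state: interpolate between the old state and the candidate.
    return (1.0 - f_t) * h_prev + f_t * h_tilde

# Illustrative usage with random weights; dimensions are arbitrary.
rng = np.random.default_rng(0)
input_dim, hidden_dim = 8, 16
p = {
    "W_f": 0.1 * rng.standard_normal((hidden_dim, input_dim)),
    "U_f": 0.1 * rng.standard_normal((hidden_dim, hidden_dim)),
    "b_f": np.zeros(hidden_dim),
    "W_h": 0.1 * rng.standard_normal((hidden_dim, input_dim)),
    "U_h": 0.1 * rng.standard_normal((hidden_dim, hidden_dim)),
    "b_h": np.zeros(hidden_dim),
}
h = np.zeros(hidden_dim)
for x_t in rng.standard_normal((5, input_dim)):  # a short random sequence
    h = mgu_step(x_t, h, p)
```

Because this one gate plays the roles that GRU assigns to its separate reset and update gates, an MGU layer of the same hidden size needs roughly two thirds as many parameters as a GRU layer, which is the source of the simpler structure and faster training mentioned above.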

References

  1. Y. LeCun, L. Bottou, Y. Bengio, P. Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, vol. 86, no. 11, pp. 2278–2324, 1998.

  2. A. Krizhevsky, I. Sutskever, G. E. Hinton. ImageNet classification with deep convolutional neural networks. In Proceedings of Advances in Neural Information Processing Systems 25, NIPS, Lake Tahoe, Nevada, USA, pp. 1097–1105, 2012.

  3. K. Cho, B. van Merriënboer, C. Gulcehre, D. Bahdanau, F. Bougares, H. Schwenk, Y. Bengio. Learning phrase representations using RNN encoder-decoder for statistical machine translation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics, Doha, Qatar, pp. 1724–1734, 2014.

  4. I. Sutskever, O. Vinyals, Q. V. Le. Sequence to sequence learning with neural networks. In Proceedings of Advances in Neural Information Processing Systems 27, NIPS, Montreal, Canada, pp. 3104–3112, 2014.

  5. D. Bahdanau, K. Cho, Y. Bengio. Neural machine translation by jointly learning to align and translate. In International Conference on Learning Representations 2015, San Diego, USA, 2015.

  6. A. Graves, A. R. Mohamed, G. Hinton. Speech recognition with deep recurrent neural networks. In Proceedings of International Conference on Acoustics, Speech and Signal Processing, IEEE, Vancouver, Canada, pp. 6645–6649, 2013.

  7. K. Xu, J. L. Ba, R. Kiros, K. Cho, A. Courville, R. Salakhutdinov, R. S. Zemel, Y. Bengio. Show, attend and tell: Neural image caption generation with visual attention. In Proceedings of the 32nd International Conference on Machine Learning, Lille, France, vol. 37, pp. 2048–2057, 2015.

  8. A. Karpathy, F. F. Li. Deep visual-semantic alignments for generating image descriptions. In Proceedings of IEEE International Conference on Computer Vision and Pattern Recognition, IEEE, Boston, USA, pp. 3128–3137, 2015.

  9. R. Lebret, P. O. Pinheiro, R. Collobert. Phrase-based image captioning. In Proceedings of the 32nd International Conference on Machine Learning, Lille, France, vol. 37, pp. 2085–2094, 2015.

  10. J. Donahue, L. A. Hendricks, S. Guadarrama, M. Rohrbach, S. Venugopalan, K. Saenko, T. Darrell. Long-term recurrent convolutional networks for visual recognition and description. In Proceedings of IEEE International Conference on Computer Vision and Pattern Recognition, IEEE, Boston, USA, pp. 2625–2634, 2015.

  11. N. Srivastava, E. Mansimov, R. Salakhutdinov. Unsupervised learning of video representations using LSTMs. In Proceedings of the 32nd International Conference on Machine Learning, Lille, France, vol. 37, pp. 843–852, 2015.

  12. X. J. Shi, Z. R. Chen, H. Wang, D. Y. Yeung, W. K. Wong, W. C. Woo. Convolutional LSTM network: A machine learning approach for precipitation nowcasting. In Proceedings of Advances in Neural Information Processing Systems 28, NIPS, Montreal, Canada, pp. 802–810, 2015.

  13. M. D. Zeiler, R. Fergus. Visualizing and understanding convolutional networks. In Proceedings of the 13th European Conference on Computer Vision, Lecture Notes in Computer Science, Springer, Zurich, Switzerland, vol. 8689, pp. 818–833, 2014.

  14. S. Hochreiter, J. Schmidhuber. Long short-term memory. Neural Computation, vol. 9, no. 8, pp. 1735–1780, 1997.

  15. F. A. Gers, J. Schmidhuber, F. Cummins. Learning to forget: Continual prediction with LSTM. In Proceedings of the 9th International Conference on Artificial Neural Networks, IEEE, Edinburgh, UK, vol. 2, pp. 850–855, 1999.

  16. F. A. Gers, N. N. Schraudolph, J. Schmidhuber. Learning precise timing with LSTM recurrent networks. Journal of Machine Learning Research, vol. 3, pp. 115–143, 2003.

  17. J. Chung, C. Gulcehre, K. Cho, Y. Bengio. Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv: 1412.3555, 2014.

  18. R. Jozefowicz, W. Zaremba, I. Sutskever. An empirical exploration of recurrent network architectures. In Proceedings of the 32nd International Conference on Machine Learning, Lille, France, vol. 37, pp. 2342–2350, 2015.

  19. K. Greff, R. K. Srivastava, J. Koutník, B. R. Steunebrink, J. Schmidhuber. LSTM: A search space odyssey. arXiv: 1503.04069, 2015.

  20. Y. Bengio, P. Simard, P. Frasconi. Learning long-term dependencies with gradient descent is difficult. IEEE Transactions on Neural Networks, vol. 5, no. 2, pp. 157–166, 1994.

  21. A. Graves, J. Schmidhuber. Framewise phoneme classification with bidirectional LSTM networks. Neural Networks, vol. 18, no. 5–6, pp. 602–610, 2005.

  22. T. Mikolov, A. Joulin, S. Chopra, M. Mathieu, M. Ranzato. Learning longer memory in recurrent neural networks. In Proceedings of International Conference on Learning Representations, San Diego, CA, 2015.

  23. Q. V. Le, N. Jaitly, G. E. Hinton. A simple way to initialize recurrent networks of rectified linear units. arXiv: 1504.00941, 2015.

  24. A. L. Maas, R. E. Daly, P. T. Pham, D. Huang, A. Y. Ng, C. Potts. Learning word vectors for sentiment analysis. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics, ACL, Stroudsburg, USA, pp. 142–150, 2011.

  25. M. P. Marcus, B. Santorini, M. A. Marcinkiewicz. Building a large annotated corpus of English: The Penn Treebank. Computational Linguistics, vol. 19, no. 2, pp. 313–330, 1993.

  26. W. Zaremba, I. Sutskever, O. Vinyals. Recurrent neural network regularization. arXiv: 1409.2329, 2014.

  27. Z. Z. Wu, S. King. Investigating gated recurrent neural networks for speech synthesis. In Proceedings of the 41st IEEE International Conference on Acoustics, Speech and Signal Processing, IEEE, Shanghai, China, 2016.

Author information

Corresponding author

Correspondence to Jianxin Wu.

Additional information

This work was supported by National Natural Science Foundation of China (Nos. 61422203 and 61333014), and National Key Basic Research Program of China (No. 2014CB340501).

Recommended by Associate Editor Yi Cao

Guo-Bing Zhou received the B. Sc. degree in computer science from Nanjing University, China in 2013. He is currently a postgraduate student at Nanjing University and will receive the M. Sc. degree in July 2016.

His research interest is machine learning.

ORCID iD: 0000-0001-9779-481X

Jianxin Wu received the Ph.D. degree in computer science from the Georgia Institute of Technology, USA in 2009. He is currently a professor in the Department of Computer Science and Technology at Nanjing University, China. He has served as an area chair for ICCV 2015 and senior PC member for AAAI 2016.

His research interests include computer vision and machine learning.

ORCID iD: 0000-0002-2085-7568

Chen-Lin Zhang is a candidate for the Bachelor's degree in the Department of Computer Science and Technology, Nanjing University, China.

His research interests include computer vision and machine learning.

Zhi-Hua Zhou is a professor, standing deputy director of the National Key Laboratory for Novel Software Technology, and Founding Director of the LAMDA Group at Nanjing University. He is a Fellow of the AAAI, IEEE, IAPR, IET/IEE, CCF, and an ACM Distinguished Scientist.

His research interests include artificial intelligence, machine learning and data mining.

About this article

Cite this article

Zhou, GB., Wu, J., Zhang, CL. et al. Minimal gated unit for recurrent neural networks. Int. J. Autom. Comput. 13, 226–234 (2016). https://doi.org/10.1007/s11633-016-1006-2
