Minimal gated unit for recurrent neural networks

  • Research Article
International Journal of Automation and Computing

Abstract

Recurrent neural networks (RNN) have been very successful in handling sequence data. However, understanding RNN and finding the best practices for RNN learning are difficult tasks, partly because there are many competing and complex hidden units, such as the long short-term memory (LSTM) and the gated recurrent unit (GRU). We propose a gated unit for RNN, named the minimal gated unit (MGU), which contains only one gate and is therefore a minimal design among all gated hidden units. The design of MGU benefits from evaluation results on LSTM and GRU in the literature. Experiments on various sequence data show that MGU has accuracy comparable to GRU, but with a simpler structure, fewer parameters, and faster training. Hence, MGU is suitable for RNN applications. Its simple architecture also means that it is easier to evaluate and tune, and in principle it should be easier to study MGU's properties theoretically and empirically.
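To make the single-gate design concrete, the following NumPy sketch implements one MGU time step, assuming the usual MGU formulation in which a single forget gate both resets the previous state inside the candidate computation and interpolates between the old state and the candidate. The function name mgu_step and the weight names (W_f, U_f, b_f, W_h, U_h, b_h) are illustrative choices for this sketch, not code released by the authors.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def mgu_step(x_t, h_prev, p):
    """One MGU time step with a single forget gate f_t."""
    # Forget gate (the only gate in MGU).
    f_t = sigmoid(p["W_f"] @ x_t + p["U_f"] @ h_prev + p["b_f"])
    # Candidate state, computed from the gated (reset) previous state.
    h_tilde = np.tanh(p["W_h"] @ x_t + p["U_h"] @ (f_t * h_prev) + p["b_h"])
    # New hidden state: interpolate between the old state and the candidate.
    return (1.0 - f_t) * h_prev + f_t * h_tilde

# Illustrative usage with random weights; dimensions are arbitrary.
rng = np.random.default_rng(0)
input_dim, hidden_dim = 8, 16
p = {
    "W_f": 0.1 * rng.standard_normal((hidden_dim, input_dim)),
    "U_f": 0.1 * rng.standard_normal((hidden_dim, hidden_dim)),
    "b_f": np.zeros(hidden_dim),
    "W_h": 0.1 * rng.standard_normal((hidden_dim, input_dim)),
    "U_h": 0.1 * rng.standard_normal((hidden_dim, hidden_dim)),
    "b_h": np.zeros(hidden_dim),
}
h = np.zeros(hidden_dim)
for x_t in rng.standard_normal((5, input_dim)):  # a short random sequence
    h = mgu_step(x_t, h, p)
```

Because this one gate plays the roles that GRU assigns to its separate reset and update gates, an MGU layer of the same hidden size needs roughly two thirds as many parameters as a GRU layer, which is the source of the simpler structure and faster training mentioned above.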

References

  1. Y. LeCun, L. Bottou, Y. Bengio, P. Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, vol. 86, no. 11, pp. 2278–2324, 1998.

  2. A. Krizhevsky, I. Sutskever, G. E. Hinton. ImageNet classification with deep convolutional neural networks. In Proceedings of Advances in Neural Information Processing Systems 25, NIPS, Lake Tahoe, Nevada, USA, pp. 1097–1105, 2012.

  3. K. Cho, B. van Merriënboer, C. Gulcehre, D. Bahdanau, F. Bougares, H. Schwenk, Y. Bengio. Learning phrase representations using RNN encoder-decoder for statistical machine translation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics, Doha, Qatar, pp. 1724–1734, 2014.

  4. I. Sutskever, O. Vinyals, Q. V. Le. Sequence to sequence learning with neural networks. In Proceedings of Advances in Neural Information Processing Systems 27, NIPS, Montreal, Canada, pp. 3104–3112, 2014.

  5. D. Bahdanau, K. Cho, Y. Bengio. Neural machine translation by jointly learning to align and translate. In International Conference on Learning Representations 2015, San Diego, USA, 2015.

  6. A. Graves, A. R. Mohamed, G. Hinton. Speech recognition with deep recurrent neural networks. In Proceedings of International Conference on Acoustics, Speech and Signal Processing, IEEE, Vancouver, Canada, pp. 6645–6649, 2013.

  7. K. Xu, J. L. Ba, R. Kiros, K. Cho, A. Courville, R. Salakhutdinov, R. S. Zemel, Y. Bengio. Show, attend and tell: Neural image caption generation with visual attention. In Proceedings of the 32nd International Conference on Machine Learning, Lille, France, vol. 37, pp. 2048–2057, 2015.

  8. A. Karpathy, F. F. Li. Deep visual-semantic alignments for generating image descriptions. In Proceedings of IEEE International Conference on Computer Vision and Pattern Recognition, IEEE, Boston, USA, pp. 3128–3137, 2015.

  9. R. Lebret, P. O. Pinheiro, R. Collobert. Phrase-based image captioning. In Proceedings of the 32nd International Conference on Machine Learning, Lille, France, vol. 37, pp. 2085–2094, 2015.

  10. J. Donahue, L. A. Hendricks, S. Guadarrama, M. Rohrbach, S. Venugopalan, K. Saenko, T. Darrell. Long-term recurrent convolutional networks for visual recognition and description. In Proceedings of IEEE International Conference on Computer Vision and Pattern Recognition, IEEE, Boston, USA, pp. 2625–2634, 2015.

  11. N. Srivastava, E. Mansimov, R. Salakhutdinov. Unsupervised learning of video representations using LSTMs. In Proceedings of the 32nd International Conference on Machine Learning, Lille, France, vol. 37, pp. 843–852, 2015.

  12. X. J. Shi, Z. R. Chen, H. Wang, D. Y. Yeung, W. K. Wong, W. C. Woo. Convolutional LSTM network: A machine learning approach for precipitation nowcasting. In Proceedings of Advances in Neural Information Processing Systems 28, NIPS, Montreal, Canada, pp. 802–810, 2015.

  13. M. D. Zeiler, R. Fergus. Visualizing and understanding convolutional networks. In Proceedings of the 13th European Conference on Computer Vision, Lecture Notes in Computer Science, Springer, Zurich, Switzerland, vol. 8689, pp. 818–833, 2014.

  14. S. Hochreiter, J. Schmidhuber. Long short-term memory. Neural Computation, vol. 9, no. 8, pp. 1735–1780, 1997.

  15. F. A. Gers, J. Schmidhuber, F. Cummins. Learning to forget: Continual prediction with LSTM. In Proceedings of the 9th International Conference on Artificial Neural Networks, IEEE, Edinburgh, UK, vol. 2, pp. 850–855, 1999.

  16. F. A. Gers, N. N. Schraudolph, J. Schmidhuber. Learning precise timing with LSTM recurrent networks. Journal of Machine Learning Research, vol. 3, pp. 115–143, 2003.

  17. J. Chung, C. Gulcehre, K. Cho, Y. Bengio. Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv: 1412.3555, 2014.

  18. R. Jozefowicz, W. Zaremba, I. Sutskever. An empirical exploration of recurrent network architectures. In Proceedings of the 32nd International Conference on Machine Learning, Lille, France, vol. 37, pp. 2342–2350, 2015.

  19. K. Greff, R. K. Srivastava, J. Koutník, B. R. Steunebrink, J. Schmidhuber. LSTM: A search space odyssey. arXiv: 1503.04069, 2015.

  20. Y. Bengio, P. Simard, P. Frasconi. Learning long-term dependencies with gradient descent is difficult. IEEE Transactions on Neural Networks, vol. 5, no. 2, pp. 157–166, 1994.

  21. A. Graves, J. Schmidhuber. Framewise phoneme classification with bidirectional LSTM networks. Neural Networks, vol. 18, no. 5–6, pp. 602–610, 2005.

  22. T. Mikolov, A. Joulin, S. Chopra, M. Mathieu, M. Ranzato. Learning longer memory in recurrent neural networks. In Proceedings of International Conference on Learning Representations, San Diego, CA, 2015.

  23. Q. V. Le, N. Jaitly, G. E. Hinton. A simple way to initialize recurrent networks of rectified linear units. arXiv: 1504.00941, 2015.

  24. A. L. Maas, R. E. Daly, P. T. Pham, D. Huang, A. Y. Ng, C. Potts. Learning word vectors for sentiment analysis. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics, ACL, Stroudsburg, USA, pp. 142–150, 2011.

  25. M. P. Marcus, B. Santorini, M. A. Marcinkiewicz. Building a large annotated corpus of English: The Penn Treebank. Computational Linguistics, vol. 19, no. 2, pp. 313–330, 1993.

  26. W. Zaremba, I. Sutskever, O. Vinyals. Recurrent neural network regularization. arXiv: 1409.2329, 2014.

  27. Z. Z. Wu, S. King. Investigating gated recurrent neural networks for speech synthesis. In Proceedings of the 41st IEEE International Conference on Acoustics, Speech and Signal Processing, IEEE, Shanghai, China, 2016.

Author information

Corresponding author

Correspondence to Jianxin Wu.

Additional information

This work was supported by National Natural Science Foundation of China (Nos. 61422203 and 61333014), and National Key Basic Research Program of China (No. 2014CB340501).

Recommended by Associate Editor Yi Cao

Guo-Bing Zhou received the B. Sc. degree in computer science from Nanjing University, China in 2013. He is currently a postgraduate student at Nanjing University and will receive the M. Sc. degree in July 2016.

His research interest is machine learning.

ORCID iD: 0000-0001-9779-481X

Jianxin Wu received the Ph.D. degree in computer science from the Georgia Institute of Technology, USA in 2009. He is currently a professor in the Department of Computer Science and Technology at Nanjing University, China. He has served as an area chair for ICCV 2015 and senior PC member for AAAI 2016.

His research interests include computer vision and machine learning.

ORCID iD: 0000-0002-2085-7568

Chen-Lin Zhang is a candidate for the Bachelor's degree in the Department of Computer Science and Technology, Nanjing University, China.

His research interests include computer vision and machine learning.

Zhi-Hua Zhou is a professor, standing deputy director of the National Key Laboratory for Novel Software Technology, and Founding Director of the LAMDA Group at Nanjing University. He is a Fellow of the AAAI, IEEE, IAPR, IET/IEE, CCF, and an ACM Distinguished Scientist.

His research interests include artificial intelligence, machine learning and data mining.

About this article

Cite this article

Zhou, GB., Wu, J., Zhang, CL. et al. Minimal gated unit for recurrent neural networks. Int. J. Autom. Comput. 13, 226–234 (2016). https://doi.org/10.1007/s11633-016-1006-2
