
Evolutionary neural AutoML for deep learning

Published: 13 July 2019

ABSTRACT

Deep neural networks (DNNs) have produced state-of-the-art results in many benchmarks and problem domains. However, the success of a DNN depends on the proper configuration of its architecture and hyperparameters. Such configuration is difficult, and as a result, DNNs are often not used to their full potential. In addition, DNNs in commercial applications often need to satisfy real-world design constraints such as size or number of parameters. To make configuration easier, automatic machine learning (AutoML) systems for deep learning have been developed, focusing mostly on hyperparameter optimization.

This paper takes AutoML a step further. It introduces an evolutionary AutoML framework called LEAF that optimizes not only hyperparameters but also network architectures and network size. LEAF makes use of both state-of-the-art evolutionary algorithms (EAs) and distributed computing frameworks. Experimental results on medical image classification and natural language analysis show that the framework achieves state-of-the-art performance. In particular, LEAF demonstrates that architecture optimization provides a significant boost over hyperparameter optimization alone, and that networks can be minimized at the same time with little drop in performance. LEAF therefore forms a foundation for democratizing and improving AI, as well as making AI practical in future applications.
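The kind of multi-objective evolutionary search the abstract describes, maximizing task performance while minimizing network size, can be sketched in miniature. This is a hedged illustration, not LEAF's actual algorithm or API: the search space, the `fitness` proxy, and all names below are assumptions made for the sketch, and a real system would train and validate a DNN where the toy `fitness` function stands.

```python
import random

random.seed(0)

# Hypothetical hyperparameter/architecture search space (illustrative only).
SPACE = {
    "layers": [1, 2, 3, 4],
    "units": [16, 32, 64, 128],
    "lr": [1e-4, 1e-3, 1e-2],
}

def random_genome():
    """Sample one candidate configuration from the search space."""
    return {k: random.choice(v) for k, v in SPACE.items()}

def mutate(genome):
    """Resample one randomly chosen hyperparameter."""
    child = dict(genome)
    key = random.choice(list(SPACE))
    child[key] = random.choice(SPACE[key])
    return child

def param_count(genome):
    """Toy proxy for network size: layers * units^2."""
    return genome["layers"] * genome["units"] ** 2

def fitness(genome):
    """Toy stand-in for validation accuracy; a real system trains a DNN here."""
    return (genome["layers"] * 0.1
            + genome["units"] / 256
            - abs(genome["lr"] - 1e-3) * 10)

def dominates(a, b):
    """Pareto dominance: a is no worse on both objectives and better on one."""
    fa, fb = fitness(a), fitness(b)
    pa, pb = param_count(a), param_count(b)
    return (fa >= fb and pa <= pb) and (fa > fb or pa < pb)

def evolve(pop_size=20, generations=30):
    """Evolve genomes; keep the non-dominated front, fill the rest by fitness."""
    pop = [random_genome() for _ in range(pop_size)]
    for _ in range(generations):
        children = [mutate(random.choice(pop)) for _ in range(pop_size)]
        pool = pop + children
        front = [g for g in pool if not any(dominates(o, g) for o in pool)]
        rest = sorted((g for g in pool if g not in front),
                      key=fitness, reverse=True)
        pop = (front + rest)[:pop_size]
    # Return the final Pareto front: accuracy-vs-size trade-off points.
    return [g for g in pop if not any(dominates(o, g) for o in pop)]

pareto = evolve()
for g in pareto:
    print(g, round(fitness(g), 3), param_count(g))
```

The returned front contains several configurations rather than a single winner, which mirrors the abstract's claim that networks can be minimized with little drop in performance: a practitioner picks the point on the trade-off curve that satisfies their size constraint.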


Published in:
GECCO '19: Proceedings of the Genetic and Evolutionary Computation Conference, July 2019, 1545 pages.
ISBN: 978-1-4503-6111-8
DOI: 10.1145/3321707
Copyright © 2019 ACM
Publisher: Association for Computing Machinery, New York, NY, United States
