ABSTRACT
Deep neural networks (DNNs) have produced state-of-the-art results in many benchmarks and problem domains. However, the success of DNNs depends on the proper configuration of their architectures and hyperparameters. Such configuration is difficult, and as a result DNNs are often not used to their full potential. In addition, DNNs in commercial applications often need to satisfy real-world design constraints such as size or number of parameters. To make configuration easier, automatic machine learning (AutoML) systems for deep learning have been developed, focusing mostly on optimization of hyperparameters.
This paper takes AutoML a step further. It introduces an evolutionary AutoML framework called LEAF that optimizes not only hyperparameters but also network architectures and network size. LEAF makes use of both state-of-the-art evolutionary algorithms (EAs) and distributed computing frameworks. Experimental results on medical image classification and natural language analysis show that the framework can be used to achieve state-of-the-art performance. In particular, LEAF demonstrates that architecture optimization provides a significant boost over hyperparameter optimization alone, and that networks can be minimized at the same time with little drop in performance. LEAF therefore forms a foundation for democratizing and improving AI, as well as making AI practical in future applications.
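To make the idea of evolutionary AutoML concrete, the sketch below shows a minimal generational loop of the kind such a system runs: maintain a population of candidate configurations, evaluate them, keep the fittest, and mutate survivors to form the next generation. This is an illustrative toy, not LEAF's actual algorithm: the search space, the `fitness` proxy (which penalizes a bad learning rate and large size, standing in for the paper's joint accuracy/size objective), and all names are hypothetical assumptions; a real system would train and validate each candidate network.

```python
import random

# Hypothetical search space; a real system would also evolve the topology.
SPACE = {
    "layers": [1, 2, 3, 4],
    "units": [32, 64, 128, 256],
    "lr": [1e-4, 1e-3, 1e-2],
}

def random_genome(rng):
    # Sample one value per hyperparameter.
    return {k: rng.choice(v) for k, v in SPACE.items()}

def mutate(genome, rng):
    # Resample a single randomly chosen hyperparameter.
    child = dict(genome)
    key = rng.choice(list(SPACE))
    child[key] = rng.choice(SPACE[key])
    return child

def size(genome):
    # Crude proxy for parameter count.
    return genome["layers"] * genome["units"]

def fitness(genome):
    # Stand-in for validation accuracy minus a size penalty;
    # a real system would train the network and measure accuracy.
    return 1.0 - abs(genome["lr"] - 1e-3) - 0.0001 * size(genome)

def evolve(generations=20, pop_size=10, seed=0):
    rng = random.Random(seed)
    pop = [random_genome(rng) for _ in range(pop_size)]
    for _ in range(generations):
        # Truncation selection: keep the better half, refill by mutation.
        pop.sort(key=fitness, reverse=True)
        survivors = pop[: pop_size // 2]
        pop = survivors + [
            mutate(rng.choice(survivors), rng)
            for _ in range(pop_size - len(survivors))
        ]
    return max(pop, key=fitness)

best = evolve()
```

Because the fitness rewards both the good learning rate and small size, the loop illustrates how a single evolutionary search can trade off performance against network size; in a real deployment each fitness evaluation is a full training run, which is why distributed evaluation matters.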