
Efficient Deep Learning: A Survey on Making Deep Learning Models Smaller, Faster, and Better

Published: 02 March 2023

Abstract

Deep learning has revolutionized the fields of computer vision, natural language understanding, speech recognition, information retrieval, and more. However, as deep learning models have progressively improved, their number of parameters, latency, and the resources required to train them, among other costs, have all increased significantly. Consequently, it has become important to pay attention to these footprint metrics of a model as well, not just its quality. We present and motivate the problem of efficiency in deep learning, followed by a thorough survey of the five core areas of model efficiency (spanning modeling techniques, infrastructure, and hardware) and the seminal work in each. We also present an experiment-based guide, along with code, for practitioners to optimize their model training and deployment. We believe this is the first comprehensive survey in the efficient deep learning space that covers the landscape of model efficiency from modeling techniques to hardware support. It is our hope that this survey provides readers with the mental model and the necessary understanding of the field to apply generic efficiency techniques for immediate, significant improvements, and also equips them with ideas for further research and experimentation to achieve additional gains.
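
As a concrete illustration of the kind of generic efficiency technique the survey covers, the short sketch below applies post-training quantization using the TensorFlow Lite converter. It is a minimal example under our own assumptions: the toy Keras model and the output file name are hypothetical placeholders, not code from the survey's accompanying guide.

import tensorflow as tf

# A small, hypothetical Keras classifier standing in for an already-trained model.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(28, 28)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(10),
])

# Post-training quantization with the TensorFlow Lite converter: the default
# optimization quantizes weights to 8-bit integers, which typically shrinks
# the serialized model by roughly 4x at a small cost in quality.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_model = converter.convert()

# Write the quantized flatbuffer to disk for on-device deployment.
with open("model_quantized.tflite", "wb") as f:
    f.write(tflite_model)

This dynamic-range variant needs no calibration data; full integer quantization would additionally require a representative dataset.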


          Published in

          ACM Computing Surveys, Volume 55, Issue 12
          December 2023
          825 pages
          ISSN: 0360-0300
          EISSN: 1557-7341
          DOI: 10.1145/3582891


          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          • Published: 2 March 2023
          • Online AM: 20 January 2023
          • Accepted: 22 November 2022
          • Revised: 30 June 2022
          • Received: 13 July 2021