
Efficient Deep Learning: A Survey on Making Deep Learning Models Smaller, Faster, and Better

Published: 02 March 2023

Abstract

Deep learning has revolutionized the fields of computer vision, natural language understanding, speech recognition, information retrieval, and more. However, as deep learning models have progressively improved, their number of parameters, latency, and the resources required to train them, among other costs, have all increased significantly. Consequently, it has become important to pay attention to these footprint metrics of a model as well, not just its quality. We present and motivate the problem of efficiency in deep learning, followed by a thorough survey of the five core areas of model efficiency (spanning modeling techniques, infrastructure, and hardware) and the seminal work in each. We also present an experiment-based guide, along with code, for practitioners to optimize their model training and deployment. We believe this is the first comprehensive survey in the efficient deep learning space that covers the landscape of model efficiency from modeling techniques to hardware support. It is our hope that this survey provides readers with the mental model and the necessary understanding of the field to apply generic efficiency techniques for immediate, significant improvements, and also equips them with ideas for further research and experimentation to achieve additional gains.
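
As a concrete illustration of the kind of generic efficiency technique the survey covers, the short sketch below applies post-training quantization using the TensorFlow Lite converter. It is a minimal example under our own assumptions: the toy Keras model and the output file name are hypothetical placeholders, not code from the survey's accompanying guide.

import tensorflow as tf

# A small, hypothetical Keras classifier standing in for an already-trained model.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(28, 28)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(10),
])

# Post-training quantization with the TensorFlow Lite converter: the default
# optimization quantizes weights to 8-bit integers, which typically shrinks
# the serialized model by roughly 4x at a small cost in quality.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_model = converter.convert()

# Write the quantized flatbuffer to disk for on-device deployment.
with open("model_quantized.tflite", "wb") as f:
    f.write(tflite_model)

This dynamic-range variant needs no calibration data; full integer quantization would additionally require a representative dataset.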


          Published in

          ACM Computing Surveys, Volume 55, Issue 12
          December 2023
          825 pages
          ISSN: 0360-0300
          EISSN: 1557-7341
          DOI: 10.1145/3582891


          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          • Published: 2 March 2023
          • Online AM: 20 January 2023
          • Accepted: 22 November 2022
          • Revised: 30 June 2022
          • Received: 13 July 2021