Abstract
Deep learning has revolutionized the fields of computer vision, natural language understanding, speech recognition, information retrieval, and more. However, as deep learning models have progressively improved, their number of parameters, latency, and resources required to train, among other footprint metrics, have all increased significantly. Consequently, it has become important to pay attention to these footprint metrics of a model, not just its quality. We present and motivate the problem of efficiency in deep learning, followed by a thorough survey of the five core areas of model efficiency (spanning modeling techniques, infrastructure, and hardware) and the seminal work in each. We also present an experiment-based guide along with code for practitioners to optimize their model training and deployment. We believe this is the first comprehensive survey in the efficient deep learning space that covers the landscape of model efficiency from modeling techniques to hardware support. It is our hope that this survey will provide readers with the mental model and the necessary understanding of the field to apply generic efficiency techniques for immediate, significant improvements, and will also equip them with ideas for further research and experimentation to achieve additional gains.