Characterizing the Performance of Accelerated Jetson Edge Devices for Training Deep Learning Models

Abstract
Deep Neural Networks (DNNs) have had a significant impact on domains like autonomous vehicles and smart cities through low-latency inferencing on edge computing devices close to the data source. However, DNN training on the edge remains poorly explored. Techniques like federated learning and the growing capacity of GPU-accelerated edge devices like the NVIDIA Jetson motivate the need for a holistic characterization of DNN training on the edge. Training DNNs is resource-intensive and can stress an edge device's GPU, CPU, memory, and storage capacities. Edge devices also differ from workstations and servers in their resources, such as slower shared memory and diverse storage media. Here, we perform a principled study of DNN training on individual devices of three contemporary Jetson device types: AGX Xavier, Xavier NX, and Nano, for three diverse DNN model--dataset combinations. We vary device and training parameters such as I/O pipelining and parallelism, storage media, mini-batch sizes, and power modes, and examine their effect on CPU and GPU utilization, fetch stalls, training time, energy usage, and variability. Our analysis exposes several resource inter-dependencies and counter-intuitive insights, while also quantifying known wisdom. This rigorous study can help tune training performance on the edge, trade off time and energy usage on constrained devices, and even select ideal edge hardware for a DNN workload; in future, it can extend to federated learning as well. As an illustration, we use these results to build a simple model that predicts the training time and energy per epoch for a given DNN across different power modes, with minimal additional profiling.
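The kind of predictor the abstract closes with can be illustrated with a minimal sketch. The linear frequency-scaling assumption, the function name `predict_epoch`, and all numbers below are hypothetical stand-ins, not the paper's actual model, which is fit from profiled measurements:

```python
def predict_epoch(ref_time_s, ref_gpu_mhz, target_gpu_mhz, target_power_w):
    """Naive per-epoch predictor for a new power mode, given one profiled
    reference mode. Assumes a compute-bound workload whose epoch time scales
    inversely with GPU frequency; energy is average power times predicted
    time (E = P * t). Illustrative only."""
    time_s = ref_time_s * (ref_gpu_mhz / target_gpu_mhz)
    energy_j = time_s * target_power_w
    return time_s, energy_j

# Example: profiled 100 s/epoch at 1377 MHz; predict a 675 MHz power mode
# drawing an average of 15 W.
t, e = predict_epoch(100.0, 1377.0, 675.0, 15.0)
print(round(t, 1), round(e, 1))  # → 204.0 3060.0
```

A fitted model would replace the single frequency ratio with coefficients learned from a few profiled power modes, capturing memory- and I/O-bound phases that do not scale with GPU frequency.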