Research Article

Characterizing the Performance of Accelerated Jetson Edge Devices for Training Deep Learning Models

Published: 08 December 2022

Abstract

Deep Neural Networks (DNNs) have had a significant impact on domains like autonomous vehicles and smart cities through low-latency inferencing on edge computing devices close to the data source. However, DNN training on the edge is poorly explored. Techniques like federated learning and the growing capacity of GPU-accelerated edge devices like NVIDIA Jetson motivate the need for a holistic characterization of DNN training on the edge. Training DNNs is resource-intensive and can stress an edge device's GPU, CPU, memory and storage capacities. Edge devices also have different resources compared to workstations and servers, such as slower shared memory and diverse storage media. Here, we perform a principled study of DNN training on individual devices of three contemporary Jetson device types: AGX Xavier, Xavier NX and Nano, for three diverse DNN model–dataset combinations. We vary device and training parameters such as I/O pipelining and parallelism, storage media, mini-batch sizes and power modes, and examine their effect on CPU and GPU utilization, fetch stalls, training time, energy usage, and variability. Our analysis exposes several resource inter-dependencies and counter-intuitive insights, while also helping quantify known wisdom. Our rigorous study can help tune training performance on the edge, trade off time and energy usage on constrained devices, and even select an ideal edge hardware for a DNN workload, and, in the future, extend to federated learning too. As an illustration, we use these results to build a simple model to predict the training time and energy per epoch for any given DNN across different power modes, with minimal additional profiling.
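To make the training knobs above concrete, the following is a minimal, hypothetical PyTorch sketch, not the authors' actual measurement harness, that times one training epoch while sweeping the mini-batch size and the number of DataLoader workers (the I/O-parallelism knob). The ResNet-18/CIFAR-10 pairing and all parameter values are illustrative assumptions.

import time

import torch
from torch.utils.data import DataLoader
from torchvision import datasets, models, transforms

# Hypothetical sketch (not the paper's measurement harness): time one training
# epoch of ResNet-18 on CIFAR-10 for a given mini-batch size and DataLoader
# worker count, two of the knobs the study varies. Assumes a CUDA-capable
# device (e.g., a Jetson) and that torchvision can download CIFAR-10.
def time_epoch(batch_size, num_workers, device="cuda"):
    dataset = datasets.CIFAR10(root="./data", train=True, download=True,
                               transform=transforms.ToTensor())
    loader = DataLoader(dataset, batch_size=batch_size, shuffle=True,
                        num_workers=num_workers,  # parallel fetch/pre-processing workers
                        pin_memory=True)          # pinned buffers for host-to-GPU copies
    model = models.resnet18(num_classes=10).to(device)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
    criterion = torch.nn.CrossEntropyLoss()

    model.train()
    start = time.time()
    for images, labels in loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
    torch.cuda.synchronize()  # flush queued GPU work before stopping the clock
    return time.time() - start

if __name__ == "__main__":
    # Illustrative sweep over mini-batch size and number of fetch workers.
    for bs in (16, 32, 64):
        for workers in (0, 2, 4):
            print(f"batch={bs} workers={workers}: {time_epoch(bs, workers):.1f} s")

On a Jetson, one could additionally repeat such a loop under different device power modes to relate epoch time to energy use, in the spirit of the study described above.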




Published in

Proceedings of the ACM on Measurement and Analysis of Computing Systems (POMACS), Volume 6, Issue 3
December 2022
534 pages
EISSN: 2476-1249
DOI: 10.1145/3576048

Copyright © 2022 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery, New York, NY, United States

Publication History

Published: 8 December 2022 in POMACS Volume 6, Issue 3
