ABSTRACT
The advent of modern cloud services, along with the huge volume of data produced on a daily basis, has increased the demand for fast and efficient data processing. This demand is common among numerous application domains, such as deep learning, data mining, and computer vision. In recent years, hardware accelerators have been employed to meet this demand, due to the high parallelism that these applications exhibit. Although this approach can yield high performance, developing new deep learning neural networks on heterogeneous hardware entails a steep learning curve. The main reason is that existing deep learning engines support the static compilation of the accelerated code, which can be accessed via wrapper calls from a wide range of managed programming languages (e.g., Java, Python, Scala). Therefore, the development of high-performance neural network architectures is fragmented across programming models, forcing developers to manually specialize the code for heterogeneous execution. Specializing application code for heterogeneous execution is not a trivial task, as it requires developers to have hardware expertise and to use low-level programming frameworks, such as OpenCL, CUDA, or High-Level Synthesis (HLS) tools.
In this paper, we showcase how we have employed TornadoVM, a state-of-the-art heterogeneous programming framework, to transparently accelerate Deep Netts on heterogeneous hardware. Our work shows how a pure Java-based deep learning neural network engine can be dynamically compiled at runtime and specialized for particular hardware accelerators, without requiring developers to employ any of the low-level programming frameworks typically used for such devices. Our preliminary results show up to 6.45x end-to-end performance speedup and up to 88.5x kernel performance speedup when executing the feed-forward process of the network's training on GPUs, compared to the sequential execution of the original Deep Netts framework.
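To illustrate the programming model described above, the following is a minimal sketch of how a single fully-connected layer of the feed-forward pass could be expressed in plain Java with TornadoVM's TaskSchedule API and @Parallel loop annotation. The class name, task name, layer sizes, and activation choice are hypothetical and do not reflect the actual Deep Netts integration; the sketch assumes TornadoVM is available on the classpath and a compatible accelerator driver is installed.

```java
import uk.ac.manchester.tornado.api.TaskSchedule;
import uk.ac.manchester.tornado.api.annotations.Parallel;

public class FeedForwardExample {

    // Fully-connected layer: out[j] = relu(sum_i in[i] * weights[j * inSize + i] + bias[j]).
    // The @Parallel annotation marks the loop that TornadoVM may map onto the device's thread grid.
    public static void denseLayer(float[] in, float[] weights, float[] bias, float[] out,
                                  int inSize, int outSize) {
        for (@Parallel int j = 0; j < outSize; j++) {
            float sum = bias[j];
            for (int i = 0; i < inSize; i++) {
                sum += in[i] * weights[j * inSize + i];
            }
            out[j] = Math.max(sum, 0.0f);   // ReLU activation
        }
    }

    public static void main(String[] args) {
        int inSize = 784, outSize = 128;        // hypothetical layer sizes
        float[] in = new float[inSize];
        float[] weights = new float[outSize * inSize];
        float[] bias = new float[outSize];
        float[] out = new float[outSize];

        // Build a task schedule; TornadoVM JIT-compiles the task for the selected device at run time.
        TaskSchedule schedule = new TaskSchedule("s0")
                .streamIn(in, weights, bias)
                .task("dense", FeedForwardExample::denseLayer, in, weights, bias, out, inSize, outSize)
                .streamOut(out);

        schedule.execute();   // executes on the available accelerator (e.g., a GPU)
    }
}
```

Note that the compute method remains ordinary Java: no OpenCL, CUDA, or HLS code is written by the developer, which is the transparency property the paper targets.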
Index Terms
- Transparent acceleration of Java-based deep learning engines