
Transparent acceleration of Java-based deep learning engines

Published: 04 November 2020 | DOI: 10.1145/3426182.3426188

ABSTRACT

The advent of modern cloud services, along with the huge volume of data produced on a daily basis, has increased the demand for fast and efficient data processing. This demand is common to numerous application domains, such as deep learning, data mining, and computer vision. In recent years, hardware accelerators have been employed to meet this demand, due to the high degree of parallelism that these applications exhibit. Although this approach can yield high performance, developing new deep learning neural networks on heterogeneous hardware entails a steep learning curve. The main reason is that existing deep learning engines rely on statically compiled accelerated code, accessed via wrapper calls from a wide range of managed programming languages (e.g., Java, Python, Scala). Consequently, the development of high-performance neural network architectures is fragmented across programming models, forcing developers to manually specialize their code for heterogeneous execution. This specialization is not a trivial task, as it requires hardware expertise and the use of low-level programming frameworks, such as OpenCL, CUDA, or High-Level Synthesis (HLS) tools.

In this paper we showcase how we have employed TornadoVM, a state-of-the-art heterogeneous programming framework, to transparently accelerate Deep Netts on heterogeneous hardware. Our work shows how a pure Java-based deep learning neural network engine can be dynamically compiled at runtime and specialized for particular hardware accelerators, without requiring developers to use any of the low-level programming frameworks typically employed for such devices. Our preliminary results show up to 6.45x end-to-end speedup and up to 88.5x kernel speedup when executing the feed-forward phase of the network's training on GPUs, compared against the sequential execution of the original Deep Netts framework.
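To illustrate the programming model, the sketch below shows how a single fully connected layer of the feed-forward pass could be expressed with TornadoVM's TaskSchedule API (the API available around the time of publication). This is a minimal, illustrative example under our own assumptions, not code from Deep Netts: the class, kernel, and variable names (FeedForwardExample, forwardLayer, and so on) are hypothetical. The only device-oriented annotation is @Parallel on the output loop; building the schedule and calling execute() triggers TornadoVM's runtime JIT compilation for the selected accelerator.

    import uk.ac.manchester.tornado.api.TaskSchedule;
    import uk.ac.manchester.tornado.api.annotations.Parallel;

    public class FeedForwardExample {

        // Illustrative fully connected layer: out = sigmoid(weights * in + bias).
        // The outer loop is annotated with @Parallel so TornadoVM's JIT compiler
        // can map it to the parallel dimension of the generated OpenCL kernel.
        public static void forwardLayer(float[] in, float[] weights,
                                        float[] bias, float[] out,
                                        int inSize, int outSize) {
            for (@Parallel int i = 0; i < outSize; i++) {
                float sum = bias[i];
                for (int j = 0; j < inSize; j++) {
                    sum += weights[i * inSize + j] * in[j];
                }
                out[i] = (float) (1.0 / (1.0 + Math.exp(-sum)));
            }
        }

        public static void main(String[] args) {
            int inSize = 784, outSize = 128;
            float[] in = new float[inSize];
            float[] weights = new float[outSize * inSize];
            float[] bias = new float[outSize];
            float[] out = new float[outSize];

            // The TaskSchedule records the task and its data transfers; compilation
            // to the accelerator and device selection happen transparently at runtime.
            new TaskSchedule("s0")
                .streamIn(in, weights, bias)
                .task("forward", FeedForwardExample::forwardLayer,
                      in, weights, bias, out, inSize, outSize)
                .streamOut(out)
                .execute();
        }
    }

Note that forwardLayer remains plain Java: without TornadoVM, the same method can simply be invoked sequentially on the JVM, which is what makes the acceleration transparent to the engine's code.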


Published in

MPLR '20: Proceedings of the 17th International Conference on Managed Programming Languages and Runtimes
November 2020, 97 pages
ISBN: 9781450388535
DOI: 10.1145/3426182
Copyright © 2020 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher: Association for Computing Machinery, New York, NY, United States
