
Transparent acceleration of Java-based deep learning engines

Published: 04 November 2020 | DOI: 10.1145/3426182.3426188

ABSTRACT

The advent of modern cloud services, along with the huge volume of data produced on a daily basis, has increased the demand for fast and efficient data processing. This demand is common to numerous application domains, such as deep learning, data mining, and computer vision. In recent years, hardware accelerators have been employed to meet this demand, due to the high degree of parallelism that these applications exhibit. Although this approach can yield high performance, developing new deep learning neural networks on heterogeneous hardware entails a steep learning curve. The main reason is that existing deep learning engines rely on statically compiled accelerated code, accessed via wrapper calls from a wide range of managed programming languages (e.g., Java, Python, Scala). Consequently, the development of high-performance neural network architectures is fragmented across programming models, forcing developers to manually specialize their code for heterogeneous execution. This specialization is not a trivial task, as it requires hardware expertise and the use of low-level programming frameworks, such as OpenCL, CUDA, or High-Level Synthesis (HLS) tools.

In this paper we showcase how we have employed TornadoVM, a state-of-the-art heterogeneous programming framework, to transparently accelerate Deep Netts on heterogeneous hardware. Our work shows how a pure Java-based deep learning neural network engine can be dynamically compiled at runtime and specialized for particular hardware accelerators, without requiring developers to use any of the low-level programming frameworks typically employed for such devices. Our preliminary results show up to 6.45x end-to-end speedup and up to 88.5x kernel speedup when executing the feed-forward phase of the network's training on GPUs, compared against the sequential execution of the original Deep Netts framework.
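To illustrate the programming model, the sketch below shows how a single fully connected layer of the feed-forward pass could be expressed with TornadoVM's TaskSchedule API (the API available around the time of publication). This is a minimal, illustrative example under our own assumptions, not code from Deep Netts: the class, kernel, and variable names (FeedForwardExample, forwardLayer, and so on) are hypothetical. The only device-oriented annotation is @Parallel on the output loop; building the schedule and calling execute() triggers TornadoVM's runtime JIT compilation for the selected accelerator.

    import uk.ac.manchester.tornado.api.TaskSchedule;
    import uk.ac.manchester.tornado.api.annotations.Parallel;

    public class FeedForwardExample {

        // Illustrative fully connected layer: out = sigmoid(weights * in + bias).
        // The outer loop is annotated with @Parallel so TornadoVM's JIT compiler
        // can map it to the parallel dimension of the generated OpenCL kernel.
        public static void forwardLayer(float[] in, float[] weights,
                                        float[] bias, float[] out,
                                        int inSize, int outSize) {
            for (@Parallel int i = 0; i < outSize; i++) {
                float sum = bias[i];
                for (int j = 0; j < inSize; j++) {
                    sum += weights[i * inSize + j] * in[j];
                }
                out[i] = (float) (1.0 / (1.0 + Math.exp(-sum)));
            }
        }

        public static void main(String[] args) {
            int inSize = 784, outSize = 128;
            float[] in = new float[inSize];
            float[] weights = new float[outSize * inSize];
            float[] bias = new float[outSize];
            float[] out = new float[outSize];

            // The TaskSchedule records the task and its data transfers; compilation
            // to the accelerator and device selection happen transparently at runtime.
            new TaskSchedule("s0")
                .streamIn(in, weights, bias)
                .task("forward", FeedForwardExample::forwardLayer,
                      in, weights, bias, out, inSize, outSize)
                .streamOut(out)
                .execute();
        }
    }

Note that forwardLayer remains plain Java: without TornadoVM, the same method can simply be invoked sequentially on the JVM, which is what makes the acceleration transparent to the engine's code.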


Published in

MPLR '20: Proceedings of the 17th International Conference on Managed Programming Languages and Runtimes
November 2020, 97 pages
ISBN: 9781450388535
DOI: 10.1145/3426182
Copyright © 2020 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher: Association for Computing Machinery, New York, NY, United States
