Sm@rtConfig: A context-aware runtime and tuning system using an aspect-oriented approach for data intensive engineering applications☆
Introduction
High-performance platforms are commonly required for scientific and engineering algorithms to deal appropriately with timing constraints. Desktop-based co-processors, such as many-core Graphics Processing Units (GPUs), have become a cost-effective alternative execution platform for improving performance. As an example, Nvidia has presented its GTX285 GPU, which provides a peak performance of 1062 Gflop/s for single-precision and 89 Gflop/s for double-precision floating-point operations (Nvidia, 2010).
As a consequence, heterogeneous platforms with several types of Processing Units (PUs) act in essence as powerful asymmetric multi-core clusters and can handle multiple applications and tasks. This is further intensified by multi-core CPUs, like the Intel Core2Quad, which provides around 100 Gflop/s (Intel, 2010). Therefore, efficiently using all available resources of the PUs is a significant challenge when programming applications.
Another challenge is the design of applications that use such a heterogeneous platform. Distinct PUs may each require a specific programming technology, which in turn may not be supported by all available PUs. To transform the same application's source code into binary code for multiple PUs, an experienced specialist may be required, in addition to having suitable development tools available. Virtualization is an example of a technique that solves the compatibility problem by adding a layer (i.e., a virtual machine) between the application binary code and the real PU, e.g., the Java Virtual Machine. However, this solution leads to a performance penalty, which sometimes cannot be accepted by the target application due to its constraints. Such a situation demands new techniques to raise the abstraction level of the design of such applications.
Using high-level representations of the application's structure and behavior allows the refinement of the application's requirements up to its native implementation in the technologies supported by the different target PUs. Model-Driven Engineering (MDE) approaches advocate that designers use models instead of source code as the main artifact of the design. The system implementation is generated automatically from these models. In this sense, Aspect-Oriented Model-Driven Engineering for Real-Time (AMoDE-RT) systems (Wehrmeister, Freitas, Wagner, & Pereira, 2007) is an MDE approach that uses the Unified Modeling Language (UML), along with its MARTE profile for specifying real-time systems, and concepts of the Aspect-Oriented Software Development (AOSD) approach. Modularization and reuse are the main goals of the AMoDE-RT approach, since implementations for distinct execution platforms can be obtained from the same UML model of the application.
This work extends AMoDE-RT, more specifically its aspects framework, by including a modular and reusable runtime system for dynamic scheduling and tuning. To take full advantage of the available computing power, a strategy to distribute the application tasks over the available PUs is important. The strategy relies on dynamic scheduling, instead of the static scheduling used by OpenCL (Khronos, 2010) or, more specifically, by CUDA (Nvidia, 2010) for Nvidia GPUs (see also Göddeke et al., 2009). This need becomes even more essential when dealing with desktop applications with timing constraints, like the real-time 3D Computational Fluid Dynamics (CFD) system used as a case study, a kind of system applied in several complex engineering applications, such as the design of modern cars or airplanes.
The task scheduling problem is considered NP-complete (Garey & Johnson, 1990), and several heuristics have been developed to obtain a good schedule with little overhead, for example the distinct approaches used by Topcuoglu, Hariri, and Wu (2002), Ahmadinia, Bobda, Koch, Majer, and Teich (2004), and Götz and Dittmann (2006) for heterogeneous PUs. However, only very recently have some techniques started to be applied directly to platforms containing a CPU (possibly multi-core) and multiple GPUs. This paper additionally presents a new strategy to distribute the workload over the CPU and the GPU, which is sufficiently generic to consider other PUs coupled to a desktop. The dynamic scheduling method is oriented toward a set of high-level tasks, such as algorithms. It combines a first assignment phase – based on a pre-processing benchmark that acquires initial performance samples of the tasks – with a runtime phase that obtains real performance measurements of the tasks and feeds a performance database. This way, after the first assignment, the system considers the history stored in the database to perform further assignments for every task, maximizing the performance of the applications with little overhead.
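The two phases described above can be sketched as follows. This is a minimal illustration, not the Sm@rtConfig implementation: the class names (`PerformanceDatabase`, `DynamicScheduler`), the use of a plain mean over the recorded samples, and the timing values are all assumptions made for the example.

```python
from collections import defaultdict
from statistics import mean


class PerformanceDatabase:
    """Hypothetical store of measured execution times per (task, PU) pair."""

    def __init__(self):
        self._samples = defaultdict(list)

    def record(self, task, pu, seconds):
        self._samples[(task, pu)].append(seconds)

    def estimate(self, task, pu):
        """Return the mean recorded time, or None if nothing was measured."""
        samples = self._samples.get((task, pu))
        return mean(samples) if samples else None


class DynamicScheduler:
    """Assigns each task to the PU with the lowest estimated time."""

    def __init__(self, pus):
        self.pus = list(pus)
        self.db = PerformanceDatabase()

    def seed_from_benchmark(self, benchmark):
        """First assignment phase: load pre-processing benchmark samples."""
        for (task, pu), seconds in benchmark.items():
            self.db.record(task, pu, seconds)

    def assign(self, task):
        """Choose a PU from the history; PUs without samples are skipped."""
        known = {}
        for pu in self.pus:
            estimate = self.db.estimate(task, pu)
            if estimate is not None:
                known[pu] = estimate
        if not known:
            return self.pus[0]  # assumption: fall back to the first PU
        return min(known, key=known.get)

    def report(self, task, pu, seconds):
        """Runtime phase: feed a real measurement back into the database."""
        self.db.record(task, pu, seconds)


# Illustrative (made-up) timings in seconds.
scheduler = DynamicScheduler(["cpu", "gpu"])
scheduler.seed_from_benchmark({("jacobi", "cpu"): 4.0, ("jacobi", "gpu"): 1.0})
first = scheduler.assign("jacobi")

# The GPU turns out to be slower at runtime, e.g. because it is shared.
scheduler.report("jacobi", "gpu", 9.0)
later = scheduler.assign("jacobi")
print(first, later)
```

In this sketch the first assignment follows the benchmark, while the later one reflects the measured history. A real system would also have to account for problem size and for the cost of the scheduling decision itself, which the sketch ignores.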
In this work, three iterative solvers for Systems of Linear Equations (SLEs) – Jacobi, Red-Black Gauss-Seidel, and Conjugate Gradient – are used by the CFD application and represent the high-level tasks for the scheduling strategy. The solvers have different implementations for the CPU and the GPU (using shared memory and with memory coalescing), as presented in previous work (Binotto et al., 2010).
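For readers unfamiliar with these solvers, the Jacobi method is the simplest of the three. The following is a minimal serial sketch in plain Python, not the CPU or GPU implementation referenced above; the function name `jacobi`, its parameters, and the small test system are assumptions chosen for illustration.

```python
def jacobi(a, b, iterations=100, tolerance=1e-10):
    """Approximate the solution of a*x = b with the Jacobi iteration.

    Convergence is guaranteed, for example, when `a` is strictly
    diagonally dominant.
    """
    n = len(b)
    x = [0.0] * n
    for _ in range(iterations):
        # Every component of x_new depends only on the previous iterate x,
        # so the n updates are independent of each other.
        x_new = [
            (b[i] - sum(a[i][j] * x[j] for j in range(n) if j != i)) / a[i][i]
            for i in range(n)
        ]
        if max(abs(x_new[i] - x[i]) for i in range(n)) < tolerance:
            return x_new
        x = x_new
    return x


# Diagonally dominant system: 4x + y = 9 and x + 3y = 5, solved by x=2, y=1.
solution = jacobi([[4.0, 1.0], [1.0, 3.0]], [9.0, 5.0])
print(solution)
```

Because each component update within one iteration is independent of the others, the method maps naturally onto data-parallel hardware, which is one reason such solvers are candidates for GPU execution.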
It is important to mention that, although the GPU is more powerful for these kinds of data-intensive tasks, there are many scenarios in which the CPU provides better performance, e.g., when working with multiple applications and tasks with different problem size domains (based on the amount of data to be processed, which is not known before application execution). The paper presents an example of such a scenario. In a CFD application, a gain of 21.77% in comparison to the static assignment of all tasks to the GPU is achieved, while the scheduling error remains negligible.
The main contribution of this paper is the Sm@rtConfig framework, which provides a dynamic scheduling method for a desktop CPU–GPU platform. The method is composed of:
- (i) a first assignment phase,
- (ii) a runtime profiler that feeds a timing performance database, and
- (iii) the runtime assignment phase that performs assignments based on the performance history stored in the database.
The paper is structured as follows: Section 2 presents the design approach based on aspect orientation used for the application specification and implementation phase. Next, Section 3 describes the new scheduling strategies that implement the aspects for one CPU and one GPU, and their generalization to multiple PUs. Section 4 presents the real-time CFD application used as a case study. Its requirements and specification are discussed along with the experimental results obtained from a performance analysis on the CPU–GPU platform. Related work on both design methods and scheduling on distributed platforms using the GPU is then presented in Section 5. Finally, conclusions and future research are addressed in Section 6.
Section snippets
Overview
Aspect-Oriented Model-Driven Engineering for Real-Time systems – AMoDE-RT (Wehrmeister et al., 2007) – allows a smooth transition from initial specification phases to implementation phases of the design of real-time systems. Using MDE techniques combined with AOSD concepts, AMoDE-RT increases the abstraction level during design to address the increasing complexity of real-time systems. Fig. 1 shows an overview of AMoDE-RT.
The first step in AMoDE-RT is gathering requirements and constraints of
Runtime scheduling and tuning system
In a broad sense, the scheduling strategy aims to automatically assign Units of Allocation (UAs) to the PUs of an execution platform composed of a CPU and co-processors. The term UA was defined generically, since the proposed framework is intended to deal with different granularities (the granularity is designed to change in accordance with the platform to be used) and different types of decomposition (task or data decomposition, according to application characteristics). However, in the context of this paper, a UA
Overview of system's requirements
As a motivation for designing an asymmetric CPU–GPU platform approach, a Computational Fluid Dynamics application is briefly described. For this application, large computations are needed to solve the velocity field and local pressure for objects like planes and cars. Clearly, both computation time and performance need to be optimized, while several instances of varying geometries for the objects are evaluated. In industrial prototyping, commonly default flow simulation is used, while in later
Related work
Aspect orientation software development. Applying AOSD in the software development for “traditional” information systems has led to important improvements on productivity and complexity management, mainly due to the separation and modularization of crosscutting concerns that lead to an improved reuse of previously developed components. In order to obtain the same benefits, engineers and researchers of embedded and real-time systems communities are increasingly using AOSD's concepts in their
Conclusions and future work
A context-aware runtime and tuning system, named Sm@rtConfig, was presented based on a compromise between reducing the execution time of applications due to appropriate dynamic scheduling and the cost of computing such scheduling applied on a platform composed of CPU and GPUs. The system is integrated into the AMoDE-RT approach, by means of including two new aspects in DERAF and also their implementation in the code generation scripts used by the GenERTiCA tool towards Sm@rtConfig. This way,
Acknowledgments
We would like to thank the reviewers for their detailed suggestions and comments. Alécio Binotto acknowledges the support given by DAAD fellowship no. A/07/70158, Programme Alβan scholarship no. E07D402961BR, and CNPq scholarship no. 150860/2011-0. Marco Wehrmeister is grateful to CNPq (Brazilian National Council for Scientific and Technological Development) for grant no. 480321/2011-6.
References (35)
- et al. An aspect-oriented programming-based approach to software development for fault detection in measurement systems. Computer Standards & Interfaces (2010).
- et al. Model driven middleware: A new paradigm for deploying and provisioning distributed real-time and embedded applications. Science of Computer Programming (2008).
- Ahmadinia, A., Bobda, C., Koch, D., Majer, M., & Teich, J. (2004). Task scheduling for heterogeneous reconfigurable...
- ATI. 2010. ATI stream SDK with OpenCL 〈http://developer.amd.com/gpu/ATIStreamSDK/Pages/default.aspx〉. Stand...
- Augonnet, C., Thibault, S., Namyst, R., & Wacrenier, P.-A. (2009). StarPU: A unified platform for task scheduling on...
- et al. Templates for the solution of linear systems: Building blocks for iterative methods (1994).
- Bell, N., & Garland, M. (2009). Implementing sparse matrix-vector multiplication on throughput-oriented processors. In...
- Binotto, A. P. D., Daniel, C., Weber, D., Kuijper, A., Stork, A., Pereira, C. E., et al. (2010). Iterative sle solvers...
- et al. Towards task dynamic reconfiguration over asymmetric computing platforms for UAVs surveillance systems. Scalable Computing: Practice and Experience (2009).
- Binotto, A. P. D., Pedras, B. M., Goetz, M., Kuijper, A., Pereira, C. E., Stork, A., et al. (2010). Effective dynamic...
- Managing embedded systems complexity with aspect-oriented model-driven engineering. ACM Transactions on Embedded Computing Systems.
- Computers and intractability: A guide to the theory of NP-completeness.
☆ Special section on Advanced Software Engineering in Industrial Automation.