SunwayMR: A distributed parallel computing framework with convenient data-intensive applications programming
Introduction
In the last several years, the amount of data has grown explosively and its scale has changed vastly [1], [2], [3]. Organizations use data-intensive applications to extract valuable information from the huge datasets they manage. Meanwhile, commercial computational processors offer impressive performance and dominate the general-purpose processor market. They catalyze high performance computing (HPC) and keep pace with increasing computation requirements [4], [5]. In fact, network-connected computing nodes at the server level are critical to modern HPC applications. Nevertheless, the complexity of low-level instruction operations limits the use of computing facilities for data processing. Hence, how to develop a programmable parallel computing framework that benefits the most from general or custom-made computational devices is a critical and promising issue [6], [7], [8], [9].
Numerous researchers have done fundamental work on developing parallel computing frameworks and their implementation techniques [10], [11], [12], [13], and this work has led to a proliferation of such frameworks for data analytics. We argue, however, that inherent limitations remain: some frameworks lack ease of use and flexibility, and some cannot be extended for application programming. Designing a distributed system is therefore nontrivial and challenging. Mechanisms for data distribution, job scheduling, communication, fault tolerance, etc., must be designed in detail for a parallel computing framework.
Good distributed computing frameworks have emerged for analyzing enormous quantities of data. Usually, these frameworks share some key features: (1) Efficient and scalable data processing management. Data analysis is performed on locally stored data, which greatly increases throughput and performance. (2) Easy-to-use programming abstraction. Most parallel computing frameworks provide some degree of abstraction over computing nodes, which benefits programmers by shortening the learning curve. (3) Insensitivity to the underlying hardware. Typically, developers need not be aware of the details of the underlying architecture. (4) Fault tolerance for reliability, easy configuration and few library dependencies.
Apache Hadoop [14] was designed for distributed computing, scaling from a single node to a potentially huge number of nodes, and it is resilient to node failures. However, Hadoop may not handle data variety well, since its programming interfaces and associated data processing models are inconvenient and inefficient for varied data, e.g., structured data and graph data. Apache Spark [15], another distributed computing framework, is built on the concept of immutable Resilient Distributed Datasets (RDDs), which support transformation and action operations. Data analytics in Spark is performed via a sequence of RDD transformations, whereas a MapReduce job consists of a map phase and a reduce phase. Compared with Hadoop MapReduce, Spark performs better. Note that, to some extent, both Spark and Hadoop suffer from fussy parameter configuration.
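To make the two-phase structure of a MapReduce job concrete, the following is a minimal single-process sketch of a word count written in C++. It is not Hadoop's or SunwayMR's API; the function names `map_phase` and `reduce_phase` are hypothetical and chosen only for illustration, and a real framework would run the two phases on different nodes with a shuffle step in between.

```cpp
#include <map>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical map phase: emit a (word, 1) pair for every
// whitespace-separated word found in the input lines.
std::vector<std::pair<std::string, int>> map_phase(
    const std::vector<std::string>& lines) {
    std::vector<std::pair<std::string, int>> pairs;
    for (const auto& line : lines) {
        std::istringstream in(line);
        std::string word;
        while (in >> word) {
            pairs.emplace_back(word, 1);
        }
    }
    return pairs;
}

// Hypothetical reduce phase: group the pairs by key and sum their values.
std::map<std::string, int> reduce_phase(
    const std::vector<std::pair<std::string, int>>& pairs) {
    std::map<std::string, int> counts;
    for (const auto& kv : pairs) {
        counts[kv.first] += kv.second;
    }
    return counts;
}
```

Calling `reduce_phase(map_phase(lines))` yields one count per distinct word. In a Spark-style pipeline the same computation would instead be expressed as a chain of transformations on a distributed dataset.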
When a new hardware architecture is introduced, many existing distributed frameworks cannot meet the requirement of few library dependencies. For example, Spark and Hadoop require the Java SDK to be installed and configured. A new system with fewer library dependencies therefore needs to be developed from scratch so that special environment requirements can be met. For instance, Sunway processors (provided by the State Key Laboratory of Mathematic Engineering and Advance Computing (MEAC-SKL)) have supercomputing capacity, yet Spark and Hadoop cannot readily run on them, because both are JVM-based and depend heavily on libraries such as the Scala SDK or Java SDK. So far, Spark supports several programming languages, including Scala, Java, and Python.
Motivated by these observations, the goal of our work is to explore an effective solution for managing computational devices while obtaining high performance. In the spirit of Spark and Hadoop, we propose SunwayMR, which utilizes devices directly and can lower the barrier to entry for average users. In this paper, SunwayMR adopts a practical architecture design to obtain parallel capacity. Its advantage is that it copes with the challenge of data variety in multi-structural datasets and is easier to configure. Our framework makes deploying specific data-intensive applications on Infrastructure-as-a-Service infrastructure lighter and faster. Programmers can use this ongoing framework, which targets applicability and generality, to develop data-intensive applications.
To summarize, the main contributions of our work include:
- We first present and discuss the framework’s design. Based on the two-level (master–slaves) hierarchical architecture of the clustering system, a distributed dataset management mechanism organizes data into partitions as data computing unit sets (DCUS). Task organization, job/task scheduling and message communication are then described.
- We present systematic optimizations: a thread-level stringstream optimization that substantially accelerates communication between nodes, and a lightweight fault tolerance scheme that addresses reliability.
- We implement SunwayMR, which offers both ease of use and extensibility. Data-intensive applications can be built quickly by invoking the framework’s public high-level APIs, so that less low-level code needs to be written.
- We conduct extensive empirical studies to evaluate the performance of SunwayMR using various applications and real datasets. Experimental results demonstrate that our solution achieves better efficiency, speedup and execution time than the Spark framework.
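The partition-based data organization named in the first contribution can be pictured with a small C++ sketch. It only assumes that a dataset is split into near-equal contiguous partitions, that each partition is processed independently, and that the partial results are combined afterwards. The names `make_partitions` and `sum_over_partitions` are hypothetical and do not come from the SunwayMR code base, and the partitions are processed sequentially here, whereas a real deployment would hand them to different slave nodes or threads.

```cpp
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper: split `data` into `n` contiguous partitions
// whose sizes differ by at most one element.
std::vector<std::vector<int>> make_partitions(const std::vector<int>& data,
                                              std::size_t n) {
    std::vector<std::vector<int>> parts(n);
    if (n == 0) {
        return parts;
    }
    const std::size_t base = data.size() / n;
    const std::size_t extra = data.size() % n;
    std::size_t begin = 0;
    for (std::size_t i = 0; i < n; ++i) {
        // The first `extra` partitions receive one additional element.
        const std::size_t len = base + (i < extra ? 1 : 0);
        const auto first = data.begin() + static_cast<std::ptrdiff_t>(begin);
        parts[i].assign(first, first + static_cast<std::ptrdiff_t>(len));
        begin += len;
    }
    return parts;
}

// Hypothetical "master" step: compute one partial sum per partition
// (the work a slave would do), then combine the partial results.
long long sum_over_partitions(const std::vector<std::vector<int>>& parts) {
    std::vector<long long> partial;
    partial.reserve(parts.size());
    for (const auto& part : parts) {
        partial.push_back(std::accumulate(part.begin(), part.end(), 0LL));
    }
    return std::accumulate(partial.begin(), partial.end(), 0LL);
}
```

Because each partial sum depends only on its own partition, the loop body could be dispatched to separate workers without changing the combined result.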
SunwayMR is written in C++ in more than 8000 LOC, and the software resources (including Linux shell compiling scripts) are available from a GitHub repository. We hope our research can help guide researchers in building autonomous parallel computing frameworks more quickly.
The rest of the paper is organized as follows. Section 2 contains related work and background. Section 3 provides an overview of the SunwayMR framework. Section 4 introduces some preliminary knowledge. The main design principle is described in Section 5. Some optimizations are discussed in Section 6, and we introduce the framework’s ease of use in Section 7. We evaluate our system in Section 8. Finally, we conclude the paper in Section 9.
Related work and background
In this section, we discuss the key enablers of our study, namely, computational devices, distributed parallel programming techniques and the related performance requirements.
Commoditization of computational accelerators is driving their widespread use.
Currently, it is common practice that large-scale data centers typically employ commodity off-the-shelf components to yield a cost-efficient setup. The large-scale data centers (e.g., Google, Amazon’s EC2) follow this commonly accepted approach to
SunwayMR overview
The HPC clustering system for running SunwayMR ought to include several computing nodes and desktops, etc., as depicted in Fig. 1. Parallel machines are networked together through a high speed network (e.g., GigaNet, InfiniBand). Such design constructs the foundation for the framework’s running environment. In order to provide remote access and control the cluster, the architecture of heterogeneous clustering system is organized in master–slaves paradigm, as shown in Fig. 2. The master node
Preliminaries
Typically, the sources from which data is loaded for parallel computing can be distributed file systems, shared/local file systems, memory or a data stream. The essential thing is to partition data logically and to process the data partitions on different physical servers. Therefore, in this section, we first introduce some important preliminaries, i.e., distributed dataset management and the related job and task management, etc., so as to realize both collaboration, inter-operability,
The main design principle of SunwayMR
The design of SunwayMR covers several main aspects: the data processing mechanism, coarse-grained and fine-grained parallelism, and the SunwayMRHelper communication component, as explained next.
Some system optimizations
Although the main design of SunwayMR has been presented, some important optimizations can still be made. One concerns communication efficiency between nodes (thread-level stringstream optimization); the other concerns reliability (a detect/resume model for fault tolerance), as explained next.
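The snippet above names a thread-level stringstream optimization without giving its details. As one plausible reading only, and not the paper's actual implementation, the C++ sketch below keeps a single `std::ostringstream` per thread and reuses it for every outgoing message, so that the stream is not constructed anew for each message and no lock is needed between threads. The function name `encode_message` and the `|`-separated message layout are assumptions made for illustration.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical encoder: joins a message type and its integer payload
// with '|' separators, reusing one stream per calling thread.
std::string encode_message(const std::string& type,
                           const std::vector<int>& payload) {
    // thread_local gives each thread its own buffer, constructed once.
    thread_local std::ostringstream buffer;
    buffer.str("");  // discard the characters of the previous message
    buffer.clear();  // reset any error state left on the stream
    buffer << type;
    for (int value : payload) {
        buffer << '|' << value;
    }
    return buffer.str();
}
```

The calls to `str("")` and `clear()` are what make reuse safe: without them, a later message would be appended to the earlier one.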
The layered software architecture
Today’s popular software architectures generally follow a loosely coupled, layered style. Likewise, SunwayMR is a layered software architecture stack. From the perspective of software engineering, the framework’s code can be divided into three abstract layers: (1) the upper code layer provides interfaces for application programming; (2) the bottom code layer manages the computing hardware resources; (3) the middle code layer mainly provides the link between the preceding and
Evaluation
In this section, we discuss two categories of research questions. The first is whether the framework performs well. The second is what effect our integrated optimizations have, and how performance changes as the number of nodes, the number of threads and the data size vary. To answer these questions, we conduct the following analysis.
Conclusion
A programmable parallel computing framework, SunwayMR, is proposed and implemented for data-intensive applications on distributed clustering systems. It addresses the data volume challenge through parallelization and alleviates the data variety challenge, etc. Some systematic design alternatives and mechanisms for distributed environments are evaluated. The experiments show that SunwayMR can effectively utilize cluster resources, while preserving its transparency, simplicity, and portability as a programming
Acknowledgments
This work is partially supported by the National High Technology Research and Development Program of China under Grant No.2014AA01A301, and the National Natural Science Foundation of China (NSFC) under Grant No.61472241.
References (29)
- et al., A capabilities-aware framework for using computational accelerators in data-intensive computing, J. Parallel Distrib. Comput. (2011)
- et al., Reusable software components for accelerator-based clusters, J. Syst. Softw. (2011)
- et al., A robust framework for real-time distributed processing of satellite data, J. Parallel Distrib. Comput. (2006)
- et al., An adaptive and hierarchical task scheduling scheme for multi-core clusters, PARCO (2014)
- et al., Performance analysis of homogeneous on-chip large-scale parallel computing architectures for data-parallel applications, J. Electr. Comput. Eng. (2015)
- Tang Tao, Research on Programming Model and Compiler Optimizations for CPU–GPU Heterogeneous Parallel Systems,...
- et al., A practical data classification framework for scalable and high performance chip-multiprocessors, IEEE Trans. Comput. (2014)
- A.M. Aji, L.S. Panwar, F. Ji, M. Chabbi, K. Murthy, P. Balaji, …, R. Thakur, On the efficacy of GPU-integrated MPI for...
- F. Song, J. Dongarra, A scalable framework for heterogeneous GPU-based clusters, in: Proceedings of SPAA,...
- Usman Dastgeer, Christoph Kessler, A framework for performance-aware composition of applications for GPU-based systems,...
- A parallel computing framework for large-scale air traffic flow optimization, ITS
Renke Wu is currently working toward the Ph.D. degree in the department of computer science and engineering at Shanghai Jiao Tong University. His research focuses on parallel computing and software engineering.
Linpeng Huang received his M.S. and Ph.D. degrees in computer science from Shanghai Jiao Tong University in 1989 and 1992, respectively. He is a professor of computer science in the department of computer science and engineering, Shanghai Jiao Tong University. His research interests lie in the area of distributed systems, architecture-driven software development, parallel computing, big data analysis and in-memory computing.
Peng Yu received his B.S. degree in software engineering from Nankai University (NKU) in 2014. He is currently working toward the M.S. degree in the school of software at the Shanghai Jiao Tong University. His research interests lie in the area of distributed systems, architecture-driven software development, parallel computing and big data analysis.
Haojie Zhou received his M.S. degree in computer science from Chinese Academy of Sciences. He works in the State Key Laboratory of Mathematic Engineering and Advance Computing, Jiangnan Institute of Computing Technology. His research interests lie in the area of distributed systems, parallel computing and data analysis.