Elsevier

Future Generation Computer Systems

Volume 55, February 2016, Pages 428-443
Multi-criteria and satisfaction oriented scheduling for hybrid distributed computing infrastructures

https://doi.org/10.1016/j.future.2015.03.022

Highlights

  • We designed an overall multi-criteria task scheduling method for hybrid DCIs.

  • The scheduling method allows a systematic integration of new scheduling criteria into it.

  • We defined a methodology for finding optimal scheduling strategies.

  • For the validation we consider both user and resource owner perspectives.

  • We presented the experimental system built for the validation of the scheduling method.

Abstract

Assembling and simultaneously using different types of distributed computing infrastructures (DCI) like Grids and Clouds is an increasingly common situation. Because infrastructures are characterized by different attributes such as price, performance, trust, and greenness, the task scheduling problem becomes more complex and challenging. In this paper we present the design of a fault-tolerant and trust-aware scheduler, which executes Bag-of-Tasks applications on elastic and hybrid DCI, following user-defined scheduling strategies. Our approach, named Promethee scheduler, combines a pull-based scheduler with the multi-criteria Promethee decision-making algorithm. Because multi-criteria scheduling multiplies the possible scheduling strategies, we propose SOFT, a methodology for finding the optimal scheduling strategies given a set of application requirements. The method is validated with a simulator that fully implements the Promethee scheduler and recreates a hybrid DCI environment including Internet Desktop Grid, Cloud and Best Effort Grid based on real failure traces. A set of experiments shows that the Promethee scheduler is able to maximize user satisfaction expressed according to three distinct criteria: price, expected completion time and trust, while maximizing the useful employment of the infrastructure from the resource owners' point of view. Finally, we present an optimization which bounds the computation time of the Promethee algorithm, making it realistic to integrate the scheduler into a wide range of resource management software.

Introduction

The processing and storage requirements of distributed computing applications are continuously increasing, driven by the deluge of data to process. Nowadays, scientific communities and industrial companies can choose among a large variety of distributed computing infrastructures (DCI) to execute their applications. Examples of such infrastructures are Desktop Grids or Volunteer Computing systems [1], which can gather a huge number of volunteer PCs at almost no cost; Grids [2], which assemble large numbers of distributed clusters; and, more recently, Clouds [3], which can be accessed remotely, following a pay-as-you-go pricing model. All these infrastructures have very different characteristics in terms of computing capacity, cost, reliability, power efficiency and more. Hence, combining these infrastructures in a way that meets users' and applications' requirements raises significant scheduling challenges.

The first challenge concerns the design of the resource management middleware which allows the assemblage of hybrid DCIs. The difficulty lies in the number of desirable high-level features that the middleware has to provide in order to cope with: (i) distributed infrastructures that have various usage paradigms (reservation, on-demand, queue), and (ii) computing resources that are heterogeneous, volatile, unreliable and sometimes not trusted. An architecture that has proved efficient for gathering hybrid and elastic infrastructures is the joint use of a pull-based scheduler with pilot jobs [4], [5], [6], [7]. The pull-based scheduler, often used in Desktop Grid computing systems [8], [9], relies on the principle that the computing resources pull tasks from a centralized scheduler. With pilot jobs, the scheduler acquires resources and deploys on them agents that have direct access to the central pull-based scheduler. In this way the scheduler can work directly with the resources, rather than going through local job schedulers. This approach exhibits several desirable properties, such as scalability, fault resilience, ease of deployment and the ability to cope with elastic infrastructures, which motivated us to use it in our scheduler.
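The pull principle described above can be illustrated with a minimal sketch. It is not the scheduler presented in this paper: the class name PullScheduler, its methods and the failure-handling policy (requeue the tasks of a lost worker) are assumptions made only for illustration.

```python
from collections import deque
from typing import Deque, Dict, Hashable, List, Optional


class PullScheduler:
    """Hypothetical central scheduler: workers (pilot agents) pull tasks from it."""

    def __init__(self, tasks: List[Hashable]) -> None:
        self.pending: Deque[Hashable] = deque(tasks)  # tasks waiting for a worker
        self.running: Dict[Hashable, str] = {}        # task -> worker executing it
        self.done: Dict[Hashable, object] = {}        # task -> returned result

    def pull(self, worker_id: str) -> Optional[Hashable]:
        """Called by a worker asking for work; returns None if nothing is pending."""
        if not self.pending:
            return None
        task = self.pending.popleft()
        self.running[task] = worker_id
        return task

    def report(self, task: Hashable, result: object) -> None:
        """Called by a worker when a task completes; unknown tasks are ignored."""
        if self.running.pop(task, None) is not None:
            self.done[task] = result

    def worker_failed(self, worker_id: str) -> List[Hashable]:
        """Requeue every task held by a worker that left or crashed."""
        lost = [t for t, w in self.running.items() if w == worker_id]
        for task in lost:
            del self.running[task]
            self.pending.append(task)
        return lost

    def is_finished(self) -> bool:
        """True when no task is pending or running."""
        return not self.pending and not self.running
```

Because the scheduler never pushes work, a resource that joins or leaves an elastic infrastructure only needs to start or stop calling pull; the scheduler itself keeps no list of available hosts in this sketch.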

The second challenge is to design task scheduling methods that are capable of efficiently using hybrid DCIs and, in particular, that take into account the differences between the infrastructures. The drawback of a pull-based scheduler is that it flattens the hybrid infrastructures and tends to consider all computing resources on an equal basis. Our earlier results [10], [11] showed that a multi-criteria scheduling method based on the Promethee decision model [12] can make a pull-based scheduler able to implement scheduling strategies that are aware of the characteristics of the computing resources. However, in this initial work, we tested the method on a single infrastructure type at a time, without considering hybrid computing infrastructures, and we evaluated the method against two criteria: expected completion time (ECT) and usage price. In this paper, we propose the following extensions to the Promethee scheduler: (i) we add a third criterion called Expected Error Impact (EEI), which reflects the confidence that a host returns correct results, (ii) we evaluate the Promethee scheduler on hybrid environments, (iii) we leverage the tunability of the Promethee scheduler so that application developers can empirically configure the scheduler to put more emphasis on the criteria that are important from their own perspective.

The third challenge regards the design of a new scheduling approach that maximizes the satisfaction of both users and resource owners. In general, end users want their tasks to run as quickly and as cheaply as possible, whereas infrastructure owners need to capitalize on their assets and minimize operational costs. Thus, an overall scheduling approach should allow the resource owners to keep their business profitable and, at the same time, increase end-user satisfaction with the global computing system.

The Promethee scheduler allows users to provide their own scheduling strategies in order to meet their applications' requirements by configuring the relative importance of each criterion. However, such configurable multi-criteria schedulers have two strong limitations: (i) there is no guarantee that the user preferences expressed when configuring the scheduler actually translate into an execution that follows the favored criteria, and (ii) the number of possible scheduling strategies explodes with the number of criteria and the number of application profiles, rapidly becoming intractable for the user. We propose Satisfaction Oriented FilTering (SOFT), a new methodology that explores all the scheduling strategies provided by a Promethee multi-criteria scheduler to filter and select the most favorable ones according to the user execution profiles and the optimization of the infrastructure usage. SOFT also makes it possible to select a default scheduling strategy so that the scheduler attains a high and, at the same time, stable level of user satisfaction, regardless of the diversity of user satisfaction profiles.
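To make the role of the criterion weights concrete, the following sketch ranks alternatives with the generic Promethee II net outranking flow, using the simple "usual" preference function (any strict improvement counts as full preference). It is a textbook-style illustration and not the implementation evaluated in this paper; the function name promethee_net_flows, the criterion keys and all numeric values in the usage note are hypothetical.

```python
from typing import Dict, Mapping, Set


def promethee_net_flows(
    alternatives: Mapping[str, Mapping[str, float]],
    weights: Mapping[str, float],
    minimize: Set[str],
) -> Dict[str, float]:
    """Return the Promethee II net flow of each alternative (higher is better).

    alternatives: name -> {criterion: value}
    weights:      criterion -> relative importance (normalized internally)
    minimize:     criteria where a smaller value is preferred
    """
    names = list(alternatives)
    n = len(names)
    if n < 2:
        return {name: 0.0 for name in names}
    total = float(sum(weights.values()))
    flows: Dict[str, float] = {}
    for a in names:
        plus = 0.0   # how strongly a is preferred to the others
        minus = 0.0  # how strongly the others are preferred to a
        for b in names:
            if a == b:
                continue
            for crit, w in weights.items():
                d = alternatives[a][crit] - alternatives[b][crit]
                if crit in minimize:
                    d = -d  # a smaller value means a is better
                if d > 0:        # usual preference function: P(d) = 1 if d > 0
                    plus += w / total
                elif d < 0:
                    minus += w / total
        flows[a] = (plus - minus) / (n - 1)
    return flows
```

As a usage example under assumed values, take three alternatives described by price, ECT and EEI (all to be minimized): "cloud" = (10, 5, 0.1), "grid" = (2, 8, 0.2) and "desktop" = (0, 20, 0.5). With equal weights the highest net flow goes to "cloud", whereas with weights (8, 1, 1) favoring price it goes to "desktop". Changing only the weights changes the selected alternative, which is why the number of possible strategies grows quickly with the number of criteria.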

In this paper, we introduce the design of the fault-tolerant and trust-aware Promethee scheduler, which executes Bag-of-Tasks applications on elastic and hybrid DCI, following user-defined scheduling strategies. We thoroughly present the algorithms of the multi-criteria decision making and the SOFT methodology. Finally, we extensively evaluate the Promethee scheduler using a simulator that recreates a hybrid, elastic and unreliable environment containing Internet Desktop Grid, public Cloud using Spot Instances and Best Effort Grid. Simulation results show not only the effectiveness of the Promethee scheduler but also its ability to meet user application requirements. We also propose an optimized implementation of the Promethee algorithms and perform real-world experiments to validate the approach.

The remainder of the paper is organized as follows. In Section 2 we give the background for our work and define the key concepts. In Section 3 we explain our scheduling approach and define the performance evaluation metrics. In Section 4 we define SOFT, the methodology for selecting optimal scheduling strategies. Then we present the architecture of the implemented experimental system in Section 5. In Section 6 we describe the experimental data and the setup, and present the obtained results and findings. In Section 7 we discuss related work, and finally Section 8 gives the concluding remarks and observations on this work.

Background

This section describes the problem of multi-criteria scheduling on hybrid DCIs that we address in this work and defines the key concepts used in our discussion.

The Promethee scheduling method

This section presents our approach of using Promethee [12] for task scheduling and the defined performance evaluation metrics.

Overall scheduling approach

This section presents our contribution in defining a methodology that allows one to select from a large number of defined scheduling strategies the optimal one with regard to a set of application requirements.

Scheduling method implementation

In this section we describe the architecture of the scheduler implementation and other complementary components. The scheduler is broadly inspired by the behavior of XtremWeb, borrowing basic functional components such as the pull-based and task replication mechanisms.

Fig. 4 depicts a component-based representation of the implemented system architecture; its main components are: the Task Scheduler, the trace-based Hybrid DCI Simulator and the Visualizer.

The Task Scheduler is a real

Experiments and results

This section presents the experimental setup used for the validation of the scheduling approach presented in the previous sections. More precisely, we show how to apply the scheduling methodology devised in Section 4.2 in order to select the proper tuning of the scheduler, considering a hybrid computing infrastructure. We start by presenting the experimental data and setup, then we discuss the obtained results.

Related work

In this section we review other approaches for building middleware solutions to facilitate joint usage of computing resources, originating from different types of distributed computing infrastructure.

To address the need to exploit different computing infrastructures simultaneously, for both research and business purposes, there have been several attempts [24], [6], [25] to build middleware that aggregates resources from different types of infrastructure, in order to facilitate a better

Concluding remarks

In this paper we addressed the challenge of scheduling tasks in distributed computing infrastructures (DCI) like Grids and Clouds by proposing a multi-criteria scheduling method based on the Promethee algorithm. We presented the design of a fault-tolerant and trust-aware scheduler, which executes Bag-of-Tasks applications on elastic and hybrid DCI, following user-defined scheduling strategies. Our approach, named Promethee scheduler, combines a pull-based scheduler with multi-criteria

References (38)

  • Javier Celaya et al., A task routing approach to large-scale scheduling, Future Gener. Comput. Syst. (2013)
  • Christophe Cérin et al., Desktop Grid Computing (2012)
  • I. Foster et al., The anatomy of the grid: Enabling scalable virtual organizations, Int. J. Supercomput. Appl. (2001)
  • Etienne Urbah et al., EDGeS: Bridging EGEE to BOINC and XtremWeb, J. Grid Comput. (2009)
  • P. Kacsuk et al., Towards a powerful European DCI based on desktop grids, J. Grid Comput. (2011)
  • Mark Silberstein et al., Gridbot: execution of bags of tasks in multiple grids
  • D.P. Anderson, BOINC: A system for public-resource computing and storage, in: Proceedings of the 5th IEEE/ACM...
  • G. Fedak et al., XtremWeb: A generic global computing platform
  • M. Moca et al., Using Promethee Methods for Multi-Criteria Pull-based scheduling on DCIs

Mircea Moca is Associate Professor at the Babeş-Bolyai University of Cluj-Napoca, România. He completed his Ph.D. in 2010. During his Ph.D. he completed an internship at LIP, INRIA Lyon, participating in the implementation and validation of the MapReduce for Volunteer Computing INRIA application. He joined the Babeş-Bolyai University in 2009 within the Business Information Systems department. His current research interests are multi-criteria scheduling methods for hybrid distributed computing infrastructures, and building E-FAST, a prototype that provides high-performance execution of financial services based on advanced computational methods and XtremWeb.

Cristian Litan is an Associate Professor at the Babeş-Bolyai University of Cluj-Napoca, România, within the Department of Statistics, Forecasting and Mathematics. His research interests are in the field of mathematical and quantitative methods applied in economics and social sciences. He has given special attention to research in game theory and its applications. He has also conducted studies in applied econometrics and statistics.

Gheorghe Cosmin Silaghi is Professor at the Babeş-Bolyai University of Cluj-Napoca, România. He received his Bachelor in Business Information Systems in 2000 and an Engineering degree in Computer Science in 2002. In 2002, he received his M.Sc. in Artificial Intelligence from the Free University of Amsterdam. He completed his Ph.D. in 2005. He joined the Babeş-Bolyai University in 2000. His current research interests are focused on resource management techniques in untrusted distributed environments, like peer-to-peer systems.

Gilles Fedak has been a permanent INRIA research scientist since 2004 and is currently working in the AVALON team. After graduating from University Paris Sud in 2003, he held a postdoctoral fellowship at the University of California San Diego in 2003–2004. His research topics include designing and implementing an open-source Desktop Grid system called XtremWeb, and an open-source platform for data-intensive applications on Cloud and Desktop Grid called BitDew.
