doi:10.1016/j.future.2006.02.008
Copyright © 2006 Elsevier Ltd All rights reserved.
Performance prediction and its use in parallel and distributed computing systems
a Department of Computer Science, University of Warwick, Warwick CV4 7AL, UK
b Center for Space Research, Massachusetts Institute of Technology, Cambridge, MA, USA
c NASA Ames Research Center, Moffett Field, CA, USA
Available online 2 May 2006.
Abstract
Performance prediction is set to play a significant role in supportive middleware that is designed to manage workload on parallel and distributed computing systems. This middleware underpins the discovery of available resources, the identification of a task’s requirements and the match-making, scheduling and staging that follow.
This paper documents two prediction-based middleware services that address the implications of executing a particular workload on a given set of resources. These services are based on an established performance prediction system that is employed at both the local (intra-domain) and global (multi-domain) levels to provide dynamic workload steering. These additional facilities yield significant performance improvements, which are presented in terms of system- and user-level quality of service. The middleware has been designed to manage resources and distributed workload across multiple administrative boundaries, a requirement of central importance to grid computing.
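To make prediction-driven workload steering concrete, the sketch below routes a task to whichever domain is predicted to finish it earliest while still meeting its deadline. It is a minimal illustration under stated assumptions, not the paper's algorithm: each domain is assumed to advertise a predicted backlog and a relative speed, and `Domain`, `predicted_completion` and `steer` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Domain:
    """A hypothetical administrative domain as seen by a multi-domain broker."""
    name: str
    backlog_s: float     # assumed: predicted seconds until the domain can start new work
    speed_factor: float  # assumed: predicted run time = base run time / speed_factor


def predicted_completion(domain: Domain, base_runtime_s: float) -> float:
    """Predicted completion time (seconds from now) if the task were sent here."""
    return domain.backlog_s + base_runtime_s / domain.speed_factor


def steer(base_runtime_s: float, deadline_s: float, domains: List[Domain]) -> Optional[str]:
    """Send the task to the domain with the earliest predicted completion that meets
    the deadline; return that domain's name, or None if no domain is predicted to meet it."""
    candidates = [(predicted_completion(d, base_runtime_s), d) for d in domains]
    feasible = [(t, d) for t, d in candidates if t <= deadline_s]
    if not feasible:
        return None
    finish, best = min(feasible, key=lambda pair: pair[0])
    best.backlog_s = finish  # the chosen domain is now predicted busy until this task ends
    return best.name
```

Returning `None` leaves the task to some other policy (rejection, local queueing, renegotiating the deadline); the sketch deliberately does not model that choice.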
Keywords: Performance prediction; Resource management; Grid computing
Fig. 1. An outline of the PACE system including the application and resource modelling components and the parametric evaluation engine which combines the two.
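A heavily simplified sketch of the combination step in Fig. 1: an application model reports operation counts per phase as a function of problem size, a resource model gives the cost of each operation on a host, and an evaluation function multiplies and sums them into a predicted run time. The model shapes and the names `evaluate`, `flop` and `msg_byte` are hypothetical; PACE's actual models are considerably richer than this.

```python
from typing import Callable, Dict

# Hypothetical application model: maps each phase name to a function that, for
# problem size n, returns how many operations of each kind the phase performs.
ApplicationModel = Dict[str, Callable[[int], Dict[str, float]]]

# Hypothetical resource model: seconds per operation of each kind on one host.
ResourceModel = Dict[str, float]


def evaluate(app: ApplicationModel, resource: ResourceModel, n: int) -> float:
    """Combine an application model and a resource model into a predicted run time (s)."""
    total = 0.0
    for op_counts in app.values():
        for op, count in op_counts(n).items():
            total += count * resource[op]
    return total


# Example models (illustrative numbers only).
app_model: ApplicationModel = {
    "compute": lambda n: {"flop": 2.0 * n ** 3},
    "exchange": lambda n: {"msg_byte": 8.0 * n ** 2},
}
host_a: ResourceModel = {"flop": 1e-9, "msg_byte": 1e-8}

print(evaluate(app_model, host_a, 1000))  # roughly 2.08 seconds
```

Predicting the same application on a different host only requires swapping the resource model, which is the practical benefit of keeping the two models separate.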
Fig. 2. Intra-domain level middleware components. Tasks that have associated performance data are processed by the Titan co-scheduler. This maps the tasks to the resources before they are finally committed to the physical hardware by Condor.
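The caption does not say how Titan computes its mapping, so this sketch substitutes a simple greedy earliest-finish-time heuristic to show what a predictive co-scheduler does with performance data: it uses predicted run times for each (task, host) pair to decide placement before anything is dispatched. `greedy_map` and the shape of `predict` are hypothetical, and the hand-off of the resulting mapping to Condor is not shown.

```python
from typing import Dict, List, Tuple


def greedy_map(tasks: List[str],
               predict: Dict[Tuple[str, str], float],
               hosts: List[str]) -> Tuple[Dict[str, str], float]:
    """Map each task to the host with the earliest predicted finish time.

    predict[(task, host)] is a predicted run time in seconds, assumed to come
    from a performance model. Returns the mapping and the predicted makespan.
    """
    free_at = {h: 0.0 for h in hosts}  # predicted time at which each host becomes idle
    mapping: Dict[str, str] = {}
    # Place the longest tasks first, ranked by their best-case predicted run time.
    order = sorted(tasks, key=lambda t: min(predict[(t, h)] for h in hosts), reverse=True)
    for task in order:
        host = min(hosts, key=lambda h: free_at[h] + predict[(task, h)])
        free_at[host] += predict[(task, host)]
        mapping[task] = host
    return mapping, max(free_at.values())
```

Without predictions a dispatcher can only place tasks on whatever host is free next; with them it can leave a task waiting for a faster host when that is predicted to finish sooner.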
Fig. 3. Top: run-time schedule using Condor alone (70.08 min); bottom: run-time schedule using Condor with the predictive co-scheduler Titan (35.19 min).
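The two makespans in the Fig. 3 caption can be turned into a speedup and a percentage reduction. This helper is only arithmetic on the reported figures; the name `makespan_change` is hypothetical.

```python
from typing import Tuple


def makespan_change(before: float, after: float) -> Tuple[float, float]:
    """Return (speedup, percentage reduction) when makespan drops from before to after."""
    if before <= 0 or after <= 0:
        raise ValueError("makespans must be positive")
    speedup = before / after
    reduction_pct = 100.0 * (before - after) / before
    return speedup, reduction_pct


speedup, reduction = makespan_change(70.08, 35.19)  # minutes, from Fig. 3
print(f"speedup {speedup:.2f}x, reduction {reduction:.1f}%")  # prints: speedup 1.99x, reduction 49.8%
```

In other words, adding the predictive co-scheduler roughly halves the makespan in this experiment.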
Fig. 4. Interconnect of Titan and the MDS-based performance information service.
Fig. 5. Results of running 1000 tasks submitted at a rate of 5 requests per second. Top: predictive middleware inactive; bottom: predictive middleware active. Note the reduction in makespan from 2756 s to 467 s. Darker shading indicates greater utilisation.
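A quick check on the Fig. 5 figures, assuming requests arrive evenly from time zero and the makespan is measured from the first submission. The symbols T_sub, t_off and t_on are introduced here for illustration (t_off and t_on are the makespans with the middleware off and on). Under these assumptions the workload finishes roughly 267 s after the last request arrives when the middleware is active, against roughly 2556 s when it is not.

```latex
\begin{aligned}
T_{\mathrm{sub}} &\approx \frac{1000\ \text{tasks}}{5\ \text{tasks/s}} = 200\ \mathrm{s} \\
\frac{t_{\mathrm{off}}}{t_{\mathrm{on}}} &= \frac{2756\ \mathrm{s}}{467\ \mathrm{s}} \approx 5.9,
\qquad \frac{t_{\mathrm{off}} - t_{\mathrm{on}}}{t_{\mathrm{off}}} = \frac{2289}{2756} \approx 83\% \\
t_{\mathrm{on}} - T_{\mathrm{sub}} &\approx 267\ \mathrm{s},
\qquad t_{\mathrm{off}} - T_{\mathrm{sub}} \approx 2556\ \mathrm{s}
\end{aligned}
```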
Fig. 6. The average delay under varying system loads. The bar to the right represents the delay when the middleware is off; the bar to the left represents the delay when the middleware is on.
Fig. 7. The resource usage under varying system loads. The bar to the right represents the resource usage when the middleware is off; the bar to the left represents the resource usage when the middleware is on.
Table 1. Experimental results using Condor and using Condor with the Titan co-scheduler

Table 2. Experimental results: r is the number of requests (load); … is the request submission rate per second; M indicates whether the predictive middleware is active; t is the makespan; ε is the average delay; and ν is the resource utilisation
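Table 2's caption names its metrics but not their formulas. The sketch below computes them from per-task timestamps under assumed definitions: makespan t as last completion minus first submission, average delay ε as the mean wait between submission and start, and utilisation ν as busy host-seconds over available host-seconds. Other definitions (for example, delay measured against a deadline) are equally plausible; `TaskRecord` and the function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TaskRecord:
    submit: float  # submission time (s)
    start: float   # execution start time (s)
    end: float     # completion time (s)


def makespan(records: List[TaskRecord]) -> float:
    """t: time from the first submission to the last completion."""
    return max(r.end for r in records) - min(r.submit for r in records)


def average_delay(records: List[TaskRecord]) -> float:
    """epsilon (assumed definition): mean wait between submission and start."""
    return sum(r.start - r.submit for r in records) / len(records)


def utilisation(records: List[TaskRecord], num_hosts: int) -> float:
    """nu (assumed definition): busy host-seconds divided by available host-seconds."""
    busy = sum(r.end - r.start for r in records)
    return busy / (num_hosts * makespan(records))
```

With these definitions a predictive scheduler that lowers t will usually raise ν for the same total work, since the same busy time is spread over a shorter window.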

Table 3. Percentage of tasks meeting their deadlines under low, medium and high workloads

Results are shown with the middleware off and on, together with the percentage improvement gained by activating the middleware.
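The caption does not say whether "percentage improvement" means an absolute difference in percentage points or a relative gain, so this sketch reports both. A task is assumed to meet its deadline if it completes at or before it; `pct_meeting_deadlines` and `improvement` are hypothetical names.

```python
from typing import Sequence, Tuple


def pct_meeting_deadlines(completions: Sequence[float], deadlines: Sequence[float]) -> float:
    """Percentage of tasks whose completion time is at or before their deadline."""
    if len(completions) != len(deadlines) or not completions:
        raise ValueError("need equal-length, non-empty sequences")
    met = sum(1 for c, d in zip(completions, deadlines) if c <= d)
    return 100.0 * met / len(completions)


def improvement(off_pct: float, on_pct: float) -> Tuple[float, float]:
    """Return (absolute gain in percentage points, relative gain in %)."""
    points = on_pct - off_pct
    relative = 100.0 * points / off_pct if off_pct > 0 else float("inf")
    return points, relative
```

For example, moving from 40% to 60% of deadlines met is a gain of 20 percentage points but a 50% relative improvement, so the two readings can differ substantially.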