Abstract
Despite intense research on Grid scheduling, differentiated quality of service remains an open question, and no consensus has emerged on the most promising strategy. The difficulty of experimentation may be one of the root causes of this stalling. An alternative to experimenting on real, large, and complex data is to look for well-founded and parsimonious representations, which may also contribute to the a priori knowledge required for operational Autonomics. The goal of this paper is thus to explore explanatory and generative models rather than predictive ones. As a test case, we address the following question: is it possible to exhibit and validate consistent models of the Grid workload? Most existing work on modeling the dynamics of Grid behavior describes Grids as complex systems, but assumes a steady-state system (technically, stationarity) and concludes that the associated time series exhibit some form of long-range dependence (slowly decaying correlation). However, the physical (economic and sociological) processes governing Grid behavior dispel the stationarity hypothesis. This paper considers an appealing alternative class of models: a sequence of stationary processes separated by breakpoints. The model selection question then becomes one of identifying the breakpoints and fitting the process in each segment. Experimenting with data from the EGEE/EGI Grid, we found that a non-stationary model can consistently be identified from empirical data, and that restricting the range of models to piecewise affine (autoregressive) time series is sufficiently powerful. We propose and experiment with a validation methodology that empirically addresses the current lack of theoretical results concerning the quality of the estimated model parameters. Finally, we present a bootstrapping strategy for building more robust models from the limited samples at hand.
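To make the model class concrete, the following minimal Python sketch simulates a piecewise affine AR(1) series with a single breakpoint and recovers that breakpoint by fitting one least-squares AR(1) model per segment. Everything in it is an illustrative assumption rather than the procedure used in the paper: the function names, the synthetic parameters, the restriction to a single breakpoint and to order 1, and the selection rule (exhaustive search minimising the total residual sum of squares) are all hypothetical choices made for brevity.

```python
import numpy as np

def simulate_piecewise_ar1(n, breakpoint, params, seed=0):
    """Simulate x[t] = c + phi * x[t-1] + noise; (c, phi) switches at `breakpoint`."""
    rng = np.random.default_rng(seed)
    x = np.zeros(n)
    for t in range(1, n):
        c, phi = params[0] if t < breakpoint else params[1]
        x[t] = c + phi * x[t - 1] + rng.normal(scale=0.5)
    return x

def fit_ar1(x, start, end):
    """Least-squares fit of x[t] = c + phi * x[t-1] for t in [start, end), start >= 1.

    Returns (c, phi, residual sum of squares).
    """
    y = x[start:end]
    X = np.column_stack([np.ones(end - start), x[start - 1:end - 1]])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ coef) ** 2))
    return float(coef[0]), float(coef[1]), rss

def find_single_breakpoint(x, min_len=30):
    """Exhaustive search for the split minimising the total RSS of two AR(1) fits."""
    n = len(x)
    best_b, best_rss = None, np.inf
    for b in range(1 + min_len, n - min_len):
        rss = fit_ar1(x, 1, b)[2] + fit_ar1(x, b, n)[2]
        if rss < best_rss:
            best_b, best_rss = b, rss
    return best_b

# Synthetic data only (not Grid traces): two regimes switching at t = 300.
x = simulate_piecewise_ar1(600, 300, [(0.0, 0.2), (5.0, 0.8)])
b_hat = find_single_breakpoint(x)
c1, phi1, _ = fit_ar1(x, 1, b_hat)
c2, phi2, _ = fit_ar1(x, b_hat, len(x))
print("estimated breakpoint:", b_hat)
print("segment 1 (c, phi):", (c1, phi1))
print("segment 2 (c, phi):", (c2, phi2))
```

With an unknown number of breakpoints and unknown autoregressive orders, minimising the residual sum of squares alone would always favour more segments, so a penalised criterion would be needed in place of the plain search used here.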
Cite this article
Éltető, T., Germain-Renaud, C., Bondon, P. et al. Towards Non-Stationary Grid Models. J Grid Computing 9, 423–440 (2011). https://doi.org/10.1007/s10723-011-9194-z
DOI: https://doi.org/10.1007/s10723-011-9194-z