DOI: 10.1145/1851476.1851505
research-article

PV-EASY: a strict fairness guaranteed and prediction enabled scheduler in parallel job scheduling

Published: 21 June 2010

ABSTRACT

As the most widely used parallel job scheduling strategy in production schedulers, EASY has achieved great success, not only because it balances fairness and performance, but also because it is applicable to most HPC systems. However, unfairness still exists in EASY. For the real workloads used in this work, our simulation shows that a blocked job can be delayed by later jobs for more than 90 hours. In addition, EASY cannot directly employ parallel job runtime prediction techniques, because doing so would lead to a serious situation called reservation violation.

In this paper, we aim to guarantee strict fairness (no job is delayed by any job of lower priority) while achieving attractive performance, and to employ prediction without causing reservation violation in parallel job scheduling. We propose two novel strategies, shadow load preemption (SLP) and venture backfilling (VB), which are integrated into EASY to construct a preemptive venture EASY backfilling (PV-EASY) strategy. Experimental results on three workloads from real HPC systems demonstrate that: first, PV-EASY guarantees strict fairness and avoids reservation violation when employing job runtime prediction techniques in scheduling; second, PV-EASY achieves the same performance as EASY and outperforms EASY with prediction employed; third, the preemption in PV-EASY is not resource-costly and is simple enough to be implemented in all HPC systems where EASY works. These advantages make PV-EASY more attractive than EASY in parallel job scheduling, from both academic and industry perspectives.
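The abstract does not spell out the EASY backfilling rule or what a reservation violation looks like, so the following is a minimal sketch of the rule as it is commonly described in the backfilling literature, not the paper's SLP or VB strategies. All names (`Running`, `shadow_and_extra`, `can_backfill`, `reservation_violated`) are hypothetical, and the sketch assumes a single blocked head job and a homogeneous pool of processors.

```python
from dataclasses import dataclass


@dataclass
class Running:
    procs: int       # processors held by a running job
    est_end: float   # estimated (or predicted) completion time


def shadow_and_extra(free_now, running, head_procs):
    """Return (shadow_time, extra_procs) for the blocked head job.

    shadow_time: earliest time at which enough processors are expected
    to be free for the head job; extra_procs: processors left over then.
    """
    free = free_now
    for job in sorted(running, key=lambda j: j.est_end):
        free += job.procs
        if free >= head_procs:
            return job.est_end, free - head_procs
    raise ValueError("head job needs more processors than the machine has")


def can_backfill(now, free_now, running, head_procs, cand_procs, cand_runtime):
    """EASY rule: a later job may start now if it fits in the free
    processors and either ends by the shadow time or uses only extras."""
    if cand_procs > free_now:
        return False
    shadow, extra = shadow_and_extra(free_now, running, head_procs)
    return now + cand_runtime <= shadow or cand_procs <= extra


def reservation_violated(start, actual_runtime, shadow, cand_procs, extra):
    """A backfilled job delays the head job if it overruns the shadow
    time while holding processors that the reservation counted on."""
    return start + actual_runtime > shadow and cand_procs > extra
```

As an illustration under these assumptions: with 4 free processors, one running job holding 6 processors until t=100, and a head job needing 8, the shadow time is 100 with 2 extra processors. A 4-processor candidate predicted to run for 50 is admitted; if it actually runs for 150, the head job cannot start at t=100, which is the kind of reservation violation the abstract attributes to using predictions directly in EASY.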


  • Published in

    HPDC '10: Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing
    June 2010
    911 pages
    ISBN: 9781605589428
    DOI: 10.1145/1851476

    Copyright © 2010 ACM


    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Acceptance Rates

    Overall Acceptance Rate: 166 of 966 submissions, 17%
