Constraint Programming-Based Job Dispatching for Modern HPC Applications

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 11802)

Abstract

HPC systems are increasingly used for big data analytics and predictive model building, workloads that consist of many short jobs. In these scenarios, HPC job dispatchers must process large numbers of short jobs quickly and make decisions on-line while ensuring high Quality-of-Service (QoS) levels and meeting demanding timing requirements. Constraint Programming (CP) is an effective approach for tackling job dispatching problems. Yet state-of-the-art CP-based job dispatchers can neither meet the challenges of on-line dispatching nor take advantage of job duration predictions. These limitations jeopardize high QoS levels and consequently impede the adoption of CP-based dispatchers in HPC systems. We propose a class of CP-based dispatchers that are more suitable for HPC systems running modern applications. The new dispatchers significantly reduce the time required to generate on-line dispatching decisions, and they make effective use of job duration predictions to decrease waiting times and job slowdowns, especially for workloads dominated by short jobs.
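The abstract does not define its two QoS metrics. As a minimal sketch, assuming the definitions commonly used in the job scheduling literature (waiting time is the delay between submission and start, and slowdown is the ratio of total time in the system to run time), they could be computed as follows. The `Job` class and its field names are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Job:
    """Hypothetical job record; all times are in seconds."""
    submit: float  # time the job entered the queue
    start: float   # time the dispatcher started the job
    end: float     # time the job completed


def waiting_time(job: Job) -> float:
    """Time the job spent queued before it started running."""
    return job.start - job.submit


def slowdown(job: Job) -> float:
    """Total time in the system divided by run time (assumed definition)."""
    run_time = job.end - job.start
    return (waiting_time(job) + run_time) / run_time


# Two jobs that both wait 60 s: one runs for 10 s, the other for one hour.
short_job = Job(submit=0.0, start=60.0, end=70.0)
long_job = Job(submit=0.0, start=60.0, end=3660.0)
```

Under these assumed definitions, the same 60-second wait gives the short job a slowdown of 7, while the one-hour job stays close to 1. This illustrates why slow dispatching decisions are especially costly for workloads dominated by short jobs.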

Notes

  1. https://developers.google.com/optimization/

Acknowledgements

We thank A. Bartolini, L. Benini, M. Milano, M. Lombardi and the SCAI group at Cineca for providing the Eurora data, and A. Borghesi and T. Bridi for sharing the original implementations of the dispatchers. We thank the IT Center of the University of Pisa and M. Marzolla for providing computing resources. C. Galleguillos has been supported by Postgraduate Grant PUCV 2018. A. Sîrbu has been partially funded by the SoBigData EU project (grant agreement 654024).

Author information

Corresponding author

Correspondence to Cristian Galleguillos.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Galleguillos, C., Kiziltan, Z., Sîrbu, A., Babaoglu, O. (2019). Constraint Programming-Based Job Dispatching for Modern HPC Applications. In: Schiex, T., de Givry, S. (eds) Principles and Practice of Constraint Programming. CP 2019. Lecture Notes in Computer Science, vol 11802. Springer, Cham. https://doi.org/10.1007/978-3-030-30048-7_26

  • DOI: https://doi.org/10.1007/978-3-030-30048-7_26

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-30047-0

  • Online ISBN: 978-3-030-30048-7

  • eBook Packages: Computer Science, Computer Science (R0)
