Abstract
Energy consumption has gradually become a very important parameter in High Performance Computing platforms. The Resource and Job Management System (RJMS) is the HPC middleware responsible for distributing computing power to applications, and it has knowledge of both the underlying resources and the jobs' needs. It is therefore the best candidate for monitoring and controlling the energy consumption of computations according to the job specifications. Integrating energy measurement mechanisms into the RJMS and treating energy consumption as a new characteristic in accounting has become essential now that energy is a bottleneck to scalability. Since power meters would be too expensive, other existing measurement models such as IPMI and RAPL can be exploited by the RJMS to track energy consumption and to enhance the monitoring of executions with energy considerations.
In this paper we present the design and implementation of a new framework, built on the SLURM Resource and Job Management System, which provides per-job energy accounting with power profiling capabilities, along with parameters for energy control based on static frequency scaling of the CPUs. Since the goal of this work is to deploy the framework on large petaflop-scale clusters such as CURIE, its cost and reliability are important issues. We evaluate the overhead of the design choices and the precision of the monitoring modes using different HPC benchmarks (Linpack, IMB, Stream) on a real-scale platform with integrated power meters. Our experiments show that the overhead is less than 0.6% in energy consumption and less than 0.2% in execution time, while the error deviation compared to power meters is less than 2% in most cases.
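To make the idea of per-job energy accounting concrete, here is a minimal sketch of how sampled measurements could be turned into joules. It is not the framework's implementation: the function names are hypothetical, the first helper assumes periodic power readings in watts (as an IPMI-style sensor might report), and the second assumes a cumulative energy counter that wraps around (as RAPL-style counters do), with values in the range [0, max_range). The last helper computes a relative overhead of the kind reported above.

```python
from typing import Sequence, Tuple


def energy_from_power_samples(samples: Sequence[Tuple[float, float]]) -> float:
    """Integrate (timestamp_s, power_W) samples into joules.

    Uses the trapezoidal rule between consecutive samples, so irregular
    sampling intervals are handled. Hypothetical helper for illustration.
    """
    total = 0.0
    for (t0, p0), (t1, p1) in zip(samples, samples[1:]):
        total += 0.5 * (p0 + p1) * (t1 - t0)
    return total


def energy_from_counter(readings: Sequence[int], max_range: int) -> int:
    """Sum the deltas of a cumulative energy counter that wraps around.

    Assumes counter values lie in [0, max_range) and that the counter
    wraps at most once between two consecutive reads.
    """
    total = 0
    for prev, cur in zip(readings, readings[1:]):
        delta = cur - prev
        if delta < 0:  # the counter wrapped between the two reads
            delta += max_range
        total += delta
    return total


def overhead_percent(with_monitoring: float, baseline: float) -> float:
    """Relative overhead of a monitored run versus an unmonitored baseline."""
    return 100.0 * (with_monitoring - baseline) / baseline
```

The wraparound check is the detail that matters in practice: if the counter is read too rarely, it can wrap more than once between reads and the accumulated energy is silently underestimated, which is one reason the sampling period affects both overhead and precision.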
© 2014 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Georgiou, Y., Cadeau, T., Glesser, D., Auble, D., Jette, M., Hautreux, M. (2014). Energy Accounting and Control with SLURM Resource and Job Management System. In: Chatterjee, M., Cao, Jn., Kothapalli, K., Rajsbaum, S. (eds) Distributed Computing and Networking. ICDCN 2014. Lecture Notes in Computer Science, vol 8314. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-45249-9_7
DOI: https://doi.org/10.1007/978-3-642-45249-9_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-45248-2
Online ISBN: 978-3-642-45249-9
eBook Packages: Computer Science (R0)