Integrated parallel performance views

Cluster Computing

Abstract

The influence of the operating system and of system-specific effects on application performance is an increasingly important consideration in high performance computing. OS kernel measurement is key to understanding these influences and the interrelationship of system and user-level performance factors. The KTAU (Kernel TAU) methodology and Linux-based framework provide parallel kernel performance measurement from both a kernel-wide and a process-centric perspective. The kernel-wide perspective characterizes aggregate kernel performance for the entire system. The process-centric perspective characterizes kernel performance when the kernel runs in the context of a particular process. KTAU extends the TAU performance system with kernel-level monitoring while leveraging TAU’s measurement and analysis capabilities. We explain the rationale and motivations behind our approach, describe the KTAU design and implementation, and show working examples on multiple platforms that demonstrate the versatility of KTAU in integrated system/application monitoring.
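The distinction between the two perspectives can be illustrated with a minimal sketch. It is not KTAU code and does not reflect KTAU's actual interfaces or data formats; the sample records, event names, and both function names are hypothetical, chosen only to show how the same kernel-level timing data can be aggregated system-wide or restricted to one process context.

```python
from collections import defaultdict

# Hypothetical samples: (pid, kernel_event, microseconds spent).
# These values are invented for illustration, not real KTAU output.
SAMPLES = [
    (101, "sys_read", 40),
    (101, "schedule", 15),
    (102, "sys_read", 25),
    (102, "do_IRQ", 5),
    (101, "sys_read", 10),
]


def kernel_wide_view(samples):
    """Total time per kernel event across all processes (aggregate view)."""
    totals = defaultdict(int)
    for _pid, event, usec in samples:
        totals[event] += usec
    return dict(totals)


def process_centric_view(samples, pid):
    """Total time per kernel event while running in the context of one process."""
    totals = defaultdict(int)
    for sample_pid, event, usec in samples:
        if sample_pid == pid:
            totals[event] += usec
    return dict(totals)


if __name__ == "__main__":
    print("kernel-wide:", kernel_wide_view(SAMPLES))
    print("pid 101:", process_centric_view(SAMPLES, 101))
```

In this toy data the kernel-wide view reports the system total for each event, while the process-centric view reports only the share attributable to the selected process.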

Author information

Corresponding author

Correspondence to Aroon Nataraj.

About this article

Cite this article

Nataraj, A., Malony, A.D., Shende, S. et al. Integrated parallel performance views. Cluster Comput 11, 57–73 (2008). https://doi.org/10.1007/s10586-007-0051-6

  • DOI: https://doi.org/10.1007/s10586-007-0051-6
