Integrated parallel performance views

Nataraj, Aroon; Malony, Allen D.; Shende, Sameer; Morris, Alan

doi:10.1007/s10586-007-0051-6

Published: 27 November 2007

Cluster Computing, volume 11, pages 57–73 (2008)

Abstract

The influences of the operating system and system-specific effects on application performance are increasingly important considerations in high-performance computing. OS kernel measurement is key to understanding these performance influences and the interrelationship of system-level and user-level performance factors. The KTAU (Kernel TAU) methodology and Linux-based framework provide parallel kernel performance measurement from both a kernel-wide and a process-centric perspective. The first characterizes overall aggregate kernel performance for the entire system. The second characterizes kernel performance when it runs in the context of a particular process. KTAU extends the TAU performance system with kernel-level monitoring, while leveraging TAU’s measurement and analysis capabilities. We explain the rationale and motivations behind our approach, describe the KTAU design and implementation, and show working examples on multiple platforms demonstrating the versatility of KTAU in integrated system/application monitoring.
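
The difference between the two perspectives can be illustrated with a small sketch. This is not KTAU code and does not use any KTAU or TAU interface. The record type `KernelEvent`, the function names, the event names, and the sample timings are all hypothetical, chosen only to show how the same kernel measurements can be summed system-wide or restricted to one process context.

```python
from collections import defaultdict
from typing import NamedTuple


class KernelEvent(NamedTuple):
    """Hypothetical record: time spent in one kernel event on behalf of a process."""
    pid: int      # process in whose context the kernel code ran
    event: str    # illustrative kernel event name, e.g. "schedule"
    usec: float   # time attributed to the event, in microseconds


def kernel_wide_view(events):
    """Aggregate time per kernel event over every process (kernel-wide perspective)."""
    totals = defaultdict(float)
    for e in events:
        totals[e.event] += e.usec
    return dict(totals)


def process_centric_view(events, pid):
    """Aggregate time per kernel event for a single process (process-centric perspective)."""
    totals = defaultdict(float)
    for e in events:
        if e.pid == pid:
            totals[e.event] += e.usec
    return dict(totals)


# Made-up sample data, for illustration only.
SAMPLE = [
    KernelEvent(pid=101, event="schedule", usec=120.0),
    KernelEvent(pid=101, event="sys_read", usec=40.0),
    KernelEvent(pid=202, event="schedule", usec=80.0),
    KernelEvent(pid=202, event="do_IRQ", usec=15.0),
]

if __name__ == "__main__":
    print(kernel_wide_view(SAMPLE))
    print(process_centric_view(SAMPLE, pid=101))
```

In the kernel-wide view the two `schedule` entries are combined into one total, while the process-centric view for process 101 reports only the time spent in that process's context.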

Author information

Authors and Affiliations

Department of Computer and Information Science, University of Oregon, Eugene, OR, USA
Aroon Nataraj, Allen D. Malony, Sameer Shende & Alan Morris

Corresponding author

Correspondence to Aroon Nataraj.

About this article

Cite this article

Nataraj, A., Malony, A.D., Shende, S. et al. Integrated parallel performance views. Cluster Comput 11, 57–73 (2008). https://doi.org/10.1007/s10586-007-0051-6

Received: 16 March 2007
Accepted: 29 October 2007
Published: 27 November 2007
Issue Date: March 2008
DOI: https://doi.org/10.1007/s10586-007-0051-6
