DOI: 10.1145/1088149.1088161
Article

Low-overhead call path profiling of unmodified, optimized code

Published: 20 June 2005

ABSTRACT

Call path profiling associates resource consumption with the calling context in which resources were consumed. We describe the design and implementation of a low-overhead call path profiler based on stack sampling. The profiler uses a novel sample-driven strategy for collecting frequency counts for call graph edges without instrumenting every procedure's code to count them. The data structures and algorithms used are efficient enough to construct the complete calling context tree exposed during sampling. The profiler leverages information recorded by compilers for debugging or exception handling to record call path profiles even for highly-optimized code. We describe an implementation for the Tru64/Alpha platform. Experiments profiling the SPEC CPU2000 benchmark suite demonstrate the low (2%-7%) overhead of this profiler. A comparison with instrumentation-based profilers, such as gprof, shows that for call-intensive programs, our sampling-based strategy for call path profiling has over an order of magnitude lower overhead.
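
To make the idea of a calling context tree concrete, the following is a minimal sketch, not the paper's implementation. It assumes each sample is already available as a list of procedure names ordered from outermost to innermost frame; the names `CCTNode`, `record_sample`, `inclusive`, and the sample data are all hypothetical. It shows only how sampled call stacks can be merged into a tree so that cost is attributed to a full calling context rather than to a procedure alone. It does not model stack unwinding or the edge-frequency strategy the abstract mentions.

```python
from dataclasses import dataclass, field


@dataclass
class CCTNode:
    """One calling context: a procedure reached through a specific chain of callers."""
    name: str
    samples: int = 0  # samples taken while this context was the innermost frame
    children: dict = field(default_factory=dict)

    def child(self, name):
        # Reuse an existing child so identical call paths share one tree path.
        if name not in self.children:
            self.children[name] = CCTNode(name)
        return self.children[name]


def record_sample(root, stack):
    """Insert one sampled call stack (outermost frame first) into the tree."""
    node = root
    for frame in stack:
        node = node.child(frame)
    node.samples += 1


def inclusive(node):
    """Samples attributed to this context plus every context below it."""
    return node.samples + sum(inclusive(c) for c in node.children.values())


# Hypothetical samples: 'dot' is reached through two different call paths.
root = CCTNode("<root>")
hypothetical_samples = [
    ["main", "solve", "dot"],
    ["main", "solve", "dot"],
    ["main", "io", "dot"],
    ["main", "solve"],
]
for stack in hypothetical_samples:
    record_sample(root, stack)

main = root.children["main"]
solve_dot = main.children["solve"].children["dot"]
io_dot = main.children["io"].children["dot"]
```

In this sketch the two `dot` nodes stay separate because they sit under different callers, which is the distinction a flat or call-graph profile would lose.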

  • Published in

    ICS '05: Proceedings of the 19th annual international conference on Supercomputing
    June 2005
    414 pages
    ISBN:1595931678
    DOI:10.1145/1088149

    Copyright © 2005 ACM

    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Qualifiers

    • Article

    Acceptance Rates

    Overall Acceptance Rate: 629 of 2,180 submissions, 29%
