Research Article · Open Access
DOI: https://doi.org/10.1145/3617651.3622985

Don’t Trust Your Profiler: An Empirical Study on the Precision and Accuracy of Java Profilers

Published: 19 October 2023

ABSTRACT

To identify optimisation opportunities, Java developers often use sampling profilers that attribute a percentage of run time to the methods of a program. Even though these profilers use sampling, are probabilistic in nature, and may, for instance, suffer from safepoint bias, they are normally considered to be relatively reliable. However, unreliable or inaccurate profiles may misdirect developers in their quest to resolve performance issues by failing to identify the program parts that would benefit most from optimisations.
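
To make the sampling model concrete, the following is a minimal sketch of a naive sampling profiler, not one of the studied tools: a background thread periodically captures stack traces via Thread.getAllStackTraces() and attributes each sample to the top-of-stack method. Because HotSpot typically only observes other threads once they reach a safepoint or handshake poll, a sampler built this way exhibits exactly the safepoint bias mentioned above.

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    // Minimal sketch of a naive sampling profiler (illustration only):
    // a background thread periodically grabs all stack traces and attributes
    // each sample to the top-of-stack method. Thread.getAllStackTraces()
    // observes threads at safepoint/handshake polls, so this sampler is
    // subject to the safepoint bias discussed in the text.
    public class NaiveSampler implements Runnable {
        private final Map<String, Long> samplesPerMethod = new ConcurrentHashMap<>();
        private volatile boolean running = true;

        @Override
        public void run() {
            while (running) {
                for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
                    StackTraceElement[] stack = e.getValue();
                    if (stack.length == 0 || "naive-sampler".equals(e.getKey().getName())) {
                        continue; // skip idle threads and the sampler thread itself
                    }
                    // Attribute this sample to the currently executing ("self") method.
                    String method = stack[0].getClassName() + "." + stack[0].getMethodName();
                    samplesPerMethod.merge(method, 1L, Long::sum);
                }
                try {
                    Thread.sleep(10); // 10 ms sampling interval
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    return;
                }
            }
        }

        public void stop() { running = false; }

        // Print each method's share of all samples, i.e. its estimated run-time percentage.
        public void report() {
            long total = samplesPerMethod.values().stream().mapToLong(Long::longValue).sum();
            samplesPerMethod.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue().reversed())
                .limit(5)
                .forEach(e -> System.out.printf("%-60s %5.1f%%%n",
                    e.getKey(), 100.0 * e.getValue() / total));
        }
    }

Running this Runnable in a daemon thread named "naive-sampler" alongside a workload and calling report() afterwards yields the kind of per-method percentages such profilers report.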

With the wider adoption of profilers such as async-profiler and Honest Profiler, which are designed to avoid the safepoint bias, we wanted to investigate how precise and accurate Java sampling profilers are today. We examine the precision, reliability, accuracy, and overhead of async-profiler, Honest Profiler, Java Flight Recorder, JProfiler, perf, and YourKit, all of which are actively maintained. We assess them on the fully deterministic Are We Fast Yet benchmarks to provide a stable foundation for the probabilistic profilers.
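
As one concrete illustration of driving a studied profiler, Java Flight Recorder can be started programmatically through the jdk.jfr API (assuming JDK 11+). This is only a sketch: runWorkload() is a hypothetical placeholder, and the study's actual set-up may attach the profilers differently, e.g. via agents or command-line flags.

    import java.nio.file.Path;
    import jdk.jfr.Configuration;
    import jdk.jfr.Recording;

    // Sketch: record a workload with Java Flight Recorder using the built-in
    // "default" event configuration, then dump the recording for later analysis
    // (e.g. in JDK Mission Control). The workload below is a placeholder.
    public class JfrSketch {
        public static void main(String[] args) throws Exception {
            Configuration config = Configuration.getConfiguration("default");
            try (Recording recording = new Recording(config)) {
                recording.start();
                runWorkload(); // hypothetical benchmark stand-in
                recording.stop();
                recording.dump(Path.of("profile.jfr"));
            }
        }

        private static void runWorkload() {
            long sum = 0;
            for (int i = 0; i < 100_000_000; i++) {
                sum += i;
            }
            System.out.println(sum);
        }
    }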

We find that the profilers are relatively reliable over 30 runs and normally report the same hottest method. Unfortunately, this does not hold for all benchmarks, which suggests that their reliability may be application-specific. Different profilers also report different methods as the hottest and cannot reliably agree on the set of the top 5 hottest methods. On the positive side, the average run-time overhead is in the range of 1% to 5.4% for the different profilers.
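
One simple way to quantify the top-5 disagreement described above is to measure the overlap between the five methods each profiler ranks hottest. The sketch below uses hypothetical names and is not necessarily the paper's exact metric.

    import java.util.HashSet;
    import java.util.Map;
    import java.util.Set;
    import java.util.stream.Collectors;

    // Sketch: given two profiles mapping method names to run-time percentages,
    // compute the fraction of methods shared by their top-5 hottest sets
    // (1.0 = full agreement, 0.0 = no overlap).
    public class TopFiveAgreement {
        static Set<String> topFive(Map<String, Double> percentByMethod) {
            return percentByMethod.entrySet().stream()
                .sorted(Map.Entry.<String, Double>comparingByValue().reversed())
                .limit(5)
                .map(Map.Entry::getKey)
                .collect(Collectors.toSet());
        }

        static double agreement(Map<String, Double> profileA, Map<String, Double> profileB) {
            Set<String> shared = new HashSet<>(topFive(profileA));
            shared.retainAll(topFive(profileB));
            return shared.size() / 5.0;
        }
    }

Applied pairwise across profilers and runs, such a score makes statements like "cannot reliably agree on the top 5 hottest methods" measurable.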

Future work should investigate how profiling results can become more reliable, perhaps by reducing the observer effect of profilers, for instance by using the optimisation decisions of unprofiled runs, or by developing a principled approach to combining multiple profiles that explore different dynamic optimisations.

