ABSTRACT
To identify optimisation opportunities, Java developers often use sampling profilers that attribute a percentage of run time to the methods of a program. Even though these profilers rely on sampling, are probabilistic in nature, and may suffer, for instance, from safepoint bias, they are normally considered relatively reliable. However, unreliable or inaccurate profiles may misdirect developers in their quest to resolve performance issues by failing to identify the program parts that would benefit most from optimisation.
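The sampling approach described above can be sketched in a few lines of Java: a sampler periodically records the method on top of a worker thread's stack and reports each method's share of samples as an estimate of its share of run time. This is a hypothetical, minimal illustration, not the implementation of any studied profiler; note that `Thread.getStackTrace` itself walks stacks at JVM safepoints, which is precisely the source of the safepoint bias discussed here.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Minimal sketch of sampling-based profiling (illustrative only).
public class SamplingSketch {
    // Convert raw per-method sample counts into percentages of run time.
    static Map<String, Double> attribute(Map<String, Integer> samples) {
        int total = samples.values().stream().mapToInt(Integer::intValue).sum();
        Map<String, Double> pct = new LinkedHashMap<>();
        samples.forEach((method, n) -> pct.put(method, 100.0 * n / total));
        return pct;
    }

    public static void main(String[] args) throws Exception {
        Thread worker = new Thread(SamplingSketch::busyWork);
        worker.setDaemon(true);
        worker.start();

        // Sample the worker's topmost stack frame at a fixed interval.
        Map<String, Integer> samples = new LinkedHashMap<>();
        for (int i = 0; i < 200; i++) {
            StackTraceElement[] stack = worker.getStackTrace();
            if (stack.length > 0) {
                samples.merge(stack[0].getMethodName(), 1, Integer::sum);
            }
            Thread.sleep(1); // sampling interval: 1 ms
        }
        attribute(samples).forEach((m, p) ->
            System.out.printf("%s: %.1f%%%n", m, p));
    }

    static void busyWork() { // spin loop to give the sampler something to see
        long x = 0;
        while (true) { x += 1; }
    }
}
```

Because sampling only observes the stack at discrete points, the reported percentages are estimates whose variance shrinks with more samples; this is why repeated runs of a probabilistic profiler need not agree.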
With the wider adoption of profilers such as async-profiler and Honest Profiler, which are designed to avoid safepoint bias, we wanted to investigate how precise and accurate Java sampling profilers are today. We examine the precision, reliability, accuracy, and overhead of async-profiler, Honest Profiler, Java Flight Recorder, JProfiler, perf, and YourKit, all of which are actively maintained. We assess them on the fully deterministic Are We Fast Yet benchmarks to provide a stable foundation for comparing the probabilistic profilers.
We find that the profilers are relatively reliable over 30 runs and normally report the same hottest method. Unfortunately, this does not hold for all benchmarks, which suggests their reliability may be application-specific. Different profilers also report different methods as hottest and cannot reliably agree on the set of the top 5 hottest methods. On the positive side, the average run-time overhead ranges from 1% to 5.4% across the profilers.
Future work should investigate how results can be made more reliable, perhaps by reducing the observer effect of profilers, e.g., by reusing the optimisation decisions of unprofiled runs, or by developing a principled approach to combining multiple profiles that explore different dynamic optimisations.
Don't Trust Your Profiler: An Empirical Study on the Precision and Accuracy of Java Profilers (Poster Abstract). In MPLR 2023: Proceedings of the 20th ACM SIGPLAN International Conference on Managed Programming Languages and Runtimes.