Research Article · Open Access
DOI: https://doi.org/10.1145/3617651.3622985

Don’t Trust Your Profiler: An Empirical Study on the Precision and Accuracy of Java Profilers

Published: 19 October 2023

ABSTRACT

To identify optimisation opportunities, Java developers often use sampling profilers that attribute a percentage of run time to the methods of a program. Even though these profilers use sampling, are probabilistic in nature, and may, for instance, suffer from safepoint bias, they are normally considered to be relatively reliable. However, unreliable or inaccurate profiles may misdirect developers in their quest to resolve performance issues by failing to identify the program parts that would benefit most from optimisations.
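
To make the sampling model concrete, the following is a minimal sketch of a naive sampling profiler, not one of the studied tools: a background thread periodically captures stack traces via Thread.getAllStackTraces() and attributes each sample to the top-of-stack method. Because HotSpot typically only observes other threads once they reach a safepoint or handshake poll, a sampler built this way exhibits exactly the safepoint bias mentioned above.

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    // Minimal sketch of a naive sampling profiler (illustration only):
    // a background thread periodically grabs all stack traces and attributes
    // each sample to the top-of-stack method. Thread.getAllStackTraces()
    // observes threads at safepoint/handshake polls, so this sampler is
    // subject to the safepoint bias discussed in the text.
    public class NaiveSampler implements Runnable {
        private final Map<String, Long> samplesPerMethod = new ConcurrentHashMap<>();
        private volatile boolean running = true;

        @Override
        public void run() {
            while (running) {
                for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
                    StackTraceElement[] stack = e.getValue();
                    if (stack.length == 0 || "naive-sampler".equals(e.getKey().getName())) {
                        continue; // skip idle threads and the sampler thread itself
                    }
                    // Attribute this sample to the currently executing ("self") method.
                    String method = stack[0].getClassName() + "." + stack[0].getMethodName();
                    samplesPerMethod.merge(method, 1L, Long::sum);
                }
                try {
                    Thread.sleep(10); // 10 ms sampling interval
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    return;
                }
            }
        }

        public void stop() { running = false; }

        // Print each method's share of all samples, i.e. its estimated run-time percentage.
        public void report() {
            long total = samplesPerMethod.values().stream().mapToLong(Long::longValue).sum();
            samplesPerMethod.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue().reversed())
                .limit(5)
                .forEach(e -> System.out.printf("%-60s %5.1f%%%n",
                    e.getKey(), 100.0 * e.getValue() / total));
        }
    }

Running this Runnable in a daemon thread named "naive-sampler" alongside a workload and calling report() afterwards yields the kind of per-method percentages such profilers report.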

With the wider adoption of profilers such as async-profiler and Honest Profiler, which are designed to avoid the safepoint bias, we wanted to investigate how precise and accurate Java sampling profilers are today. We examine the precision, reliability, accuracy, and overhead of async-profiler, Honest Profiler, Java Flight Recorder, JProfiler, perf, and YourKit, all of which are actively maintained. We assess them on the fully deterministic Are We Fast Yet benchmarks to provide a stable foundation for the probabilistic profilers.
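
As one concrete illustration of driving a studied profiler, Java Flight Recorder can be started programmatically through the jdk.jfr API (assuming JDK 11+). This is only a sketch: runWorkload() is a hypothetical placeholder, and the study's actual set-up may attach the profilers differently, e.g. via agents or command-line flags.

    import java.nio.file.Path;
    import jdk.jfr.Configuration;
    import jdk.jfr.Recording;

    // Sketch: record a workload with Java Flight Recorder using the built-in
    // "default" event configuration, then dump the recording for later analysis
    // (e.g. in JDK Mission Control). The workload below is a placeholder.
    public class JfrSketch {
        public static void main(String[] args) throws Exception {
            Configuration config = Configuration.getConfiguration("default");
            try (Recording recording = new Recording(config)) {
                recording.start();
                runWorkload(); // hypothetical benchmark stand-in
                recording.stop();
                recording.dump(Path.of("profile.jfr"));
            }
        }

        private static void runWorkload() {
            long sum = 0;
            for (int i = 0; i < 100_000_000; i++) {
                sum += i;
            }
            System.out.println(sum);
        }
    }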

We find that the profilers are relatively reliable over 30 runs and normally report the same hottest method. Unfortunately, this does not hold for all benchmarks, which suggests that their reliability may be application-specific. Different profilers also report different methods as the hottest and cannot reliably agree on the set of the top 5 hottest methods. On the positive side, the average run-time overhead is in the range of 1% to 5.4% for the different profilers.
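
One simple way to quantify the top-5 disagreement described above is to measure the overlap between the five methods each profiler ranks hottest. The sketch below uses hypothetical names and is not necessarily the paper's exact metric.

    import java.util.HashSet;
    import java.util.Map;
    import java.util.Set;
    import java.util.stream.Collectors;

    // Sketch: given two profiles mapping method names to run-time percentages,
    // compute the fraction of methods shared by their top-5 hottest sets
    // (1.0 = full agreement, 0.0 = no overlap).
    public class TopFiveAgreement {
        static Set<String> topFive(Map<String, Double> percentByMethod) {
            return percentByMethod.entrySet().stream()
                .sorted(Map.Entry.<String, Double>comparingByValue().reversed())
                .limit(5)
                .map(Map.Entry::getKey)
                .collect(Collectors.toSet());
        }

        static double agreement(Map<String, Double> profileA, Map<String, Double> profileB) {
            Set<String> shared = new HashSet<>(topFive(profileA));
            shared.retainAll(topFive(profileB));
            return shared.size() / 5.0;
        }
    }

Applied pairwise across profilers and runs, such a score makes statements like "cannot reliably agree on the top 5 hottest methods" measurable.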

Future work should investigate how profiling results can become more reliable, perhaps by reducing the observer effect of profilers, for instance by using the optimisation decisions of unprofiled runs, or by developing a principled approach to combining multiple profiles that explore different dynamic optimisations.

