DOI: 10.1145/2451116.2451141 · ASPLOS Conference Proceedings · Research Article

STABILIZER: Statistically Sound Performance Evaluation

Published: 16 March 2013

ABSTRACT

Researchers and software developers require effective performance evaluation. Researchers must evaluate optimizations or measure overhead. Software developers use automatic performance regression tests to discover when changes improve or degrade performance. The standard methodology is to compare execution times before and after applying changes.

Unfortunately, modern architectural features make this approach unsound. Statistically sound evaluation requires multiple samples to test whether one can or cannot (with high confidence) reject the null hypothesis that results are the same before and after. However, caches and branch predictors make performance dependent on machine-specific parameters and the exact layout of code, stack frames, and heap objects. A single binary constitutes just one sample from the space of program layouts, regardless of the number of runs. Since compiler optimizations and code changes also alter layout, it is currently impossible to distinguish the impact of an optimization from that of its layout effects.
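As an illustrative sketch (not part of the paper), the hypothesis test described above can be carried out with a two-sample permutation test: given repeated runtime samples before and after a change, it estimates the probability of seeing a difference in means at least as large as the observed one under the null hypothesis that both sets of timings come from the same distribution. The runtimes below are hypothetical.

```python
import random
import statistics

def permutation_test(before, after, trials=10_000, seed=0):
    """Two-sided permutation test for a difference in mean runtime.

    Returns the p-value for the null hypothesis that the 'before'
    and 'after' timing samples come from the same distribution.
    """
    rng = random.Random(seed)
    observed = abs(statistics.mean(after) - statistics.mean(before))
    pooled = list(before) + list(after)
    n = len(before)
    extreme = 0
    for _ in range(trials):
        # Randomly reassign timings to the two groups and re-measure the gap.
        rng.shuffle(pooled)
        diff = abs(statistics.mean(pooled[n:]) - statistics.mean(pooled[:n]))
        if diff >= observed:
            extreme += 1
    return extreme / trials

# Hypothetical runtimes (seconds) from repeated runs of each binary.
before = [10.2, 10.4, 10.1, 10.3, 10.2, 10.5]
after  = [ 9.6,  9.8,  9.5,  9.7,  9.6,  9.9]
p = permutation_test(before, after)
```

Note that every sample here comes from the *same* binary, so the test can reject the null hypothesis about this particular layout while saying nothing about the optimization itself; that is exactly the gap the paper identifies.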

This paper presents Stabilizer, a system that enables the use of the powerful statistical techniques required for sound performance evaluation on modern architectures. Stabilizer forces executions to sample the space of memory configurations by repeatedly re-randomizing layouts of code, stack, and heap objects at runtime. Stabilizer thus makes it possible to control for layout effects. Re-randomization also ensures that layout effects follow a Gaussian distribution, enabling the use of statistical tests like ANOVA. We demonstrate Stabilizer's efficiency (<7% median overhead) and its effectiveness by evaluating the impact of LLVM's optimizations on the SPEC CPU2006 benchmark suite. We find that, while -O2 has a significant impact relative to -O1, the performance impact of -O3 over -O2 optimizations is indistinguishable from random noise.
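A minimal sketch, not taken from the paper, of the one-way ANOVA the abstract alludes to: the F-statistic compares between-group variance (across optimization levels) against within-group variance (across re-randomized layouts). All runtimes below are hypothetical.

```python
import statistics

def anova_f(groups):
    """One-way ANOVA F-statistic: between-group vs. within-group variance.

    Each group holds runtimes measured under one condition (e.g. one
    optimization level), with the layout re-randomized across runs.
    """
    k = len(groups)                      # number of conditions
    n = sum(len(g) for g in groups)      # total number of samples
    grand = statistics.mean(x for g in groups for x in g)
    # Between-group sum of squares (k - 1 degrees of freedom).
    ss_between = sum(len(g) * (statistics.mean(g) - grand) ** 2 for g in groups)
    # Within-group sum of squares (n - k degrees of freedom).
    ss_within = sum((x - statistics.mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Hypothetical runtimes (seconds) for -O1, -O2, and -O3 builds,
# each run several times under different random layouts.
o1 = [12.1, 12.3, 12.0, 12.4, 12.2]
o2 = [10.9, 11.1, 10.8, 11.0, 11.2]
o3 = [11.0, 10.9, 11.1, 10.8, 11.2]
f = anova_f([o1, o2, o3])
```

A large F suggests the conditions genuinely differ; an F near 1 means the between-condition spread is indistinguishable from run-to-run noise, which is the paper's finding for -O3 versus -O2.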


Published in

ASPLOS '13: Proceedings of the Eighteenth International Conference on Architectural Support for Programming Languages and Operating Systems, March 2013, 574 pages. ISBN 9781450318709. DOI: 10.1145/2451116.

Also appears in ACM SIGPLAN Notices, Volume 48, Issue 4 (April 2013), ISSN 0362-1340, EISSN 1558-1160, DOI: 10.1145/2499368; and in ACM SIGARCH Computer Architecture News, Volume 41, Issue 1 (March 2013), ISSN 0163-5964, DOI: 10.1145/2490301.

Copyright © 2013 ACM

Publisher: Association for Computing Machinery, New York, NY, United States


Acceptance rate: 535 of 2,713 submissions, 20% (overall)
