Research article. DOI: 10.1145/1640089.1640101

A concurrent dynamic analysis framework for multicore hardware

Published: 25 October 2009

ABSTRACT

Software has spent the bounty of Moore's law by solving harder problems and exploiting abstractions, such as high-level languages, virtual machine technology, binary rewriting, and dynamic analysis. Abstractions make programmers more productive and programs more portable, but usually slow them down. Since Moore's law is now delivering multiple cores instead of faster processors, future systems must either bear a relatively higher cost for abstractions or use some cores to help tolerate abstraction costs.

This paper presents the design, implementation, and evaluation of a novel concurrent, configurable dynamic analysis framework that efficiently utilizes multicore cache architectures. It introduces Cache-friendly Asymmetric Buffering (CAB), a lock-free ring-buffer that implements efficient communication between application and analysis threads. We guide the design and implementation of our framework with a model of dynamic analysis overheads. The framework implements exhaustive and sampling event processing and is analysis-neutral. We evaluate the framework with five popular and diverse analyses, and show performance improvements even for lightweight, low-overhead analyses.

Efficient inter-core communication is central to high-performance parallel systems, and we believe the CAB design gives insight into the subtleties and difficulties of attaining it for dynamic analysis and other parallel software.
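The paper's CAB algorithm itself is not reproduced in this abstract. As a rough illustration of the kind of structure it builds on, the sketch below is a generic Lamport-style single-producer/single-consumer lock-free ring buffer in Java: the application thread enqueues events, the analysis thread dequeues them, and the only synchronization is ordered reads and writes of the head and tail counters. The class name, event encoding as `long`, and power-of-two capacity are assumptions for this sketch, not the paper's design, which additionally optimizes for cache behavior and asymmetric producer/consumer costs.

```java
import java.util.concurrent.atomic.AtomicLong;

// Minimal single-producer/single-consumer lock-free ring buffer
// (a plain Lamport-style queue, much simpler than the paper's CAB).
// Capacity must be a power of two so wrap-around is a cheap mask.
final class SpscRingBuffer {
    private final long[] slots;
    private final int mask;
    private final AtomicLong head = new AtomicLong(); // next slot to read
    private final AtomicLong tail = new AtomicLong(); // next slot to write

    SpscRingBuffer(int capacityPowerOfTwo) {
        slots = new long[capacityPowerOfTwo];
        mask = capacityPowerOfTwo - 1;
    }

    // Called only by the application (producer) thread.
    boolean offer(long event) {
        long t = tail.get();
        if (t - head.get() == slots.length) return false; // buffer full
        slots[(int) t & mask] = event;
        tail.lazySet(t + 1); // ordered store: publish the slot to the consumer
        return true;
    }

    // Called only by the analysis (consumer) thread; null when empty.
    Long poll() {
        long h = head.get();
        if (h == tail.get()) return null; // buffer empty
        long event = slots[(int) h & mask];
        head.lazySet(h + 1); // ordered store: free the slot for the producer
        return event;
    }
}
```

Because each counter has exactly one writer, neither side ever needs a compare-and-swap on the fast path; designs in this space (including FastForward, cited by the paper) go further by padding and batching to keep producer and consumer off each other's cache lines.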


Published in: OOPSLA '09: Proceedings of the 24th ACM SIGPLAN Conference on Object-Oriented Programming, Systems, Languages, and Applications, October 2009, 590 pages. ISBN 978-1-60558-766-0. DOI: 10.1145/1640089.

Also in: ACM SIGPLAN Notices, Volume 44, Issue 10 (OOPSLA '09), October 2009, 554 pages. ISSN 0362-1340, EISSN 1558-1160. DOI: 10.1145/1639949.

      Copyright © 2009 ACM


Publisher: Association for Computing Machinery, New York, NY, United States



Acceptance rates: OOPSLA '09 accepted 25 of 144 submissions (17%); overall, 268 of 1,244 submissions (22%).
