research-article

A concurrent dynamic analysis framework for multicore hardware

Authors:
Jungwoo Ha

University of Southern California - Information Sciences Institute East, Arlington, VA, USA

University of Southern California - Information Sciences Institute East, Arlington, VA, USA
View Profile

,
Matthew Arnold

IBM T. J. Watson Research, Hawthorne, NY, USA

IBM T. J. Watson Research, Hawthorne, NY, USA
View Profile

,
Stephen M. Blackburn

Australian National University, Canberra, Australia

Australian National University, Canberra, Australia
View Profile

,
Kathryn S. McKinley

The University of Texas at Austin, Austin, TX, USA

The University of Texas at Austin, Austin, TX, USA
View Profile

OOPSLA '09: Proceedings of the 24th ACM SIGPLAN conference on Object oriented programming systems languages and applicationsOctober 2009Pages 155–174https://doi.org/10.1145/1640089.1640101

Published:25 October 2009Publication History

OOPSLA '09: Proceedings of the 24th ACM SIGPLAN conference on Object oriented programming systems languages and applications

Pages 155–174

ABSTRACT

Software has spent the bounty of Moore's law by solving harder problems and exploiting abstractions, such as high-level languages, virtual machine technology, binary rewriting, and dynamic analysis. Abstractions make programmers more productive and programs more portable, but usually slow them down. Since Moore's law is now delivering multiple cores instead of faster processors, future systems must either bear a relatively higher cost for abstractions or use some cores to help tolerate abstraction costs.

This paper presents the design, implementation, and evaluation of a novel concurrent, configurable dynamic analysis framework that efficiently utilizes multicore cache architectures. It introduces Cache-friendly Asymmetric Buffering (CAB), a lock-free ring-buffer that implements efficient communication between application and analysis threads. We guide the design and implementation of our framework with a model of dynamic analysis overheads. The framework implements exhaustive and sampling event processing and is analysis-neutral. We evaluate the framework with five popular and diverse analyses, and show performance improvements even for lightweight, low-overhead analyses.

Efficient inter-core communication is central to high performance parallel systems and we believe the CAB design gives insight into the subtleties and difficulties of attaining it for dynamic analysis and other parallel software.

References

B. Alpern, D. Attanasio, J. J. Barton, M. G. Burke, P.Cheng, J.-D. Choi, A. Cocchi, S. J. Fink, D. Grove, M. Hind, S. F. Hummel, D. Lieber, V. Litvinov, M. Mergen, T. Ngo, J. R. Russell, V. Sarkar, M. J. Serrano, J. Shepherd, S. Smith, V. C. Sreedhar, H. Srinivasan, and J. Whaley. The Jalapeno virtual machine. IBM System Journal, 39(1):211--238, Feb. 2000. Google ScholarDigital Library
M. Arnold and D. Grove. Collecting and exploiting high-accuracy call graph profiles in virtual machines. In International Symposium on Code Generation and Optimization, pages 51--62, San Jose, CA, Mar. 2005. Google ScholarDigital Library
M. Arnold and B. G. Ryder. A framework for reducing the cost of instrumented code. In ACM Conference on Programming Language Design and Implementation, pages 168--179, Snowbird, UT, June 2001. Google ScholarDigital Library
T. Ball and J. R. Larus. Efficient path profiling. In ACM/IEEE International Symposium on Microarchitecture, pages 46--57, Paris, France, Dec. 1996. Google ScholarDigital Library
S. M. Blackburn and K. S. McKinley. Immix: A mark-region garbage collector with space efficiency, fast collection, and mutator locality. In ACM Conference on Programming Language Design and Implementation, pages 22--32, Tuscon, AZ, June 2008. Google ScholarDigital Library
S. M. Blackburn, R. Garner, C. Hoffman, A. M. Khan, K. S. McKinley, R. Bentzur, A. Diwan, D. Feinberg, D. Frampton, S. Z. Guyer, M. Hirzel, A. Hosking, M. Jump, H. Lee, J. E. B. Moss, A. Phansalkar, D. Stefanovic, T. VanDrunen, D. von Dincklage, and B. Wiedermann. The DaCapo benchmarks: Java benchmarking development and analysis. In ACM Conference on Object-Oriented Programming, Systems, Languages, and Applications, pages 83--89, Portland, OR, Oct. 2006. Google ScholarDigital Library
S. M. Blackburn, R. Garner, C. Hoffman, A. M. Khan, K. S. McKinley, R. Bentzur, A. Diwan, D. Feinberg, D. Frampton, S. Z. Guyer, M. Hirzel, A. Hosking, M. Jump, H. Lee, J. E. B. Moss, A. Phansalkar, D. Stefanovic, T. VanDrunen, D. von Dincklage, and B. Wiedermann. The DaCapo Benchmarks: Java benchmarking development and analysis (extended version). Technical Report TR-CS-06-01, Dept. of Computer Science, Australian National University, 2006. http://www.dacapobench.org.Google Scholar
M. D. Bond and K. S. McKinley. Continuous path and edge profiling. In ACM/IEEE International Symposium on Microarchitecture, pages 130--140, Barcelona, Spain, Nov. 2005. Google ScholarDigital Library
S. Browne, J. Dongarra, N. Garner, K. London, and P. Mucci. A scalable cross-platform infrastructure for application performance tuning using hardware counters. In Supercomputing, pages 1--13, Article 42, 2000. Google ScholarDigital Library
D. Bruening. Efficient, Transparent, and Comprehensive Runtime Code Manipulation. PhD thesis, Massachusetts Institute of Technology, 2004. Google ScholarDigital Library
J. Chow, T. Garfinkel, and P. M. Chen. Decoupling dynamic program analysis from execution in virtual environments. In USENIX Annual Technical Conference, pages 1--14, Boston, MA, 2008. Google ScholarDigital Library
K. Gharachorloo and P. B. Gibbons. Detecting violations of sequential consistency. In ACM Symposium on Parallel Algorithms and Architectures, pages 316--326, Hilton Head, SC, 1991. Google ScholarDigital Library
J. Giacomoni, T. Moseley, and M. Vachharajani. FastForward for efficient pipeline parallelism: a cache-optimized concurrent lock-free queue. In ACM Symposium on Principles and Practice of Parallel Programming, pages 43--52, Salt Lake City, UT, 2008. Google ScholarDigital Library
J. Ha, C. J. Rossbach, J. V. Davis, I. Roy, H. E. Ramadan, D. E. Porter, D. L. Chen, and E.Witchel. Improved error reporting for software that uses black-box components. In ACM Conference on Programming Language Design and Implementation, pages 101--111, San Diego, CA, 2007. Google ScholarDigital Library
J. Ha, M. Arnold, S. M. Blackburn, and K. S. McKinley. A concurrent dynamic analysis framework for multicore hardware. Technical Report TR-09-24, The University of Texas at Austin, 2009.Google ScholarDigital Library
S. Hangal and M. S. Lam. Tracking down software bugs using automatic anomaly detection. In International Conference on Software Engineering, pages 291--301, Orlando, FL, 2002. Google ScholarDigital Library
M. Herlihy. Wait-free synchronization. ACM Transactions on Programming Language Systems, 13(1):124--149, 1991. Google ScholarDigital Library
M. Hirzel and T. Chilimbi. Bursty tracing: A framework for lowoverhead temporal profiling. In ACM Workshop on Feedback-Directed and Dynamic Optimization, pages 117--126, December 2001.Google Scholar
M. S. Lam, M. Martin, B. Livshits, and J. Whaley. Securing web applications with static and dynamic information flow tracking. In ACM Workshop Partial Evaluation and Semantics-Based Program Manipulation, pages 3--12, San Francisco, CA, 2008. Google ScholarDigital Library
L. Lamport. Specifying concurrent program modules. ACM Transactions on Programming Language Systems, 5(2):190--222, 1983. Google ScholarDigital Library
W. Lin, S. K. Reinhardt, and D. Burger. Reducing DRAM latencies with an integrated memory hierarchy design. In IEEE International Symposium on High Performance Computer Architecture, pages 302--312, Nuevo Leone, Mexico, Jan. 2001. Google ScholarDigital Library
C.-K. Luk, R. Cohn, R. Muth, H. Patil, A. Klauser, G. Lowney, S.Wallace, V. J. Reddi, and K. Hazelwood. Pin: building customized program analysis tools with dynamic instrumentation. In ACM Conference on Programming Language Design and Implementation, pages 190--200, Chicago, IL, 2005. Google ScholarDigital Library
M. Martin, B. Livshits, and M. S. Lam. Finding application errors and security flaws using PQL: a Program Query Language. In ACM Conference on Object-Oriented Programming, Systems, Languages, and Applications, pages 365--383, San Diego, CA, 2005. Google ScholarDigital Library
T. Moseley, A. Shye, V. J. Reddi, D. Grunwald, and R. Peri. Shadow Profiling: Hiding instrumentation costs with parallelism. In International Symposium on Code Generation and Optimization, pages 198--208, Washington, DC, 2007. Google ScholarDigital Library
N. Nethercote and J. Seward. Valgrind: a framework for heavyweight dynamic binary instrumentation. In ACM Conference on Programming Language Design and Implementation, pages 89--100, San Diego, CA, 2007. Google ScholarDigital Library
M. Paleczny, C. Vick, and C. Click. The Java HotSpot server compiler. In Java Virtual Machine Research and Technology Symposium, Monterey, CA, April 2001. Sun Microsystems. Google ScholarDigital Library
M. Pettersson. Linux Intel/x86 performance counters, 2003. http://user.it.uu.se/mikpe/linux/perfctr/.Google Scholar
R. Shetty, M. Kharbutli, Y. Solihin, and M. Prvulovic. Heap-Mon: a helper-thread approach to programmable, automatic, and lowoverhead memory bug detection. IBM Journal of Research and Development, 50(2/3):261--275, 2006. Google ScholarDigital Library
SPECjvm98 Documentation. Standard Performance Evaluation Corporation, release 1.03 edition, March 1999.Google Scholar
S. Wallace and K. Hazelwood. SuperPin: Parallelizing dynamic instrumentation for real-time performance. In International Symposium on Code Generation and Optimization, pages 209--220, Washington, DC, 2007. Google ScholarDigital Library
Z. Wang, K. S. McKinley, A. Rosenberg, and C. C. Weems. Using the compiler to improve cache replacement decisions. In International Conference on Parallel Architectures and Compilation Techniques, pages 199--208, Charlottesville, VA, Sept. 2002. Google ScholarDigital Library
C. Yuan, N. Lao, J.-R. Wen, J. Li, Z. Zhang, Y.-M. Wang, and W.-Y. Ma. Automated known problem diagnosis with event traces. In ACM European Conference on Computer Systems, pages 375--388, Leuven, Belgium, 2006. Google ScholarDigital Library
Q. Zhao, I. Cutcutache, andW.-F.Wong. PiPA: Pipelined profiling and analysis on multi-core systems. In International Symposium on Code Generation and Optimization, pages 185--194, Boston, MA, 2008. Google ScholarDigital Library
P. Zhou, F. Qin, W. Liu, Y. Zhou, and J. Torrellas. iWatcher: Efficient Architectural Support for Software Debugging. In ACM/IEEE International Symposium on Computer Architecture, pages 224--235, Munchen, Germany, June 2004. Google ScholarDigital Library

Index Terms

A concurrent dynamic analysis framework for multicore hardware
1. Software and its engineering
  1. Software notations and tools
    1. Compilers
      1. Runtime environments

Recommendations

A concurrent dynamic analysis framework for multicore hardware
OOPSLA '09

Software has spent the bounty of Moore's law by solving harder problems and exploiting abstractions, such as high-level languages, virtual machine technology, binary rewriting, and dynamic analysis. Abstractions make programmers more productive and ...
Read More
Opportunities for concurrent dynamic analysis with explicit inter-core communication
PASTE '10: Proceedings of the 9th ACM SIGPLAN-SIGSOFT workshop on Program analysis for software tools and engineering

Multicore is now the dominant processor trend, and the number of cores is rapidly increasing. The paradigm shift to multicore forces the redesign of the software stack, which includes dynamic analysis. Dynamic analyses provide rich features to software ...
Read More
A Synergetic Approach to Throughput Computing on x86-Based Multicore Desktops

In the era of multicores, many applications that require substantial computing power and data crunching can now run on desktop PCs. However, to achieve the best possible performance, developers must write applications in a way that exploits both ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Published in
OOPSLA '09: Proceedings of the 24th ACM SIGPLAN conference on Object oriented programming systems languages and applications
October 2009
590 pages
ISBN:9781605587660
DOI:10.1145/1640089
General Chair:
Shail Arora
Adayana, Inc.
,
Program Chair:
Gary Leavens
University of Central Florida
ACM SIGPLAN Notices Volume 44, Issue 10
OOPSLA '09
October 2009
554 pages
ISSN:0362-1340
EISSN:1558-1160
DOI:10.1145/1639949
Issue’s Table of Contents
Copyright © 2009 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 25 October 2009
Permissions
Request permissions about this article.
Request Permissions

Check for updates
Author Tags
dynamic analysis
instrumentation
multicore
profiling
Qualifiers
- research-article
Conference

Acceptance Rates
OOPSLA '09 Paper Acceptance Rate25of144submissions,17%Overall Acceptance Rate268of1,244submissions,22%
More
Upcoming Conference
SPLASH '24

Sponsor:

sigplan

ACM SIGPLAN International Conference on Systems, Programming, Languages, and Applications: Software for Humanity

October 20 - 25, 2024

Pasadena , CA , USA
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 55
  Total Citations
  View Citations
- 698
  Total Downloads
- Downloads (Last 12 months)7
- Downloads (Last 6 weeks)5
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

A concurrent dynamic analysis framework for multicore hardware

OOPSLA '09: Proceedings of the 24th ACM SIGPLAN conference on Object oriented programming systems languages and applications

ABSTRACT

References

Cited By

Index Terms

Recommendations

A concurrent dynamic analysis framework for multicore hardware

Opportunities for concurrent dynamic analysis with explicit inter-core communication

A Synergetic Approach to Throughput Computing on x86-Based Multicore Desktops