ABSTRACT
Software has spent the bounty of Moore's law by solving harder problems and exploiting abstractions, such as high-level languages, virtual machine technology, binary rewriting, and dynamic analysis. Abstractions make programmers more productive and programs more portable, but usually slow them down. Since Moore's law is now delivering multiple cores instead of faster processors, future systems must either bear a relatively higher cost for abstractions or use some cores to help tolerate abstraction costs.
This paper presents the design, implementation, and evaluation of a novel concurrent, configurable dynamic analysis framework that efficiently utilizes multicore cache architectures. It introduces Cache-friendly Asymmetric Buffering (CAB), a lock-free ring-buffer that implements efficient communication between application and analysis threads. We guide the design and implementation of our framework with a model of dynamic analysis overheads. The framework implements exhaustive and sampling event processing and is analysis-neutral. We evaluate the framework with five popular and diverse analyses, and show performance improvements even for lightweight, low-overhead analyses.
Efficient inter-core communication is central to high performance parallel systems and we believe the CAB design gives insight into the subtleties and difficulties of attaining it for dynamic analysis and other parallel software.
- B. Alpern, D. Attanasio, J. J. Barton, M. G. Burke, P.Cheng, J.-D. Choi, A. Cocchi, S. J. Fink, D. Grove, M. Hind, S. F. Hummel, D. Lieber, V. Litvinov, M. Mergen, T. Ngo, J. R. Russell, V. Sarkar, M. J. Serrano, J. Shepherd, S. Smith, V. C. Sreedhar, H. Srinivasan, and J. Whaley. The Jalapeno virtual machine. IBM System Journal, 39(1):211--238, Feb. 2000. Google ScholarDigital Library
- M. Arnold and D. Grove. Collecting and exploiting high-accuracy call graph profiles in virtual machines. In International Symposium on Code Generation and Optimization, pages 51--62, San Jose, CA, Mar. 2005. Google ScholarDigital Library
- M. Arnold and B. G. Ryder. A framework for reducing the cost of instrumented code. In ACM Conference on Programming Language Design and Implementation, pages 168--179, Snowbird, UT, June 2001. Google ScholarDigital Library
- T. Ball and J. R. Larus. Efficient path profiling. In ACM/IEEE International Symposium on Microarchitecture, pages 46--57, Paris, France, Dec. 1996. Google ScholarDigital Library
- S. M. Blackburn and K. S. McKinley. Immix: A mark-region garbage collector with space efficiency, fast collection, and mutator locality. In ACM Conference on Programming Language Design and Implementation, pages 22--32, Tuscon, AZ, June 2008. Google ScholarDigital Library
- S. M. Blackburn, R. Garner, C. Hoffman, A. M. Khan, K. S. McKinley, R. Bentzur, A. Diwan, D. Feinberg, D. Frampton, S. Z. Guyer, M. Hirzel, A. Hosking, M. Jump, H. Lee, J. E. B. Moss, A. Phansalkar, D. Stefanovic, T. VanDrunen, D. von Dincklage, and B. Wiedermann. The DaCapo benchmarks: Java benchmarking development and analysis. In ACM Conference on Object-Oriented Programming, Systems, Languages, and Applications, pages 83--89, Portland, OR, Oct. 2006. Google ScholarDigital Library
- S. M. Blackburn, R. Garner, C. Hoffman, A. M. Khan, K. S. McKinley, R. Bentzur, A. Diwan, D. Feinberg, D. Frampton, S. Z. Guyer, M. Hirzel, A. Hosking, M. Jump, H. Lee, J. E. B. Moss, A. Phansalkar, D. Stefanovic, T. VanDrunen, D. von Dincklage, and B. Wiedermann. The DaCapo Benchmarks: Java benchmarking development and analysis (extended version). Technical Report TR-CS-06-01, Dept. of Computer Science, Australian National University, 2006. http://www.dacapobench.org.Google Scholar
- M. D. Bond and K. S. McKinley. Continuous path and edge profiling. In ACM/IEEE International Symposium on Microarchitecture, pages 130--140, Barcelona, Spain, Nov. 2005. Google ScholarDigital Library
- S. Browne, J. Dongarra, N. Garner, K. London, and P. Mucci. A scalable cross-platform infrastructure for application performance tuning using hardware counters. In Supercomputing, pages 1--13, Article 42, 2000. Google ScholarDigital Library
- D. Bruening. Efficient, Transparent, and Comprehensive Runtime Code Manipulation. PhD thesis, Massachusetts Institute of Technology, 2004. Google ScholarDigital Library
- J. Chow, T. Garfinkel, and P. M. Chen. Decoupling dynamic program analysis from execution in virtual environments. In USENIX Annual Technical Conference, pages 1--14, Boston, MA, 2008. Google ScholarDigital Library
- K. Gharachorloo and P. B. Gibbons. Detecting violations of sequential consistency. In ACM Symposium on Parallel Algorithms and Architectures, pages 316--326, Hilton Head, SC, 1991. Google ScholarDigital Library
- J. Giacomoni, T. Moseley, and M. Vachharajani. FastForward for efficient pipeline parallelism: a cache-optimized concurrent lock-free queue. In ACM Symposium on Principles and Practice of Parallel Programming, pages 43--52, Salt Lake City, UT, 2008. Google ScholarDigital Library
- J. Ha, C. J. Rossbach, J. V. Davis, I. Roy, H. E. Ramadan, D. E. Porter, D. L. Chen, and E.Witchel. Improved error reporting for software that uses black-box components. In ACM Conference on Programming Language Design and Implementation, pages 101--111, San Diego, CA, 2007. Google ScholarDigital Library
- J. Ha, M. Arnold, S. M. Blackburn, and K. S. McKinley. A concurrent dynamic analysis framework for multicore hardware. Technical Report TR-09-24, The University of Texas at Austin, 2009.Google ScholarDigital Library
- S. Hangal and M. S. Lam. Tracking down software bugs using automatic anomaly detection. In International Conference on Software Engineering, pages 291--301, Orlando, FL, 2002. Google ScholarDigital Library
- M. Herlihy. Wait-free synchronization. ACM Transactions on Programming Language Systems, 13(1):124--149, 1991. Google ScholarDigital Library
- M. Hirzel and T. Chilimbi. Bursty tracing: A framework for lowoverhead temporal profiling. In ACM Workshop on Feedback-Directed and Dynamic Optimization, pages 117--126, December 2001.Google Scholar
- M. S. Lam, M. Martin, B. Livshits, and J. Whaley. Securing web applications with static and dynamic information flow tracking. In ACM Workshop Partial Evaluation and Semantics-Based Program Manipulation, pages 3--12, San Francisco, CA, 2008. Google ScholarDigital Library
- L. Lamport. Specifying concurrent program modules. ACM Transactions on Programming Language Systems, 5(2):190--222, 1983. Google ScholarDigital Library
- W. Lin, S. K. Reinhardt, and D. Burger. Reducing DRAM latencies with an integrated memory hierarchy design. In IEEE International Symposium on High Performance Computer Architecture, pages 302--312, Nuevo Leone, Mexico, Jan. 2001. Google ScholarDigital Library
- C.-K. Luk, R. Cohn, R. Muth, H. Patil, A. Klauser, G. Lowney, S.Wallace, V. J. Reddi, and K. Hazelwood. Pin: building customized program analysis tools with dynamic instrumentation. In ACM Conference on Programming Language Design and Implementation, pages 190--200, Chicago, IL, 2005. Google ScholarDigital Library
- M. Martin, B. Livshits, and M. S. Lam. Finding application errors and security flaws using PQL: a Program Query Language. In ACM Conference on Object-Oriented Programming, Systems, Languages, and Applications, pages 365--383, San Diego, CA, 2005. Google ScholarDigital Library
- T. Moseley, A. Shye, V. J. Reddi, D. Grunwald, and R. Peri. Shadow Profiling: Hiding instrumentation costs with parallelism. In International Symposium on Code Generation and Optimization, pages 198--208, Washington, DC, 2007. Google ScholarDigital Library
- N. Nethercote and J. Seward. Valgrind: a framework for heavyweight dynamic binary instrumentation. In ACM Conference on Programming Language Design and Implementation, pages 89--100, San Diego, CA, 2007. Google ScholarDigital Library
- M. Paleczny, C. Vick, and C. Click. The Java HotSpot server compiler. In Java Virtual Machine Research and Technology Symposium, Monterey, CA, April 2001. Sun Microsystems. Google ScholarDigital Library
- M. Pettersson. Linux Intel/x86 performance counters, 2003. http://user.it.uu.se/mikpe/linux/perfctr/.Google Scholar
- R. Shetty, M. Kharbutli, Y. Solihin, and M. Prvulovic. Heap-Mon: a helper-thread approach to programmable, automatic, and lowoverhead memory bug detection. IBM Journal of Research and Development, 50(2/3):261--275, 2006. Google ScholarDigital Library
- SPECjvm98 Documentation. Standard Performance Evaluation Corporation, release 1.03 edition, March 1999.Google Scholar
- S. Wallace and K. Hazelwood. SuperPin: Parallelizing dynamic instrumentation for real-time performance. In International Symposium on Code Generation and Optimization, pages 209--220, Washington, DC, 2007. Google ScholarDigital Library
- Z. Wang, K. S. McKinley, A. Rosenberg, and C. C. Weems. Using the compiler to improve cache replacement decisions. In International Conference on Parallel Architectures and Compilation Techniques, pages 199--208, Charlottesville, VA, Sept. 2002. Google ScholarDigital Library
- C. Yuan, N. Lao, J.-R. Wen, J. Li, Z. Zhang, Y.-M. Wang, and W.-Y. Ma. Automated known problem diagnosis with event traces. In ACM European Conference on Computer Systems, pages 375--388, Leuven, Belgium, 2006. Google ScholarDigital Library
- Q. Zhao, I. Cutcutache, andW.-F.Wong. PiPA: Pipelined profiling and analysis on multi-core systems. In International Symposium on Code Generation and Optimization, pages 185--194, Boston, MA, 2008. Google ScholarDigital Library
- P. Zhou, F. Qin, W. Liu, Y. Zhou, and J. Torrellas. iWatcher: Efficient Architectural Support for Software Debugging. In ACM/IEEE International Symposium on Computer Architecture, pages 224--235, Munchen, Germany, June 2004. Google ScholarDigital Library
Index Terms
- A concurrent dynamic analysis framework for multicore hardware
Recommendations
A concurrent dynamic analysis framework for multicore hardware
OOPSLA '09Software has spent the bounty of Moore's law by solving harder problems and exploiting abstractions, such as high-level languages, virtual machine technology, binary rewriting, and dynamic analysis. Abstractions make programmers more productive and ...
Opportunities for concurrent dynamic analysis with explicit inter-core communication
PASTE '10: Proceedings of the 9th ACM SIGPLAN-SIGSOFT workshop on Program analysis for software tools and engineeringMulticore is now the dominant processor trend, and the number of cores is rapidly increasing. The paradigm shift to multicore forces the redesign of the software stack, which includes dynamic analysis. Dynamic analyses provide rich features to software ...
A Synergetic Approach to Throughput Computing on x86-Based Multicore Desktops
In the era of multicores, many applications that require substantial computing power and data crunching can now run on desktop PCs. However, to achieve the best possible performance, developers must write applications in a way that exploits both ...
Comments