ABSTRACT
Many popular programming languages use interpreter-based execution for portability, support for dynamic or reflective properties, and ease of implementation. Code-copying is an optimization technique for interpreters that reduces the performance gap between interpretation and JIT compilation, offering significant speedups over direct-threading interpretation. Due to varying language features and virtual machine design, however, not all languages benefit from code-copying to the same extent. We consider here properties of interpreted languages, and in particular of bytecode and virtual machine construction, that enhance or reduce the impact of code-copying. We implemented code-copying and compared performance with the original direct-threading virtual machines for three languages, Java (SableVM), OCaml, and Ruby (Yarv), examining performance on three different architectures: ia32 (Pentium 4), x86_64 (AMD64), and PowerPC (G5). The best speedups are achieved on ia32 by OCaml (maximum 4.88 times, 2.81 times on average), where a small and simple bytecode design facilitates the improvements to branch prediction brought by code-copying. Yarv improves only slightly over direct-threading; large working sizes of bytecodes and a relatively small fraction of time spent in the actual interpreter loop both limit the application of code-copying and its overall net effect. We are able to show that simple ahead-of-time analysis of VM and execution properties can help determine the suitability of code-copying for a particular VM before an implementation of code-copying is even attempted.