ABSTRACT
Complete system simulation, which captures the influence of both the architecture and the operating system on application execution, has been identified as crucial for systems design. While previous work has examined the architectural impact of Java programs, no prior work has investigated operating system (kernel) activity during their execution. This problem is particularly interesting for Java because kernel services can be invoked not only by the application but also by the underlying Java Virtual Machine (JVM) implementation that runs it. Further, the JVM style (JIT compiler or interpreter) and the manner in which the different JVM components (such as the garbage collector and class loader) are exercised can have a significant impact on kernel activity.
To investigate these issues, this research uses complete system simulation of the SPECjvm98 benchmarks on the SimOS simulation platform. The execution of these benchmarks on both JIT compilers and interpreters is profiled in detail to identify and quantify where time is spent in each component. Kernel activity in the SPECjvm98 applications constitutes up to 17% of execution time with the large dataset and up to 31% with the small dataset. Average kernel activity with the large dataset is approximately 10%, compared with around 2% in the four SPECInt benchmarks studied. Of the kernel services, TLB miss handling is the most dominant in all applications. The TLB miss rates in the JIT compiler, dynamic class loader, and garbage collector portions of the JVM are analyzed individually. In addition to these execution profiles, the instruction-level parallelism (ILP) in user and kernel modes is quantified. Java code is seen to limit exploitable parallelism, and aggressive instruction issue is seen to be less efficient for SPECjvm98 benchmarks than for SPEC95 programs. Also, kernel-mode execution does not exhibit as much ILP as user-mode execution.
Index Terms
- Using complete system simulation to characterize SPECjvm98 benchmarks