Efficient Java RMI for parallel programming

Published: 01 November 2001

Abstract

Java offers interesting opportunities for parallel computing. In particular, Java Remote Method Invocation (RMI) provides a flexible kind of remote procedure call (RPC) that supports polymorphism. Sun's RMI implementation achieves this kind of flexibility at the cost of a major runtime overhead. The goal of this article is to show that RMI can be implemented efficiently, while still supporting polymorphism and allowing interoperability with Java Virtual Machines (JVMs). We study a new approach for implementing RMI, using a compiler-based Java system called Manta. Manta uses a native (static) compiler instead of a just-in-time compiler. To implement RMI efficiently, Manta exploits compile-time type information for generating specialized serializers. Also, it uses an efficient RMI protocol and fast low-level communication protocols.

A difficult problem with this approach is how to support polymorphism and interoperability. One of the consequences of polymorphism is that an RMI implementation must be able to download remote classes into an application during runtime. Manta solves this problem by using a dynamic bytecode compiler, which is capable of compiling and linking bytecode into a running application. To allow interoperability with JVMs, Manta also implements the Sun RMI protocol (i.e., the standard RMI protocol), in addition to its own protocol.

We evaluate the performance of Manta using benchmarks and applications that run on a 32-node Myrinet cluster. The time for a null-RMI (without parameters or a return value) of Manta is 35 times lower than for the Sun JDK 1.2, and only slightly higher than for a C-based RPC protocol. This high performance is accomplished by pushing almost all of the runtime overhead of RMI to compile time. We study the performance differences between the Manta and the Sun RMI protocols in detail. The poor performance of the Sun RMI protocol is in part due to an inefficient implementation of the protocol. To allow a fair comparison, we compiled the applications and the Sun RMI protocol with the native Manta compiler. The results show that Manta's null-RMI latency is still eight times lower than for the compiled Sun RMI protocol and that Manta's efficient RMI protocol results in 1.8 to 3.4 times higher speedups for four out of six applications.
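
As a rough illustration of the specialized serializers mentioned in the abstract, the following Java sketch shows what class-specific serialization code could look like when the field layout is known ahead of time. It is not Manta's generated code or wire format; the names `Point` and `PointSerializer` and the byte layout are assumptions made for this example only.

```java
import java.nio.ByteBuffer;

// Hypothetical example class; it does not come from the article.
final class Point {
    final int x;
    final double weight;

    Point(int x, double weight) {
        this.x = x;
        this.weight = weight;
    }
}

// Sketch of a serializer specialized for Point. Because the fields and
// their types are fixed in advance, the code is a straight sequence of
// typed writes and reads with no run-time lookup of the class layout.
final class PointSerializer {
    // Assumed layout: 4-byte int followed by 8-byte double, big-endian.
    static final int SIZE = Integer.BYTES + Double.BYTES;

    static byte[] write(Point p) {
        return ByteBuffer.allocate(SIZE)
                .putInt(p.x)
                .putDouble(p.weight)
                .array();
    }

    static Point read(byte[] bytes) {
        ByteBuffer in = ByteBuffer.wrap(bytes);
        int x = in.getInt();
        double weight = in.getDouble();
        return new Point(x, weight);
    }
}
```

A generic serializer typically has to inspect an object's class at run time to find its fields. The sketch only shows that this inspection can be replaced by fixed, per-class code once the types are known at compile time.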

Reviews

Thomas Rauber

The authors present an efficient realization of the remote method invocation (RMI) mechanism of Java, and describe a compiler-based Java system called Manta that uses this RMI realization. The motivation for the work is that many RMI implementations provide only limited throughput and suffer from high latencies, caused by the need to dynamically load the classes of parameter objects and by the serialization and deserialization of method arguments. The key idea of Manta is to provide two different communication protocols: one for compatibility with standard RMI, and one for fast communication between Manta processes (Manta RMI). The Manta RMI implementation is based on the Panda communication library for low-level communication, and the RMI protocol is designed to minimize serialization and dispatch overhead such as copying, buffer management, fragmentation, thread switching, and indirect method calls. The paper shows that the resulting communication performance on a 32-node Myrinet cluster is much better than that of other RMI implementations: the latency of the new RMI implementation is 35 times lower than the latency of the Sun JDK 1.2, and much higher throughput is obtained. This is also confirmed by using the Manta system for application programs (successive over-relaxation, all-pairs shortest paths, Radix sort, fast Fourier transform, Water, and Barnes-Hut particle simulations). This paper is written for readers familiar with the RMI mechanism and with the steps necessary for its realization. The paper addresses sources of possible inefficiencies, and demonstrates that the overall performance of an RMI mechanism can be dramatically increased by exploiting a variety of sources for performance optimizations.

Online Computing Reviews Service

Published in

ACM Transactions on Programming Languages and Systems, Volume 23, Issue 6
November 2001
112 pages
ISSN: 0164-0925
EISSN: 1558-4593
DOI: 10.1145/506315

Copyright © 2001 ACM

Publisher

Association for Computing Machinery
New York, NY, United States
