Where does the time go in software DSMs? — Experiences with JIAJIA

  • Regular Papers
Journal of Computer Science and Technology

Abstract

The performance gap between software DSM systems and message passing platforms has greatly hindered the adoption of software DSM systems, despite the considerable effort devoted to this area over the past decade. In this paper, we take up the challenge of finding where future design efforts should be focused. We first analyze the components of the total system overhead of software DSM systems in detail. Using the state-of-the-art software DSM system JIAJIA, we then measure these components on the Dawning parallel system and draw five important conclusions that differ from some traditional viewpoints. (1) The performance of the JIAJIA software DSM system is acceptable: for four of the eight applications, the parallel efficiency achieved by JIAJIA is about 80%, and for two others, 70% efficiency is obtained. (2) 40.94% of the interrupt service time is overlapped with waiting time. (3) Encoding and decoding diffs does not cost much time (<1%), so using hardware support to encode/decode diffs and send/receive messages is not worthwhile. (4) Great effort should be put into reducing the data miss penalty and optimizing synchronization operations, which occupy 11.75% and 13.65% of the total execution time, respectively. (5) Communication hardware overhead occupies 66.76% of the whole communication time in the experimental environment, and communication software overhead does not take as much time as expected.

Moreover, by studying the effect of CPU speed on system overhead, we find that the common speedup formula for distributed memory systems does not hold for software DSM systems. We therefore design a new speedup formula specific to software DSM systems, and point out that when the CPU speed increases, the speedup can increase too, even if the network speed is fixed, which is impossible in message passing systems. Finally, we argue that the JIAJIA system has the desired scalability.
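The efficiency and overhead figures quoted in the abstract can be related by simple arithmetic. The sketch below is only an illustration, not the authors' measurement code: it assumes the usual definition of parallel efficiency (speedup divided by processor count), and it assumes that communication time splits into exactly two parts, hardware and software, which the abstract does not state explicitly. The function names, the timings, and the processor count in the usage note are hypothetical.

```python
# Percentages reported in the abstract.
DATA_MISS_PCT = 11.75   # share of total execution time spent on data miss penalty
SYNC_PCT = 13.65        # share of total execution time spent on synchronization
COMM_HW_PCT = 66.76     # share of communication time spent in hardware


def parallel_efficiency(t_seq, t_par, n_procs):
    """Conventional parallel efficiency: (t_seq / t_par) / n_procs."""
    speedup = t_seq / t_par
    return speedup / n_procs


def miss_plus_sync_pct():
    """Combined share of execution time taken by data misses and synchronization."""
    return DATA_MISS_PCT + SYNC_PCT


def comm_software_pct():
    """Remaining share of communication time.

    Assumption: communication time is hardware overhead plus software
    overhead and nothing else.
    """
    return 100.0 - COMM_HW_PCT
```

For instance, with hypothetical timings of 80.0 s sequentially and 12.5 s on 8 processors, `parallel_efficiency(80.0, 12.5, 8)` gives the 80% level mentioned in conclusion (1). Under the two-part assumption, `comm_software_pct()` leaves roughly a third of communication time to software, and `miss_plus_sync_pct()` shows that the two items in conclusion (4) together account for about a quarter of total execution time.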

Author information

Corresponding author

Correspondence to Shi Weisong.

Additional information

The work in this paper is supported by the National Natural Science Foundation of China under grants No. 69896250 and No. 69703002, and by the CLIMBING Program.

Shi Weisong received his B.S. degree from Xidian University in 1995. He is currently a Ph.D. candidate at the Institute of Computing Technology. His research interests include high performance computing, distributed computing and metacomputing, and distributed shared memory.

Hu Weiwu received his B.S. degree from University of Science and Technology of China in 1991 and his Ph.D. degree from the Institute of Computing Technology, the Chinese Academy of Sciences in 1996, both in computer science. He is currently an Associate Professor of the Institute of Computing Technology. His research interests include high performance computer architecture, parallel processing, and VLSI design.

Tang Zhimin received his B.S. degree from Nanjing University in 1985 and his Ph.D. degree from the Institute of Computing Technology, the Chinese Academy of Sciences in 1990, both in computer science. He is currently a Professor at the Institute of Computing Technology and the Graduate School of the University of Science and Technology of China. His research interests include high performance computer architecture, MPP systems, digital signal processing, and VLSI design.

About this article

Cite this article

Shi, W., Hu, W. & Tang, Z. Where does the time go in software DSMs? — Experiences with JIAJIA. J. of Comput. Sci. & Technol. 14, 193–205 (1999). https://doi.org/10.1007/BF02948508

  • DOI: https://doi.org/10.1007/BF02948508
