Latency hiding on COMA multiprocessors

The Journal of Supercomputing

Abstract

Cache-only memory access (COMA) multiprocessors support scalable coherent shared memory with a uniform memory access programming model. The local portion of shared memory associated with a processor is organized as a cache. This cache-based organization of memory results in long remote memory access latencies. Latency-hiding mechanisms can reduce effective remote memory access latency by making data present in a processor's local memory by the time the data are needed. In this paper we study the effectiveness of latency-hiding mechanisms on the KSR2 multiprocessor in improving the performance of three programs. The communication patterns of each program are analyzed and the mechanisms for latency hiding are applied. Results from a 52-processor system indicate that these mechanisms hide a significant portion of the latency of remote memory accesses. The results also quantify benefits in overall application performance.
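The general idea of latency hiding, requesting data early enough that it is local by the time it is used, can be illustrated with a minimal software-prefetch sketch in C. This is only an illustration under stated assumptions: it uses the GCC/Clang `__builtin_prefetch` hint rather than the KSR2 mechanisms studied in the paper, and the function name `sum_with_prefetch` and the constant `PREFETCH_DISTANCE` are hypothetical names chosen for this example.

```c
#include <stddef.h>

/* Hypothetical prefetch distance, in array elements. A suitable value
   depends on the machine and the loop body; 16 is an arbitrary choice. */
#define PREFETCH_DISTANCE 16

/* Sum an array while hinting that data a fixed distance ahead will be
   read soon. __builtin_prefetch is a GCC/Clang compiler hint; it stands
   in here for a hardware prefetch mechanism and is not the KSR2 one. */
double sum_with_prefetch(const double *a, size_t n)
{
    double total = 0.0;
    for (size_t i = 0; i < n; i++) {
        /* Issue the hint only while the target index is in bounds. */
        if (i + PREFETCH_DISTANCE < n)
            __builtin_prefetch(&a[i + PREFETCH_DISTANCE], 0, 1);
        total += a[i];
    }
    return total;
}
```

The prefetch is only a hint and does not change the computed result; any benefit comes from overlapping the fetch of later elements with the work on the current one.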

Abdelrahman, T.S. Latency hiding on COMA multiprocessors. J Supercomput 10, 225–242 (1996). https://doi.org/10.1007/BF00130108

  • DOI: https://doi.org/10.1007/BF00130108