Latency hiding on COMA multiprocessors

Abdelrahman, Tarek S.

doi:10.1007/BF00130108

Published: September 1996

The Journal of Supercomputing, volume 10, pages 225–242 (1996)

Abstract

Cache-only memory access (COMA) multiprocessors support scalable coherent shared memory with a uniform memory access programming model. The local portion of shared memory associated with a processor is organized as a cache. This cache-based organization of memory results in long remote memory access latencies. Latency-hiding mechanisms can reduce effective remote memory access latency by making data present in a processor's local memory by the time the data are needed. In this paper we study the effectiveness of latency-hiding mechanisms on the KSR2 multiprocessor in improving the performance of three programs. The communication patterns of each program are analyzed and the mechanisms for latency hiding are applied. Results from a 52-processor system indicate that these mechanisms hide a significant portion of the latency of remote memory accesses. The results also quantify benefits in overall application performance.

Author information

Authors and Affiliations

Department of Electrical and Computer Engineering, The University of Toronto, M5S 3G4, Toronto, Ontario, Canada
Tarek S. Abdelrahman

About this article

Cite this article

Abdelrahman, T.S. Latency hiding on COMA multiprocessors. J Supercomput 10, 225–242 (1996). https://doi.org/10.1007/BF00130108

Issue Date: September 1996
DOI: https://doi.org/10.1007/BF00130108
