Abstract
Cache-only memory access (COMA) multiprocessors support scalable coherent shared memory with a uniform memory access programming model. The local portion of shared memory associated with a processor is organized as a cache. This cache-based organization of memory results in long remote memory access latencies. Latency-hiding mechanisms can reduce effective remote memory access latency by making data present in a processor's local memory by the time the data are needed. In this paper we study the effectiveness of latency-hiding mechanisms on the KSR2 multiprocessor in improving the performance of three programs. The communication patterns of each program are analyzed and the mechanisms for latency hiding are applied. Results from a 52-processor system indicate that these mechanisms hide a significant portion of the latency of remote memory accesses. The results also quantify benefits in overall application performance.
Abdelrahman, T.S. Latency hiding on COMA multiprocessors. J Supercomput 10, 225–242 (1996). https://doi.org/10.1007/BF00130108