ABSTRACT
The Convex SPP-1000 is the first commercial implementation of a new generation of scalable shared-memory parallel computers with full cache coherence. It employs a hierarchical structure of processing, communication, and memory name-space management resources to provide a scalable NUMA environment. Ensembles of 8 HP PA-RISC 7100 microprocessors use an internal crossbar switch and a directory-based cache coherence scheme to form a tightly coupled SMP. Up to 16 processing ensembles are interconnected by a four-ring network that incorporates a full hardware implementation of the SCI protocol, giving a full system configuration of 128 processors. This paper presents the findings of a set of empirical studies, using both synthetic test codes and full applications from the Earth and space sciences, that characterize the performance properties of this new architecture. It is shown that the overheads and latencies of global primitive mechanisms, while low in absolute terms, are significantly higher than those of equivalent functions local to an individual processor ensemble.
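The two-level hierarchy summarized above can be illustrated with a small model that maps processors to ensembles and classifies an access as local (served within one ensemble's crossbar) or global (crossing the SCI rings). This is a minimal sketch, assuming processors are numbered contiguously within ensembles; the latency arguments to effective_latency() are hypothetical placeholders, not values measured in this study. It only shows why a workload's mix of local and global operations drives its average cost.

```python
"""Sketch of the two-level SPP-1000 hierarchy described in the abstract.

Assumptions (not from the paper): processors are numbered 0..127 and
assigned to ensembles in contiguous blocks of 8; latency values passed
to effective_latency() are hypothetical placeholders, not measurements.
"""

PROCS_PER_ENSEMBLE = 8   # HP PA-RISC 7100 processors per ensemble
MAX_ENSEMBLES = 16       # ensembles joined by the SCI ring network
MAX_PROCS = PROCS_PER_ENSEMBLE * MAX_ENSEMBLES  # 128 in a full system


def ensemble_of(proc: int) -> int:
    """Return the ensemble index of a processor (hypothetical contiguous numbering)."""
    if not 0 <= proc < MAX_PROCS:
        raise ValueError(f"processor id {proc} outside 0..{MAX_PROCS - 1}")
    return proc // PROCS_PER_ENSEMBLE


def is_global(requester: int, home: int) -> bool:
    """True if the access crosses ensembles, i.e. must traverse the ring network."""
    return ensemble_of(requester) != ensemble_of(home)


def effective_latency(local_fraction: float, t_local: float, t_global: float) -> float:
    """Weighted mean access cost for a mix of local and global operations."""
    if not 0.0 <= local_fraction <= 1.0:
        raise ValueError("local_fraction must be in [0, 1]")
    return local_fraction * t_local + (1.0 - local_fraction) * t_global


if __name__ == "__main__":
    print(MAX_PROCS)              # 128
    print(is_global(3, 7))        # False: both in ensemble 0
    print(is_global(7, 8))        # True: ensembles 0 and 1
    # Hypothetical costs in arbitrary units, chosen only to show the trend.
    for f in (1.0, 0.9, 0.5):
        print(f, effective_latency(f, t_local=1.0, t_global=4.0))
```

Because the average cost is a weighted mean, even a modest fraction of global operations raises it noticeably whenever the global cost is several times the local one, which is consistent with the abstract's conclusion that global mechanisms are the more expensive path.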