High performance messaging on workstations: Illinois fast messages (FM) for Myrinet

Authors:
Scott Pakin

Department of Computer Science, University of Illinois at Urbana-Champaign, 1304 W. Springfield Ave., Urbana, IL

Mario Lauria

Dipartimento di Informatica e Sistemistica, Università di Napoli 'Federico II' via Claudio 21, 80125 Napoli, Italy

Andrew Chien

Department of Computer Science, University of Illinois at Urbana-Champaign, 1304 W. Springfield Ave., Urbana, IL


Supercomputing '95: Proceedings of the 1995 ACM/IEEE conference on Supercomputing, December 1995, Pages 55–es. https://doi.org/10.1145/224170.224360

Published: 08 December 1995

ABSTRACT

The Convex SPP-1000 is the first commercial implementation of a new generation of scalable shared memory parallel computers with full cache coherence. It employs a hierarchical structure of processing, communication, and memory name-space management resources to provide a scalable NUMA environment. Ensembles of 8 HP PA-RISC 7100 microprocessors employ an internal cross-bar switch and directory-based cache coherence scheme to provide a tightly coupled SMP. Up to 16 processing ensembles are interconnected by a 4-ring network incorporating a full hardware implementation of the SCI protocol for a full system configuration of 128 processors. This paper presents the findings of a set of empirical studies using both synthetic test codes and full applications for the Earth and space sciences to characterize the performance properties of this new architecture. It is shown that overhead and latencies of global primitive mechanisms, while low in absolute time, are significantly more costly than similar functions local to an individual processor ensemble.

Published in
Supercomputing '95: Proceedings of the 1995 ACM/IEEE conference on Supercomputing
December 1995
875 pages
ISBN:0897918169
DOI:10.1145/224170
Chairman:
Sid Karin
San Diego Supercomputer Center, San Diego, CA
Copyright © 1995 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Acceptance Rates
Supercomputing '95 Paper Acceptance Rate: 69 of 241 submissions, 29%
Overall Acceptance Rate: 1,516 of 6,373 submissions, 24%
Article Metrics
- Total Citations: 197
- Total Downloads: 368
- Downloads (Last 12 months): 87
- Downloads (Last 6 weeks): 43