doi:10.1016/S0167-739X(00)00082-0    

Copyright © 2000 Elsevier Science B.V. All rights reserved.

A new-generation parallel computer and its performance evaluation*1


Sotirios G. Ziavras (a, corresponding author), Haim Grebel (a), Anthony T. Chronopoulos (b) and Florent Marcelli (a)

a Department of Electrical and Computer Engineering, New Jersey Institute of Technology, Newark, NJ 07102, USA

b Division of Computer Science, University of Texas at San Antonio, San Antonio, TX 78249, USA


Received 15 May 1999;
revised 15 May 2000;
accepted 15 May 2000.
Available online 2 November 2000.

Abstract

An innovative design is proposed for an MIMD distributed shared-memory (DSM) parallel computer capable of achieving sustained high performance with technology expected to become feasible and viable in less than a decade. This New Millennium Computing Point Design was chosen by NSF, DARPA, and NASA as having the potential to deliver 100 TeraFLOPS and 1 PetaFLOPS performance by the years 2005 and 2007, respectively. Its scalability is expected to give it a lifetime extending well into the next century. Our design takes advantage of free-space optical technologies, combined with simple guided-wave concepts, to produce a 1D building block (BB) that efficiently implements a large, fully connected system of processors. Making large, fully connected systems of electronic processors practical could be a highly beneficial impact of optics on massively parallel processing. A 2D structure is proposed for the complete system, in which the 1D BB is extended into two dimensions. This architecture behaves like a 2D generalized hypercube. That topology offers outstanding performance, but its wiring complexity is so high that an electronics-only implementation is prohibitive. Using readily available technology, a mesh of clear plastic/glass bars in our design facilitates point-to-point, bit-parallel transmissions that use wavelength-division multiplexing (WDM) and follow dedicated optical paths. Processors are mounted on cards. Each card contains eight processors interconnected locally via an electronic crossbar. Taking advantage of higher-speed optical technologies, all eight processors on a card share the same communications interface to the optical medium using time-division multiplexing (TDM). A case study for 100 TeraFLOPS performance by the year 2005 is investigated in detail. The characteristics of the hardware components chosen in the case study conform to Semiconductor Industry Association (SIA) projections.
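The abstract states that the eight processors on a card share one optical interface through TDM. The sketch below is a minimal illustration that assumes a fixed round-robin frame with equal slots. The paper's actual slot assignment, framing, and link rates are not given in this abstract. The function names are hypothetical, and the 80 Gb/s link rate used in the usage lines is an invented number, not a figure from the paper.

```python
PROCESSORS_PER_CARD = 8  # stated in the abstract: eight processors share one optical interface


def slot_owner(slot: int, n: int = PROCESSORS_PER_CARD) -> int:
    """Local processor (0..n-1) that drives the card's shared optical
    interface during a given time slot, assuming a fixed round-robin frame."""
    return slot % n


def frame_schedule(num_slots: int, n: int = PROCESSORS_PER_CARD) -> list[int]:
    """Owner of each of the first num_slots time slots."""
    return [slot_owner(s, n) for s in range(num_slots)]


def per_processor_rate(link_rate_gbps: float, n: int = PROCESSORS_PER_CARD) -> float:
    """Average rate each processor obtains from one shared link when the n
    processors receive equal slots (guard times and framing overhead ignored)."""
    return link_rate_gbps / n


if __name__ == "__main__":
    print(frame_schedule(10))        # [0, 1, 2, 3, 4, 5, 6, 7, 0, 1]
    print(per_processor_rate(80.0))  # hypothetical 80 Gb/s link -> 10.0
```

Round-robin is the simplest fair TDM policy. Under this assumption, the shared optical link would need to run roughly eight times faster than one processor's demand for the sharing to be transparent. This is the sense in which TDM relies on the higher speed of the optical technology.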
An impressive property of our system is that its bisection bandwidth matches, within an order of magnitude, the performance of its computation engine. Performance results based on the implementation of several important algorithmic kernels show that our design could have a tremendous positive impact on massively parallel computing. 2D and 3D implementations of our design could achieve sustained PetaFLOPS performance before the end of the next decade.
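In a generalized hypercube, each node is linked to every node that differs from it in exactly one coordinate. In two dimensions, every row and every column is therefore a fully connected group, which matches the abstract's description of the 1D BB extended into two dimensions. The minimal sketch below assumes a k × k arrangement and treats each node as one card-level optical interface; that mapping is an assumption, not a detail given in the abstract. The code enumerates a node's neighbors (degree 2(k - 1), diameter 2) and counts all links, showing how wiring grows roughly as k³. It also counts the links crossing one balanced cut, the one that splits the columns in half, which is the kind of cut behind a bisection-bandwidth figure. All names and values of k are illustrative.

```python
def ghc2d_neighbors(x: int, y: int, k: int) -> list[tuple[int, int]]:
    """Neighbors of node (x, y) in a k x k 2D generalized hypercube:
    every other node in the same row (same y) and the same column (same x)."""
    row = [(i, y) for i in range(k) if i != x]
    col = [(x, j) for j in range(k) if j != y]
    return row + col


def ghc2d_links(k: int) -> int:
    """Undirected links: k rows plus k columns, each a complete graph K_k."""
    return 2 * k * (k * (k - 1) // 2)


def half_split_cut(k: int) -> int:
    """Links crossing the cut that separates x < k/2 from x >= k/2 (k even).
    Only row links cross it; each row contributes (k/2)**2 such links."""
    if k % 2:
        raise ValueError("k must be even for a balanced split")
    return k * (k // 2) ** 2


def all_links(k: int) -> set[frozenset]:
    """Brute-force link set, used to cross-check the closed-form counts."""
    nodes = [(x, y) for x in range(k) for y in range(k)]
    return {frozenset((a, b)) for a in nodes for b in ghc2d_neighbors(*a, k)}


if __name__ == "__main__":
    print(ghc2d_links(16), half_split_cut(16))  # 3840 1024
```

For k = 16, the 256 nodes already need 3840 point-to-point links, and 1024 of them cross the half split. This gives an illustration of why the abstract calls the electronics-only wiring prohibitive. This sketch makes no claim that the half split is the minimum bisection.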

Author Keywords: PetaFLOPS computer performance; Advanced computer architecture; Generalized hypercube; Parallel computer; Optical interconnection network

Article Outline

1. Introduction
2. Basic architecture
2.1. 1D building block
2.2. Complete 2D system
3. Case study: 100 Tera(FL)OPS performance
3.1. Feasibility analysis for optical components
3.1.1. Optical interface
3.1.2. Light sources and detectors
3.1.3. Optical/electrical power consumption
4. Analysis of the optical interconnection network in the BB
4.1. Optical filters
4.2. Simulation of a simple free-space optical interconnect
4.2.1. Bit error rate and signal-packing density
4.3. Alignment of the free-space optical system
4.3.1. Offsets in an integrated planar-optics interconnect
4.3.2. Efficiency and alignability of the integrated planar-optical system for the case study
5. Performance evaluation
5.1. Data communications
5.2. Implementation of algorithmic kernels
5.2.1. Algorithm I: Vector update or SAXPY loop
5.2.2. Algorithm II: Large-stride vector fetch and store
5.2.3. Algorithm III: Irregular gather/scatter
5.2.4. Algorithms IV and V: 3D Jacobi kernels
5.2.5. Further performance results for altered Jacobi kernels: Algorithms VI and VII
6. Conclusions
References
Vitae









*1 This work was supported in part by NSF and DARPA, and co-sponsored by NASA, under the New Millennium Computing Point Design Grant ASC-9634775.

Corresponding author: Sotirios G. Ziavras. Tel.: +1-201-596-5651; fax: +1-201-596-5680; e-mail: ziavras@njit.edu


 