SCMP: A Single-Chip Message-Passing Parallel Computer

Baker, James M.; Gold, Brian; Bucciero, Mark; Bennett, Sidney; Mahajan, Rajneesh; Ramachandran, Priyadarshini; Shah, Jignesh

doi:10.1023/B:SUPE.0000040612.33760.8a

Published: November 2004

The Journal of Supercomputing, Volume 30, pages 133–149 (2004)

James M. Baker Jr.¹,
Brian Gold²,
Mark Bucciero³,
Sidney Bennett²,
Rajneesh Mahajan³,
Priyadarshini Ramachandran³ &
Jignesh Shah³


Abstract

As technology improves and transistor feature sizes continue to shrink, the effects of on-chip interconnect wire latencies on processor clock speeds will become more important. In addition, as we reach the limits of instruction-level parallelism that can be extracted from application programs, there will be an increased emphasis on thread-level parallelism. To continue to improve performance, computer architects will need to focus on architectures that can efficiently support thread-level parallelism while minimizing the length of on-chip interconnect wires. The SCMP (Single-Chip Message-Passing) parallel computer system is one such architecture. The SCMP system includes up to 64 processors on a single chip, connected in a 2-D mesh with nearest neighbor connections. Memory is included on-chip with the processors and the architecture includes hardware support for communication and the execution of parallel threads. Since there are no global signals or shared resources between the processors, the length of the interconnect wires will be determined by the size of the individual processors, not the size of the entire chip. Avoiding long interconnect wires will allow the use of very high clock frequencies, which, when coupled with the use of multiple processors, will offer tremendous computational power.
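The mesh organization described in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration, not a detail taken from the paper: the 64 processors are taken to form an 8 x 8 grid, nodes are numbered in row-major order, and messages are assumed to travel along the row first and then along the column. The helper names (`coords`, `neighbors`, `hop_count`, `route`) are hypothetical.

```python
# Hypothetical sketch: the 8 x 8 shape, row-major numbering, and
# row-then-column routing are illustrative assumptions, not the paper's design.
MESH_WIDTH = 8   # assumed: 64 processors arranged as 8 x 8
MESH_HEIGHT = 8


def coords(node_id):
    """Map a row-major node id to (row, col)."""
    return divmod(node_id, MESH_WIDTH)


def neighbors(node_id):
    """Return ids of directly wired neighbors (north, south, west, east)."""
    row, col = coords(node_id)
    result = []
    for d_row, d_col in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        r, c = row + d_row, col + d_col
        # Edge and corner nodes simply have fewer links; there is no wraparound.
        if 0 <= r < MESH_HEIGHT and 0 <= c < MESH_WIDTH:
            result.append(r * MESH_WIDTH + c)
    return result


def hop_count(src, dst):
    """Number of nearest-neighbor links a message crosses (Manhattan distance)."""
    (r1, c1), (r2, c2) = coords(src), coords(dst)
    return abs(r1 - r2) + abs(c1 - c2)


def route(src, dst):
    """List the nodes visited when moving along the row first, then the column."""
    row, col = coords(src)
    dst_row, dst_col = coords(dst)
    path = [src]
    while col != dst_col:
        col += 1 if dst_col > col else -1
        path.append(row * MESH_WIDTH + col)
    while row != dst_row:
        row += 1 if dst_row > row else -1
        path.append(row * MESH_WIDTH + col)
    return path
```

Under these assumptions, `hop_count(0, 63)` returns 14 for a corner-to-corner message, and each step in `route(0, 63)` moves between adjacent nodes only. That is the property the abstract relies on: no single wire needs to span more than one processor tile, regardless of how many processors the chip holds.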



Author information

Authors and Affiliations

Department of Mathematics and Computer Science, Virginia Military Institute, Lexington, VA, USA
James M. Baker Jr.
Department of Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh, PA, USA
Brian Gold & Sidney Bennett
Bradley Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, Blacksburg, VA, USA
Mark Bucciero, Rajneesh Mahajan, Priyadarshini Ramachandran & Jignesh Shah



About this article

Cite this article

Baker, J.M., Gold, B., Bucciero, M. et al. SCMP: A Single-Chip Message-Passing Parallel Computer. The Journal of Supercomputing 30, 133–149 (2004). https://doi.org/10.1023/B:SUPE.0000040612.33760.8a


Issue Date: November 2004
DOI: https://doi.org/10.1023/B:SUPE.0000040612.33760.8a
