StackThreads/MP: Integrating Futures into Calling Standards

ABSTRACT
An implementation scheme for fine-grain multithreading that needs no changes to current calling standards for sequential languages and only modest extensions to sequential compilers is described. Like previous similar systems, it performs an asynchronous call as if it were an ordinary procedure call, and detaches the callee from the caller when the callee suspends or either of them migrates to another processor. Unlike previous systems, it detaches and connects arbitrary frames generated by off-the-shelf sequential compilers that obey calling standards. As a consequence, it requires neither a front-end preprocessor nor a native code generator with a built-in notion of parallelism. The system works in practice with the unmodified GNU C compiler (GCC). The extensions to sequential compilers that would guarantee the portability and correctness of the scheme are clarified and argued to be modest. Experiments indicate that sequential performance is not sacrificed for practical applications, and that both sequential and parallel performance are comparable to Cilk [8], whose current implementation requires a fairly sophisticated preprocessor for C. These results show that efficient asynchronous calls (a.k.a. future calls) can be integrated into current calling standards with very small impact on both sequential performance and compiler engineering.
REFERENCES
- 1. Robert D. Blumofe, Christopher F. Joerg, Bradley C. Kuszmaul, Charles E. Leiserson, Keith H. Randall, and Yuli Zhou. Cilk: An efficient multithreaded runtime system. In Proceedings of the Fifth ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP), pages 207-216, 1995.
- 2. Robert D. Blumofe and Charles E. Leiserson. Scheduling multithreaded computations by work stealing. In Proceedings of the 35th Annual Symposium on Foundations of Computer Science (FOCS), pages 356-368, 1994.
- 3. Luca Cardelli, James Donahue, Lucille Glassman, Mick Jordan, Bill Kalsow, and Greg Nelson. Modula-3 report (revised). Technical Report 52, Digital Systems Research Center, 1989.
- 4. Andrew A. Chien, U. S. Reddy, J. Plevyak, and J. Dolby. ICC++: A C++ dialect for high performance parallel computing. In Proceedings of the Second International Symposium on Object Technologies for Advanced Software, 1996.
- 5. Marc Feeley. A message passing implementation of lazy task creation. In Robert H. Halstead, Jr. and Takayasu Ito, editors, Proceedings of the International Workshop on Parallel Symbolic Computing: Languages, Systems, and Applications, number 748 in Lecture Notes in Computer Science, pages 94-107. Springer-Verlag, 1993.
- 6. Marc Feeley. An Efficient and General Implementation of Futures on Large Scale Shared-Memory Multiprocessors. PhD thesis, Brandeis University, 1993.
- 7. Marc Feeley. Polling efficiently on stock hardware. In Proceedings of the 1993 ACM SIGPLAN Conference on Functional Programming and Computer Architecture, pages 179-187, 1993.
- 8. Matteo Frigo, Charles E. Leiserson, and Keith H. Randall. The implementation of the Cilk-5 multithreaded language. In Proceedings of the ACM SIGPLAN '98 Conference on Programming Language Design and Implementation (PLDI), 1998.
- 9. Seth Copen Goldstein, Klaus Erik Schauser, and David Culler. Lazy threads: Implementing a fast parallel call. Journal of Parallel and Distributed Computing, 37(1):5-20, August 1996.
- 10. Robert H. Halstead, Jr. Multilisp: A language for concurrent symbolic computation. ACM Transactions on Programming Languages and Systems, 7(4):501-538, April 1985.
- 11. Steve Kleiman, Devang Shah, and Bart Smaalders. Programming with Threads. Prentice Hall, 1996.
- 12. Bil Lewis and Daniel J. Berg. Threads Primer. Prentice Hall, 1996.
- 13. Bertrand Meyer. Eiffel: The Language. Object-Oriented Series. Prentice Hall, 1992.
- 14. Eric Mohr, David A. Kranz, and Robert H. Halstead, Jr. Lazy task creation: A technique for increasing the granularity of parallel programs. IEEE Transactions on Parallel and Distributed Systems, 2(3):264-280, July 1991.
- 15. Rishiyur S. Nikhil and Arvind. Id: A language with implicit parallelism. Technical report, Massachusetts Institute of Technology, Cambridge, 1990.
- 16. Yoshihiro Oyama, Kenjiro Taura, and Akinori Yonezawa. An efficient compilation framework for languages based on a concurrent process calculus. In Proceedings of Euro-Par '97, number 1300 in Lecture Notes in Computer Science, pages 546-553, 1997.
- 17. John Plevyak, Vijay Karamcheti, Xingbin Zhang, and Andrew A. Chien. A hybrid execution model for fine-grained languages on distributed memory multicomputers. In Supercomputing '95, 1995.
- 18. A. Rogers, M. Carlisle, J. Reppy, and L. Hendren. Supporting dynamic data structures on distributed memory machines. ACM Transactions on Programming Languages and Systems, 17(2):233-263, 1995.
- 19. Richard M. Stallman. Using and Porting GNU CC, 1995.
- 20. Supercomputing Technology Group, MIT Laboratory for Computer Science. Cilk-5.0 (Beta 1) Reference Manual, 1997. http://theory.lcs.mit.edu/~cilk/.
- 21. Kenjiro Taura, Satoshi Matsuoka, and Akinori Yonezawa. StackThreads: An abstract machine for scheduling fine-grain threads on stock CPUs. In Proceedings of the Workshop on Theory and Practice of Parallel Programming, number 907 in Lecture Notes in Computer Science, pages 121-136. Springer-Verlag, 1994.
- 22. Kenjiro Taura, Kunio Tabata, and Akinori Yonezawa. StackThreads/MP: Integrating futures into calling standards. Technical Report TR 99-01, University of Tokyo, 1999. (Longer version of this paper.)
- 23. Kenjiro Taura and Akinori Yonezawa. Schematic: A concurrent object-oriented extension to Scheme. In Proceedings of the Workshop on Object-Based Parallel and Distributed Computation, number 1107 in Lecture Notes in Computer Science, pages 59-82. Springer-Verlag, 1996.
- 24. Kenjiro Taura and Akinori Yonezawa. Fine-grain multithreading with minimal compiler support: A cost-effective approach to implementing efficient multithreading languages. In Proceedings of the 1997 ACM SIGPLAN Conference on Programming Language Design and Implementation, pages 320-333, 1997.
- 25. Kazunori Ueda and Takashi Chikayama. Design of the kernel language for the parallel inference machine. The Computer Journal, 33(6):494-500, 1990.
- 26. Akinori Yonezawa, Jean-Pierre Briot, and Etsuya Shibayama. Object-oriented concurrent programming in ABCL/1. In Proceedings of the ACM SIGPLAN Conference on Object-Oriented Programming Systems, Languages, and Applications (OOPSLA '86), pages 258-268, 1986.