Abstract
Decoupled architectures have not traditionally been used for general-purpose computing because they tolerate control-intensive code poorly, and such code appears across a wide range of applications. This work investigates whether multithreading can overcome the loss-of-decoupling dependencies that are the main cause of this limitation. It proposes a multithreaded decoupled control/access/execute architecture as a platform for achieving high performance on general-purpose workloads. It argues that such a decoupled architecture is more complexity-effective and scalable than comparable superscalar processors, which incorporate enormous amounts of complexity for modest performance gains.