Abstract
While trace cache, value prediction, and prefetching have been shown to be effective in single-threaded superscalar processors, these techniques have not been analyzed in a Simultaneously Multithreaded (SMT) processor. SMT introduces new factors both for and against these techniques, and it is not known how they would fare in SMT. We evaluate these techniques in an SMT processor to provide recommendations for future SMT designs. Our key contributions are: (1) we identify a fundamental interaction between the techniques and SMT's sharing of resources among multiple threads, and (2) we quantify the impact of this interaction on SMT throughput. SMT's sharing of the instruction storage (i.e., trace cache or i-cache), physical registers, and issue queue impacts the effectiveness of trace cache, value prediction, and prefetching, respectively.
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Cher, CY., Park, I., VijayKumar, T.N. (2006). Do Trace Cache, Value Prediction and Prefetching Improve SMT Throughput?. In: Grass, W., Sick, B., Waldschmidt, K. (eds) Architecture of Computing Systems - ARCS 2006. ARCS 2006. Lecture Notes in Computer Science, vol 3894. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11682127_17
DOI: https://doi.org/10.1007/11682127_17
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-32765-3
Online ISBN: 978-3-540-32766-0
eBook Packages: Computer Science (R0)