Index Terms
- The cache performance and optimizations of blocked algorithms