
A framework for modeling and optimization of prescient instruction prefetch

Authors:
Tor M. Aamodt (Intel Labs, Santa Clara, CA and University of Toronto, Canada)
Pedro Marcuello (Universitat Politècnica de Catalunya, Spain)
Paul Chow (University of Toronto, Canada)
Antonio González (Universitat Politècnica de Catalunya, Spain)
Per Hammarlund (Intel Corp., Hillsboro, OR)
Hong Wang (Intel Labs, Santa Clara, CA)
John P. Shen (Intel Labs, Santa Clara, CA)

ACM SIGMETRICS Performance Evaluation Review, Volume 31, Issue 1, June 2003, pp. 13–24. https://doi.org/10.1145/885651.781030

Published: 10 June 2003


Abstract

This paper describes a framework for modeling macroscopic program behavior and applies it to optimizing prescient instruction prefetch, a novel technique that uses helper threads to improve single-threaded application performance by performing judicious and timely instruction prefetch. A helper thread is initiated when the main thread encounters a spawn point, and it prefetches instructions starting at a distant target point. The target identifies a code region that tends to incur I-cache misses and that the main thread is likely to execute soon, even though the intervening control flow may be unpredictable. The optimization of spawn-target pair selection is formulated by modeling program behavior as a Markov chain based on profile statistics. Execution paths are treated as stochastic outcomes, and aspects of program behavior are summarized via path expression mappings. Mappings for computing reaching and posterior probability, path length mean and variance, and expected path footprint are presented. These are used with Tarjan's fast path algorithm to efficiently estimate the benefit of spawn-target pair selections. Using this framework, we propose a spawn-target pair selection algorithm for prescient instruction prefetch. This algorithm has been implemented and evaluated for the Itanium Processor Family architecture. A limit study finds speedups of 4.8% to 17% over next-line and streaming I-prefetch on an in-order simultaneous multithreading processor with eight contexts, for a set of benchmarks with high I-cache miss rates. The framework in this paper is potentially applicable to other thread speculation techniques.
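The abstract names several of the mappings (reaching probability, path length mean and variance) without showing what they compute. The following Python sketch illustrates such quantities under stated assumptions; it is not the authors' method. It assumes the profiled control flow is summarized as a first-order Markov chain over basic blocks, given as a dictionary of successor probabilities (probability mass sent to a block that is not a key is treated as leaving the region), and it takes "path length" to mean the number of instructions executed from the spawn block (inclusive) up to the first arrival at the target block (exclusive), conditioned on the target being reached. Instead of Tarjan's path-expression algorithm, it solves the equivalent absorbing-chain linear systems directly by Gaussian elimination, and it omits the posterior-probability and footprint mappings. The names `spawn_target_stats` and `solve`, and the blocks `S`, `L`, `X`, `T`, are hypothetical.

```python
import math
from typing import Dict, List, Tuple

# Assumption: chain[block][successor] = profiled transition probability.
Chain = Dict[str, Dict[str, float]]


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        acc = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - acc) / m[r][r]
    return x


def spawn_target_stats(chain: Chain, size: Dict[str, int],
                       spawn: str, target: str) -> Tuple[float, float, float]:
    """Return (reaching probability, mean, variance) of the instruction
    count from `spawn` (inclusive) to the first `target` visit (exclusive),
    with mean and variance conditioned on reaching `target`.

    Hypothetical helper: successors that are not keys of `chain` are exits,
    `spawn` must differ from `target`, and every block must be able to reach
    an exit or `target` so that (I - Q) is nonsingular.
    """
    states = [b for b in chain if b != target]  # transient blocks
    idx = {b: k for k, b in enumerate(states)}
    n = len(states)
    a = [[float(i == j) for j in range(n)] for i in range(n)]  # I - Q
    to_target = [0.0] * n  # one-step probability of entering target
    for b in states:
        for succ, p in chain[b].items():
            if succ == target:
                to_target[idx[b]] += p
            elif succ in idx:
                a[idx[b]][idx[succ]] -= p
    w = [float(size[b]) for b in states]
    # h_i = P(reach target from i):        (I - Q) h = P(i -> target)
    h = solve(a, to_target)
    # g_i = E[L_i * 1{reach}]:             (I - Q) g = w * h
    g = solve(a, [w[i] * h[i] for i in range(n)])
    # q_i = E[L_i^2 * 1{reach}]:           (I - Q) q = 2 w g - w^2 h
    q = solve(a, [2 * w[i] * g[i] - w[i] ** 2 * h[i] for i in range(n)])
    s = idx[spawn]
    if h[s] == 0.0:
        return 0.0, math.nan, math.nan
    mean = g[s] / h[s]
    return h[s], mean, q[s] / h[s] - mean ** 2


# Usage: spawn block S enters a loop L with probability 0.8 (else exits via X);
# each loop iteration leaves to the target T with probability 0.25.
chain = {
    "S": {"L": 0.8, "X": 0.2},
    "L": {"L": 0.75, "T": 0.25},
    "T": {},
}
size = {"S": 10, "L": 20, "T": 5}
stats = spawn_target_stats(chain, size, "S", "T")
print(tuple(round(v, 6) for v in stats))  # (0.8, 90.0, 4800.0)
```

The example can be checked by hand: given that T is reached, the loop runs a geometric number of iterations with mean 4 and variance 12, so the distance is 10 + 20·4 = 90 instructions with variance 20²·12 = 4800. Solving three systems with the same matrix keeps the sketch short; a version for large control-flow graphs would factor the matrix once, or use path expressions as the abstract describes.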

References

M. Annavaram, J. M. Patel, and E. S. Davidson. Data Prefetching by Dependence Graph Precomputation. In 28th International Symposium on Computer Architecture, pages 52–61, 2001.
P. P. Chang, S. A. Mahlke, and W. Hwu. Using Profile Information to Assist Classic Code Optimizations. Software: Practice and Experience, 21(12):1301–1321, 1991.
R. S. Chappell, J. Stark, S. P. Kim, S. K. Reinhardt, and Y. N. Patt. Simultaneous Subordinate Microthreading (SSMT). In 26th International Symposium on Computer Architecture, pages 186–195, 1999.
R. S. Chappell, F. Tseng, A. Yoaz, and Y. N. Patt. Difficult-Path Branch Prediction Using Subordinate Microthreads. In 29th International Symposium on Computer Architecture, pages 307–317, 2002.
J. D. Collins, D. M. Tullsen, H. Wang, and J. P. Shen. Dynamic Speculative Precomputation. In 34th International Symposium on Microarchitecture, pages 306–317, 2001.
J. D. Collins, H. Wang, D. M. Tullsen, C. Hughes, Y.-F. Lee, D. Lavery, and J. P. Shen. Speculative Precomputation: Long-Range Prefetching of Delinquent Loads. In 28th International Symposium on Computer Architecture, pages 14–25, 2001.
M. Dubois and Y. Song. Assisted execution. Technical Report CENG 98-25, Department of EE-Systems, University of Southern California, Oct. 1998.
J. Emer. Simultaneous multithreading: Multiplying alpha's performance. Microprocessor Forum, Oct. 1999.
D. W. Hammerstrom and E. S. Davidson. Information Content of CPU Memory Referencing Behavior. In 4th International Symposium on Computer Architecture, pages 184–192, 1977.
G. Hinton and J. Shen. Intel's multi-threading technology. Microprocessor Forum, Oct. 2001.
J. Huck, D. Morris, J. Ross, A. Knies, H. Mulder, and R. Zahir. Introducing the IA-64 Architecture. IEEE Micro, 20(5):12–23, 2000.
Intel Corporation. Special Issue on Intel Hyper-Threading Technology in Pentium® 4 Processors. Intel Technology Journal, Q1 2002.
S. S. Liao, P. H. Wang, H. Wang, G. Hoflehner, D. Lavery, and J. P. Shen. Post-Pass Binary Adaptation for Software-Based Speculative Precomputation. In SIGPLAN 2002 Conference on Programming Language Design and Implementation, pages 117–128, 2002.
C.-K. Luk. Tolerating Memory Latency Through Software-Controlled Preexecution in Simultaneous Multithreading Processors. In 28th International Symposium on Computer Architecture, pages 40–51, 2001.
P. Marcuello and A. González. Thread-Spawning Schemes for Speculative Multithreading. In 8th International Symposium on High-Performance Computer Architecture, pages 55–64, 2002.
A. Moshovos, D. N. Pnevmatikatos, and A. Baniasadi. Slice-Processors: An Implementation of Operation-Based Prediction. In 15th International Conference on Supercomputing, pages 321–334, 2001.
G. Ramalingam. Data flow frequency analysis. In SIGPLAN 1996 Conference on Programming Language Design and Implementation, pages 267–277, 1996.
G. Reinman, B. Calder, and T. Austin. Fetch Directed Instruction Prefetching. In 32nd International Symposium on Microarchitecture, pages 16–27, 1999.
A. Roth and G. S. Sohi. Speculative Data-Driven Multithreading. In 7th International Symposium on High-Performance Computer Architecture, pages 37–48, 2001.
A. Roth and G. S. Sohi. A Quantitative Framework for Automated Pre-Execution Thread Selection. In 35th International Symposium on Microarchitecture, pages 430–441, 2002.
T. Sherwood, E. Perelman, G. Hamerly, and B. Calder. Automatically characterizing large scale program behavior. In 10th International Conference on Architectural Support for Programming Languages and Operating Systems, pages 45–57, 2002.
R. E. Tarjan. A Unified Approach to Path Problems. Journal of the ACM, 28(3):577–593, 1981.
R. E. Tarjan. Fast Algorithms for Solving Path Problems. Journal of the ACM, 28(3):594–614, 1981.
D. M. Tullsen, S. J. Eggers, J. S. Emer, H. M. Levy, J. L. Lo, and R. L. Stamm. Exploiting Choice: Instruction Fetch and Issue on an Implementable Simultaneous Multithreading Processor. In 23rd International Symposium on Computer Architecture, pages 191–202, 1996.
D. M. Tullsen, S. J. Eggers, and H. M. Levy. Simultaneous Multithreading: Maximizing On-Chip Parallelism. In 22nd International Symposium on Computer Architecture, pages 392–403, 1995.
M. Weiser. Program slicing. In 5th International Conference on Software Engineering, pages 439–449, 1981.
C. Zilles and G. Sohi. Execution-based prediction using speculative slices. In 28th International Symposium on Computer Architecture, pages 2–13, 2001.
C. B. Zilles and G. S. Sohi. Understanding the backward slices of performance degrading instructions. In 27th International Symposium on Computer Architecture, pages 172–181, 2000.

Index Terms

Hardware → Electronic design automation → Modeling and parameter extraction


Published in
ACM SIGMETRICS Performance Evaluation Review, Volume 31, Issue 1
June 2003
325 pages
ISSN: 0163-5999
DOI: 10.1145/885651
SIGMETRICS '03: Proceedings of the 2003 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
June 2003
338 pages
ISBN: 1581136641
DOI: 10.1145/781027
General Chairs: Bill Cheng (TeleGIF), Satish Tripathi (University of California at Riverside)
Program Chairs: Jennifer Rexford (AT&T Labs–Research, Florham Park, NJ), William H. Sanders (University of Illinois at Urbana-Champaign)
Copyright © 2003 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher: Association for Computing Machinery, New York, NY, United States
Publication History
- Published: 10 June 2003
Author Tags
analytical modeling
helper threads
instruction prefetch
multithreading
optimization
path expressions
Qualifiers
- article

Article Metrics
- Total Citations: 3
- Total Downloads: 556
- Downloads (Last 12 months): 1
- Downloads (Last 6 weeks): 0