Article

DiST: a simple, reliable and scalable method to significantly reduce processor architecture simulation time

Authors:
Sylvain Girbal (LRI, Paris South University and CEA, France)
Gilles Mouchard (LRI, Paris South University and CEA, France)
Albert Cohen (INRIA Rocquencourt, France)
Olivier Temam (LRI, Paris South University, France)

SIGMETRICS '03: Proceedings of the 2003 ACM SIGMETRICS international conference on Measurement and modeling of computer systems, June 2003, pages 1–12
https://doi.org/10.1145/781027.781029

Published: 10 June 2003

ABSTRACT

While architecture simulation is often treated as a methodology issue, it is at the core of most processor architecture research, and simulation speed is often the bottleneck of the typical trial-and-error research process. To speed up simulation during this research process and get trends faster, researchers usually reduce the trace size. More sophisticated techniques like trace sampling or distributed simulation are rarely used because they are considered unreliable and complex due to their impact on accuracy and the associated warm-up issues.

In this article, we present DiST, a practical distributed simulation scheme where, unlike in other simulation techniques that trade accuracy for speed, the user is relieved of most accuracy issues thanks to an automatic and dynamic mechanism for adjusting the warm-up interval size. Moreover, the mechanism is designed to always favor accuracy over speedup. The speedup scales with the amount of available computing resources, yielding an average speedup of 7.35 on 10 machines with an average IPC error of 1.81% and a maximum IPC error of 5.06%.

Besides proposing a solution to the warm-up issues in distributed simulation, we experimentally show that our technique is significantly more accurate than trace size reduction or trace sampling for identical speedups. We also show not only that the error always remains small for IPC and other metrics, but also that a researcher can reliably base research decisions on DiST simulation results. Finally, we explain how the DiST tool is designed to be easily pluggable into existing architecture simulators with very few modifications.
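To put the reported figures in perspective, the short sketch below computes the parallel efficiency implied by a 7.35 speedup on 10 machines, and shows one common way an IPC error could be expressed. The abstract does not define how the IPC error is measured, so the relative-error formula here is an assumption, and the function names and the sample IPC values are hypothetical, chosen only for illustration.

```python
def parallel_efficiency(speedup: float, machines: int) -> float:
    """Fraction of ideal linear speedup achieved (speedup / machine count)."""
    if machines <= 0:
        raise ValueError("machines must be positive")
    return speedup / machines


def relative_error_percent(measured: float, reference: float) -> float:
    """Relative error of a measured value against a reference, in percent.

    Assumption: the abstract does not state its error definition; this is
    the usual |measured - reference| / |reference| form.
    """
    if reference == 0:
        raise ValueError("reference must be non-zero")
    return abs(measured - reference) / abs(reference) * 100.0


# Figures reported in the abstract: average speedup of 7.35 on 10 machines.
efficiency = parallel_efficiency(7.35, 10)
print(f"parallel efficiency: {efficiency:.1%}")

# Hypothetical IPC values, not taken from the paper.
distributed_ipc = 1.47
full_simulation_ipc = 1.50
error = relative_error_percent(distributed_ipc, full_simulation_ipc)
print(f"IPC error: {error:.2f}%")
```

Under this reading, the reported speedup corresponds to roughly three quarters of ideal linear scaling, which fits the abstract's statement that the mechanism favors accuracy over speedup.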

Index Terms

DiST: a simple, reliable and scalable method to significantly reduce processor architecture simulation time
1. Computer systems organization
  1. Architectures
    1. Parallel architectures
      1. Very long instruction word
    2. Serial architectures
      1. Complex instruction set computing
      2. Reduced instruction set computing
2. General and reference
  1. Cross-computing tools and techniques
    1. Measurement
    2. Metrics

Published in
SIGMETRICS '03: Proceedings of the 2003 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
June 2003
338 pages
ISBN:1581136641
DOI:10.1145/781027
General Chairs: Bill Cheng (TeleGIF), Satish Tripathi (University of California at Riverside)
Program Chairs: Jennifer Rexford (AT&T Labs -- Research, Florham Park, NJ), William H. Sanders (University of Illinois at Urbana-Champaign)
ACM SIGMETRICS Performance Evaluation Review Volume 31, Issue 1
June 2003
325 pages
ISSN:0163-5999
DOI:10.1145/885651
Copyright © 2003 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
distributed simulation
processor architecture
Qualifiers
- Article
Conference

Acceptance Rates
SIGMETRICS '03 Paper Acceptance Rate: 26 of 222 submissions, 12%
Overall Acceptance Rate: 459 of 2,691 submissions, 17%
