ABSTRACT
This paper describes a toolkit for semi-automatically measuring and modeling static and dynamic characteristics of applications in an architecture-neutral fashion. For predictable applications, models of dynamic characteristics have a convex and differentiable profile. Our toolkit operates on application binaries and succeeds in modeling the key application characteristics that determine program performance. We use these characterizations to explore the interactions between an application and a target architecture. We apply our toolkit to SPARC binaries to develop architecture-neutral models of the computation and memory access patterns of the ASCI Sweep3D benchmark and the NAS SP, BT and LU benchmarks. From our models, we predict L1 and L2 cache miss counts and TLB miss counts, as well as the overall execution time of these applications on an SGI Origin 2000 system. We evaluate our predictions by comparing them against measurements collected with hardware performance counters.
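The memory models above build on the classic stack-distance result that, for a fully associative LRU cache of C blocks, a reference misses exactly when its reuse (LRU stack) distance is at least C, so one distance histogram predicts miss counts for every capacity at once. The following sketch illustrates that criterion; the function names, block size, and toy trace are illustrative assumptions, not part of the paper's toolkit:

```python
# Illustrative sketch of the stack-distance criterion behind reuse-distance
# cache models: a reference to a fully associative LRU cache of C blocks
# misses iff its LRU stack distance is >= C. Not the paper's implementation.

def stack_distances(trace, block_size=64):
    """Return the LRU stack distance of each reference (inf = cold miss)."""
    stack = []          # most-recently-used block at the front
    dists = []
    for addr in trace:
        block = addr // block_size
        if block in stack:
            d = stack.index(block)   # depth in the LRU stack
            stack.remove(block)
        else:
            d = float("inf")         # first touch: compulsory miss
        stack.insert(0, block)       # becomes most recently used
        dists.append(d)
    return dists

def predicted_misses(dists, capacity_blocks):
    """Misses in a fully associative LRU cache of the given capacity."""
    return sum(1 for d in dists if d >= capacity_blocks)
```

Because the distances are computed once and thresholded per capacity, the same trace characterization can be replayed against different cache sizes, which is the sense in which such models are architecture-neutral.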
Index Terms
- Cross-architecture performance predictions for scientific applications using parameterized models