
3DyRM: a dynamic roofline model including memory latency information

The Journal of Supercomputing

Abstract

Modern systems present complex memory hierarchies and heterogeneity among cores and processors, which makes efficient programming challenging. An easy-to-understand performance model that offers guidelines and information about the behaviour of a code can help alleviate these issues. In this paper, we present two extensions of the well-known Berkeley Roofline Model. The first, the Dynamic Roofline Model (DyRM), takes into account the complexities of multicore and heterogeneous systems and offers a more detailed view of how the execution of a code evolves. The second, the 3DyRM, also adds information about the latency of memory accesses to better represent behaviour on systems with complex memory hierarchies. We have implemented a set of tools to obtain and represent the models. These tools gather the required data from hardware counters with low overhead and display different views that can be used to extract the main features of a code. We present results of using these tools to study the OpenMP version of the NAS Parallel Benchmarks on two different systems.
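As a rough illustration of the quantities involved: the classic Roofline Model bounds the attainable performance of a code by min(peak FLOP/s, arithmetic intensity × peak memory bandwidth), where arithmetic intensity is floating-point operations per byte moved. A dynamic view plots one point per counter sample (per core and time interval) rather than one point per code, and the 3DyRM attaches a memory-latency value to each point. The Python sketch below is not the authors' tool; the `Sample` fields, the peak machine values, and the sample numbers are hypothetical, and the collection of counters (e.g., via PEBS) is not shown.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One hypothetical hardware-counter sample for one core over one interval."""
    time_s: float          # start time of the sampling interval (s)
    core: int              # core the sample was taken on
    flops: float           # floating-point operations counted in the interval
    bytes_moved: float     # bytes transferred to/from main memory in the interval
    interval_s: float      # interval length (s)
    latency_cycles: float  # mean memory-access latency (the extra 3DyRM axis)


def roofline_bound(ai: float, peak_gflops: float, peak_bw_gbs: float) -> float:
    """Classic Roofline ceiling in GFLOP/s: min(peak compute, AI * peak bandwidth)."""
    return min(peak_gflops, ai * peak_bw_gbs)


def to_point(s: Sample, peak_gflops: float, peak_bw_gbs: float) -> dict:
    """Map one sample to a (time, core, AI, GFLOP/s, latency) point."""
    ai = s.flops / s.bytes_moved           # arithmetic intensity, FLOP/byte
    gflops = s.flops / s.interval_s / 1e9  # achieved performance, GFLOP/s
    ridge = peak_gflops / peak_bw_gbs      # AI at which the two roofs meet
    return {
        "time": s.time_s,
        "core": s.core,
        "ai": ai,
        "gflops": gflops,
        "fraction_of_roof": gflops / roofline_bound(ai, peak_gflops, peak_bw_gbs),
        "region": "memory-bound" if ai < ridge else "compute-bound",
        "latency_cycles": s.latency_cycles,
    }


if __name__ == "__main__":
    # Hypothetical machine: 100 GFLOP/s peak, 20 GB/s peak memory bandwidth.
    PEAK_GFLOPS, PEAK_BW = 100.0, 20.0
    samples = [
        Sample(0.00, 0, flops=2e8, bytes_moved=2e8, interval_s=0.01, latency_cycles=250.0),
        Sample(0.01, 1, flops=5e8, bytes_moved=5e7, interval_s=0.01, latency_cycles=40.0),
    ]
    for p in (to_point(s, PEAK_GFLOPS, PEAK_BW) for s in samples):
        print(p)
```

In this sketch the first sample sits on the bandwidth roof (memory-bound, high latency) and the second reaches half of the compute roof. One possible way to render the result is to place the points on log-log axes and colour them by core or time for a DyRM-style view, or by latency for a 3DyRM-style view; that mapping is an assumption, not a description of the published tools.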




Acknowledgments

This work has been partially supported by the Ministry of Education and Science of Spain, FEDER funds under contract TIN 2010-17541, and Xunta de Galicia, EM2013/041. It has been developed in the framework of the European network HiPEAC-2 and the Spanish network CAPAP-H4 (TIN2011-15734-E).

Author information


Corresponding author

Correspondence to O. G. Lorenzo.


About this article

Cite this article

Lorenzo, O.G., Pena, T.F., Cabaleiro, J.C. et al. 3DyRM: a dynamic roofline model including memory latency information. J Supercomput 70, 696–708 (2014). https://doi.org/10.1007/s11227-014-1163-4



  • DOI: https://doi.org/10.1007/s11227-014-1163-4
