Parallel Computing
Volume 33, Issue 9, September 2007, Pages 624-633
Selected Papers from EuroPVM/MPI 2006
 
doi:10.1016/j.parco.2007.06.006
Copyright © 2007 Elsevier B.V. All rights reserved.

Optimizing a conjugate gradient solver with non-blocking collective operations

Torsten Hoefler (a, b; corresponding author), Peter Gottschling (a), Andrew Lumsdaine (a) and Wolfgang Rehm (b)

(a) Indiana University, Open Systems Lab, Bloomington, IN 47404, USA
(b) Technical University of Chemnitz, Department of Computer Science, 09107 Chemnitz, Germany

Received 22 December 2006; 
revised 7 June 2007; 
accepted 29 June 2007. 
Available online 27 July 2007.


Abstract

This paper presents a case study that analyzes the suitability and usage of non-blocking collective operations in parallel applications. As with their point-to-point counterparts, non-blocking collective operations provide the ability to overlap communication with computation and to avoid unnecessary synchronization. These operations are provided for MPI programs with LibNBC, a portable low-overhead implementation of non-blocking collective operations built on MPI-1. The straightforward applicability of LibNBC is demonstrated by incorporating non-blocking collective operations into a parallel conjugate gradient solver. Although only minor changes are required to use them, non-blocking collective operations allow most of the communication costs to be hidden and provide performance improvements of up to 34%. We also show that, because of overlap, there is no significant performance difference between Gigabit Ethernet and InfiniBand™ for special cases of our calculation.

Keywords: Message passing interface (MPI); Communication; Computation overlap; Collective operations; Non-blocking collective operations; Poisson solver
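
The overlap pattern described in the abstract (start a collective, do independent computation, then wait for completion) can be illustrated with a minimal sketch. This is not the paper's code and does not use LibNBC or MPI: a thread pool stands in for the non-blocking collective, and the names `simulated_allreduce`, `local_work`, `blocking_step` and `overlapped_step` are hypothetical, chosen only for this illustration.

```python
from concurrent.futures import ThreadPoolExecutor
import time


def simulated_allreduce(local_values, latency=0.05):
    """Assumed stand-in for a collective: sleep to mimic network time, then sum."""
    time.sleep(latency)
    return sum(local_values)


def local_work(n):
    """Assumed stand-in for computation that does not need the collective's result."""
    return sum(i * i for i in range(n))


def blocking_step(local_values, n):
    """Blocking style: communication finishes before computation starts."""
    total = simulated_allreduce(local_values)
    work = local_work(n)
    return total, work


def overlapped_step(pool, local_values, n):
    """Non-blocking style: start the operation, compute, then wait."""
    handle = pool.submit(simulated_allreduce, local_values)  # start the operation
    work = local_work(n)                                     # independent computation
    total = handle.result()                                  # wait for completion
    return total, work


if __name__ == "__main__":
    with ThreadPoolExecutor(max_workers=1) as pool:
        print(blocking_step([1.0, 2.0, 3.0], 100_000))
        print(overlapped_step(pool, [1.0, 2.0, 3.0], 100_000))
```

Both variants return the same values; only the ordering differs. In the overlapped variant the simulated communication delay can elapse while the local computation runs, which is the effect the abstract attributes to non-blocking collectives. Note that the computation between start and wait must not depend on the result of the pending operation.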

Article Outline

1. Introduction
1.1. Related work
2. Implementing non-blocking collective operations
2.1. The scheduling engine
2.2. Building a schedule
2.3. Schedule execution
3. Optimization of linear solvers
3.1. Case study: three-dimensional Poisson equation
3.2. Domain decomposition
3.3. Design and optimization of the CG solver
3.4. Parallel matrix vector product
3.5. Benchmark results
3.6. Comparison to non-blocking point-to-point messaging
3.7. Optimization impact on other linear solvers
4. Conclusions and future work
Acknowledgements
References




