ABSTRACT
The modern parallel I/O stack consists of several software layers with complex interdependencies and performance characteristics. While each layer exposes tunable parameters, it is often unclear to users how different parameter settings interact and affect overall I/O performance. As a result, users often fall back on default system settings, which typically yield poor I/O bandwidth. In this research, we develop a benchmark-guided auto-tuning framework for tuning the HDF5, MPI-IO, and Lustre layers on production supercomputing facilities. Our framework consists of three main components: H5Tuner uses a control file to adjust I/O parameters without modifying or recompiling the application; H5PerfCapture records performance metrics for HDF5 and MPI-IO; and H5Evolve uses a genetic algorithm to explore the parameter space and identify well-performing configurations. We demonstrate I/O performance results for three HDF5 application-based benchmarks on a Sun HPC system. Running on 512 MPI processes, all three benchmarks perform 3X to 5.5X faster with the auto-tuned I/O parameters than with the default system parameters.
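To make the genetic-algorithm search concrete, the following is a minimal Python sketch of how a tuner in the spirit of H5Evolve might explore a discrete parameter space. It is not the authors' implementation. The parameter names, the value ranges, and the `synthetic_runtime` function are all illustrative assumptions; in a real setting the fitness function would run an I/O benchmark with the candidate configuration applied and return the measured time.

```python
import random

# Hypothetical search space: names and value ranges are illustrative
# assumptions, not the settings reported in the paper.
SEARCH_SPACE = {
    "lustre_stripe_count": [4, 8, 16, 32, 64, 128],
    "lustre_stripe_size_mb": [1, 2, 4, 8, 16, 32],
    "mpiio_cb_nodes": [1, 2, 4, 8, 16, 32],
    "mpiio_cb_buffer_size_mb": [1, 4, 16, 64, 128],
    "hdf5_alignment_kb": [64, 256, 1024, 4096],
}


def synthetic_runtime(config):
    """Stand-in for a benchmark run; returns a made-up runtime in seconds.

    A real tuner would launch the I/O benchmark with `config` applied and
    return the measured wall-clock time instead.
    """
    t = 100.0
    t += abs(config["lustre_stripe_count"] - 64) * 0.5
    t += abs(config["lustre_stripe_size_mb"] - 8) * 2.0
    t += abs(config["mpiio_cb_nodes"] - 16) * 1.0
    t += abs(config["mpiio_cb_buffer_size_mb"] - 16) * 0.2
    t += abs(config["hdf5_alignment_kb"] - 1024) * 0.01
    return t


def random_config(rng):
    # Draw one allowed value per parameter.
    return {name: rng.choice(values) for name, values in SEARCH_SPACE.items()}


def crossover(a, b, rng):
    # Uniform crossover: each parameter is inherited from a random parent.
    return {name: rng.choice((a[name], b[name])) for name in SEARCH_SPACE}


def mutate(config, rng, rate=0.2):
    # Resample each parameter from its allowed values with probability `rate`.
    return {
        name: rng.choice(SEARCH_SPACE[name]) if rng.random() < rate else value
        for name, value in config.items()
    }


def evolve(fitness, generations=20, population_size=16, elite=4, seed=0):
    """Minimize `fitness` with an elitist genetic algorithm."""
    rng = random.Random(seed)
    population = [random_config(rng) for _ in range(population_size)]
    for _ in range(generations):
        population.sort(key=fitness)
        parents = population[:elite]  # the best configurations survive unchanged
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(population_size - elite)
        ]
        population = parents + children
    return min(population, key=fitness)


best = evolve(synthetic_runtime)
print(best, synthetic_runtime(best))
```

Because the best configurations of each generation are carried over unchanged, the best runtime found never gets worse from one generation to the next. With real benchmarks each fitness evaluation is expensive, so a practical tuner would likely cache results for configurations it has already measured.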
A framework for auto-tuning HDF5 applications
HPDC '13: Proceedings of the 22nd international symposium on High-performance parallel and distributed computing