Abstract
Parallel 3D FFT is a commonly used numerical method in scientific computing. P3DFFT is a recently proposed implementation of parallel 3D FFT that is designed to scale to massively large systems such as Blue Gene. While recent work has demonstrated such scalability on regular Cartesian meshes (equal length in each dimension), its performance and scalability on flat Cartesian meshes (much smaller length in one dimension) remain a concern. In this paper, we perform studies on a 16-rack (16384-node) Blue Gene/L system which demonstrate that a combination of the network topology and the communication pattern of P3DFFT can result in early network saturation and, consequently, performance loss. We also show that remapping processes on nodes and rotating the mesh, taking the communication properties of P3DFFT into consideration, can help alleviate this problem and improve performance by up to 48% in some special cases.
This work was supported by the Mathematical, Information, and Computational Sciences Division subprogram of the Office of Advanced Scientific Computing Research, Office of Science, U.S. Department of Energy, under Contract DE-AC02-06CH11357. We also acknowledge IBM for allowing us to use their BG-Watson system for our experiments. Finally, we thank Joerg Schumacher for providing us with his test code, which allowed us to understand the scalability issues with P3DFFT on flat Cartesian meshes.
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Chan, A., Balaji, P., Gropp, W., Thakur, R. (2008). Communication Analysis of Parallel 3D FFT for Flat Cartesian Meshes on Large Blue Gene Systems. In: Sadayappan, P., Parashar, M., Badrinath, R., Prasanna, V.K. (eds) High Performance Computing - HiPC 2008. HiPC 2008. Lecture Notes in Computer Science, vol 5374. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-89894-8_32
DOI: https://doi.org/10.1007/978-3-540-89894-8_32
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-89893-1
Online ISBN: 978-3-540-89894-8
eBook Packages: Computer Science, Computer Science (R0)