Abstract
We propose a one-dimensional FFT routine for distributed-memory vector-parallel machines that gives the user both high performance and flexibility in data distribution. Our routine reads its input and writes its output in a block cyclic data distribution, and the user can choose the input and output block sizes independently. This flexibility needs no more inter-processor communication than the widely used transpose algorithm, and it adds no overhead for data redistribution. We implemented our method on the Hitachi SR2201, a distributed-memory parallel machine with pseudo-vector processing nodes, and obtained 45% of the peak performance on 16 nodes for a problem size of N = 2^24. This performance stayed the same over a wide range of block sizes, from 1 to 16.
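The following Python sketch shows what a 1-D block cyclic distribution with a user-chosen block size means. It maps each global index to a (process rank, local index) pair. It follows the common ScaLAPACK-style convention and is not the authors' implementation. The function names `block_cyclic_owner` and `distribute` are hypothetical. A block size of 1 gives a purely cyclic layout, and a block size of N/P gives a purely blocked one.

```python
from typing import List, Tuple


def block_cyclic_owner(i: int, block: int, nprocs: int) -> Tuple[int, int]:
    """Return (rank, local_index) of global element i under a 1-D block
    cyclic distribution (hypothetical helper, ScaLAPACK-style convention)."""
    block_id, offset = divmod(i, block)    # which block, and position inside it
    rank = block_id % nprocs               # blocks are dealt round-robin to ranks
    local = (block_id // nprocs) * block + offset  # position in that rank's storage
    return rank, local


def distribute(n: int, block: int, nprocs: int) -> List[List[int]]:
    """List the global indices held by each rank, in local storage order."""
    parts: List[List[int]] = [[] for _ in range(nprocs)]
    for i in range(n):
        rank, local = block_cyclic_owner(i, block, nprocs)
        # Consistency check: local indices on each rank are dense and increasing.
        assert local == len(parts[rank])
        parts[rank].append(i)
    return parts


if __name__ == "__main__":
    for b in (1, 2, 4):
        print(f"block size {b}:", distribute(16, b, 4))
```

For 16 elements on 4 processes with block size 2, rank 0 holds global indices 0, 1, 8 and 9. In the routine described above, the input and output block sizes could differ, for example 1 on input and 16 on output.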
© 2003 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Yamamoto, Y., Igai, M., Naono, K. (2003). A Vector-Parallel FFT with a User-Specifiable Data Distribution Scheme. In: Guo, M., Yang, L.T. (eds) Parallel and Distributed Processing and Applications. ISPA 2003. Lecture Notes in Computer Science, vol 2745. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-37619-4_36
DOI: https://doi.org/10.1007/3-540-37619-4_36
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-40523-8
Online ISBN: 978-3-540-37619-4
eBook Packages: Springer Book Archive