Abstract
In this study, a global optimization meta-heuristic is developed for determining the optimum data distribution and degree of parallelism when parallelizing a sequential program for distributed memory machines. The parallel program is treated as a series of consecutive stages, and the method considers all stages of the program together rather than proposing a solution for each stage in isolation. The meta-heuristic combines simulated annealing and hill climbing (SA-HC) in the search for the optimum configuration. Performance is measured as the total execution time of the program, including both communication and computation times. Two example codes from the literature, the first computation intensive and the second communication intensive, are used in the experiments. The SA-HC algorithm gives satisfactory results for these illustrative examples.
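The general idea of pairing simulated annealing with hill climbing can be illustrated with a minimal sketch. It is not the authors' SA-HC algorithm: the abstract does not give the cost model, the neighborhood, or the cooling schedule, so everything below is an assumption made for illustration. In particular, the toy `total_time` function, the `DISTRIBUTIONS` and `PROCESSOR_COUNTS` sets, the geometric cooling parameters, and the choice to run hill climbing once after annealing are all hypothetical.

```python
import math
import random

# Hypothetical choices available at every stage (not taken from the paper).
DISTRIBUTIONS = ("row", "column", "block")
PROCESSOR_COUNTS = (1, 2, 4, 8, 16)


def total_time(config, work, comm_per_proc=1.0, redistribution=5.0):
    """Toy execution-time model: computation + communication + redistribution."""
    time = 0.0
    for i, (dist, procs) in enumerate(config):
        time += work[i] / procs               # computation shrinks with more processors
        time += comm_per_proc * (procs - 1)   # communication grows with more processors
        if i > 0 and config[i - 1] != config[i]:
            time += redistribution            # changing layout between stages moves data
    return time


def neighbors(config):
    """Yield every configuration that differs from config in exactly one stage."""
    for i, current in enumerate(config):
        for dist in DISTRIBUTIONS:
            for procs in PROCESSOR_COUNTS:
                if (dist, procs) != current:
                    yield config[:i] + ((dist, procs),) + config[i + 1:]


def hill_climb(config, cost):
    """Move to the best neighbor until no neighbor strictly improves the cost."""
    while True:
        best = min(neighbors(config), key=cost)
        if cost(best) >= cost(config):
            return config
        config = best


def anneal_then_climb(work, steps=2000, t0=10.0, alpha=0.995, seed=0):
    """Simulated annealing over whole-program configurations, then hill climbing."""
    rng = random.Random(seed)

    def cost(config):
        return total_time(config, work)

    def random_choice():
        return (rng.choice(DISTRIBUTIONS), rng.choice(PROCESSOR_COUNTS))

    current = tuple(random_choice() for _ in work)
    best = current
    temp = t0
    for _ in range(steps):
        i = rng.randrange(len(work))          # perturb one randomly chosen stage
        candidate = current[:i] + (random_choice(),) + current[i + 1:]
        delta = cost(candidate) - cost(current)
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current = candidate
            if cost(current) < cost(best):
                best = current
        temp *= alpha                         # geometric cooling (an assumed schedule)
    return hill_climb(best, cost)             # polish the best configuration found


if __name__ == "__main__":
    stage_work = [100.0, 40.0, 100.0]         # hypothetical per-stage workloads
    plan = anneal_then_climb(stage_work)
    print(plan, total_time(plan, stage_work))
```

Because the cost function scores the whole tuple of stage choices, including the penalty for changing layout between neighboring stages, the search evaluates the program as a whole rather than stage by stage, which is the property the abstract emphasizes. The final hill-climbing pass only guarantees a local optimum with respect to single-stage changes.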
Cite this article
Onbasçioglu, E., Özdamar, L. Optimization of Data Distribution and Processor Allocation Problem Using Simulated Annealing. The Journal of Supercomputing 25, 237–253 (2003). https://doi.org/10.1023/A:1024299011109
DOI: https://doi.org/10.1023/A:1024299011109