doi:10.1016/j.cpc.2004.07.003
Copyright © 2004 Elsevier B.V. All rights reserved.
Parallelization issues of a code for physically-based simulation of fabrics
Sergio Romero, Eladio Gutiérrez, Luis F. Romero, Oscar Plata, and Emilio L. Zapata
Department of Computer Architecture, University of Málaga, 29071 Málaga, Spain
Received 9 February 2004; accepted 3 July 2004. Available online 28 August 2004.
Abstract
The simulation of fabrics, clothes, and flexible materials is an essential topic in the computer animation of realistic virtual humans and dynamic scenery. Emerging technologies, such as interactive digital TV and multimedia products, require powerful tools for performing real-time simulations. Parallelism is one such tool. A computational analysis of fabric simulations shows that these codes belong to the complex class of irregular applications. Codes of this kind frequently include reduction operations in their core, so that a significant fraction of the computation time is spent on them. In fabric simulators these operations appear when evaluating forces, which gives rise to the system of equations to be solved. For this reason, this paper discusses only this phase of the simulation. It analyzes and evaluates different techniques for parallelizing irregular reductions on ccNUMA shared-memory machines, applied to a real, physically-based fabric simulator that we have developed. Several issues are taken into account in order to achieve high performance, such as the exploitation of data access locality and parallelism, and the careful use of memory resources (memory overhead). We use the concept of data affinity to develop several efficient reduction parallelization algorithms that exploit data locality.
Keywords: Fabric and cloth simulation; Data locality; Irregular reductions; Parallelization techniques; ccNUMA multiprocessors
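To make the notion of an irregular reduction concrete, the following is a minimal sketch, not the authors' code. It assumes a force loop over mesh triangles in which each triangle adds a contribution to the vector b at the indices of its three particles, and it contrasts the sequential loop with the array expansion approach named in the caption of Fig. 8, where every worker accumulates into a private copy of b and a final stage sums the copies. All names (sequential_reduction, expanded_reduction, triangles, contrib, n_workers) are hypothetical, and the workers are emulated one after another rather than run as real threads.

```python
def sequential_reduction(triangles, contrib, n_particles):
    """Reference irregular (histogram) reduction: b[index] += value."""
    b = [0.0] * n_particles
    for t, particles in enumerate(triangles):
        for p in particles:
            # The write location depends on the mesh connectivity,
            # so it is not known until run time.
            b[p] += contrib[t]
    return b


def expanded_reduction(triangles, contrib, n_particles, n_workers):
    """Array expansion sketch: one private copy of b per worker."""
    # Memory overhead: n_workers copies of the reduction vector.
    private = [[0.0] * n_particles for _ in range(n_workers)]

    # Block distribution of loop iterations among the workers.
    chunk = (len(triangles) + n_workers - 1) // n_workers
    for w in range(n_workers):  # emulated workers; no real threads here
        first = w * chunk
        last = min(first + chunk, len(triangles))
        for t in range(first, last):
            for p in triangles[t]:
                # Each worker writes only to its own copy, so no
                # synchronization would be needed in this phase.
                private[w][p] += contrib[t]

    # Final reduction stage: combine the private copies element-wise.
    return [sum(private[w][p] for w in range(n_workers))
            for p in range(n_particles)]
```

The sketch shows the trade-off mentioned in the abstract: expansion removes write conflicts between workers, but the extra storage grows with the number of workers, and the final stage touches every copy of the vector.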
Fig. 1. Example of a discretization mesh.
Fig. 2. Fabric simulation iterative algorithm.
Fig. 3. Accesses to the system matrix.
Fig. 4. Force evaluation loop using implicit Euler integration method.
Fig. 5. A generic loop with multiple irregular (histogram) reductions.
Fig. 6. Original data distribution (a), reordering particles (b), and reordering triangles (c).
Fig. 7. Two different force loops updating the same vector b.
Fig. 8. Array expansion (a) and pseudo-expansion (b) methods.
Fig. 9. Two versions for the final reduction stage.
Fig. 10. Execution phase of the DWA-LIP method.
Fig. 11. Reduction loop parallelization speed-up and efficiency for the sorted input data set.
Fig. 12. Reduction loop parallelization speed-up and efficiency for the non-sorted input data set.
Fig. 13. Required memory for different parallelization methods (normalized to the sequential code).
Fig. 14. Tablecloth and hanging fabric simulations.
Table 1. Cache behavior simulation for the final reduction stage codes shown in Figs. 9(a) and (b).

This work was supported by the Ministry of Education and Culture (CICYT), Spain, through grant TIC2003-06623.

Corresponding author. Tel.: +34 95 213 3318; Fax: +34 95 213 2790.