Real-time nonlinear finite element computations on GPU – Application to neurosurgical simulation

https://doi.org/10.1016/j.cma.2010.06.037Get rights and content

Abstract

Application of biomechanical modeling techniques in the area of medical image analysis and surgical simulation implies two conflicting requirements: accurate results and high solution speeds. Accurate results can be obtained only by using appropriate models and solution algorithms. In our previous papers we have presented algorithms and solution methods for performing accurate nonlinear finite element analysis of brain shift (which includes mixed mesh, different non-linear material models, finite deformations and brain–skull contacts) in less than a minute on a personal computer for models having up to 50,000 degrees of freedom. In this paper we present an implementation of our algorithms on a graphics processing unit (GPU) using the new NVIDIA Compute Unified Device Architecture (CUDA) which leads to more than 20 times increase in the computation speed. This makes possible the use of meshes with more elements, which better represent the geometry, are easier to generate, and provide more accurate results.

Introduction

The objective of our work is to significantly increase the efficacy and efficiency of image-guided neurosurgery by including realistic computation of brain deformations, based on a fully non-linear biomechanical model, in a system to improve intra-operative visualisation, navigation and monitoring. The system will create an augmented reality visualisation of the intra-operative configuration of the patient’s brain merged with high resolution pre-operative imaging data, including diffusion tensor imaging and functional magnetic resonance imaging, in order to better localise the tumor and critical healthy tissues.

The focus of this paper is on problems arising in image-guided neurosurgery. In this context it is very important to be able to predict the effect of certain procedures on the position of pathologies and critical healthy areas in the brain. The most typical example is the prediction of a displacement field within the brain after opening the skull (so called “brain shift” estimation). A neurosurgeon is interested in the final, deformed position of the brain. The displacement field within the brain predicted by a biomechanical model can be used to morph high quality preoperative images (usually MRI) in order to show the current (intra-operative) position of the brain components. The computation must be done intra-operatively; therefore it is subject to stringent time constraints, which practically means that the results should be available to an operating surgeon in less than one minute [1], [2], [3], [4]. The actual deformation computation should take only a fraction of this time, as there are other activities that need to be done beforehand (such as the acquisition of the deformed brain surface in the area of craniotomy and the registration of this surface with its pre-operative position, leading to the extraction of displacements that are used for driving the deformation of the model) [4].

Some of our previous work focused on developing efficient and accurate solution algorithms for nonlinear finite element models. In [5] we have shown how the Total Lagrangian formulation can be combined with explicit integration in order to obtain a very efficient time stepping algorithm. Because of their numerical efficiency, the linear tetrahedron and the linear under-integrated hexahedron are the preferred elements when constructing the mesh. We developed fast algorithms for handling the numerical problems associated with these elements: an anti-hourglassing algorithm for the linear under-integrated hexahedron is presented in [6] and a non-locking linear tetrahedron element is developed in [7]. We also developed an algorithm for handling the brain–skull interaction (contact) in [8]. Combining all these algorithms and using dynamic relaxation for the computation of the steady state solution, we have been able to perform brain shift simulations on standard 3 GHz dual-core personal computer in less than a minute for models having up to 50,000 degrees of freedom [9].

One of the main problems in the development of a biomechanical model is the generation of the mesh. Many algorithms are now available for fast and accurate automatic mesh generation using tetrahedral elements, but not for automatic hexahedral mesh generation [10], [11], [12]. Some template based meshing algorithms can be used for meshing different organs using hexahedrons [13], [14], [15], but these types of algorithms only work for healthy organs. In case of severe pathologies (such as a brain tumor), such algorithms can not be used, as the shape, size and position of the pathology is unpredictable. This is one reason why many authors proposed the use of tetrahedral meshes for their models [1], [2], [16], [17], [18]. In order to automate the simulation process, mixed meshes (having both hexahedral and tetrahedral elements) with predominantly hexahedral elements are the most convenient.

For the same number of nodes, a tetrahedral mesh has about five times more elements than a hexahedral one. It is therefore desirable to mesh as much of the volume as possible using hexahedral elements, in order to reduce the computational effort. This, combined with a limit on the number of elements in the mesh, usually means manual intervention during the meshing process, which makes meshing difficult and time consuming.

The possibility to use meshes with increased number of elements has several advantages: increased percentage of the volume can be meshed using hexahedral elements, better representation of the geometry, more accurate results, and easier mesh generation. This unfortunately leads to an increased computational time, which means we can no longer satisfy the time constraints associated with surgery using such models. As the algorithms we developed are already very efficient and the CPU does not offer sufficient computation power and memory bandwidth to solve this problem, we turned our attention to parallel computations on GPUs. GPUs offer high computation power and increased memory bandwidth at a relatively low cost.

In the past years there has been an increased interest in using the power of graphics processing units (GPUs), with their parallel architecture, for general purpose computations. The GPU is used as a coprocessor for the CPU for executing sections of the code that can run in parallel. Before the introduction of CUDA, general purpose computations on GPUs (GPGPU) were done by recasting the computations in graphic terms and using the graphics pipeline [19], therefore a scientific or general-purpose computation often requires a concerted effort by experts in both computer graphics and in the particular scientific or engineering domain. With the introduction of CUDA, in November 2006, NVIDIA proposed a new parallel programming model and instruction set for their GPUs that can be used for performing general purpose computations. CUDA comes with a software environment that allows developers to use C as a high level programming language. A minimum set of keywords are used to extent the C language in order to: identify the code that must be run on the GPU as parallel threads, identify each thread (and the block of threads it belongs to) and to organize and transfer the data in the different GPU memory spaces. CUDA also exposes the internal architecture of the GPU and allows direct access to its internal resources. The programmer has more control over the internal hardware resources of the GPU, but this comes at the expense of an increased programming effort compared to a CPU implementation.

An implementation of our basic nonlinear Total Lagrangian Explicit Dynamics algorithm using the GPGPU framework (the graphics pipeline) has been presented in [20]. The implementation shows up to 16 times speed gains compared with the corresponding CPU implementation, but it has several limitations: it can only handle linear locking tetrahedrons, a single material type, and no contacts, has no time step control and does not compute the steady state solution.

In this paper we use CUDA to implement a suite of nonlinear Finite Element algorithms for brain shift computation. The implementation can handle areas with different non-linear materials, different element types (linear hexahedron with hourglass control, linear tetrahedron and non-locking tetrahedron) and contacts between brain and skull. We present CUDA and the related terminology in Section 2. The finite element algorithm, as implemented on CPU, is presented in Section 3. In Section 4 we discuss the way the data-parallel parts of the algorithm can be transferred and executed on a GPU using CUDA in order to improve the computation efficiency. A performance evaluation is done in Section 5 and discussion and conclusions are presented in Section 6.

Section snippets

NVIDIA CUDA – a general purpose parallel computing architecture

The detailed presentation of CUDA is available in the Programming Guide provided by NVIDIA [21]. In this section we only present some of the main characteristics and terminology related to CUDA.

The GPU has a highly parallel, multithreaded, many core processor architecture. This is well suited for problems that can be expressed as data-parallel computations with high arithmetic intensity, where the same program is executed on many data elements in parallel. CUDA is a general purpose parallel

The finite element algorithms for brain shift computation

In this section we present the main algorithms used for solving the finite element problem. This is not an exhaustive presentation, as the details can be found in our previous papers [5], [6], [7], [8], [9], [22], [23].

GPU implementation using CUDA

When transforming an application to run on a GPU, first we have to identify the data-parallel parts of it. In our case, we are interested in speeding the iteration process (part b of the program), which has to be performed intra-operatively. The most obvious data-parallel parts of this code are the ones described above using phrases such as “for each element: …” or “for each node: …”, as each element or node can be seen as a data structure on which the computations are made. Therefore, the

Performance evaluation

We evaluated the performance of the GPU implementation by performing several simulations and comparing the results with our best CPU implementation of the same algorithms. The purpose of first three simulations is to evaluate the different parts of the algorithm specific to different element types. In these simulations we deformed a cylinder meshed with a different element type for each simulation and compared the speed and accuracy of the results against the CPU computations. We used a soft,

Discussion and conclusions

Intra-operative computation of the brain shift using bio-mechanical models can be used to register high-quality pre-operative images to the intra-operative position. This can be considered as a finite element problem for which the steady state solution needs to be found. We have developed algorithms and solution methods that allow us to use complex bio-mechanical models (which include geometric and material nonlinearities, finite deformations and contacts) to solve such a problem. These methods

Acknowledgments

The financial support of the Australian Research Council (Grants DP0664534, DP0770275 and LX0774754), University of Western Australia (Research Development Award 2009), and National Institute of Health (Grant R03 CA126466) is gratefully acknowledged.

References (42)

  • S.K. Warfield et al.

    Real-time registration of volumetric brain MRI by biomechanical simulation of deformation during image guided surgery

    Comput. Visual. Sci.

    (2002)
  • K. Miller et al.

    Total Lagrangian explicit dynamics finite element algorithm for computing soft tissue deformation

    Commun. Numer. Methods Engrg.

    (2007)
  • G.R. Joldes et al.

    An efficient hourglass control implementation for the uniform strain hexahedron using the total Lagrangian formulation

    Commun. Numer. Methods Engrg.

    (2008)
  • G.R. Joldes et al.

    Non-locking tetrahedral finite element for surgical simulation

    Commun. Numer. Methods Engrg.

    (2009)
  • G.R. Joldes, A. Wittek, K. Miller, L. Morriss, Realistic and efficient brain–skull interaction model for brain shift...
  • S.J. Owen, A survey of unstructured mesh generation technology, in: Seventh International Meshing Roundtable, Dearborn,...
  • M. Viceconti et al.

    Automatic generation of finite element meshes from computed tomography data

    Crit. Rev. Biomed. Engrg.

    (2003)
  • A.D. Castellano-Smith, T. Hartkens, J. Schnabel, D.R. Hose, H. Liu, W.A. Hall, C.L. Truwit, D.J. Hawkens, D.L.G. Hill,...
  • V. Luboz et al.

    Orbital and maxillofacial computer aided surgery: patient-specific finite element models to predict surgical outcomes

    Comput. Methods Biomech. Biomed. Engrg.

    (2005)
  • M. Ferrant et al.

    Deformable modeling for characterizing biomedical shape changes

  • O. Clatz et al.

    Patient specific biomechanical model of the brain: application to Parkinson’s disease procedure

  • Cited by (156)

    • Maximum likelihood-based extended Kalman filter for soft tissue modelling

      2023, Journal of the Mechanical Behavior of Biomedical Materials
    • Real-Time FEA-based breast deformation simulation using artificial neural network

      2022, Computer Methods and Programs in Biomedicine Update
    View all citing articles on Scopus
    View full text