Computers & Geosciences

Volume 100, March 2017, Pages 67-75
Research paper
3D Kirchhoff depth migration algorithm: A new scalable approach for parallelization on multicore CPU based cluster

https://doi.org/10.1016/j.cageo.2016.12.006

Highlights

  • Enhancement of parallel 3D Kirchhoff migration algorithm on multicore CPU cluster.

  • Traveltime computation using Flexi-Trace iterations has accelerated performance.

  • Speedup achieved for prestack data is 49.05X with 76.64% efficiency on 64 nodes.

  • Performance gain of 57.5X for prestack data on 64 nodes over the previous algorithm.

  • New parallelization strategy exhibits improved MPI node scalability and performance.

Abstract

In this article, a new scalable 3D Kirchhoff depth migration algorithm is presented for a state-of-the-art multicore CPU based cluster. Parallelization of 3D Kirchhoff depth migration is challenging due to its high demands on compute time, memory, storage and I/O, along with the need for their effective management. The most resource-intensive modules of the algorithm are traveltime calculation and migration summation, which exhibit an inherent trade-off between compute time and the other resources. The parallelization strategy of the algorithm largely depends on the storage of calculated traveltimes and the mechanism for feeding them to the migration process. The presented work is an extension of our previous work, in which a 3D Kirchhoff depth migration application for multicore CPU based parallel systems was developed. Recently, we have improved the parallel performance of this application by re-designing its parallelization approach. The new algorithm is capable of efficiently migrating both prestack and poststack 3D data. It offers the flexibility to migrate a large number of traces within the available node memory, with minimal requirements for storage, I/O and inter-node communication. The resulting application is tested using 3D Overthrust data on PARAM Yuva II, a Xeon E5-2670 based multicore CPU cluster with 16 cores/node and 64 GB shared memory per node. The parallel performance of the algorithm is studied through different numerical experiments, and the scalability results show a striking improvement over the previous version. An impressive 49.05X speedup with 76.64% efficiency is achieved for 3D prestack data, and a 32.00X speedup with 50.00% efficiency for 3D poststack data, using 64 nodes. The results also demonstrate the effectiveness and robustness of the improved algorithm, with high scalability and efficiency on a multicore CPU cluster.

Introduction

With fast and continuous improvement in computer hardware architectures, enhancing the computation and performance of seismic data processing and imaging applications has become a fundamental requirement that attracts the attention of many researchers and High Performance Computing (HPC) scientists (Linda, 2015). Seismic imaging methods are required for accurately estimating an image of the subsurface geology, including the properties of the rocks beneath, from acoustic measurements recorded on the surface of the earth. These methods are among the oldest candidates for HPC technologies, since they are mathematically complex and need to be solved for very large data volumes (Almasi et al., 1992). Our current work relates to the computational aspects of the 3D Kirchhoff Depth Migration (KDM) method, one of the oldest seismic migration methods (Hagedoorn, 1954). From an algorithmic perspective, 3D KDM is highly compute intensive, and its resource requirements grow with increasing input data size. Therefore, enhancing its computational performance on various computer architectures is still an open research problem.

3D KDM consists of two crucial compute-intensive operations: traveltime computation and migration summation of seismic data (Schneider, 1978). The quality of the 3D KDM outcome depends upon the accuracy of the traveltimes (Uwe et al., 1996). Traveltimes can be computed by many methods, such as finite-difference eikonal solver based methods and ray-tracing based methods, with increasing levels of computational complexity (Coman, 2003). Irrespective of the method adopted for traveltime computation, the storage of the computed values and their supply to the migration process is the prime factor determining the computational speed of the algorithm on any platform (Alkhalifah, 2011). The second compute-intensive operation is the summation of diffraction amplitudes computed from the seismic data, which needs to process all the traces in the data, guided by an aperture function, in order to image a single grid point in the 3D subsurface model. A sketch of each operation is given below for concreteness.
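
To make the first operation concrete, the following is a minimal 2D finite-difference eikonal sketch in the fast-sweeping style, written in Python for illustration only; the grid, the velocity array vel, and the first-order update are assumptions for demonstration, not the traveltime scheme used in this work.

import numpy as np

def eikonal_fast_sweep(vel, h, src, n_sweeps=4):
    """First-order fast-sweeping solver for |grad T| = 1/v on a 2D grid.

    vel : (nz, nx) velocity field, h : grid spacing, src : (iz, ix) source index.
    Returns a traveltime table T (illustrative sketch only).
    """
    nz, nx = vel.shape
    T = np.full((nz, nx), 1e9)       # "infinity" everywhere except the source
    T[src] = 0.0
    orders = [(1, 1), (1, -1), (-1, 1), (-1, -1)]   # four sweep orderings
    for _ in range(n_sweeps):
        for sz, sx in orders:
            iz_range = range(nz) if sz > 0 else range(nz - 1, -1, -1)
            for iz in iz_range:
                ix_range = range(nx) if sx > 0 else range(nx - 1, -1, -1)
                for ix in ix_range:
                    # Upwind neighbors (clamped at the boundary; fine for a sketch)
                    a = min(T[max(iz - 1, 0), ix], T[min(iz + 1, nz - 1), ix])
                    b = min(T[iz, max(ix - 1, 0)], T[iz, min(ix + 1, nx - 1)])
                    s = h / vel[iz, ix]          # local slowness times spacing
                    if abs(a - b) >= s:          # causal one-sided update
                        t_new = min(a, b) + s
                    else:                        # two-sided quadratic update
                        t_new = 0.5 * (a + b + np.sqrt(2 * s * s - (a - b) ** 2))
                    T[iz, ix] = min(T[iz, ix], t_new)
    return T

Seeding T at the source and sweeping in the four diagonal orderings lets first-arrival times propagate across the grid; a production 3D solver would need higher-order stencils and far more care.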

In the last few years, researchers and HPC scientists have been actively involved in optimizing the performance of this algorithm on state-of-the-art computer architectures. Panetta et al. (2007) described the computational characteristics of KDM on a quad-core IBM Blue Gene using MPI and OpenMP. Li et al. (2009) proposed an MPI-based partitioning strategy for a 3D prestack KDM algorithm that handles the large memory requirement by dividing the imaging space on a multicore CPU based system. This strategy improved the memory efficiency of the migration for a limited number of processors, but for a large number of processors the I/O and communication overheads increased significantly. Teixeira et al. (2013) tested 3D prestack KDM on GPU-based clusters and found a significant gain in efficiency compared to the CPU-only version of the algorithm. They used a ray-tracing algorithm for traveltime computation, which was not ported to the GPU due to its memory limitations. Wang et al. (2014) described various methods of porting KDM to GPUs and showed that an 8–15X speedup can be obtained.

In our previous work, an efficient parallel poststack and prestack 3D Kirchhoff depth migration algorithm was successfully demonstrated on the current class of multicore systems (Rastogi et al., 2015). We introduced the concept of flexi-depth iterations for depth migrating data in a parallel imaging space. The parallelization approach achieved effective utilization of the available node memory for traveltime computations without the need for interpolation at runtime. The storage, I/O and communication requirements of the algorithm were successfully minimized; however, an in-depth performance analysis shows that the scalability of the application slows down beyond a certain number of nodes. To further optimize the parallel performance of the previously developed algorithm, the parallelization approach of 3D KDM has been re-designed to improve the overall scalability as well as to reduce the compute time of the application.

The theoretical foundation of both the previous and the new implementations of 3D KDM is the same and is described in detail in Section 2. The major focus of the current article is on the parallelization strategy that has accelerated the performance of the algorithm over the previous approach. The results are demonstrated using 3D Overthrust data on the CPU based multicore cluster PARAM Yuva II, one of the PARAM series of supercomputers. A comparative study of the previous and current approaches is performed through computational experiments, and conclusions are drawn based on the computing time. A study of the performance metrics of the resulting parallel application with respect to the number of nodes shows promising results and proves the effectiveness of the re-designed parallelization approach.

Section snippets

Theory and methodology

The theory of Kirchhoff migration is very well established. It is an integral solution to the scalar wave equation based on diffraction summation, which is governed by Huygens' principle (Yilmaz, 2001). Fig. 1 depicts the theoretical aspects of 3D KDM. The discrete form of the integral solution can be written as shown in Eq. (1):

$$P_{out}(x_i, y_j, z_k) = \frac{\Delta x\,\Delta y}{4\pi} \sum_{i=1}^{n} \left( a_i\, T_i\, A \right) * w_i * v_l\, d_{i,k} \tag{1}$$

where $P_{out}$ is the migration outcome at imaging location $I(x_i, y_j, z_k)$ in 3D space, $x$ and $y$ are inline and crossline receiver
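
For illustration, a minimal sketch of the diffraction-summation kernel for a single image point is given below; it assumes precomputed source-side and receiver-side traveltime tables (t_src, t_rec) and a generic weight vector standing in for the weighting terms of Eq. (1). All names here are illustrative, not the paper's.

import numpy as np

def migrate_point(traces, t0, dt, t_src, t_rec, weights):
    """Diffraction summation for one image point (illustrative sketch).

    traces  : (n_traces, n_samples) recorded seismic amplitudes
    t0, dt  : recording start time and sample interval [s]
    t_src   : (n_traces,) source-to-image traveltimes   [s]
    t_rec   : (n_traces,) image-to-receiver traveltimes [s]
    weights : (n_traces,) amplitude weights (aperture taper, obliquity, ...)
    """
    total_t = t_src + t_rec                        # two-way diffraction traveltime
    idx = np.round((total_t - t0) / dt).astype(int)
    valid = (idx >= 0) & (idx < traces.shape[1])   # keep samples inside the record
    rows = np.nonzero(valid)[0]
    # Sum the weighted amplitudes picked along the diffraction surface
    return np.sum(weights[rows] * traces[rows, idx[rows]])

In the full algorithm such a kernel would run once per grid point of the imaging space, with the weights carrying the aperture and amplitude terms of Eq. (1).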

Parallelization methodology

Theoretically, the 3D KDM algorithm exhibits inherent parallelism, since the computation of the diffraction-surface amplitude for an imaging location is independent of the other locations. After computation, the summation of amplitudes can be staged for the final imaging. This inherent parallelism is exploited in the current implementation, as in the sketch below. The algorithm's compute time is largely governed by the computation and storage of traveltimes and its feeding mechanism to the
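
A minimal sketch of this idea using mpi4py is shown below, assuming a block decomposition of the imaging space along depth; the grid dimensions and the Gatherv-based assembly are illustrative choices, not the paper's actual decomposition or communication scheme.

from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

nz, ny, nx = 64, 128, 128              # illustrative imaging-grid dimensions
z_lo = rank * nz // size               # this rank's slab of depth indices
z_hi = (rank + 1) * nz // size

# Each rank images its slab independently: image points need no halo exchange.
local_image = np.zeros((z_hi - z_lo, ny, nx))
# ... traveltime computation + diffraction summation fill local_image here ...

# Stage the partial images for final assembly on the root rank.
counts = [((r + 1) * nz // size - r * nz // size) * ny * nx for r in range(size)]
displs = [sum(counts[:r]) for r in range(size)]
full_image = np.empty((nz, ny, nx)) if rank == 0 else None
recvbuf = (full_image, counts, displs, MPI.DOUBLE) if rank == 0 else None
comm.Gatherv(local_image, recvbuf, root=0)

Because the image points are independent, the only communication in this sketch is the final gather, which reflects the minimal inter-node communication the algorithm aims for.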

Implementation details

Major features of the current parallel implementation of 3D KDM are described below:

The system

The application is developed and tested on the PARAM series supercomputer PARAM Yuva II. It is a 225-node Linux cluster with 64 GB of memory per node. Each node has two sockets, each with an octa-core Intel® Xeon E5-2670 processor running at a 2.6 GHz core frequency. The cluster's primary System Area Network (SAN) is FDR InfiniBand™.

Numerical experimentation data

Performance evaluation of the current 3D KDM algorithm is done using the synthetic 3D Overthrust data developed by the SEG/EAGE modeling committee (Aminzadeh et al., 1996), in dip

Conclusions

A new scalable parallelization approach for a 3D Kirchhoff depth migration application has been presented, which is robust and can efficiently migrate both prestack and poststack data on a state-of-the-art multicore CPU based cluster. Traveltime computations are efficiently managed on the actual grid size at runtime within the node memory using Flexi-Trace iterations. The retain/replace policy for the traveltime computations balances the migration requirements of the algorithm with minimal
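
As one plausible reading of such a retain/replace policy, the following sketch retains recently used per-shot traveltime tables in node memory and replaces the least recently used one when a memory budget is exceeded; the class name, shot keying and LRU eviction are assumptions for illustration, not the paper's actual mechanism.

from collections import OrderedDict

class TraveltimeCache:
    """Retain/replace store for per-shot traveltime tables (illustrative sketch)."""

    def __init__(self, max_tables):
        self.max_tables = max_tables     # memory budget, in number of tables
        self._tables = OrderedDict()

    def get(self, shot_id, compute_fn):
        if shot_id in self._tables:              # retain: reuse without recompute
            self._tables.move_to_end(shot_id)
            return self._tables[shot_id]
        table = compute_fn(shot_id)              # recompute on the actual grid
        self._tables[shot_id] = table
        if len(self._tables) > self.max_tables:  # replace: evict the oldest table
            self._tables.popitem(last=False)
        return table

The point of such a policy is to trade a bounded amount of node memory for fewer traveltime recomputations, which matches the compute-versus-resources trade-off discussed above.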

Acknowledgement

This work was supported by the “Development and Adaptation of Applications, System Software and Hardware Technologies for Hybrid Architecture Based HPC Systems” project of the Department of Electronics and Information Technology (DeitY), Government of India. The authors are thankful to the Centre for Development of Advanced Computing (CDAC), Pune, for providing the PARAM Yuva II computing facility along with permission to publish this work, and to the team of the National PARAM Supercomputing Facility for their support.

References (18)

  • G. Almasi et al.

    Parallel distributed seismic migration

    Future Gener. Comput. Syst.

    (1992)
  • R. Rastogi et al.

An efficient parallel algorithm: poststack and prestack Kirchhoff 3D depth migration using flexi-depth iterations

    Comput. Geosci.

    (2015)
  • T. Alkhalifah

    Efficient traveltime compression for 3D prestack Kirchhoff migration

    Geophys. Prospect.

    (2011)
  • F. Aminzadeh et al.

    Three dimensional SEG/EAGE models - an update

    Lead. Edge

    (1996)
  • Cohen, J.K., Stockwell Jr., J.W., 2010. CWP/SU: Seismic Un*x Release No. 42: An open source software package for...
  • Coman, R., 2003. Computation of Multivalued Traveltimes in Three Dimensional Heterogeneous Media. Thesis...
  • J. Hagedoorn

    A process of seismic reflection interpretation

    Geophys. Prospect.

    (1954)
  • Hwang, K., 1992. Advanced Computer Architecture: Parallelism, Scalability, Programmability, 1st Edition. McGraw-Hill...
  • Goux, J.-P., Linderoth, J., Yoder, M., 2000. Metacomputing and the Master-Worker Paradigm. Tech. rep., In Preprint...
