Spatial Statistics

Volume 16, May 2016, Pages 53-71

Conditioning multiple-point statistics simulations to block data

https://doi.org/10.1016/j.spasta.2016.02.005

Abstract

Multiple-point statistics (MPS) makes it possible to generate random fields reproducing spatial statistics derived from a training image. MPS methods consist of borrowing patterns from the training set. Therefore, the simulation domain is assumed to be at the same resolution as the conceptual model, although geometrical deformations can be handled by such techniques. Whereas point conditioning data corresponding to the scale of the grid node can easily be integrated, accounting for data available at larger scales is challenging. In this paper, we propose an extension of MPS able to deal with block data, i.e. target mean values over subsets of the simulation domain. Our extension is based on the direct sampling algorithm and consists in adding a criterion for the acceptance of the candidate node scanned in the training image, constraining the simulation to honor the block data. Likelihood ratios are used to compare the averages of the simulated variable, taken over the informed nodes in the blocks, with the target mean values. Moreover, the block data may overlap, and their supports can be of any shape and size. Illustrative examples show the potential of the presented algorithm for practical applications.

Introduction

Multiple-point statistics (MPS) methods have become very popular in the earth sciences because they make it possible to generate highly heterogeneous random fields reproducing the spatial statistics of a conceptual geological model, the training image, given by the user. These methods overcome some limitations of classical geostatistical simulation techniques based on two-point statistics: variogram-based methods such as sequential Gaussian simulation and sequential indicator simulation (sisim) (Deutsch and Journel, 1998), transition-probability-based approaches such as TProgs (Carle, 1996), or Markovian-type categorical prediction (MCP) based on a maximum entropy principle (Allard et al., 2011).

Among the existing MPS simulation algorithms, snesim (Strebelle, 2002) and impala (Straubhaar et al., 2011, Straubhaar et al., 2013) successively populate each node of the simulation grid by randomly drawing a facies category according to a probability distribution conditioned to the data event centered at the simulated node, computed from a catalog of patterns found in the training image. A tree-based storage is used in snesim to ensure computational efficiency, whereas the list-based catalog employed in impala guarantees low memory requirements. Using a catalog implies considering only categorical variables and patterns of fixed geometry. A multiple-grid approach (Tran, 1994) is employed in these algorithms to capture large-scale structures while keeping data events of reduced size.

On the other hand, the direct sampling algorithm (Mariethoz et al., 2010) is a distance-based MPS algorithm. To simulate a node, the method randomly scans the training image until a pattern in the training image is compatible with the pattern retrieved from the simulation grid and centered at the simulated node. Then, the central node value is copied from the training image and pasted into the simulation grid. The compatibility between two patterns is measured by a distance. This basic simulation principle leads to a very flexible method. Indeed, since no catalog of patterns is used, categorical as well as continuous variables can be considered by defining an appropriate distance between data events, and the geometry of the patterns can vary during the simulation, making it possible to reproduce large-scale structures without a multiple-grid approach. In particular, point conditioning data can simply be assigned in the simulation grid at the beginning of the simulation, whereas methods based on a multiple-grid approach require some precautions to properly address point data (Straubhaar and Malinverni, 2014).

Distance-based MPS algorithms also include techniques that paste patches of the training image into the simulation grid at each step, instead of a single pixel value, such as filtersim (Zhang and Journel, 2006) or simpat (Arpat and Caers, 2007). These latter methods use a pattern database built from the training image and are also based on multiple-grid approaches. Other patch-based MPS algorithms, using neither multiple grids nor databases, paste overlapping boxes of pixels along a raster path, either by minimizing a cross-correlation function over the overlapping region, as in ccsim (Tahmasebi et al., 2012), or by minimizing an error over the common area followed by an optimal cut through this area, as in conditional image quilting (CIQ) (Mahmud et al., 2014). These methods better model the connectivity of the structures, but make conditioning difficult.
Therefore, the direct sampling method is appealing due to its simplicity and its flexibility. In particular, it can easily be extended to the simulation of multivariate fields (Mariethoz et al., 2010, Mariethoz et al., 2012), providing an intuitive tool to manage various types of nonstationarity.

No matter what MPS technique is considered, the simulation domain is filled by borrowing patterns from the training image, which is assumed to have the same resolution as the simulation grid. Hence, whereas point conditioning data (corresponding to the scale of the simulation domain) can be straightforwardly handled, conditioning a simulation at the local scale with data defined at a larger scale is quite challenging. Classical parametric methods based on covariance models can be used to integrate data with different support sizes (Liu and Journel, 2009, Journel, 1999). Essentially based on cokriging theory, such techniques nevertheless require point-to-point, point-to-block and block-to-block (cross-)covariance models and rely on Gaussian assumptions.

In this paper, we propose an extension of the direct sampling algorithm able to deal with block data, i.e. target values for the average of the simulated variable over subsets of the simulation domain. The principle is to use the block data as a criterion for accepting a candidate location, in addition to the comparison of the patterns. The current averages, accounting for the already simulated pixels in the blocks plus the candidate value, are compared to the target mean values, and a related mismatch is computed for each block datum. In this, we follow a strategy similar to that used for multivariate simulations (Mariethoz et al., 2010), where a mismatch (distance) between the training and simulated patterns is computed for each variable. As several mismatches are computed, the condition for accepting a candidate node has to be defined. One option is to define a global misfit as a weighted average of the considered mismatches and check whether it is below a certain threshold, as in the original direct sampling algorithm. This requires specifying a weight per mismatch and a global threshold. Another option is to specify a threshold per compared quantity and to accept the candidate value when each mismatch is less than its corresponding threshold. To develop our extension, we use an implementation of the direct sampling algorithm, called DeeSse (Straubhaar, 2015), based on the latter alternative.
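To make this per-quantity threshold rule concrete, the following sketch shows how a candidate value scanned in the TI could be screened: the pattern distance must pass its own threshold, and the block average updated with the candidate value must stay close enough to each target mean. The function names and the simple absolute-error mismatch are our own illustrative assumptions, not the DeeSse implementation.

    # Illustrative sketch (not the DeeSse code): a candidate is accepted only
    # if the pattern distance and every block mismatch pass their own threshold.

    def block_mismatch(block_sum, block_count, candidate, target_mean):
        # Average over the informed nodes of the block plus the candidate
        # value, compared to the target mean (absolute error: an assumption)
        avg = (block_sum + candidate) / (block_count + 1)
        return abs(avg - target_mean)

    def accept_candidate(pattern_dist, pattern_threshold,
                         block_states, candidate, block_thresholds):
        # block_states: list of (sum, count, target_mean), one entry per
        # block containing the simulated node
        if pattern_dist > pattern_threshold:
            return False
        return all(
            block_mismatch(s, n, candidate, mu) <= t
            for (s, n, mu), t in zip(block_states, block_thresholds)
        )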

It is important to highlight the differences between block data and local probability constraints. Local probability constraints are classically given by probability maps, which give, at any point of the simulation domain, the probability of having a given category (facies) in a certain neighborhood. These maps are related to an underlying fixed support size and represent moving local probabilities or proportions. Such constraints are designed for categorical variables and are handled in classical techniques (Liu, 2006, Krishnan, 2008) by using probability aggregation methods (Allard et al., 2012). Mariethoz et al. (2015) proposed a method to constrain distance-based MPS methods, such as the direct sampling algorithm, to such local probability constraints. On the other hand, although block data for binary variables can be viewed as proportion constraints, block data are defined on fixed groups of nodes of any shape, on which target mean values are given. Moreover, for block data conditioning to make sense, the simulated variable can be either continuous or discrete, provided that it does not represent arbitrary category codes.

Downscaling, or increasing the resolution of an image, is a classical example where the support of the data (the input coarse-scale image) is larger than the resolution of the output image. Mariethoz et al. (2011) developed a super-resolution method which consists in using the direct sampling algorithm to generate small-scale structures from the patterns found in the coarse-scale (training) image. This tool assumes that the spatial structures have a property of scale invariance, also called the fractal property. Tang et al. (2015) proposed to downscale remotely sensed images accounting for available images at different resolutions, using filtersim as the MPS algorithm at the coarse scale, combined with an area-to-point cokriging method to integrate the fine-scale information.

More generally, the algorithm presented in this paper for MPS simulation accounting for block data can be used in a range of applications, not restricted to downscaling, where a conceptual model for the fine scale is known and the available conditioning data are defined on supports of varying size and shape that are larger than the resolution of the simulation domain. In its current implementation, the proposed method assumes that the block data are defined as the arithmetic mean of the values located within a block of known geometry. In practice, in certain situations, the block values are defined in a more complex manner, especially for non-additive variables such as permeability. Nevertheless, our method can still be useful to generate simulations that offer an approximation of the fine-scale structures.
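Read as an additive constraint, the block datum thus fixes the arithmetic mean of the simulated variable over the block. In LaTeX notation (our paraphrase of this definition, not an equation quoted from the paper):

    \mu_B \approx \frac{1}{|B|} \sum_{x \in B} Z(x),

where |B| denotes the number of nodes in the block B.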

This paper is organized as follows. In Section 2, some background information on the direct sampling algorithm is given. Then, in Section 3, block data are defined and we present how the algorithm is extended to account for them. The manner in which the block data constraints are treated is presented in detail in Section 4. In Section 5, a multiGaussian test case is presented. MultiGaussian simulation offers a well-known framework in which the random fields are entirely described by an analytical covariance model used for the simulation. Hence, the expected results are known and provide a point of comparison for studying the performance of the proposed method. Then, application examples displaying more complex structures and justifying the use of MPS are given in Section 6. As illustrations, we apply the proposed method to simulate log-permeability fields in a downscaling context and in a situation where conditioning data are available at three different scales. In another synthetic example, we propose to model the subsurface geology conditionally to geophysical data. Finally, the method is discussed in Section 7.


Basic direct sampling algorithm

The basic principle of the direct sampling technique (Mariethoz et al., 2010) for simulating a node x in the simulation grid is to compare the pattern (or data event) d(x) centered at x with patterns of the same geometry d(y) centered at random locations y in the training image (TI), until d(x) and d(y) are sufficiently similar. For univariate simulation, the direct sampling algorithm requires as input a TI with a unique variable Z, a simulation grid (SG), and a normalized distance D used for comparing
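To illustrate this scanning principle, here is a minimal Python sketch of one direct-sampling step for a continuous variable. It is a simplified reading of the algorithm described above; the fallback to the best scanned candidate, the normalization by the TI value range, and all parameter names are assumptions made for illustration.

    import numpy as np

    def simulate_node(ti, sim, x, offsets, threshold, max_scan, rng):
        # One direct-sampling step: simulate node x of the simulation grid
        # `sim` by scanning the training image `ti` at random locations.
        # offsets: (k, 2) integer array of lags to the informed neighbors of x.
        d_x = np.array([sim[x[0] + di, x[1] + dj] for di, dj in offsets])

        # Normalize the distance by the range of the TI values
        z_range = ti.max() - ti.min()

        # Bounds so that every offset stays inside the TI
        lo_i = max(0, -offsets[:, 0].min())
        hi_i = ti.shape[0] - max(offsets[:, 0].max(), 0)
        lo_j = max(0, -offsets[:, 1].min())
        hi_j = ti.shape[1] - max(offsets[:, 1].max(), 0)

        best_y, best_d = None, np.inf
        for _ in range(max_scan):
            # Random candidate location y in the TI
            y = (rng.integers(lo_i, hi_i), rng.integers(lo_j, hi_j))
            d_y = np.array([ti[y[0] + di, y[1] + dj] for di, dj in offsets])
            dist = np.mean(np.abs(d_x - d_y)) / z_range  # normalized distance
            if dist <= threshold:
                return ti[y]          # compatible pattern: copy central value
            if dist < best_d:
                best_y, best_d = y, dist
        return ti[best_y]             # fallback: best candidate after max_scan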

Algorithm accounting for block data

Although initially designed for continuous variables, block data can be considered for categorical variables as well. In this situation, a unique numerical value is assigned to each category. These values should have a physical significance, so that they can be meaningfully ordered. Under these conditions, the distance (3) can still be used for comparing patterns.

Considering a variable Z in the SG, a block datum is defined by a triplet (B,μB,tB) (a possible representation is sketched after this list), where:

  • B is a set of nodes (pixels) in the SG,

  • μB is a target mean value for Z over B
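A possible in-memory representation of such a triplet, assuming (as the per-constraint thresholds discussed in the introduction suggest) that tB is a tolerance attached to the target mean μB, could look as follows. This is our own sketch, not the paper's data structure.

    from dataclasses import dataclass
    import numpy as np

    @dataclass
    class BlockData:
        # One block-data constraint (B, mu_B, t_B); t_B is read here as a
        # tolerance on the block mean, which is an assumption of this sketch.
        nodes: np.ndarray    # (n, 2) array of (i, j) pixel indices: the set B
        target_mean: float   # mu_B: target mean of Z over B
        tolerance: float     # t_B: acceptable deviation from mu_B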

Defining the target interval and the error for a block data constraint

In this section, the error E_B (Eq. (8)) required by the method is defined in detail.

MultiGaussian test case

In this section, we use our method to generate multiGaussian fields conditioned to block data. The aim is to better understand how the method performs in a situation where the theoretical solution is known, because it can be expressed analytically. As a point of comparison, we will also use multiGaussian simulations based on all the point-to-point, point-to-block, and block-to-block covariances. The MPS technique is expected to work less efficiently in this situation because it does not use
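For the covariance-based comparison method, the point-to-block and block-to-block covariances of an additive block average follow from regularizing the point covariance over the block supports. A small sketch, with an illustrative Gaussian covariance model (the function names and parameters are our own):

    import numpy as np

    def gaussian_cov(x, y, sill=1.0, range_=10.0):
        # Illustrative Gaussian (squared-exponential) covariance model
        h = np.linalg.norm(np.asarray(x, float) - np.asarray(y, float))
        return sill * np.exp(-3.0 * (h / range_) ** 2)

    def cov_point_block(c, x, block):
        # Cov(Z(x), Z_B) = (1/|B|) * sum over x' in B of C(x, x')
        return np.mean([c(x, xp) for xp in block])

    def cov_block_block(c, block_a, block_b):
        # Cov(Z_A, Z_B): average of point covariances over both supports
        return np.mean([[c(xa, xb) for xb in block_b] for xa in block_a])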

Application examples

In this section, the method is applied to synthetic cases, and the computation of the local standard deviation (Eq. (18)) required to handle the error on a block data constraint is illustrated.

Discussion and conclusion

In this paper, an MPS simulation method that accounts for data at different scales is proposed. The technique is based on the direct sampling algorithm (Mariethoz et al., 2010), which consists in randomly searching the TI for a pattern compatible with the one centered at the simulated node. Several constraints, in addition to the similarity of patterns, can be handled. When scanning the TI, a misfit for each constraint is computed and the scan is stopped as soon as all misfits are

Acknowledgments

We are grateful to Fabio Oriani for helpful discussions, and we also thank Denis Allard and an anonymous reviewer for their constructive criticism, which gave us the opportunity to improve the quality of this paper.

References (31)

  • P. Bayer et al., High resolution multi-facies realizations of sedimentary reservoir and aquifer analogs, Sci. Data (2015)
  • S.F. Carle, A transition probability-based approach to geostatistical characterization of hydrostratigraphic architecture (1996)
  • S. Demir et al., On the adaptive Nadaraya–Watson kernel regression estimators, Hacet. J. Math. Stat. (2010)
  • C. Deutsch et al., GSLIB: Geostatistical Software Library and User’s Guide (1998)
  • C.R. Dietrich et al., A fast and exact method for multidimensional Gaussian stochastic simulations: Extension to realizations conditioned on direct and indirect measurements, Water Resour. Res. (1996)