Simulation-Based Inference of Bayesian Hierarchical Models While Checking for Model Misspecification^†

by

Florent Leclercq

CNRS & Sorbonne Université, UMR 7095, Institut d’Astrophysique de Paris, 98 bis boulevard Arago, F-75014 Paris, France

^† Presented at the 41st International Workshop on Bayesian Inference and Maximum Entropy Methods in Science and Engineering, Paris, France, 18–22 July 2022.

Phys. Sci. Forum 2022, 5(1), 4; https://doi.org/10.3390/psf2022005004

Published: 2 November 2022

(This article belongs to the Proceedings of The 41st International Workshop on Bayesian Inference and Maximum Entropy Methods in Science and Engineering)

Abstract

This paper presents recent methodological advances for performing simulation-based inference (SBI) of a general class of Bayesian hierarchical models (BHMs) while checking for model misspecification. Our approach is based on a two-step framework. First, the latent function that appears as a second layer of the BHM is inferred and used to diagnose possible model misspecification. Second, target parameters of the trusted model are inferred via SBI. Simulations used in the first step are recycled for score compression, which is necessary for the second step. As a proof of concept, we apply our framework to a prey–predator model built upon the Lotka–Volterra equations and involving complex observational processes.

Keywords:

Bayesian inference; Bayesian hierarchical models; simulation-based inference

1. Introduction

Model misspecification is a long-standing problem for Bayesian inference: when the model differs from the actual data-generating process, posteriors tend to be biased and/or overly concentrated. In this paper, we are interested in the problem of model misspecification for a particular, but common, class of Bayesian hierarchical models (BHMs): those that involve a latent function, such as the primordial power spectrum in cosmology (e.g., [1]) or the population model in genetics (e.g., [2]).

Simulation-based inference (SBI) only provides the posterior of top-level target parameters and marginalizes over all other latent variables of the BHM. Alone, it is therefore unable to diagnose whether the model is misspecified. Key insights regarding the issue of model misspecification can usually be obtained from the posterior distribution of the latent function, as there often exists an independent theoretical understanding of its values. An approximate posterior for the latent function (a much higher-dimensional quantity than the target vector of parameters) can be obtained using SELFI (simulator expansion for likelihood-free inference, [1]), an approach based on the likelihood of an alternative parametric model, constructed by linearizing model predictions around an expansion point.

This paper presents a framework that combines SELFI and SBI while recycling the necessary simulations. The simulator is first linearized to obtain the SELFI posterior of the latent function. Next, the same simulations are used for data compression to the score function (the gradient of the log-likelihood with respect to the parameters), and the final SBI posterior of target parameters is obtained.

2. Method

2.1. Bayesian Hierarchical Models with a Latent Function

In this paper, we assume a given BHM consisting of the following variables:

ω \in R^{N}

(vector of N target parameters),

θ \in R^{S}

(vector containing the values of the latent function

θ

at S support points),

Φ \in R^{P}

(data vector of P components), and

\tilde{ω} \in R^{N}

(compressed data vector of size N). We typically expect

N \sim O (5 - 10)

target parameters,

S \sim O (10^{2} - 10^{3})

support points; P can be any number and as large as

O (10^{7})

for complex data models. We further assume that

ω

and

θ

are linked by a deterministic function T, usually theoretically well-understood and numerically cheap. Therefore, the expensive and potentially misspecified part of the BHM is the probabilistic simulator linking the latent function

θ

to the data

Φ

,

P (Φ | θ)

. The deterministic compression step C linking

Φ

to

\tilde{ω}

is discussed in Section 2.4.

2.2. Latent Function Inference with SELFI

The first part of the framework proposed in this paper is to infer the latent function

θ

conditional on observed data

Φ_{O}

. This is an inference problem in high dimension (S, the number of support points for the latent function

θ

), which means that usual SBI frameworks, allowing a general exploration of parameter space, will fail and that stronger assumptions are required. SELFI [1] relies upon the simplification of the inference problem around an expansion point

θ_{0}

.

The first assumption is a Taylor expansion (linearization) of the mean data model around

θ_{0}

. Namely, if

{\hat{Φ}}_{θ} \equiv E [Φ_{θ}]

is the expectation value of

Φ_{θ}

, where

Φ_{θ}

are simulations of

Φ

given

θ

(i.e.,

Φ_{θ} ⤺ P (Φ | θ)

), we assume that

{\hat{Φ}}_{θ} \approx f_{0} + \nabla f_{0} \cdot (θ - θ_{0}) \equiv f (θ),

(1)

where

f_{0} \equiv {\hat{Φ}}_{θ_{0}}

is the mean data model at the expansion point

θ_{0}

, and

\nabla f_{0}

is the gradient of

f_{0}

at the expansion point (for simplification, we note

\nabla f_{0} = \nabla_{θ} f_{0}

, where the gradient is taken with respect to

θ

). The second assumption is that the (true) implicit likelihood of the problem is replaced by a Gaussian effective likelihood:

P (Φ_{O} | θ) \equiv exp [{\hat{ℓ}}_{θ} (θ)]

with

- 2 {\hat{ℓ}}_{θ} (θ) \approx log |2 π C_{0}| + {[Φ_{O} - f (θ)]}^{⊺} C_{0}^{- 1} [Φ_{O} - f (θ)],

(2)

where

C_{0}

is the data covariance matrix at the expansion point

θ_{0}

.

The SELFI framework is fully characterized by

f_{0}

,

C_{0}

, and

\nabla f_{0}

, which, if unknown, can be evaluated through forward simulations only. The numerical computation requires

N_{0}

simulations at the expansion point (to evaluate the empirical mean

f_{0}

and empirical covariance matrix

C_{0}

), and

N_{s}

simulations in each direction of parameter space (to evaluate the empirical gradient

\nabla f_{0}

via first-order forward finite differences). The total is

N_{0} + N_{s} \times S

simulations;

N_{0}

and

N_{s}

should be of the order of the dimensionality of the data space P, giving a total cost of

O (≳ P (S + 1))

model evaluations.
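The simulation budget described above can be illustrated with a minimal sketch. It assumes a hypothetical black-box function `simulate(theta, rng)` that returns one realization of the data vector; the function name, the step size `h`, and the toy linear simulator are illustrative assumptions, not part of the pyselfi API.

```python
import numpy as np

def estimate_selfi_ingredients(simulate, theta0, n0, ns, h, rng):
    """Estimate f0, C0 and grad f0 from forward simulations only.

    `simulate(theta, rng)` is a hypothetical black-box simulator returning
    one data vector of length P; `h` is an assumed finite-difference step.
    """
    theta0 = np.asarray(theta0, dtype=float)
    # N0 simulations at the expansion point: empirical mean and covariance.
    sims0 = np.array([simulate(theta0, rng) for _ in range(n0)])
    f0 = sims0.mean(axis=0)
    C0 = np.atleast_2d(np.cov(sims0, rowvar=False))
    # Ns simulations per direction: first-order forward finite differences.
    grad_f0 = np.empty((f0.size, theta0.size))
    for s in range(theta0.size):
        theta_s = theta0.copy()
        theta_s[s] += h
        f_s = np.mean([simulate(theta_s, rng) for _ in range(ns)], axis=0)
        grad_f0[:, s] = (f_s - f0) / h
    n_sims = n0 + ns * theta0.size  # total cost N0 + Ns x S
    return f0, C0, grad_f0, n_sims

# Toy usage: a linear simulator with small Gaussian noise (illustrative only).
A = np.array([[1.0, 0.5], [0.0, 2.0], [1.0, -1.0]])

def toy_simulate(theta, rng):
    return A @ theta + 0.01 * rng.standard_normal(3)

rng = np.random.default_rng(0)
f0, C0, grad_f0, n_sims = estimate_selfi_ingredients(
    toy_simulate, [1.0, 2.0], n0=200, ns=200, h=0.1, rng=rng
)
```

For the toy linear simulator, the estimated gradient should approach the matrix `A` as the number of simulations grows. All simulations are independent, so both loops can be parallelized.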

To fully characterize the Bayesian problem, one requires a prior on

θ

,

P (θ)

. Any prior can be used if one is ready to use numerical techniques to explore the posterior (such as standard Markov Chain Monte Carlo), using the linearized data model and Gaussian effective likelihood. However, a remarkable analytic result with SELFI is that, if the prior is Gaussian with a mean equal to the expansion point

θ_{0}

, i.e.,

- 2 log P (θ) \equiv log |2 π S| + {(θ - θ_{0})}^{⊺} S^{- 1} (θ - θ_{0}),

(3)

then the effective posterior is also Gaussian:

- 2 log P (θ | Φ_{O}) \approx log |2 π Γ| + {(θ - γ)}^{⊺} Γ^{- 1} (θ - γ) .

(4)

The posterior mean and covariance matrix are given by

\begin{matrix} γ & \equiv & θ_{0} + Γ {(\nabla f_{0})}^{⊺} C_{0}^{- 1} (Φ_{O} - f_{0}), \end{matrix}

(5)

\begin{matrix} Γ & \equiv & {[{(\nabla f_{0})}^{⊺} C_{0}^{- 1} \nabla f_{0} + S^{- 1}]}^{- 1} \end{matrix}

(6)

(see Appendix B of [1] for a derivation). They are fully characterized by the expansion variables

θ_{0}

,

f_{0}

,

C_{0}

, and

\nabla f_{0}

, as well as the prior covariance matrix

S

.
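As an illustration, Equations (5) and (6) translate directly into a few lines of linear algebra. The sketch below is a minimal NumPy version; the function name and the variable `S_prior` (used for the prior covariance matrix, to avoid confusion with the number of support points S) are assumptions made for this example.

```python
import numpy as np

def selfi_posterior(theta0, f0, C0, grad_f0, S_prior, phi_obs):
    """Gaussian effective posterior of the latent function, Equations (5)-(6)."""
    # C0^{-1} grad_f0, computed with a linear solve rather than an explicit inverse.
    C0_inv_grad = np.linalg.solve(C0, grad_f0)
    # Equation (6): posterior covariance matrix.
    Gamma = np.linalg.inv(grad_f0.T @ C0_inv_grad + np.linalg.inv(S_prior))
    # Equation (5): posterior mean.
    gamma = theta0 + Gamma @ grad_f0.T @ np.linalg.solve(C0, phi_obs - f0)
    return gamma, Gamma

# Toy usage: identity gradient, unit data covariance, unit prior covariance.
theta0 = np.zeros(2)
f0 = np.zeros(2)
gamma, Gamma = selfi_posterior(
    theta0, f0, np.eye(2), np.eye(2), np.eye(2), np.array([2.0, 4.0])
)
```

In this toy case the data and the prior carry equal weight, so the posterior mean lies halfway between the expansion point and the observation, and the posterior covariance is half the prior covariance.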

2.3. Check for Model Misspecification

The SELFI posterior can be used as a check for model misspecification. Visually checking the reconstructed

γ

and

Γ

can yield interesting insights, especially if the latent function has some properties (such as an expected shape, periodicity, etc.) to which the data model may be sensitive if misspecified (see Section 4.2).

If a quantitative check for model misspecification is desired, we propose using the Mahalanobis distance between the reconstruction

γ

and the prior distribution

P (θ)

, defined formally by

d_{M} (θ, θ_{0} | S) \equiv \sqrt{{(θ - θ_{0})}^{⊺} S^{- 1} (θ - θ_{0})} .

(7)

The value of

d_{M} (γ, θ_{0} | S)

for the SELFI posterior mean

γ

can be compared to an ensemble of values of

d_{M} (θ_{ω}, θ_{0} | S)

for simulated latent functions

θ_{ω} = T (ω)

, where samples

ω

are drawn from the prior

P (ω)

.
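The quantitative check can be sketched as follows. The helper names, the identity map used for T, and the toy numbers are hypothetical; only the distance itself follows Equation (7).

```python
import numpy as np

def mahalanobis(theta, theta0, S_prior):
    """Mahalanobis distance of Equation (7) between theta and the prior N(theta0, S_prior)."""
    d = np.asarray(theta, dtype=float) - np.asarray(theta0, dtype=float)
    return float(np.sqrt(d @ np.linalg.solve(S_prior, d)))

def reference_distances(T, omega_samples, theta0, S_prior):
    """Distances for simulated latent functions theta_omega = T(omega), omega drawn from the prior."""
    return np.array([mahalanobis(T(w), theta0, S_prior) for w in omega_samples])

# Toy usage: T is the identity map and the prior on omega is a unit Gaussian.
rng = np.random.default_rng(1)
theta0 = np.zeros(2)
S_prior = np.eye(2)
d_gamma = mahalanobis([3.0, 4.0], theta0, S_prior)
d_ref = reference_distances(lambda w: w, rng.standard_normal((500, 2)), theta0, S_prior)
# A value of d_gamma far in the tail of d_ref would point to a suspicious reconstruction.
tail_fraction = float(np.mean(d_ref >= d_gamma))
```

How the comparison is turned into a decision (for example, a tail fraction or a comparison with the ensemble mean) is a modelling choice left to the analyst.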

2.4. Score Compression and Simulation-Based Inference

Having checked the BHM for model misspecification, we now address the second part of the framework, aimed at inferring top-level parameters

ω

given observations. SBI is known to be difficult when the dimensionality of the data space P is high. For this reason, data compression is usually necessary. Data compression can be thought of as an additional layer at the bottom of the BHM, made of a deterministic function C acting on

Φ

. In practical scenarios, data compression shall preserve as much information about

ω

as possible, meaning that compressed summaries

C (Φ)

shall be as close as possible to sufficient summary statistics of

Φ

, i.e.,

P (ω | C (Φ)) = P (ω | Φ)

.

Here, we propose to use score compression [3]. We make the assumption (for compression only, not for later inference) that

P (Φ | ω)

is Gaussian distributed:

P (Φ_{O} | ω) \equiv exp [{\hat{ℓ}}_{ω} (ω)]

where

{\hat{ℓ}}_{ω} (ω) = {\hat{ℓ}}_{θ} (T (ω))

(see Equation (2)). The score function

\nabla_{ω} {\hat{ℓ}}_{ω 0}

is the gradient of this log-likelihood with respect to the parameters

ω

at a fiducial point

ω_{0}

in parameter space. Using as fiducial point the values that generate the SELFI expansion point (i.e.,

ω_{0}

such that

θ_{0} = T (ω_{0})

), a quasi maximum-likelihood estimator for the parameters is

{\tilde{ω}}_{O} \equiv ω_{0} + F_{0}^{- 1} \nabla_{ω} {\hat{ℓ}}_{ω 0}

, where the Fisher matrix

F_{0}

and the gradient of the log-likelihood are evaluated at

ω_{0}

. Compression of

Φ_{O}

to

{\tilde{ω}}_{O}

yields N compressed statistics that are optimal in the sense that they preserve the Fisher information content of the data [3].

In our case, the covariance matrix

C_{0}

is assumed not to depend on parameters (

\nabla_{ω} C_{0} = 0

), and the expression for

C (Φ)

is therefore

C (Φ) = \tilde{ω} \equiv ω_{0} + F_{0}^{- 1} [{(\nabla_{ω} f_{0})}^{⊺} C_{0}^{- 1} (Φ - f_{0})] .

(8)

The Fisher matrix of the problem further takes a simple form:

F_{0} \equiv - E [\nabla_{ω} \nabla_{ω} {\hat{ℓ}}_{ω 0} (ω)] = {(\nabla_{ω} f_{0})}^{⊺} C_{0}^{- 1} \nabla_{ω} f_{0} .

(9)

We therefore need to evaluate

\nabla_{ω} f_{0} = \nabla f_{0} \cdot {\frac{\partial T (ω)}{\partial ω}|}_{ω = ω_{0}} .

(10)

Importantly, in Equations (8)–(10),

C_{0}

and

\nabla f_{0}

have already been computed for latent function inference with SELFI. The only missing quantity is the second matrix in the right-hand side of Equation (10), that is,

\nabla_{ω} T_{0}

, the gradient of T evaluated at

ω_{0}

. If unknown, its computation (e.g., via finite differences) does not require any more simulation of

Φ

. It is usually easy, as there are only N directions in parameter space and T is the numerically cheap part of the BHM. We note that, because we have to calculate

F_{0}

, we can easily get the Fisher–Rao distance between any simulated summaries

\tilde{ω}

and the observed summaries

{\tilde{ω}}_{O}

,

d_{FR} (\tilde{ω}, {\tilde{ω}}_{O}) \equiv \sqrt{{(\tilde{ω} - {\tilde{ω}}_{O})}^{⊺} F_{0} (\tilde{ω} - {\tilde{ω}}_{O})},

(11)

which can be used by any non-parametric SBI method.
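Equations (8)–(11) can be sketched in NumPy as below. The function names and the random toy matrices are assumptions for illustration; the sketch relies on the same hypothesis as the text, namely that the covariance matrix does not depend on the parameters.

```python
import numpy as np

def score_compress(phi, omega0, f0, C0, grad_f0, grad_omega_T0):
    """Score compression, Equations (8)-(10). Returns the summaries and the Fisher matrix."""
    # Equation (10): chain rule, shapes (P, S) @ (S, N) -> (P, N).
    grad_omega_f0 = grad_f0 @ grad_omega_T0
    # Equation (9): Fisher matrix at the fiducial point.
    F0 = grad_omega_f0.T @ np.linalg.solve(C0, grad_omega_f0)
    # Equation (8): quasi maximum-likelihood estimator.
    score = grad_omega_f0.T @ np.linalg.solve(C0, phi - f0)
    return omega0 + np.linalg.solve(F0, score), F0

def fisher_rao_distance(omega_tilde, omega_tilde_obs, F0):
    """Fisher-Rao distance of Equation (11)."""
    d = np.asarray(omega_tilde) - np.asarray(omega_tilde_obs)
    return float(np.sqrt(d @ F0 @ d))

# Toy usage: noiseless data generated by an exactly linear model (P=5, S=3, N=2).
rng = np.random.default_rng(2)
grad_f0 = rng.normal(size=(5, 3))
grad_omega_T0 = rng.normal(size=(3, 2))
C0 = np.diag([1.0, 2.0, 3.0, 4.0, 5.0])
f0 = rng.normal(size=5)
omega0 = np.array([0.5, 0.25])
omega_true = np.array([0.6, 0.2])
phi = f0 + grad_f0 @ grad_omega_T0 @ (omega_true - omega0)
omega_tilde, F0 = score_compress(phi, omega0, f0, C0, grad_f0, grad_omega_T0)
```

For noiseless data from an exactly linear model, the compressed summaries recover the generating parameters, which is a convenient sanity check of an implementation.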

We specify a prior

P (ω)

(typically peaking at or centered on

ω_{0}

, for consistency with the assumptions made for data compression). Having defined C, we now have a full BHM that maps

ω

(of dimension N) to compressed summaries

\tilde{ω}

(of size N) and has been checked for model misspecification for the part linking

θ

to

Φ

. We can then proceed with SBI via usual techniques. These can include likelihood-free rejection sampling, but also more sophisticated techniques such as DELFI (e.g., [4,5]) or BOLFI (e.g., [6,7,8]).

3. Lotka–Volterra BHM

3.1. Lotka–Volterra Solver

The Lotka–Volterra equations describe the dynamics of an ecological system in which two species interact, as a pair of first-order non-linear differential equations:

\begin{matrix} \frac{d x}{d t} & = & α x - β x y, \end{matrix}

(12)

\begin{matrix} \frac{d y}{d t} & = & δ x y - γ y . \end{matrix}

(13)

where

x (t)

is the number of prey at time t, and

y (t)

is the number of predators at time t. The model is characterized by

ω = (α, β, γ, δ)

, a vector of four real parameters describing the interaction of the two species.

The initial conditions of the problem

\{x (0), y (0)\} = \{x_{0}, y_{0}\}

are assumed to be exactly known. Throughout the paper, timestepping and number of timesteps are fixed:

t_{i} = i Δ t

for

i \in 〚 0, S / 2 〛

.

The expression T is an algorithm that numerically solves the ordinary differential equations. For simplicity, we choose an explicit Euler method: for all

i \in 〚 0, S / 2 - 1 〛

,

\begin{matrix} x (t_{i + 1}) & = & x (t_{i}) \times \{1 + [α - β y (t_{i})] Δ t\}, \end{matrix}

(14)

\begin{matrix} y (t_{i + 1}) & = & y (t_{i}) \times \{1 + [δ x (t_{i}) - γ] Δ t\} . \end{matrix}

(15)

The latent function

θ (t)

is a concatenation of

x (t)

and

y (t)

evaluated at the timesteps of the problem. The corresponding vector is

θ \equiv \{{\{x (t_{i})\}}_{0 \leq i < S / 2}, {\{y (t_{i})\}}_{0 \leq i < S / 2}\}

of size S.
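A minimal sketch of the solver T follows, using the standard explicit Euler discretization of Equations (12) and (13), that is, x(t_{i+1}) = x(t_i) + Δt (α x − β x y) and similarly for y. The timestep `dt = 0.1` and the number of timesteps `n_steps = 50` are assumptions made for this example; the initial conditions are the values quoted later in the paper.

```python
import numpy as np

def lotka_volterra_euler(omega, x0=10.0, y0=5.0, n_steps=50, dt=0.1):
    """Explicit Euler solver T(omega) returning theta = (x(t_i), y(t_i)) of size 2 * n_steps.

    `dt` and `n_steps` are assumed values, chosen for illustration only.
    """
    alpha, beta, gamma, delta = omega
    x = np.empty(n_steps)
    y = np.empty(n_steps)
    x[0], y[0] = x0, y0
    for i in range(n_steps - 1):
        # Both updates use the populations at time t_i (explicit scheme).
        x[i + 1] = x[i] * (1.0 + (alpha - beta * y[i]) * dt)
        y[i + 1] = y[i] * (1.0 + (delta * x[i] - gamma) * dt)
    # The latent function is the concatenation of prey and predator populations.
    return np.concatenate([x, y])

# Usage with the ground truth parameters quoted in the Results section.
theta_gt = lotka_volterra_euler((0.55, 0.2, 0.2, 0.05))
```

Explicit Euler is only first-order accurate, so the timestep should be small compared with the oscillation period of the system.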

3.2. Lotka–Volterra Observer

3.2.1. Full Data Model

To go from

θ

to

Φ

, we assume a complex, probabilistic observational process of prey and predator populations, later referred to as “model A” and defined as follows.

Signal. The (unobserved) signal

s_{z}

is a delayed and non-linearly perturbed observation of the true population function for species

z \in \{x, y\}

, modulated by some seasonal efficiency

e_{z} (t)

. Formally,

s_{x} (0) = x_{0}

,

s_{y} (0) = y_{0}

, and for

i \in 〚 0, S / 2 - 1 〛

,

\begin{matrix} s_{x} (t_{i + 1}) & = & e_{x} (t_{i}) [x (t_{i}) - p x (t_{i}) y (t_{i}) + q x {(t_{i})}^{2}], \end{matrix}

(16)

\begin{matrix} s_{y} (t_{i + 1}) & = & e_{y} (t_{i}) [y (t_{i}) + p x (t_{i}) y (t_{i}) - q y {(t_{i})}^{2}] . \end{matrix}

(17)

These equations involve two parameters: p accounts for hunts between

t_{i}

and

t_{i + 1}

(temporarily making prey more likely to hide and predators more likely to be visible), and q accounts for the gregariousness of prey and independence of predators. The free functions

e_{x} (t)

and

e_{y} (t)

, valued in

[0, 1]

, describe how prey and predators are likely to be detectable at any time, accounting, for example, for seasonal variation (hibernation, migration).

Noise. The signal

s_{z}

is subject to additive noise, giving a noisy signal

u_{z} (t) = s_{z} (t) + n_{z}^{D} (t) + n_{z}^{O} (t)

, where the noise has two components:

Demographic Gaussian noise with zero mean and variance proportional to the true underlying population, i.e., $n_{x}^{D} (t) ⤺ G [0, r x (t)]$ and $n_{y}^{D} (t) ⤺ G [0, r y (t)]$ . The parameter r gives the strength of demographic noise.
Observational Gaussian noise that accounts for observer efficiency, coupling prey and predators such that

$(\begin{matrix} n_{x}^{O} (t) \\ n_{y}^{O} (t) \end{matrix}) ⤺ G [(\begin{matrix} 0 \\ 0 \end{matrix}), s (\begin{matrix} y (t) & t \sqrt{x (t) y (t)} \\ t \sqrt{x (t) y (t)} & x (t) \end{matrix})] .$

(18)

The parameter s gives the overall amplitude of observational noise, and the parameter t controls the strength of the non-diagonal component (it should be chosen such that the covariance matrix appearing in Equation (18) is positive semi-definite).

Censoring. Finally, observed data are a censored and thresholded version of the noisy signal: for each timestep

t_{i}

,

Φ_{z} (t_{i}) = m_{z} (t_{i}) \times min [u_{z} (t_{i}), M_{z}]

, where

M_{z}

is the maximum number of prey or predators that can be detected by the observer, and

m_{z}

is a mask (taking either the value 0 or 1). Masked data points are discarded. The data vector is

Φ = \{\{Φ_{x} (t_{i})\}, \{Φ_{y} (t_{i})\}\}

. It contains

P \leq S

elements depending on the number of masked timesteps for each species z (formally,

P = \sum_{i = 0}^{S / 2 - 1} (δ_{K}^{m_{x} (t_{i}), 1} + δ_{K}^{m_{y} (t_{i}), 1})

, where

δ_{K}

is a Kronecker delta symbol).

All of the free parameters (p, q, r, s, t,

M_{x}

,

M_{y}

) and free functions (

e_{x} (t)

,

e_{y} (t)

,

m_{x} (t)

,

m_{y} (t)

) appearing in the Lotka–Volterra observer data model described in this section are assumed known and fixed throughout the paper. Parameters used are

x_{0} = 10

,

y_{0} = 5

,

p = 0.05

,

q = 0.01

,

r = 0.15

,

s = 0.05

,

t = 0.2

.
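The three stages of model A (signal, noise, censoring) can be sketched as a single sampling function. The paper does not quote the efficiencies, masks, or thresholds in this section, so the defaults below (full efficiency, no mask, no threshold) are placeholders, and the function name is hypothetical. The sketch assumes strictly positive populations.

```python
import numpy as np

def observe_model_A(theta, rng, p=0.05, q=0.01, r=0.15, s=0.05, t=0.2,
                    M_x=np.inf, M_y=np.inf, e_x=None, e_y=None, m_x=None, m_y=None):
    """Draw one data vector Phi given the latent function theta (a sketch of model A)."""
    n = theta.size // 2
    x, y = theta[:n], theta[n:]
    # Placeholder defaults: full efficiency and no masked timestep.
    e_x = np.ones(n) if e_x is None else np.asarray(e_x, dtype=float)
    e_y = np.ones(n) if e_y is None else np.asarray(e_y, dtype=float)
    m_x = np.ones(n, dtype=bool) if m_x is None else np.asarray(m_x).astype(bool)
    m_y = np.ones(n, dtype=bool) if m_y is None else np.asarray(m_y).astype(bool)
    # Signal, Equations (16)-(17): delayed by one timestep.
    s_x = np.empty(n)
    s_y = np.empty(n)
    s_x[0], s_y[0] = x[0], y[0]
    s_x[1:] = e_x[:-1] * (x[:-1] - p * x[:-1] * y[:-1] + q * x[:-1] ** 2)
    s_y[1:] = e_y[:-1] * (y[:-1] + p * x[:-1] * y[:-1] - q * y[:-1] ** 2)
    # Demographic noise: zero mean, variance r times the true population.
    n_x_D = rng.normal(0.0, np.sqrt(r * x))
    n_y_D = rng.normal(0.0, np.sqrt(r * y))
    # Observational noise, Equation (18): correlated between prey and predators.
    n_O = np.empty((n, 2))
    for i in range(n):
        off = t * np.sqrt(x[i] * y[i])
        cov = s * np.array([[y[i], off], [off, x[i]]])
        n_O[i] = rng.multivariate_normal(np.zeros(2), cov)
    u_x = s_x + n_x_D + n_O[:, 0]
    u_y = s_y + n_y_D + n_O[:, 1]
    # Censoring: threshold at M_z, then discard masked data points.
    return np.concatenate([np.minimum(u_x, M_x)[m_x], np.minimum(u_y, M_y)[m_y]])

# Toy usage with constant populations (illustrative only).
rng = np.random.default_rng(3)
theta = np.concatenate([np.full(50, 10.0), np.full(50, 5.0)])
phi_full = observe_model_A(theta, rng)
phi_masked = observe_model_A(theta, rng, m_x=np.arange(50) % 2 == 0, M_x=8.0)
```

With |t| ≤ 1 the covariance matrix of Equation (18) is positive semi-definite for positive populations, which is what `multivariate_normal` requires.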

3.2.2. Simplified Data Model

In this section, we introduce “model B”, a simplified (misspecified) data model linking

θ

to

Φ

. Model B assumes that underlying functions are directly observed, i.e.,

s_{z} (t) = z (t)

. It omits observational noise, such that

u_{z} (t) = s_{z} (t) + n_{z}^{D} (t)

. In model B, parameters p, q, s, and t are not involved, and the value of r (strength of demographic noise) can be incorrect (we used

r = 0.105

). Finally, model B fails to account for the thresholds:

Φ_{z} (t) = m_{z} (t) u_{z} (t)

.

4. Results

In this section, we apply the two-step inference method described in Section 2 to the Lotka–Volterra BHM introduced in Section 3. We generate mock data

Φ_{O}

from model A, using ground truth parameters

ω_{gt} = (α_{gt}, β_{gt}, γ_{gt}, δ_{gt}) = (0.55, 0.2, 0.2, 0.05)

. We assume that ground truth parameters are known a priori with a precision of approximately

3 %

. Consistently, we choose a Gaussian prior

P (ω)

with mean

ω_{0} = (0.5768, 0.1963, 0.1968, 0.0484)

and diagonal covariance matrix

diag (0.0173^{2}, 0.0059^{2}, 0.0059^{2}, 0.0015^{2})

.

4.1. Inference of Population Functions with SELFI

We first seek to reconstruct the latent population functions

x (t)

and

y (t)

, conditional on the data

Φ_{O}

, using SELFI. We choose as an expansion point the population functions simulated from the mean of the prior on

ω

, i.e.,

θ_{0} = T (ω_{0})

. We use

N_{0} = 150

and

N_{s} = 100

; the computational workload is therefore a fixed number of

10,150

simulations for each model. It is known a priori and perfectly parallel.
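The number of support points is not restated here, but it can be recovered from the simulation budget of Section 2.2. The short worked equation below makes the arithmetic explicit; the resulting value of S is an inference from the quoted numbers, not a figure given in the text.

```latex
N_{0} + N_{s} \times S = 150 + 100 \times S = 10{,}150
\quad \Longrightarrow \quad
S = \frac{10{,}150 - 150}{100} = 100 ,
```

which corresponds to S / 2 = 50 timesteps for each of the two species.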

We adopt a Gaussian prior

P (θ)

and combine it with the effective likelihood to obtain the SELFI effective posterior

P (θ | Φ_{O})

. Figure 1 (left panels) shows the inferred population functions

γ

in comparison with the prior mean and expansion point

θ_{0}

and the ground truth

θ_{gt}

. The figure shows

2 σ

credible regions for the prior and the posterior (i.e.,

2 \sqrt{diag (S)}

and

2 \sqrt{diag (Γ)}

, respectively). The full posterior covariance matrix

Γ

for each model is shown in the rightmost column of Figure 1.

4.2. Check for Model Misspecification

The inferred population functions allow us to check for model misspecification. From Figure 1, it is clear that model B fails to produce a plausible reconstruction of population functions: model B breaks the (pseudo-)periodicity of the predator population function

y (t)

, which is a property required by the model. In the bottom left-hand panels, the red lines differ in shape from fiducial functions

T (ω)

(grey lines), and the credible intervals exclude the expansion point. On the contrary, with model A, the reconstructed population functions are consistent with the expansion point. The inference is unbiased, as the ground truth typically lies within the

2 σ

credible region of the reconstruction.

As a quantitative check, we compute the Mahalanobis distance between

γ

and

P (θ)

(Equation (7)) for each model. We find that

d_{M} (γ, θ_{0} | S)

is much smaller for model A than for model B (

5.35

versus

12.54

). The numbers can be compared to the empirical mean among our set of fiducial population functions,

〈d_{M} (T (ω_{n}), θ_{0} | S)〉 = 9.43

.

At this stage, we therefore consider that model B is excluded, and we proceed further with model A.

4.3. Score Compression

As T is numerically cheap, we get

\nabla_{ω} T_{0}

via sixth-order central finite differences around

ω_{0}

, then obtain

\nabla_{ω} f_{0}

using Equation (10). This does not require any further evaluation of the data model

P (Φ | θ)

, as

\nabla f_{0}

has already been computed.
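A sixth-order central finite-difference Jacobian can be sketched with the standard seven-point stencil, whose coefficients for the first derivative are ±3/4, ∓3/20, ±1/60. The function name, the relative step `rel_step`, and the toy function standing in for T are assumptions for illustration.

```python
import numpy as np

# Standard sixth-order central stencil for a first derivative:
# f'(x) ~ [3/4 (f(x+h) - f(x-h)) - 3/20 (f(x+2h) - f(x-2h)) + 1/60 (f(x+3h) - f(x-3h))] / h
_STENCIL = {1: 3.0 / 4.0, 2: -3.0 / 20.0, 3: 1.0 / 60.0}

def jacobian_central6(T, omega0, rel_step=1e-2):
    """Jacobian of T at omega0, shape (S, N), using 6 evaluations of T per parameter."""
    omega0 = np.asarray(omega0, dtype=float)
    n_out = np.asarray(T(omega0)).size
    J = np.zeros((n_out, omega0.size))
    for n in range(omega0.size):
        # Step proportional to the parameter value (absolute step if it is zero).
        h = rel_step * abs(omega0[n]) if omega0[n] != 0 else rel_step
        for k, c in _STENCIL.items():
            e = np.zeros_like(omega0)
            e[n] = k * h
            J[:, n] += c * (np.asarray(T(omega0 + e)) - np.asarray(T(omega0 - e)))
        J[:, n] /= h
    return J

# Toy usage: a smooth function with a known Jacobian (stands in for the solver T).
def toy_T(w):
    return np.array([np.sin(w[0]), w[0] * w[1], np.exp(w[1])])

w0 = np.array([0.5, 0.2])
J = jacobian_central6(toy_T, w0)
J_exact = np.array([[np.cos(0.5), 0.0], [0.2, 0.5], [0.0, np.exp(0.2)]])
```

With N = 4 parameters this costs 24 evaluations of T and, as stated above, no evaluation of the data model.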

Using Equations (8) and (9), we compress

Φ_{O}

and obtain

{\tilde{ω}}_{O} = (0.7050, 0.2287, 0.1471, 0.0415)

.

4.4. Inference of Parameters Using Likelihood-Free Rejection Sampling

As a last step, we infer top-level parameters

ω

given compressed summaries

{\tilde{ω}}_{O}

. As the problem studied in this paper is sufficiently simple, we rely on the simplest solution for SBI, namely likelihood-free rejection sampling (sometimes also known as approximate Bayesian computation, e.g., [9]). To do so, we use the Fisher–Rao distance between simulated

\tilde{ω}

and observed

{\tilde{ω}}_{O}

, which comes naturally from score compression (see Equation (11)), and we set a threshold

ε = 2

. We draw samples from the prior

P (ω)

, simulate

\tilde{ω}

, then accept

ω

as a sample of

P (ω | {\tilde{ω}}_{O})

if

d_{FR} (\tilde{ω}, {\tilde{ω}}_{O}) < ε

, and reject it otherwise.
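The accept/reject loop described above can be sketched as follows. The callables `sample_prior` and `simulate_summary` are hypothetical placeholders for drawing from P(ω) and for running the full chain solver, observer, and compression; the toy Gaussian problem is for illustration only.

```python
import numpy as np

def rejection_sampling(sample_prior, simulate_summary, omega_tilde_obs, F0, eps, n_draws, rng):
    """Likelihood-free rejection sampling with the Fisher-Rao distance of Equation (11)."""
    omega_tilde_obs = np.asarray(omega_tilde_obs, dtype=float)
    accepted = []
    for _ in range(n_draws):
        omega = sample_prior(rng)                    # draw from the prior P(omega)
        omega_tilde = simulate_summary(omega, rng)   # simulate compressed summaries
        d = omega_tilde - omega_tilde_obs
        if np.sqrt(d @ F0 @ d) < eps:                # accept if d_FR < eps
            accepted.append(omega)
    return np.array(accepted).reshape(-1, omega_tilde_obs.size)

# Toy usage: unit Gaussian prior, summaries equal to omega plus Gaussian noise of scale 0.1.
rng = np.random.default_rng(4)
obs = np.array([0.5, -0.3])
F0 = np.eye(2) / 0.1 ** 2
samples = rejection_sampling(
    sample_prior=lambda g: g.standard_normal(2),
    simulate_summary=lambda w, g: w + 0.1 * g.standard_normal(2),
    omega_tilde_obs=obs, F0=F0, eps=2.0, n_draws=5000, rng=rng,
)
```

The acceptance rate drops quickly as the threshold decreases or the dimension grows, which is why the text mentions more sophisticated alternatives for harder problems.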

In Figure 2, we find that the inference of top-level parameters is unbiased, with the ground truth

ω_{gt}

(dashed lines) lying within the

2 σ

credible region of the posterior. We observe that the data correctly drive some features that are not built into the prior, for instance, the degeneracy between

α

and

γ

, respectively, the reproduction rate of prey and the mortality rate of predators.

5. Conclusions

One of the biggest challenges in statistical data analysis is checking data models for misspecification, so as to obtain meaningful parameter inferences. In this work, we described a novel two-step simulation-based Bayesian approach, combining SELFI and SBI, which can be used to tackle this issue for a large class of models. BHMs to which the approach can be applied involve a latent function depending on parameters and observed through a complex probabilistic process. They are ubiquitous, e.g., in astrophysics and ecology.

In this paper, we introduced a prey–predator model, consisting of a numerical solver of the Lotka–Volterra system of equations and of a complex observational process of population functions. As a proof of concept, we applied our technique to this model and to a simplified (misspecified) version of it. We demonstrated successful identification of the misspecified model and unbiased inference of the parameters of the correct model.

In conclusion, the method developed constitutes a computationally efficient and easily applicable framework to perform SBI of BHMs while checking for model misspecification. It allows one to infer the latent function as an intermediate product, then to perform score compression at no additional simulation cost. This study opens up a new avenue to increase the robustness and reliability of Bayesian data analysis using fully non-linear, simulator-based models.

Funding

This work was done within the Aquila Consortium (https://aquila-consortium.org, accessed on 31 October 2022).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The code and data underlying this paper, as well as additional plots, have been made publicly available as part of the pyselfi code at https://pyselfi.florent-leclercq.eu (accessed on 31 October 2022).

Conflicts of Interest

The author declares no conflict of interest.

References

1. Leclercq, F.; Enzi, W.; Jasche, J.; Heavens, A. Primordial power spectrum and cosmology from black-box galaxy surveys. Mon. Not. R. Astron. Soc. 2019, 490, 4237–4253.
2. Rousset, F. Inferences from Spatial Population Genetics. In Handbook of Statistical Genetics; John Wiley & Sons, Ltd.: London, UK, 2007; Chapter 28; pp. 945–979.
3. Alsing, J.; Wandelt, B. Generalized massive optimal data compression. Mon. Not. R. Astron. Soc. Lett. 2018, 476, L60–L64.
4. Papamakarios, G.; Murray, I. Fast ϵ-free Inference of Simulation Models with Bayesian Conditional Density Estimation. In Advances in Neural Information Processing Systems 29: Proceedings of the 30th International Conference on Neural Information Processing Systems, 5–10 December 2016, Barcelona, Spain; Curran Associates Inc.: Red Hook, NY, USA, 2016; pp. 1036–1044.
5. Alsing, J.; Wandelt, B.; Feeney, S. Massive optimal data compression and density estimation for scalable, likelihood-free inference in cosmology. Mon. Not. R. Astron. Soc. 2018, 477, 2874–2885.
6. Gutmann, M.U.; Corander, J. Bayesian Optimization for Likelihood-Free Inference of Simulator-Based Statistical Models. J. Mach. Learn. Res. 2016, 17, 1–47.
7. Leclercq, F. Bayesian optimization for likelihood-free cosmological inference. Phys. Rev. D 2018, 98, 063511.
8. Thomas, O.; Pesonen, H.; Sá-Leão, R.; de Lencastre, H.; Kaski, S.; Corander, J. Split-BOLFI for misspecification-robust likelihood free inference in high dimensions. arXiv, 2020; arXiv:2002.09377v1.
9. Beaumont, M.A. Approximate Bayesian Computation. Annu. Rev. Stat. Its Appl. 2019, 6, 379–403.

Figure 1. SELFI inference of the population function

θ

given the observed data

Φ_{O}

, used as a check for model misspecification. Left panels: the prior mean and expansion point

θ_{0}

and the effective posterior mean

γ

are represented as yellow and green/red lines, respectively, with their

2 σ

credible intervals. For comparison, simulations

T (ω)

with

ω ⤺ P (ω)

, and the ground truth

θ_{gt}

are shown in grey and blue, respectively. Middle and right panels: the prior covariance matrix

S

and the posterior covariance matrix

Γ

, respectively. The first row corresponds to model A (see Section 3.2.1) and the second row to model B (see Section 3.2.2).

Figure 2. Simulation-based inference of the Lotka–Volterra parameters

ω = (α, β, γ, δ)

given the compressed observed data

{\tilde{ω}}_{O}

. Plots in the lower corner show two-dimensional marginals of the prior

P (ω)

(yellow contours) and of the SBI posterior

P (ω | {\tilde{ω}}_{O})

(green contours), using a threshold

ε = 2

on the Fisher–Rao distance between simulated

\tilde{ω}

and observed

{\tilde{ω}}_{O}

,

d_{FR} (\tilde{ω}, {\tilde{ω}}_{O})

. Contours show 1, 2, and

3 σ

credible regions. Plots on the diagonal show one-dimensional marginal distributions of the parameters, using the same color scheme. Dotted and dashed lines denote the position of the fiducial point for score compression

ω_{0}

and of the ground truth parameters

ω_{gt}

, respectively. The scatter plots in the upper corner illustrate score compression for pairs of parameters. There, red dots represent some simulated samples. Larger dots show some accepted samples (i.e., for which

d_{FR} (\tilde{ω}, {\tilde{ω}}_{O}) < ε

), with a color map corresponding to the value of one component of

\tilde{ω}

. In the color bars, pink lines denote the mean and

1 σ

scatter among accepted samples of the component of

\tilde{ω}

, and the orange line denotes its value in

{\tilde{ω}}_{O}

.

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
