A limit result for the prior predictive applied to checking for prior-data conflict

https://doi.org/10.1016/j.spl.2011.02.025

Abstract

We consider checking for prior-data conflict in a Bayesian analysis via a tail probability based on the prior predictive distribution. We establish the appropriateness of this measure in the sense that the limiting value of the tail probability measures the extent to which the true value of the parameter is a surprising value from the prior.

Introduction

The relevance of the results of a statistical analysis depends upon the inputs chosen by the analyst. If the inputs are deemed inappropriate, then one has reason to doubt any conclusions drawn. For a Bayesian statistical analysis, these inputs comprise the sampling model $\{P_\theta : \theta \in \Theta\}$ for the data, considered here as a collection of probability measures one of which is supposed to have generated the observed data; the prior $\Pi$ on the model parameter; and perhaps a loss function. One way to assess these inputs is to ask whether they make sense in light of the data collected. From this point of view, we have a model failure when the observed data is surprising for every probability distribution in the model. In this paper, we are concerned with assessing whether or not there is a prior-data conflict.

Intuitively, prior-data conflict arises when the likelihood is relatively high where the prior is relatively low. While this is easy to assess graphically for a one-dimensional parameter, more formal methods are needed in general. Various approaches have been proposed for assessing prior-data conflict; see, for example, Young and Pettit (1996), Evans and Moshonov (2006), and Marshall and Spiegelhalter (2007). Some Bayesian model checking methods, such as those discussed in Box (1980) and Gelman et al. (1996), could also be considered assessments of the prior, although this is confounded with checking the model. Separating the assessment of the prior from that of the model gives more information about which of these choices might be in conflict with the data. Box (1980) proposed the tail probability $M(m(X) \le m(x))$, where $x$ is the observed data, $P_\theta$ has density $f_\theta$, and $m(x) = \int_\Theta f_\theta(x)\,\Pi(d\theta)$ is the density of the prior predictive measure $M$ on the sample space $\mathcal{X}$. We show in Example 1 that this is not appropriate for checking the prior.

In this paper, we prove a consistency result for the check for prior-data conflict discussed in Evans and Moshonov (2006, 2007). Suppose that $T: \mathcal{X} \to \mathcal{T}$ is a minimal sufficient statistic for $\{P_\theta : \theta \in \Theta\}$ and that $T$ has density $f_{\theta T}$ on $\mathcal{T}$ under $P_\theta$. The tail probability

$$M_T(m_T(T) \le m_T(T(x))), \qquad (1)$$

where $M_T$ is the prior predictive distribution of $T$ with density $m_T(t) = \int_\Theta f_{\theta T}(t)\,\Pi(d\theta)$, was proposed for checking for prior-data conflict. The following example motivates why (1) is suitable for this purpose.
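When $m_T$ can be evaluated but the tail probability (1) has no closed form, one natural approach is to simulate $T$ from the prior predictive and count how often $m_T(T) \le m_T(T(x))$. The sketch below is a minimal Monte Carlo illustration of that idea, not the authors' procedure. The names `prior_predictive_tail`, `sample_T` and `log_mT` are hypothetical, and the check uses the location-normal setting of Example 1 with illustrative values $\mu_0 = 0$, $\sigma_0 = 1$, $n = 10$ and observed $T(x) = 2$, where (1) is known in closed form.

```python
import math
import random
from typing import Callable


def prior_predictive_tail(sample_T: Callable[[random.Random], float],
                          log_mT: Callable[[float], float],
                          t_obs: float,
                          n_sim: int,
                          rng: random.Random) -> float:
    """Monte Carlo estimate of M_T(m_T(T) <= m_T(t_obs)), i.e. tail probability (1).

    sample_T draws one value of T from the prior predictive M_T
    (theta from the prior, then T from its sampling distribution);
    log_mT evaluates log m_T, possibly up to an additive constant.
    """
    ref = log_mT(t_obs)
    hits = sum(log_mT(sample_T(rng)) <= ref for _ in range(n_sim))
    return hits / n_sim


# Hypothetical check in the location-normal model: N(mu0, sigma0^2) prior, T = xbar,
# so M_T = N(mu0, sigma0^2 + 1/n) and (1) = 2(1 - Phi(|t_obs - mu0| / sd)).
mu0, sigma0, n, t_obs = 0.0, 1.0, 10, 2.0
sd = math.sqrt(sigma0 ** 2 + 1.0 / n)


def sample_T(rng: random.Random) -> float:
    mu = rng.gauss(mu0, sigma0)                # mu drawn from the prior
    return rng.gauss(mu, 1.0 / math.sqrt(n))   # xbar | mu ~ N(mu, 1/n)


def log_mT(t: float) -> float:
    return -0.5 * ((t - mu0) / sd) ** 2        # log N(mu0, sd^2) density, constant dropped


estimate = prior_predictive_tail(sample_T, log_mT, t_obs, 100_000, random.Random(1))
exact = 2.0 * (1.0 - 0.5 * (1.0 + math.erf(abs(t_obs - mu0) / (sd * math.sqrt(2.0)))))
print(f"Monte Carlo {estimate:.4f}, exact {exact:.4f}")
```

Dropping the normalizing constant of $m_T$ is harmless because (1) only compares values of $m_T$ with each other.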

Example 1 Location Normal

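Suppose that $x = (x_1, \ldots, x_n)$ is a sample from a $N(\mu, 1)$ distribution, where $\mu \in \mathbb{R}^1$ is unknown. Then a minimal sufficient statistic is $T_n(x) = \bar{x}$, and $T_n(x)$ converges almost surely to the true value $\mu_{true}$ as $n \to \infty$. Suppose we put a $N(\mu_0, \sigma_0^2)$ prior on $\mu$. Then $M_{T_n}$ is the $N(\mu_0, \sigma_0^2 + 1/n)$ distribution, which converges in distribution to the $N(\mu_0, \sigma_0^2)$ distribution. Also, $m_{T_n}(t)$ converges to the prior density $(2\pi)^{-1/2}\sigma_0^{-1}\exp\{-(t - \mu_0)^2/(2\sigma_0^2)\}$ uniformly for $t$ in compact sets. Since $m_{T_n}$ is symmetric and unimodal about $\mu_0$, (1) equals $2(1 - \Phi(|\bar{x} - \mu_0|/(\sigma_0^2 + 1/n)^{1/2}))$, where $\Phi$ is the standard normal distribution function, and this converges almost surely to $2(1 - \Phi(|\mu_{true} - \mu_0|/\sigma_0))$, which measures how far out in the tails of the prior $\mu_{true}$ lies.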
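The convergence in Example 1 can be checked numerically. The sketch below uses only Python's standard library; it evaluates the closed form of (1) for simulated data at several sample sizes and compares it with the limit $2(1 - \Phi(|\mu_{true} - \mu_0|/\sigma_0))$. The function names and the values $\mu_{true} = 2.5$, $\mu_0 = 0$, $\sigma_0 = 1$ are illustrative assumptions.

```python
import math
import random


def Phi(z: float) -> float:
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def conflict_pvalue_normal(xbar: float, n: int, mu0: float, sigma0: float) -> float:
    """Tail probability (1) for the location-normal model with a N(mu0, sigma0^2) prior.

    M_{T_n} = N(mu0, sigma0^2 + 1/n) has a density symmetric and unimodal about mu0,
    so {m(T) <= m(xbar)} is the event {|T - mu0| >= |xbar - mu0|}.
    """
    sd = math.sqrt(sigma0 ** 2 + 1.0 / n)
    return 2.0 * (1.0 - Phi(abs(xbar - mu0) / sd))


def limit_pvalue_normal(mu_true: float, mu0: float, sigma0: float) -> float:
    """Almost-sure limit of (1): 2(1 - Phi(|mu_true - mu0| / sigma0))."""
    return 2.0 * (1.0 - Phi(abs(mu_true - mu0) / sigma0))


# Simulated data whose true mean lies 2.5 prior standard deviations from mu0.
rng = random.Random(2011)
mu0, sigma0, mu_true = 0.0, 1.0, 2.5
pvals = {}
for n in (10, 100, 10_000):
    xbar = sum(rng.gauss(mu_true, 1.0) for _ in range(n)) / n
    pvals[n] = conflict_pvalue_normal(xbar, n, mu0, sigma0)
limit = limit_pvalue_normal(mu_true, mu0, sigma0)
print(pvals, limit)
```

The limit here is about 0.0124; at $n = 10\,000$ the computed value of (1) should be close to it, while smaller $n$ values fluctuate more because $\bar{x}$ is noisier.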

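Now consider the Box (1980) tail probability for this problem, taking $\mu_0 = 0$ and $\sigma_0^2 = \tau$. We have that $X \sim N_n(0, I_n + \tau 1_n 1_n')$, where $I_n$ is the $n \times n$ identity matrix and $1_n$ is a vector of $n$ ones, and

$$M(m(X) \le m(x)) = 1 - G_n\left(x'\left(I_n - \frac{\tau}{1 + n\tau} 1_n 1_n'\right)x\right),$$

where $G_n$ is the chi-squared($n$) distribution function. The quadratic form decomposes as

$$x'\left(I_n - \frac{\tau}{1 + n\tau} 1_n 1_n'\right)x = \sum_{i=1}^n (x_i - \bar{x}_n)^2 + \frac{n}{1 + n\tau}\bar{x}_n^2 = V_n + W_n,$$

where, conditionally given $\mu$, $V_n \sim \chi^2(n-1)$, $W_n = O_p(1)$, and $V_n$ and $W_n$ are independent. Now $(\chi^2(n) - n)/\sqrt{2n} \to_d N(0,1)$, and $G_n(n + x\sqrt{2n}) - \Phi(x) = O(n^{-1/2})$ uniformly in $x$ by Theorem XVI.4.1 in Feller (1971). Hence

$$M(m(X) \le m(x)) = 1 - \Phi\left(\frac{V_n - (n-1)}{\sqrt{2n}}\right) + O_p(n^{-1/2}),$$

where we have used that $\Phi$ is Lipschitz (in particular, uniformly continuous) and $(W_n - 1)/\sqrt{2n} = O_p(n^{-1/2})$. Since

$$\frac{V_n - (n-1)}{\sqrt{2n}} = \frac{V_n - (n-1)}{\sqrt{2(n-1)}} \times \sqrt{1 - 1/n} \to_d N(0,1),$$

we have that $M(m(X) \le m(x)) \to_d$ Uniform(0,1). This limit does not depend on the prior or on whether $\mu_{true}$ lies in the tails of the prior. Therefore, this tail probability is not useful for checking for prior-data conflict.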
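The argument above rests on two algebraic facts: the inverse $(I_n + \tau 1_n 1_n')^{-1} = I_n - \tau(1 + n\tau)^{-1} 1_n 1_n'$ (a Sherman-Morrison identity) and the split of the quadratic form into $V_n + W_n$. A minimal pure-Python sketch can verify both numerically; the helper names and the values $\tau = 0.7$, $n = 5$ for the matrix check, and a sample of 50 points with mean 3 are arbitrary choices for illustration.

```python
import random


def box_quadratic_form(x, tau):
    """x'(I_n - tau/(1 + n*tau) 1_n 1_n') x, computed without forming the matrix."""
    n = len(x)
    s = sum(x)
    return sum(xi * xi for xi in x) - tau / (1.0 + n * tau) * s * s


def decomposition(x, tau):
    """Return (V_n, W_n): the within-sample sum of squares and the scaled squared mean."""
    n = len(x)
    xbar = sum(x) / n
    v_n = sum((xi - xbar) ** 2 for xi in x)
    w_n = n / (1.0 + n * tau) * xbar ** 2
    return v_n, w_n


def matmul(A, B):
    """Plain list-of-lists matrix product."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


# Check (I_n + tau 1 1')(I_n - tau/(1 + n*tau) 1 1') = I_n for n = 5.
n, tau = 5, 0.7
cov = [[(1.0 if i == j else 0.0) + tau for j in range(n)] for i in range(n)]
inv = [[(1.0 if i == j else 0.0) - tau / (1.0 + n * tau) for j in range(n)] for i in range(n)]
product = matmul(cov, inv)
max_err_inverse = max(abs(product[i][j] - (1.0 if i == j else 0.0))
                      for i in range(n) for j in range(n))

# Check the decomposition of the quadratic form on a simulated sample.
rng = random.Random(0)
x = [rng.gauss(3.0, 1.0) for _ in range(50)]
v_n, w_n = decomposition(x, tau)
max_err_decomp = abs(box_quadratic_form(x, tau) - (v_n + w_n))
print(max_err_inverse, max_err_decomp)
```

Both errors should be at the level of floating-point rounding.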

While the potential ill effects of a prior-data conflict have long been recognized, it is not clear what one should do when we conclude that a conflict exists. One can note, however, that we have learned something relevant, and it seems only fair that an analyst report this. The situation is similar with model checking: it is not clear what we should do when we have a model failure, but this does not excuse us from making these checks. The typical response to model failure is to modify the model in some way, perhaps by enlarging the family of distributions. Similarly, when a prior-data conflict exists, our response can be to use a new prior that is less informative, in the sense that we can expect fewer prior-data conflicts. We discuss this in Section 4.

A criticism of (1) is that, for continuous models, it is not invariant under smooth transformations. For suppose that $W: \mathcal{T} \to \mathcal{W}$ is 1-1 and smooth, and let $J_W(t)$ be the reciprocal of the absolute Jacobian determinant of $W$ evaluated at $t$. Then $W \circ T$ is also minimal sufficient, and (1) applied to $W \circ T$ gives the tail probability

$$M_T(m_T(T) J_W(T) \le m_T(T(x)) J_W(T(x))),$$

which generally differs from (1). This issue is avoided if we use the approach of Evans and Jang (2010a) to obtain the invariant tail probability

$$M_T(m_T^*(T) \le m_T^*(T(x))),$$

where $m_T^*(t) = m_T(t)\,E(J_T^{-1}(X) \mid T(X) = t)$, $J_T(x) = |\det(dT(x)\,dT(x)')|^{-1/2}$, and $dT$ is the differential of $T$. The factor $E(J_T^{-1}(X) \mid T(X) = t)$ corrects for volume distortions due to the transformation $T$. Note that whenever $T$ is linear, $E(J_T^{-1}(X) \mid T(X) = t)$ is constant and the invariant tail probability equals (1). This is the case for all but one of our examples. We state a relevant convergence result for this tail probability in Section 2.
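To see the lack of invariance concretely, take the location-normal setting of Example 1 and the hypothetical transformation $W(t) = \exp(t)$, so that $J_W(t) = e^{-t}$. Completing the square gives $m_T(t)e^{-t} \propto \exp\{-(t - \mu_0 + v)^2/(2v)\}$ with $v = \sigma_0^2 + 1/n$, so both tail probabilities have closed forms. The sketch below evaluates them under the illustrative assumptions $\mu_0 = 0$, $\sigma_0 = 1$, $n = 100$ and observed $\bar{x} = 1.5$; the function names are hypothetical.

```python
import math


def Phi(z: float) -> float:
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def tail_T(t0: float, mu0: float, v: float) -> float:
    """(1) for T with M_T = N(mu0, v): the tail set is {|T - mu0| >= |t0 - mu0|}."""
    return 2.0 * (1.0 - Phi(abs(t0 - mu0) / math.sqrt(v)))


def tail_exp_T(t0: float, mu0: float, v: float) -> float:
    """(1) applied to exp(T), i.e. comparing m_T(t) * J_W(t) with J_W(t) = exp(-t).

    m_T(t) * exp(-t) is proportional to exp(-(t - mu0 + v)^2 / (2v)), so the tail set is
    {|T - mu0 + v| >= |t0 - mu0 + v|}, and T - mu0 + v ~ N(v, v) under M_T.
    """
    c = abs(t0 - mu0 + v)
    sd = math.sqrt(v)
    return (1.0 - Phi((c - v) / sd)) + Phi((-c - v) / sd)


mu0, sigma0, n, t0 = 0.0, 1.0, 100, 1.5
v = sigma0 ** 2 + 1.0 / n   # variance of M_T in the location-normal model
p_T = tail_T(t0, mu0, v)
p_W = tail_exp_T(t0, mu0, v)
print(f"T: {p_T:.4f}  exp(T): {p_W:.4f}")
```

For these values the transformed check reports roughly half the tail probability of (1), even though $\exp(\bar{x})$ carries the same information as $\bar{x}$. Because $\bar{x}$ is linear in the data, the invariant version coincides with (1) in this model.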

In Section 2, we provide theorems, with proofs in the Appendix, for the convergence of (1) to the tail probability $\Pi(\pi(\theta) \le \pi(\theta_{true}))$, where $\pi$ is the prior density and $\theta_{true}$ is the true value of the parameter; that is, (1) is a consistent assessment of whether or not the true value of the parameter lies in the tails of the prior. In Section 3, we provide some applications. In Section 4, we discuss what one can do when a prior-data conflict is encountered.


Consistency of the check

We consider the behavior of (1) as the amount of data grows. We have the following generalization of Example 1.

Theorem 1

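Suppose $\Theta \subset \mathbb{R}^k$ is open and (i) $T_n \to \theta$ a.s. $P_\theta$ for every $\theta$, (ii) $m_{T_n}(t) \to \pi(t)$ uniformly on compact subsets of $\Theta$, and (iii) $\pi$ is continuous and the prior distribution of $\pi(\theta)$ has no atoms. Then $M_{T_n}(m_{T_n}(T_n) \le m_{T_n}(T_n(x_n))) \to \Pi(\pi(\theta) \le \pi(\theta_{true}))$ a.s. $P_{\theta_{true}}$.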
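For a normal prior the limit in Theorem 1 has a familiar form: $\pi(\theta) \le \pi(\theta_{true})$ exactly when the standardized squared distance of $\theta$ from the prior mean is at least that of $\theta_{true}$, and that squared distance is chi-squared($k$) under the prior. The sketch below assumes an independent normal prior in $k = 2$ dimensions (an illustrative choice, not taken from the paper), where the chi-squared(2) survival function is $\exp(-d^2/2)$, and compares a Monte Carlo estimate of $\Pi(\pi(\theta) \le \pi(\theta_{true}))$ with that closed form. All names and numerical values are hypothetical.

```python
import math
import random


def sq_distance(theta, mu0, sd0):
    """Standardized squared distance; log pi(theta) = -0.5 * this + constant."""
    return sum(((t - m) / s) ** 2 for t, m, s in zip(theta, mu0, sd0))


def limit_tail_mc(theta_true, mu0, sd0, n_sim, rng):
    """Monte Carlo estimate of Pi(pi(theta) <= pi(theta_true)) for an independent normal prior."""
    ref = sq_distance(theta_true, mu0, sd0)
    hits = 0
    for _ in range(n_sim):
        theta = [rng.gauss(m, s) for m, s in zip(mu0, sd0)]
        # pi(theta) <= pi(theta_true)  <=>  distance(theta) >= distance(theta_true)
        if sq_distance(theta, mu0, sd0) >= ref:
            hits += 1
    return hits / n_sim


def limit_tail_exact_2d(theta_true, mu0, sd0):
    """For k = 2 the squared distance is chi-squared(2), with survival exp(-d2 / 2)."""
    return math.exp(-0.5 * sq_distance(theta_true, mu0, sd0))


theta_true, mu0, sd0 = (1.5, -2.0), (0.0, 0.0), (1.0, 2.0)
mc = limit_tail_mc(theta_true, mu0, sd0, 50_000, random.Random(3))
exact = limit_tail_exact_2d(theta_true, mu0, sd0)
print(f"Monte Carlo {mc:.4f}, exact {exact:.4f}")
```

Working with the squared distance rather than the density itself avoids evaluating normalizing constants, which cancel in the comparison.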

Note that our discussion here is restricted to situations where the minimal sufficient statistic is a consistent estimator of the true value which is

Examples

For these examples the details associated with establishing Theorem 1(ii) are similar to the proof of Theorem 2 and can be found in Evans and Jang (2010b).

Example 2 Scale-Gamma

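Let $x = (x_1, \ldots, x_n)$ be a sample from a Gamma$(\alpha_0, \theta)$ distribution, where $\alpha_0$ is known and the scale parameter $\theta > 0$ is unknown. Then $T_n(x) = (n\alpha_0)^{-1}\sum_{i=1}^n x_i \sim$ Gamma$(n\alpha_0, \theta/(n\alpha_0))$ is minimal sufficient and $T_n(x) \to \theta_{true}$ almost surely. When $\pi$ satisfies Theorem 1(iii), Theorem 1(ii) holds and so Theorem 1 applies.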
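Example 2 leaves $\pi$ generic. As a hedged illustration, the sketch below assumes an inverse-gamma prior on $\theta$ (equivalently a Gamma prior with shape $a$ and rate $b$ on $1/\theta$), chosen only because it gives a closed-form prior predictive density for $S = \sum_i x_i = n\alpha_0 T_n$; this prior satisfies Theorem 1(iii). It estimates (1) at a large $n$ and the limit $\Pi(\pi(\theta) \le \pi(\theta_{true}))$ by Monte Carlo. The values $\alpha_0 = 2$, $n = 5000$, $a = 3$, $b = 2$, $\theta_{true} = 4$ and all function names are hypothetical.

```python
import math
import random


def log_prior_theta(theta, a, b):
    """Hypothetical inverse-gamma(a, b) log density: 1/theta ~ Gamma(shape a, rate b)."""
    return a * math.log(b) - math.lgamma(a) - (a + 1.0) * math.log(theta) - b / theta


def log_marginal_S(s, k, a, b):
    """Log prior predictive density of S = sum(x_i) = n*alpha0*T_n.

    S | theta ~ Gamma(shape k, scale theta) with k = n*alpha0; integrating the rate
    1/theta against the Gamma(a, rate b) prior gives this closed form.
    """
    return (a * math.log(b) + math.lgamma(a + k) - math.lgamma(a) - math.lgamma(k)
            + (k - 1.0) * math.log(s) - (a + k) * math.log(b + s))


def conflict_pvalue(s_obs, k, a, b, n_sim, rng):
    """Monte Carlo estimate of (1); T_n = S / k is linear in S, so using S leaves (1) unchanged."""
    ref = log_marginal_S(s_obs, k, a, b)
    hits = 0
    for _ in range(n_sim):
        theta = 1.0 / rng.gammavariate(a, 1.0 / b)   # gammavariate(shape, scale)
        s = rng.gammavariate(k, theta)               # S | theta
        if log_marginal_S(s, k, a, b) <= ref:
            hits += 1
    return hits / n_sim


def limit_pvalue(theta_true, a, b, n_sim, rng):
    """Monte Carlo estimate of the Theorem 1 limit Pi(pi(theta) <= pi(theta_true))."""
    ref = log_prior_theta(theta_true, a, b)
    hits = 0
    for _ in range(n_sim):
        theta = 1.0 / rng.gammavariate(a, 1.0 / b)
        if log_prior_theta(theta, a, b) <= ref:
            hits += 1
    return hits / n_sim


rng = random.Random(7)
alpha0, n = 2.0, 5000
k = n * alpha0
a, b, theta_true = 3.0, 2.0, 4.0          # theta_true lies far in the right tail of the prior
s_obs = rng.gammavariate(k, theta_true)   # observed sum for data generated at theta_true
p_n = conflict_pvalue(s_obs, k, a, b, 20_000, rng)
p_lim = limit_pvalue(theta_true, a, b, 20_000, rng)
print(f"(1) at n={n}: {p_n:.4f}, limit: {p_lim:.4f}")
```

With $\theta_{true} = 4$ well beyond the prior mean of 1, both estimates come out small (the exact limit is about 0.0145 for these values), which is the behaviour Theorem 1 predicts for a true value in the tails of the prior.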

The following example uses Example 2 in a problem of considerable

Resolving a prior-data conflict

There are several possible courses of action when we find that a given prior is in conflict with the data. First, we note that as the amount of data increases, the effect of the prior typically disappears. So even though a prior-data conflict may exist, we may be able to ignore it, since the prior has little effect on the analysis. Diagnostics for assessing this are discussed in Evans and Moshonov (2006); these involve comparing posterior inferences under the prior with those under

Acknowledgements

The authors thank the Editor and referees for some helpful comments.

References

  • Berger, J., et al. On the development of the reference prior method.
  • Berger, J.O., et al., 2009. The formal definition of reference priors. Ann. Statist.
  • Box, G.E.P., 1980. Sampling and Bayes' inference in scientific modelling and robustness. J. R. Stat. Soc. Ser. A.
  • Evans, M., Jang, G.H., 2009. Weak informativity and the information in one prior relative to another. Tech. Rep. No....
  • Evans, M., et al., 2010. Invariant P-values for model checking. Ann. Statist.
  • Evans, M., Jang, G.H., 2010b. A limit result for the prior predictive. Tech. Rep. No. 1004. Department of Statistics,...
