Analytical derivation of the reference prior by sequential maximization of Shannon's mutual information in the multi-group parameter case

https://doi.org/10.1016/j.jspi.2013.11.003

Highlights

  • Derivation of a non-informative prior by maximization of Shannon's mutual information.

  • The derived prior coincides with the reference prior proposed by Berger and Bernardo.

  • We present conditions under which the reference prior coincides with Jeffreys' prior.

Abstract

We provide an analytical derivation of a non-informative prior by sequential maximization of Shannon's mutual information in the multi-group parameter case assuming reasonable regularity conditions. We show that the derived prior coincides with the reference prior proposed by Berger and Bernardo, and that it can be considered as a useful alternative expression for the calculation of the reference prior. In using this expression we discuss the conditions under which an improper reference prior can be uniquely defined, i.e. when it does not depend on the particular choice of nested sequences of compact subsets of the parameter space needed for its construction. We also present the conditions under which the reference prior coincides with Jeffreys' prior.

Introduction

In many applications Bayes' theorem is employed with priors intended to represent the absence of prior knowledge. The selection of such priors has long been researched, and several different principles have been suggested; see, e.g., Kass and Wasserman (1996). The reference prior of Berger and Bernardo (1989, 1992a, 1992b) may be viewed as the currently favored one. For single-parameter problems it maximizes the expected Kullback–Leibler divergence between posterior and prior (Berger et al., 2009), and thus selects the prior that is least informative in a specified sense. For multi-parameter problems a sequence of conditional reference priors is constructed and combined. The resulting prior then generally depends on the parameter of interest, as well as on the ordering and grouping of the nuisance parameters.
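The single-parameter criterion above can be explored numerically. The following is a minimal sketch, assuming a binomial model with n = 50 trials and priors discretized on a midpoint grid; the function names, the grid size, and the choice of model are illustrative assumptions and do not come from the paper. It computes the mutual information between the data and the parameter (equivalently, the expected Kullback–Leibler divergence between posterior and prior) for a uniform prior and for a prior proportional to 1/sqrt(θ(1−θ)).

```python
import math

def binom_pmf(s, n, theta):
    """Probability of s successes in n Bernoulli(theta) trials."""
    return math.comb(n, s) * theta**s * (1.0 - theta)**(n - s)

def discretize(density, m):
    """Midpoint grid on (0, 1) with weights proportional to an unnormalized density."""
    grid = [(j + 0.5) / m for j in range(m)]
    raw = [density(t) for t in grid]
    total = sum(raw)
    return grid, [r / total for r in raw]

def mutual_information(grid, weights, n):
    """Mutual information (in nats) between the success count and theta."""
    lik = [[binom_pmf(s, n, t) for s in range(n + 1)] for t in grid]
    # Marginal distribution of the success count under the discretized prior.
    marginal = [sum(w * row[s] for w, row in zip(weights, lik))
                for s in range(n + 1)]
    mi = 0.0
    for w, row in zip(weights, lik):
        for s, p in enumerate(row):
            if p > 0.0:  # terms with p == 0 contribute nothing
                mi += w * p * math.log(p / marginal[s])
    return mi

n = 50  # illustrative sample size
grid, w_uniform = discretize(lambda t: 1.0, 500)
_, w_jeffreys = discretize(lambda t: 1.0 / math.sqrt(t * (1.0 - t)), 500)

mi_uniform = mutual_information(grid, w_uniform, n)
mi_jeffreys = mutual_information(grid, w_jeffreys, n)
print(f"uniform prior:   {mi_uniform:.4f} nats")
print(f"Jeffreys' prior: {mi_jeffreys:.4f} nats")
```

The reference prior is defined through an asymptotic argument, so a finite-n comparison like this one is only indicative: it shows which of two candidate priors carries less information relative to the data in this toy setting, not that either one is the exact maximizer for n = 50.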

In single-parameter problems the reference prior usually equals Jeffreys' prior, and this is also true for multi-parameter problems when the whole parameter vector is of interest (Clarke and Barron, 1994; Datta and Ghosh, 1996). However, in the presence of nuisance parameters the reference prior generally differs from Jeffreys' prior. In fact, while it is known that Jeffreys' prior often does not work well in such multi-parameter problems, the reference prior usually does. It typically produces proper posteriors, is invariant under reparameterization (as is Jeffreys' prior), avoids marginalization paradoxes, and its highest posterior density (HPD) regions usually show good frequentist coverage probabilities (Berger and Bernardo, 1992c).
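The invariance property mentioned above can be checked numerically in a simple case. The sketch below assumes a single Bernoulli observation parameterized either by the success probability θ or by the log-odds φ; the helper names and the finite-difference step are illustrative assumptions. It verifies that the unnormalized Jeffreys' prior computed directly in φ equals the one computed in θ multiplied by the Jacobian dθ/dφ.

```python
import math

def fisher_information(success_prob, param, eps=1e-6):
    """Fisher information of one Bernoulli observation.

    success_prob maps the parameter to the success probability; the score
    is approximated by a central finite difference.
    """
    def log_pmf(x, value):
        p = success_prob(value)
        return math.log(p) if x == 1 else math.log(1.0 - p)

    p = success_prob(param)
    info = 0.0
    for x, px in ((1, p), (0, 1.0 - p)):
        score = (log_pmf(x, param + eps) - log_pmf(x, param - eps)) / (2 * eps)
        info += px * score**2
    return info

def jeffreys_unnormalized(success_prob, param):
    """Square root of the Fisher information (Jeffreys' rule, one parameter)."""
    return math.sqrt(fisher_information(success_prob, param))

def identity(theta):
    return theta

def logistic(phi):
    return 1.0 / (1.0 + math.exp(-phi))

phi = 0.7                           # arbitrary test point on the log-odds scale
theta = logistic(phi)
jacobian = theta * (1.0 - theta)    # d(theta)/d(phi) for the logistic map

direct = jeffreys_unnormalized(logistic, phi)
transformed = jeffreys_unnormalized(identity, theta) * jacobian
print(direct, transformed)
```

Both quantities agree up to finite-difference error, which is the change-of-variables rule a density must obey to be invariant under reparameterization.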

Often, the resulting reference prior is improper, in which case care needs to be taken in its construction. A nested set of compact subsets is then used, and the reference prior is defined as a suitable limit of a sequence of proper priors determined on these subsets. This limit can depend on the particular choice of the compact subsets used in the construction of the reference prior (see, e.g. Berger and Bernardo, 1992b). Clarke and Yuan (2004) argue that this ‘makes good intuitive sense’ as ‘there just is not enough information to permit finite amounts of data to provide useful inferences’ in such cases.
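The limiting construction can be illustrated in the simplest possible setting. The sketch below assumes a one-parameter scale problem with an improper prior proportional to 1/σ, two hypothetical nested sequences of compact intervals, and the common convention of taking the limit of the ratio of the restricted prior at σ to its value at a fixed interior point σ0; all names are illustrative. In this toy case the ratio is the same for every interval, so the limit does not depend on the sequence. The dependence discussed in the text arises in multi-group problems, which this sketch does not cover.

```python
import math

def restricted_prior(sigma, lower, upper):
    """Proper prior proportional to 1/sigma on [lower, upper], zero outside."""
    if not (lower <= sigma <= upper):
        return 0.0
    return 1.0 / (sigma * math.log(upper / lower))

def ratio(sigma, sigma0, lower, upper):
    """Restricted prior at sigma relative to its value at the reference point sigma0."""
    return (restricted_prior(sigma, lower, upper)
            / restricted_prior(sigma0, lower, upper))

sigma, sigma0 = 3.0, 1.0

# Two different nested sequences of compact sets that both exhaust (0, infinity).
symmetric = [ratio(sigma, sigma0, 1.0 / l, float(l)) for l in (5, 50, 500)]
lopsided = [ratio(sigma, sigma0, 1.0 / l**2, float(l)) for l in (5, 50, 500)]

print(symmetric)  # each entry equals sigma0 / sigma
print(lopsided)   # same values for the second sequence
```

The normalizing constants of the restricted priors shrink to zero as the intervals grow, which is why the limit is taken on ratios rather than on the restricted densities themselves.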

In this paper we suggest an alternative procedure for the calculation of a non-informative prior based on a nested sequence of variational tasks, derived from an appropriate decomposition of Shannon's mutual information. We show that, under (reasonable) regularity assumptions, the resulting prior coincides with the reference prior as initially defined by Berger and Bernardo (1989, 1992a, 1992b, 1992c). Furthermore, we study the problem of uniqueness of the reference prior, i.e. we discuss the conditions under which the reference prior does not depend on the particular choice of nested compact subsets. We also provide sufficient conditions for the reference prior to coincide with Jeffreys' prior. Throughout the paper we will make use of the following Regularity Conditions (RG): Let $X_1,\ldots,X_n$ be a sample of identically and independently distributed random variables whose distribution depends on a parameter (vector) $\theta$. The regularity conditions are considered to be satisfied when the Fisher information matrix exists and when the posterior for $\theta$ is asymptotically a normal distribution.
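As background on why these two conditions matter, a standard asymptotic expansion of the mutual information, of the kind established by Clarke and Barron (1994) for a proper prior supported on a compact set in the interior of the parameter space, can be written as follows. It is given here as a hedged illustration in generic notation, not as the paper's own derivation. The symbols are assumptions of this sketch: $K$ is the dimension of $\theta$, $\mathcal{I}(\theta)$ is the Fisher information matrix of a single observation, $\pi(\theta)$ is the prior density, and the $\pi$ in $2\pi e$ is the circle constant.

```latex
I(X,\pi(\theta))
  = \frac{K}{2}\,\log\frac{n}{2\pi e}
  + \int \pi(\theta)\,
      \log\frac{\lvert \mathcal{I}(\theta) \rvert^{1/2}}{\pi(\theta)}\, d\theta
  + o(1), \qquad n \to \infty .
```

The first term does not involve the prior. The integral is a negative Kullback–Leibler divergence up to an additive constant, so it is largest when $\pi(\theta) \propto \lvert \mathcal{I}(\theta) \rvert^{1/2}$, which is Jeffreys' prior restricted to the compact set. This is consistent with the statement that the reference prior equals Jeffreys' prior when the whole parameter vector is of interest.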

Multi-groups reference prior revisited

Let $X_1,\ldots,X_n$ be a sample of identically and independently distributed random variables with $X_i \sim F_\theta(\cdot)$ and density $f(x_i|\theta)$, where $\theta$ denotes the parameters of the corresponding distribution. Let the parameters $\theta \in \Omega \subseteq \mathbb{R}^K$ be partitioned into $k$ groups, $\theta = (\theta_1^T, \theta_2^T, \ldots, \theta_k^T)^T$, ordered with respect to their significance, and let $\theta_1$ denote the parameter (vector) of interest. In order to determine a prior in the case $k=1$, Bernardo (1979) suggested the maximization of Shannon's mutual information I(X,π(θ))=E(logπ(θ|

Summary

In the present paper we revisited the definition of the reference prior for several groups of parameters. An alternative expression for the calculation of the Berger and Bernardo (1992a, 1992c) reference prior was derived. The expression results from the sequential optimization of Shannon's mutual information and adds to the interpretation of the reference prior. Finally, conditions are discussed under which an improper reference prior is independent of the sequence of

Acknowledgments

The authors are grateful to the referees and the editor for their suggestions, which have improved the presentation in the paper.
