On families of beta- and generalized gamma-generated distributions and associated inference

https://doi.org/10.1016/j.stamet.2008.12.003

Abstract

A general family of univariate distributions generated by beta random variables, proposed by Jones, has been discussed recently in the literature. This family of distributions possesses great flexibility while fitting symmetric as well as skewed models with varying tail weights. In a similar vein, we define here a family of univariate distributions generated by Stacy’s generalized gamma variables. For these two families of univariate distributions, we discuss maximum entropy characterizations under suitable constraints. Based on these characterizations, an expected ratio of quantile densities is proposed for the discrimination of members of these two broad families of distributions. Several special cases of these results are then highlighted. An alternative to the usual method of moments is also proposed for the estimation of the parameters, and the form of these estimators is particularly amenable to these two families of distributions.

Introduction

Recently, attempts have been made to define new families of probability distributions that extend well-known families of distributions and at the same time provide great flexibility in modelling data in practice. One such example is a broad family of univariate distributions generated from the beta distribution, proposed by Jones [1] (see also [2]), which extends the original beta family of distributions with the incorporation of two additional parameters. These parameters control the skewness and the tail weight. Earlier, with a similar goal in mind, Eugene et al. [3] defined the family of beta-normal distributions and discussed its properties.

Following the notation of Jones [1], the class of “beta-generated distributions” is defined as follows. Consider a continuous distribution function $F$ with density function $f$. Then the univariate family of distributions generated by $F$ and the parameters $\alpha, \beta > 0$ has pdf [1]

$$g_F^{(B)}(x;\alpha,\beta) = \frac{1}{B(\alpha,\beta)}\, f(x)\,\{F(x)\}^{\alpha-1}\,\{1-F(x)\}^{\beta-1}, \quad \alpha>0 \ \text{and}\ \beta>0, \tag{1}$$

where $B(\alpha,\beta)=\int_0^1 t^{\alpha-1}(1-t)^{\beta-1}\,dt$ is the complete beta function. Thus, this family of distributions has cdf

$$G_F^{(B)}(x) = I_{F(x)}(\alpha,\beta), \quad \alpha>0 \ \text{and}\ \beta>0, \tag{2}$$

where $I_{F(x)}$ denotes the incomplete beta ratio defined by $I_y(\alpha,\beta) = B_y(\alpha,\beta)/B(\alpha,\beta)$, and $B_y(\alpha,\beta)=\int_0^y t^{\alpha-1}(1-t)^{\beta-1}\,dt$, $0<y<1$, is the incomplete beta function.
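To make the definition concrete, the following is a minimal numerical sketch of the density (1) and distribution function (2), assuming a standard normal parent (the beta-normal case). The function names are illustrative and not taken from the paper, and the incomplete beta ratio is approximated by Simpson's rule, which is adequate only for parameter values of at least 1.

```python
import math
from statistics import NormalDist


def log_beta(a, b):
    """Logarithm of the complete beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def beta_generated_pdf(x, a, b, parent=NormalDist()):
    """Density (1): f(x) F(x)^(a-1) (1 - F(x))^(b-1) / B(a, b)."""
    f = parent.pdf(x)
    F = parent.cdf(x)
    # In the far tails f or F under/overflow in floating point; treat as zero density.
    if f <= 0.0 or F <= 0.0 or F >= 1.0:
        return 0.0
    log_g = (math.log(f) + (a - 1.0) * math.log(F)
             + (b - 1.0) * math.log1p(-F) - log_beta(a, b))
    return math.exp(log_g)


def incomplete_beta_ratio(y, a, b, steps=2000):
    """I_y(a, b) by composite Simpson's rule (assumes a >= 1 and b >= 1)."""
    if y <= 0.0:
        return 0.0
    if y >= 1.0:
        return 1.0
    h = y / steps  # steps must be even for Simpson's rule

    def integrand(t):
        return t ** (a - 1.0) * (1.0 - t) ** (b - 1.0)

    total = integrand(0.0) + integrand(y)
    for k in range(1, steps):
        total += (4.0 if k % 2 else 2.0) * integrand(k * h)
    return total * h / 3.0 / math.exp(log_beta(a, b))


def beta_generated_cdf(x, a, b, parent=NormalDist()):
    """Distribution function (2): the incomplete beta ratio evaluated at F(x)."""
    return incomplete_beta_ratio(parent.cdf(x), a, b)
```

With both parameters equal to 1 the functions reduce to the parent density and distribution function, which gives a quick sanity check. For parameter values below 1 the integrand is singular at an endpoint, so a dedicated incomplete beta routine would be needed instead of this sketch.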

Following the terminology of Arnold in the discussion of Jones’ [1] paper, the distribution $F$ will be referred to as the “parent distribution” in what follows. Based on Jones and Larsen [4], the attractiveness of (1) is that from a symmetric $f$ as parent pdf (corresponding to $\alpha=\beta=1$), a large family of distributions can be generated with the parameters $\alpha$ and $\beta$ controlling the skewness and the tail weight. The expression in (2) reveals that it is quite easy to simulate observations from $X \sim G_F^{(B)}$, as shown by Jones [1], through the relationship $X=F^{-1}(B)$, where $B \sim \mathrm{Beta}(\alpha,\beta)$. The case $\alpha=\beta=1$ corresponds to the well-known quantile function representation $X=F^{-1}(U)$, where $U \sim U(0,1)$, which is used to generate data from the distribution $F$. Finally, when $\alpha$ and $\beta$ are positive integers, the beta-generated model in (1) is the distribution of the $i$th order statistic in a random sample of size $n$ from the distribution $F$, where $i=\alpha$ and $n=\alpha+\beta-1$.
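The simulation relationship described above can be sketched in a few lines. This is only an illustration under stated assumptions: the helper names, the seed, the sample size, and the two parent distributions (standard normal and unit exponential) are choices made here, not taken from the paper.

```python
import math
import random
from statistics import NormalDist, fmean


def sample_beta_generated(n, a, b, quantile, rng):
    """Draw n values of X = F^{-1}(B), where B ~ Beta(a, b)."""
    draws = []
    while len(draws) < n:
        u = rng.betavariate(a, b)
        # Skip the (rare) boundary values, where quantile functions are undefined.
        if 0.0 < u < 1.0:
            draws.append(quantile(u))
    return draws


def exponential_quantile(u):
    """Quantile function of the unit exponential: F^{-1}(u) = -ln(1 - u)."""
    return -math.log1p(-u)


rng = random.Random(2024)  # fixed seed so the illustration is reproducible

# Beta-normal sample with parameters (2, 5): skewed toward negative values.
beta_normal = sample_beta_generated(20000, 2.0, 5.0, NormalDist().inv_cdf, rng)

# Parameters (2, 2) with an exponential parent: by the order-statistic
# interpretation this is the 2nd order statistic in a sample of size 3.
order_stat = sample_beta_generated(20000, 2.0, 2.0, exponential_quantile, rng)

print(fmean(beta_normal), fmean(order_stat))
```

For the second sample, the mean of the second order statistic among three unit exponentials is 1/3 + 1/2 = 5/6, so the printed sample mean should be close to that value, which is one way to check the order-statistic interpretation empirically.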

The idea of the beta-generated family of distributions $g_F$ stemmed from the paper of Eugene et al. [3], wherein the beta-normal distribution was introduced and its properties were studied. Specifically, if $\phi$ denotes the density of the normal distribution and $\Phi$ the corresponding distribution function, then $g_\Phi$ is the beta-normal distribution considered by Eugene et al. [3]. Some other beta-generated families of distributions have also been discussed in the literature. For example, the beta-exponential distribution has been defined and studied by Nadarajah and Kotz [5]. Similarly, the beta-logistic distribution can also be generated through the beta variable, but it has been known as a tractable set of statistical models based on the logarithm of an F-variate; see [6]. The beta-logistic distribution has been reviewed by Jones [1], who has also discussed the skew-t distributions. All these members of the beta-generated family in (1), and some others, have been reobtained by means of the maximum entropy principle by Zografos [7], who has also considered the beta-Weibull distribution through this principle. Jones’ family of beta-generated distributions in (1) has received great attention recently. Arnold et al. [8] introduced and studied a multivariate version of this family, while Ferreira and Steel [9], [10] used it as a skewing mechanism for constructing skewed distributions.

The aim of this paper is two-fold. In the first part, we will concentrate on the family in (1). Our main concern will then be to construct procedures to discriminate between members of this family. In other words, our aim will be to derive a test which would enable us to decide whether a random sample from (1) comes from a specific parent distribution F. The proposed procedures will be based on information theoretic methods and, in particular, on the maximum entropy principle. In the second part, we will define a broad family of univariate distributions, in the same vein as Jones’ family, through Stacy’s generalized gamma density generated by the parent distribution F.

In Section 2, we present suitable constraints leading to the maximum entropy characterization of the family in (1). Section 3 is devoted to the definition of a new family of distributions through Stacy’s generalized gamma density generated by the parent distribution F, and the derivation of constraints leading to its maximum entropy characterization. In Section 4, an alternative to the method of moments is discussed for the estimation of the parameters of beta- and generalized gamma-generated distributions with a parent distribution F. The constraints needed to obtain the maximum entropy characterization of these families of distributions enable us to introduce in Section 5 test statistics for the discrimination between members within these two families. Several univariate distributions, generated by beta and gamma models, will be presented in illustrative examples and their moments and Shannon entropies will be derived in a closed form.

Section snippets

Jones’ distribution and maximum entropy identification

The notion of entropy is of fundamental importance in different areas such as physics, probability and statistics, communication theory, and economics. Since Shannon’s [11] pioneering work on the mathematical theory of communication, the Shannon entropy of a continuous distribution with density, say $g_F$, defined by $H_{Sh}(g_F) = -\int g_F(x)\,\ln g_F(x)\,dx$, has become a major tool in information theory and in almost every branch of science and engineering. Closely related to the Shannon entropy is the maximum
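As a numerical illustration of the Shannon entropy definition above, the sketch below approximates the entropy integral by the trapezoidal rule for a beta-generated density with a standard normal parent. The integration range, step count, and function names are assumptions made for this example; it is not the closed-form derivation given in the paper.

```python
import math
from statistics import NormalDist


def beta_normal_pdf(x, a, b):
    """Beta-generated density with a standard normal parent."""
    parent = NormalDist()
    f = parent.pdf(x)
    F = parent.cdf(x)
    # Treat floating-point under/overflow in the far tails as zero density.
    if f <= 0.0 or not 0.0 < F < 1.0:
        return 0.0
    log_beta = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return math.exp(math.log(f) + (a - 1.0) * math.log(F)
                    + (b - 1.0) * math.log1p(-F) - log_beta)


def shannon_entropy(pdf, lower=-12.0, upper=12.0, steps=24000):
    """Trapezoidal approximation of -integral of g(x) ln g(x) over [lower, upper]."""
    h = (upper - lower) / steps
    total = 0.0
    for k in range(steps + 1):
        g = pdf(lower + k * h)
        if g > 0.0:  # g ln g tends to 0 as g tends to 0, so zero density adds nothing
            weight = 0.5 if k in (0, steps) else 1.0
            total -= weight * g * math.log(g)
    return total * h


# Both parameters equal to 1 recover the parent, whose entropy is 0.5 * ln(2 * pi * e).
normal_entropy = shannon_entropy(lambda x: beta_normal_pdf(x, 1.0, 1.0))
# Parameters (2, 2) give a more concentrated density, hence a smaller entropy.
peaked_entropy = shannon_entropy(lambda x: beta_normal_pdf(x, 2.0, 2.0))
print(normal_entropy, peaked_entropy)
```

The first value can be compared with the known entropy of the standard normal distribution, which serves as a check on the quadrature before it is applied to other parameter values.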

Generalized gamma-generated by parent distribution F

A family of univariate continuous distributions will be introduced in this section through a particular case of Stacy’s generalized gamma distribution, in the same spirit as Jones’ family defined through the beta distribution. Some of the properties of this family will be explored and a maximum entropy characterization will be obtained which will be exploited later in Section 5 in order to construct a statistical test for the discrimination between members of this family.

Consider a continuous

An alternative method of moments

An alternative method of moments will be developed in this section for the estimation of the parameters of the beta and gamma families of skewed distributions which are generated by a parametric parent distribution $F$. For this purpose, let us suppose that the parent distribution function $F$ of the skewed models (1) and (11) involves a $p$-dimensional real parameter $\theta$. We will denote by $F_\theta$ the parent distribution and by $g_{F_\theta}^{(B)}$ and $g_{F_\theta}^{(G)}$ the respective families (1) and (11). In order to estimate the

Discriminating between members of beta and gamma distributions generated by the distribution F

The problem of testing whether some given observations can be considered as coming from one of two probability distributions is an old problem in statistics; see [23] and the references contained therein. In this framework, consider a random sample $X_1, X_2, \ldots, X_n$ of size $n$ from the family of Jones’ distributions with density (1). Our interest is to identify the specific model of (1) that is most appropriate to describe the data $X_1, X_2, \ldots, X_n$. We therefore need a way to discriminate between the models

Acknowledgement

Part of this work was done while the first author was visiting the Department of Mathematics and Statistics, McMaster University, Hamilton, Ontario, Canada, in the summer of 2007.

References (23)

  • J.T.A.S. Ferreira et al., A constructive representation of univariate skewed distributions, J. Amer. Statist. Assoc. (2006)