Testing on the common mean of several normal distributions

https://doi.org/10.1016/j.csda.2008.07.024

Abstract

Point estimation of the common mean of several normal distributions with unknown and possibly unequal variances has attracted the attention of many researchers over the last five decades. Relatively less attention has been paid to the hypothesis testing problem, presumably due to the complicated sampling distribution(s) of the test statistic(s) involved. Taking advantage of the computational resources available nowadays, there has been a renewed interest in this problem, and a few test procedures have been proposed lately, including those based on the generalized p-value approach. In this paper we propose three new tests based on the famous Graybill–Deal estimator (GDE) as well as the maximum likelihood estimator (MLE) of the common mean, and these test procedures appear to work as well as (if not better than) the existing test methods. The two tests based on the GDE use, respectively, a first order unbiased variance estimate proposed by Sinha [Sinha, B.K., 1985. Unbiased estimation of the variance of the Graybill–Deal estimator of the common mean of several normal populations. The Canadian Journal of Statistics 13 (3), 243–247], and the little known exact unbiased variance estimator proposed by Nikulin and Voinov [Nikulin, M.S., Voinov, V.G., 1995. On the problem of the means of weighted normal populations. Qüestiió (Quaderns d’Estadistica, Sistemes, Informatica i Investigació Operativa) 19 (1–3), 93–106] (after we have fixed a small mistake in the final expression). On the other hand, the test based on the MLE, which does not have a closed expression, uses a parametric bootstrap method proposed by Pal, Lim and Ling [Pal, N., Lim, W.K., Ling, C.H., 2007b. A computational approach to statistical inferences. Journal of Applied Probability & Statistics 2 (1), 13–35]. The extensive simulation results presented in this paper complement the recent studies undertaken by Krishnamoorthy and Lu [Krishnamoorthy, K., Lu, Y., 2003. Inferences on the common mean of several normal populations based on the generalized variable method. Biometrics 59, 237–247], and Lin and Lee [Lin, S.H., Lee, J.C., 2005. Generalized inferences on the common mean of several normal populations. Journal of Statistical Planning and Inference 134, 568–582].

Introduction

One of the oldest and most interesting problems in statistical sciences is inference on a common mean of several normal distributions with unknown and possibly unequal variances. In the literature, this is also known as meta-analysis, where data from multiple sources are combined or integrated with a common objective. This problem arises in situations where different instruments or methods are used repeatedly to measure, say, blood alcohol level, or lead content in gasoline. One wishes to use multiple datasets, assuming normality, for an improved inference on the common mean, rather than relying on individual samples. For an application of meta-analysis in clinical trials, see Vazquez et al. (2007).

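To formulate the present problem, assume that we have iid observations $X_{i1},\ldots,X_{in_i}$ from $N(\mu,\sigma_i^2)$, $1\le i\le k$, where all parameters are assumed to be unknown. Define $\bar{X}_i$ and $S_i$ as

$$\bar{X}_i=\sum_{j=1}^{n_i}X_{ij}/n_i,\qquad S_i=\sum_{j=1}^{n_i}(X_{ij}-\bar{X}_i)^2, \tag{1.1}$$

where $\bar{X}_i\sim N(\mu,\sigma_i^2/n_i)$, $S_i\sim\sigma_i^2\chi^2_{(n_i-1)}$ $(1\le i\le k)$, and these statistics are all mutually independent. Throughout this paper it is assumed that $n_i\ge 2$ $(1\le i\le k)$ unless mentioned otherwise.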

Inference on the common mean $\mu$ has its genesis in a balanced incomplete block design (BIBD) with uncorrelated random block effects. For the $l$th treatment effect (say, $\tau$) one has two point estimates, namely the intra-block estimate and the inter-block estimate (say, $\hat{\tau}$ and $\hat{\tau}^{*}$ respectively). Under the usual design assumptions, $\hat{\tau}$ and $\hat{\tau}^{*}$ are independent and normally distributed with common mean $\tau$, but with unknown and possibly unequal variances. Hence we have a common mean problem with $k=2$ (see also Montgomery (1997), pp. 216–218). Apart from estimating the $l$th treatment effect, one may wish to verify whether it is 0 or not, which leads to a hypothesis testing problem. Zacks (1966, 1970), who contributed significantly to the theoretical foundation of the above mentioned common mean problem, described how he got interested in this problem through an application in soil science (see Kempthorne et al. (1991)).

Point estimation of $\mu$ has drawn the attention of many researchers over the past five decades. When the $\sigma_i$’s are all known, the optimal estimator (MLE, BLUE as well as UMVUE) is

$$\hat{\mu}=\sum_{i=1}^{k}(n_i/\sigma_i^2)\bar{X}_i\Big/\sum_{i=1}^{k}(n_i/\sigma_i^2).$$

But when the $\sigma_i$’s are all unknown, one faces the real challenge of proposing an efficient estimator of $\mu$ that combines the individual sample means $\bar{X}_i$.

Note that the minimal sufficient statistic $(\bar{X}_i,S_i,\ i=1,2,\ldots,k)$ is not complete. As a result, one cannot get the UMVUE (if it exists) by applying the standard Rao–Blackwell theorem to an unbiased estimator of $\mu$.

In our present problem, where the $\sigma_i$’s are all unknown, the most appealing unbiased estimator of $\mu$ has been the Graybill–Deal estimator (GDE) given as

$$\hat{\mu}_{GDE}=\sum_{i=1}^{k}(n_i/s_i^2)\bar{X}_i\Big/\sum_{i=1}^{k}(n_i/s_i^2), \tag{1.3}$$

where $s_i^2=S_i/(n_i-1)$, $1\le i\le k$. Graybill and Deal (1959) obtained conditions on the $n_i$’s, for $k=2$ only, under which $\hat{\mu}_{GDE}$ has a smaller variance than each $\bar{X}_i$ $(i=1,2)$.
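As a minimal sketch of the estimator above, the following Python function computes the Graybill–Deal estimate from raw samples using only the standard library. The function name `graybill_deal` and the toy data are illustrative assumptions, not taken from the paper; each sample is assumed to have at least two observations and a nonzero sample variance.

```python
from statistics import fmean, variance


def graybill_deal(samples):
    """Graybill-Deal estimate of a common mean (illustrative helper).

    samples: list of k lists of observations, each of length n_i >= 2.
    """
    num = 0.0
    den = 0.0
    for x in samples:
        n_i = len(x)
        xbar_i = fmean(x)
        s2_i = variance(x)      # s_i^2 = S_i / (n_i - 1)
        w_i = n_i / s2_i        # estimated precision of the i-th sample mean
        num += w_i * xbar_i
        den += w_i
    return num / den


# Toy data (hypothetical): two instruments measuring the same quantity.
samples = [[9.8, 10.1, 10.4, 9.9], [10.6, 9.2, 11.0, 9.5, 10.3]]
mu_gde = graybill_deal(samples)
```

Because the weights are positive, the estimate always lies between the smallest and largest sample means; the less variable sample receives the larger weight.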

It is important to remember that, for estimating $\mu$, the above $\hat{\mu}_{GDE}$ is not the MLE. The MLE, which does not have any closed expression, is

$$\hat{\mu}_{MLE}=\sum_{i=1}^{k}\big(n_i/\hat{\sigma}^2_{i(MLE)}\big)\bar{X}_i\Big/\sum_{i=1}^{k}\big(n_i/\hat{\sigma}^2_{i(MLE)}\big),$$

where $\hat{\sigma}^2_{i(MLE)}$ is the MLE of $\sigma_i^2$ $(1\le i\le k)$ obtained by solving the following system of equations:

$$\hat{\sigma}^2_{i(MLE)}=(S_i/n_i)+\left[\Big\{\sum_{j=1}^{k}n_j(\bar{X}_i-\bar{X}_j)/\hat{\sigma}^2_{j(MLE)}\Big\}\Big/\Big\{\sum_{j=1}^{k}n_j/\hat{\sigma}^2_{j(MLE)}\Big\}\right]^2,\qquad i=1,2,\ldots,k.$$

It is because of this complicated expression that the MLE has dampened the interest of many researchers. In a recent paper Pal et al. (2007a) compared the MLE with the GDE for $k=2$, and observed that the MLE can be advantageous in a heavily unbalanced case. For a comprehensive review of the point estimation of $\mu$ up to 1990 one can refer to Kubokawa (1991).
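The system of equations above can be solved numerically. The sketch below uses a simple alternating (fixed-point) iteration, which is one possible scheme and not necessarily the one used by the authors: the bracketed term in each equation equals $\bar{X}_i-\hat{\mu}_{MLE}$, so one can alternate between updating the weighted mean and updating the variances. The function name, starting values, tolerance and toy data are all assumptions made for illustration.

```python
from statistics import fmean


def common_mean_mle(samples, tol=1e-12, max_iter=10_000):
    """Alternating iteration for the MLE of a common mean (illustrative)."""
    n = [len(x) for x in samples]
    xbar = [fmean(x) for x in samples]
    S = [sum((v - m) ** 2 for v in x) for x, m in zip(samples, xbar)]

    # Assumed starting point: the within-sample variance MLEs S_i / n_i.
    sigma2 = [S_i / n_i for S_i, n_i in zip(S, n)]
    mu = None
    for _ in range(max_iter):
        w = [n_i / s2 for n_i, s2 in zip(n, sigma2)]
        mu_new = sum(w_i * m for w_i, m in zip(w, xbar)) / sum(w)
        # Variance update: S_i / n_i + (xbar_i - mu)^2, i.e. the system above.
        sigma2 = [S_i / n_i + (m - mu_new) ** 2
                  for S_i, n_i, m in zip(S, n, xbar)]
        converged = mu is not None and abs(mu_new - mu) < tol
        mu = mu_new
        if converged:
            break
    return mu, sigma2


# Toy data (hypothetical).
samples = [[9.8, 10.1, 10.4, 9.9], [10.6, 9.2, 11.0, 9.5, 10.3]]
mu_mle, sigma2_mle = common_mean_mle(samples)
```

Each pass maximizes the likelihood over one block of parameters with the other held fixed, so the likelihood cannot decrease; the iteration may still stop at a local solution when the likelihood equations have more than one root.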

The main objective of this paper is to consider testing

$$H_0:\mu=\mu_0\quad\text{vs.}\quad H_a:\mu\neq\mu_0 \tag{1.6}$$

based on the statistics in (1.1). For a better understanding, we may occasionally focus on the special case of $k=2$. This special case is easy to understand in the context of the BIBD mentioned earlier, but can be extended suitably for general $k$. Cohen and Sackrowitz (1977) showed that each individual t-test based only on one sample is admissible (it has the highest power at some boundary region of the parameter space). This is somewhat surprising in the light of inadmissibility results obtained in point estimation where, as expected, the combined data is supposed to help in inferences pertaining to the common mean. For the special case $k=2$ and $n_1=n_2=n$, Cohen and Sackrowitz (1984) considered several tests, including a normal approximate test based on $\hat{\mu}_{GDE}$ in (1.3). Their study of power comparison showed that there is no outright winner among the tests considered.

It is worth mentioning that the variance of $\hat{\mu}_{GDE}$, which is

$$\mathrm{Var}(\hat{\mu}_{GDE})=E\left[\sum_{i=1}^{k}\frac{n_i\sigma_i^2}{s_i^4}\Big/\Big(\sum_{i=1}^{k}\frac{n_i}{s_i^2}\Big)^2\right],$$

does not have a closed expression. As a result, it is not easy to get an estimate of the variance of $\hat{\mu}_{GDE}$. Sinha (1985), using a special case of Haff’s (1979) Wishart identity, provided a way to obtain an approximate unbiased estimate of $\mathrm{Var}(\hat{\mu}_{GDE})$ up to any desired order. The first order unbiased estimate of $\mathrm{Var}(\hat{\mu}_{GDE})$ thus obtained, denoted by $\widehat{\mathrm{Var}}_{(1)}(\hat{\mu}_{GDE})$, is given as

$$\widehat{\mathrm{Var}}_{(1)}(\hat{\mu}_{GDE})=\Big(\sum_{i=1}^{k}\frac{n_i}{s_i^2}\Big)^{-1}\left[1+4\sum_{i=1}^{k}(n_i+1)^{-1}\left\{\frac{n_i/s_i^2}{\sum_{j=1}^{k}n_j/s_j^2}-\frac{(n_i/s_i^2)^2}{\big(\sum_{j=1}^{k}n_j/s_j^2\big)^2}\right\}\right],$$

which is comparable to the approximation due to Meier (1953). The above result is helpful because the studentized version $(\hat{\mu}_{GDE}-\mu)/\{\widehat{\mathrm{Var}}(\hat{\mu}_{GDE})\}^{1/2}$, which follows a $N(0,1)$ distribution asymptotically, can be used for testing as well as for interval estimation of $\mu$. Cohen and Sackrowitz (1984) used Meier’s (1953) approximation along with a suitable critical value in their studentized test statistic.
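To make the studentized statistic concrete, here is a short sketch that evaluates the first order variance estimate in the form written above and refers the studentized GDE to the standard normal distribution. The use of plain $N(0,1)$ critical values is the asymptotic approximation mentioned in the text; the critical values actually used in the paper may differ. The function name and the toy data are assumptions.

```python
from math import sqrt
from statistics import NormalDist, fmean, variance


def gde_studentized_test(samples, mu0):
    """Studentized GDE with a first-order variance estimate (illustrative)."""
    n = [len(x) for x in samples]
    xbar = [fmean(x) for x in samples]
    w = [n_i / variance(x) for n_i, x in zip(n, samples)]   # n_i / s_i^2
    W = sum(w)
    mu_gde = sum(w_i * m for w_i, m in zip(w, xbar)) / W

    # Bracketed first-order correction: 4 * sum (n_i + 1)^-1 * (r_i - r_i^2),
    # where r_i = w_i / W is the relative weight of sample i.
    correction = sum(4.0 / (n_i + 1) * (w_i / W - (w_i / W) ** 2)
                     for n_i, w_i in zip(n, w))
    var_hat = (1.0 + correction) / W

    z = (mu_gde - mu0) / sqrt(var_hat)
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))   # asymptotic N(0,1)
    return mu_gde, var_hat, z, p_value


# Toy data (hypothetical).
samples = [[9.8, 10.1, 10.4, 9.9], [10.6, 9.2, 11.0, 9.5, 10.3]]
mu_gde, var_hat, z, p_value = gde_studentized_test(samples, mu0=10.0)
```

The correction term is nonnegative, so the estimate is never smaller than the naive value $(\sum_i n_i/s_i^2)^{-1}$, which ignores the randomness of the weights.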

In the context of the BIBD mentioned earlier, Cohen and Sackrowitz (1989) proposed a test that combines the individual tests by weighting them with respect to their sample variances. This idea was extended by Zhou and Mathew (1993), who proposed two tests and compared their power functions with that of Fisher’s (1932) test. Though no clear-cut winner could be found, one of the tests proposed by Zhou and Mathew (1993) can work well provided one has some prior knowledge about the ratio of the two variances.

The test methods of Fisher (1932) and Mathew et al. (1993) are worth mentioning here. Let $t_{0i}^2$ be an observed value of $n_i(\bar{X}_i-\mu_0)^2/s_i^2$ $(1\le i\le k)$. Let $P_i=\ln P(t^2_{n_i-1}>t_{0i}^2)$. Fisher’s test is based on the fact that $-2\sum_{i=1}^{k}P_i\sim\chi^2_{2k}$ under $H_0$, and it rejects $H_0$ in (1.6) if $P(\chi^2_{2k}>c)<\alpha$, where $c$ is an observed value of $-2\sum_{i=1}^{k}P_i$. On the other hand, the approximate test due to Mathew et al. (1993) for $k=2$ suggests using

$$T=\big\{n_1B_1+n_2B_2\hat{\theta}^2+2\eta\hat{\theta}\sqrt{n_1n_2B_1B_2}\big\}\big/(n_1+\hat{\theta}n_2)^2,$$

where $B_i=n_i(\bar{X}_i-\mu_0)^2/S_i^{*}$, $S_i^{*}=n_i(\bar{X}_i-\mu_0)^2+S_i$ $(i=1,2)$; and $\hat{\theta}=(n_2/n_1)(S_1^{*}/S_2^{*})^{1/2}$. For an observed value $t$ of $T$ this test rejects $H_0$ in (1.6) whenever $P(Y>t)<\alpha$, where $Y$ is a Beta$(a,b)$ random variable with specified $a=a(n_1,n_2,\hat{\theta})$ and $b=b(n_1,n_2,\hat{\theta})$ provided by the above authors.
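Fisher’s combination test described above can be sketched in a few lines. To stay within the Python standard library, the two-sided t tail probability is obtained here by Simpson integration of the t density, and the chi-square tail uses the closed form available for even degrees of freedom; both helpers, their names, the number of integration steps and the toy data are assumptions made for this illustration (a statistics library would normally supply these distribution functions).

```python
from math import exp, lgamma, log, pi, sqrt
from statistics import fmean, variance


def t_two_sided_pvalue(t_abs, df, steps=4000):
    """P(|T| > t_abs) for T ~ t_df via Simpson's rule on [0, t_abs]."""
    if t_abs == 0.0:
        return 1.0
    log_c = lgamma((df + 1) / 2) - lgamma(df / 2) - 0.5 * log(df * pi)

    def pdf(t):
        return exp(log_c - (df + 1) / 2 * log(1 + t * t / df))

    h = t_abs / steps                     # steps must be even for Simpson
    total = pdf(0.0) + pdf(t_abs)
    for j in range(1, steps):
        total += (4 if j % 2 else 2) * pdf(j * h)
    central = 2.0 * total * h / 3.0       # P(|T| <= t_abs)
    return max(1.0 - central, 1e-300)     # guard against log(0)


def chi2_even_df_sf(x, k):
    """P(chi-square with 2k degrees of freedom > x), closed form."""
    term, acc = 1.0, 1.0
    for j in range(1, k):
        term *= (x / 2) / j
        acc += term
    return exp(-x / 2) * acc


def fisher_common_mean_test(samples, mu0):
    """Fisher's statistic -2 * sum(ln p_i) and its chi-square p-value."""
    stat = 0.0
    for x in samples:
        n_i = len(x)
        t0 = sqrt(n_i) * abs(fmean(x) - mu0) / sqrt(variance(x))
        stat += -2.0 * log(t_two_sided_pvalue(t0, n_i - 1))
    return stat, chi2_even_df_sf(stat, len(samples))


# Toy data (hypothetical).
samples = [[9.8, 10.1, 10.4, 9.9], [10.6, 9.2, 11.0, 9.5, 10.3]]
fisher_stat, fisher_p = fisher_common_mean_test(samples, mu0=10.0)
```

The test rejects $H_0$ at level $\alpha$ when the returned p-value falls below $\alpha$.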

Recently Krishnamoorthy and Lu (2003) proposed a new test based on the generalized p-value approach suggested by Tsui and Weerahandi (1989). Though the generalized p-value approach doesn’t guarantee that the size would attain the nominal level α, the simulation results provided by Krishnamoorthy and Lu (2003) show that the nominal level is attained most of the time.

The generalized p-value approach, as initiated by Tsui and Weerahandi (1989), depends heavily on the selection of a suitable pivotal element which is not necessarily unique. The pivot (or, the ‘generalized variable’) used by Krishnamoorthy and Lu (2003) is

$$T_{KL}=\sum_{i=1}^{k}W_i\Big\{\bar{x}_i-T_i\sqrt{s_i/(n_i(n_i-1))}\Big\},\quad\text{with}\quad W_i=(n_iU_i/s_i)\Big/\Big(\sum_{j=1}^{k}n_jU_j/s_j\Big), \tag{1.9}$$

where the $T_i$’s ($\sim t_{(n_i-1)}$) and $U_i$’s ($\sim\chi^2_{(n_i-1)}$) are independent $(1\le i\le k)$, and $(\bar{x}_i,s_i)$ are the observed values of $(\bar{X}_i,S_i)$ $(1\le i\le k)$. For the observed data (i.e., $(\bar{x}_i,s_i)$ of (1.1)), one can generate $T_{KL}$ a large number of times by replicating $T=(T_i,1\le i\le k)$ and $U=(U_i,1\le i\le k)$ from the known distributions. Let $T_{KL,p}$ be the $(100p)$th percentile point of the above mentioned generated values of $T_{KL}$. Then, an approximate $(1-\alpha)$ level two sided confidence interval for $\mu$ is found as $I_{KL}=(T_{KL,(\alpha/2)},T_{KL,1-(\alpha/2)})$. If $\mu_0\in I_{KL}$, then accept $H_0$; and reject $H_0$ otherwise. Alternatively, find $p_1$ ($p_2$) as the proportion of times $T_{KL}$ is less (more) than $\mu_0$. Then the p-value of the test is: p-value $=2\min\{p_1,p_2\}$.
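The simulation recipe just described can be sketched as follows. The code draws the $T_i$’s and $U_i$’s independently, as stated in the text, builds the replicated values of $T_{KL}$, and reports the percentile interval and the two sided p-value. The function name, the number of replications, the seed and the toy data are assumptions for illustration; note that $s_i$ here denotes the observed sum of squares $S_i$.

```python
import random
from math import sqrt
from statistics import fmean


def kl_generalized_test(samples, mu0, alpha=0.05, reps=5000, seed=0):
    """Monte Carlo version of the generalized-variable test (illustrative)."""
    rng = random.Random(seed)
    n = [len(x) for x in samples]
    xbar = [fmean(x) for x in samples]
    s = [sum((v - m) ** 2 for v in x) for x, m in zip(samples, xbar)]

    draws = []
    for _ in range(reps):
        num = den = 0.0
        for n_i, m, s_i in zip(n, xbar, s):
            df = n_i - 1
            # U_i ~ chi-square(df), generated as Gamma(df/2, scale=2).
            u = rng.gammavariate(df / 2, 2.0)
            # T_i ~ t(df), generated independently of U_i.
            t = rng.gauss(0, 1) / sqrt(rng.gammavariate(df / 2, 2.0) / df)
            weight = n_i * u / s_i          # numerator of W_i
            num += weight * (m - t * sqrt(s_i / (n_i * df)))
            den += weight
        draws.append(num / den)

    draws.sort()
    lower = draws[int(reps * alpha / 2)]
    upper = draws[int(reps * (1 - alpha / 2)) - 1]
    p1 = sum(d < mu0 for d in draws) / reps
    p2 = sum(d > mu0 for d in draws) / reps
    return (lower, upper), 2 * min(p1, p2)


# Toy data (hypothetical).
samples = [[9.8, 10.1, 10.4, 9.9], [10.6, 9.2, 11.0, 9.5, 10.3]]
interval_kl, p_kl = kl_generalized_test(samples, mu0=10.0)
```

The percentile indices are a simple empirical-quantile convention; other conventions change the interval only slightly when the number of replications is large.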

Krishnamoorthy and Lu (2003) presented some simulation results comparing five test procedures discussed earlier (Zhou and Mathew’s (1993) two tests, Fisher’s test, Mathew et al.’s (1993) test, and their test based on the pivot TKL given above) for small sample sizes. Again, the results showed no clear-cut winner.

Lin and Lee (2005) used the same generalized p-value approach of Tsui and Weerahandi (1989), but with a different pivot given as

$$T_{LL}=\left\{\Big(\sum_{i=1}^{k}n_i\bar{x}_iU_i/s_i\Big)-Z\sqrt{\sum_{i=1}^{k}(n_iU_i/s_i)}\right\}\Big/\sum_{i=1}^{k}(n_iU_i/s_i),$$

where $Z\sim N(0,1)$ is independent of the $U_i$’s ($\sim\chi^2_{(n_i-1)}$), as given after (1.9). For the observed data $(\bar{x}_i,s_i,1\le i\le k)$, the confidence interval and the subsequent testing for $\mu$ can be carried out by generating a large number of values of $T_{LL}$, similar to what was described earlier for $T_{KL}$.
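A compact sketch of the replication step for this pivot is given below. It generates values of $T_{LL}$ for fixed observed data and converts them into a two sided p-value by the same $2\min\{p_1,p_2\}$ rule; the helper names, the number of replications, the seed and the toy data are assumptions, and $s_i$ again denotes the observed sum of squares.

```python
import random
from math import sqrt
from statistics import fmean


def ll_pivot_draws(samples, reps=5000, seed=0):
    """Sorted Monte Carlo draws of the pivot T_LL (illustrative)."""
    rng = random.Random(seed)
    stats = []
    for x in samples:
        m = fmean(x)
        stats.append((len(x), m, sum((v - m) ** 2 for v in x)))

    draws = []
    for _ in range(reps):
        num = den = 0.0
        for n_i, m, s_i in stats:
            u = rng.gammavariate((n_i - 1) / 2, 2.0)   # U_i ~ chi-square(n_i - 1)
            a = n_i * u / s_i
            num += a * m
            den += a
        z = rng.gauss(0, 1)                            # Z ~ N(0, 1), independent of U
        draws.append((num - z * sqrt(den)) / den)
    return sorted(draws)


def two_sided_pvalue(draws, mu0):
    """2 * min(p1, p2), with p1 (p2) the share of draws below (above) mu0."""
    p1 = sum(d < mu0 for d in draws) / len(draws)
    p2 = sum(d > mu0 for d in draws) / len(draws)
    return 2 * min(p1, p2)


# Toy data (hypothetical).
samples = [[9.8, 10.1, 10.4, 9.9], [10.6, 9.2, 11.0, 9.5, 10.3]]
draws_ll = ll_pivot_draws(samples)
p_ll = two_sided_pvalue(draws_ll, mu0=10.0)
```

Compared with $T_{KL}$, this pivot needs only one normal variate per replication in addition to the $U_i$’s, since the $k$ separate $t$ variates are replaced by a single $Z$.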

Lin and Lee (2005) compared their generalized p-value test with a few others, including the one suggested by Krishnamoorthy and Lu (2003), in terms of power (without mentioning the exact size of their own test). However, it is not clear how closely the size of Lin and Lee’s (2005) test follows the nominal level. We will point this out in our comprehensive simulation study.

In this paper we propose three simple tests which, in our opinion, are natural contenders to those mentioned above. We also include the asymptotic likelihood ratio test (LRT) as a benchmark. Thus, the tests considered here are:

(a) The likelihood ratio test (LRT), which works fairly well for sample sizes greater than or equal to 25;

(b) A studentized version of the GDE using the first order unbiased variance estimator proposed by Sinha (1985) (or Meier (1953), since there is hardly much difference between these two first order expressions);

(c) A studentized version of the GDE using the exact unbiased variance estimator proposed by Nikulin and Voinov (1995) (after correcting a mistake in their final expression); and

(d) Using the MLE as a test statistic with the help of a parametric bootstrap method.
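For item (a), the asymptotic LRT can be sketched directly from the normal likelihood. For a fixed mean $\mu$, the variance MLEs are $S_i/n_i+(\bar{X}_i-\mu)^2$, so the statistic is $\sum_i n_i\ln\{\hat{\sigma}_i^2(\mu_0)/\hat{\sigma}_i^2(\hat{\mu}_{MLE})\}$, referred to a $\chi^2_1$ distribution. This is a textbook derivation offered as an illustration and is not taken from the paper's Section 2; the function name, the alternating iteration, its starting value and the toy data are assumptions.

```python
from math import erfc, log, sqrt
from statistics import fmean


def lrt_common_mean(samples, mu0, tol=1e-12, max_iter=10_000):
    """Asymptotic likelihood ratio test of H0: mu = mu0 (illustrative)."""
    n = [len(x) for x in samples]
    xbar = [fmean(x) for x in samples]
    S = [sum((v - m) ** 2 for v in x) for x, m in zip(samples, xbar)]

    def profile_var(mu):
        # MLE of each sigma_i^2 when the common mean is held fixed at mu.
        return [S_i / n_i + (m - mu) ** 2 for S_i, n_i, m in zip(S, n, xbar)]

    # Unrestricted MLE by alternating updates, started at the pooled mean
    # (an assumed starting value).
    mu = sum(n_i * m for n_i, m in zip(n, xbar)) / sum(n)
    for _ in range(max_iter):
        w = [n_i / v for n_i, v in zip(n, profile_var(mu))]
        mu_new = sum(w_i * m for w_i, m in zip(w, xbar)) / sum(w)
        converged = abs(mu_new - mu) < tol
        mu = mu_new
        if converged:
            break

    stat = sum(n_i * log(v0 / v1)
               for n_i, v0, v1 in zip(n, profile_var(mu0), profile_var(mu)))
    p_value = erfc(sqrt(max(stat, 0.0) / 2))   # P(chi-square_1 > stat)
    return mu, stat, p_value


# Toy data (hypothetical).
samples = [[9.8, 10.1, 10.4, 9.9], [10.6, 9.2, 11.0, 9.5, 10.3]]
mu_hat, lrt_stat, lrt_p = lrt_common_mean(samples, mu0=10.0)
```

With samples as small as in this toy example the $\chi^2_1$ reference distribution is only a rough guide, in line with the remark in (a) about sample sizes of at least 25.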

The rest of the paper is organized as follows. In Section 2 we discuss the above three test methods in detail, especially the newly developed method using the MLE. In Section 3 we carry out a comprehensive simulation study to compare these four tests with the others discussed earlier. It is seen that the above three tests, the two based on the GDE and the other based on the MLE, exhibit good size and power behavior.

The new test procedures

In this section we describe the three test procedures mentioned at the end of Section 1.

Power analysis

In this section we provide some of the numerical results from our comprehensive simulation study to compare the powers of the tests discussed in the earlier section.

Note that the tests discussed above are all invariant under the group $G=\{(\bar{X}_i,S_i)\to(a\bar{X}_i+b,\,a^2S_i),\ 1\le i\le k \mid -\infty<a<\infty,\ b>0\}$ of transformations. As a result, the power functions depend on the parameters through $\delta=(\mu-\mu_0)$, $\lambda_i=(\sigma_i/\sigma_1)$, $2\le i\le k$, and $\sigma_1$; apart from the sample sizes. Any power comparison should take this into account, and makes any

Acknowledgements

We sincerely thank Professor Thomas Mathew, University of Maryland Baltimore County, for many helpful discussions and clarifications. We also thank the three anonymous referees and the Co-Editor for their time and patience to review our paper. The referees’ suggestions helped us in improving the presentation of this work.

References (28)

  • F.A. Graybill et al., Combining unbiased estimators, Biometrics (1959)
  • G.W. Hill, Student’s t-distribution, Communications of the ACM (1970)
  • J.S.U. Hjorth, Computer Intensive Statistical Methods: Validation, Model Selection and Bootstrap (1994)
  • P. Kempthorne et al., Research - how to do it: A panel discussion, Statistical Science (1991)