doi:10.1016/j.csda.2005.11.011
Published by Elsevier B.V.
Interval estimation in a finite mixture model: Modeling P-values in multiple testing applications
Qinfang Xiang (a), Jode Edwards (b) and Gary L. Gadbury (a)
(a) Department of Mathematics and Statistics, University of Missouri – Rolla, Rolla, MO 65409, USA
(b) USDA ARS, Department of Agronomy, Iowa State University, Ames, IA, USA
Received 20 February 2005;
revised 6 September 2005;
accepted 16 November 2005.
Available online 9 December 2005.
Abstract
The performance of interval estimates in a uniform-beta mixture model is evaluated using three computational strategies. Such a model has been used to describe the distribution of P-values arising in multiple testing applications. The number of P-values and the closeness of a parameter to the boundary of its space both affect the precision of parameter estimates, as does the “nearness” of the beta component to the uniform distribution. The three strategies are compared for computing interval estimates, and each has advantages and disadvantages for the cases considered here.
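A minimal sketch of the model, assuming the parameterization suggested by θ=(λ,r,s) in Table 3: λ is the weight on the uniform (null) component and r, s are the shape parameters of the beta component, so the density is f(p) = λ + (1 − λ) p^(r−1) (1 − p)^(s−1) / B(r, s). The abstract does not write the density out, so this form and the hypothetical helper names below (beta_pdf, mixture_pdf, log_likelihood, simulate_pvalues) are assumptions.

```python
import math
import random


def beta_pdf(p, r, s):
    """Beta(r, s) density at p in (0, 1), evaluated on the log scale for stability."""
    log_beta_fn = math.lgamma(r) + math.lgamma(s) - math.lgamma(r + s)
    return math.exp((r - 1) * math.log(p) + (s - 1) * math.log1p(-p) - log_beta_fn)


def mixture_pdf(p, lam, r, s):
    """Uniform-beta mixture: weight lam on U(0, 1), weight 1 - lam on Beta(r, s)."""
    return lam + (1.0 - lam) * beta_pdf(p, r, s)


def log_likelihood(pvals, lam, r, s):
    """Log-likelihood of theta = (lam, r, s) for a list of P-values."""
    return sum(math.log(mixture_pdf(p, lam, r, s)) for p in pvals)


def simulate_pvalues(k, lam, r, s, seed=0):
    """Draw k P-values: uniform with probability lam, otherwise Beta(r, s)."""
    rng = random.Random(seed)
    pvals = []
    for _ in range(k):
        p = rng.random() if rng.random() < lam else rng.betavariate(r, s)
        pvals.append(min(max(p, 1e-12), 1.0 - 1e-12))  # keep p strictly inside (0, 1)
    return pvals


# Hypothetical "strong signal" case: about half the P-values come from Beta(0.6, 8).
pvals = simulate_pvalues(2000, 0.5, 0.6, 8.0)
print(log_likelihood(pvals, 0.5, 0.6, 8.0) > log_likelihood(pvals, 1.0, 0.6, 8.0))
```

Setting λ=1 recovers the pure uniform model, whose log-likelihood is exactly zero, so the comparison in the last line is a quick check that the simulated data carry a beta-component signal.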
Keywords: Bootstrap; Gene expression; Hessian; Interval estimation; MCMC; Microarray; MLE; Uniform beta mixture
Fig. 1. Interval estimation performance: percentage coverage and average length of 95% interval estimates for λ among 1000 simulations when k=10,000. Coverage and average length are plotted against the true value of λ and against the true value of s. Points are offset slightly around the true values so that the three computational methods remain distinguishable. The Hessian method is shown with circles, the bootstrap with triangles, and the MCMC method with inverted triangles. The dashed line in the top row marks the nominal coverage of 95%.
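The Hessian method presumably forms Wald-type intervals, estimate ± 1.96 × standard error, with the standard error taken from the inverse of the observed information (the negative Hessian of the log-likelihood at the MLE). The sketch below simplifies to one parameter: it assumes r and s are known and estimates only λ, whereas the paper works with θ=(λ,r,s), which needs the full 3×3 Hessian. With r and s fixed the log-likelihood is concave in λ, so its score is decreasing and bisection finds the MLE. The function name hessian_interval_lambda is hypothetical.

```python
import math
import random


def beta_pdf(p, r, s):
    """Beta(r, s) density at p in (0, 1)."""
    log_beta_fn = math.lgamma(r) + math.lgamma(s) - math.lgamma(r + s)
    return math.exp((r - 1) * math.log(p) + (s - 1) * math.log1p(-p) - log_beta_fn)


def hessian_interval_lambda(pvals, r, s, z=1.959964):
    """MLE of lam with (r, s) held fixed, plus a Wald interval from the observed information.

    z is the 97.5% standard normal quantile (95% two-sided interval).
    """
    g = [beta_pdf(p, r, s) for p in pvals]  # beta-component density at each P-value

    def score(lam):
        # d/dlam of sum log(lam + (1 - lam) g_i) = sum (1 - g_i) / f_i
        return sum((1 - gi) / (lam + (1 - lam) * gi) for gi in g)

    lo, hi = 1e-9, 1 - 1e-9
    for _ in range(60):  # bisection on the decreasing score
        mid = 0.5 * (lo + hi)
        if score(mid) > 0:
            lo = mid
        else:
            hi = mid
    lam_hat = 0.5 * (lo + hi)

    # Observed information: -l''(lam_hat) = sum ((1 - g_i) / f_i)^2
    info = sum(((1 - gi) / (lam_hat + (1 - lam_hat) * gi)) ** 2 for gi in g)
    se = 1.0 / math.sqrt(info)
    # Truncating to [0, 1] is one ad hoc way to respect the parameter space.
    return lam_hat, max(0.0, lam_hat - z * se), min(1.0, lam_hat + z * se)


# Illustration on simulated data with theta = (0.5, 0.6, 8) and k = 10,000.
rng = random.Random(1)
pvals = [rng.random() if rng.random() < 0.5 else rng.betavariate(0.6, 8.0)
         for _ in range(10_000)]
lam_hat, lower, upper = hessian_interval_lambda(pvals, 0.6, 8.0)
print(f"lambda_hat = {lam_hat:.3f}, 95% interval = ({lower:.3f}, {upper:.3f})")
```

When λ lies near 0 or 1, the symmetric Wald interval can extend past the boundary, which is one reason the closeness of a parameter to the boundary of its space matters for interval performance.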
Fig. 2. Same as for Fig. 1 except k=1000.
Fig. 3. Same as for Fig. 1 except that the performance of intervals for TP, evaluated at the threshold τ=0.001, is plotted against the true values of λ and s (k=10,000, as in Fig. 1).
Fig. 4. Same as for Fig. 3 except k=1000.
Fig. 5. Three-dimensional plots obtained from MCMC output for two cases: (a) the case θ=(0.5,0.6,8) and k=10,000, (b) the case θ=(0.9,0.8,2) and k=500. The vertical axis, z, is the log-likelihood plotted against λ and s.
Fig. 6. Fitted mixture model (solid curve) for the 12,625 P-values from example data set 1. The dashed line is the uniform density function.
Fig. 7. Column 1 shows the simulated MCMC posterior distributions (15,000 values) of the parameters, and column 2 shows the distributions of 100 “bootstrapped” MLEs for data set 2. All plots are relative frequency histograms.
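Column 2 of Fig. 7 summarizes 100 bootstrapped MLEs. A minimal sketch of a nonparametric bootstrap with a percentile interval, assuming P-values are resampled with replacement and, for brevity, that only λ is re-estimated with r and s held fixed (the paper re-estimates the full θ). The function names and the percentile rule are illustrative choices, not necessarily the authors'.

```python
import math
import random


def beta_pdf(p, r, s):
    """Beta(r, s) density at p in (0, 1)."""
    log_beta_fn = math.lgamma(r) + math.lgamma(s) - math.lgamma(r + s)
    return math.exp((r - 1) * math.log(p) + (s - 1) * math.log1p(-p) - log_beta_fn)


def mle_lambda(g, iters=40):
    """MLE of lam from beta-density values g_i, by bisection on the decreasing score."""
    lo, hi = 1e-9, 1 - 1e-9
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if sum((1 - gi) / (mid + (1 - mid) * gi) for gi in g) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def bootstrap_interval_lambda(pvals, r, s, n_boot=100, alpha=0.05, seed=0):
    """Point estimate, percentile interval, and the sorted bootstrapped MLEs of lam."""
    rng = random.Random(seed)
    g = [beta_pdf(p, r, s) for p in pvals]  # resampling g_i is equivalent to resampling p_i
    lam_hat = mle_lambda(g)
    boot = sorted(mle_lambda(rng.choices(g, k=len(g))) for _ in range(n_boot))
    cut = int(alpha / 2 * n_boot)  # number of estimates trimmed from each tail
    return lam_hat, boot[cut], boot[n_boot - 1 - cut], boot


# Illustration: k = 1000 simulated P-values with theta = (0.5, 0.6, 8).
rng = random.Random(3)
pvals = [rng.random() if rng.random() < 0.5 else rng.betavariate(0.6, 8.0)
         for _ in range(1000)]
lam_hat, lower, upper, boot = bootstrap_interval_lambda(pvals, 0.6, 8.0)
print(f"lambda_hat = {lam_hat:.3f}, bootstrap 95% interval = ({lower:.3f}, {upper:.3f})")
```

A histogram of `boot` corresponds to one panel of column 2; with only 100 replicates the percentile endpoints rest on just a few estimates in each tail.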
Table 1.
Nine selected cases showing percent coverage (average length) and the number of misses at the lower (low) and upper (up) bounds of 95% interval estimates from 1000 simulations for the three methods
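A minimal sketch of how summaries of this kind could be computed from simulated intervals. It assumes a “miss at the lower bound” means the true value falls below the interval's lower limit and a “miss at the upper bound” means it falls above the upper limit; the caption does not spell this out. The normal-mean illustration is a hypothetical stand-in for the mixture-model intervals, chosen because its nominal 95% coverage is exact.

```python
import random
from statistics import NormalDist


def summarize_intervals(intervals, true_value):
    """Percent coverage, average length, and misses at the lower/upper bounds."""
    n = len(intervals)
    covered = sum(lo <= true_value <= up for lo, up in intervals)
    miss_low = sum(true_value < lo for lo, up in intervals)  # interval lies above the truth
    miss_up = sum(true_value > up for lo, up in intervals)   # interval lies below the truth
    avg_len = sum(up - lo for lo, up in intervals) / n
    return 100.0 * covered / n, avg_len, miss_low, miss_up


# Hypothetical check: 1000 known-sigma z-intervals for a normal mean (nominal 95%).
rng = random.Random(0)
z = NormalDist().inv_cdf(0.975)
mu, sigma, n_obs = 0.5, 1.0, 25
intervals = []
for _ in range(1000):
    xbar = sum(rng.gauss(mu, sigma) for _ in range(n_obs)) / n_obs
    half = z * sigma / n_obs ** 0.5
    intervals.append((xbar - half, xbar + half))
coverage, avg_len, miss_low, miss_up = summarize_intervals(intervals, mu)
print(f"{coverage:.1f}% ({avg_len:.3f}), low={miss_low}, up={miss_up}")
```

Reporting the two kinds of misses separately shows whether a method's undercoverage is one-sided, which a single coverage percentage hides.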

Table 2.
As for Table 1: the same nine cases, showing percent coverage (average length) and the number of misses at the lower and upper bounds of 95% confidence interval estimates for the true positive probability TP over 1000 simulations for the three methods. TP0001, TP001, and TP01 denote TP evaluated at thresholds τ=0.0001, 0.001, and 0.01, respectively.

The cases are in the same order as those for Table 1. Within each section (Hessian, bootstrap, and MCMC), the first three rows are for k=10,000 with strong, medium, and weak signals, the next three rows are for k=1000, and the last three are for k=500.
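The captions do not define TP. A minimal sketch, assuming TP(τ) is the probability that a test with P ≤ τ comes from the beta (non-null) component of the mixture: TP(τ) = (1 − λ) I_τ(r, s) / [λτ + (1 − λ) I_τ(r, s)], where I_τ(r, s) is the regularized incomplete beta function (the Beta(r, s) CDF at τ). The Python standard library has no incomplete beta function, so the sketch uses the power series B(x; a, b) = x^a Σ (1 − b)_n x^n / (n! (a + n)), which converges quickly for small thresholds such as 0.0001 to 0.01; in practice a library routine such as scipy.special.betainc would be the usual choice. The function names are hypothetical.

```python
import math


def beta_cdf_small_x(x, a, b, tol=1e-15, max_terms=10_000):
    """Regularized incomplete beta I_x(a, b) via the power series
    B(x; a, b) = x^a * sum_n (1 - b)_n x^n / (n! (a + n)), valid for 0 <= x < 1."""
    if x <= 0.0:
        return 0.0
    term = 1.0       # (1 - b)_n x^n / n!  at n = 0
    total = 1.0 / a  # running value of the sum
    for n in range(1, max_terms):
        term *= (n - b) * x / n  # (1 - b)_n = (1 - b)_{n-1} * (n - b)
        contrib = term / (a + n)
        total += contrib
        if abs(contrib) <= tol * abs(total):
            break
    log_beta_fn = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return math.exp(a * math.log(x) - log_beta_fn) * total


def true_positive(tau, lam, r, s):
    """Assumed TP(tau): share of P-values <= tau that come from the beta component."""
    alt = (1.0 - lam) * beta_cdf_small_x(tau, r, s)
    return alt / (lam * tau + alt)


# Hypothetical strong-signal case theta = (0.5, 0.6, 8) at the three thresholds.
for tau in (0.0001, 0.001, 0.01):
    print(tau, round(true_positive(tau, 0.5, 0.6, 8.0), 4))
```

Under this definition, a beta component with r < 1 and s > 1 has a decreasing density, so TP falls as τ grows from 0.0001 to 0.01.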
Table 3.
Parameter θ=(λ,r,s) estimates and their 95% interval estimates for two example data sets using the Hessian, bootstrap, and MCMC methods

The estimates from the Hessian and bootstrap methods are MLEs; the MCMC estimates are posterior means computed after discarding the first 5000 iterations.
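Table 3 reports MCMC estimates as posterior means after discarding the first 5000 iterations. A minimal random-walk Metropolis sketch of that idea, assuming a uniform prior on λ, a Gaussian proposal, and, for brevity, r and s held fixed. The paper's sampler, priors, and proposal are not described here, so all of these are assumptions, and metropolis_lambda is a hypothetical name. The defaults (20,000 iterations, 5000 burn-in) are chosen only for illustration.

```python
import math
import random


def beta_pdf(p, r, s):
    """Beta(r, s) density at p in (0, 1)."""
    log_beta_fn = math.lgamma(r) + math.lgamma(s) - math.lgamma(r + s)
    return math.exp((r - 1) * math.log(p) + (s - 1) * math.log1p(-p) - log_beta_fn)


def metropolis_lambda(pvals, r, s, n_iter=20_000, burn_in=5_000, step=0.05, seed=0):
    """Random-walk Metropolis for lam under a U(0, 1) prior; returns post-burn-in draws."""
    rng = random.Random(seed)
    g = [beta_pdf(p, r, s) for p in pvals]

    def loglik(lam):
        return sum(math.log(lam + (1 - lam) * gi) for gi in g)

    lam, cur = 0.5, loglik(0.5)
    draws = []
    for it in range(n_iter):
        prop = lam + rng.gauss(0.0, step)
        if 0.0 < prop < 1.0:  # the uniform prior is zero outside (0, 1), so reject there
            new = loglik(prop)
            if rng.random() < math.exp(min(0.0, new - cur)):  # Metropolis acceptance
                lam, cur = prop, new
        if it >= burn_in:  # discard the burn-in iterations
            draws.append(lam)
    return draws


# Illustration with a shortened chain so it runs quickly (k = 500).
rng = random.Random(5)
pvals = [rng.random() if rng.random() < 0.5 else rng.betavariate(0.6, 8.0)
         for _ in range(500)]
draws = metropolis_lambda(pvals, 0.6, 8.0, n_iter=4_000, burn_in=1_000)
post_mean = sum(draws) / len(draws)
ordered = sorted(draws)
cred = (ordered[int(0.025 * len(ordered))], ordered[int(0.975 * len(ordered)) - 1])
print(f"posterior mean = {post_mean:.3f}, 95% credible interval = ({cred[0]:.3f}, {cred[1]:.3f})")
```

The retained draws play the role of the posterior samples summarized in column 1 of Fig. 7; their mean is the point estimate and their quantiles give an equal-tailed credible interval.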