Model-averaged confidence intervals for factorial experiments
Introduction
In many application areas, it is increasingly common to allow for model uncertainty when providing parameter estimates and confidence intervals. This extra level of uncertainty arises because we cannot be sure that a model-selection process will always lead to the same best model (Chatfield, 1995). Traditionally, parameter estimation has been carried out using a single model, often after model selection, and is therefore conditional upon the choice of that model. Model-averaging has been proposed as a means of allowing for some of the model uncertainty, in that it is conditional upon a set of models rather than a single best model (Buckland et al., 1997; Burnham and Anderson, 2002; Claeskens and Hjort, 2008). A model-averaged estimate of a parameter is a weighted mean of a set of single-model estimates for the parameter, where the weights are typically chosen using Akaike’s information criterion (AIC), Bayes’ information criterion (BIC), or bootstrap methods (Buckland et al., 1997).
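As a minimal sketch of the weighted mean described above, the following Python snippet turns information-criterion values into weights by exponentiating minus half of each model's difference from the best model, then averages the single-model estimates. This weighting form is the commonly used one and is assumed here rather than taken from this section; the function names and the numerical values are hypothetical.

```python
import math


def ic_weights(ic_values):
    """Convert information-criterion values (e.g. AIC or BIC) into model weights.

    Assumes the common exp(-delta / 2) form, where delta is each model's
    difference from the smallest criterion value.
    """
    best = min(ic_values)
    # Subtracting the minimum keeps the exponentials numerically stable.
    raw = [math.exp(-0.5 * (value - best)) for value in ic_values]
    total = sum(raw)
    return [r / total for r in raw]


def model_averaged_estimate(estimates, weights):
    """Weighted mean of the single-model estimates."""
    return sum(w * e for w, e in zip(weights, estimates))


# Hypothetical criterion values and estimates for three candidate models.
aic_values = [100.0, 102.0, 110.0]
estimates = [5.0, 5.5, 7.0]

weights = ic_weights(aic_values)
theta_ma = model_averaged_estimate(estimates, weights)
```

Models with a smaller criterion value receive more weight, so the averaged estimate sits closest to the estimate from the best-supported model.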
Methods for calculating a confidence interval around a model-averaged estimate have been considered by Buckland et al. (1997), Burnham and Anderson (2002), Claeskens and Hjort (2008) and Hjort and Claeskens (2003). Hjort and Claeskens (2003) assessed the asymptotic properties of some of these methods, but did not consider the coverage rates that might be achieved with real data. Lukacs et al. (2010) used simulation in the context of linear regression to show that the coverage rate for a model-averaged confidence interval for each parameter in the generating model was close to the nominal level, and was a substantial improvement over stepwise regression. Wheeler and Bailer (2009) used simulation to assess the coverage rates of model-averaged confidence intervals obtained using AIC and BIC in the context of dose–response relationships, and found that AIC performed marginally better. Chen et al. (2007) considered the use of model-averaging in the context of factorial ANOVA, but did not focus on coverage rates of confidence intervals.
The purpose of this paper is twofold. First, we propose that model-averaging become the default method for estimating treatment means in a factorial experiment, and we illustrate the benefits of its use in this context via a simulation study. Second, in the simulation study we also compare four information criteria that might be used to perform model-averaging: three variations of AIC, and BIC. In Section 2 we define our notation and several methods for model-averaging. In Section 3 we illustrate the use of model-averaging by analysing the results from a $2^3$ factorial experiment. In Sections 4 and 5 we describe and present the results from a simulation study of the coverage properties of confidence intervals for treatment means. We conclude with recommendations and ideas for further research in Section 6.
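For orientation, the standard textbook definitions of AIC, its small-sample correction AICc, and BIC can be sketched as below. This section does not say which three variations of AIC the study compares, so AICc appears only as one well-known variant; the function names and argument names (`log_lik`, `k`, `n`) are assumptions made for illustration.

```python
import math


def aic(log_lik, k):
    """Akaike's information criterion for a model with k estimated parameters."""
    return -2.0 * log_lik + 2.0 * k


def aicc(log_lik, k, n):
    """Small-sample corrected AIC; requires n > k + 1 observations."""
    if n - k - 1 <= 0:
        raise ValueError("AICc is undefined unless n > k + 1")
    return aic(log_lik, k) + 2.0 * k * (k + 1) / (n - k - 1)


def bic(log_lik, k, n):
    """Bayes' information criterion; the penalty grows with log(n)."""
    return -2.0 * log_lik + k * math.log(n)


# Hypothetical maximised log-likelihood for a model fitted to 16 observations.
example = {
    "AIC": aic(-50.0, 3),
    "AICc": aicc(-50.0, 3, 16),
    "BIC": bic(-50.0, 3, 16),
}
```

All three share the same goodness-of-fit term and differ only in how heavily they penalise the number of parameters.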
Section snippets
Notation and methods
Suppose we use a normal linear model to analyse data from a factorial experiment and we wish to estimate $\theta$, the expected value of the response variable for a particular treatment combination. An analysis based on a single model involves use of the following formula to obtain a 95% confidence interval for $\theta$:
$$\hat{\theta} \pm t_{\nu}\,\widehat{\mathrm{se}}(\hat{\theta}),$$
where $\hat{\theta}$ is the estimate of $\theta$ obtained from the model and $t_{\nu}$ is the 97.5th percentile of the $t$-distribution with degrees of freedom equal to the error degrees of freedom for that
Example
We illustrate the use of model-averaging when analysing the results from a factorial experiment with data from a completely randomised design involving three factors, each at two levels, taken from Mead (1990, p. 39). Eight frogs and eight toads were kept in either moist or dry conditions and half were then injected with a water-balance hormone. The response variable was the percent increase in weight after immersion in water for two hours, with the factors being species (frog or toad),
Simulations
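We simulated data from the following model for a $2^3$ study involving $n$ replicates:
$$y_{ijkl} = \mu + \alpha_i + \beta_j + \gamma_k + (\alpha\beta)_{ij} + (\alpha\gamma)_{ik} + (\beta\gamma)_{jk} + (\alpha\beta\gamma)_{ijk} + \varepsilon_{ijkl},$$
where $\mu$ is the overall effect, $\alpha_i$, $\beta_j$ and $\gamma_k$ are the main effects, $(\alpha\beta)_{ij}$, $(\alpha\gamma)_{ik}$ and $(\beta\gamma)_{jk}$ are the two-way interactions, $(\alpha\beta\gamma)_{ijk}$ is the three-way interaction, and $\varepsilon_{ijkl}$ is the error term, with $\varepsilon_{ijkl} \sim N(0, \sigma^2)$. The value of $\mu$ can have no impact on the coverage rates and widths of confidence intervals, so we set it arbitrarily. In addition, for simplicity we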
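A minimal sketch of how data of this kind might be generated is given below. It assumes each factor level is coded as -1 or +1 and that every effect enters as a coefficient times the product of the relevant codes; that parameterisation, the function name `simulate_factorial`, and the example coefficients are illustrative assumptions, not the settings used in the study.

```python
import itertools
import random


def simulate_factorial(n_reps, effects, sigma=1.0, seed=0):
    """Simulate a three-factor, two-level design with n_reps replicates per cell.

    effects maps a tuple of factor indices to a coefficient: () is the
    overall effect, (0,) a main effect, (0, 1) a two-way interaction and
    (0, 1, 2) the three-way interaction. Levels are coded -1/+1 (assumed).
    Returns a list of (levels, response) pairs.
    """
    rng = random.Random(seed)
    data = []
    for levels in itertools.product((-1, 1), repeat=3):
        cell_mean = 0.0
        for term, coef in effects.items():
            contrast = 1
            for factor in term:
                contrast *= levels[factor]
            cell_mean += coef * contrast
        for _ in range(n_reps):
            # Independent normal errors with common standard deviation sigma.
            data.append((levels, cell_mean + rng.gauss(0.0, sigma)))
    return data


# Hypothetical scenario: one main effect and one two-way interaction.
sim = simulate_factorial(
    n_reps=2,
    effects={(): 10.0, (0,): 1.0, (0, 1): 0.5},
    sigma=1.0,
    seed=42,
)
```

Setting a coefficient to zero, or leaving its term out of `effects`, gives a scenario in which that effect is absent from the generating model.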
Results
Use of Eqs. (3) and (4) led to very similar coverage rates, the differences between them being small relative to the differences among the criteria and among the scenarios. Overall, for all three variations of AIC, the two equations performed equally well. For BIC, use of Eq. (3) always provided slightly better coverage than Eq. (4). For simplicity of presentation, we therefore focus attention on the results for Eq. (3).
Table 4 shows the error in the coverage rate (difference between the mean coverage rate and 0.95)
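The error in the coverage rate can be computed from simulated intervals along the following lines. This is a sketch for a single parameter: the function name `coverage_error` and the example intervals are hypothetical, and averaging over treatment combinations or scenarios would be an additional step.

```python
def coverage_error(intervals, true_value, nominal=0.95):
    """Observed coverage rate minus the nominal level.

    intervals is a list of (lower, upper) confidence limits, one pair per
    simulated data set; true_value is the parameter value used to generate
    the data.
    """
    if not intervals:
        raise ValueError("at least one interval is required")
    hits = sum(1 for lower, upper in intervals if lower <= true_value <= upper)
    return hits / len(intervals) - nominal


# Hypothetical intervals from four simulated data sets, true value 1.0.
example_intervals = [(0.0, 2.0), (1.5, 3.0), (0.5, 1.5), (2.0, 4.0)]
error = coverage_error(example_intervals, true_value=1.0)
```

A negative value indicates that the intervals cover the true value less often than the nominal level, and a positive value indicates over-coverage.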
Discussion
Analysis of a factorial experiment is a natural setting in which to use model-averaging, when we are interested in estimating the treatment means. Note that it is not appropriate to use model-averaging for estimating main effects and interactions, as interpretation of these changes with the model fitted (Davison, 2003, p. 470). Our results suggest that model-averaged confidence intervals can have coverage rates close to the nominal level and be narrower than those from the full model. As
Acknowledgments
We are grateful to David Anderson and Ken Burnham for commenting on a draft of the paper, as well as to the reviewers for their helpful comments.
References (20)
- Chen et al. (2007). Model combining in factorial data analysis. Journal of Statistical Planning and Inference.
- Buckland et al. (1997). Model selection: an integral part of inference. Biometrics.
- Burnham and Anderson (2002). Model Selection and Multimodel Inference: A Practical Information-theoretic Approach.
- Burnham and Anderson (2004). Multimodel inference: understanding AIC and BIC in model selection. Sociological Methods and Research.
- Chatfield (1995). Model uncertainty, data mining and statistical inference. Journal of the Royal Statistical Society, Series A.
- Claeskens and Hjort (2008). Model Selection and Model Averaging.
- Davison (2003). Statistical Models.
- Hjort and Claeskens (2003). Frequentist model average estimators. Journal of the American Statistical Association.
- Hurvich and Tsai (1990). The impact of model selection on inference in linear regression. The American Statistician.
- Hurvich and Tsai (1989). Regression and time series model selection in small samples. Biometrika.
Cited by (22)
Finite sample properties of confidence intervals centered on a model averaged estimator
2020, Journal of Statistical Planning and Inference
Citation Excerpt: The data-based model weights were constructed by exponentiating an information criterion, such as the Akaike Information Criterion (AIC), see Buckland et al. (1997, pp. 605–606). This kind of model weighting has been adopted in much of the later literature (Fletcher and Dillingham, 2011; Fletcher and Turek, 2011). We examine confidence intervals centered on this model averaged estimator.
Model selection and model averaging after multiple imputation
2014, Computational Statistics and Data Analysis
Citation Excerpt: Recent work of Wang et al. (2012) and Wang and Zhou (forthcoming) shows that under a fair amount of models the confidence intervals suggested by Hjort and Claeskens (2003) are asymptotically equivalent to the intervals obtained from the full model indicating limited use of model averaging. While it is still been pointed out that even symmetric confidence intervals can perform well in many situations (Fletcher and Dillingham, 2011), more and more value is seen in the evaluation and modification of interval estimation (Turek and Fletcher, 2012). Given the relevance and timeliness of these discussions we find it desirable to devote some investigations to interval estimation for our estimators: In light of the additional complication introduced by missing data and the implementation of multiple imputation, it is especially useful to address these and other important questions by means of Monte Carlo studies and a motivating data example.
Model-averaged Wald confidence intervals
2012, Computational Statistics and Data Analysis
Citation Excerpt: Our findings agree with those of Fletcher and Dillingham (2011) for the current model-averaged Wald intervals.
Efficient analysis of split-plot experimental designs using model averaging
2023, Journal of Quality Technology
SDG 14: Life below water: A machine-generated overview of recent literature
2022, SDG 14: Life Below Water: A Machine-Generated Overview of Recent Literature
Frequentist Model Averaging in Structure Equation Model With Ordinal Data
2022, Psychometrika
1 Present address: George Perkins Marsh Institute, Clark University, 950 Main Street, Worcester, MA 01610-1477, USA.