Model-averaged confidence intervals for factorial experiments

https://doi.org/10.1016/j.csda.2011.05.014

Abstract

We consider the coverage rate of model-averaged confidence intervals for the treatment means in a factorial experiment, when we use a normal linear model in the analysis. Model-averaging provides a useful compromise between using the full model (containing all main effects and interactions) and a “best model” obtained by some model-selection process. Use of the full model guarantees perfect coverage, whereas use of a best model is known to lead to narrow intervals with poor coverage. Model-averaging allows us to achieve good coverage using intervals that are also narrower than those from the full model. We compare four information criteria that might be used for model-averaging in this setting: AIC, AICc, AICc and BIC. In this setting, if the full model is “truth”, all the criteria will have perfect coverage rates asymptotically. We use simulation to assess the coverage rates and interval widths likely to be achieved by a confidence interval with a nominal coverage of 95%. Our results suggest that AIC performs best in terms of coverage rate; across a wide range of scenarios and replication levels, it consistently provides coverage rates within 1.5 percentage points of the nominal level, while also leading to reductions in interval width of up to 30%, compared to the full model. AICc performed worst overall, with a coverage rate that was up to 5.2 percentage points too low. We recommend that model-averaging become standard practice when summarising the results of a factorial experiment in terms of the treatment means, and that AIC be used to perform the model-averaging.

Introduction

In many application areas, it is increasingly common to allow for model uncertainty when providing parameter estimates and confidence intervals. This extra level of uncertainty arises because we cannot be sure that a model-selection process will always lead to the same best model (Chatfield, 1995). Traditionally, parameter estimation has been carried out using a single model, often after model selection, and is therefore conditional upon the choice of that model. Model-averaging has been proposed as a means of allowing for some of the model uncertainty, in that it is conditional upon a set of models rather than a single best model (Buckland et al., 1997; Burnham and Anderson, 2002; Claeskens and Hjort, 2008). A model-averaged estimate of a parameter is a weighted mean of a set of single-model estimates for the parameter, where the weights are typically chosen using Akaike’s information criterion (AIC), Bayes’ information criterion (BIC), or bootstrap methods (Buckland et al., 1997).
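The weighted mean described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: it assumes the commonly used weights proportional to exp(-Δ/2), where Δ is the difference between a model's criterion value and the smallest value in the set. The function names and all numbers are hypothetical.

```python
import math


def information_criterion_weights(ic_values):
    """Turn information-criterion values (e.g. AIC or BIC) into model weights.

    Assumption: weights are proportional to exp(-delta / 2), where delta is
    each model's criterion value minus the smallest value in the set.
    """
    best = min(ic_values)
    # Subtracting the minimum keeps the exponentials numerically stable.
    raw = [math.exp(-0.5 * (ic - best)) for ic in ic_values]
    total = sum(raw)
    return [value / total for value in raw]


def model_averaged_estimate(estimates, weights):
    """Weighted mean of the single-model estimates of one parameter."""
    return sum(w * e for w, e in zip(weights, estimates))


# Hypothetical criterion values and single-model estimates of a treatment mean.
aic_values = [100.0, 102.0, 110.0]
single_model_estimates = [5.0, 6.0, 7.0]

weights = information_criterion_weights(aic_values)
theta_bar = model_averaged_estimate(single_model_estimates, weights)
```

The model with the smallest criterion value receives the largest weight, so the averaged estimate is pulled towards that model's estimate without discarding the others.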

Methods for calculating a confidence interval around a model-averaged estimate have been considered by Buckland et al. (1997), Burnham and Anderson (2002), Claeskens and Hjort (2008) and Hjort and Claeskens (2003). Hjort and Claeskens (2003) assessed the asymptotic properties of some of these methods, but did not consider the coverage rates that might be achieved with real data. Lukacs et al. (2010) used simulation in the context of linear regression to show that the coverage rate for a model-averaged confidence interval for each parameter in the generating model was close to the nominal level, and was a substantial improvement over stepwise regression. Wheeler and Bailer (2009) used simulation to assess the coverage rates of model-averaged confidence intervals obtained using AIC and BIC in the context of dose–response relationships, and found that AIC performed marginally better. Chen et al. (2007) considered the use of model-averaging in the context of factorial ANOVA, but did not focus on coverage rates of confidence intervals.

The purpose of this paper is twofold. First, we propose that model-averaging become the default method for estimating treatment means in a factorial experiment, and we illustrate the benefits of its use in this context via a simulation study. Second, in the simulation study we also compare four information criteria that might be used to perform model-averaging: three variations of AIC, and BIC. In Section 2 we define our notation and several methods for model-averaging. In Section 3 we illustrate the use of model-averaging by analysing the results from a $2^3$ factorial experiment. In Sections 4 and 5 we describe and present the results from a simulation study of the coverage properties of confidence intervals for treatment means. We conclude with recommendations and ideas for further research in Section 6.

Section snippets

Notation and methods

Suppose we use a normal linear model to analyse data from a factorial experiment and we wish to estimate $\theta$, the expected value of the response variable $Y$ for a particular treatment combination. An analysis based on a single model involves use of the following formula to obtain a 95% confidence interval for $\theta$: $$\hat{\theta} \pm t\sqrt{\widehat{V}(\hat{\theta})}$$ where $\hat{\theta}$ is the estimate of $\theta$ obtained from the model and $t$ is the 97.5th percentile of the $t$-distribution with degrees of freedom equal to the error degrees of freedom for that
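As a small worked illustration of the single-model interval (the estimate plus or minus the t percentile times the estimated standard error), the following Python sketch may help. The function name and the numerical inputs are hypothetical, and the t percentile is passed in by hand rather than computed, since it would normally come from tables or a statistics library.

```python
import math


def single_model_interval(theta_hat, var_hat, t_crit):
    """Return (lower, upper) for theta_hat +/- t_crit * sqrt(var_hat).

    var_hat is the estimated variance of theta_hat, so its square root is
    the estimated standard error.
    """
    half_width = t_crit * math.sqrt(var_hat)
    return theta_hat - half_width, theta_hat + half_width


# Hypothetical inputs: estimate 10.0, estimated variance 4.0, and 8 error
# degrees of freedom, for which the 97.5th percentile of t is about 2.306.
lower, upper = single_model_interval(10.0, 4.0, 2.306)
```

With these inputs the half-width is 2.306 × 2 = 4.612, so the interval runs from roughly 5.39 to 14.61.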

Example

We illustrate the use of model-averaging when analysing the results from a factorial experiment with data from a completely randomised design involving three factors, each at two levels, taken from Mead (1990, p. 39). Eight frogs and eight toads were kept in either moist or dry conditions and half were then injected with a water-balance hormone. The response variable was the percent increase in weight after immersion in water for two hours, with the factors being species (frog or toad),

Simulations

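We simulated data from the following model for a $2^3$ study involving $r$ replicates: $$Y_{ijkl} = \mu + \alpha_i + \beta_j + \gamma_k + \alpha\beta_{ij} + \alpha\gamma_{ik} + \beta\gamma_{jk} + \alpha\beta\gamma_{ijk} + \varepsilon_{ijkl}$$ where $\mu$ is the overall effect, $\{\alpha_i, \beta_j, \gamma_k\}$ are the main effects, $\{\alpha\beta_{ij}, \alpha\gamma_{ik}, \beta\gamma_{jk}\}$ are the two-way interactions, $\alpha\beta\gamma_{ijk}$ is the three-way interaction, and $\varepsilon_{ijkl}$ is the error term, with $V(\varepsilon_{ijkl}) = \sigma^2$ $(i = 1, 2;\ j = 1, 2;\ k = 1, 2;\ l = 1, \ldots, r)$. The value of $\mu$ can have no impact on the coverage rates and widths of confidence intervals, so we arbitrarily set $\mu = 0$. In addition, for simplicity we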
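A data-generating step of this kind might be sketched in Python as follows. This is only an illustration under stated assumptions: the excerpt does not give the parameterisation or the effect sizes used in the paper, so the sketch assumes a -1/+1 effect coding for the two levels of each factor, normally distributed errors, and hypothetical effect values. The function name `simulate_2cubed` is also hypothetical.

```python
import itertools
import random
import statistics


def simulate_2cubed(r, effects, sigma=1.0, mu=0.0, seed=None):
    """Simulate r replicates for each of the 8 cells of a 2^3 factorial.

    Assumption (not from the source): levels are coded -1/+1 and each term
    in `effects` (keys "A", "B", "C", "AB", "AC", "BC", "ABC") contributes
    its coefficient times the product of the relevant codes.
    """
    rng = random.Random(seed)
    data = {}
    for a, b, c in itertools.product((-1, 1), repeat=3):
        codes = {
            "A": a, "B": b, "C": c,
            "AB": a * b, "AC": a * c, "BC": b * c,
            "ABC": a * b * c,
        }
        cell_mean = mu + sum(
            effects.get(name, 0.0) * code for name, code in codes.items()
        )
        # Independent normal errors with standard deviation sigma.
        data[(a, b, c)] = [rng.gauss(cell_mean, sigma) for _ in range(r)]
    return data


# Hypothetical scenario: one main effect and one two-way interaction.
sample = simulate_2cubed(r=4, effects={"A": 1.0, "BC": 0.5}, seed=42)

# Under the full model, each treatment mean is estimated by its cell average.
full_model_means = {cell: statistics.fmean(ys) for cell, ys in sample.items()}
```

Repeating this step many times, computing an interval for each treatment mean, and recording how often the interval contains the true cell mean gives an empirical coverage rate.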

Results

Use of Eqs. (3) and (4) led to very similar coverage rates, the differences being small relative to those between the criteria and between the scenarios. Overall, for all three variations of AIC, the two equations performed equally well. For BIC, use of Eq. (3) always provided slightly better coverage than Eq. (4). For simplicity of presentation, we therefore focus attention on the results for Eq. (3).

Table 4 shows the error in the coverage rate (difference between the mean coverage rate and 0.95)

Discussion

Analysis of a factorial experiment is a natural setting in which to use model-averaging, when we are interested in estimating the treatment means. Note that it is not appropriate to use model-averaging for estimating main effects and interactions, as interpretation of these changes with the model fitted (Davison, 2003, p. 470). Our results suggest that model-averaged confidence intervals can have coverage rates close to the nominal level and be narrower than those from the full model. As

Acknowledgments

We are grateful to David Anderson and Ken Burnham for commenting on a draft of the paper, as well as to the reviewers for their helpful comments.

References (20)

  • L. Chen et al. Model combining in factorial data analysis. Journal of Statistical Planning and Inference (2007)
  • S.T. Buckland et al. Model selection: an integral part of inference. Biometrics (1997)
  • K.P. Burnham et al. Model Selection and Multimodel Inference: A Practical Information-theoretic Approach (2002)
  • K.P. Burnham et al. Multimodel inference: understanding AIC and BIC in model selection. Sociological Methods and Research (2004)
  • C. Chatfield. Model uncertainty, data mining and statistical inference. Journal of the Royal Statistical Society, Series A (1995)
  • G. Claeskens et al. Model Selection and Model Averaging (2008)
  • A.C. Davison. Statistical Models (2003)
  • N.L. Hjort et al. Frequentist model average estimators. Journal of the American Statistical Association (2003)
  • C.M. Hurvich et al. The impact of model selection on inference in linear regression. The American Statistician (1990)
  • C.M. Hurvich et al. Regression and time series model selection in small samples. Biometrika (1989)
There are more references available in the full text version of this article.

1 Present address: George Perkins Marsh Institute, Clark University, 950 Main Street, Worcester, MA 01610-1477, USA.
