Original Article

Optimal type I and type II error pairs when the available sample size is fixed
Introduction
Tradition and convenience have driven the selection of type I and type II error pairs to date. In much of clinical research, including randomized trials, values of α = 0.05 and β = 0.2 are standard [1], [2]. However, in some studies, the sample size needed to achieve such acceptable type I and type II errors is impossible to obtain for various reasons, including limited availability of participants, samples, or resources, and high cost. For example, this situation may arise in trials of rare diseases or when treatment effects are very small but still clinically relevant to document (e.g., for mortality). In many epidemiologic studies, too, the number of available participants is constrained because the sample size is already fixed (e.g., case–control samples or cohorts enrolled in the past) or because measurements are expensive and budgets are fixed. Finally, in fields with massive testing, much lower α is used [3], [4], [5] to account for multiple comparisons. When the risk effects are small, attaining adequate power (e.g., β = 0.2) while insisting on an extremely low α (e.g., 10⁻⁸) would require unattainably large sample sizes. In all these situations, the challenge is how to rationally select an optimal pair of type I and type II errors when designing a study in which the available sample size is fixed and cannot reasonably exceed a certain number of participants.
The optimal type I and type II errors when the sample size is constrained should maximize the chances of the study making correct inferences and minimize the chances of wrong inferences [6]. Correct inferences include correctly identifying a true nonnull effect [true positive (TP)] and correctly claiming that an effect is null [true negative (TN)]. Wrong inferences include claiming that a nonnull effect exists while it does not [false positive (FP)] and claiming that an effect is null while it is nonnull [false negative (FN)]. Value is gained in making correct inferences and lost in making wrong inferences. Here, we explore models that optimize the composite value of correct and wrong inferences.
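The four inference outcomes above can be expressed as joint probabilities. As a minimal illustrative sketch (not reproduced from the article's Appendix), assume a prior probability π that the tested effect is truly nonnull; then the chance of each outcome follows directly from α, β, and π:

```python
def inference_probabilities(alpha, beta, pi):
    """Joint probabilities of the four inference outcomes, given
    alpha (type I error), beta (type II error), and pi, the assumed
    prior probability that the tested effect is truly nonnull."""
    return {
        "TP": pi * (1 - beta),         # nonnull effect, correctly detected
        "FN": pi * beta,               # nonnull effect, missed
        "FP": (1 - pi) * alpha,        # null effect, falsely claimed nonnull
        "TN": (1 - pi) * (1 - alpha),  # null effect, correctly called null
    }

# With the traditional alpha = 0.05, beta = 0.2 and an even prior (pi = 0.5):
probs = inference_probabilities(0.05, 0.2, 0.5)
```

The four probabilities always sum to 1, which makes the trade-off explicit: at fixed sample size, lowering α shifts probability mass from FP to FN rather than eliminating error.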
Modeling background
Two types of models can be considered. In multiplicative models, the value of the research study is the product of the values of TP and TN inferences divided by the product of the values of FP and FN inferences; one wishes to maximize this ratio. In additive models, one wishes to maximize the net difference between the sum of the values gained from correct inferences and the sum of the values lost from wrong inferences. Here, we summarize the main formulas. Details appear in the Appendix.
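Since the Appendix formulas are not reproduced in this excerpt, the following is a rough sketch of the optimization under stated assumptions: each inference's value is weighted by its probability of occurring (given an assumed prior π that the effect is nonnull), β is determined by α through a one-sample z-test at fixed n and standardized effect size, and all unit values default to 1. The function names and the z-test power approximation are illustrative choices, not the article's exact formulation:

```python
from statistics import NormalDist

N = NormalDist()

def type_ii_error(alpha, effect, n):
    """Beta for a two-sided one-sample z-test with standardized effect
    size `effect` and sample size n (far rejection tail ignored)."""
    z_crit = N.inv_cdf(1 - alpha / 2)
    return N.cdf(z_crit - effect * n ** 0.5)

def additive_value(alpha, beta, pi, u_tp=1, u_tn=1, u_fp=1, u_fn=1):
    """Net value: probability-weighted gains from correct inferences
    minus probability-weighted losses from wrong inferences."""
    return (u_tp * pi * (1 - beta) + u_tn * (1 - pi) * (1 - alpha)
            - u_fp * (1 - pi) * alpha - u_fn * pi * beta)

def multiplicative_value(alpha, beta, pi, u_tp=1, u_tn=1, u_fp=1, u_fn=1):
    """Ratio: product of probability-weighted TP and TN values over
    the product of probability-weighted FP and FN values."""
    gain = (u_tp * pi * (1 - beta)) * (u_tn * (1 - pi) * (1 - alpha))
    loss = (u_fp * (1 - pi) * alpha) * (u_fn * pi * beta)
    return gain / loss

# Grid search for the alpha that maximizes net value when n is fixed
# (e.g., n = 100 participants, standardized effect 0.2, even prior).
pi, effect, n = 0.5, 0.2, 100
best_value, best_alpha = max(
    (additive_value(a, type_ii_error(a, effect, n), pi), a)
    for a in (i / 1000 for i in range(1, 500))
)
```

Because n is held fixed, β is a decreasing function of α, so the search trades the two errors against each other directly; changing the unit values (e.g., penalizing an FP more heavily than an FN) shifts the optimal α accordingly.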
Discussion
Type I and type II errors are inevitable and inextricably linked: as FP decreases, FN increases. The consequences of FP, FN, TP, and TN may vary across types of research and outcomes [12], [13], [14], [15]. Fixed traditional error rates (e.g., α = 0.05 and β = 0.20) cannot capture the breadth of desirable inferences in different research settings and may be unattainable because of sample size constraints. We have developed models that optimize the selection of type I and type II errors in a research study.
Acknowledgments
The authors are grateful to Robert Tibshirani for insightful comments.
References
- et al. MicroRNAs in the pathogenesis of cancer. Semin Oncol (2011)
- et al. Inadequate statistical power to detect clinically significant differences in adverse event rates in randomized controlled trials. J Clin Epidemiol (2009)
- et al. Optimism bias leads to inconclusive results—an empirical study. J Clin Epidemiol (2011)
- Clinical trials: a methodologic perspective (2005)
- Clinical trials: design, conduct, and analysis (1986)
- et al. Statistical significance for genomewide studies. Proc Natl Acad Sci U S A (2003)
- et al. Genome-wide significance for dense SNP and resequencing data. Genet Epidemiol (2008)
- et al. Replication in genome-wide association studies. Stat Sci (2009)
- Why most published research findings are false. PLoS Med (2005)
- et al. Size matters: just how big is BIG?: quantifying realistic sample size requirements for human genome epidemiology. Int J Epidemiol (2009)