Original Article
Optimal type I and type II error pairs when the available sample size is fixed

https://doi.org/10.1016/j.jclinepi.2013.03.002

Abstract

Objective

To model how to select the optimal pair of type I and type II errors that maximizes study value when there are constraints on the available study sample size.

Study Design and Setting

Correct inferences [true positives (TPs) and true negatives (TNs)] increase the value of a study, and wrong inferences [false positives (FPs) and false negatives (FNs)] decrease it. We model the composite value of a study based on these four inferences, their relative importance, and their relative frequency, using multiplicative and additive models. Numerical examples are presented for randomized trials, epidemiologic studies, and agnostic omics investigations with massive testing and variable sample size constraints.

Results

The optimal choice of type I and type II errors varies widely according to the available sample size and the plausible effect sizes in each field. We show how the equations can be streamlined for special applications: when the value of all four inferences is considered equal, when the identification of TNs carries no value, and when a study carries no value unless at least one TP is discovered.

Conclusion

The proposed optimization equations can be used to guide the selection of the optimal type I and type II errors of future studies in which sample size is constrained.

Introduction

Tradition and convenience have driven the selection of type I and type II error pairs to date. In much of clinical research, including randomized trials, values of α = 0.05 and β = 0.2 are standard [1], [2]. However, in some studies, the sample size needed to achieve such acceptable type I and type II errors is impossible to obtain for various reasons, including limited availability of participants, samples, or resources, and/or high cost. For example, this situation may arise in trials of rare diseases or when treatment effects are very small but still clinically relevant to document (e.g., for mortality). In many epidemiologic studies, too, there are constraints on the number of participants because the sample size is already fixed (e.g., case–control samples or cohorts enrolled in the past) or because measurements are expensive and budgets are fixed. Finally, in fields with massive testing, a much lower α is used [3], [4], [5] to account for multiple comparisons. When the risk effects are small, reaching adequate power (e.g., β = 0.2) would require impossibly large sample sizes if one insists on an extremely low α (e.g., 10⁻⁸). In all these situations, the challenge is how to rationally select an optimal pair of type I and type II errors when designing a study in which the available sample size is fixed and cannot reasonably exceed a certain number of participants.
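
The following is a minimal sketch (ours, not from the article) of how a fixed sample size couples the two errors, using the standard normal approximation for a two-sided, two-sample test of a standardized mean difference; the inputs d = 0.2 and n = 200 per group are purely illustrative.

```python
# Minimal sketch (ours, not from the article): how a fixed sample size couples
# alpha and beta. Normal approximation for a two-sided, two-sample test of a
# standardized mean difference d; n = 200 per group and d = 0.2 are illustrative.
from scipy.stats import norm

def power_at_fixed_n(alpha, n_per_group, d):
    """Approximate power of a two-sided two-sample z-test at significance alpha."""
    z_crit = norm.ppf(1 - alpha / 2)        # two-sided critical value
    ncp = d * (n_per_group / 2) ** 0.5      # noncentrality for equal-size groups
    return norm.cdf(ncp - z_crit)           # ignores the negligible opposite tail

for alpha in (0.05, 1e-3, 1e-8):
    beta = 1 - power_at_fixed_n(alpha, n_per_group=200, d=0.2)
    print(f"alpha={alpha:.0e}  beta={beta:.3f}")
```

With these inputs, insisting on α = 10⁻⁸ drives β to be indistinguishable from 1, whereas α = 0.05 leaves β near 0.5: no achievable choice recovers the conventional (0.05, 0.2) pair at this sample size.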

The optimal type I and type II errors when the sample size is constrained should maximize the chances of the study making correct inferences and minimize the chances of wrong inferences [6]. Correct inferences include correctly identifying a true nonnull effect [true positive (TP)] and correctly claiming that an effect is null [true negative (TN)]. Wrong inferences include claiming that a nonnull effect exists when it does not [false positive (FP)] and claiming that an effect is null when it is nonnull [false negative (FN)]. Value is gained in making correct inferences and lost in making wrong inferences. Here, we explore models that optimize the composite value of correct and wrong inferences.
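
One standard way to formalize the relative frequency of the four inferences is through a prior probability π that the tested effect is nonnull; the expected frequencies then follow directly from α and β. A short sketch (the function name and the numbers are ours, for illustration):

```python
# Expected frequencies of the four inferences for a hypothesis that is
# nonnull with prior probability pi; alpha and beta are the type I and
# type II errors. An illustrative sketch, not the article's notation.
def inference_rates(pi, alpha, beta):
    tp = pi * (1 - beta)         # nonnull effect, correctly detected
    fn = pi * beta               # nonnull effect, missed
    tn = (1 - pi) * (1 - alpha)  # null effect, correctly not claimed
    fp = (1 - pi) * alpha        # null effect, falsely claimed
    return tp, tn, fp, fn

print(inference_rates(pi=0.10, alpha=0.05, beta=0.20))
# -> (0.08, 0.855, 0.045, 0.02)
```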


Modeling background

Two types of models can be considered. In multiplicative models, the importance of the research study is the product of the values of the TP and TN inferences divided by the product of the values of the FP and FN inferences; one wishes to maximize this ratio. In additive models, one wishes to maximize the net difference between the sum of the values gained from correct inferences and the sum of the values lost from wrong inferences. Here, we summarize the main formulas; details appear in the Appendix.
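
To make the optimization concrete, here is an illustrative sketch under assumptions of ours: the article's exact formulas are in its Appendix, so the frequency-weighted scoring below, the utility weights u_*, and all numeric inputs (π = 0.10, d = 0.2, n = 200 per group) are ours. At a fixed sample size, β is a function of α, so scanning α traces out every achievable (α, β) pair, each of which is scored under an additive and a multiplicative value model.

```python
# Illustrative sketch of the optimization idea; the article's exact formulas
# are in its Appendix, so the frequency-weighted scoring, the utility weights
# u_*, and all numbers here are assumptions of ours.
import numpy as np
from scipy.stats import norm

def beta_at_fixed_n(alpha, n_per_group, d):
    """Type II error of a two-sided two-sample z-test (normal approximation)."""
    z_crit = norm.ppf(1 - alpha / 2)
    return 1 - norm.cdf(d * (n_per_group / 2) ** 0.5 - z_crit)

def composite_values(alpha, beta, pi, u_tp=1.0, u_tn=1.0, u_fp=1.0, u_fn=1.0):
    """Score an (alpha, beta) pair under additive and multiplicative models."""
    tp, fn = pi * (1 - beta), pi * beta                 # nonnull: hit / missed
    tn, fp = (1 - pi) * (1 - alpha), (1 - pi) * alpha   # null: kept / claimed
    additive = u_tp * tp + u_tn * tn - u_fp * fp - u_fn * fn
    multiplicative = (u_tp * tp * u_tn * tn) / (u_fp * fp * u_fn * fn)
    return additive, multiplicative

alphas = np.logspace(-8, np.log10(0.5), 500)    # scan alpha from 1e-8 to 0.5
betas = beta_at_fixed_n(alphas, n_per_group=200, d=0.2)
add_vals, mult_vals = composite_values(alphas, betas, pi=0.10)
i, j = add_vals.argmax(), mult_vals.argmax()
print(f"additive optimum:       alpha={alphas[i]:.4g} beta={betas[i]:.3f}")
print(f"multiplicative optimum: alpha={alphas[j]:.4g} beta={betas[j]:.3f}")
```

Changing π, d, n, or the weights shifts the optimum, which is the central point: there is no universally optimal (α, β) pair once the sample size is constrained.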

Discussion

Type I and type II errors are inevitable and inextricably linked: at a fixed sample size, as the FP rate decreases, the FN rate increases. The consequences of FP, FN, TP, and TN inferences may vary across types of research and outcomes [12], [13], [14], [15]. Fixed traditional error rates (e.g., α = 0.05 and β = 0.20) cannot capture the breadth of desirable inferences in different research settings and may be unattainable because of sample constraints. We have developed models that optimize the selection of type I and type II errors in a research study.

Acknowledgments

The authors are grateful to Robert Tibshirani for insightful comments.

References (31)

  • M.J. Khoury et al. On the synthesis and interpretation of consistent but weak gene-disease associations in the era of genome-wide association studies. Int J Epidemiol (2007)
  • G.C. Siontis et al. Risk factors and interventions with statistically significant tiny effects. Int J Epidemiol (2011)
  • J.P. Ioannidis. Excess significance bias in the literature on brain volume abnormalities. Arch Gen Psychiatry (2011)
  • J.A. Sterne et al. Sifting the evidence—what's wrong with significance tests? BMJ (2001)
  • L. Held. A nomogram for P values. BMC Med Res Methodol (2010)