Original Article
Assessment of vibration of effects due to model specification can demonstrate the instability of observational associations
Introduction
Observational associations between variables do not guarantee causality, and they are often complex and influenced by other variables (confounders and effect modifiers). Accounting for covariates is typically achieved through statistical modeling, such as multivariate regression. However, which variables should one account for in complex multivariate phenomena where many variables may be confounded or correlated [1]? Model specification can be a major issue in diverse fields, including epidemiology [2], economics [3], [4], [5], and psychological science and neurosciences [6]. Thousands of associations are published, and many are later challenged and refuted by subsequent investigations [7], [8], [9]. Model choices reflect our assumptions about associations and about potential causes and effects [10]. Very often there is large uncertainty about which variables should be modeled and how they are related. Consequently, there is large heterogeneity in how investigators associate variables [2].
In discovery-based research in large data sets, there is often no prior evidence or biological plausibility to indicate which adjustment variables to include in statistical models. In other cases, unequivocal evidence and plausibility may support including some adjustment variables in the model, whereas consensus is lacking on others and no guidance is available on yet another set. Interpretation of effects may vary depending on the analytical choices made. A way to quantify the instability of the results due to model specification is needed to guide inference.
The “vibration of effects” (VoE) [2] describes the extent to which an estimated association changes under multiple distinct analytical modeling approaches. The VoE is also related to the previously described concept of “multiple modeling” [9] or statistical model-induced variability [11]. To estimate the VoE empirically, we can compute the distribution of the point estimates of measures of association (e.g., relative risks, odds ratios) and P-values that are possible under different analytical scenarios. The VoE measures how susceptible an association is to the choice of modeling scenario; the larger the VoE, the greater the instability of the results. One may also explore which specific scenarios most influence the estimated association. Here, we describe a framework to systematically evaluate the VoE for a set of adjustment covariates.
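To make the idea of a distribution of estimates and P-values concrete, the following minimal Python sketch summarizes the spread of results across modeling scenarios. The specific summaries (a ratio of upper to lower percentile of the estimates, the percentile range of -log10 P-values, and a flag for estimates falling on both sides of the null) are assumptions chosen for illustration; the text above does not prescribe them, and the names `percentile` and `summarize_voe` are hypothetical.

```python
import math


def percentile(values, q):
    """Linear-interpolation percentile (q in [0, 100]) of a non-empty list."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100.0
    lo = math.floor(pos)
    hi = math.ceil(pos)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def summarize_voe(estimates, p_values, lower=1, upper=99, null=1.0):
    """Summarize how much ratio-scale estimates and P-values vary across models.

    ASSUMPTION: these three summaries are illustrative choices, not a
    definition taken from the article text.
    """
    neg_log_p = [-math.log10(p) for p in p_values]
    return {
        # Ratio of the upper to the lower percentile of the estimates.
        "relative_estimate": percentile(estimates, upper) / percentile(estimates, lower),
        # Spread of statistical evidence on the -log10(P) scale.
        "neg_log10_p_range": percentile(neg_log_p, upper) - percentile(neg_log_p, lower),
        # True when some models put the estimate below the null and others above it.
        "crosses_null": min(estimates) < null < max(estimates),
    }


# Made-up numbers standing in for estimates from three modeling scenarios.
summary = summarize_voe([0.8, 1.0, 1.2], [0.001, 0.01, 0.1])
print(summary)
```

A larger `relative_estimate` or `neg_log10_p_range` would indicate greater instability under this particular choice of summaries.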
Example of a controversial association
As an introductory example, we use the VoE framework to evaluate a contentious association between vitamin E (α-tocopherol) and mortality. Early publications of observational studies claimed large reductions in disease-related and mortality-related events in association with vitamin E [12], [13]. However, clinical trials that followed were not able to support the early observational findings [14], [15], [16], [17]. Furthermore, meta-analyses of clinical trials have shown nearly the opposite of
Data source: NHANES 1999–2000, 2001–2002, and 2003–2004
We downloaded National Health and Nutrition Examination Survey (NHANES) examination, laboratory, questionnaire, and National Death Index (NDI) linked mortality data for 1999–2000, 2001–2002, and 2003–2004 surveys. Mortality information was collected from the date of the survey participation through December 31, 2006, and ascertained via a probabilistic match between NHANES and NDI death certificate information. The NDI matches individuals on personal and demographic criteria, such as social
Estimating the VoE
VoE is estimated by computing the hazard ratio (HR) and P-value for a variable of interest while adjusting for all possible combinations of adjustments from a finite set of adjustment variables. Our algorithm for computing the VoE for a variable x (e.g., serum vitamin D) is shown in Fig. 1.
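The enumeration step described above can be sketched in Python as follows. This is a minimal illustration, not the authors' software: `all_adjustment_sets`, `run_voe`, and `toy_fit` are hypothetical names, the variable names are made up, and `toy_fit` is a placeholder that returns invented numbers where a real analysis would fit a survival model (for instance a Cox proportional hazards regression, which is an assumption here) and return its HR and P-value.

```python
from itertools import combinations


def all_adjustment_sets(adjusters):
    """Yield every subset of the adjustment variables, including the empty set."""
    for k in range(len(adjusters) + 1):
        for subset in combinations(adjusters, k):
            yield subset


def run_voe(fit_model, exposure, adjusters):
    """Fit one model per adjustment set and collect (HR, P-value) for the exposure.

    fit_model is a caller-supplied function: fit_model(exposure, subset) -> (hr, p).
    """
    results = []
    for subset in all_adjustment_sets(adjusters):
        hr, p = fit_model(exposure, subset)
        results.append({"adjusters": subset, "hr": hr, "p": p})
    return results


def toy_fit(exposure, subset):
    """PLACEHOLDER, not a regression: returns invented values so the sketch runs."""
    hr = 0.90 + 0.02 * len(subset)
    p = 0.01 * (1 + len(subset))
    return hr, p


results = run_voe(toy_fit, "vitamin_e", ["age", "sex", "bmi"])
print(len(results))  # 2**3 = 8 models for 3 adjustment variables
```

With k candidate adjustment variables the loop fits 2^k models, so the number of fits doubles with each variable added to the set.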
First, we downloaded 417 self-reported, clinical, and molecular measures with linked all-cause mortality information in participants from NHANES 1999–2004 (Fig. 1A). Mortality information was collected from
Discussion
Almost all reported findings in observational quantitative research to date in fields such as epidemiology consider only a single or a few modeling scenarios. It is often unclear whether the reported model or models were selected a priori. Selective reporting is often suspected to abound: several models are tested and only those with the most impressive results are presented, with a particular preference for nominally significant results [39]. There are ongoing efforts to enhance
Acknowledgments
The authors thank Profs. Andrew Gelman and Bin Yu for their comments. All data, software code, and additional figures can be found at the following website: http://chiragjpgroup.org/voe.
Author Contributions: C.J.P. and B.B. wrote the software code to conduct the VoE analysis. C.J.P., B.B., and J.P.A.I. came up with the idea and wrote/edited the manuscript.
References (58)
- et al. Effects of vitamins C and E and beta-carotene on the risk of type 2 diabetes in women at high risk of cardiovascular disease: a randomized controlled trial. Am J Clin Nutr (2009)
- et al. Use of antioxidant vitamins for the prevention of cardiovascular disease: meta-analysis of randomised trials. Lancet (2003)
- et al. Higher baseline serum concentrations of vitamin E are associated with lower total and cause-specific mortality in the Alpha-Tocopherol, Beta-Carotene Cancer Prevention Study. Am J Clin Nutr (2006)
- et al. The USDA Automated Multiple-Pass Method accurately estimates group total energy and nutrient intake. J Nutr (2006)
- et al. Effects of leisure and non-leisure physical activity on mortality in U.S. adults over two decades. Ann Epidemiol (2008)
- et al. The Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) statement: guidelines for reporting observational studies. Lancet (2007)
- et al. Clustered environments and randomized genes: a fundamental distinction between conventional and genetic epidemiology. PLoS Med (2007)
- Why most discovered true associations are inflated. Epidemiology (2008)
- I just ran two million regressions. Am Econ Rev (1997)
- Sensitivity analyses would help. Am Econ Rev (1985)
- Reporting the fragility of regression estimates. Rev Econ Stat
- Challenges and opportunities in mining neuroscience data. Science
- False-positive results in cancer epidemiology: a plea for epistemological modesty. J Natl Cancer Inst
- Voodoo correlations are everywhere—not only in neuroscience. Perspect Psychol Sci
- Deming, data and observational studies. Significance
- Causality: models, reasoning and inference
- Association of bisphenol A with diabetes and other abnormalities. JAMA
- Vitamin E and cardiovascular disease: observational studies. Ann N Y Acad Sci
- Vitamin E: the evidence for multiple roles in cancer. Nutr Cancer
- MRC/BHF Heart Protection Study of antioxidant vitamin supplementation in 20,536 high-risk individuals: a randomised placebo-controlled trial. Lancet
- Vitamin E supplementation and cardiovascular events in high-risk patients. The Heart Outcomes Prevention Evaluation Study Investigators. N Engl J Med
- Incidence of cancer and mortality following alpha-tocopherol and beta-carotene supplementation: a postintervention follow-up. JAMA
- Meta-analysis: high-dosage vitamin E supplementation may increase all-cause mortality. Ann Intern Med
- Identifying a national death index match. Am J Epidemiol
- Compendium of physical activities: an update of activity codes and MET intensities. Med Sci Sports Exerc
- Systematic evaluation of environmental factors: persistent pollutants and nutrients correlated with serum lipid levels. Int J Epidemiol
- 2008 Physical Activity Guidelines for Americans
Funding: This work was supported by National Institute of Environmental Health Sciences grants K99 ES023504 and R21 ES0250252 and a PhRMA Foundation award to C.J.P.
Conflicts of interest: The authors declare no competing interests.