The integration of principles and procedures underlying population genetics and epidemiology provides a potential research framework to make causal inferences about the association between a risk factor and disease [1, 2]. In an epidemiological context, numerous biomarkers have been linked to the development of type 2 diabetes. In this issue of Diabetologia, Herder and colleagues assess the prospective association between circulating levels of macrophage migration inhibitory factor and risk of type 2 diabetes [3]. Their results provide suggestive statistical evidence for a sex-specific association between this biomarker and risk of type 2 diabetes. This observation is consistent with the current paradigm that inflammation plays a role in the pathogenesis of insulin resistance and type 2 diabetes [4].

The association might be causal. Alternatively, given the observational nature of epidemiological research, it may be completely explained by confounding. In this scenario, a third factor that is associated with both the biomarker and disease risk accounts for the relationship, so the biomarker is neither independently nor causally linked to disease risk. Epidemiological studies generally use multivariable statistical analysis to reduce confounding. However, because of residual confounding (arising from measurement error in the adjusted confounders) and unknown confounders, standard observational epidemiology cannot resolve whether an observed association has a causal basis. Conceptual ambiguity about any disease mechanism (specifically, which factors are mediators and which are confounders) also limits the ability of statistical modelling to reduce confounding. Furthermore, the observed association may reflect reverse causation. That is, the biomarker level could be the result of undiagnosed or early disease, and thus a consequence of disease rather than a cause.

‘Mendelian randomisation’, which exploits the random assortment of genes from parents to offspring that occurs during gamete formation and conception [1], provides an epidemiological approach that is much less susceptible to confounding by classical or environmental risk factors and excludes reverse causality as a possible non-causal explanation for an observed association between a biomarker and disease risk [5]. In this research framework, the inter-relationship and consistency of associations among genetic variants that encode or regulate a biomarker, the expression or circulating levels of the biomarker, and disease risk may characterise the true magnitude of the relationship between the biomarker and risk of disease (Fig. 1) [5–7]. Reproducible evidence for a statistical association between a genetic variant and a biomarker provides the basis for a Mendelian randomisation study. Because of independent assortment [1], characteristics that may confound any association between a biomarker and disease should be equally distributed across groups defined by the relevant genetic variants. Thus, in a Mendelian randomisation design, a comparison of groups of individuals defined by a genetic variant is equivalent to a randomised comparison in which only the relevant biomarker differs across genotype groups [8, 9]. Examining the relationship between disease risk and variants in the genome that show unequivocal associations with the relevant biomarker is therefore a potential method of assessing whether a biomarker might be causally linked to disease. In this context, Herder et al. [3], based on the specificity and directional consistency of associations among genetic variants, biomarker and disease risk, provide some evidence for a causal link between circulating levels of macrophage migration inhibitory factor and risk of type 2 diabetes. However, much more robust statistical evidence is required to make causal inferences using this approach [6, 10–12].
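The claim that genotype groups behave like randomised arms can be illustrated with a toy simulation. This is a minimal sketch under assumptions of our own, not taken from Herder et al.: a continuous disease liability instead of a binary outcome, linear effects, a single biallelic variant with no pleiotropy, and made-up effect sizes. The helper names `cov` and `simulate` are hypothetical. The genetic estimate shown is the Wald ratio (gene–outcome covariance divided by gene–biomarker covariance), one standard way of turning these associations into a causal estimate.

```python
import random


def cov(xs, ys):
    """Sample covariance of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


def simulate(causal_effect, n=200_000, allele_freq=0.3, seed=1):
    """Toy Mendelian randomisation simulation; every parameter is hypothetical.

    g: risk-allele count (0, 1 or 2), drawn independently of the confounder u,
       mimicking random assortment at gamete formation and conception.
    x: biomarker, raised by 0.3 units per allele and by u.
    y: continuous disease liability, driven by causal_effect * x and by u.
    Returns (observational slope of y on x, Wald ratio using g as instrument).
    """
    rng = random.Random(seed)
    g, x, y = [], [], []
    for _ in range(n):
        g_i = int(rng.random() < allele_freq) + int(rng.random() < allele_freq)
        u = rng.gauss(0.0, 1.0)  # unmeasured confounder of x and y
        x_i = 0.3 * g_i + u + rng.gauss(0.0, 1.0)
        y_i = causal_effect * x_i + u + rng.gauss(0.0, 1.0)
        g.append(g_i)
        x.append(x_i)
        y.append(y_i)
    observational = cov(x, y) / cov(x, x)  # biased by the confounder u
    wald = cov(g, y) / cov(g, x)           # gene-outcome over gene-biomarker
    return observational, wald


obs, wald = simulate(causal_effect=0.0)
print(f"observational slope: {obs:.2f}, Wald ratio: {wald:.2f}")
```

With no true causal effect, the observational slope should land near 0.5 because the confounder alone induces it, whereas the Wald ratio should sit close to zero; rerunning with `causal_effect=0.5` should move the Wald ratio to roughly 0.5.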

Fig. 1

Mendelian triangulation: strategy to help clarify whether there is a causal relation between a biomarker and disease risk. Information on the magnitude of the association between the biomarker and disease (a), combined with information on the magnitude of the association between the genetic variant and the biomarker (b), is used to estimate the expected magnitude of the association between the genetic variant and disease risk (dashed line in c). Based on several assumptions (see text), including a linear association between the biomarker and disease, the direct assessment of c (solid line) provides an unconfounded assessment of the association between the biomarker and disease risk.
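The arithmetic behind the triangle can be written down directly. Assuming, as the caption does, a linear relation between the biomarker and the log odds of disease, the expected per-allele log odds ratio (dashed line c) is the product of side a (log odds ratio per biomarker unit) and side b (biomarker change per allele); conversely, dividing the directly measured side c by b gives a ratio estimate of the biomarker–disease effect. The function names and all input values below are hypothetical.

```python
import math


def expected_gene_disease_or(or_per_unit_biomarker, biomarker_change_per_allele):
    """Expected per-allele disease OR (dashed line c), assuming a log-linear
    biomarker-disease relation (side a) and a per-allele shift (side b)."""
    log_or = math.log(or_per_unit_biomarker) * biomarker_change_per_allele
    return math.exp(log_or)


def implied_or_per_unit(observed_gene_disease_or, biomarker_change_per_allele):
    """Biomarker-disease OR per unit implied by the directly measured
    gene-disease association (solid line c) divided by side b."""
    return math.exp(math.log(observed_gene_disease_or) / biomarker_change_per_allele)


# Hypothetical inputs, not taken from Herder et al.
a = 1.5  # observational OR per SD of the biomarker
b = 0.2  # SD change in the biomarker per allele
print(round(expected_gene_disease_or(a, b), 3))   # 1.084
print(round(implied_or_per_unit(1.02, b), 3))     # 1.104
```

In this made-up example, an observed per-allele OR of 1.02 implies an OR of about 1.10 per biomarker SD, well below the observational 1.5; if such a gap held up with adequate precision, it would suggest the observational association overstates any causal effect.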

Mendelian randomisation studies, like other genetic epidemiological studies, require reliable identification of statistical associations between genetic variants and biomarkers, and between genetic variants and disease risk [12]. The principal requirement is that a Mendelian randomisation study can detect a gene–disease association of the magnitude predicted by the change that the variant produces in the relevant trait or risk factor. Given that individual genetic variants are likely to explain only a very small proportion of the variation in a biomarker or risk factor, these assessments will require very large sample sizes, which are probably beyond the scale of existing studies or collaborations in type 2 diabetes genetics [13]. This statistical limitation is compounded by random measurement error in the assessment of the associations among the genetic variant, biomarker and disease risk, which can attenuate association signals. To improve statistical resolution, investigators have used meta-analysis [8]. However, this approach has several limitations for Mendelian randomisation triangulation, which are generic to the strategy [14]. Heterogeneity among studies (resulting from, for example, gene–gene or gene–environment interactions) may distort estimates of the magnitude of the associations [6]. This problem may be exacerbated when different studies supply the data for each side of the Mendelian randomisation triangle (Fig. 1).
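One common way to combine study-level estimates, and to quantify the between-study heterogeneity mentioned above, is an inverse-variance fixed-effect meta-analysis with Cochran's Q and the I² statistic. This is a generic sketch; it is not necessarily the scheme used in any of the cited studies, and the function name and the three study estimates are hypothetical.

```python
import math


def fixed_effect_meta(estimates, standard_errors):
    """Inverse-variance fixed-effect meta-analysis with heterogeneity statistics.

    Returns (pooled estimate, pooled SE, Cochran's Q, I^2).
    """
    weights = [1.0 / se ** 2 for se in standard_errors]
    total = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, estimates)) / total
    pooled_se = math.sqrt(1.0 / total)
    # Cochran's Q: weighted squared deviations from the pooled estimate
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, estimates))
    df = len(estimates) - 1
    # I^2: share of total variation attributed to between-study heterogeneity
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return pooled, pooled_se, q, i2


# Hypothetical per-allele log odds ratios and SEs from three studies
pooled, se, q, i2 = fixed_effect_meta([0.10, 0.00, 0.30], [0.05, 0.05, 0.10])
print(f"pooled={pooled:.3f} (SE {se:.3f}), Q={q:.2f}, I2={i2:.0%}")
```

This prints `pooled=0.078 (SE 0.033), Q=7.56, I2=74%`; an I² this high signals that the studies may not be estimating a common effect, which is exactly the situation in which a pooled estimate can misstate one side of the triangle.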

It is therefore likely that international collaborative frameworks for Mendelian randomisation studies of type 2 diabetes will be needed to achieve the required level of statistical precision. For example, reproducible statistical associations between genetic variants and risk of type 2 diabetes show effect sizes of around 10–30% increased risk per allele [13]. Under a log-additive model, around 15,000 cases and 15,000 controls are required to detect a per-allele risk increase of 10% (α = 1 × 10⁻⁴, 90% power) for a relatively common genetic variant (20% allele frequency). The association with type 2 diabetes risk may be even weaker for variants that also show reproducible associations with relevant biomarkers [15]. Mendelian randomisation studies are therefore likely to need much greater statistical resolution. To this end, improved statistical approaches for meta-analysing data across studies are in development [16]. Data pooling strategies, however, have distinct advantages in this context, allowing better data harmonisation and statistical testing, including haplotype analysis and imputation methods to increase the comparability of markers across studies.
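A sample-size figure of this kind can be approximated with a two-sided allelic (two-proportion) test. The sketch below assumes Hardy–Weinberg equilibrium, a per-allele relative risk of 1.10 acting multiplicatively, and a disease prevalence of 8% in the population from which controls are drawn; the prevalence is our assumption, not stated in the text, and the function names are hypothetical. Under these assumptions it gives roughly 15,000 cases per group; setting `prevalence=0`, which gives controls the population allele frequency, pushes the figure nearer 18,000. Published figures may rest on different test statistics or software.

```python
import math
from statistics import NormalDist


def case_control_allele_freqs(p, rr, prevalence):
    """Risk-allele frequencies in cases and controls under a multiplicative
    (log-additive) risk model with Hardy-Weinberg equilibrium."""
    p_case = p * rr / (1 - p + p * rr)
    # Population frequency is a prevalence-weighted mix of cases and controls
    p_control = (p - prevalence * p_case) / (1 - prevalence)
    return p_case, p_control


def cases_per_group(p, rr, prevalence, alpha=1e-4, power=0.9):
    """Cases (= controls) needed for a two-sided allelic test (normal approximation)."""
    p1, p0 = case_control_allele_freqs(p, rr, prevalence)
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    z_b = NormalDist().inv_cdf(power)
    p_bar = (p1 + p0) / 2
    alleles = (z_a * math.sqrt(2 * p_bar * (1 - p_bar))
               + z_b * math.sqrt(p1 * (1 - p1) + p0 * (1 - p0))) ** 2 / (p1 - p0) ** 2
    return math.ceil(alleles / 2)  # two alleles per person


print(cases_per_group(0.2, 1.1, prevalence=0.08))  # roughly 15,000 per group
print(cases_per_group(0.2, 1.1, prevalence=0.0))   # nearer 18,000 per group
```

The required size scales roughly with the inverse square of the allele-frequency difference, which is why the weaker associations expected for biomarker-related variants quickly push requirements beyond single studies.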

There are other caveats. Genetic confounding and biological compensation may also limit inferences from Mendelian randomisation studies. Biological compensation, also referred to as canalisation, is a developmental and physiological adaptation to a genetic difference, and could lead to inconsistencies in the magnitude of the triangulated associations [17]. This adaptation, among other important differences, distinguishes Mendelian randomisation studies from randomised controlled trials, which are not susceptible to this limitation [6]. Importantly, the pleiotropic nature (multiple biological effects) of some genetic variants may produce confounded associations between genetic variants and phenotypes. Undetected population stratification, a form of confounding in which genetic differences and trait associations result from underlying differences in ancestry, may also distort the magnitude of gene–biomarker and gene–disease associations. Correlation among genetic variants (linkage disequilibrium) can be exploited to examine associations among these variants, traits and diseases without directly ascertaining functional variants; for the same reason, however, correlation with a pleiotropic variant could distort these associations.
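How undetected stratification can manufacture a gene–disease association is easy to show with expected counts. In this hypothetical example, two ancestral groups differ in both allele frequency and baseline disease risk, but within each group risk does not depend on genotype at all. The function names and all numbers are made up for illustration.

```python
def allelic_counts(n_people, allele_freq, risk):
    """Expected allele counts (case_risk, case_other, control_risk, control_other)
    in one population where disease risk does not depend on genotype."""
    cases = n_people * risk
    controls = n_people - cases
    return (2 * cases * allele_freq, 2 * cases * (1 - allele_freq),
            2 * controls * allele_freq, 2 * controls * (1 - allele_freq))


def odds_ratio(a, b, c, d):
    """Allelic odds ratio from a 2x2 table of allele counts."""
    return (a * d) / (b * c)


# Two hypothetical ancestral groups with different allele frequency and risk
pop_a = allelic_counts(1000, allele_freq=0.1, risk=0.05)
pop_b = allelic_counts(1000, allele_freq=0.5, risk=0.15)
pooled = [x + y for x, y in zip(pop_a, pop_b)]

print(round(odds_ratio(*pop_a), 2), round(odds_ratio(*pop_b), 2))  # 1.0 1.0
print(round(odds_ratio(*pooled), 2))                               # 1.64
```

Each group on its own shows an odds ratio of 1, yet the combined sample shows an allelic odds ratio of about 1.64, purely because ancestry is associated with both allele frequency and disease risk; analysing within ancestry strata removes the artefact in this toy case.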

Despite these limitations, Mendelian randomisation provides a potential research framework to assess causal links between biological and environmental phenotypes and disease risk. These studies, when correctly performed, will provide insights into aetiological mechanisms and causality, informing potential therapeutic and preventative strategies.