Estimation of causal effects of a time-varying exposure at multiple time points through multivariable mendelian randomization

  • Eleanor Sanderson,

    Roles Conceptualization, Formal analysis, Methodology, Writing – original draft, Writing – review & editing

    eleanor.sanderson@bristol.ac.uk

    Affiliations MRC Integrative Epidemiology Unit at the University of Bristol, Bristol, United Kingdom, Population Health Sciences, Bristol Medical School, University of Bristol, Bristol, United Kingdom

  • Tom G. Richardson,

    Roles Formal analysis, Writing – review & editing

    Affiliations MRC Integrative Epidemiology Unit at the University of Bristol, Bristol, United Kingdom, Population Health Sciences, Bristol Medical School, University of Bristol, Bristol, United Kingdom, Novo Nordisk Research Centre, Headington, Oxford, United Kingdom

  • Tim T. Morris,

    Roles Conceptualization, Writing – review & editing

    Affiliations MRC Integrative Epidemiology Unit at the University of Bristol, Bristol, United Kingdom, Population Health Sciences, Bristol Medical School, University of Bristol, Bristol, United Kingdom

  • Kate Tilling,

    Roles Conceptualization, Methodology, Writing – review & editing

    Affiliations MRC Integrative Epidemiology Unit at the University of Bristol, Bristol, United Kingdom, Population Health Sciences, Bristol Medical School, University of Bristol, Bristol, United Kingdom

  • George Davey Smith

    Roles Conceptualization, Writing – review & editing

    Affiliations MRC Integrative Epidemiology Unit at the University of Bristol, Bristol, United Kingdom, Population Health Sciences, Bristol Medical School, University of Bristol, Bristol, United Kingdom

Abstract

Mendelian Randomisation (MR) is a powerful tool in epidemiology that can be used to estimate the causal effect of an exposure on an outcome in the presence of unobserved confounding, by utilising genetic variants as instrumental variables (IVs) for the exposure. The effect estimates obtained from MR studies are often interpreted as the lifetime effect of the exposure in question. However, the causal effects of some exposures are thought to vary throughout an individual’s lifetime with periods during which an exposure has a greater effect on a particular outcome. Multivariable MR (MVMR) is an extension of MR that allows for multiple, potentially highly related, exposures to be included in an MR estimation. MVMR estimates the direct effect of each exposure on the outcome conditional on all the other exposures included in the estimation. We explore the use of MVMR to estimate the direct effect of a single exposure at different time points in an individual’s lifetime on an outcome. We use simulations to illustrate the interpretation of the results from such analyses and the key assumptions required. We show that causal effects at different time periods can be estimated through MVMR when the association between the genetic variants used as instruments and the exposure measured at those time periods varies. However, this estimation will not necessarily identify exact time periods over which an exposure has the most effect on the outcome. Prior knowledge regarding the biological basis of exposure trajectories can help interpretation. We illustrate the method through estimation of the causal effects of childhood and adult BMI on C-Reactive protein and smoking behaviour.

Author summary

Mendelian Randomisation (MR) can be used to estimate whether a potential exposure has a causal effect on an outcome in the presence of a third (unobserved) variable that affects both of them and so biases the observed association between them. The effect estimates obtained from MR studies can be interpreted as the effect of the exposure on the outcome over the lifetime. However, for some exposures there may be periods during which the causal effect on an outcome is greater or lesser than other periods. Multivariable MR (MVMR) is an extension of MR that allows for estimation of the causal effect of multiple, potentially highly related, exposures. In this paper we investigate how MVMR can be used to estimate the causal effect of the same exposure at different points across the lifecourse. We show that these effects can be estimated through MVMR when the association between the genetic variants used as instruments and the exposure varies between measurements of the exposures. However, we find that this estimation will not necessarily identify exact time periods over which an exposure has the most effect on the outcome as different periods in the lifecourse can only be separated when they are differently associated with genetic variants.

Introduction

Mendelian Randomization (MR) uses the special properties of germline genetic variation to strengthen causal inference regarding the effect of modifiable exposures on disease. [1,2] MR can be implemented as a form of instrumental variable (IV) estimation that uses genetic variants to obtain estimates of the causal effect of an exposure on an outcome that are free from bias due to unmeasured confounding. [3–5] As the genetic variants used as instruments do not change during an individual’s lifetime, the estimated effects are interpreted as the effect of the genetically predicted exposure over the lifetime, or of genetic liability to an exposure if that exposure is binary. [6] Under the assumption of ‘gene-environment equivalence’, i.e. that the effect of the exposure on the outcome is the same whether variation in the exposure is due to genetic or environmental variation, the effect estimates obtained by MR can be interpreted as the effect of variation in the exposure on the outcome. [2,7]

Many exposures, such as BMI, may have varying effects on any particular outcome over the course of an individual’s lifetime. [8] Higher BMI in childhood is observationally associated with many health outcomes later in life. However, whether this is due to a direct causal effect of childhood BMI on those outcomes, or to the high correlation between childhood and adult BMI with the latter having a causal effect on the outcome, is unclear. [9–12] If a time-varying exposure only affects a (time-invariant) outcome during a particular period then intervening on the exposure during other periods will not have any effect on the outcome. Conventional observational studies often use a lifecourse approach with the intention of identifying the particular period(s) in life that affect an outcome. [13] For example, observational studies have shown that sunlight (and from this inferred vitamin D level) in childhood, but not adulthood, is associated with risk of multiple sclerosis. [14–16] Therefore, in order to prevent multiple sclerosis it could be important to focus on time spent outside during childhood; intervening in this way during adulthood would not have any effect on the incidence of multiple sclerosis. A lifecourse approach contrasts with an MR approach, which will generally provide evidence of a causal effect of the exposure on the outcome regardless of when in the lifecourse the exposure is measured. [17] For example, MR studies have shown a causal effect of vitamin D levels in the aetiology of multiple sclerosis [18] but have not identified which period is important.

When the association between genetic variants and an exposure varies over different points in the lifecourse, MR estimates can be interpreted as the effect of the genetically predicted liability underlying the entire exposure history up to the time the outcome occurs. [19] That is, the effect estimated will be the effect on the outcome of having a genetic liability for the exposure that results in a one unit higher level of the exposure at the time the exposure is measured. If the genetic variants have a (proportionally) constant effect on the exposure across the entire lifetime, this will be the genetically predicted lifetime effect of having a unit higher level of the exposure across the lifecourse. [20]

In this paper we explore the use of multivariable Mendelian randomization (MVMR) to estimate the causal effect of a single exposure measured at different time points in an individual’s lifetime on an outcome measured at a single fixed point in time. Structural mean models have previously been proposed for estimation of MR models with a time varying exposure. [21,22] The interpretation of the results from estimation of structural mean models will depend on the availability of data for the time-varying exposure, particularly how many time points data are available for. [21] MVMR can be implemented when multiple measures of the exposures at different points in the lifecourse are available and can be used to estimate direct effects of the exposure at each of those time points, conditional on the other time points included in the estimation. The effects estimated by MR and MVMR are described in Box 1. We outline a model for MR with an exposure measured at multiple time points and explain how this can be estimated with MVMR. We consider specific examples where the assumptions of this model do not hold and present simulation results to investigate what happens in these settings. From these results we explain how the results of such a MVMR estimation can be interpreted. We illustrate these results with application to estimating the effect of child and adulthood BMI on circulating C-reactive protein (CRP) and smoking behaviour. The results presented here show that it is possible to estimate genetically predicted direct causal effects of different time periods of an exposure on an outcome using MVMR, however careful interpretation of any results obtained from such an analysis is required. Prior knowledge regarding the biological basis of exposure trajectories can help interpretation.

Box 1. Effects estimated by MVMR with a time-varying exposure

MVMR is an extension of MR that can be used to estimate the causal effects of multiple, potentially highly related, exposures. [23] The causal effect estimated by MR for an exposure that can only occur once (e.g. birthweight or age at menarche) is the total effect of the exposure on the outcome. The causal effects estimated by MVMR are the direct effect of each exposure that is not mediated by any of the other exposures included in the estimation. [24]

MR with one measure of a time-varying exposure estimates the total lifetime effect of liability to the exposure on the outcome. [19] This will include any effect on the outcome that acts through the exposure at other time periods. As we illustrate here, MVMR with multiple measures of a time-varying exposure estimates the direct effect of the liability to exposure at a particular period, i.e. the effect of the liability to the exposure at a time point that is not mediated by other time points included in the estimation. MVMR with a time-varying exposure can be implemented in the same way as MVMR with different exposures, as outlined in the methods section and in more detail elsewhere. [23,25]

Methods

Ethics statement

UK Biobank has received ethical approval from the UK National Health Service’s National Research Ethics Service (re 11/NW/0382). All other data analysed were from publicly available summary statistics generated using relevant ethical approval from their respective studies.

We consider a model where genetic variants are associated with an unmeasured genetic liability for the exposure of interest, which is in turn associated with the observed value of the exposure. In this context we use the term genetic liability for the exposure to refer to the collective effect of all genetic variants associated with the exposure. [19] This liability may differ in different periods of an individual’s lifetime; however, the observed trait is likely to change on a more frequent basis, reflecting more short-term variation and measurement error. Traits that vary across the lifecourse may have multiple liabilities that act on the exposure in different ways at different points in the lifecourse. Each liability is determined by the genetic effects, and earlier liability levels do not have effects on later liability levels, although they may be correlated through shared genetic influences and unobserved confounding. The exposure is influenced by the liability, by confounders and other environmental influences and, as Fig 1 shows, by earlier levels of the exposure. These exposures have a direct effect on the outcome of interest as well as an effect on the subsequent value of the exposure in following periods. The genetic variants used as IVs are associated with liability in at least one period of the lifecourse but do not have to have the same association with liability in different periods. This allows for the association between a genetic variant and an exposure to vary across ages. This model is given in Fig 1. IV estimation can correct for the bias introduced by measurement error in the exposure under the assumption that the instrument is not associated with the level of that measurement error. [26] We therefore assume that the exposures are measured without error. All of the variables included (other than the individual genetic variants) are assumed to be continuous.
For simplicity we initially limit the number of liability periods to two, however this model could be generalised to any number of periods. We have also excluded an effect of unobserved confounding on the liability. This simplification makes no difference to the results obtained.

Fig 1. Liability exposure model with two periods of exposure.

L1 is the earlier liability, L2 is the later liability, G1 is a set of genetic variants associated with L1, G2 is a set of genetic variants associated with L2, G12 is a set of genetic variants associated with both L1 and L2. X1 is a measure of the exposure in the early time period, X2 is a measure of the exposure at the second time period, Y is an outcome observed at one time only, U is a set of unobserved confounders of the exposure at each time period and the outcome. X1 and X2 are potentially measured with error; error in this measurement is uncorrelated with the genetic variants.

https://doi.org/10.1371/journal.pgen.1010290.g001

Estimation with MVMR

MVMR can be used to estimate the effect of liability to the exposure during each period, given the liability to the exposure at all of the other time points included in the estimation, i.e. the effect of L1 and L2 in Fig 1. MVMR can be conducted with either individual level data or summary data and so it is possible to use the methods described here with either type of data. Estimation using individual level data requires a dataset with the exposure measured at all time points considered and the outcome. Summary data from another non-overlapping dataset is required to enable selection of SNPs for use as instruments, conventionally those which are genome-wide significant in a GWAS study. Estimation using summary data requires SNP-exposure effects and SNP-outcome effects taken from separate samples. SNP-exposure associations for the different time points can be taken from either the same or different datasets. Analysis using summary level data is more likely to be feasible in many cases, given the large datasets required and multiple observations at different time points, so we focus here on summary level analysis.

MVMR can be implemented in a summary data setting using estimates of the association between each SNP and: the outcome, $\hat{\Gamma}$; the exposure at one time point X1, $\hat{\pi}_1$; and the exposure at another time point X2, $\hat{\pi}_2$, by fitting the following model:

$$\hat{\Gamma} = \beta_1 \hat{\pi}_1 + \beta_2 \hat{\pi}_2 + \epsilon$$

weighted by $1/\sigma^2_{\Gamma}$, the inverse variance of $\hat{\Gamma}$. This approach to MVMR is a straightforward generalization of the IVW estimation framework for MR. [23,25]
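As an illustration of the weighted regression described above, the following is a minimal sketch in Python using NumPy. It is not the authors' code: the function name `ivw_mvmr`, the variable names and the toy data are assumptions made for this example. The SNP-outcome estimates are regressed on the SNP-exposure estimates for both time points, without an intercept, with each SNP weighted by the inverse variance of its SNP-outcome estimate.

```python
import numpy as np

def ivw_mvmr(beta_out, se_out, beta_exp):
    """Inverse-variance weighted regression of SNP-outcome estimates on
    SNP-exposure estimates (no intercept). Hypothetical helper for illustration."""
    y = np.asarray(beta_out, dtype=float)
    X = np.asarray(beta_exp, dtype=float)      # shape (n_snps, n_exposures)
    w = 1.0 / np.asarray(se_out, dtype=float) ** 2
    xtwx = X.T @ (w[:, None] * X)
    est = np.linalg.solve(xtwx, X.T @ (w * y))
    resid = y - X @ est
    dof = X.shape[0] - X.shape[1]
    scale = np.sum(w * resid ** 2) / dof       # residual variance of the weighted fit
    se = np.sqrt(scale * np.diag(np.linalg.inv(xtwx)))
    return est, se

# Toy summary data (assumed values, not taken from the paper).
rng = np.random.default_rng(0)
n_snps = 150
pi_hat = rng.normal(0.0, 0.05, size=(n_snps, 2))   # SNP-exposure estimates for X1, X2
se_out = np.full(n_snps, 0.005)
gamma_hat = pi_hat @ np.array([0.2, 0.3]) + rng.normal(0.0, se_out)

est, se = ivw_mvmr(gamma_hat, se_out, pi_hat)
print("estimates:", est, "standard errors:", se)
```

The sketch treats the SNP-exposure estimates as fixed regressors, so it ignores their sampling uncertainty.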

MVMR estimation relies on three assumptions for estimating the causal effect of liability to the exposure at each point on the outcome. [23] These assumptions mirror the standard assumptions required for IV estimation and are that; 1. liability to each exposure is robustly predicted by the genetic variants conditional on the other exposures included in the estimation, 2. there is no confounding of the genetic variants and the outcome and 3. the genetic variants are not associated with the outcome other than via liabilities to exposures included in the estimation, i.e. there are no horizontal pleiotropic effects of the genetic variants on the outcome via other phenotypes. We address the potential for the genetic variants to affect the outcome through liability at time points not included in the estimation in our simulation results.

The first MVMR assumption implies that the exposure at each time period included in the estimation is associated with a different liability, although those liabilities may be correlated. This assumption can be tested with a conditional F-statistic. [27,28] As well as having an F-statistic at each time point greater than 10 to indicate that the genetic variants are strongly associated with that exposure, it is necessary for the conditional F-statistics to be greater than 10, indicating that the genetic variants are robustly associated with liability to exposure at each time period conditional on their association with liability to exposure at the other time periods included in the estimation.
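The idea behind the conditional F-statistic can be sketched for the two-exposure case as follows. This is an illustrative approximation of the summary-data form, not a substitute for the published implementation. [27,28] The function name `conditional_f`, the default of zero covariance between the two sets of SNP-exposure estimates, and all simulated values are assumptions of this example. When both exposure measures come from the same sample, as in the simulations described below, that covariance will generally not be zero.

```python
import numpy as np

def conditional_f(pi_a, se_a, pi_b, se_b, cov_ab=None):
    """Approximate conditional F-statistic for exposure A given exposure B,
    from summary SNP-exposure estimates (two-exposure case only)."""
    pi_a, se_a, pi_b, se_b = (np.asarray(v, dtype=float)
                              for v in (pi_a, se_a, pi_b, se_b))
    if cov_ab is None:
        cov_ab = np.zeros_like(pi_a)            # assumption: independent estimates
    # Slope from regressing the estimates for A on those for B, without an intercept.
    delta = np.sum(pi_a * pi_b) / np.sum(pi_b ** 2)
    resid = pi_a - delta * pi_b
    var_resid = se_a ** 2 + delta ** 2 * se_b ** 2 - 2.0 * delta * cov_ab
    return float(np.sum(resid ** 2 / var_resid) / (len(pi_a) - 1))

rng = np.random.default_rng(2)
n_snps, se = 150, 0.0026
var_effect = 0.1 / n_snps
cov = var_effect * np.array([[1.0, 0.25], [0.25, 1.0]])
true_sep = rng.multivariate_normal([0.0, 0.0], cov, size=n_snps)   # two liabilities
true_same = np.repeat(true_sep[:, :1], 2, axis=1)                  # one shared liability
se_vec = np.full(n_snps, se)
est_sep = true_sep + rng.normal(0.0, se, size=(n_snps, 2))
est_same = true_same + rng.normal(0.0, se, size=(n_snps, 2))

f_separate = conditional_f(est_sep[:, 0], se_vec, est_sep[:, 1], se_vec)
f_same = conditional_f(est_same[:, 0], se_vec, est_same[:, 1], se_vec)
print("separate liabilities:", f_separate, "shared liability:", f_same)
```

In this toy setting the statistic is large when the two time points have distinct genetic effects and collapses towards 1 when both are driven by the same liability.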

A heterogeneity Q-statistic can be used to test for violations of the third IV assumption in the MVMR estimation. [23,27] One potential reason for excessive heterogeneity is that some of the SNPs may be associated with the outcome through pathways that are not included in the MVMR estimation, i.e. there is horizontal pleiotropy. This pleiotropy will bias the results obtained from inverse variance weighted MVMR estimation. [29,30] If pleiotropy is suspected, alternative estimation methods can be used to estimate MVMR causal effects under different assumptions of the form the pleiotropy takes. [27,30,31]
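A simplified version of the heterogeneity statistic can be sketched as below. It uses first-order weights, the inverse variance of the SNP-outcome estimates only, which is an assumption of this example. The published statistic also accounts for uncertainty in the SNP-exposure estimates. [23,27] The function name `mvmr_q` and the simulated pleiotropic effects are hypothetical.

```python
import numpy as np

def mvmr_q(beta_out, se_out, beta_exp):
    """Heterogeneity Q for an IVW-MVMR fit using first-order weights.
    Returns (Q, degrees of freedom)."""
    y = np.asarray(beta_out, dtype=float)
    X = np.asarray(beta_exp, dtype=float)
    s = np.asarray(se_out, dtype=float)
    # Weighted least squares via rescaling each row by 1 / se.
    est, *_ = np.linalg.lstsq(X / s[:, None], y / s, rcond=None)
    resid = y - X @ est
    return float(np.sum((resid / s) ** 2)), X.shape[0] - X.shape[1]

rng = np.random.default_rng(4)
n_snps = 150
pi_hat = rng.normal(0.0, 0.05, size=(n_snps, 2))
se_out = np.full(n_snps, 0.005)
gamma_clean = pi_hat @ np.array([0.2, 0.3]) + rng.normal(0.0, se_out)
# Add direct SNP-outcome effects that bypass both exposures (horizontal pleiotropy).
gamma_pleio = gamma_clean + rng.normal(0.0, 0.01, size=n_snps)

q_clean, dof = mvmr_q(gamma_clean, se_out, pi_hat)
q_pleio, _ = mvmr_q(gamma_pleio, se_out, pi_hat)
print("Q without pleiotropy:", q_clean, "Q with pleiotropy:", q_pleio, "df:", dof)
```

Under these assumptions Q is expected to be close to its degrees of freedom when there is no pleiotropy and substantially larger when there is.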

All IV estimation requires additional assumptions for interpretation of the point estimates obtained as causal effects. Firstly, all the MR and MVMR methods implemented here assume that the causal effects of the exposure(s) on the outcome are linear and, for MVMR, that there are no interactions between the effects of the exposures. Secondly, a ‘point-identifying’ assumption is required. Common point identifying assumptions for univariable IV estimation include homogeneity and monotonicity. [2] The exact definition of this point identifying assumption will determine the precise causal effect estimated, however, it is not currently well-understood how these assumptions relate to estimation with multiple exposures as in MVMR.

Verification and comparison

Inclusion of exposures associated with different liability periods.

We illustrate the requirement for genetic separation in the included time periods with a simulation. We have included an exposure measured at two time points, where both measures of the exposure have a direct causal effect on the outcome and the exposure at the earlier time point also has a small direct effect on the exposure at the later time point. Following the liability model described in Fig 1 we consider two different structures for the relationship between the genetic variants, the liability and the observed value of the exposure. In the first setting each observed exposure is associated with a different underlying liability and the genetic variants have different (but correlated) effects on each liability. In the second setting both the observed exposures are associated with the same liability. This means that the genetic variants have the same effect on the exposure at each time point. These models are illustrated in Fig 2.

Fig 2. Models with different relationships between the genetic variants and the exposure at each time point.

L1 is the liability in the first time period, L2 is the liability in the second period. G1 is a set of genetic variants associated with L1, G2 is a set of genetic variants associated with L2 and G12 is a set of genetic variants associated with both L1 and L2. X1 and X2 are observed values of the exposure, where X2 is observed at a later point in an individual’s life than X1. Y is an outcome. X1, X2 and Y are confounded by a set of unobserved confounders U. In (a) X1 and X2 are associated with different liabilities. In (b) X1 and X2 are associated with the same liability. X1 and X2 are measured with error, though this measurement error is uncorrelated with the genetic variants. The direct causal effects of X1 on X2 and Y, and of X2 on Y, are given on the diagram.

https://doi.org/10.1371/journal.pgen.1010290.g002

In each simulation we included 150 SNPs, two measures of the continuous exposure and a single continuous outcome. Unobserved confounding was modelled as two continuous variables, one affecting the earlier exposure measurement and the outcome and the other affecting the later exposure measurement and the outcome; both were excluded from the estimation. These confounders were highly correlated (ρ = 0.8). The simulated data were used to generate summary associations between the SNPs and each exposure from the same sample, and between the SNPs and the outcome using a second sample drawn from the same population. The true association between the SNPs and each liability was normally distributed around 0 with variance 0.1/l, where l is the number of SNPs. Effects of the SNPs on each liability were correlated with ρ = 0.25. SNPs associated with the exposure of interest for the MR estimation, or with either exposure for the MVMR, with p-values < 5×10⁻⁸ in the exposure sample were included in the estimation. Effect estimates were obtained through inverse variance weighted MVMR (IVW–MVMR). [25] The simulations had a sample size of 150,000 and 2000 repetitions.
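The draw of SNP effects and the instrument selection step can be sketched as follows. This covers only part of the simulation described above. The standard error of 1/√n assumes standardised genotypes and exposures, and treating each estimated SNP-exposure association as the liability effect plus sampling noise is a further simplification made for this example.

```python
import math
import numpy as np

rng = np.random.default_rng(3)
n_snps, n_individuals = 150, 150_000
rho = 0.25                                   # correlation of SNP effects across liabilities
var_effect = 0.1 / n_snps                    # variance 0.1 / l, with l the number of SNPs
cov = var_effect * np.array([[1.0, rho], [rho, 1.0]])

# Columns hold the true effect of each SNP on liability 1 and liability 2.
snp_effects = rng.multivariate_normal([0.0, 0.0], cov, size=n_snps)

# Assumed sampling error for the estimated SNP-exposure associations.
se = 1.0 / math.sqrt(n_individuals)
beta_hat = snp_effects + rng.normal(0.0, se, size=snp_effects.shape)

# Two-sided p-values from the normal distribution: p = erfc(|z| / sqrt(2)).
z = np.abs(beta_hat / se)
p_values = np.vectorize(math.erfc)(z / math.sqrt(2.0))

# For MVMR, keep SNPs that are genome-wide significant for either exposure.
selected = (p_values < 5e-8).any(axis=1)
print("SNPs selected for MVMR:", int(selected.sum()), "of", n_snps)
```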

Results for the model with either one or two underlying liabilities are given in Table 1. These results show that the univariable estimates give an estimate of the total effect of a liability that is associated with having a unit higher level of the exposure at the time point associated with the measured exposure. This is larger than either the direct or total effect of the exposure at either time point on the outcome (given in Fig 2), due to the correlation between the genetic effects on the exposure at each time period. For example, for the first simulation given in Table 1 the direct effect of X1 on Y is 0.20, the total effect is 0.23 and the genetically predicted total effect is 0.34, due to an additional effect of the genetic variants on X2 which then has an effect on Y.

Table 1. Simulation results under different relationships between the genetic variants and the exposure at each time point.

https://doi.org/10.1371/journal.pgen.1010290.t001

When the measured exposures are associated with different liabilities, MVMR consistently estimates the genetically predicted causal effect of being on a trajectory associated with a unit higher level of that exposure, given the liability to the exposure at the other time period. However, when the measured exposures are associated with the same liability there is no difference in the genetic effects on the measured exposures and therefore weak instrument bias is introduced into the MVMR estimation. [27] This is highlighted through low conditional F-statistics. In this setting there is random variation in the direction of the bias for each exposure in each repetition of the simulation. Therefore, the mean point estimate is close to the true value of the causal effect. However, the high mean level of absolute bias shows the bias from conditionally weak instruments. This highlights how the MVMR estimates are not only biased by weak instruments but that the bias could act in either direction, with different repetitions within the same simulation being biased in opposite directions. Coverage in these simulations remains high due to the large standard error in the estimation resulting from the weak instruments, however this imprecision means total uncertainty in the simulations is high.

We additionally explored how the results for each of the models described here change when only genetic variants with differing effects at each time point are selected, an approach that has been applied elsewhere. [32] This analysis shows that although apparent conditional instrument strength increases, the causal effect estimates are potentially biased, because some genetic variants are selected only because their estimated effects on each exposure differ by chance. These estimates also have lower power than those using all SNPs, because fewer genetic variants are included. We therefore recommend that this approach is avoided and do not consider it further. Results from this estimation and a full description of the analysis are given in S1 Text.

Estimation in the presence of a causal effect from the outcome to the later time point.

We now consider a model where the outcome has an effect on the exposure measured at the later time point. The exposure at the later time period is therefore a collider of the earlier exposure and the outcome. This is illustrated in Fig 3; in all other aspects the model is the same as that described in Fig 2A. Morris et al. (2021) showed that estimation of this scenario with MR gives consistent estimates when there is a single underlying liability. [19] Here we consider MVMR estimation of a model with two underlying liabilities.

Fig 3. Model with a causal effect from the outcome to the later time point.

L1 is the liability in the first time period, L2 is the liability in the second period. G1 is a set of genetic variants associated with L1, G2 is a set of genetic variants associated with L2 and G12 is a set of genetic variants associated with both L1 and L2. GY is a set of genetic variants associated with the outcome. X1 and X2 are observed values of the exposure, where X2 is observed at a later point in an individual’s life than X1. Y is an outcome. X1, X2 and Y are confounded by a set of unobserved confounders U. X1 and X2 are measured with error, this measurement error is uncorrelated with the genetic variants. The direct causal effect of X1 on X2 and Y, and Y on X2 are given on the diagram.

https://doi.org/10.1371/journal.pgen.1010290.g003

Simulations were set up in the same way as described for Table 1, with the addition of 50 SNPs that were associated with the outcome Y. This model was estimated assuming that X1 and X2 are the true exposures and Y is the true outcome. All genetic variants associated with the exposure at either time period, selected based on a p-value for the SNP–exposure association of < 5×10⁻⁸, reflecting genome-wide significance, were included in the MVMR estimation. Therefore, some SNPs strongly associated with Y were selected as instruments for the later time period. The model was estimated twice: firstly with no additional restrictions on the SNPs selected, and secondly with Steiger filtering applied to remove any SNPs that explain more variation in the outcome than in the later exposure. [33] Results from this simulation are given in Table 2.
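The direction comparison underlying Steiger filtering can be sketched as below. The variance explained by each SNP is approximated from its summary statistics as t²/(t² + n − 2), with t the ratio of the estimate to its standard error. This approximation, the function names and the toy values are assumptions of this example. The full method also tests whether the difference in variance explained is statistically significant, which is omitted here. [33]

```python
import numpy as np

def r2_from_summary(beta, se, n):
    """Approximate variance explained by each SNP from its estimate,
    standard error and sample size."""
    t2 = (np.asarray(beta, dtype=float) / np.asarray(se, dtype=float)) ** 2
    return t2 / (t2 + n - 2)

def steiger_keep(beta_exp, se_exp, n_exp, beta_out, se_out, n_out):
    """True for SNPs that explain more variance in the exposure than in the outcome."""
    return r2_from_summary(beta_exp, se_exp, n_exp) > r2_from_summary(beta_out, se_out, n_out)

# Three hypothetical SNPs; the third acts mainly on the outcome.
beta_exp = np.array([0.05, 0.02, 0.01])
beta_out = np.array([0.01, 0.005, 0.04])
se_all = np.full(3, 0.003)

keep = steiger_keep(beta_exp, se_all, 150_000, beta_out, se_all, 150_000)
print(keep)
```

With these values the first two SNPs are retained and the third is removed.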

Table 2. Simulation results for multiple time points with a causal effect from the outcome to the later time point.

https://doi.org/10.1371/journal.pgen.1010290.t002

Simulation results without Steiger filtering show that although the genetic variants strongly predict the exposure at each time period conditional on the other, MVMR estimation gives biased estimates of the direct causal effect of the exposure at both time periods on Y. This bias is due to conditioning on a variable that depends on both the exposure and the outcome (a collider) in the estimation, introducing collider bias. [34–37] Because the genetically predicted value of X2 depends on genetic variants associated with Y, X2 becomes the collider in the MVMR estimation. Conditioning on a collider distorts the estimated association between the other exposure and the outcome, so the estimates obtained in the MVMR are no longer reliable estimates of the direct effect of the earlier exposure on the outcome. Importantly, the introduction of collider bias in this estimation biases the effect estimates at each time point included in the estimation, including the earlier time point which is not dependent on Y.

Sanderson et al. (2019) showed that MVMR conditioning on a collider does not introduce collider bias when only genetic variants associated with the exposures are included in the estimation. [23] The different result here occurs because we have allowed for genetic variants associated with Y to be included as instruments, which was not the case in Sanderson et al. (2019) and reflects a situation where the primary phenotype has been mis-specified. [3] When Steiger filtering is applied to the results given in Table 2 there is no bias in the results obtained, as the genetic variants are restricted to those which affect the exposure at either time point directly without acting via the outcome.

Additional excluded liability

We finally consider a model where the exposure has three underlying liabilities associated with it but where the model estimated only includes the exposure at times associated with two of those liabilities. The true structure of the data is illustrated in Fig 4; however, the model estimated is assumed to be the same as that given in Fig 2A.

Fig 4. Model with three liability time periods.

L1 is the liability of the exposure in the first time period, L2 is the liability of the exposure in the second period, L3 is the liability of the exposure in the third period, G1 is a set of genetic variants associated with L1, L2 and L3, G2 is another set of genetic variants associated with L1, L2 and L3 and G3 is a set of genetic variants associated with L3. X1 and X2 are observed values of the exposure, where X2 is observed at a later point in an individual’s life than X1. X3 is a third value of the exposure, X3, L3 and G3 are given in grey to illustrate that although they have an effect on the outcome they are not included in the estimation. Y is an outcome. X1, X2, X3 and Y are confounded by a set of unobserved confounders U. X1, X2 and X3 are measured with error, this measurement error is uncorrelated with the genetic variants. The direct causal effect of X1 on X2 and Y, X2 on X3 and Y, and X3 on Y are given on the diagram.

https://doi.org/10.1371/journal.pgen.1010290.g004

We set the simulations up in the same way as described for Table 1 with the addition of a third liability time period associated with a measured value of the exposure. This third measured exposure is assumed to be dependent on X2 and subject to overlapping confounding to both X1 and X2 with Y. We considered two models for the effect of G on L3. In the first there is no correlation between the association between G and L3 and the association between G and the other liabilities. In the second correlation between the association between G and L3 and G and L1 and L2 was added with higher correlation between GL2 and GL3 (ρ = 0.25) and a lower level of correlation between GL1 and GL3 (ρ = 0.1). These correlations arise from the overlap in the genetic effects on each liability. In both cases the outcome is assumed to occur at or after the time at which X3 is measured and all exposures have a direct causal effect on the outcome. The results from this simulation are given in Table 3.

Table 3. Simulation results with a relevant liability period excluded.

https://doi.org/10.1371/journal.pgen.1010290.t003

When the associations between the genetic variants and the excluded liability are correlated with those for the included periods, the effect estimated will include some of the effect that acts via the omitted liability. The estimated effects of liability to X1 and X2 both consistently estimate the effect of the liability at that time point. When the genetic effects on L3 are uncorrelated with those on the included liability periods, the effect estimated does not include the effect of the later liability. Additional simulations with no direct causal effect of X2 on Y showed the same pattern of results and are given in Table B in S1 Text.

We finally simulated data where only X2 had a causal effect on the outcome but X1 and X3 (but not X2) were included in the estimation. In this set up we varied the association between each of L1 and L3 with L2. Results from estimation of this model showed the same pattern of results as above. The liability effect estimated includes the effect of that time period and some of the effect of the excluded time period, with the proportion of the effect of the excluded period included in the estimated effect depending on the genetic correlation between the two periods. Results from these simulations are given in Table C in S1 Text.

Application

We consider an illustrative application where we estimate the effect of childhood and adult body mass index (BMI) on circulating C-reactive protein levels (CRP) and smoking behaviour, measured as smoking initiation, smoking cessation and cigarettes per day.

Data

Data on child and adulthood BMI were taken from the UK Biobank (UKB) study. [38,39] Between 2006 and 2010, the UK Biobank study enrolled 500,000 individuals aged between 40 and 69 at baseline across 22 assessment centres in the UK. Data were collected on clinical examinations, assays of biological samples, detailed information regarding self-reported health characteristics and genome-wide genotyping. In total 12,370,749 genetic variants in up to 463,005 individuals were available for analysis, as described previously. [40] For BMI we derived a measure of childhood body size using recall questionnaire data asking UKB participants if they were ‘thinner’, ‘plumper’ or ‘about average’ when they were aged 10 years old compared to the average. Adult body size was derived using clinically measured BMI data (mean age 56.5 years), which we categorized into a 3-category variable using the same proportion as the early life measure for comparative purposes. Genetic variants robustly associated with childhood and adult body size (based on P < 5×10⁻⁸ and r² < 0.001 using a reference panel from the 1000 genomes project phase 3 [41]) were identified from a previously undertaken Genome Wide Association Study (GWAS) in UKB. This GWAS has been described in detail elsewhere, as have validation studies of the resulting genetic instruments. [8,42,43]

GWAS summary statistics for CRP levels for 204,402 European adults were extracted from Ligthart et al (2018) to avoid sample overlap with UK Biobank. [44] For each of the smoking behaviour outcomes GWAS data were extracted from Liu et al (2019) using summary statistics produced excluding UKB. [45] The mean age of smoking initiation across individuals with available data (excluding UK Biobank) was 17.5 years, with the mean for each study included in the GWAS ranging from 16.0 to 21.0 years. SNPs associated with smoking initiation were identified in a sample including smokers and non-smokers. SNPs associated with smoking cessation and cigarettes per day were identified in a sample of smokers only.

For each outcome considered we estimated the genetically predicted total effect of early life and later life exposure separately through a two-sample MR using the SNPs associated with the exposure at the relevant time period. We then estimated the genetically predicted direct effects of the exposure at each time point through a MVMR estimation including both early and later life body size in the same estimation, including all SNPs associated with the exposure at either time.
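The difference between the two estimators can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the software used for the analyses reported here. It uses inverse-variance weights based only on the standard errors of the SNP-outcome associations, ignores uncertainty in the SNP-exposure associations, and applies both estimators to the same hypothetical set of four SNPs, whereas the analysis described above used period-specific SNPs for the univariable estimates. All numerical values and function names are invented for the example.

```python
def ivw(beta_x, beta_y, se_y):
    """Univariable inverse-variance weighted MR estimate (no intercept)."""
    weights = [1.0 / s ** 2 for s in se_y]
    numerator = sum(w * bx * by for w, bx, by in zip(weights, beta_x, beta_y))
    denominator = sum(w * bx * bx for w, bx in zip(weights, beta_x))
    return numerator / denominator


def mvmr_ivw(beta_x1, beta_x2, beta_y, se_y):
    """Two-exposure MVMR estimate: weighted regression of the SNP-outcome
    associations on both sets of SNP-exposure associations, no intercept."""
    weights = [1.0 / s ** 2 for s in se_y]
    s11 = sum(w * a * a for w, a in zip(weights, beta_x1))
    s22 = sum(w * b * b for w, b in zip(weights, beta_x2))
    s12 = sum(w * a * b for w, a, b in zip(weights, beta_x1, beta_x2))
    t1 = sum(w * a * y for w, a, y in zip(weights, beta_x1, beta_y))
    t2 = sum(w * b * y for w, b, y in zip(weights, beta_x2, beta_y))
    det = s11 * s22 - s12 ** 2
    if abs(det) <= 1e-12 * s11 * s22:
        # SNP effects on the two periods are proportional, so the direct
        # effects cannot be separated.
        raise ValueError("SNP-exposure associations do not differ between periods")
    return (s22 * t1 - s12 * t2) / det, (s11 * t2 - s12 * t1) / det


# Hypothetical SNP associations with early and later life exposure.
bx_early = [0.10, 0.20, 0.30, 0.05]
bx_later = [0.20, 0.10, 0.40, 0.30]
# Toy outcome associations: only the later life exposure affects the outcome.
by = [0.5 * b for b in bx_later]
se_y = [0.01] * 4

total_early = ivw(bx_early, by, se_y)
direct_early, direct_later = mvmr_ivw(bx_early, bx_later, by, se_y)
print(total_early, direct_early, direct_later)
```

In this toy example the univariable estimate for the early life exposure is non-zero because the same SNPs are also associated with the later life exposure, while the multivariable estimate attributes the whole effect to the later period. The `ValueError` branch reflects the requirement that SNP-exposure associations differ between the periods included.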

Results

Results for the estimation of the effect of BMI on CRP are given in Table 4. Our MR estimates showed a strong total effect of liability to body size in childhood and adulthood on CRP (total effect of a category increase in childhood body size on CRP (log mg/L) = 0.35, 95% CI = 0.27 to 0.42; for adult body size 0.56, 95% CI = 0.50 to 0.62). However, in the MVMR estimation no effect of early life body size liability was observed and the effect of later life liability remained largely unchanged, implying that the total and direct effects of later life body size liability are similar (direct liability effect of a category increase in childhood body size on circulating CRP = -0.04, 95% CI = -0.14 to 0.06; for adult body size 0.56, 95% CI = 0.47 to 0.65).

Table 4. Univariable and multivariable estimates for effect of child and adulthood BMI liability on circulating CRP.

https://doi.org/10.1371/journal.pgen.1010290.t004

Similar results were obtained for smoking behaviour, given in Table 5. MR estimates showed a strong total effect of body size liability in childhood and adulthood on all of the smoking outcomes (total effect of a category increase in childhood body size on number of cigarettes per day = 0.13, 95% CI = 0.07 to 0.18, P = 2.11×10⁻⁶; for adult body size 0.25, 95% CI = 0.20 to 0.30, P = 3.54×10⁻²⁶). However, in the MVMR estimation no effect of early life body size liability on number of cigarettes per day was observed and the effect of later life liability remained largely unchanged, implying that the total and direct effects of later life body size liability are similar (direct effect of a category increase in childhood body size on number of cigarettes per day = -0.05, 95% CI = -0.11 to 0.01, P = 0.174; for adult body size 0.27, 95% CI = 0.22 to 0.35, P = 7.15×10⁻²⁰). Similar results were observed for the other smoking behaviour measures, with positive total effects of liability to a higher category of childhood body size on smoking initiation and cessation observed in the MR estimation and no direct effect of childhood body size liability observed in the MVMR estimation.

Table 5. Univariable and multivariable estimates for effect of child and adulthood BMI liability on smoking behaviour.

https://doi.org/10.1371/journal.pgen.1010290.t005

These results suggest that there is no direct effect of childhood body size liability on CRP or smoking behaviour in later life conditional on later life body size. The observed effect in the MR estimates of childhood body size on CRP and smoking is due to a combination of SNPs associated with childhood body size also having an effect on adult body size and an indirect effect of childhood body size on CRP and smoking behaviour through its effect on adult body size. Steiger filtering [33] between adult body size and the outcome removed 10 SNPs for CRP and ≤5 SNPs for any of the smoking behaviours and did not change the results obtained; these results are given in S1 Text.

We have not explored the potential for biases that often arise in MR and MVMR studies in the results presented here, such as biases due to pleiotropy or selection bias. [29,36,46] SNPs for smoking cessation and cigarettes per day were identified in smokers only. This leads to the potential for collider bias in the GWAS results, which would then bias the MR and MVMR results given here. These results should therefore be taken as an illustration of the application and interpretation of the methods discussed.

Discussion

When multiple measures of an exposure at different time points are available, MVMR can be used to estimate the causal effect of changing the liability of the exposure at different time points on the outcome. The interpretation of the MVMR estimate is the direct effect of having a liability associated with a unit higher level of the exposure at that time point, for a given liability for the exposure at the other time points included in the estimation. That is, the effect of having a liability associated with a unit higher level of X1 while keeping the liability for X2 constant. If measures of the exposure at different time periods are available, it is possible to identify whether the exposure effects persist over time or key periods exist in the lifecourse.

As shown in simulation results given in Table 1, an important restriction for estimation of these models is that the association between the genetic variants and the exposure must vary over the periods included in the estimation. Although genetic variants themselves do not vary over an individual’s lifetime, variation in their effects could arise from different genetic variants having different levels of importance in the development of the exposure at different ages. In the simulations we have assumed that each liability only directly affects the exposure at one time period but that genetic variants can be associated with multiple liabilities. However, the results obtained would be the same if we had allowed each genetic variant to influence one liability only but for the liabilities to affect the exposure at multiple time periods and each exposure to be influenced by multiple liabilities.

Our simulation results highlight how it is possible to introduce collider bias to the results obtained when genetic variants for the outcome are selected as instruments for the exposure. Steiger filtering should be applied to help remove this bias if there is potential for the outcome to mediate some of the relationship between the exposure at the time periods included in the estimation.
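The idea behind Steiger filtering can be illustrated with a simplified Python sketch. It keeps a SNP only if it explains more variance in the exposure than in the outcome, and it omits the formal test of the difference that a full implementation would apply. The variance explained is approximated from summary statistics as t² / (t² + n − 2), which is appropriate for continuous traits only. The SNP values, sample sizes and function names are hypothetical and are not taken from the analysis reported here.

```python
def variance_explained(beta, se, n):
    """Approximate r-squared for one SNP from its effect estimate,
    standard error and sample size, assuming a continuous trait."""
    t_squared = (beta / se) ** 2
    return t_squared / (t_squared + n - 2)


def steiger_keep(snp, n_exposure, n_outcome):
    """Return True if the SNP explains more variance in the exposure
    than in the outcome (simplified rule, no significance test)."""
    r2_exposure = variance_explained(snp["beta_x"], snp["se_x"], n_exposure)
    r2_outcome = variance_explained(snp["beta_y"], snp["se_y"], n_outcome)
    return r2_exposure > r2_outcome


# Hypothetical summary statistics for two SNPs.
snps = [
    {"id": "snp_a", "beta_x": 0.05, "se_x": 0.005, "beta_y": 0.01, "se_y": 0.005},
    {"id": "snp_b", "beta_x": 0.02, "se_x": 0.005, "beta_y": 0.04, "se_y": 0.005},
]

# Hypothetical sample sizes for the exposure and outcome GWAS.
kept = [s["id"] for s in snps if steiger_keep(s, n_exposure=400000, n_outcome=200000)]
print(kept)
```

Here `snp_b` is more strongly associated with the outcome than with the exposure, which is the pattern expected for a variant that acts primarily on the outcome, so the simplified rule removes it.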

Our final set of simulations show how the effects of any time periods excluded from the estimation but associated with genetic variants included in the estimation will form part of the effect estimated. The size of this effect will depend on the level of correlation between the liabilities for the included and excluded time periods. It is likely that for many exposures the genetic variants associated with the trait at one time point will also be associated with the trait at another time point to some degree. Therefore, an observed effect for one time period may not be due to the exact time period measured. If MVMR is being used to identify which periods in the lifecourse are most important, then the other potentially important periods also need to be included in the estimation.

Our application to smoking behaviour illustrates how genetic correlation between time points can mean that a particular point can appear important even though any effect is likely to have occurred earlier in the lifecourse. The results obtained show an effect of adult body size liability (mean age: 56.5) on smoking initiation (mean age: 17.5 years) once childhood body size liability has been controlled for. Typical age of smoking initiation therefore precedes the measurement of adulthood body size. The large effect of body size liability at the age measured in our sample on risk of smoking is unlikely to be causal at the point of time that the exposure and outcome were measured. If liability for higher body size in adulthood is associated with liability for higher body size in adolescence/early adulthood, the effect we estimate may reflect the effect of body size in adolescence on smoking initiation even though it is actually measured at a later time point. This model is illustrated in Fig 5. Data on BMI at different ages between childhood and adulthood would potentially enable estimation of the effect on smoking behaviour at a range of different ages and so identification of the period between childhood and adulthood that was most important in the development of smoking behaviour. Implementation of this approach with MVMR would, however, rely on those periods being differentially associated with the genetic variants used as instruments. This difference has been shown previously for body mass index (BMI) at different points across childhood and between childhood and adulthood. [8,47,48] However, other research has shown that the genetic influences on BMI are consistent across adulthood. [49] This would prevent using this approach to determine which point in adulthood is most important for risk of smoking behaviour.

Fig 5. Estimated relationship between adiposity and smoking behaviour.

Figure shows (a) the estimated relationship between adiposity and smoking behaviour and (b) a potential underlying model that would give this result.

https://doi.org/10.1371/journal.pgen.1010290.g005

Previous work using this estimation approach has shown that early life BMI liability does not have a direct effect on type 2 diabetes and coronary heart disease. [8] Therefore, if an individual with a high BMI in early life reduces their excess weight in later life, their risk for type 2 diabetes and coronary heart disease will not be increased via this pathway. Our analysis of the effect of body size on circulating CRP levels shows a similar result: larger body size in early life will not have an effect on increased CRP levels if the excess weight is reduced in adulthood. These results, however, do not identify the periods in adulthood that are most important.

The methods described here also require the general assumptions of MR estimation to hold. The assumption that all of the data are from the same underlying population is important to all summary-data MR analysis. [50,51] This is likely to be particularly important when considering the same exposure at different ages, as changes in the distribution of the exposure or the relationship between the exposure and the outcome between different cohorts could potentially bias the results obtained. The choice of datasets should be carefully considered if the same data cannot be used for each time point. For example, the distribution of childhood BMI levels has changed notably over the last 50 years and therefore it would not be correct to assume that BMI measured in groups of adults and children at the same point in time would represent measures from the same population.

If some SNPs are differently associated with more than one time period for the exposure and the causal effect of the exposure on the outcome varies over time there may be heterogeneity in the results obtained, even in the absence of pleiotropy. In this case those SNPs that have a larger association with the exposure in the time period with the largest causal effect will estimate a larger causal effect of the exposure on the outcome. This will inflate the heterogeneity Q-statistic even in the absence of conventional pleiotropic effects, unless all relevant time periods are included in the estimation.
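A small numerical sketch in Python may help to show how this heterogeneity arises. It computes a Q-statistic as the weighted sum of squared residuals from the fitted MR model, using weights based only on the standard errors of the SNP-outcome associations. This is a simplification that ignores uncertainty in the SNP-exposure associations. The SNP associations, the effect sizes and the function name are hypothetical.

```python
def q_statistic(beta_y, se_y, exposures, effects):
    """Weighted sum of squared residuals of the SNP-outcome associations
    around the values predicted by the estimated causal effects.

    exposures: list of lists, one list of SNP-exposure associations per period.
    effects:   list of estimated causal effects, one per period.
    """
    q = 0.0
    for j, (by, se) in enumerate(zip(beta_y, se_y)):
        fitted = sum(b * bx[j] for b, bx in zip(effects, exposures))
        q += ((by - fitted) / se) ** 2
    return q


# Hypothetical SNP associations with the exposure in two periods.
bx_early = [0.10, 0.20, 0.30, 0.05]
bx_later = [0.20, 0.10, 0.40, 0.30]
# Toy model with no pleiotropy: the outcome depends only on the later period.
by = [0.5 * b for b in bx_later]
se_y = [0.01] * 4

# Univariable IVW estimate using the early period only (equal weights here).
ivw_early = sum(a * y for a, y in zip(bx_early, by)) / sum(a * a for a in bx_early)

q_single = q_statistic(by, se_y, [bx_early], [ivw_early])
q_both = q_statistic(by, se_y, [bx_early, bx_later], [0.0, 0.5])
print(q_single, q_both)
```

In this toy example the single-period model shows substantial heterogeneity although no SNP is pleiotropic, and the heterogeneity disappears once both periods are included.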

Throughout this work we have only considered a single measurement of the outcome. For many exposures and outcomes it may be possible that the outcome could also vary over time with the relationship between the exposure and outcome varying at different time points, and potentially also effects of earlier values of the exposure on later values of the outcome. This type of relationship, with multiple different outcomes, cannot be estimated with standard MVMR methods. This is therefore left as an area of future research.

Supporting information

S1 Text. Supplementary material.

Table B–No direct causal effect of X2. Table C–Additional simulation results for a model with three exposure periods.

https://doi.org/10.1371/journal.pgen.1010290.s001

(DOCX)

Acknowledgments

This research has been conducted using the UK Biobank Resource under Application Number 15825. This work was carried out using the computational facilities of the Advanced Computing Research Centre, University of Bristol - www.bristol.ac.uk/acrc.

References

  1. Davey Smith G, Ebrahim S. ‘Mendelian randomization’: can genetic epidemiology contribute to understanding environmental determinants of disease? International journal of epidemiology. 2003;32(1):1–22. pmid:12689998
  2. Sanderson E, Glymour MM, Holmes MV, Kang H, Morrison J, Munafò MR, et al. Mendelian randomization. Nature Reviews Methods Primers. 2022;2(1):1–21.
  3. Davey Smith G, Hemani G. Mendelian randomization: genetic anchors for causal inference in epidemiological studies. Human Molecular Genetics. 2014;23(R1):R89–R98. pmid:25064373
  4. Lawlor DA, Harbord RM, Sterne JA, Timpson N, Davey Smith G. Mendelian randomization: using genes as instruments for making causal inferences in epidemiology. Statistics in medicine. 2008;27(8):1133–63. pmid:17886233
  5. Richmond RC, Davey Smith G. Mendelian randomization: Concepts and scope. Cold Spring Harbor Perspectives in Medicine. 2021:a040501.
  6. Tudball MJ, Bowden J, Hughes RA, Ly A, Munafò MR, Tilling K, et al. Mendelian randomisation with coarsened exposures. Genetic Epidemiology. 2021;45(3):338–50. pmid:33527565
  7. VanderWeele TJ. Concerning the Consistency Assumption in Causal Inference. Epidemiology. 2009;20(6):880–3. pmid:19829187
  8. Richardson TG, Sanderson E, Elsworth B, Tilling K, Davey Smith G. Use of genetic variation to separate the effects of early and later life adiposity on disease risk: mendelian randomisation study. bmj. 2020;369.
  9. Bibbins-Domingo K, Coxson P, Pletcher MJ, Lightwood J, Goldman L. Adolescent overweight and future adult coronary heart disease. New England Journal of Medicine. 2007;357(23):2371–9. pmid:18057339
  10. Biro FM, Wien M. Childhood obesity and adult morbidities. The American journal of clinical nutrition. 2010;91(5):1499S–505S. pmid:20335542
  11. Baer HJ, Colditz GA, Rosner B, Michels KB, Rich-Edwards JW, Hunter DJ, et al. Body fatness during childhood and adolescence and incidence of breast cancer in premenopausal women: a prospective cohort study. Breast Cancer Research. 2005;7(3):1–12. pmid:15987426
  12. Park MH, Sovio U, Viner RM, Hardy RJ, Kinra S. Overweight in childhood, adolescence and adulthood and cardiovascular risk in later life: pooled analysis of three british birth cohorts. PloS one. 2013;8(7):e70684. pmid:23894679
  13. Ben-Shlomo Y, Cooper R, Kuh D. The last two decades of life course epidemiology, and its relevance for research on ageing. International Journal of Epidemiology. 2016;45(4):973–88. pmid:27880685
  14. Cabre P. Migration and multiple sclerosis: the French West Indies experience. Journal of the neurological sciences. 2007;262(1–2):117–21. pmid:17651756
  15. Dean G, Elian M. Age at immigration to England of Asian and Caribbean immigrants and the risk of developing multiple sclerosis. Journal of Neurology, Neurosurgery & Psychiatry. 1997;63(5):565–8. pmid:9408093
  16. Elian M, Nightingale S, Dean G. Multiple sclerosis among United Kingdom-born children of immigrants from the Indian subcontinent, Africa and the West Indies. Journal of Neurology, Neurosurgery & Psychiatry. 1990;53(10):906–11. pmid:2266374
  17. Holmes MV, Ala-Korpela M, Davey Smith G. Mendelian randomization in cardiometabolic disease: challenges in evaluating causality. Nat Rev Cardiol. 2017;14(10):577–90. pmid:28569269
  18. Mokry LE, Ross S, Ahmad OS, Forgetta V, Davey Smith G, Leong A, et al. Vitamin D and risk of multiple sclerosis: a Mendelian randomization study. PLoS medicine. 2015;12(8):e1001866. pmid:26305103
  19. Morris TT, Heron J, Sanderson E, Davey Smith G, Didelez V, Tilling K. Interpretation of mendelian randomization using one measure of an exposure that varies over time. International Journal of Epidemiology 2022. https://doi.org/10.1093/ije/dyac136 pmid:35848950
  20. Labrecque JA, Swanson SA. Interpretation and potential biases of Mendelian randomization estimates with time-varying exposures. American journal of epidemiology. 2019;188(1):231–8. pmid:30239571
  21. Shi J, Swanson SA, Kraft P, Rosner B, De Vivo I, Hernán MA. Mendelian Randomization With Repeated Measures of a Time-varying Exposure: An Application of Structural Mean Models. Epidemiology. 2021;33(1):84–94.
  22. Shi J, Swanson SA, Kraft P, Rosner B, De Vivo I, Hernán MA. Instrumental variable estimation for a time-varying treatment and a time-to-event outcome via structural nested cumulative failure time models. BMC Medical Research Methodology. 2021;21(1):258. pmid:34823502
  23. Sanderson E, Davey Smith G, Windmeijer F, Bowden J. An examination of multivariable Mendelian randomization in the single-sample and two-sample summary data settings. International journal of epidemiology. 2019;48(3):713–27. pmid:30535378
  24. Sanderson E. Multivariable Mendelian randomization and mediation. Cold Spring Harbor perspectives in medicine. 2021;11(2):a038984. pmid:32341063
  25. Burgess S, Thompson SG. Multivariable Mendelian randomization: the use of pleiotropic genetic variants to estimate causal effects. American journal of epidemiology. 2015;181(4):251–60. pmid:25632051
  26. Wooldridge JM. Econometric analysis of cross section and panel data: MIT press; 2010.
  27. Sanderson E, Spiller W, Bowden J. Testing and Correcting for Weak and Pleiotropic Instruments in Two-Sample Multivariable Mendelian Randomisation. Statistics in medicine. 2021.
  28. Sanderson E, Windmeijer F. A weak instrument F-test in linear IV models with multiple endogenous variables. Journal of econometrics. 2016;190(2):212–21. pmid:29129953
  29. Hemani G, Bowden J, Davey Smith G. Evaluating the potential role of pleiotropy in Mendelian randomization studies. Human molecular genetics. 2018;27(R2):R195–R208. pmid:29771313
  30. Grant AJ, Burgess S. Pleiotropy robust methods for multivariable Mendelian randomization. Statistics in medicine. 2021. pmid:34342032
  31. Rees JM, Wood AM, Burgess S. Extending the MR-Egger method for multivariable Mendelian randomization to correct for both measured and unmeasured pleiotropy. Statistics in medicine. 2017;36(29):4705–18. pmid:28960498
  32. Gill D, Georgakis MK, Zuber V, Karhunen V, Burgess S, Malik R, et al. Genetically Predicted Midlife Blood Pressure and Coronary Artery Disease Risk: Mendelian Randomization Analysis. Journal of the American Heart Association. 2020;9:e016773. pmid:32627641
  33. Hemani G, Tilling K, Davey Smith G. Orienting the causal relationship between imprecisely measured traits using GWAS summary data. PLoS genetics. 2017;13(11):e1007081. pmid:29149188
  34. Hernán MA, Hernández-Díaz S, Robins JM. A structural approach to selection bias. Epidemiology. 2004:615–25. pmid:15308962
  35. Cole SR, Platt RW, Schisterman EF, Chu H, Westreich D, Richardson D, et al. Illustrating bias due to conditioning on a collider. International Journal of Epidemiology. 2009;39(2):417–20. pmid:19926667
  36. Munafò MR, Tilling K, Taylor AE, Evans DM, Davey Smith G. Collider scope: when selection bias can substantially influence observed associations. International journal of epidemiology. 2018;47(1):226–35. pmid:29040562
  37. Griffith GJ, Morris TT, Tudball MJ, Herbert A, Mancano G, Pike L, et al. Collider bias undermines our understanding of COVID-19 disease risk and severity. Nature communications. 2020;11(1):1–12.
  38. Bycroft C, Freeman C, Petkova D, Band G, Elliott LT, Sharp K, et al. The UK Biobank resource with deep phenotyping and genomic data. Nature. 2018;562(7726):203–9. pmid:30305743
  39. Sudlow C, Gallacher J, Allen N, Beral V, Burton P, Danesh J, et al. UK biobank: an open access resource for identifying the causes of a wide range of complex diseases of middle and old age. Plos med. 2015;12(3):e1001779. pmid:25826379
  40. Elsworth B, Mitchell R, Raistrick C, Paternoster L, Hemani G, Gaunt T. MRC IEU UK Biobank GWAS pipeline version 2. 2019.
  41. The 1000 Genomes Project Consortium. An integrated map of genetic variation from 1,092 human genomes. Nature. 2012;491(7422):56. pmid:23128226
  42. Brandkvist M, Bjørngaard JH, Ødegård RA, Åsvold BO, Davey Smith G, Brumpton B, et al. Separating the genetics of childhood and adult obesity: a validation study of genetic scores for body mass index in adolescence and adulthood in the HUNT Study. Hum Mol Genet. 2021;29(24):3966–73. pmid:33276378
  43. Richardson TG, Mykkänen J, Pahkala K, Ala-Korpela M, Bell JA, Taylor K, et al. Evaluating the direct effects of childhood adiposity on adult systemic metabolism: a multivariable Mendelian randomization analysis. Int J Epidemiol. 2021;50(5):1580–92. pmid:33783488
  44. Ligthart S, Vaez A, Võsa U, Stathopoulou MG, de Vries PS, Prins BP, et al. Genome Analyses of >200,000 Individuals Identify 58 Loci for Chronic Inflammation and Highlight Pathways that Link Inflammation and Complex Disorders. Am J Hum Genet. 2018;103(5):691–706. pmid:30388399
  45. Liu M, Jiang Y, Wedow R, Li Y, Brazel DM, Chen F, et al. Association studies of up to 1.2 million individuals yield new insights into the genetic etiology of tobacco and alcohol use. Nature Genetics. 2019;51(2):237–44. pmid:30643251
  46. Gkatzionis A, Burgess S. Contextualizing selection bias in Mendelian randomization: how bad is it likely to be? International Journal of Epidemiology. 2018;48(3):691–701.
  47. Hardy R, Wills AK, Wong A, Elks CE, Wareham NJ, Loos RJ, et al. Life course variations in the associations between FTO and MC4R gene variants and body size. Human molecular genetics. 2010;19(3):545–52. pmid:19880856
  48. Ortega-Alonso A, Sipilä S, Kujala UM, Kaprio J, Rantanen T. Genetic Influences on Change in BMI from Middle to Old Age: A 29-Year Follow-up Study of Twin Sisters. Behavior Genetics. 2009;39(2):154–64. pmid:19067155
  49. Gillespie NA, Gentry AE, Kirkpatrick RM, Maes HH, Reynolds CA, Mathur R, et al. Determining the stability of genome-wide factors in BMI between ages 40 to 69 years. bioRxiv. 2021:2021.07.28.454172.
  50. Hartwig FP, Davies NM, Hemani G, Davey Smith G. Two-sample Mendelian randomization: avoiding the downsides of a powerful, widely applicable but potentially fallible technique. International Journal of Epidemiology. 2017;45(6):1717–26.
  51. Zhao Q, Wang J, Spiller W, Bowden J, Small DS. Two-Sample Instrumental Variable Analyses Using Heterogeneous Samples. Statistical Science. 2019;34(2):317–33.