Introduction

Pregnancy is a critical time window for neurodevelopment and a time when the fetus is most susceptible to adverse environmental and prenatal exposures [1]. Prenatal tobacco exposure (PTE) and prenatal alcohol exposure (PAE) during pregnancy are associated with adverse fetal and childhood outcomes. Maternal smoking and alcohol consumption during pregnancy have been associated with adverse fetal outcomes such as low birth weight, sudden infant death syndrome (SIDS), and preterm birth [2,3,4,5].

PTE and PAE have also been linked to the etiology of a range of neurodevelopmental disorders in children. For example, PTE has previously been associated with attention deficit hyperactivity disorder (ADHD) [6]. There is evidence that the risk of ADHD increases with higher concentrations of PTE [7]. There is also evidence that maternal smoking during pregnancy is a risk factor for neuropsychiatric disorders such as schizophrenia (SZ) and bipolar disorder (BPD) [8, 9]. PAE is the cause of fetal alcohol spectrum disorder (FASD), a condition characterized by severe neurodevelopmental delay, and which has various manifestations, including growth deficiencies and both behavioral and cognitive deficits [10, 11].

While PTE and PAE have been implicated in the etiology of several neurodevelopmental disorders, the biological mechanisms have yet to be established. Epigenetic modifications, such as differential DNA methylation (DNAm), are dynamic and highly sensitive to external environmental factors [12]. More importantly, DNAm is potentially reversible, indicating that the methylome could be a therapeutic target for disease treatment and prevention [13].

There is strong evidence from large-scale epigenome-wide association studies (EWAS) suggesting that PTE is associated with changes in the cord blood methylome [14,15,16]. The evidence is weaker for an association between PAE and the cord blood methylome [17], although there has been some evidence of epigenomic changes in other tissues such as placental or buccal tissue [18]. While there is evidence that PTE and PAE influence DNA methylation in cord blood, not much is known about the subsequent adverse outcomes, such as delayed neurodevelopment [19, 20] or about the association between newborn DNAm and child neurodevelopment or related disorders in general [21,22,23]. Few studies have investigated the mediation of PTE and neuropsychiatric outcomes by DNAm. A candidate gene study found that differential cord blood DNAm levels in GFI1 acts as a mediator of the association between PTE and ADHD in 6-year-old children [24]. There is also evidence of whole blood DNAm mediating the association between PTE and schizophrenia, as indicated by a meta-analysis of five prospective birth cohorts [25]. While there is evidence of DNAm as a mediator of this relationship with psychiatric outcomes, the epigenetic mechanisms connecting PTE and PAE, cord blood DNAm, and child neurodevelopment are largely unknown.

The majority of EWAS have been conducted in high-income countries (HICs) and/or in children of primarily European ancestry. Despite the rapid proliferation of the field of genomics and translational science in the last decade, cohorts from non-European ancestries and low- and middle-income countries (LMICs) are still largely underrepresented [26]. Consequently, there is a large gap in the literature on these underrepresented populations in which risk factors such as low socioeconomic status, poor maternal health, maternal substance use, and adverse child health outcomes, including neurodevelopmental delay, are at a much higher prevalence [27].

Here, we investigated whether differential DNAm in cord blood mediates the relationship between PTE and PAE and neurodevelopment in children of 6 and 24 months of age using data from a well-characterized South African birth cohort, the Drakenstein Child Health Study (DCHS) [28]. For the causal mediation analysis (CMA), we utilized methylation risk scores (MRS) and high-dimensional mediation analysis (HDMA) to increase statistical power and reduce the dimensionality of the methylation data. To our knowledge, this study is one of the first mediation analyses to examine this relationship in an underrepresented, at-risk, LMIC population, and is also one of the first to utilize HDMA and MRS for epigenetic mediation analysis.

Materials and methods

Study population

The DCHS is a South African, population-based birth cohort of African and mixed ancestry and has been previously described [28]. Our analysis sample consisted of 262 children from the DCHS with DNAm data from cord blood, genotype data, PTE and PAE measures, and important covariates. Of these,112 infants had measurements available from the Bayley Scales of Infant and Toddler Development, third edition (BSID-III) at 6 months of age, and 184 had measurements available from BSID-III at 24 months of age. Mothers were enrolled during their second trimester and followed through pregnancy at two primary clinics, Newman and Mbekweni. Mother-child pairs were then followed from birth until the child was at least 5 years of age. Ethical approval for human subjects research was given by the Human Research Ethics Committee of the Faculty of Health Sciences of the University of Cape Town. Written consent for participation was obtained from the mother on behalf of herself and the infant.

DNA methylation measurements

As described previously, DNA methylation was measured from cord blood using either the Illumina Infinium HumanMethylation450 BeadChip (450 K; n = 115) or the Infinium MethylationEPIC BeadChip (EPIC; n = 147) arrays [23]. Preprocessing was conducted separately for each array in R 3.5.1, using an identical preprocessing pipeline. The 450 K and EPIC arrays contained 426,378 probes and 781,536 probes, respectively. The 450k and EPIC arrays were combined using the minfi package, which resulted in 316 samples and 453,093 probes available in both arrays [29]. Background subtraction, color correction, and normalization were performed using the preprocessFunnorm function [30]. After sample and probe filtering, 273 samples and 409,033 probes remained for analysis. Of these samples, there were 262 complete cases with genotype data, PTE and PAE measures, and important covariates. Batch effects were removed using the ComBat function from the sva package [31]. Cell type composition estimates were calculated using the most recent cord blood reference dataset [32].

Smoking and alcohol measurements

Prenatal tobacco exposure was objectively measured using urine cotinine levels, which were taken within 4 weeks of enrollment. Urine cotinine was measured using the IMMULITE® 1000 Nicotine Metabolite Kit (Siemens Medical Solutions Diagnostics®, Glyn Rhonwy, United Kingdom) [33]. Urine cotinine levels were classified as non-smoker or passive smoker (<499 ng/ml) or active smoker (≥500 ng/ml). The continuous urine cotinine concentrations were used in this analysis [34].

Prenatal alcohol exposure was measured at the second antenatal study visit using a dichotomous, composite score calculated from the Alcohol, Smoking, and Substance Involvement Screening Test (ASSIST), a self-report questionnaire, and from retrospectively collected data on alcohol consumption during pregnancy [35]. The ASSIST questionnaire was developed by the World Health Organization and has been validated for use in international settings [36, 37]. Details of the composite score calculation have been described elsewhere [35, 38, 39].

Neurodevelopment measurements

Neurodevelopment was assessed at 6 and 24 months of age using the BSID-III [40, 41]. The BSID-III assessment and its composite scores have been previously validated in South African populations [42, 43]. Trained assessors administered the BSID-III to children via direct observation to generate scores for the cognitive, language, and motor development domains. The same trained assessors administered the BSID-III to mothers to report and generate a score for the adaptive behavior and socioemotional domains [35]. Composite scores generated for each domain were used in this analysis.

Statistical analysis

First, we estimated the association between PTE and PAE and neurodevelopment at age 6 and 24 months in adjusted linear regression analysis. Mediation analyses with differential DNAm as mediators were conducted using MRS and individual CpG sites as mediators. Significant mediators identified at 6 months of age were validated for neurodevelopment measured at the second timepoint of 24 months.

Confounding assessment

Confounding was assessed by constructing directed acyclic graphs (DAGs). DAGs were created for each exposure-mediator (E-M) (PTE/PAE-DNAm), mediator-outcome (M-O) (DNAm-neurodevelopment), and exposure-outcome (E-O) relationship (PTE/PAE-neurodevelopment) (Fig. S1). Potential confounders were selected based on existing literature [35]. All models in this analysis were adjusted for maternal age, maternal HIV status, maternal depression during pregnancy (assessed with the Beck Depression Inventory II (BDI-II)), maternal psychological distress during pregnancy (assessed with the Self Reporting Questionnaire (SRQ-20)), parental socioeconomic status (SES), gestational age, and cell-type proportions. Population stratification was adjusted for using the first five genetic principal components. Models evaluating PTE as the primary exposure were additionally adjusted for PAE and likewise, models evaluating PAE as the primary exposure were additionally adjusted for PTE. Multicollinearity between covariates was evaluated by calculating the variance inflation factors for each predictor using the total effect regression model. There was no evidence of multicollinearity for PTE and PAE (Table S6).

Methylation risk scores

First, we constructed methylation risk scores (MRS) for PTE and PAE as described elsewhere [44, 45]. All 262 samples with DNAm data were included in the MRS calculation. MRS are calculated as the weighted sum of methylation beta levels of CpG sites. MRS calculations were based on summary statistics from epigenome-wide association studies (EWAS) for DNAm probes in which the association between the exposure and DNAm was investigated. These summary statistics, which describe the effect size (beta) at each CpG site in the respective EWAS, were used as external weights to calculate the methylation risk score in our cohort. External summary statistics were acquired from EWASs conducted by the Pregnancy and Childhood Epigenetics (PACE) consortium. PTE summary statistics were acquired from an EWAS investigating the association between maternal smoking and cord-blood DNAm (N = 5648) [16]. PAE summary statistics were acquired from an EWAS investigating the association between maternal alcohol use and cord-blood DNAm (N = 1147) [17]. Both EWASs were comprised of cohorts from primarily European ancestry. To correct for correlated CpG sites, we estimated co-methylated regions (CMR) using the CoMeBack package [46]. Then, we conducted “clumping”, in which one CpG site with the smallest p value from the external summary statistics per CMR was included in the final MRS. MRS was calculated at several p value thresholds for each EWAS, a procedure similar to “thresholding” in traditional polygenic risk score calculations. Finally, we calculated the correlation between the exposure and the resulting MRS at different p value thresholds and chose the p value threshold associated with the highest prediction accuracy for the subsequent analyses.

High-dimensional mediation analyses (HDMA)

MRS are limited in that they only consider the E-M but not the M-O relationship. HDMA considers the E-M as well as the M-O relationship, which allows for a more holistic analysis of the mediating relationships. We performed two separate high-dimensional mediation analyses using the R packages HIMA and DACT for any exposure-outcome combinations for which we identified indications of a total effect [47, 48]. The HIMA package utilizes a penalized-based regression and consists of three steps. First, it performs dimension reduction through sure independence screening to identify the n/log(n) CpG sites with the largest effect size in the mediator-outcome regression model. Second, the minimax concave penalty is applied to this subset of CpG sites for further dimension reduction [49]. Finally, a joint significance test is conducted to evaluate the significance of the mediation effects using a Benjamini–Hochberg false discovery rate (FDR) correction for multiple testing. The divide-aggregate composite null test (DACT) is a more recent method for high-dimensional mediation analysis that is better powered than HIMA [48]. First, it utilizes the Efron empirical null framework to estimate the proportions of the three null cases across all epigenome mediators [50]. Then, it performs the DACT for the composite null of no mediation effect in three cases in which the M-O effect is non-zero, the E-M effect is non-zero, or both effects are non-zero. Finally, the DACT p value is calculated as the weighted sum of all three p values under the three null hypothesis. We conducted an association analysis for the E-M and M-O associations with robust linear regression using the rlm function to input into DACT [51]. We then pre-filtered CpG sites by two criteria to alleviate the burden of multiple testing. First, we filtered the CpG sites by a p value threshold of 0.05 for the E-M and M-O models. Second, we filtered CpG sites by the direction of the E-M and M-O effects to achieve an overall negative indirect effect, in line with the hypothesized negative total and direct effects. After running DACT for the resulting subset of CpGs, we corrected for multiple testing using the Benjamini–Hochberg FDR method.

Causal mediation analysis (CMA)

Finally, we conducted a CMA for the MRS and significant CpG sites from HDMA as mediators. The mediation R package was used to estimate mediation effects [52]. The mediation package estimates the average causal mediation effect (ACME), the average direct effect (ADE), the total effect (TE), and the proportion mediated (PM) in the population. The 95% confidence intervals were constructed using a nonparametric, bootstrapped, quasi-Bayesian method and two-sided p values were provided for each effect. An α level of 0.05 was used to determine significance [53]. To examine trends across both time points, we cross-validated significant CpG sites identified at 6 months by determining whether they are also significant mediators of the association at 24 months of age.

Secondary analyses

To further support our findings, we conducted follow-up analyses for any CpG sites that were identified as significant mediators. This included gene ontology (GO) functional enrichment analysis using the missMethyl R package [54], blood–brain concordance analysis using the Blood–Brain Epigenetic Concordance (BECon) tool [55], and methylation quantitative trait loci (mQTL) mapping using the GoDMC database [56]. Details are provided in the supplementary methods. Additionally, we compared our findings to previous studies using the publicly available EWAS catalog [57] and conducted further searches on PubMed.

Results

Study population characteristics

This analysis sample included 262 children with DNAm data, genotype data, and other relevant covariates, with a subset of 112 children and 184 children with information on neurodevelopment across five domains at 6 months and 24 months, respectively (Table 1). The sample size at each timepoint varied slightly by neurodevelopmental domain and timepoint (Table 1). This is a cohort of children from African (55%) and mixed (45%) ancestry. There was a high prevalence of maternal smoking with 48% of mothers classified as passive smokers and 30% classified as active smokers. The prevalence of prenatal alcohol use among mothers was 17%. Additionally, 25% of mothers were classified as being above the Beck Depression Inventory-II threshold and 30% being at elevated risk of psychological distress on the Self-Reporting Questionnaire (SRQ-20). The prevalence of mothers with an HIV diagnosis was 24%, but all children in this population remained uninfected. Among the subset of children with information on neurodevelopment at 6 months of age, the prevalence of mothers who smoked and reported alcohol use was higher compared to the whole population with DNAm data and children at 24 months, with 35% of mothers classified as active smokers and 22% of mothers classified as consuming alcohol.

Table 1 Study population characteristics for the whole sample, children with BSID-III measurements at 6 months of age, and children with BSID-III measurements at 24 months of age.

The total effect of prenatal smoking and alcohol exposure on child neurodevelopment

The total effects of PTE and PAE on child neurodevelopment across five domains were estimated using adjusted linear regression (Fig. 1). There was a consistent negative association between PTE and neurodevelopment across all domains at 6 months of age, which was significant for the cognitive domain (β = −0.02; 95% CI: −0.028, 0.002; P = 0.023). There was also a consistent negative association between PAE and neurodevelopment across all domains at 6 months of age, which was significant for the motor domain (β = −9.36; 95% CI: −16.64, −2.08; P = 0.012). No associations were found between PTE and PAE and neurodevelopment at 24 months in this sample (Fig. S2 and Tables S1, S2). Therefore, the subsequent mediation analyses focused on neurodevelopment at 6 months as the primary outcome. Neurodevelopment at 24 months was used as validation for any significant mediators identified at age 6 months.

Fig. 1: Association of PTE and PAE on neurodevelopment domains at 6 months.
figure 1

Models were adjusted for parental SES, maternal depression, maternal psychological distress, gestational age, maternal age, maternal HIV status, cell-type proportions, and the first five genetic principal components. A Association of PTE on neurodevelopment domains, additionally adjusted for PAE. B Association of PAE on neurodevelopment domains, additionally adjusted for PTE.

Mediation analyses with methylation risk scores as mediators

The MRS for PTE was well correlated with prenatal cotinine levels, with the highest R2 being 0.23 at a p value threshold of 5e-21 (Fig. 2A). As evident from Fig. 2B, active smokers had a higher MRS than non-smokers and passive smokers combined (β = 0.96, 95% CI: 0.72, 1.21; P = 4.01E-13). Additionally, there was a strong and significant association between PTE (continuous maternal urine cotinine levels) and the MRS (β = 0.0022; 95% CI: 0.0016, 0.0027; P = 6.7E-15) (Fig. 2C). In contrast, the MRS for PAE was not correlated with alcohol use status, with the highest R [2] being 0.006 at a p value threshold of 5e-06 (Fig. 2D). There was no difference between the MRS for children with and without alcohol exposure (Fig. 2E) and there was no association between PAE and the MRS (β = 5.4e-03; 95% CI: −0.26,0.27; P = 0.97) (Fig. 2F). Because of the poor performance of the PAE methylation risk score, we did not proceed with CMA for PAE using this approach.

Fig. 2: Performance of the MRS for PTE (N = 262) and PAE (N = 262).
figure 2

A Correlation between PTE (continuous maternal urine cotinine levels) and MRS at several p value thresholds. B MRS (p value threshold of 5e-21) for non-/passive smoking mothers and active smoking mothers. C Effect of PTE (continuous maternal urine cotinine levels) on most highly correlated MRS (p value threshold of 5e-21). D Correlation between PAE and MRS at several p value thresholds. E MRS (p value threshold of 5e-06) for non-alcohol consuming mothers and alcohol consuming mothers. F Effect of PAE on most highly correlated MRS (p value threshold of 5e-06).

There was no evidence of mediation by MRS in any domain (Fig. 3 and Table S3). While the cognitive domain showed a significant total effect, a significant indirect effect (ACME) was not found (ACME = 0.006; 95% CI: −3.84E-03, 0.002; P = 0.2). The direct effect was significant, however (ADE = −0.02; 95% CI: −0.04, −0.01; P = 0.01).

Fig. 3: Causal mediation analysis for PTE as the exposure, MRS as the mediator, and neurodevelopment across five domains at 6 months of age as the outcome.
figure 3

Estimated indirect effect, direct effect, and total effect with their corresponding 95% confidence intervals are shown.

HDMA with individual CpG sites as mediators

HDMA was conducted for exposure-outcome combinations for which there was an indication of a total effect: PTE-cognitive, PTE-motor, PAE-motor, and PAE-adaptive behavior. After correcting for multiple testing, DACT identified a total of 191 CpG sites as significant mediators of the association between PTE and neurodevelopment across the cognitive (123 CpG sites) and motor (68 CpG sites) domains. After validating these CpG sites with CMA, we identified 31 CpG sites to have a significant mediating effect (ACME P and TE P < 0.05) between PTE and cognitive development and eight CpG sites to have a significant mediating effect between PTE and motor development (Table 2). Three of these CpG sites overlapped between both the cognitive and motor domains (cg22263591 [SLC39A11], cg25284031 [HVCN1], and cg25857569 [PPM1L]).

Table 2 Significant CpG sites (ACME P and TE P < 0.05) resulting from HDMA and CMA of DNAm as a potential mediator of the association between PTE and neurodevelopmental delay at 6 months of age using the DACT method and after correction for multiple testing (BH FDR ≤0.05). ACME, ADE, TE, and PM effect sizes and 95% confidence intervals were estimated by CMA.

DACT identified 193 CpG sites as significant mediators of the association between PAE and neurodevelopment across the motor (100 CpG sites) and adaptive behavior (93 CpG sites) domains. After conducting CMA, this resulted in 16 CpG sites and 1 CpG site with a significant mediating effect between PAE and the motor and adaptive behavior domains, respectively (Table 3).

Table 3 Significant CpG sites (ACME P and TE P < 0.05) resulting from HDMA and CMA of DNAm as a potential mediator of the association between PAE and neurodevelopmental delay at 6 months of age using the DACT method and after correction for multiple testing (BH FDR ≤0.05). ACME, ADE, TE, PM and corresponding 95% confidence intervals estimated by CMA are shown.

The second HDMA method we tested, HIMA, did not identify any CpG sites as potential mediators of the association between either PTE and PAE and neurodevelopment at 6 months of age after correction for multiple testing.

None of the CpG sites identified as significant mediators for neurodevelopment at 6 months mediated the association between PTE or PAE and neurodevelopment at 24 months (Tables S4, S5).

Secondary analyses

After correction for multiple testing (FDR <0.05), we did not identify any GO terms or KEGG pathways with an overrepresentation of genes containing significantly, differentially methylated CpGs that would indicate an enriched biological pathway. The top GO terms and KEGG pathways for each E-O model are included in the supplement (Table S7a, b).

Of the 31 CpG sites identified as significant mediators of the association between PTE and cognitive development, 19 were associated with at least one mQTL (Table S8). Seven CpG sites exhibited blood–brain concordance in accordance with the criteria listed in the methods (Table S9). CpG sites associated with the genes ALDH1A2 (75–90%), GPR6 (75–90%), and HVCN1 (90%) exhibited the highest percentile mean correlations.

Of the eight CpG sites identified as significant mediators of the association between PTE and motor development, six were associated with at least one mQTL (Table S8). Here, six CpG sites exhibited blood–brain concordance, with the highest percentile mean correlations seen at CpG sites associated with the genes FANCE (75–90%), HVCN1 (90%), ZMIZ1-AS1 (75–90%), and ZNF710 (90%) (Table S9).

For the 16 CpG sites identified as significant mediators of the association between PAE and motor development, six were associated with an mQTL. No CpG sites exhibited blood–brain concordance for the association between PAE and motor development (Table S9).

There were no mQTLs associated with the CpG site identified as a significant mediator of the association between PAE and adaptive behavior development (Table S8). Additionally, the probe did not exhibit blood–brain concordance (Table S9).

Discussion

In this South African cohort, we found a significant negative association (total effect) between PTE and PAE and neurodevelopment of exposed children at 6 months of age, as well as evidence of epigenetic mediation of this association using high-dimensional mediation analysis (DACT). We identified 39 CpGs that mediated the association between PTE and child cognitive (31 CpGs) and motor (eight CpGs) development. We also identified 16 CpG sites and one CpG site that mediated the association between PAE and child motor and adaptive behavior development, respectively. While the PTE MRS was highly predictive of PTE, it did not mediate the association between PTE and neurodevelopment. We did not find any evidence of epigenetic mediation of PTE and PAE and neurodevelopment in children of 24 months of age, most likely due to the absence of a total effect of PTE and PAE on neurodevelopment at 24 months in this sample.

Several studies have investigated the association between PTE and PAE and neurodevelopment and related disorders [58,59,60]. One study (N = 446) conducted in a rural region of China in children with an average age of 15 months found that prenatal exposure to tobacco smoke was associated with neurodevelopmental delay in the cognitive and language domains, assessed using the BSID-III [61]. While we did find evidence of an association between PTE and cognitive development, associations with language development were not statistically significant, although it may be difficult to detect language differences as early as 6 months [43, 62]. Prenatal alcohol exposure has been previously associated with decreased gross motor function, but findings have been inconsistent [63, 64]. For example, a study (N = 1324) that utilized the BSID-III to evaluate the relationship between PAE and gross motor development in infants of 12 months did not find evidence of an association [65]. However, the prevalence of moderate to severe PAE was lower in this cohort than in the DCHS. In our study, we found associations between PTE and PAE and neurodevelopment at age 6 months but not at age 24 months. This is in line with findings from a previous study conducted in a larger subset of the DCHS (N = 734), in which PAE was associated with delayed motor development at 6 months, but not at 24 months [66]. Previous literature on neurodevelopment at 6 months of age is sparse, particularly in association with prenatal exposures. Most studies had an average infant age of at least 12 months. Unlike our present analysis, previous studies have found associations between PTE and PAE and neurodevelopment at or around 24 months of age or older [59, 60, 67].

For example, a study conducted in a Polish birth cohort (N = 461) identified a statistically significant association between prenatal tobacco smoke exposure and delayed development in the cognitive, language, and motor domains at 24 months of age using the BSID-III [68]. A potential explanation for this discrepancy is our relatively small sample size (N = 112 at age 6 months and N = 184 at age 24 months). Several of the CpG sites and associated genes that we identified through HDMA are well known for their association with sustained maternal smoking during pregnancy across two major studies from the PACE consortium [69]. Differential methylation at one of the CpG sites (cg23219570) and four genes (MAD1L1, CAMTA1, LARP4B, FGF23) that were identified as mediators for associations with cognitive development was previously identified in one PACE study which investigated the association of sustained prenatal smoking on changes in cord blood DNAm (450 K) across nine cohorts (N = 5648) [16]. Differential DNAm at two CpG sites (cg02668773 and cg12031962) and seven genes (MAD1L1, MPO, CAMTA1, COL11A2, LARP4B, ALDH1A2, and SLC39A11) in association with cognitive development and one CpG site (cg24671734) and four genes (SKAP2, ZNF710, SLC39A11, and BTBD11) in association with motor development which we identified as mediators were previously identified in a more recent PACE study, which investigated the effects of sustained prenatal smoking on cord blood DNAm (450 K) across 13 cohorts (N = 6685) [14]. It is notable that the genes MAD1L1, CAMTA1, and LARP4B had significantly, differentially methylated CpG sites across both PACE studies and our study.

Four of these genes have further been shown to be associated with cognitive outcomes. Aldehyde dehydrogenase 1A2 (ALDH1A2) is associated with autism spectrum disorder (ASD) in children [70]. The ALDH family of genes plays a key role in the metabolism of vitamin A, an essential molecule for neuronal differentiation and development [71]. Additionally, the identified probe (cg12031962) had high blood–brain concordance. However, there has been inconsistent evidence of an association between maternal smoking and ASD [72,73,74]. Furthermore, brain tissue-based DNAm in two of these genes (MAD1L1 and COL11A2) was associated with ASD, indicating their role in neurodevelopment [75].

Mitotic arrest deficient 1 like 1 (MAD1L1) is a key regulatory gene of the cell cycle. It has been implicated in the etiology of neuropsychiatric disorders such as SZ and BPD in several genome-wide association studies (GWAS) and candidate gene studies, including one study which found altered levels of expression of MAD1L1 in human neural progenitor cells [76,77,78]. The biological mechanisms of both SZ and BPD are linked to changes in the mesolimbic reward pathway and high levels of MAD1L1 expression have been found in brain regions such as the ventral tegmental area [79]. However, the identified probe (cg01258793) did not exhibit blood–brain concordance using the BECon tool. Previous studies have found that maternal smoking was associated with a higher rate of SZ and BPD in offspring, suggesting that differential expression of MAD1L1 may be a plausible pathway that links smoking to neurodevelopmental disorders [80].

Calmodulin-binding transcription activator 1 (CAMTA1) is also involved in cell cycle regulation and is highly expressed in the brain [81]. Studies have found that CAMTA1 plays a role in memory production and recall, dysfunction of which is implicated in disorders such as down syndrome, schizophrenia, and depression [82, 83]. However, these studies have been conducted in older adults, and more research needs to be done to corroborate memory dysfunction in children with differential expression of CAMTA1.

Differential methylation at two CpG sites that we identified as significant mediators of the association between PAE and motor development mapped to SLC18A2 (cg00512279) and HNRNPU (cg02230180) [84]. Homozygous mutations in SLC18A2, a vesicular monoamine transporter, have been implicated in the presentation of infantile movement disorder also known as infantile hypotonic parkinsonian disorder or brain dopamine-serotonin vesicular transport disease [85,86,87]. Mutations in HNRNPU have been associated with infantile spasms [88, 89]. However, there has been little to no research published on prenatal and early life exposures or pediatric movement disorders and related epigenetic modifications.

Methylation risk scores can be used as a potential biomarker for prenatal smoking and alcohol use and to reduce the dimensionality of methylation data, giving us more statistical power to detect associations [44]. Our MRS for prenatal smoking was highly predictive of PTE and is comparable to a previously established MRS for PTE by Reese et al. (R2 = 0.23) and outperforms another established MRS for PTE by Richmond et al. (R2 = 0.14) [45, 90, 91]. However, our MRS for PTE did not significantly mediate the association between PTE and neurodevelopment. One possible explanation is that our MRS algorithm only considers the relationship between exposure and mediator, whereas HDMA considers both the E-M and M-O relationships. Additionally, it is possible that the difference in ancestry between the external studies (primarily European ancestry) and the DCHS (African and mixed ancestry) reduced the prediction accuracy of our MRS for both prenatal smoking and prenatal alcohol use [92].

Our study has a few limitations. First, unlike PTE, which has been extensively studied [14,15,16, 93] in association with DNAm, the association between PAE and DNAm is not well understood, which makes the comparison of our findings to existing literature more challenging. The vast majority of epigenetic studies evaluating the effects of prenatal alcohol exposure are conducted in birth cohorts from HICs and tend to have a low prevalence of PAE [17, 69]. Because of the low prevalence of the exposure, these studies often do not have the statistical power to detect an association. A PACE study that investigated the association between PAE and cord blood DNAm (450 K) across six cohorts (N = 1147) did not find strong evidence of a relationship [17]. This limitation also in part explains the poor performance of our MRS for PAE. Furthermore, although the DCHS has a higher prevalence of both PAE and PTE compared to HIC cohorts, this study had a relatively small sample size at both 6 months (N = 112) and 24 months (N = 184), which may have limited our statistical power to detect associations between PTE and PAE and neurodevelopment across other domains and at 24 months of age.

Another limitation of our study is the possible presence of exposure misclassification bias. Underreporting is a common problem when attempting to ascertain socially undesirable behaviors such as smoking or alcohol consumption during pregnancy. In our case, alcohol use was measured using a self-report questionnaire and smoking was measured via urine cotinine levels. Although using cotinine as a biomarker of smoking is not affected by social desirability bias, it was only measured at one timepoint in this cohort. Exposure misclassification can be an issue in epigenomic mediation analyses in that the DNA methylation estimates may be a more accurate indicator of exposure than the variables available in our cohort, which can lead to an overestimation of the mediation effect, an underestimation of the direct effect, and an increased type I error rate [94]. Because of this potential for bias, our results should be interpreted with caution and methods to correct for misclassification should be considered in future studies.

Furthermore, while blood is the most common tissue used to evaluate the epigenomic mechanisms of a multitude of risk factors and outcomes, it is not yet clear whether blood tissue can be effectively used to evaluate neurological outcomes. Epigenetic signatures are tissue and cell-type specific, and as such, the selection of relevant tissues and cell-type is crucial for epigenetic studies. While brain tissue would be the most relevant tissue to study in association with neurodevelopment, cord blood has the advantage in that it is more easily accessible from living individuals and could therefore be linked to future cognitive outcomes [95]. Furthermore, many of the CpG sites we identified as significant mediators of the association between PTE and neurodevelopment showed good blood–brain concordance (assessed using the BECon tool [55]).

Lastly, many of our significant CpG sites were associated with at least one known mQTL, which is an indicator of the genetic influence on DNAm levels [56]. While we cannot decompose the effects of genetic and environmental factors on DNAm levels, only a proportion of the variation in DNAm levels is explained by genetic effects. In fact, the joint effects of single nucleotide polymorphisms (SNP) and environmental factors have been found to be larger contributors to DNAm variation than SNPs alone [96, 97]. Since the environmental exposures, PTE and PAE, were ascertained before cord blood was collected, and neurodevelopment was ascertained at age 6 and 24 months, reverse causation is unlikely to be a problem in this study. Our study has several strengths. To start, we have data at 6 months of age, which may allow us to infer a causal relationship more confidently because of the short temporality between birth and measurement. Further, while this analysis had a small sample size, the use of novel methods to increase our statistical power allowed us to detect more potentially meaningful associations and is an important strength. To our knowledge, this is one of the first epigenomic studies to employ HDMA. The HIMA method, which utilizes the joint significance test, was potentially too conservative and underpowered to detect significant mediators in this study [48]. DACT has been shown to be more powered than HIMA. The use of methylation risk scores and high-dimensional mediation analysis in the field is still in its infancy. We recognize that more studies will be required to validate the robustness of these methods as the field advances, but this study provides some insight into the utility and potential of HDMA in real-life settings.

Finally, our analysis was conducted on a subset of a well-characterized, underrepresented, birth cohort from an LMIC. We were able to detect associations due to the high prevalence of behavioral risk factors—such as maternal smoking and alcohol consumption- in such vulnerable populations, a characteristic that cohorts from HICs typically do not exhibit. This highlights the importance of conducting large-scale multi-ethnic cohort studies to advance the field of (epi)genomics and reduce health disparities. As Wojcik et al. point out, while the burden of disease disproportionality lies with marginalized populations, the majority of studies are conducted in populations of European ancestry, which limits the field’s ability to investigate the burden of disease in vulnerable communities [98]. As follows, investigating epigenomic associations with health outcomes in a well-characterized cohort from an LMIC allows us to add to the sparse literature and shed light on the importance of prioritizing cohorts like the DCHS.

Overall, these findings suggest that epigenetic changes in the methylome may in part explain the biological mechanisms underlying the relationship between maternal smoking and alcohol consumption during pregnancy and child neurodevelopment across the cognitive, motor, and adaptive behavior domains. Our study provides motivation to conduct larger mediation studies to replicate our findings. Mediation analyses in epigenomics are important for discerning causal mechanisms from exposure to disease. The identification of significant CpG sites could provide novel insights for the early detection of disease and potential prevention targets in translational research and community interventions in at-risk populations.