Main

The aetiology of childhood acute lymphoblastic leukaemia (ALL) is largely unknown; the disease likely arises from interactions between exogenous and/or endogenous exposures, genetic susceptibility, and chance. Genetic causes of ALL account for a small proportion of cases, and while the disease is usually initiated in utero, other promotional exposures are probably necessary for disease emergence (Greaves et al, 2003). There are two key hypotheses on infections and the development of ALL. Kinlen proposed the ‘population mixing’ hypothesis to describe the observed increased rates of childhood ALL following an influx of migrants into rural areas (Kinlen, 1988, 2012). Briefly, the mixing of rural, isolated individuals with an influx of mostly urban individuals would increase the level of contact between susceptible and infected individuals, creating a localised epidemic of an underlying infection that may produce the rare response of ALL. Studies from Kinlen and others have found evidence to support the hypothesis (Kinlen, 1988, 2006, 2012; Alexander et al, 1998; Kinlen and Doll, 2004). The hypothesis suggests a direct pathological role of a specific infection, presumed to be viral, in the development of ALL and that a protective effect may be acquired from previous exposure. Currently, there is limited molecular evidence that implicates a specific infection (Martin-Lorenzo et al, 2015; da Conceicao Nunes et al, 2016). Greaves’ ‘delayed infection’ hypothesis for childhood ALL suggests a two-hit model that emphasises the timing of exposure and the child’s immune system (Greaves, 1997, 2006). The first hit occurs in utero through one’s genetic makeup that produces a pre-leukaemic clone. In a small number of pre-leukaemia carriers, it is the absence of exposure to infections in early life, and a postnatal secondary genetic event caused by a delayed, stress-induced infection (second hit) on the developing, ‘unprepared’ immune system that may increase the risk of childhood ALL. Although the mechanisms differ, both hypotheses suggest that ALL is a rare response to one or more common infections acquired through personal contact.

The difficulties in measuring exposure to infectious agents and subsequent responses make it challenging to directly test the hypotheses, especially since no specific leukaemogenic agent has been identified. Several previous epidemiological studies have used a history of infections as an indicator for early exposure to infections. Establishing the timing of the infections is critical to testing the hypotheses; however, birth cohort studies are not feasible given the rarity of childhood ALL. Thus, most studies used a case–control design and interviews to measure infections. Assessing a history of infections through interviews can be problematic due to the potential for recall bias and misclassification of children who had asymptomatic infections (Simpson et al, 2007). Other methods for measuring infections, such as using administrative data, overcome these limitations but may lack information on important confounders. Other than narrative summaries (McNally and Eden, 2004; Buffler et al, 2005; Ma et al, 2009; Maia Rda and Wunsch Filho, 2013), no study has attempted to synthesise and quantitatively pool studies examining the relationship using a history of infections, or tried to explain the differences between the studies. The aim of this systematic review and meta-analysis was to assess the relationship between childhood infections and the development of childhood ALL by summarising the findings for an overall measure of infections and for the frequency, severity, and timing of infections, and by examining specific infectious agents and syndromes.

Materials and methods

We followed the Meta-analysis of Observational Studies in Epidemiology (MOOSE) guideline for reporting meta-analyses of observational studies (Stroup et al, 2000).

Data sources and searches

We performed electronic searches from inception to 21 February 2017 in Ovid MEDLINE, MEDLINE In-Process and Other Non-Indexed Citations, EMBASE, Web of Science (Science Citation Index Expanded, Social Sciences Citation Index, Conference Proceedings Citation Index for both Science and Social Science & Humanities), and Scopus. Supplementary Table 1 shows the search strategies used. Text words used included acute lymphoblastic leukaemia, acute leukaemia, infection, virus, and bacteria. We limited the search to subjects 0–19 years old, and did not restrict the search by language. References of the included studies were searched, and the first four pages of a Google search using the same keywords were used to search for grey literature.

Study selection

We defined the inclusion and exclusion criteria a priori as studies of any design excluding editorials, reviews, and case reports. Studies were included if: (1) the primary exposure of interest included a prior history of any infection before the diagnosis of childhood ALL; (2) the primary outcome of interest was defined as clinically diagnosed ALL in children aged ≤19 years; (3) comparisons were made against a control or comparison group; and (4) where laboratory investigations were used to determine past infections, testing samples were collected and assessed prior to treatment. Infections must have been reported by the parent or guardian, or obtained through other data sources such as medical records.

We excluded studies in the following order: (1) the definition for infections was not at the individual level, for example, an ecological definition that examined infections aggregated for a region; (2) the definition for infections examined population mixing; (3) infections were not explicitly infections during childhood (e.g., infections during pregnancy); (4) the outcome was not childhood ALL in children aged ≤19 years; (5) there was no comparison group; (6) it was a review article; and (7) it was a duplicate publication with the same study population. When more than one publication from a study was available, the most recent version, or the version with the exposure or outcome of interest that was closest to the objectives of this review, was included. Studies were not restricted by publication status, and relevant studies in other languages were translated.

Two reviewers (JH and CT) independently evaluated the titles and abstracts of publications identified by the search strategy, and any publication thought to be potentially relevant by either reviewer was retrieved in full. Final inclusion of studies in the systematic review was determined by agreement of both reviewers. Agreement between reviewers was evaluated using the kappa statistic (κ). Strength of agreement was defined as slight (κ=0.00–0.20), fair (κ=0.21–0.40), moderate (κ=0.41–0.60), substantial (κ=0.61–0.80), or almost perfect (κ=0.81–1.00) (Landis and Koch, 1977).
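As a minimal sketch of the agreement statistic described above, the following Python snippet computes Cohen's kappa from a 2×2 table of two reviewers' include/exclude decisions and maps it onto the Landis and Koch bands listed in the text. The function names and the example counts are hypothetical and are not taken from the review; the "poor" label for negative values follows Landis and Koch's original scale.

```python
def cohen_kappa(both_yes, a_only, b_only, both_no):
    """Cohen's kappa for two raters making a binary (include/exclude) decision."""
    n = both_yes + a_only + b_only + both_no
    observed = (both_yes + both_no) / n            # observed agreement
    a_yes = (both_yes + a_only) / n                # rater A's inclusion rate
    b_yes = (both_yes + b_only) / n                # rater B's inclusion rate
    expected = a_yes * b_yes + (1 - a_yes) * (1 - b_yes)  # chance agreement
    return (observed - expected) / (1 - expected)


def landis_koch(kappa):
    """Strength-of-agreement label using the bands quoted in the text."""
    if kappa < 0:
        return "poor"
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"


# Hypothetical screening counts, for illustration only.
k = cohen_kappa(both_yes=40, a_only=5, b_only=5, both_no=50)
print(round(k, 3), landis_koch(k))
```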

Data extraction and quality assessment

Data extraction was conducted in duplicate (JH and CT) using a standard form, which collected information on: the primary exposure of ‘common infections’, defined as any infection occurring from birth to the diagnosis of ALL; the secondary exposures of infection frequency and severity of infections; and study design, region, publication era, and source of controls. In studies that used laboratory investigations for identification of infectious agents, we extracted IgG antibody estimates to represent past infections, and if these were not available, we extracted polymerase chain reaction (PCR) results to assess for the presence of the agent. Where studies reported multiple time windows, we extracted infections occurring in the first year of life or a similar time window, as we felt this best represented early exposure to infections. We extracted infection frequency levels for common infections, and defined severity based on admission to hospital. The adjusted models that incorporated the most confounders for our primary outcome ALL were extracted. Authors were contacted for further information regarding results that were not presented. Five authors were contacted (Nishi and Miyake, 1989; Schlehofer et al, 1996; Neglia et al, 2000; Rosenbaum et al, 2005; MacArthur et al, 2008), and three responded with no additional information (Nishi and Miyake, 1989; Neglia et al, 2000; Rosenbaum et al, 2005).

Study quality was assessed using the Meta Quality Appraisal Tool (MetaQAT) (Rosella et al, 2015) and the Critical Appraisal Skills Programme (CASP) for case–control (Programme CAS, 2014a), and cohort studies (Programme CAS, 2014b). Two reviewers (JH and CT) assessed each study. For case–control studies, we considered CASP scores of 1–3, 4–6, and 7–9 to be high, moderate, and low-risk of bias, respectively; for cohort studies, we considered CASP scores of 1–4, 5–8, and 9–11 to be high, moderate and low-risk of bias, respectively.

Data synthesis and analysis methods

Our analysis combined data at the study level. Our primary analysis sought to assess exposure to common infections vs no common infections (referent group) on the risk of developing ALL, relying on each study’s definition. The most frequent infection was used when studies did not report a common infection variable. We used the adjusted odds ratio (OR) or rate ratio (RR) to calculate a pooled overall effect, and assumed OR and RR were equivalent due to the rarity of the outcome (Greenland, 1987); ORs or RRs <1 suggest infections are protective against ALL. If a study presented multiple frequency categories, we used the lowest vs the highest category, a method commonly used in meta-analyses (Bae, 2016). The method described by Greenland was used to calculate the variance using the reported 95% confidence intervals (CI) (Greenland, 1987). We calculated a crude OR for studies not reporting one, and to facilitate the calculation we added 0.5 to all cells if one of the four cells reported a zero (Gart and Nam, 1988). In secondary analyses, we used the different exposure levels of infection to compute a regression slope (Greenland and Longnecker, 1992). If an exposure level was defined using a range, we used the midpoint of the range (e.g., 1–3 infections was assigned a frequency of 2), and if the level was ≥4, we assigned a frequency of 4. For infection severity, a dichotomous variable (yes vs no) was used to determine the relationship with ALL. Post hoc analyses examined the timing of infections in the first year of life compared to infections that occurred after the first year of life, and analyses of putative infectious agents were conducted if ≥3 studies reported the agent.
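The two calculations described above can be sketched in Python as follows. This is an illustrative sketch rather than the review's actual code: the function names are hypothetical, the standard error is recovered from a 95% CI on the log scale by assuming a symmetric normal interval, and the crude OR uses Woolf's log-OR variance (the sum of reciprocal cell counts), which the text does not specify and is assumed here.

```python
import math

Z_95 = 1.959964  # normal quantile for a two-sided 95% interval


def se_from_ci(lower, upper, z=Z_95):
    """Standard error of log(OR) recovered from a reported 95% CI."""
    return (math.log(upper) - math.log(lower)) / (2 * z)


def crude_or(a, b, c, d):
    """Crude OR and SE of log(OR) from a 2x2 table.

    a: exposed cases, b: unexposed cases,
    c: exposed controls, d: unexposed controls.
    Adds 0.5 to every cell if any cell is zero, as described in the text.
    The variance formula (Woolf) is an assumption of this sketch.
    """
    cells = [a, b, c, d]
    if any(x == 0 for x in cells):
        cells = [x + 0.5 for x in cells]
    a, b, c, d = cells
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return odds_ratio, se_log_or


# SE implied by an interval such as 0.95 to 1.28.
print(round(se_from_ci(0.95, 1.28), 4))
# Hypothetical table with a zero cell, triggering the 0.5 correction.
print(crude_or(10, 0, 5, 5))
```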

As we anticipated heterogeneity between the studies, we used an inverse variance weighted average, random-effects model where the Wald-type tests and confidence intervals were estimated under a normal distribution (DerSimonian and Laird, 1986). We investigated potential sources of heterogeneity using subgroup analyses and mixed-effects meta-regression. To examine the association of study-level characteristics and infection effect, we fitted mixed-effects meta-regression models to the natural logarithm of the OR. The natural logarithm of the OR was assumed to have a normal distribution, and a method-of-moments-based estimator was used to estimate the model parameters. The mixed-effects model included fixed effects for the covariates, and a random intercept term was specified to model residual heterogeneity not accounted for by the covariates. We corrected for multiple testing using a Bonferroni correction that divides the significance threshold by the number of tests (Lagakos, 2006). Because of methodological differences (Wiemels, 2012), we tested for interactions to assess the differences between studies that used administrative/medical records, self-reported, and laboratory investigation data (Altman and Bland, 2003). We stratified infections in the first year of life by self-reported data and administrative/medical records data. We explored clinical heterogeneity by conducting a subgroup analysis limiting cases of ALL to B-cell precursor ALL (Wiemels, 2012). We also explored the extent to which region (North America, Europe, Asia, or other), publication era (≤1999, 2000–2009, ≥2010), source of controls (general population, general practitioner list, or hospital controls), and risk of bias influenced the magnitude of the average effect estimate in the meta-analysis. Publication bias was assessed by funnel plot and Egger’s test (Egger et al, 1997; Peters et al, 2008). The meta-analysis was performed using the metafor package in R, version 3.3 (Viechtbauer, 2010).
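To make the pooling step concrete, here is a minimal Python sketch of an inverse-variance random-effects model with the DerSimonian and Laird method-of-moments estimate of between-study variance. It is not the metafor code used in the review; the function name, the returned keys, and the input values are hypothetical, and the I² formula shown is the common (Q − df)/Q definition, assumed here.

```python
import math

Z_95 = 1.959964  # normal quantile for Wald-type 95% intervals


def dersimonian_laird(log_ors, ses):
    """Pool study log(OR)s with a DerSimonian-Laird random-effects model."""
    w = [1 / s ** 2 for s in ses]                      # fixed-effect weights
    sw = sum(w)
    fixed = sum(wi * y for wi, y in zip(w, log_ors)) / sw
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, log_ors))  # Cochran's Q
    df = len(log_ors) - 1
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c)                      # method-of-moments tau^2
    w_re = [1 / (s ** 2 + tau2) for s in ses]          # random-effects weights
    pooled = sum(wi * y for wi, y in zip(w_re, log_ors)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return {
        "or": math.exp(pooled),
        "ci": (math.exp(pooled - Z_95 * se), math.exp(pooled + Z_95 * se)),
        "log_or": pooled,
        "se": se,
        "tau2": tau2,
        "q": q,
        "i2": i2,
    }


# Hypothetical study estimates, for illustration only.
result = dersimonian_laird(
    log_ors=[math.log(0.9), math.log(1.4), math.log(2.1)],
    ses=[0.10, 0.20, 0.35],
)
print(result)
```

When the studies agree exactly, the estimated between-study variance is zero and the result coincides with the fixed-effect inverse-variance average; as heterogeneity grows, the weights become more equal across studies.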

Results

Titles and abstracts of 9445 records were reviewed and 314 full-text articles were retrieved (Figure 1). There were 39 studies that satisfied the inclusion criteria (Till et al, 1979; van Steensel-Moll et al, 1986; Nishi and Miyake, 1989; Schlehofer et al, 1996; Dockerty et al, 1999; McKinney et al, 1999; Schuz et al, 1999; Neglia et al, 2000; Mackenzie et al, 2001; Petridou et al, 2001; Chan et al, 2002; Perrillat et al, 2002; Salonen et al, 2002; Kerr et al, 2003; Canfield et al, 2004; Jourdan-Da Silva et al, 2004; Ma et al, 2005; Rosenbaum et al, 2005; Surico and Muggeo, 2005; Loutfy et al, 2006; Paltiel et al, 2006; Zaki et al, 2006; Roman et al, 2007; Cardwell et al, 2008; MacArthur et al, 2008; Flores-Lujano et al, 2009; Tesse et al, 2009; Rudant et al, 2010; Zaki and Ashray, 2010; Mahjour et al, 2010; Ahmed et al, 2012; Chang et al, 2012; Vestergaard et al, 2013; Ibrahem et al, 2014; Ajrouche et al, 2015; Lin et al, 2015; Rudant et al, 2015; da Conceicao Nunes et al, 2016; Ateyah et al, 2017), and of those, 38 were included in the meta-analysis. One study did not report infections and the effect estimate could not be calculated (Paltiel et al, 2006). The reviewers had almost perfect agreement on the articles for inclusion (κ=0.85, 95% CI: 0.75, 0.95). Characteristics of the included studies are presented in Table 1. The exposure definitions are presented in Supplementary Table 2. The reviewers had moderate agreement on the judgement of the risk of bias for each study (κ=0.50, 95% CI: 0.28, 0.72). Thirteen studies were judged as being low-risk of bias, 7 as being moderate-risk of bias, and 19 as being high-risk of bias (Supplementary Table 3a and b). We found evidence of publication bias (bias coefficient=1.19, 95% CI: 0.30, 2.08; Supplementary Figure 1).

Figure 1 Study selection flow diagram.

Table 1 Characteristics of the included studies and associated references

Our analysis included 12 496 children with ALL and 2 356 288 children without ALL. There was no association between infections and ALL, OR=1.10, 95% CI: 0.95, 1.28; P=0.187 (Figure 2). We observed considerable heterogeneity between the studies (I²=76.5%; Q-statistic P<0.001). The trend analysis included 13 studies and we did not find frequency of infections to be associated with ALL (OR=1.00, 95% CI: 0.95, 1.05; P=0.967). In the four studies that assessed the infection severity, the combined average effect of hospitalisations for infections was not associated with ALL (OR=1.22, 95% CI: 0.85, 1.75; P=0.239). Infections that occurred in the first year of life were not associated with ALL (OR=0.99, 95% CI: 0.85, 1.16, P=0.920). Infections that occurred after the first year of life had an elevated but imprecise point estimate (OR=1.45, 95% CI: 0.71, 2.96, P=0.313), which did not differ from that for infections in the first year of life (interaction effect OR=0.69, 95% CI: 0.32, 1.43, P=0.314) (Supplementary Figure 2). Parvovirus B19 (OR=2.69, 95% CI: 1.16, 6.22, P=0.020) was found to be associated with ALL (Figure 2). No association was observed for human herpesvirus-6 (OR=0.89, 95% CI: 0.42, 1.87, P=0.752). Epstein–Barr virus (OR=1.39, 95% CI: 0.83, 2.33, P=0.208), cytomegalovirus (OR=1.95, 95% CI: 0.64, 5.96, P=0.242), influenza (OR=1.97, 95% CI: 0.97, 3.98, P=0.061), and herpes simplex virus (OR=2.04, 95% CI: 0.66, 6.23, P=0.214) had elevated point estimates, but the estimates lacked precision and the confidence intervals included the null. Varicella, rubella, mumps, measles, and pertussis were not associated with ALL (Supplementary Figure 3).

Figure 2 Random-effects model examining the association between common infections and risk of childhood acute lymphoblastic leukaemia. CI represents confidence interval. Common infections are reported as a two-class variable, or as highest vs lowest where more than two categories were reported. The secondary analysis for frequency of infections is a combined maximum likelihood effect estimate that estimates a trend from summarised dose–response data. The presence of parvovirus B19 was measured as a dichotomous variable, presence of IgG antibodies vs no IgG antibodies for parvovirus B19. For all other studies, the reference was no infections.

Subgroup and sensitivity analyses

After applying the Bonferroni correction, the P-value to indicate statistical significance for the additional analyses was <0.005. The data sources for the studies can be found in Table 1. Among the studies that used self-reported data, we found no association between infections and ALL (OR=0.89, 95% CI: 0.79, 1.00, P=0.049; I²=50.5%). Among studies that used administrative/medical record data, we found no association between infections and ALL (OR=1.00, 95% CI: 0.61, 1.63, P=0.994; I²=90.8%). Among studies that used laboratory data, we found infections to be associated with ALL (OR=2.42, 95% CI: 1.54, 3.82, P<0.001, I²=54.2%). The interaction effect showed no difference between self-reported and administrative/medical records data sources (OR=0.89, 95% CI: 0.54, 1.48, P=0.656). Infections identified through laboratory data increased the risk of ALL compared to infections captured through self-reported data (interaction effect OR=2.73, 95% CI: 1.71, 4.36, P<0.001), but not administrative/medical records data sources (interaction effect OR=2.43, 95% CI: 1.24, 4.75, P=0.009). Among studies that used self-reported data, every additional infection reduced the odds of ALL by 4% (OR=0.96, 95% CI: 0.94, 0.98; P<0.001), whereas among studies that used administrative/medical records data, every additional infection increased the odds of ALL by 11% (OR=1.11, 95% CI: 1.07, 1.15; P<0.001). We found self-reported and administrative/medical records data sources qualitatively differed in the frequency of infections (interaction effect OR=0.86, 95% CI: 0.83, 0.90, P<0.001). Severity of infections remained unchanged in studies with self-reported data (OR=1.51, 95% CI: 0.86, 2.65; P=0.158; I²=70.2%). Among self-reported studies, infections in the first year of life suggested a protective effect against ALL (OR=0.88, 95% CI: 0.80, 0.98, P=0.017). No association was found between infections in the first year of life and ALL among studies that used administrative/medical records data (OR=0.93, 95% CI: 0.55, 1.56, P=0.775), and this estimate did not differ from that of self-reported studies (interaction effect OR=0.95, 95% CI: 0.56, 1.62, P=0.862).
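The interaction effects reported above compare two subgroup estimates. A minimal Python sketch of one such comparison, the ratio of two independent odds ratios in the style of Altman and Bland (2003), is given below. The function names are hypothetical, standard errors are back-calculated from the rounded confidence intervals in the text, and the review's own interaction estimates may have been obtained differently (for example, from a meta-regression term), so only approximate agreement should be expected.

```python
import math

Z_95 = 1.959964  # normal quantile for a two-sided 95% interval


def log_or_and_se(odds_ratio, lower, upper):
    """Log(OR) and its SE recovered from a point estimate and 95% CI."""
    return math.log(odds_ratio), (math.log(upper) - math.log(lower)) / (2 * Z_95)


def ratio_of_odds_ratios(est_a, est_b):
    """Compare two independent ORs; each argument is (OR, lower, upper)."""
    log_a, se_a = log_or_and_se(*est_a)
    log_b, se_b = log_or_and_se(*est_b)
    diff = log_a - log_b
    se = math.sqrt(se_a ** 2 + se_b ** 2)       # independence assumed
    z = diff / se
    p = math.erfc(abs(z) / math.sqrt(2))        # two-sided normal P-value
    return (math.exp(diff),
            math.exp(diff - Z_95 * se),
            math.exp(diff + Z_95 * se),
            p)


# Subgroup estimates quoted in the text: laboratory vs self-reported data.
laboratory = (2.42, 1.54, 3.82)
self_reported = (0.89, 0.79, 1.00)
ror, low, high, p_value = ratio_of_odds_ratios(laboratory, self_reported)
print(round(ror, 2), round(low, 2), round(high, 2), p_value)
```

With these rounded inputs the sketch gives a ratio of roughly 2.7 with an interval of roughly 1.7 to 4.3, close to the reported interaction effect of 2.73 (1.71, 4.36); the small differences are consistent with rounding in the published subgroup estimates.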

The results from our primary analysis remained unchanged when we restricted the analysis to B-cell precursor ALL or B-cell common ALL (OR=0.87, 95% CI: 0.77, 0.98, P=0.022). Meta-regression models that assessed study level characteristics included data source, region, publication era, source of controls, and risk of bias. Data source and region accounted for the largest proportion of heterogeneity between the studies (R²=47.2%, see Supplementary Table 4). Stratification by risk of bias indicated studies of low-risk of bias showed similar results to our main analysis (OR=0.92, 95% CI: 0.76, 1.10, P=0.349), whereas studies of moderate-to-high-risk of bias suggested infections increased the risk of ALL (OR=1.45, 95% CI: 1.12, 1.86, P=0.005). Compared to studies of moderate-to-high-risk of bias, studies of low-risk of bias were more likely to suggest infections were protective against ALL (OR=0.63, 95% CI: 0.46, 0.87, P=0.004).

Discussion

In this systematic review of 39 studies, we found no association between childhood ALL and any common infection, or the frequency, severity, or timing of infections. We did, however, find a qualitative difference in our subgroup analyses; infections increased the odds of developing ALL by 2.4-fold in studies with laboratory investigations. Further, infections identified through laboratory investigations increased the odds of ALL by 2.7-fold and 2.4-fold compared to infections identified through self-reported and administrative/medical records data, respectively. Among studies that used self-reported data, we found each additional infection reduced the odds of ALL by 4%, and this differed significantly from studies that used administrative/medical records data, which suggested each additional infection increased the odds of ALL by 11%. The heterogeneity between the studies remained a challenge and could partly be explained by differences in the data sources.

We failed to demonstrate an association in our primary analysis, but found associations in our secondary and subgroup analyses by data source. There are three plausible explanations for the observed findings. First, the apparent results may be a chance finding from multiple testing. Second, the ascertainment of infections from parental recall has been shown to under-report childhood infections and may be inaccurate in both the timing and occurrence of infections, compared to medical records (McKinney et al, 1991; Simpson et al, 2007). Despite these potential issues, studies that confirmed the self-reported infections with medical records for accuracy and completeness still found an inverse association (Dockerty et al, 1999; Ajrouche et al, 2015). Although studies that used medical records were free of recall bias, they were often unable to include other important confounders, such as ethnicity, parental occupation, maternal age, birth weight, and parity (Dockerty et al, 2001; Hjalgrim et al, 2004; Ma et al, 2005; Lim et al, 2014). Finally, the findings from the laboratory studies must be interpreted with caution because of the study quality and because smaller studies tended to report larger effect sizes, as shown by the asymmetry of the funnel plot.

The mutational mechanisms of ALL point to three potential pathways: (1) anomalies in lineage-specific factors (ETV6-RUNX1, IKZF1, and PAX5); (2) flaws in receptor protein tyrosine kinases and their down-stream pathways; and (3) epigenetic modifiers (Whitehead et al, 2016). Recent developments in genome and mouse model studies may change our initial understanding of the aetiology of ALL as new studies have generated new hypotheses with respect to identifying potential infectious candidates (Martin-Lorenzo et al, 2015; Swaminathan et al, 2015). The presence of parvovirus B19 IgG antibodies is associated with the presence of ETV6-RUNX1 (Ibrahem et al, 2014), and is associated with certain class II HLA alleles that are risk factors for the development of childhood ALL. Furthermore, parvovirus B19 has certain characteristics similar to other oncoviruses, that is, its DNA genome persists indefinitely in human tissues following acute infection, causing mild or no disease, and upregulates pro-inflammatory cytokines associated with ALL onset (Kerr and Mattey, 2015). The results from the small laboratory studies will require confirmation in larger population studies. Since half of 15-year-old adolescents have specific antiparvovirus B19 antibodies (Young and Brown, 2004), the measurement of the clinical syndromes caused by parvovirus B19 may be preferred to assess manifestations of the pathogen. Parvovirus B19 infection may provide only a subset of an oncogenic hit in a multistep carcinogenesis process.

The qualitative differences in our findings support the hypothesis of an alternative pathway for ALL development. Recent qualitative reviews have attempted to explain the positive association between infections and ALL and suggested studies that used medical records or administrative data may be capturing children with an earlier than expected altered immune system. These children may respond differently to infections, have a greater propensity to seek medical care when infections are contracted, and/or have a stronger immune response (Wiemels, 2012; Whitehead et al, 2016). The sensitivity to infections may be due to a lack of immunomodulation from lower levels of anti-inflammatory cytokine interleukin-10 in newborns who later go on to develop ALL (Chang et al, 2011).

As in previous reviews, there continues to be substantial heterogeneity among the studies; however, our review focuses on specific objectives and highlights the recent developments of the field (McNally and Eden, 2004; Buffler et al, 2005; Greaves, 2006; Ma et al, 2009; Maia Rda and Wunsch Filho, 2013). There are several limitations of this study. The heterogeneity between the studies in the definition of infections, the time period to observe the infections, and the evidence of publication bias was a challenge. We decided to use any common infection as our main exposure variable in the primary analysis because we felt it to be the most appropriate measure that reflects the hypotheses from Kinlen and Greaves (Kinlen, 1988; Greaves, 2006). The heterogeneity likely stems from the unknown aetiology of ALL, and one that requires further research. The limitation with laboratory investigation studies is the inability to disentangle temporality. The presence of the infectious agent was assessed after a diagnosis of ALL was made and it is unknown if the agent was present before or after the onset of ALL. It is unclear whether the infection occurred before the onset of ALL, or if the potentially reduced immune function because of ALL contributed to the contraction of specific infections. Further, the laboratory studies were appraised as high-risk of bias, often small, and may not be generalisable. Despite the differences in the risk of bias amongst the included studies, our conclusions were unchanged after we stratified the analysis to the 13 studies with a low-risk of bias. Another limitation was the quality of reporting in the studies included in the review. Most studies clearly reported their findings, but studies published earlier tended to have incomplete reporting.

Because costs and feasibility are the usual barriers to establishing new large pregnancy and birth cohorts (Riley and Duncan, 2016), research groups have instead combined existing cohorts to study childhood cancers (Brown et al, 2007; Metayer et al, 2013) and other diseases (Larsen et al, 2013). The increased power may help to identify high-risk, vulnerable, and understudied populations. The next step should focus on the measurement of infections and infectious exposures. Linked administrative data provide a large population for study with accurate information on the timing, frequency, and severity of physician-diagnosed infections, questions for which answers remain elusive. Enhancing the administrative data in a subset sample, either with surveys to obtain other infectious exposures such as day-care attendance and breastfeeding, or by applying emerging technologies that detect and quantify the pathogen burden with greater speed, accuracy, and simplicity (Caliendo et al, 2013), would improve the accuracy and strengthen the measurement of infections. Day-care attendance has been found to increase the risk of exposure to infections, and has been used as a proxy for infections. A meta-analysis found day-care attendance reduced the risk of childhood ALL (Urayama et al, 2010). Breastfeeding has been found to reduce the risk of ALL through its immunologically active components, antibodies, and other elements that influence the development of the infant’s immune system (Kwan et al, 2004; Martin et al, 2005; Amitay and Keinan-Boker, 2015). The challenge will be to disentangle the mechanistic pathways of the infectious aetiology hypothesis by combining different measurements of infectious exposures to determine the total, direct, and indirect effect of infections on the risk of developing childhood ALL.

An infectious aetiology of ALL is suggested by our study; however, the challenges in measuring infections must be addressed. Parvovirus B19 as a putative causal infectious agent for childhood ALL needs to be tested in larger cohorts, and the rather substantial point estimates for influenza, cytomegalovirus, and herpes simplex virus warrant follow-up in larger studies. Whether children with ALL have a dysregulated immune function present at birth requires further investigation. Only one study conducted an exploratory assessment on a key aspect of Greaves’ hypothesis, the timing of the infections in early life (Crouch et al, 2012). Our future research aims to provide further insight on the timing of infections and the risk of developing childhood ALL. The use of administrative data or medical records with linked laboratory data would overcome the challenges facing studies that used self-reported and laboratory investigation data, and would be ideal to evaluate the association between childhood ALL and the timing and frequency of infections. The review has highlighted knowledge gaps surrounding the relationship between childhood ALL and severity of infections. The causal association of infections will need to be tested in conjunction with other identified risk factors to quantify the direct, indirect, interaction, and mediated effects of infections on ALL risk. These will be critical research questions in discovering the causes of childhood ALL and will be the foundation for future studies that can combine epidemiologic, genetic, and environmental factors.