Original article
Application of machine-learning to predict early spontaneous preterm birth among nulliparous non-Hispanic black and white women
Introduction
Spontaneous preterm birth (sPTB) is a leading cause of perinatal mortality and child morbidity globally [1], occurring disproportionately in the United States among non-Hispanic (NH) black women (16.3%) as compared to NH white women (10.2%) [2]. Although prior preterm birth is recognized as a strong predictor for future preterm birth among multiparous women (women having at least one prior birth) [3], few indicators of risk for preterm birth exist in nulliparous women (women without a prior birth). A noninvasive screening measure to quantify the risk for sPTB among nulliparous women at the onset of pregnancy could help guide clinical decisions regarding proper pregnancy surveillance and management.
Maternal demographic [4], [5] and health characteristics [6] early in pregnancy have lacked power to predict individual risk despite their statistical association with PTB at a population level. The screening performance of traditional regression algorithms that include these factors is only a little better than chance, with an area under the receiver operating characteristic curve (AUC) on the order of 0.61–0.63 for predicting early sPTB (<32 weeks) [4]. The addition of maternal biomarkers (e.g., cervical length, fetal fibronectin, and maternal serum analytes) improves prediction but only marginally (e.g., AUCs of 0.67 and 0.70) [7], [8]. The application of machine-learning techniques may uncover more sensitive and specific screening algorithms than is possible with traditional methods. It is becoming increasingly common to use machine-learning algorithms to learn from “big data” for more reliable predictions. Many popular machine-learning algorithms relax the modeling assumptions and restrictions of traditional regression (e.g., factors are linearly associated with the risk of PTB) by including semiparametric and nonparametric models (e.g., nonlinear models or decision trees) and parametric models with multiple higher-order terms (e.g., squared terms or two-way interactions). Disparate machine-learning algorithms are easily compared using cross-validation, a technique used to assess how well a prediction model performs outside of the sample with which the model was fit. The original sample is partitioned into a training set to fit the model and a testing set to evaluate the goodness of the fit. In this way, the generalizability of results is assessed while minimizing the risk of overfitting the model to the available data.
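To make the cross-validation idea concrete, the following is a minimal sketch in plain Python, not the analysis code used in this study. It assumes a binary outcome (1 = early sPTB, 0 = term birth), computes the AUC as the probability that a randomly chosen case is scored above a randomly chosen non-case, and averages that AUC over held-out folds. The function names (`auc`, `stratified_folds`, `cross_validated_auc`, `fit_identity`) and the toy data are hypothetical, and stratifying the folds by outcome is an assumption made here so that every fold contains both classes when the outcome is rare.

```python
import random


def auc(scores, labels):
    """AUC via the rank-sum identity: the share of (case, non-case) pairs in
    which the case has the higher score, with ties counted as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one case and one non-case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def stratified_folds(labels, k, seed=0):
    """Split row indices into k folds, dealing each outcome class round-robin
    so that every fold keeps roughly the overall case proportion."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in (0, 1):
        members = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(members)
        for j, i in enumerate(members):
            folds[j % k].append(i)
    return folds


def cross_validated_auc(fit, X, y, k=5, seed=0):
    """Mean out-of-sample AUC. `fit(X_train, y_train)` must return a function
    that maps one row of X to a risk score (hypothetical interface)."""
    fold_aucs = []
    for held_out in stratified_folds(y, k, seed):
        test = set(held_out)
        train = [i for i in range(len(X)) if i not in test]
        predict = fit([X[i] for i in train], [y[i] for i in train])
        scores = [predict(X[i]) for i in held_out]
        fold_aucs.append(auc(scores, [y[i] for i in held_out]))
    return sum(fold_aucs) / len(fold_aucs)


def fit_identity(X_train, y_train):
    """Placeholder 'model' that ignores the training data and returns the
    feature itself as the risk score; a real learner would go here."""
    return lambda x: x


# Toy, made-up data (not study data): one feature that separates the classes.
X = list(range(20))
y = [0] * 10 + [1] * 10
print(cross_validated_auc(fit_identity, X, y, k=5))  # prints 1.0
```

Because each candidate algorithm is scored only on rows it did not see during fitting, the same `cross_validated_auc` call can be repeated with different `fit` functions to compare disparate models on equal terms.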
In this article, we used machine-learning methods to explore how well a set of demographic and maternal characteristics that could be known at the first prenatal visit could predict early sPTB and whether prediction differed for NH black and white women. We restricted the analysis to nulliparous women, for whom the etiologies of early sPTB are perhaps the least well understood [9]. We assessed whether prediction could be improved by overlaying information about residential census tract, including indicators of social disadvantage (e.g., poverty) [10] and of air, water, and soil pollutants (from the CalEnviroScreen 2.0) [11]. Finally, we tested for statistically significant differences in the magnitudes of association of factors with early sPTB for NH black versus white women that might help explain the black-white disparity in prevalence of early sPTB.
Study population
The study's source population consisted of ∼2.7 million singleton births in the state of California from 2007 to 2011 with individual-level birth records obtained from Vital Statistics for the State of California. Birth record data were linked with hospital discharge ICD-9 codes at the time of delivery from the Office of Statewide Health Planning and Development (OSHPD). The algorithm used to assemble these data is accurate and has been described previously [12]. Just under one million of
Results
The demographic and maternal characteristics included in the models are shown in Table 1 separately for NH black (n = 54,084) and NH white (n = 282,130) nulliparous women with nonimputed data. Early sPTB occurred 2.6 times more frequently among NH blacks as compared to whites. NH black women were more than twice as likely to have used MediCal for payment of the delivery as white women (54.6% vs. 22.7%). In addition, NH black women had lower educational attainment than white women (14.6% vs.
Discussion
Prediction models using machine-learning methods with population-based cohort data were marginally predictive of early sPTB as compared to term birth and performed similarly for NH black and NH white women in California. Predictive power improved when the two race-ethnicity groups were combined, such that the AUC values surpassed those reported by others [4] and approached those combining maternal characteristics with biological markers (e.g., serum analytes) [7], [8]. Census tract-level
Acknowledgments
We are grateful for technical and data support from Jonathan A. Mayo and John W. Oehlert. This work was funded by the Stanford Child Health Research Institute and Stanford NIH-NCATS-CTSA (grant no. UL1 TR001085) and the March of Dimes Prematurity Research Center at Stanford (MOD PR625253). The funding sources had no involvement in study design, analysis and interpretation of data, writing of the report, or in the decision to submit the article for publication.
References (25)
- et al. The epidemiology, etiology, and costs of preterm birth. Semin Fetal Neonatal Med (2016)
- et al. The preterm prediction study: effect of gestational age and cause of preterm birth on subsequent obstetric outcome. Am J Obstet Gynecol (1999)
- et al. Racial Disparities in Preterm Birth. Semin Perinatol (2011)
- Psychiatric and substance use disorders as risk factors for low birth weight and preterm delivery. Obstet Gynecol (2002)
- et al. A proposed method to predict preterm birth using clinical data, standard maternal serum screening, and cholesterol. Am J Obstet Gynecol (2013)
- et al. Vital Statistics Linked Birth/Infant Death and Hospital Discharge Record Linkage for Epidemiological Studies. Comput Biomed Res (1997)
- et al. The preterm prediction study: maternal stress is associated with spontaneous preterm birth at less than thirty-five weeks' gestation. Am J Obstet Gynecol (1996)
- et al. Born Too Soon: The global epidemiology of 15 million preterm births. Reprod Health (2013)
- et al. Maternal and biochemical predictors of spontaneous preterm birth among nulliparous women: a systematic analysis in relation to the degree of prematurity. Int J Epidemiol (2006)
- et al. Predictive accuracy of serial transvaginal cervical lengths and quantitative vaginal fetal fibronectin levels for spontaneous preterm birth among nulliparous women. J Am Med Assoc (2017)
- Spontaneous preterm birth, a clinical dilemma: Etiologic, pathophysiologic and genetic heterogeneities and racial disparity. Acta Obstet Gynecol Scand
- 2007 – 2011 American community survey