Abstract
Individual-level models (ILMs) are a class of mechanistic models widely used to infer infectious disease transmission dynamics. These models incorporate individual-level covariate information to account for population heterogeneity and are generally fitted in a Bayesian Markov chain Monte Carlo (MCMC) framework. However, Bayesian MCMC inference is computationally expensive for large data sets. The problem becomes more severe for infectious disease data collected from spatially heterogeneous populations, because the number of covariates increases. In addition, summary statistics computed over the global population may not capture the true spatio-temporal dynamics of disease transmission. In this study, we propose using ensemble learning methods, rather than time-consuming Bayesian MCMC methods, to predict epidemic-generating models. We apply these methods to infer disease transmission dynamics over spatially clustered populations, treating the clusters as natural strata rather than as a single global population. We compare the performance of two tree-based ensemble learning techniques: random forest and gradient boosting. These methods are applied to the 2001 foot-and-mouth disease epidemic in the U.K. and evaluated using simulated data from a clustered population. We show that spatially clustered data can help predict epidemic-generating models more accurately than global data.
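To make the setting concrete, the following Python sketch simulates an epidemic from a simple spatial ILM and summarizes it both globally and within spatial clusters. It assumes the power-law ILM form common in the ILM literature (e.g., Deardon et al., 2010), in which a susceptible individual i is infected during a time step with probability 1 - exp(-α Σ_j d_ij^(-β)), summed over the currently infectious individuals j. The exact model forms compared in this study are not given here, so the parameter values, the fixed infectious period, the two-cluster layout, and the function names (`infection_prob`, `simulate_sir`, `epidemic_curves`) are hypothetical. Per-cluster epidemic curves of this kind are the sort of features a random forest or gradient boosting classifier could be trained on to predict the generating model; the classifier itself is not shown.

```python
import math
import random

# Hypothetical sketch of a spatial power-law ILM. alpha (susceptibility) and
# beta (spatial decay) are illustrative values, not estimates from this study.


def infection_prob(i, infectious, coords, alpha, beta):
    """Probability that susceptible i is infected in one time step:
    1 - exp(-alpha * sum over infectious j of d_ij ** -beta)."""
    xi, yi = coords[i]
    pressure = 0.0
    for j in infectious:
        xj, yj = coords[j]
        d = max(math.hypot(xi - xj, yi - yj), 1e-9)  # guard against coincident points
        pressure += d ** (-beta)
    return 1.0 - math.exp(-alpha * pressure)


def simulate_sir(coords, alpha, beta, infectious_period=3, n_steps=30, seed=0):
    """Discrete-time SIR simulation; returns each individual's infection time
    (None if never infected). Infectives stay infectious for a fixed number of
    steps (an assumption of this sketch) and are then removed."""
    rng = random.Random(seed)
    n = len(coords)
    inf_time = [None] * n
    inf_time[rng.randrange(n)] = 0  # one randomly chosen initial infective
    for t in range(1, n_steps + 1):
        infectious = [j for j in range(n)
                      if inf_time[j] is not None and 0 < t - inf_time[j] <= infectious_period]
        if not infectious:
            break  # no infectives left: the epidemic is over
        # Evaluate every susceptible against the infectious set at the start of step t.
        new_cases = [i for i in range(n)
                     if inf_time[i] is None
                     and rng.random() < infection_prob(i, infectious, coords, alpha, beta)]
        for i in new_cases:
            inf_time[i] = t
    return inf_time


def epidemic_curves(inf_time, labels, n_steps):
    """New infections per time step, for the global population and per cluster."""
    global_curve = [0] * (n_steps + 1)
    cluster_curves = {c: [0] * (n_steps + 1) for c in set(labels)}
    for t, c in zip(inf_time, labels):
        if t is not None:
            global_curve[t] += 1
            cluster_curves[c][t] += 1
    return global_curve, cluster_curves


# Usage: two well-separated (hypothetical) spatial clusters of 50 individuals each.
rng = random.Random(1)
coords = ([(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(50)]
          + [(rng.gauss(10, 1), rng.gauss(10, 1)) for _ in range(50)])
labels = [0] * 50 + [1] * 50
times = simulate_sir(coords, alpha=0.2, beta=2.0, n_steps=30, seed=42)
global_curve, per_cluster = epidemic_curves(times, labels, n_steps=30)
print("global:   ", global_curve)
for c, curve in sorted(per_cluster.items()):
    print(f"cluster {c}:", curve)
```

The global curve pools both clusters, so differences in timing or size between them may be averaged away, while the per-cluster curves retain them; this is the intuition behind treating clusters as strata rather than summarizing only the global population.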
-
Research ethics: Ethics approval was not required, as this research did not involve any studies on humans or animals.
-
Author contributions: All authors were involved in drafting the manuscript or revising it critically for intellectual content, and all authors approved the final version to be published. Study concept and design: Pokharel. Modelling and data analysis: Peitch and Pokharel. Interpretation of the results: Pokharel, Peitch and Hossain.
-
Competing interests: The authors have nothing to declare.
-
Research funding: This work was funded by the Natural Sciences and Engineering Research Council of Canada (NSERC) and an NSERC Undergraduate Student Research Award (NSERC-USRA).
-
Data availability: The data used in this study were obtained from the United Kingdom Government Department for Environment, Food and Rural Affairs (DEFRA). We do not have permission to redistribute the data. Readers who wish to obtain the data may contact Dr. Pokharel or DEFRA directly at defra.helpline@defra.gsi.gov.uk.
Supplementary Material
This article contains supplementary material (https://doi.org/10.1515/ijb-2023-0102).
© 2024 Walter de Gruyter GmbH, Berlin/Boston