Published online by De Gruyter, April 10, 2024

Ensemble learning methods of inference for spatially stratified infectious disease systems

  • Jeffrey Peitsch, Gyanendra Pokharel and Shakhawat Hossain

Abstract

Individual-level models are a class of mechanistic models that are widely used to infer infectious disease transmission dynamics. These models incorporate individual-level covariate information that accounts for population heterogeneity, and they are generally fitted in a Bayesian Markov chain Monte Carlo (MCMC) framework. However, Bayesian MCMC methods of inference are computationally expensive for large data sets. This issue becomes more severe when the methods are applied to infectious disease data collected from spatially heterogeneous populations, as the number of covariates increases. In addition, summary statistics over the global population may not capture the true spatio-temporal dynamics of disease transmission. In this study we propose using ensemble learning methods to predict epidemic generating models instead of time-consuming Bayesian MCMC methods. We apply these methods to infer disease transmission dynamics over spatially clustered populations, treating the clusters as natural strata instead of a single global population. We compare the performance of two tree-based ensemble learning techniques: random forest and gradient boosting. These methods are applied to the 2001 foot-and-mouth disease epidemic in the U.K. and evaluated using simulated data from a clustered population. The results show that spatially clustered data can help to predict epidemic generating models more accurately than global data.
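The contrast between global and cluster-level summary statistics can be illustrated with a small simulation. The sketch below is not the authors' code: it assumes a simple discrete-time susceptible-infectious individual-level model with a power-law distance kernel, in which a susceptible individual i is infected at time t with probability 1 - exp(-alpha * sum_j d_ij^(-beta)), where the sum runs over currently infectious individuals j. The function names, the parameter values, and the two-cluster layout are all hypothetical. It computes one epidemic curve for the whole population and one per cluster; the per-cluster curves are the kind of stratified features that could be passed to a classifier such as a random forest or gradient boosting model.

```python
import math
import random


def make_clustered_population(n_per_cluster, centres, spread, rng):
    """Place individuals around each cluster centre (hypothetical layout)."""
    coords, labels = [], []
    for k, (cx, cy) in enumerate(centres):
        for _ in range(n_per_cluster):
            coords.append((rng.gauss(cx, spread), rng.gauss(cy, spread)))
            labels.append(k)
    return coords, labels


def simulate_ilm(coords, alpha, beta, t_max, rng, seed_index=0):
    """Simulate an assumed discrete-time SI individual-level model.

    P(i infected at t) = 1 - exp(-alpha * sum_j d_ij ** (-beta)),
    summing over individuals j infected before time t.
    Returns each individual's infection time (None if never infected).
    """
    n = len(coords)
    inf_time = [None] * n
    inf_time[seed_index] = 0
    for t in range(1, t_max + 1):
        # Individuals infected before this step exert infectious pressure.
        infectious = [j for j in range(n) if inf_time[j] is not None]
        for i in range(n):
            if inf_time[i] is not None:
                continue
            pressure = sum(
                math.dist(coords[i], coords[j]) ** (-beta) for j in infectious
            )
            if rng.random() < 1.0 - math.exp(-alpha * pressure):
                inf_time[i] = t
        # New infections at time t are recorded but only spread from t + 1,
        # because `infectious` was fixed before the inner loop ran.
    return inf_time


def epidemic_curve(inf_time, members, t_max):
    """Count new infections per time step among the given individuals."""
    curve = [0] * (t_max + 1)
    for i in members:
        if inf_time[i] is not None:
            curve[inf_time[i]] += 1
    return curve


rng = random.Random(42)
t_max = 15
coords, labels = make_clustered_population(30, [(0.0, 0.0), (10.0, 0.0)], 1.0, rng)
inf_time = simulate_ilm(coords, alpha=0.3, beta=2.0, t_max=t_max, rng=rng)

# Global summary: one curve over the whole population.
global_curve = epidemic_curve(inf_time, range(len(coords)), t_max)

# Stratified summary: one curve per cluster, concatenated into a feature vector.
cluster_curves = [
    epidemic_curve(inf_time, [i for i, k in enumerate(labels) if k == c], t_max)
    for c in range(2)
]
features = [count for curve in cluster_curves for count in curve]

print("global curve:  ", global_curve)
print("cluster curves:", cluster_curves)
```

The global curve is the element-wise sum of the cluster curves, so it discards the information about where and when the epidemic crossed between clusters. Repeating the simulation under several candidate models and labelling each feature vector with its generating model would give a training set for a classifier, although the summary statistics and candidate models used in the paper may differ from this sketch.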


Corresponding author: Gyanendra Pokharel, Department of Mathematics and Statistics, University of Winnipeg, Winnipeg, MB, Canada, E-mail:

  1. Research ethics: Ethics approval is not required as this research does not include any studies on humans or animals.

  2. Author contributions: All authors were involved in drafting the manuscript or revising it critically for intellectual content, and all authors approved the final version to be published. Study concept and design: Pokharel. Modelling and data analysis: Peitsch and Pokharel. Interpretation of the results: Pokharel, Peitsch and Hossain.

  3. Competing interests: Authors have nothing to declare.

  4. Research funding: This work was funded by the Natural Sciences and Engineering Research Council of Canada (NSERC) and an NSERC Undergraduate Student Research Award (NSERC-USRA).

  5. Data availability: The data we used were obtained from the United Kingdom Government's Department for Environment, Food and Rural Affairs (DEFRA). We do not have permission to redistribute the data. Readers who wish to obtain the data may contact Dr. Pokharel or DEFRA directly at defra.helpline@defra.gsi.gov.uk.


Supplementary Material

This article contains supplementary material (https://doi.org/10.1515/ijb-2023-0102).


Received: 2023-09-12
Accepted: 2024-02-13
Published Online: 2024-04-10

© 2024 Walter de Gruyter GmbH, Berlin/Boston
