Elsevier

Environmental Modelling & Software

Volume 119, September 2019, Pages 275-284

A genetic algorithm for identifying spatially-varying environmental drivers in a malaria time series model

https://doi.org/10.1016/j.envsoft.2019.06.010

Highlights

  • Time series of malaria risk in the Amhara region of Ethiopia can be modeled with remotely-sensed environmental covariates.

  • Responses to the environment are not spatially uniform and the region needs to be partitioned into separate models.

  • A genetic algorithm successfully partitioned districts into clusters with different responses to lagged environmental data.

  • Patterns of malaria outbreaks and their environmental drivers varied geographically along a precipitation gradient.

Abstract

Time series models of malaria cases can be applied to forecast epidemics and support proactive interventions. Mosquito life history and parasite development are sensitive to environmental factors such as temperature and precipitation, and these variables are often used as predictors in malaria models. However, malaria-environment relationships can vary with ecological and social context. We used a genetic algorithm to optimize a spatiotemporal malaria model by aggregating locations into clusters with similar environmental sensitivities. We tested the algorithm in the Amhara Region of Ethiopia using seven years of weekly Plasmodium falciparum data from 47 districts and remotely-sensed land surface temperature, precipitation, and spectral indices as predictors. The best model identified six clusters, and the districts in each cluster had distinctive responses to the environmental predictors. We conclude that spatial stratification can improve the fit of environmentally-driven disease models, and genetic algorithms provide a practical and effective approach for identifying these clusters.

Introduction

Mosquito-borne diseases are a persistent global threat with significant implications for human health. Epidemics of mosquito-borne diseases have spread explosively, creating a public health crisis that places an estimated 3.9 billion people living within 120 different countries at risk (Brady et al., 2012, Wilder-Smith et al., 2017). Entrenched diseases such as malaria are the focus of continuing efforts toward control and elimination (Alonso and Noor, 2017). Forecasting future epidemics is desirable so that limited resources for prevention and control can be allocated efficiently before the peaks of outbreaks (Thomson and Connor, 2001). However, developing robust and effective forecasting models is difficult because of the complexities of mosquito ecology and disease transmission cycles. Here, we address this challenge with a novel approach that uses an evolutionary algorithm to optimize a spatiotemporal malaria model by partitioning observed locations into clusters, each of which represents a different transmission environment with a unique set of environmental sensitivities captured by the model.

It is well established that environmental factors, including temperature, humidity, and rainfall, are important determinants of malaria risk (Stresman, 2010). These variables influence multiple aspects of mosquito life history and parasite development, including larval habitats, mosquito fecundity, growth rates, mortality, and Plasmodium parasite development rates within the mosquito vector. Although malaria transmission is a complex, nonlinear system, it is often approximated with statistical models in which time series of epidemiological outcomes, such as incidence of malaria cases, are predicted as lagged functions of one or more meteorological variables (Lowe et al., 2013, Midekisa et al., 2012, Zinszer et al., 2015). This approach can be effective when there are sufficient long-term data to characterize seasonal and interannual variability in the epidemiological and environmental variables. Malaria case data are routinely collected through disease surveillance programs and are typically aggregated by health facility or geographic district. Environmental monitoring data can be obtained from nearby meteorological stations or by summarizing satellite remote sensing data for the locations where epidemiological data were collected.
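As a minimal sketch of the lagged-predictor idea described above, the following Python example builds a matrix of lagged values from a weekly covariate and fits it to a response by ordinary least squares. The data are synthetic, and the names `lag_matrix`, `rain`, and `true_weights` are hypothetical choices made for illustration; none of them come from the study. The sketch estimates one unconstrained coefficient per lag, which is a simplification and not necessarily the model form used in the cited work.

```python
import numpy as np

def lag_matrix(x, max_lag):
    """Return rows [x[t], x[t-1], ..., x[t-max_lag]] for t = max_lag .. len(x)-1."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    # Column k holds the series shifted back by k time steps.
    return np.column_stack([x[max_lag - k : n - k] for k in range(max_lag + 1)])

# Synthetic weekly covariate (illustrative only, not the study data).
rng = np.random.default_rng(0)
rain = rng.gamma(shape=2.0, scale=5.0, size=200)

max_lag = 4
true_weights = np.array([0.0, 0.1, 0.3, 0.2, 0.05])  # hypothetical lag weights

X = lag_matrix(rain, max_lag)  # one row per week with a full lag history
y = 1.0 + X @ true_weights + rng.normal(0.0, 0.1, size=X.shape[0])

# Ordinary least squares with an intercept column.
design = np.column_stack([np.ones(len(X)), X])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)

print("estimated lag weights:", np.round(coef[1:], 3))
```

With real surveillance data the response would typically be a transformed case count, and the lag coefficients would often be constrained to vary smoothly across lags to reduce the number of free parameters.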

An important issue with statistical modeling of malaria time series is spatial non-stationarity. Because the underlying environmental and social contexts of malaria vary geographically, the influences of meteorological variables like temperature and precipitation are often inconsistent from location to location (Stresman, 2010). For example, in Ethiopia, precipitation is an important driving variable in dry regions where availability of breeding habitats is the critical constraint, whereas temperature is often the limiting factor at high elevations where cool temperatures limit mosquito and parasite development rates (Midekisa et al., 2015, Teklehaimanot et al., 2004). Associations between remotely-sensed vegetation greenness and malaria cases were found to be strongest in the driest and most epidemic-prone parts of Eritrea (Ceccato et al., 2012). Similarly, climate variables were most effective at predicting malaria outbreaks in high-elevation, epidemic-prone areas in Malawi (Lowe et al., 2013). In the Brazilian Amazon, precipitation was related positively to malaria cases in upland regions and negatively in wetland regions (Olson et al., 2009).

Simply lumping multiple locations together into a global model with a single set of environmental parameters will not capture the range of local environmental sensitivities and will result in poor model fit. At the other extreme, fitting individual models for every location can lead to overfitting, particularly if the time series are short and the data are noisy. We propose that a more effective approach is to identify clusters of districts that share similar relationships with environmental conditions. One way to identify these clusters is to use a priori stratifications based on covariates such as elevation ranges or ecological zones (Midekisa et al., 2015), with the assumption that locations with similar environmental characteristics will have similar disease trends. Alternatively, the time series data themselves can be analyzed to identify clusters with similar temporal patterns without any reference to explanatory variables (Ceccato et al., 2007, Li and Ngom, 2013, Liao, 2005); e.g. by clustering based on the Fourier spectra of time series to detect similar cyclical patterns (Geerken et al., 2005). However, to ensure that districts in a cluster share responses to changing environmental conditions, selection of covariates and clustering based on malaria responses to those covariates must occur simultaneously.

Here, we present a new modeling approach that combines standard time series techniques with an evolutionary algorithm to identify an optimal clustering of districts for environmental modeling of malaria risk. Evolutionary algorithms (EAs) are computational methods that use the principles of natural selection to solve optimization problems that are otherwise computationally intractable (Whitley, 2001). While many variations exist, here we consider a basic genetic algorithm (GA) to solve the problem of assembling districts into clusters based on their responses to the environment. A GA is a specific type of EA that simulates evolution in a series of generations of individuals, each of which has its own genetic code similar to DNA. In our approach the “individuals” are basic statistical models with different parameter estimates and the “genetic code” indicates how the districts are assigned to various clusters. Our approach incorporates distributed lag effects that estimate the delayed influences of environmental variation over a range of time scales and a flexible trend component to account for the influences of non-environmental factors such as malaria interventions.
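The encoding described above can be illustrated with a small, self-contained Python sketch. It is not the authors' implementation: the data are synthetic, the fitness is a toy AIC from one pooled regression slope per cluster, and covariate selection is omitted entirely. The function names (`aic`, `tournament`, `evolve`) and all parameter values are assumptions made for illustration. Each individual is an integer vector whose entries assign districts to clusters, and lower AIC means higher fitness.

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic data (illustrative only): 8 districts, 50 weeks each.
# Districts 0-3 respond positively to the covariate, districts 4-7 negatively.
n_districts, n_weeks, max_clusters = 8, 50, 3
true_slopes = np.array([1.0] * 4 + [-1.0] * 4)
x = rng.normal(size=(n_districts, n_weeks))
y = true_slopes[:, None] * x + rng.normal(scale=0.3, size=x.shape)

def aic(assignment):
    """AIC of a model with one pooled slope per cluster (Gaussian errors)."""
    assignment = np.asarray(assignment)
    labels = np.unique(assignment)
    rss = 0.0
    for label in labels:
        xs = x[assignment == label].ravel()
        ys = y[assignment == label].ravel()
        slope = (xs @ ys) / (xs @ xs)  # least squares through the origin
        rss += np.sum((ys - slope * xs) ** 2)
    n_obs = x.size
    n_params = len(labels) + 1  # one slope per cluster plus the error variance
    return n_obs * np.log(rss / n_obs) + 2 * n_params

def tournament(population, scores):
    """Return the better (lower AIC) of two randomly chosen individuals."""
    i, j = rng.integers(len(population), size=2)
    return population[i] if scores[i] < scores[j] else population[j]

def evolve(pop_size=30, n_generations=40, mutation_rate=0.1):
    # The "genetic code": one cluster label per district.
    population = rng.integers(max_clusters, size=(pop_size, n_districts))
    history = []
    for _ in range(n_generations):
        scores = np.array([aic(ind) for ind in population])
        history.append(scores.min())
        children = [population[np.argmin(scores)].copy()]  # elitism
        while len(children) < pop_size:
            mum = tournament(population, scores)
            dad = tournament(population, scores)
            mask = rng.random(n_districts) < 0.5  # uniform crossover
            child = np.where(mask, mum, dad)
            mutate = rng.random(n_districts) < mutation_rate
            child[mutate] = rng.integers(max_clusters, size=mutate.sum())
            children.append(child)
        population = np.array(children)
    scores = np.array([aic(ind) for ind in population])
    history.append(scores.min())
    return population[np.argmin(scores)], history

best_assignment, history = evolve()
print("best assignment:", best_assignment, "AIC:", round(history[-1], 1))
```

Because the best individual is copied unchanged into each new generation, the best AIC in this sketch can never get worse from one generation to the next. Cluster labels are arbitrary, so two assignments that differ only by a relabeling describe the same partition.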

We evaluated this approach using seven years of weekly malaria surveillance and daily environmental monitoring data collected for 47 districts in the Amhara Region of Ethiopia, a mountainous area where malaria was historically common and large-scale regional outbreaks caused substantial morbidity and mortality. The region is now undergoing large-scale interventions aimed at malaria control and elimination. The purpose of this study was to generate a set of spatially stratified time series models for malaria by placing the districts into clusters within which malaria cases have similar associations with environmental predictors. This objective required the development of a new GA-based modeling approach because it is fundamentally different from standard clustering techniques. GAs have been shown to outperform traditional methods of clustering (Falkenauer, 1998, Maulik and Bandyopadhyay, 2000), but here classical clustering methods are not even possible, as we must also simultaneously select environmental covariates. For example, SaTScan (Kulldorff et al., 2007) and related cluster-detection methods aim at highlighting spatial and temporal clusters of high relative risk of a disease, but do not consider environmental covariates except as control variables. Time series clustering methods can identify clusters with similar temporal patterns that are likely to have similar environmental drivers, but do not directly consider the underlying environmental relationships (Vlachos et al., 2003).

We therefore used the GA to generate a set of spatially stratified time series models for malaria and assessed the fit of these models to the historical dataset. We also gained insights into climate-malaria relationships by exploring the geographic distribution of the resulting clusters and the lagged relationships with environmental predictors in the different clusters.

Section snippets

Study area and malaria morbidity data

The Amhara Region is in the north of Ethiopia and has a population of more than 20 million persons (Fig. 1). The area's terrain and climate are heterogeneous, with lowlands to the northwest and mountainous regions reaching 4500 m above sea level. Patterns of rainfall and temperature vary substantially throughout the region; temperature is highest in the lowlands and decreases with elevation, whereas precipitation is highest in the western part of the region and decreases to the east. Human

Results from the genetic algorithm

The algorithm was allowed to run for 750 generations, after which it was considered to have converged (Appendix Figs. 1 and 2). The best (lowest AIC) model had 6 clusters; 75% of models in the GA had between 5 and 7 clusters, and all models had between 1 and 11 clusters. The best model had LSTM, PMMC, and NDWI6 as its covariates; these appeared in 84%, 83%, and 85% of models, respectively. All other variables appeared far less often. If variables were all equivalently predictive, we would expect

Discussion

We used an evolutionary algorithm to simultaneously select variables and determine the optimal spatial stratification of an environmentally driven time-series model of malaria in the Amhara region of Ethiopia. The algorithm grouped districts into six clusters based on responses to environmental inputs. The districts within each of these six clusters shared a set of distributed lag functions that modeled the delayed effects of temperature and moisture conditions over the

Acknowledgements

This work was funded by the National Institute of Allergy and Infectious Diseases (Grant number R01AI079411). We thank Chris Merkord and Yi Liu for their work on software development and data processing for the EPIDEMIA project, and Aklilu Getinet for his assistance with project coordination.

References (58)

  • Y. Liu et al.

    Software to facilitate remote sensing data access for disease early warning systems

    Environ. Model. Softw.

    (2015)
  • U. Maulik et al.

    Genetic algorithm-based clustering technique

    Pattern Recogn.

    (2000)
  • F. Pattarin et al.

    Clustering financial time series: an application to mutual funds style analysis

    Comput. Stat. Data Anal.

    (2004)
  • G.H. Stresman

    Beyond temperature and precipitation: ecological risk factors that modify malaria transmission

    Acta Trop.

    (2010)
  • S. Thakur et al.

    Artificial neural network based prediction of malaria abundances using big data: a knowledge capturing approach

    Clinical Epidemiology and Global Health

    (2019)
  • M.C. Thomson et al.

    The development of malaria early warning systems for Africa

    Trends Parasitol.

    (2001)
  • C.J. Tucker

    Red and photographic infrared linear combinations for monitoring vegetation

    Rem. Sens. Environ.

    (1979)
  • Z. Wan

    New refinements and validation of the MODIS land-surface temperature/emissivity products

    Rem. Sens. Environ.

    (2008)
  • D. Whitley

    An overview of evolutionary algorithms: practical issues and common pitfalls

    Inf. Softw. Technol.

    (2001)
  • T.A. Abeku et al.

    Effects of meteorological factors on epidemic malaria in Ethiopia: a statistical modelling approach based on theoretical reasoning

    Parasitology

    (2004)
  • V.A. Alegana et al.

    Estimation of malaria incidence in northern Namibia in 2009 using Bayesian conditional-autoregressive spatial–temporal models

    Spatial and Spatio-temporal Epidemiology

    (2013)
  • S. Almon

    The distributed lag between capital appropriations and expenditures

    Econometrica

    (1965)
  • A. Arab et al.

    Modelling the effects of weather and climate on malaria distributions in West Africa

    Malar. J.

    (2014)
  • A. Baeza et al.

    The rise and fall of malaria under land-use change in frontier regions

    Nat. Ecol. Evol.

    (2017)
  • O.J. Brady et al.

    Refining the global spatial limits of dengue virus transmission by evidence-based consensus

    PLoS Neglected Trop. Dis.

    (2012)
  • K.P. Burnham et al.

    Model Selection and Multimodel Inference: A Practical Information-Theoretic Approach

    (2002)
  • P. Ceccato et al.

    Malaria stratification, climate, and epidemic early warning in Eritrea

    Am. J. Trop. Med. Hyg.

    (2007)
  • P. Ceccato et al.

    A vectorial capacity product to monitor changing malaria transmission potential in epidemic regions of Africa

    J. Trop. Med.

    (2012)
  • M. Churakov et al.

    Spatio-temporal dynamics of dengue in Brazil: seasonal travelling waves and determinants of regional synchrony

    PLoS Neglected Trop. Dis.

    (2019)
