Simulating and analysing infectious disease data in a heterogeneous population with migration

https://doi.org/10.1016/j.cmpb.2010.05.007

Abstract

Mathematical modelling of infectious diseases has gained growing attention in epidemiology over the last decades. The major benefits of simulating compartmental models are the prediction of the consequences of potential interventions, a deeper understanding of epidemic dynamics and clinical decision support. The main limitation, however, is that several parameters are based on uncertain expert guesses (default values) rather than estimated from the study data. In this paper we build a bridge between an extension of the well-known deterministic S–I–R (Susceptible–Infectious–Removed) model, which can be described with differential equations, and its stochastic counterpart, which can be used for statistical inference if outbreak data on an individual level are available. The possibly time-dependent transmission rate as well as the (basic) reproduction number are the main epidemiological parameters of interest. Furthermore, one important type of heterogeneity is considered: individuals may differ in their susceptibility, so risk factors for infection can be investigated. A SAS computer program is provided to simulate outbreak data for this type of setting. The statistical analysis and typical challenges with epidemic data are discussed. Given data on an individual level, the Cox–Aalen survival model, which is based on a multiplicative-additive hazard structure, turned out to be a suitable tool for this purpose. The results provide valuable information for epidemiologists, statisticians and public health researchers.

Introduction

The theory of infectious disease transmission has been investigated extensively with mathematical models. They play a fundamental role in the understanding of epidemic and pandemic processes [1]. In contrast, the development of appropriate statistical models to estimate the parameters of epidemic models is still in progress and lags behind [2].

Infectious diseases are modelled with so-called compartmental models, in which the study population is divided into compartments ([1], 2nd chapter). The basic model of a general epidemic is the S–I–R (Susceptible–Infectious–Removed) model developed by Kermack and McKendrick [3]; it is displayed in Fig. 1. Depending on the nature of the infectious disease, this model can be extended in various ways, for example the S–I–S (Susceptible–Infectious–Susceptible) model, where infected persons may recover and become susceptible again, or the S–E–I–R (Susceptible–Exposed–Infectious–Removed) model with a latent period.

There are two standard approaches to describe compartmental models: the deterministic/mathematical and the stochastic/statistical approach [4]. In most scientific books and papers, compartmental models are formulated deterministically. Differential equations are used to describe the time dynamics of the epidemic process. Given prespecified plug-in parameters and initial values, the solution of the differential equations reflects the epidemic dynamics in a theoretical population. This mathematical approach plays a major role in the understanding of infectious diseases [5]. It is often used to simulate the potential effect of interventions to control infectious diseases, e.g., in hospitals [6]. One limitation is that this approach is only valid when the population is sufficiently large. Moreover, it is purely deterministic: the temporal dynamics are completely determined by the initial values and rates. In reality, however, epidemic behaviour is subject to random variation; an infection or a removal happens only with a certain probability, so the same parameters can lead to different outbreak courses. This is especially important if the study population is small. The limitation can be overcome by studying the stochastic version of the epidemic process.
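To make the deterministic approach concrete, the following is a minimal sketch (not the authors' SAS code) that integrates the basic S–I–R differential equations with a small forward-Euler step. It assumes mass-action incidence βS(t)I(t) and a removal rate γ, so that R0 = βS(0)/γ under this parametrisation; the rate values and the function name are hypothetical. Re-running it with the same inputs always reproduces the same trajectory, which is exactly the property discussed above.

```python
from dataclasses import dataclass


@dataclass
class SIRState:
    s: float  # susceptibles S(t)
    i: float  # infectious I(t)
    r: float  # removed R(t)


def simulate_sir_deterministic(beta: float, gamma: float, s0: float, i0: float,
                               t_max: float, dt: float = 0.01) -> list[tuple[float, SIRState]]:
    """Forward-Euler solution of the (assumed mass-action) S-I-R equations:
    dS/dt = -beta*S*I,  dI/dt = beta*S*I - gamma*I,  dR/dt = gamma*I.
    """
    state = SIRState(s0, i0, 0.0)
    t = 0.0
    path = [(t, state)]
    for _ in range(int(round(t_max / dt))):
        new_infections = beta * state.s * state.i * dt  # flow S -> I during dt
        new_removals = gamma * state.i * dt             # flow I -> R during dt
        state = SIRState(state.s - new_infections,
                         state.i + new_infections - new_removals,
                         state.r + new_removals)
        t += dt
        path.append((t, state))
    return path


if __name__ == "__main__":
    beta, gamma = 0.003, 0.1          # hypothetical rates per day
    s0, i0 = 99.0, 1.0
    print("R0 =", beta * s0 / gamma)  # basic reproduction number under mass action
    t_end, final = simulate_sir_deterministic(beta, gamma, s0, i0, t_max=200.0)[-1]
    print(f"t={t_end:.0f}: S={final.s:.1f}, I={final.i:.3f}, R={final.r:.1f}")
```

Because every step is a fixed function of the previous state, there is no notion of an outbreak dying out by chance here; the stochastic counterpart is needed for that.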

In both approaches, simulations are possible without any collected data. Only default values, often derived from expert guesses or the literature, are needed. An example is the software InfluSim for simulating pandemic influenza [7]. It is based on a deterministic model which includes clinical and demographic parameters. The aim is to help public health planners find the optimal intervention strategy. Further examples for hospital settings are given in [8], [9], [10], [11], [12].

However, for analysing infectious disease data, deterministic models can only be a guide towards parameter estimates [13]. The stochastic counterpart introduces random variation and plays the key role in the statistical analysis. There are two fundamental books on analysing infectious disease data [14], [15]. Statistical methods for analysing outbreak data in a homogeneous population are introduced by Becker and Britton [13]; this paper also contains an excellent review of further statistical challenges.
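As an illustration of how the stochastic formulation supports estimation, the sketch below computes the standard complete-data maximum-likelihood estimators for the general stochastic epidemic: beta_hat = (number of infections) / integral of S(t)I(t) dt and gamma_hat = (number of removals) / integral of I(t) dt. It assumes a closed population, incidence βS(t)I(t), and that every infection and removal time in [0, t_end] is observed; the event encoding and function name are hypothetical, and this is not the Cox–Aalen analysis used in the paper.

```python
from typing import Iterable, Literal

# An observed event: (calendar time, type of transition).
Event = tuple[float, Literal["infection", "removal"]]


def estimate_rates(s0: int, i0: int, events: Iterable[Event], t_end: float) -> tuple[float, float]:
    """Complete-data MLEs of beta and gamma for the general stochastic epidemic.

    Assumes a closed population, infection rate beta*S(t)*I(t), removal rate
    gamma*I(t), and that every event in [0, t_end] has been observed.
    """
    s, i = s0, i0
    t_prev = 0.0
    n_infections = n_removals = 0
    integral_si = integral_i = 0.0
    for t, kind in sorted(events):
        if not t_prev <= t <= t_end:
            raise ValueError(f"event time {t} outside [0, {t_end}]")
        # S(t) and I(t) are constant between events, so the integrals are exact sums.
        integral_si += s * i * (t - t_prev)
        integral_i += i * (t - t_prev)
        if kind == "infection":
            s, i = s - 1, i + 1
            n_infections += 1
        else:
            i -= 1
            n_removals += 1
        t_prev = t
    integral_si += s * i * (t_end - t_prev)
    integral_i += i * (t_end - t_prev)
    if integral_si == 0 or integral_i == 0:
        raise ValueError("no exposure time: rates are not identifiable")
    return n_infections / integral_si, n_removals / integral_i
```

For example, `estimate_rates(3, 1, [(1.0, "infection"), (2.0, "removal")], t_end=3.0)` accumulates 9 units of S·I exposure and 4 infectious days, giving beta_hat = 1/9 and gamma_hat = 1/4. Under the same assumptions, beta_hat·S(0)/gamma_hat would be a plug-in estimate of R0.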

In most statistical models a closed population is assumed, which is rarely true in reality. Migration, such as birth/immigration and death/emigration, often needs to be addressed in the model because it influences the epidemic dynamics. Further, most models assume that all individuals are subject to the same force of infection, which might not be realistic for specific pathogens. Public health workers also want to know who is at higher risk of acquiring infectious diseases. Moreover, the rates in the model are often assumed to be constant over time, but time-varying rates influence the epidemic process. Again, this is crucial for public health: the effect of interventions can be demonstrated if the transmission rate decreases with time. Thus, the model should allow the parameters to vary with time. In this paper we address these issues and relax the above-mentioned assumptions.
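One way to combine a time-varying transmission rate with heterogeneous susceptibility, in the spirit of the multiplicative-additive (Cox–Aalen) hazard structure named in the abstract, is the individual infection hazard λ_i(t) = β(t)·I(t)·exp(θ·Z_i). Here the number of infectious individuals I(t) enters additively with the time-varying coefficient β(t), and the risk-group indicator Z_i acts multiplicatively. This mapping is our assumption, not a formula taken from the paper, and the step-down of β(t) at an intervention time and all constants below are hypothetical.

```python
import math

# Hypothetical values, for illustration only.
BETA_BEFORE = 0.004                     # transmission rate per infectious individual per day, before intervention
BETA_AFTER = 0.002                      # transmission rate after the intervention
T_INTERVENTION = 30.0                   # day on which the intervention starts
THETA_HIGH_RISK = math.log(2.0)         # log relative susceptibility of the high-risk group


def transmission_rate(t: float) -> float:
    """Time-varying transmission rate beta(t), modelled as a single step change."""
    return BETA_BEFORE if t < T_INTERVENTION else BETA_AFTER


def infection_hazard(t: float, n_infectious: int, high_risk: bool) -> float:
    """lambda_i(t) = beta(t) * I(t) * exp(theta * Z_i).

    Additive part: I(t) with time-varying coefficient beta(t).
    Multiplicative part: exp(theta * Z_i) with Z_i = 1 for high-risk individuals.
    """
    z = 1.0 if high_risk else 0.0
    return transmission_rate(t) * n_infectious * math.exp(THETA_HIGH_RISK * z)


if __name__ == "__main__":
    for day in (10.0, 40.0):
        low = infection_hazard(day, n_infectious=5, high_risk=False)
        high = infection_hazard(day, n_infectious=5, high_risk=True)
        print(f"day {day:.0f}: low-risk hazard {low:.3f}, high-risk hazard {high:.3f}")
```

In this form, a decreasing β(t) expresses the effect of an intervention, while exp(θ) is the relative risk of infection for the high-risk group at any fixed number of infectious individuals.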

Section snippets

Design considerations

The objective is to provide a computer program to simulate outbreak data in a heterogeneous population with migration; we used the SAS software, extending a previously published SAS program [16]. The simulations are based on the stochastic version of the compartmental model displayed in Fig. 2. It depends on the following input parameters for the simulation: two transmission and two discharge rates (one for the high and one for the low risk group, …

Homogeneous population

In a general epidemic, the study population of size M(t) is divided at every time point t into three classes labelled S, I, and R, as displayed in Fig. 1. Let S(t) denote the number of individuals who are susceptible to the infectious disease at calendar time t, I(t) the number of infected individuals, and R(t) the number of individuals who have been infected and then removed from the possibility of being infected again or of spreading infection. The epidemic process can then be formulated with …

Example

As a hypothetical example, we consider an extended S–I–R model for two heterogeneous groups, displayed in Fig. 3. The heterogeneity is due to susceptibility. We assume that the population consists of 100 individuals at every time point; a new individual enters whenever an individual leaves the population. The model may reflect a hospital with 100 beds and full bed occupancy. At the beginning of the study (at t = 0), there are 50 susceptibles in the high risk group, 40 in the low risk group and 10 …

Simulation

In this paper we provide a SAS program to simulate different outbreak realisations, so that epidemiologists can mimic scenarios relevant to their work. The simulations demonstrate that the shape of outbreaks can differ greatly even though the epidemic parameters remain exactly the same. An epidemic can die out by chance, even with high infectivity, purely because of stochastic variation.
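A minimal Python analogue of such a simulation (not the published SAS program) is sketched below with Gillespie's direct method for the two-group hospital example: 100 beds with constant occupancy, where every discharge is immediately followed by the admission of a new susceptible. Several choices are assumptions rather than details from the paper: mass-action incidence β_g·S_g(t)·I(t), all rate values, the risk mix of new admissions, and treating the initial 10 remaining individuals as infectious, split 5/5 between the risk groups. Running it with different random seeds shows how much realisations vary under identical parameters.

```python
import random
from dataclasses import dataclass


@dataclass
class Params:
    """Hypothetical rates (per day) for the two-group ward model."""
    beta_high: float = 0.006       # transmission rate for high-risk susceptibles
    beta_low: float = 0.002        # transmission rate for low-risk susceptibles
    disc_high: float = 0.10        # discharge rate of high-risk patients
    disc_low: float = 0.20         # discharge rate of low-risk patients
    p_high_admission: float = 0.5  # probability that a new admission is high risk


def simulate_outbreak(p: Params, t_max: float, rng: random.Random,
                      s_high: int = 50, s_low: int = 40,
                      i_high: int = 5, i_low: int = 5) -> tuple[int, bool]:
    """Gillespie (direct method) simulation of a two-group S-I-R ward model.

    Every discharge is followed by the admission of a new susceptible, so the
    number of occupied beds stays constant. Returns (acquired infections,
    True if no infectious patient is left before t_max).
    """
    t, infections = 0.0, 0
    while True:
        i_total = i_high + i_low
        if i_total == 0:
            return infections, True            # outbreak died out
        rates = [
            p.beta_high * s_high * i_total,    # 0: infection, high-risk susceptible
            p.beta_low * s_low * i_total,      # 1: infection, low-risk susceptible
            p.disc_high * s_high,              # 2: discharge, high-risk susceptible
            p.disc_low * s_low,                # 3: discharge, low-risk susceptible
            p.disc_high * i_high,              # 4: discharge, high-risk infectious
            p.disc_low * i_low,                # 5: discharge, low-risk infectious
        ]
        t += rng.expovariate(sum(rates))       # exponential waiting time to next event
        if t >= t_max:
            return infections, False           # still ongoing at t_max
        event = rng.choices(range(6), weights=rates)[0]
        if event == 0:
            s_high, i_high = s_high - 1, i_high + 1
            infections += 1
        elif event == 1:
            s_low, i_low = s_low - 1, i_low + 1
            infections += 1
        else:
            # Discharge of one patient, then admission of a new susceptible.
            if event == 2:
                s_high -= 1
            elif event == 3:
                s_low -= 1
            elif event == 4:
                i_high -= 1
            else:
                i_low -= 1
            if rng.random() < p.p_high_admission:
                s_high += 1
            else:
                s_low += 1


if __name__ == "__main__":
    rng = random.Random(2010)                  # fixed seed for reproducible runs
    runs = [simulate_outbreak(Params(), t_max=60.0, rng=rng) for _ in range(200)]
    died_out = sum(1 for _, extinct in runs if extinct)
    counts = sorted(n for n, _ in runs)
    print(f"died out before day 60: {died_out}/{len(runs)}")
    print(f"acquired infections: min={counts[0]}, median={counts[len(counts) // 2]}, max={counts[-1]}")
```

The direct method draws the waiting time from an exponential distribution with the total event rate and then picks the event type in proportion to its rate, so each run is an exact realisation of the assumed continuous-time Markov model.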

Analysis

The major problem in analysing infectious disease data is the limited quality of outbreak data since infectious …

Future challenges

There are further time-dependent characteristics to be considered in infectious disease models, such as a potential latent, infectious or incubation period. These periods can be introduced in deterministic as well as stochastic models. However, if such a period is to be introduced as a random quantity in the statistical analysis, the problem is whether the additional data, e.g., the timing of incubation, are available. An ad-hoc approach would be to introduce them as constant and fixed …

Acknowledgement

We thank Christine Scheufele for reading the final manuscript for grammar and style, which helped to improve the manuscript.

References (26)

  • E.M. D’Agata et al., Modeling the invasion of community-acquired methicillin-resistant Staphylococcus aureus into hospitals, Clin. Infect. Dis. (2009).
  • D. Austin et al., Vancomycin-resistant enterococci in intensive-care hospital settings: transmission dynamics, persistence, and the impact of infection control programs, Proc. Natl. Acad. Sci. U.S.A. (1999).
  • M. Wolkewitz et al., Environmental contamination as an important route for the transmission of VRE: modeling and prediction of classical interventions, Infect. Dis. Res. Treat. (2008).