Skip to main content
Advertisement
  • Loading metrics

A data-driven semi-parametric model of SARS-CoV-2 transmission in the United States

  • John M. Drake ,

    Roles Conceptualization, Data curation, Investigation, Project administration, Supervision, Visualization, Writing – original draft, Writing – review & editing

    jdrake@uga.edu

    Affiliations Odum School of Ecology, University of Georgia, Athens, Georgia, United States of America, Center for the Ecology of Infectious Diseases, University of Georgia, Athens, Georgia, United States of America

  • Andreas Handel,

    Roles Conceptualization, Data curation, Investigation, Software, Writing – review & editing

    Affiliations Center for the Ecology of Infectious Diseases, University of Georgia, Athens, Georgia, United States of America, College of Public Health, University of Georgia, Athens, Georgia, United States of America

  • Éric Marty,

    Roles Conceptualization, Formal analysis, Visualization, Writing – review & editing

    Affiliations Odum School of Ecology, University of Georgia, Athens, Georgia, United States of America, Center for the Ecology of Infectious Diseases, University of Georgia, Athens, Georgia, United States of America

  • Eamon B. O’Dea,

    Roles Conceptualization, Formal analysis, Investigation, Writing – review & editing

    Affiliations Odum School of Ecology, University of Georgia, Athens, Georgia, United States of America, Center for the Ecology of Infectious Diseases, University of Georgia, Athens, Georgia, United States of America

  • Tierney O’Sullivan,

    Roles Data curation, Investigation, Writing – review & editing

    Affiliations Odum School of Ecology, University of Georgia, Athens, Georgia, United States of America, Center for the Ecology of Infectious Diseases, University of Georgia, Athens, Georgia, United States of America

  • Giovanni Righi,

    Roles Data curation, Investigation, Writing – review & editing

    Affiliations Odum School of Ecology, University of Georgia, Athens, Georgia, United States of America, Center for the Ecology of Infectious Diseases, University of Georgia, Athens, Georgia, United States of America

  • Andrew T. Tredennick

    Roles Conceptualization, Data curation, Investigation, Software, Visualization, Writing – original draft, Writing – review & editing

    ¤ Current address: Metrum Research Group, Tariffville, Connecticut, United States of America

    Affiliations Center for the Ecology of Infectious Diseases, University of Georgia, Athens, Georgia, United States of America, Western EcoSystems Technology, Inc., Laramie, Wyoming, United States of America

Abstract

To support decision-making and policy for managing epidemics of emerging pathogens, we present a model for inference and scenario analysis of SARS-CoV-2 transmission in the USA. The stochastic SEIR-type model includes compartments for latent, asymptomatic, detected and undetected symptomatic individuals, and hospitalized cases, and features realistic interval distributions for presymptomatic and symptomatic periods, time varying rates of case detection, diagnosis, and mortality. The model accounts for the effects on transmission of human mobility using anonymized mobility data collected from cellular devices, and of difficult to quantify environmental and behavioral factors using a latent process. The baseline transmission rate is the product of a human mobility metric obtained from data and this fitted latent process. We fit the model to incident case and death reports for each state in the USA and Washington D.C., using likelihood Maximization by Iterated particle Filtering (MIF). Observations (daily case and death reports) are modeled as arising from a negative binomial reporting process. We estimate time-varying transmission rate, parameters of a sigmoidal time-varying fraction of hospitalized cases that result in death, extra-demographic process noise, two dispersion parameters of the observation process, and the initial sizes of the latent, asymptomatic, and symptomatic classes. In a retrospective analysis covering March–December 2020, we show how mobility and transmission strength became decoupled across two distinct phases of the pandemic. The decoupling demonstrates the need for flexible, semi-parametric approaches for modeling infectious disease dynamics in real-time.

Author summary

Due to the time-varying nature of numerous drivers of disease transmission, flexible, semi-parametric models of disease transmission are necessary to faithfully represent the complex, hidden dynamic processes that drive epidemics of emerging pathogens like SARS-CoV-2. Adequate models are essential to guide policy decisions. We present a data-driven semi-parametric model of SARS-CoV-2 transmission that embeds a latent process within a mechanistic compartmental model. The latent process sub-model captures temporal variation in precautionary behaviors that cannot be easily measured. We show that temporal variation in transmission strength is best explained by mobility early in the pandemic and by this latent process later in the pandemic. The model is flexible and incorporates known biological parameters and disease transmission processes.

Introduction

The COVID-19 pandemic in the United States has challenged the capabilities of conventional infectious disease transmission models [1, 2]. Yet, models are critical for guiding policy decisions, updating situational awareness, and retrospectively evaluating the key drivers of transmission and the effectiveness of interventions [3, 4]. The complex interactions of a highly transmissible disease, seasonal forcing, evolving public health messaging and guidance, the rise of new genetic variants, and the long duration of the COVID-19 pandemic have made it difficult to effectively model how the mechanistic drivers of the force of infection changed over time [5]. Lack of adequate data about key behavioral factors, such as the adoption of individual protective behaviors like mask-wearing, adds to the difficulty.

For example, the initial phase of the pandemic was marked by stay-at-home orders across the nation [6]. Human mobility dropped drastically from pre-pandemic levels, slowing the spread of SARS-CoV-2 [7]. Publicly available cell phone mobility data allowed modelers to explicitly estimate the impact of human movement—a proxy for person-to-person contact—in mechanistic models of disease transmission [5, 8, 9]. However, when states began to re-open in April 2020, mobility data became a weaker correlate of transmission strength [4, 10]. Precautionary measures such as masking and social-distancing in public spaces weakened the relationship between mobility and transmission risk [4]. But no consistent and reliable data exist to capture the dynamics of all precautionary behaviors over time. Carefully designed and constrained models can infer these latent dynamics, allowing for more accurate situational awareness and forecasting [4].

Here, we present a data-driven semi-parametric compartmental model of SARS-CoV-2 transmission. The stochastic model includes compartments for latent (exposed but pre-symptomatic), asymptomatic, detected and undetected symptomatic individuals, and hospitalized cases (Fig 1). The model also includes time-varying rates of detection probability, diagnosis time, and mortality, all of which add realism to the model, improve identifiability, and enable stronger inference on remaining dynamics. In our model, transmission rate is allowed to differ among asymptomatic, pre-symptomatic, and symptomatic individuals. Force of infection is a function of the number of individuals in each compartment, the relative mobility of individuals, and a latent process. The latent process is a time-dependent spline function (see Materials and methods) representing all of the processes influencing the force of infection that cannot be measured. The model was developed in real-time during 2020 and fitted jointly to daily case and deaths data from each U.S. state and the District of Columbia (hereafter collectively referred to as “US states”) from the date of first case notification in each state to December 31, 2020. For this publication, case and death reports were retrieved from the Johns Hopkins University Coronavirus Resource Center [11] on July 18, 2022, meaning any revisions to the data due to time-lags or changes in inclusion criteria for 2020 were applied. Human mobility data were obtained from Unacast [12]. We restricted our analysis to March 1, 2020 to December 31, 2020, a logical endpoint for retrospective analysis because the model does not include vaccination, and vaccines began to be administered in late December 2020.

To demonstrate the usefulness of our semi-parametric model, we perform a retrospective analysis showing that transmission strength (effective reproduction number) and mobility became decoupled over time. By “decoupled”, we mean two time-varying variables are correlated in one period and not correlated in another. This conclusion could not be arrived at with traditional parametric models. Correlation analysis shows that relative mobility among US states was highly correlated over time, regardless of pandemic phase. To the contrary, however, transmission strength became uncorrelated among states as different states had different levels of adherence to precautionary behaviors [13]. We then show how the relative mobility trend and the latent process trade off in importance over time, so that transmission alternately synchronized with or decoupled from mobility. These dynamics can only be captured with a flexible, data-driven modeling approach when key underlying processes cannot be completely measured.

Results

The model estimated daily cases and daily deaths that closely match the observed data although allowing for reporting errors resulting from aggregation, reporting delays, weekend effects, and other anomalies (Fig 2A and 2B; model fits for all states are shown in Section C in S1 Appendix). Model performance varied by state and by response variable. The model for 45 states (out of 51 total) had mean absolute scaled errors for new cases less than or equal to one (Fig B in S1 Appendix), meaning that the semi-parametric model performed better than a non-mechanistic model comprising a random walk model with weekly periodicity. However, only 30 states had mean absolute scaled errors for new deaths less than or equal to one (Fig B in S1 Appendix), meaning either the process for modeling the transition from hospitalized cases to deaths or the observation model could be improved. The smoothed trajectories of daily deaths capture the general trends of the data well (Fig 2B and Section C in S1 Appendix.), which suggests the observation model is the limiting factor in our model. This can be seen in Fig 2B, where the fitted trajectory passes through the cloud of data points but fails to capture cyclical reporting patterns. Cyclical reporting patterns are identified, for example, by large numbers of daily deaths being followed by reports of near 0 daily deaths in the middle of the time series (July—September, Fig 2B).

thumbnail
Fig 2. Model fits and estimated quantities for California.

A comparison of estimated and observed (A) COVID-19 daily cases and (B) COVID-19 daily deaths show the course of the COVID-19 pandemic in California from March–December 2020. In A and B, lines show the median estimated state variable, ribbons show the 95% prediction interval, and points are noisy, recorded observations. In panel (C), the estimated relative mobility trend (blue line) and the estimated latent trend (red line) are shown to vary considerably over time resulting in dramatic fluctuations in the force of infection, giving rise to the multiple waves infection shown in (A) and (B). These fluctuations in the force of infection are illustrated by (D) the estimated effective reproduction number, which crossed the critical boundary at Re = 1 on numerous occasions during 2020.

https://doi.org/10.1371/journal.pcbi.1011610.g002

We also estimated a latent process that, when combined with mobility, uniquely defines transmission strength (Fig 2C). We used the next-generation matrix approach [14] to numerically calculate the time-varying effective reproduction number (Materials and methods), which is a function of mobility, the estimated latent process, and state variables (Figs 2D and 3B).

thumbnail
Fig 3. Drivers of force of infection over time.

(A) The time series of the relative number of public health mandates (blue lines) and relative mobility (red lines) for each state shows how public health messaging and individual behavior diverge after April 24, 2020 (shown as a vertical dashed line). Relative number of health mandates was calculated as the total number of health mandates issued in each 50 states at each time, divided by the maximum observed over all times in each state. Each light line is a single state. Heavy lines are the average relative mobility and relative number of mandates across all states. The dashed purple line shows the mobility over time for Hawaii, which was the lowest among all states. Before April 24, 2020, public health mandates increased while mobility decreased in all states. After April 24, 2020, public health mandates began to expire and mobility increased, but states began to diverge in their responses, as seen by the larger spread in the light lines after April 24. In panel (B), the estimated effective reproduction number, , for each state over time reflects the changes in public health mandates and mobility. Before April 24, declined to below the critical value of 1 (dashed horizontal line) for almost all states. After April 24, is much more dynamic and variable across states, reflecting the differences in mobility and public health messaging. Analysis of these patterns suggest two distinct phases of the pandemic: Phase 1 before April 24 and Phase 2 after April 24. Critically, even as mobility increased in Phase 2 to near pre-pandemic levels, typically remained near 1.

https://doi.org/10.1371/journal.pcbi.1011610.g003

Analysis of these patterns identified two distinct phases of the early pandemic associated with the time series of relative mobility, the cumulative number of state-issued public health mandates nationwide, and the date at which states began “re-opening” (per CDC definitions: https://www.cdc.gov/museum/timeline/covid19.html). On April 24, 2020, states began re-opening and allowing public health mandates to expire, which unsurprisingly coincided with increasing average mobility (Fig 3A). We use this date to notionally partition two phases of the pandemic from March 1, 2020 to April 24, 2020 (Phase 1) and from April 25, 2020 to December 31, 2020 (Phase 2). In Phase 1, we understand transmission to be almost entirely driven by human-to-human contact. The sharp reduction in human mobility compared with the pre-pandemic baseline resulted in dramatically lowering the effective reproduction number, [4, 8, 15]. In Phase 2, precautionary behaviors like mask wearing and social distancing in public played a much greater role in determining than mobility [4].

Our estimates of display the decoupling of transmission strength from mobility between the two phases (Fig 3B). For instance, pairwise correlations of mobility (mean of the absolute value of Pearson’s ρ across all pairs = 0.98, SD = 0.03) and (mean |ρ| = 0.86, SD = 0.22) are high among states in the first phase of the pandemic (Fig 4A and 4B). In the second phase of the pandemic, mobility remained highly correlated among states (mean |ρ| = 0.88, SD = 0.16; Fig 4C) while became uncorrelated among states (mean |ρ| = 0.32, SD = 0.24; Fig 4D).

thumbnail
Fig 4. Between-state correlations in mobility and force of infection.

Pairwise cross correlations among time series of the (A) relative mobility covariate for each pair of states and (B) among time series of for each pair of states in Phase 1 of the pandemic (March 1, 2020 to April 24, 2020) show that mobility and force of infection were highly correlated across states in Phase 1 of the pandemic. This indicates a tight coupling between mobility and force of infection in the Phase 1. Panels C and D visualize the same pairwise cross correlations during Phase 2 of the pandemic (April 24, 2020 to December 31, 2020), which shows that mobility remains correlated among states while force of infection is uncorrelated. This suggests that mobility is not the main driver of disease dynamics in Phase 2.

https://doi.org/10.1371/journal.pcbi.1011610.g004

Variance partitioning showed that mobility explained most of the temporal variation of in the first phase of the pandemic and that the latent trend explained most of the variation in in the second phase of the pandemic for nearly all states (Fig 5). The latent process was an important model component for some states in Phase 1 of the pandemic (e.g., Nevada (NV), Fig 5). There were no states for which mobility remained more important than the latent process in determining the temporal variation of in Phase 2 of the pandemic (Fig 5). This is especially noteworthy since relative mobility in Hawaii was the lowest among all states throughout 2020 (Fig 3A). Taken together, these lines of evidence show that decoupled (i.e., not temporally related) from mobility over the course of the pandemic, a feature of the pandemic that we attribute to “societal learning” as the population adopted increasingly nuanced and localized approaches to reducing transmission [16].

thumbnail
Fig 5. Drivers of temporal variance in force of infection.

Comparisons of the amount of temporal variance in that can be explained by mobility (blue bars) and the latent process (red bars) in the two phases of the pandemic shows that, for most states, the temporal variance of switches from being explained by mobility in Phase 1 to being explained by the latent process in Phase 2. The temporal variation in is mostly explained by the latent trend in both phases for several states (e.g., Nevada). Individual panels for each states are arranged by approximate geographic coordinates for each state.

https://doi.org/10.1371/journal.pcbi.1011610.g005

Discussion

The COVID-19 pandemic spurred innovation in epidemiological modeling because models and forecasts of how the pandemic might unfold were in high demand. However, models that cannot reproduce the complex outcomes of nonlinear dynamical systems are not useful for situational awareness, forecasting, or scenario analysis under any but the shortest time horizons [9]. This does not mean that a good model must include mechanistic descriptions of all relevant processes. But, models do need to be able to reproduce observed outcomes (i.e., case, hospitalization, and death reports). We presented a flexible modeling approach that is rooted in the mechanisms of disease transmission and included a latent trend to represent unmeasurable factors that influence transmission. In so doing, we show the decoupling of SARS-CoV-2 transmission from social distancing as other precautionary behaviors became more prevalent.

Our semi-parametric model mostly performed better than a benchmark random walk model for cases. This benchmark random walk model, which is strictly statistical and lacks mechanistic interpretation, was included in another study that compared twenty-eight COVID-19 forecasting models [17]. The benchmark random walk model was found to have intermediate forecasting skill in comparison with the twenty-seven other models [17], making it a good baseline for comparison with our semi-parametric model. The semi-parametric model fits for six states had mean absolute scaled errors (MASE) greater than one (Fig B in S1 Appendix), indicating worse fit than the benchmark random walk model with weekly periodicity. These six states tended to have case reports that oscillated between 0 and 100s in the later weeks of 2020 (Section C in S1 Appendix), likely due to lags in reporting. Because our semi-parametric model did not include an observation process with reporting lags or periodicity, the model performed worse for those states where reporting oscillations were dramatic. The same issue (report lags and periodicity in reporting) resulted in death fits from our model being worse than the benchmark random walk model for 21 states. We suggest that model performance might be improved by including a more complex observation model to account for cyclical variation in reporting [5].

The better fit of the benchmark random walk model in some situations does not invalidate the utility of our model. The benchmark model, a random walk with weekly periodicity, does not include any epidemiology. Therefore, while the random walk model may generate slightly better one-step-ahead predictions in some situations, it does not yield any scientific insight. Our semi-parametric model did perform better than the random walk model for many states and the filtered trajectories of new cases and deaths matched the observations well, even when MASE ≥ 1 (Fig B in S1 Appendix).

The results of variance partitioning suggest that a purely mechanistic model would have performed just as well as our semi-parametric model in the early stages of the pandemic, when human mobility was the main driver of transmission strength. However, our model performed well across all stages of the pandemic because mobility was allowed to drive transmission early in the pandemic and then trade-off with the latent process in latter stages. The flexibility of the time-varying latent process is critical because the intensity of precautionary behaviors varied over time. Some analyses have sought to address this issue using additional data streams such as the presence of mask mandates over space and time [18]. In contrast, our flexible time-evolving process avoids the need for additional data streams.

Other modelers have used temporal random effects to estimate time-varying transmission strength [4, 19]. For example, Fox et al. [4] modeled time-varying transmission as a function of two time-specific random walks and a normally distributed temporal random effect. Modeling discrete change-points in transmission strength is another approach [20]. But the first- and second-order discontinuities of change-points hardly reflect the way that behaviors change over time and their influence on the ebb and flow of transmission. Our spline approach achieves the desired outcome—letting the relationship between mobility and transmission change over time—while also directly quantifying the contribution of mobility and other forces to moderating transmission. Indeed, our approach is more flexible because the number of basis functions used to estimate the B-spline can also be fitted or, if needed, trimmed to the minimum number possible for computationally efficient model fitting.

We estimated maximum values in range of 1.52–3.84 across all 50 states. Highest occurred at the start of the epidemic (Fig 3B and Section D in S1 Appendix). These estimates are similar to an estimated from meta-analysis, 3.38 ±1.40 [21]. Our estimates are lower than those reported for some individual cities and municipalities, probably because our state-level model is too coarse to capture super-spreading events that happen at local scales [22]. For example, Fox et al. [4] reported a 7-day average of 5.8 at the start of the epidemic in Austin, Texas and Hasan et al. [22] estimated greater than 5 in the Jakarta–Depok region of Indonesia.

Our analysis has implications for modeling the next pandemic. First, as reported by others, adherence to precautionary behaviors is not static and is not always measurable [4, 23]. This means that modeling approaches need built-in flexibility to estimate latent trends, such as the time-dependent spline function we used. Second, incorporating latent trends makes situational awareness easier (i.e., answering the question “What is the current trend in the force of infection?”), but it makes forecasting harder. Estimating trends in transmission is easier with a data-driven model because the data have a large influence on model dynamics and summary quantities such as . Forecasting is more difficult, however, because the latent trend must be specified for future times at which (by definition) no data exist. This means that latent, smooth trends that were specifically designed to fluctuate over time must be, in practice, extrapolated in some fashion. New methods are arising to handle forecasting with latent processes when the underlying function is periodic [24]. But the transmission dynamics of an emerging pathogen are not periodic, meaning subjective choices must be made to specify the latent process at future times if forecasting is the objective. One possibility is to use the estimated number of cases to predict hospitalizations and/or deaths, but only up to future time points within the typical duration between case notification, hospitalization, and death. Such forecasts would be relatively short-term by definition.

Flexible, semi-parametric models of disease transmission are an effective way to faithfully represent complex, hidden, and time-evolving transmission in outbreaks of emerging pathogens [4, 5, 25]. Human behavior and disease transmission are inherently linked and each generates feedback to the other. We believe it will never be possible to perfectly and fully represent human behavior. Nonetheless, carefully crafted models, such as the model we presented and others [4, 5, 19], can infer the impact of human behavior when fitted to data. The semi-parametric model we have presented is capable of using more input data (e.g., survey data on mask-wearing) or less (e.g., no mobility data), making it flexible enough to perform statistical inference regardless of how much information about covariates is available. This feature could be of substantial value to models of the ongoing COVID-19 pandemic, where information is accumulating over time, as well as future pandemics, where initial modeling efforts will also be hobbled by the lack of information.

Materials and methods

Data

We fit the model to daily case and death reports for each state in the USA collated by the Center for Systems Science and Engineering at Johns Hopkins University [11]. We did not smooth the reported data but we did redefine all negative case and death reports as NA-values. Time-series of case and death reports for each state used in this analysis are shown in Section C of S1 Appendix. Data were downloaded on 2022-07-18, meaning any revisions to data from 2020 were applied. For example, data sets were often back-filled after reporting lags and some states changed the inclusion criteria for a death to be considered primarily a cause of COVID-19. We do not specifically account for these revisions or changes in data over time in this analysis.

Data on human mobility were downloaded from Unacast (https://www.unacast.com/). The data were pre-aggregated by Unacast, and were the difference in daily distance traveled relative to baseline distance (Janurary 2020), averaged over individuals in each state. We converted the data to a relative difference in mobility by adding 1 to each value with a positive sign (greater than baseline) and subtracting the absolute value from 1 for all values with a negative sign (less than baseline). We then fit a smoothing spline through the time series to generate smooth covariate for modeling. The fitted splines for each state are shown in the Section D of S1 Appendix.

Model

The model comprises susceptible, pre-symptomatic, asymptomatic, symptomatic, diagnosed, hospitalized, deceased, and recovered persons (Table 1). The infectious compartments (pre-symptomatic, asymptomatic, symptomatic, diagnosed, and hospitalized persons) differ in their transmissibility and are thus defined by their impacts on population-level transmission rather than clinical symptoms. To reflect realistic distributions of movement through compartments, the L, Ia, Isd, Isu, C, and H compartments are internally split into four stages using the linear chain trick [26] (Fig A in S1 Appendix). Transitions between compartments were modeled using the Euler multinomial approximation given the size of the “donating” compartment and the specified or fitted rate of transition, as implemented in the R [27] package pomp [28] version 2.7.1.0.

thumbnail
Table 1. State variable symbols and definitions in the model.

https://doi.org/10.1371/journal.pcbi.1011610.t001

Ignoring the internal splits of multi-stage compartments, the stochastic model is defined as the set of difference equations with where nX is the number of individuals that remain in each compartment X; f(t) is the force of infection at time t; γL, , , , γC, and γH are the transition rates out of each respective compartment; q(t) is the time varying probability of case detection; s(t) is a time varying factor determining time to diagnosis; a is the fraction of infected individuals that are symptomatic; h is the fraction of detected symptomatic individuals that are hospitalized; and m(t) is the time varying fraction of hospitalized cases that are fatal. EM stand for the Euler Multinomial process described by He et al. [29] with the rates (e.g., ef(tt) described and stochastic noise descriptions excluded. See He et al. [29] for more details.

The force of infection f(t) was modeled as: where bL, , bC, and bH are relative transmissibility factors for each respective compartment. The time-dependent transmission rate is a function of mobility, a latent trend, and process noise as: where N is the constant total population size of the state, ψ(t) is relative human mobility at time t, and ϕ(t) is the latent trend of relative transmission strength, specified as the spline function where K is the number of basis functions, g is the vector of spline coefficients, and ξ is the matrix of basis functions. The number of basis functions, K, was defined as the number of days in each time-series divided by 21 (one function every 21 days [3 weeks]). Process noise Γ(t) was modeled as gamma-distributed white noise (temporally uncorrelated noise) with mean 1 and variance σ2 [30].

Time-varying detection probability (q(t)) was modeled as a Hill function starting at 10% (qmin) on day 0 and increasing to a maximum possible of 40% (qmax), reaching that half way point of 25% on the 30th day (qhalf) since the first case notification: where qr is the rate of increase of the Hill function, which is set to 1.1, and t is the day since first case notification (see Section A of S1 Appendix).

Time-varying decrease in time-to-diagnosis (1/s(t) days) was similarly defined as: where smax is the maximum decrease in days to diagnosis, shalf is the day on which the decrease in days to diagnosis is half-way between 0 and smax (set to day 30), and sr is rate of increase of Hill function, set to 1.1 (see Section A of S1 Appendix). Note that s(t) is multiplied by to increase the diagnosis rate, and that γC is divided by s(t) to slow the rate from C to H, increasing the amount of time in the C compartment to account for reduced time in the Isd compartment.

Time-varying mortality fraction (m(t)) was also specified as a Hill function: where mbase is the baseline fraction of hospitalizations that result in death, mmin is the minimum fraction of hospitalizations that result in death, mhalf is the days since first case notification at which m(t) is halfway between mbase and mmin, and mr is the Hill function rate of change, fixed to 1.

We calculated the effective reproduction number () using the next-generation matrix approach [14]. Assuming that all of the time dependent functions, S, ω, s, q, are changing slowly over the course of an individual’s infection, the expected number of new cases each case generates, , may be approximated as:

This equation may be arrived at by assuming that all time-varying parameters are fixed to the value at time t over the course of an infection and multiplying the expected residence time in each infectious stage by the transmission rate, the number of susceptible individuals, and the probability of a newly infected individual experiencing it.

We assumed that new, daily case reports arise from a negative binomial distribution whose central tendency is captured by the flow of individuals from Isd4 to C1. We denote this quantity by Cnew, which accumulates over the course of one day in the simulation model and resets to zero at the end of each day (the model is simulated at a time step of 1/20 days). Similarly, we assumed that new, daily death reports arise from a negative binomial distribution whose central tendency is captured by the flow of individuals from C4 to D. We denote this quantity by Dnew, which accumulates over the course of one day in the simulation model and resets to zero at the end of each day. Then, for both new cases and deaths, we modeled the observation process as: where θC and θD are the negative binomial (NB) dispersion parameters for cases and deaths, respectively. Note that cases(t) and deaths(t) are the observed number of cases or deaths reported on day t.

Model fitting

We fit the model using Maximization by Iterated particle Filtering (MIF). Observations (daily case and death reports) were modeled as arising from a negative binomial reporting process (see above). We estimated 24 to 26 parameters for each state: the baseline fraction of hospitalized cases that result in death (mbase), the minimum fraction of hospitalized cases that result in death (mmin), the number of days since first case notification at which m(t) is half-way between mbase and mmin (mhalf), a parameter accounting for extra-demographic process noise (σ), two negative binomial dispersion parameters (θc and θd), the initial size of the latent and infectious classes (L(t = 0), Ia(t = 0), Isu(t = 0), Isd(t = 0)), and 14 to 16 coefficients (gi for i ∈ (1, 2, 3, …, 14(16)) specifying the latent trend spline. The number of B-spline coefficients depended on the length of the time series for each state. The estimated latent trend multiplied by the relative mobility covariate multiplied by (fixed to 7 for all states) specifies the time-varying transmission rate. All other parameters were fixed at the values reported in Table 2.

Initial size of the susceptible compartment was set as each state’s population size minus the number of individuals in other the other compartments that we fix. It is true that the initial size of the susceptible pool will also decrease based on the number of individuals estimated to be in the latent infections compartment at t = 1. However, given the small size of the latent and infectious compartments relative to total population size, and the fact that total population size is a point estimate with error, we assume that our simple approach of setting S(t = 1) to each state’s population size is valid. For each state, we considered t = 1 as the date on which the first case is reported in that state.

We used the IF2 algorithm [38] implemented in the R [27] package pomp version 2.7.1.0. [28] to perform MIF. To initialize IF2, we generated 100 parameter sets from a range of parameter values using a Sobol sequence sampling design (Table 3). We then performed two rounds of MIF, each for 100 iterations with 3,500 particles and geometric cooling. For the first round of MIF we set cooling.factor = 1.0. For the second round of MIF, which continues from where the first round stopped, we set cooling.factor = 0.9. The log likelihoods of the parameter sets following MIF were calculated as the log of the mean likelihoods of 20 replicate particle filters with 5,000 particles each. At this stage, we collected all parameter sets within 10 negative log likelihood points of the maximum and sampled a new collection of 100 parameter sets, with sampling weighted in proportion to the negative log likelihood of the parameter set. All parameters in the new set were perturbed as: pnew ∼ Normal(p, |p × 0.25|). The perturbed parameter sets were then used to initialize two final rounds of MIF, each run for 50 iterations with 3,500 particles. Cooling factors were 1 in the penultimate MIF round and 0.5 in the final MIF round. At this stage, we assume the parameter set with highest log likelihood is the MLE.

thumbnail
Table 3. Estimated parameters and starting ranges for MIF estimation procedure.

The expit function refers to back-transforming the parameter from the logit scale, which was used for estimation.

https://doi.org/10.1371/journal.pcbi.1011610.t003

Following [4], we calculated smoothed posterior estimates of all time-varying states using replicate particle filtering. Specifically, we ran 500 replicate particle filters with 2,500 particles at the MLE, retaining one randomly sampled complete particle trajectory from each particle filter. The 500 trajectories generated a set of 500 smoothed posterior draws of all time-varying state variables and parameters. was calculated at each time t for each of the 500 smoothed posteriors using the equation presented above, yielding a smoothed posterior distribution for . We used the median of the posterior distribution of for all analyses. The estimated, time-varying latent trend was calculated using the MLEs for the B-spline coefficients.

Correlation analysis

We calculated pairwise cross correlations of the time series of relative mobility and estimated latent trends (median of ψ) among all locations (states) using Pearson’s correlation coefficient. The correlations were estimated for both phases of the pandemic (see main text) using the cor() function in R.

Variance partitioning

We split the time series of (effective reproduction number), ψ (latent transmission trend), and ϕ (relative human mobility) for each state into two periods: pre-April 24, 2020 and post-April 24, 2020 (see main text). For each period and each state, we partitioned the variance of over time among ψ and ϕ. We used the median of the smoothed posterior distributions for ψ and . We fit two linear regressions to explain over time: an intercept only model and a full model with the mobility trend and the latent trend as additive covariates: R_e ∼ mobility + latent. Models were fitted using the lm() function in R. We extracted the sum of squared residuals for each model parameter using the anova() function in R. Proportion of variance explained by each covariate was calculated as the sum of squared errors associated with each covariate divided by the sum of squared errors of the null model.

Supporting information

S1 Appendix. Additional model details and results.

S1 Appendix comprises five sections: A) graphical displays of the detection probability (q(t)) and diagnosis time (1/s(t)) functions; B) graphical display of mean absolute scaled errors for each state for new case and new death reports; C) time series of incident case and death reports for each state with model-estimated filtered trajectories overlaid; and D) time series of mobility, estimated latent trend, and effective reproduction number for each state.

https://doi.org/10.1371/journal.pcbi.1011610.s001

(PDF)

Acknowledgments

This study was supported in part by resources and technical expertise from the Georgia Advanced Computing Resource Center, a partnership between the University of Georgia’s Office of the Vice President for Research and Office of the Vice President for Information Technology. We thank Shan-Ho Tsai for assistance with high-performance computing.

References

  1. 1. Bertozzi AL, Franco E, Mohler G, Short MB, Sledge D. The challenges of modeling and forecasting the spread of COVID-19. Proceedings of the National Academy of Sciences. 2020;117(29):16732–16738. pmid:32616574
  2. 2. Castle JL, Doornik JA, Hendry DF. THE VALUE OF ROBUST STATISTICAL FORECASTS IN THE COVID-19 PANDEMIC. National Institute Economic Review. 2021;256:19–43.
  3. 3. Reiner RC, Barber RM, Collins JK, Zheng P, Adolph C, Albright J, et al. Modeling COVID-19 scenarios for the United States. Nature Medicine. 2021;27(1):94–105.
  4. 4. Fox SJ, Lachmann M, Tec M, Pasco R, Woody S, Du Z, et al. Real-time pandemic surveillance using hospital admissions and mobility data. Proceedings of the National Academy of Sciences. 2022;119(7):e2111870119. pmid:35105729
  5. 5. O’Dea EB, Drake JM. A semi-parametric, state-space compartmental model with time-dependent parameters for forecasting COVID-19 cases, hospitalizations and deaths. Journal of The Royal Society Interface. 2021;19(187):20210702.
  6. 6. Moreland A, Herlihy C, Tynan MA, Sunshine G, McCord RF, Hilton C, et al. Timing of State and Territorial COVID-19 Stay-at-Home Orders and Changes in Population Movement—United States, March 1–May 31, 2020. Morbidity and Mortality Weekly Report. 2020;69(35):1198–1203. pmid:32881851
  7. 7. Castillo RC, Staguhn ED, Weston-Farber E. The effect of state-level stay-at-home orders on COVID-19 infection rates. American Journal of Infection Control. 2020;48(8):958–960. pmid:32461066
  8. 8. Kraemer MUG, Yang CH, Gutierrez B, Wu CH, Klein B, Pigott DM, et al. The effect of human mobility and control measures on the COVID-19 epidemic in China. Science. 2020;368(6490):493–497. pmid:32213647
  9. 9. Childs ML, Kain MP, Harris MJ, Kirk D, Couper L, Nova N, et al. The impact of long-term non-pharmaceutical interventions on COVID-19 epidemic dynamics and control: the value and limitations of early models. Proceedings of the Royal Society B: Biological Sciences. 2021;288(1957):20210811. pmid:34428971
  10. 10. Nouvellet P, Bhatia S, Cori A, Ainslie KEC, Baguelin M, Bhatt S, et al. Reduction in mobility and COVID-19 transmission. Nature Communications. 2021;12(1):1090. pmid:33597546
  11. 11. Dong E, Du H, Gardner L. An interactive web-based dashboard to track COVID-19 in real time. The Lancet Infectious Diseases. 2020;20(5):533–534. pmid:32087114
  12. 12. Covid-19 Social Distancing Scoreboard—Unacast; 2022. Available from: https://www.unacast.com/covid19/social-distancing-scoreboard.
  13. 13. Fischer CB, Adrien N, Silguero JJ, Hopper JJ, Chowdhury AI, Werler MM. Mask adherence and rate of COVID-19 across the United States. PLOS ONE. 2021;16(4):e0249891. pmid:33852626
  14. 14. van den Driessche P, Watmough J. Reproduction numbers and sub-threshold endemic equilibria for compartmental models of disease transmission. Mathematical Biosciences. 2002;180(1):29–48. pmid:12387915
  15. 15. Badr HS, Du H, Marshall M, Dong E, Squire MM, Gardner LM. Association between mobility patterns and COVID-19 transmission in the USA: a mathematical modelling study. The Lancet Infectious Diseases. 2020;20(11):1247–1254. pmid:32621869
  16. 16. Lodge EK, Schatz AM, Drake JM. Protective population behavior change in outbreaks of emerging infectious disease. BMC Infect Dis. 2021;21(1):577. pmid:34130652
  17. 17. Cramer EY, Ray EL, Lopez VK, Bracher J, Brennen A, Castro Rivadeneira AJ, et al. Evaluation of individual and ensemble probabilistic forecasts of COVID-19 mortality in the United States. Proc Natl Acad Sci U S A. 2022;119(15):e2113561119. pmid:35394862
  18. 18. Banerjee S, Lian Y. Data driven covid-19 spread prediction based on mobility and mask mandate information. Applied Intelligence. 2022;52(2):1969–1978. pmid:34764603
  19. 19. Gibson GC, Reich NG, Sheldon D. Real-Time Mechanistic Bayesian Forecasts of Covid-19 Mortality. medRxiv; 2020. Available from: https://www.medrxiv.org/content/10.1101/2020.12.22.20248736v2.
  20. 20. Dehning J, Zierenberg J, Spitzner FP, Wibral M, Neto JP, Wilczek M, et al. Inferring change points in the spread of COVID-19 reveals the effectiveness of interventions. Science. 2020;369(6500):eabb9789. pmid:32414780
  21. 21. Alimohamadi Y, Taghdir M, Sepandi M. Estimate of the Basic Reproduction Number for COVID-19: A Systematic Review and Meta-analysis. Journal of Preventive Medicine and Public Health = Yebang Uihakhoe Chi. 2020;53(3):151–157. pmid:32498136
  22. 22. Hasan A, Susanto H, Kasim MF, Nuraini N, Lestari B, Triany D, et al. Superspreading in early transmissions of COVID-19 in Indonesia. Scientific Reports. 2020;10(1):22386. pmid:33372191
  23. 23. Salomon JA, Reinhart A, Bilinski A, Chua EJ, La Motte-Kerr W, Rönn MM, et al. The US COVID-19 Trends and Impact Survey: Continuous real-time measurement of COVID-19 symptoms, risks, protective behaviors, testing, and vaccination. Proceedings of the National Academy of Sciences. 2021;118(51):e2111454118. pmid:34903656
  24. 24. Clark NJ, Wells K. Dynamic generalised additive models (DGAMs) for forecasting discrete ecological time series. Methods in Ecology and Evolution. 2022;n/a(n/a).
  25. 25. Zhou T, Ji Y. Semiparametric Bayesian inference for the transmission dynamics of COVID-19 with a state-space model. Contemporary Clinical Trials. 2020;97:106146. pmid:32947047
  26. 26. Hurtado PJ, Kirosingh AS. Generalizations of the ‘Linear Chain Trick’: incorporating more flexible dwell time distributions into mean field ODE models. Journal of Mathematical Biology. 2019;79(5):1831–1883. pmid:31410551
  27. 27. {{R Core Team}}. R: A Language and Environment for Statistical Computing.; 2020. Available from: https://www.R-project.org/.
  28. 28. King AA, Nguyen D, Ionides EL. Statistical Inference for Partially Observed Markov Processes via the R Package pomp. Journal of Statistical Software. 2016;69:1–43.
  29. 29. He D, Ionides EL, King AA. Plug-and-play inference for disease dynamics: measles in large and small populations as a case study. Journal of The Royal Society Interface. 2010;7(43):271–283. pmid:19535416
  30. 30. Bretó C, Ionides EL. Compound Markov counting processes and their applications to modeling infinitesimally over-dispersed systems. Stochastic Processes and their Applications. 2011;121(11):2571–2591.
  31. 31. Sanche S, Lin YT, Xu C, Romero-Severson E, Hengartner N, Ke R. High Contagiousness and Rapid Spread of Severe Acute Respiratory Syndrome Coronavirus 2—Volume 26, Number 7—July 2020—Emerging Infectious Diseases journal—CDC. Emerging Infectious Disease Journal—CDC. 2020;26(7).
  32. 32. Mizumoto K, Kagaya K, Zarebski A, Chowell G. Estimating the asymptomatic proportion of coronavirus disease 2019 (COVID-19) cases on board the Diamond Princess cruise ship, Yokohama, Japan, 2020. Eurosurveillance. 2020;25(10):2000180. pmid:32183930
  33. 33. Verity R, Okell LC, Dorigatti I, Winskill P, Whittaker C, Imai N, et al. Estimates of the severity of coronavirus disease 2019: a model-based analysis. The Lancet Infectious Diseases. 2020;20(6):669–677. pmid:32240634
  34. 34. Li R, Pei S, Chen B, Song Y, Zhang T, Yang W, et al. Substantial undocumented infection facilitates the rapid dissemination of novel coronavirus (SARS-CoV-2). Science. 2020;368(6490):489–493. pmid:32179701
  35. 35. Moghadas SM, Shoukat A, Fitzpatrick MC, Wells CR, Sah P, Pandey A, et al. Projecting hospital utilization during the COVID-19 outbreaks in the United States. Proceedings of the National Academy of Sciences. 2020;117(16):9122–9126. pmid:32245814
  36. 36. Ohsfeldt RL, Choong CKC, Mc Collam PL, Abedtash H, Kelton KA, Burge R. Inpatient Hospital Costs for COVID-19 Patients in the United States. Advances in Therapy. 2021;38(11):5557–5595. pmid:34609704
  37. 37. Zhang J, Litvinova M, Wang W, Wang Y, Deng X, Chen X, et al. Evolving epidemiology and transmission dynamics of coronavirus disease 2019 outside Hubei province, China: a descriptive and modelling study. The Lancet Infectious Diseases. 2020;20(7):793–802. pmid:32247326
  38. 38. Ionides EL, Nguyen D, Atchadé Y, Stoev S, King AA. Inference for dynamic and latent variable models via iterated, perturbed Bayes maps. Proceedings of the National Academy of Sciences. 2015;112(3):719–724. pmid:25568084