Research Article

Superspreaders drive the largest outbreaks of hospital onset COVID-19 infections

MRC Biostatistics Unit, University of Cambridge, East Forvie Building, Forvie Site, Robinson Way, United Kingdom
Institut für Biologische Physik, Universität zu Köln, Germany
Department of Applied Mathematics and Theoretical Physics, Centre for Mathematical Sciences, United Kingdom
University of Cambridge, Department of Medicine, Cambridge Biomedical Campus, United Kingdom
Cambridge University Hospitals NHS Foundation Trust, Cambridge Biomedical Campus, United Kingdom
Public Health England Clinical Microbiology and Public Health Laboratory, Cambridge Biomedical Campus, United Kingdom
Public Health England Field Epidemiology Unit, Cambridge Institute of Public Health, Forvie Site, Cambridge Biomedical Campus, United Kingdom
University of Cambridge, Department of Pathology, Division of Virology, Cambridge Biomedical Campus, United Kingdom
Cambridge Institute for Therapeutic Immunology and Infectious Disease, Jeffrey Cheah Biomedical Centre, United Kingdom
Wellcome Sanger Institute, Wellcome Trust Genome Campus, United Kingdom
MRC Epidemiology Unit, University of Cambridge, Level 3 Institute of Metabolic Science, United Kingdom
University of Cambridge, School of Clinical Medicine, Cambridge Biomedical Campus, United Kingdom
Public Health England, National Infection Service, United Kingdom

Aug 24, 2021

https://doi.org/10.7554/eLife.67308

Version of Record

Accepted for publication after peer review and revision.

Version of Record updated: September 2, 2021
Version of Record published: August 24, 2021
Accepted: July 15, 2021
Received: February 7, 2021

Part of Collection: COVID-19: A Collection of Articles

Edited by Diane M Harper et al.

Abstract

SARS-CoV-2 is notable both for its rapid spread and for the heterogeneity of its patterns of transmission, with multiple published instances of superspreading behaviour. Here, we applied a novel network reconstruction algorithm to infer patterns of viral transmission occurring between patients and health care workers (HCWs) in the largest clusters of COVID-19 infection identified during the first wave of the epidemic at Cambridge University Hospitals NHS Foundation Trust, UK. Based upon dates of individuals reporting symptoms, recorded individual locations, and viral genome sequence data, we show an uneven pattern of transmission between individuals, with patients being much more likely to be infected by other patients than by HCWs. Further, the data were consistent with a pattern of superspreading, whereby 21% of individuals caused 80% of transmission events. Our study provides a detailed retrospective analysis of nosocomial SARS-CoV-2 transmission, and sheds light on the need for intensive and pervasive infection control procedures.

eLife digest

The COVID-19 pandemic, caused by the SARS-CoV-2 virus, presents a global public health challenge. Hospitals have been at the forefront of this battle, treating large numbers of sick patients over several waves of infection. Finding ways to manage the spread of the virus in hospitals is key to protecting vulnerable patients and workers, while keeping hospitals running, but to generate effective infection control, researchers must understand how SARS-CoV-2 spreads.

A range of factors make studying the transmission of SARS-CoV-2 in hospitals tricky. For instance, some people do not present any symptoms, and, amongst those who do, it can be difficult to determine whether they caught the virus in the hospital or somewhere else. However, comparing the genetic information of the SARS-CoV-2 virus from different people in a hospital could allow scientists to understand how it spreads.

Samples of the genetic material of SARS-CoV-2 can be obtained by swabbing infected individuals. If the genetic sequences of two samples are very different, it is unlikely that the individuals who provided the samples transmitted the virus to one another. Illingworth, Hamilton et al. used this information, along with other data about how SARS-CoV-2 is transmitted, to develop an algorithm that can determine how the virus spreads from person to person in different hospital wards.

To build their algorithm, Illingworth, Hamilton et al. collected SARS-CoV-2 genetic data from patients and staff in a hospital, and combined it with information about how SARS-CoV-2 spreads and how these people moved in the hospital. The algorithm showed that, for the most part, patients were infected by other patients (20 out of 22 cases), while staff were infected equally by patients and staff. By further probing these data, Illingworth, Hamilton et al. revealed that 80% of hospital-acquired infections were caused by a group of just 21% of individuals in the study, identifying a ‘superspreader’ pattern.

These findings may help to inform SARS-CoV-2 infection control measures to reduce spread within hospitals, and could potentially be used to improve infection control in other contexts.

Introduction

Reducing the spread of SARS-CoV-2 is a crucial priority for controlling and limiting the impact of the COVID-19 pandemic. Key metrics in assessing transmission are the basic and effective reproduction numbers (R), which describe the mean number of infections caused by a typical infected individual in totally and partially susceptible populations, respectively (Anderson and May, 1992). However, individual variations from this mean can be of vital importance (Lloyd-Smith et al., 2005); a study of SARS-CoV-2 in Hong Kong suggested that 80% of transmission events resulted from only 19% of cases (Adam et al., 2020). Superspreader events are widely reported to play a key role in the spread of the virus in community settings (Shen et al., 2004; Kucharski and Althaus, 2015; Hamner, 2020; Ebrahim and Memish, 2020; Lemieux et al., 2020).

Transmission within hospitals has been identified as a critical concern in managing the COVID-19 pandemic (Iacobucci, 2020). Studying transmission in the hospital environment requires care in distinguishing cases acquired in the community from cases of nosocomial infection (Sikkema et al., 2020). The identification of outbreaks may be complicated by the potential for asymptomatic carriage of the virus (Rivett et al., 2020). As such, testing of asymptomatic health care workers (HCW) has been proposed as a means to reduce viral spread and protect the workforce and patients (Black et al., 2020; Jones et al., 2020).

Evaluating SARS-CoV-2 transmission in a hospital context is not a trivial task. Factors such as the date of symptom onset relative to the date of admission can be used to identify cases of likely nosocomial transmission (Rickman et al., 2021; Price et al., 2021). Phylogenetic methods can be used to identify putative clusters of infection occurring within a single ward or other location within a hospital setting (Meredith et al., 2020). Epidemiological methods can be used to look at potential contacts and opportunities for transmission between cases of infection (Cluster Track, Camart Ltd, Cambridge, UK). However, these approaches do not always show who infected whom within a single outbreak or cluster, as they lack the resolution needed to capture the fine detail of transmission.

Viral genome sequences provide a valuable resource for evaluating nosocomial transmission. At the most basic level, highly distinct sequences are unlikely to be related via transmission. Multiple approaches have been proposed to infer transmission patterns from genome sequences. Typically, phylogenetic reconstruction is used to infer relationships between sequences, an evolutionary model being combined with epidemiological data to infer a network of transmission events (Volz and Frost, 2013; Ypma et al., 2012; Didelot et al., 2014; Hall et al., 2015). Modelling approaches have been extended to include factors such as unsampled hosts (De Maio et al., 2016), the availability of multiple samples per patient (Wymant et al., 2018; Worby et al., 2016), incomplete epidemics (Didelot et al., 2017), and deep sequence data (Ratmann et al., 2019).
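The basic idea that highly distinct sequences are unlikely to be linked by transmission can be illustrated with a pairwise distance between aligned consensus genomes. The following is a minimal sketch only, not the method used in this study: the function name `pairwise_distance` and the toy sequences are hypothetical, and sites carrying an ambiguous nucleotide in either sequence are simply skipped.

```python
def pairwise_distance(seq_a: str, seq_b: str) -> int:
    """Count aligned sites at which two sequences carry different unambiguous bases.

    Sites where either sequence has an ambiguous code (anything other than
    A, C, G or T) are ignored, an assumption made for this illustration.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in "ACGT" and b in "ACGT" and a != b:
            differences += 1
    return differences


# Toy aligned sequences (hypothetical, far shorter than a real genome).
case_1 = "ACGTNACGT"
case_2 = "ACGAAACGT"
print(pairwise_distance(case_1, case_2))
```

A larger distance makes direct transmission between two cases less plausible, but a small distance alone does not establish it, which is why non-genetic data are needed as well.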

Here, we evaluated patterns of viral transmission occurring in epidemiological clusters in Cambridge University Hospitals NHS Foundation Trust (CUH), United Kingdom, where multiple patients with suspected hospital-acquired COVID-19 infections and/or HCW working on the same wards tested positive for SARS-CoV-2 within a 2-week period. Using a novel approach to combine genetic and epidemiological data, we inferred networks of SARS-CoV-2 transmission between these individuals. The tight clustering of genome sequences collected within a single ward places an imperative on the exploitation of non-genetic information to identify potential transmission events. We did this by combining an evolutionary model with symptom and location data for individuals considered, and knowledge of SARS-CoV-2 infection dynamics. Examining data from the largest clusters of infection identified within the hospital, we showed that the spread of infection in these clusters was driven by a small set of superspreader individuals.

Results

We developed a method to infer networks of transmission events between individuals within CUH. Our method combines knowledge of SARS-CoV-2 infection dynamics with viral genome sequences and data describing the movements of patients and HCWs within the hospital.

Applying our method to these data, we generated maximum likelihood reconstructions of the pattern of transmission events occurring within five infection clusters, each of which was centred on a ward at CUH. For reasons of data protection we term these wards A to E. These five wards were chosen as they contained the largest numbers of patients with hospital-onset infections and/or healthcare worker infections in CUH up to the end of the study period. Of the wards analysed, wards A to D were ‘green’ wards (designated for patients who had not tested positive for SARS-CoV-2), while ward E was a ‘red’ ward (designated for known COVID-19 patients). Although referred to here as a ‘ward’ for simplicity, one of the green wards comprised a number of neighbouring clinical areas within the hospital. A preliminary analysis of the data, treating individuals in a pairwise manner, suggested that transmission events between the identified wards were unlikely (Figure 1).

Figure 1

Preliminary analysis of the data with A2B-Covid.

Squares indicate the extent to which an individual-to-individual transmission event is consistent with the data collected, when considered on a pairwise level. Our analysis highlighted multiple potential transmission events occurring within each ward, but transmission between individuals on different wards was uniformly assessed as unlikely. Further analyses of the data considered wards as independent and isolated locations.

Figure 1—source data 1. Assessment of pairwise transmission events: https://cdn.elifesciences.org/articles/67308/elife-67308-fig1-data1-v2.xlsx

Reconstructed transmission networks for the four green wards are shown in Figure 2. Our method requires each transmission in a network to be consistent with a statistical model of pairwise viral transmission (Illingworth, 2020). As such, a broad range of possibilities could in theory be inferred. At one extreme, the infections on a ward could all arise from a single introduction of the virus, with all cases arising via transmission from a single individual. At the other extreme, the infections could all be entirely independent of one another, with no transmission between cases at all. Our approach uses sequence data and epidemiological information to identify cases that are plausibly linked by direct transmission, before inferring the maximum likelihood network reconstruction of events. Across the green wards, the majority of cases were inferred to be connected to at least one other via transmission, with 42 out of 54 cases being joined into networks. These networks involved between 2 and 11 cases each (mean 5.9 cases). This contrasts with results from the single red ward (Figure 3), in which only 9 of 19 cases were inferred to be linked to others via transmission (mean inferred network size 2.3 cases). Our result corresponds to the nature of the wards studied; the repeated transfer of infected patients onto a red ward leads to an increased number of independent introductions.

Figure 2 (with 2 supplements)

Maximum likelihood transmission networks for wards A to D.

Circles represent individuals and arrows show transmission events. White circles represent patients while grey circles represent health care workers. Individuals for which no transmission events were inferred are represented as isolated circles.

Figure 3

Maximum likelihood transmission network for ward E.

Circles represent individuals and arrows show transmission events. White circles represent patients while grey circles represent health care workers. Individuals for which no transmission events were inferred are represented as isolated circles.

Individuals in our study were divided into patients and HCWs, allowing for the estimation of rates of transmission between these categories. Of the 38 transmission events in the maximum likelihood networks, 20 were patient-to-patient, 8 were from patient to HCW, 8 were HCW-to-HCW, and just 2 were from HCW to patient (Figure 2—figure supplement 1). These results suggest that patients were significantly more likely to be infected by other patients than by health care workers (p-value 6.1 x 10⁻⁵, one-tailed binomial test). By contrast, HCWs were at approximately equal risk of being infected by patients and other HCWs.
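The reported p-value can be checked with a short calculation. The sketch below assumes a null hypothesis in which a patient is equally likely to be infected by a patient or by a HCW (probability 0.5); the null is not stated explicitly in the text, so this is an assumption, and the function name is hypothetical. Of the 22 infections of patients, 20 came from other patients.

```python
from math import comb


def one_tailed_binomial_p(successes: int, trials: int, p: float = 0.5) -> float:
    """Probability of observing at least `successes` out of `trials` under Binomial(trials, p)."""
    return sum(
        comb(trials, k) * p**k * (1 - p) ** (trials - k)
        for k in range(successes, trials + 1)
    )


# 20 patient-to-patient events out of 22 infections of patients (20 + 2).
p_value = one_tailed_binomial_p(20, 22)
print(f"{p_value:.1e}")  # 6.1e-05
```

Under this assumed null the calculation gives 254/2²², which rounds to the value quoted above.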

Some of the wards analysed appeared to show uneven patterns of viral transmission, with a small number of individuals responsible for most of the infections observed. For example, in the maximum likelihood reconstruction derived for ward A, the majority of individuals infected did not pass on the virus, while individuals A6 and A10 were the sources, respectively, of four and five transmission events (Figure 2A).

A statistical analysis of the inferred networks provided evidence for a role for superspreading behaviour during transmission. As a first step in evaluating this, we calculated the level of uncertainty in each inferred network, combining data from the maximum likelihood network with that from other plausible, but lower-likelihood networks. Figure 2—figure supplement 2 shows statistical ensembles of networks inferred for the green wards. In this figure, the width of an arrow is proportional to the probability that transmission occurred between each pair of individuals. The maximum likelihood network inferred for Ward E was the only plausible solution given by our reconstruction method.
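One way to picture such an ensemble is to weight each plausible network by its relative likelihood and sum the weights of the networks containing a given transmission event. The sketch below is an illustration under that assumption, not a description of the weighting actually used in the study; the function name, the log-likelihood values and the individual labels are all hypothetical.

```python
from collections import defaultdict
from math import exp


def edge_probabilities(networks):
    """Estimate the probability of each transmission event across an ensemble.

    `networks` is a list of (log_likelihood, edges) pairs, where `edges` is a
    set of (source, recipient) tuples. Each network is weighted by its
    likelihood relative to the ensemble total (an assumed weighting scheme).
    """
    max_ll = max(ll for ll, _ in networks)
    # Subtract the maximum before exponentiating for numerical stability.
    weights = [exp(ll - max_ll) for ll, _ in networks]
    total = sum(weights)
    probabilities = defaultdict(float)
    for weight, (_, edges) in zip(weights, networks):
        for edge in edges:
            probabilities[edge] += weight / total
    return dict(probabilities)


# Two hypothetical, equally likely networks that disagree on who infected X3.
ensemble = [
    (-12.0, {("X1", "X2"), ("X1", "X3")}),
    (-12.0, {("X1", "X2"), ("X2", "X3")}),
]
for edge, prob in sorted(edge_probabilities(ensemble).items()):
    print(edge, round(prob, 2))
```

In a plot of this kind, an event supported by every plausible network would be drawn with the widest arrow, and an event appearing in only some networks with a proportionally narrower one.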

In a second step, we fitted models to data from these ensembles, calculating probability distributions of the number of individuals infected by each person in the dataset. A negative binomial model, in which the extent of viral spreading was overdispersed, gave a better explanation of the data, measured using the Bayesian Information Criterion, than a simpler model in which all individuals transmitted equally (Figure 4). In the best-performing model of viral spreading, 87% of individuals either did not transmit the virus, or transmitted only to one other. Taken across all individuals, 21% of individuals were responsible for 80% of viral transmission, a result very similar to that found among the general population (Adam et al., 2020). A repeat of this calculation for the green wards alone gave similar statistics, with 23% of individuals in these wards being responsible for 80% of transmission (Figure 4—figure supplement 1).
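The link between overdispersion and statements of the form "x% of individuals cause 80% of transmission" can be sketched numerically. The code below is an illustration only: the mean and dispersion values are hypothetical and are not the fitted parameters of this study, and the function names are invented for this example. It uses the standard negative binomial parameterisation by mean and dispersion k (smaller k means greater overdispersion, large k approaches a Poisson distribution) and the usual definition BIC = p ln(n) - 2 ln(L).

```python
from math import exp, lgamma, log


def neg_binomial_pmf(n: int, mean: float, k: float) -> float:
    """Probability that an individual causes n infections (mean `mean`, dispersion `k`)."""
    log_p = (
        lgamma(n + k) - lgamma(k) - lgamma(n + 1)
        + k * log(k / (k + mean))
        + n * log(mean / (k + mean))
    )
    return exp(log_p)


def fraction_causing(share: float, mean: float, k: float, n_max: int = 200) -> float:
    """Smallest fraction of individuals accounting for `share` of all transmissions."""
    pmf = [neg_binomial_pmf(n, mean, k) for n in range(n_max + 1)]
    target = share * mean  # expected transmissions per person that must be covered
    people, cases = 0.0, 0.0
    # Work down from the most prolific transmitters.
    for n in range(n_max, 0, -1):
        contribution = n * pmf[n]
        if cases + contribution >= target:
            # Only part of this group is needed to reach the target.
            return people + (target - cases) / n
        people += pmf[n]
        cases += contribution
    return people


def bic(log_likelihood: float, n_params: int, n_obs: int) -> float:
    """Bayesian Information Criterion; lower values indicate a preferred model."""
    return n_params * log(n_obs) - 2.0 * log_likelihood


# Hypothetical parameter values, chosen only for illustration.
overdispersed = fraction_causing(0.8, mean=0.7, k=0.2)
near_poisson = fraction_causing(0.8, mean=0.7, k=1000.0)
print(f"overdispersed: {overdispersed:.2f}, near-Poisson: {near_poisson:.2f}")
```

With the same mean, the overdispersed model concentrates transmission in a smaller fraction of individuals than the near-Poisson model, which is the qualitative contrast shown in Figure 4C.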

Figure 4 (with 5 supplements)

Models of viral transmission.

(A) Fit of the output of the Poisson model (black dots) to the ensemble data (yellow bars). The weighted number of transmissions per individual reflects the uncertainty in the network reconstruction across the ensemble. (B) Fit of the output of the negative binomial model (black dots) to the ensemble data (yellow bars). (C) Proportions of individuals causing different proportions of infections. A negative binomial model (red line) fitted to all ward data produces a result similar to that of Adam et al., 2020 (blue dot), with 20% of individuals being responsible for 80% of infections. A Poisson model fitted to the same data (dashed grey line) has 20% of individuals being responsible for 60% of infections.

Figure 4—source data 1. Distributions of number of individuals infected by each individual and fits to these data using Poisson and Negative Binomial models: https://cdn.elifesciences.org/articles/67308/elife-67308-fig4-data1-v2.xlsx

In our maximum likelihood reconstructions, a total of five individuals infected three or more others, including one HCW and four patients. Clinical characteristics of these individuals were explored, but are not described in detail or assigned to their anonymised ward clusters to preserve patient anonymity. Of the four patients, all had suspected hospital-acquired COVID-19 and significant comorbidities: two had a history of chronic liver disease, and two had previous haematological malignancies, one of whom was still on immunosuppressive treatment. Immunosuppression has been associated with prolonged viral shedding (Italiano et al., 2020; Avanzato et al., 2020). One superspreader was confused and mobile on the ward. Another had a fever for several days before being tested for SARS-CoV-2, which had been attributed to a pre-existing community-acquired bacterial infection. The only HCW superspreader exclusively infected other HCWs, and shared accommodation with several of these individuals. Cycle threshold (Ct) values of samples collected from identified superspreader individuals were not statistically distinct from those from individuals in the study in general (Figure 4—figure supplement 2).

Inferred timings of transmission events caused by superspreaders showed a variety of patterns (Figure 4—figure supplements 3–5). In ward B, the initial three infections of HCWs by the individual B0 are likely to have occurred within a short period of the SARS-CoV-2 virus entering the ward (within 4 days with 95% certainty), suggesting that an outbreak caused by superspreading may spread rapidly to multiple individuals. However, the inferred timings on other wards were less conclusive; in wards A and C the inferred distributions of infection times were more diffuse. We note simply that where superspreader events occur, the potential exists for multiple transmissions to occur within a short space of time.

Discussion

We have here outlined a novel approach for the inference of transmission networks. Our approach combines an evolutionary model with specific information about SARS-CoV-2 transmission dynamics to identify the most probable set of transmission events linking a set of cases of infection. Our approach builds upon previous approaches to analysing SARS-CoV-2 data from hospital settings, going beyond the identification of clusters to infer directional networks of viral transmission.

The multiple forms of data used by our method each play a critical role in network inference. Where individuals in a ward become symptomatic at similar times, sequence data provide a strong indication of whether these infections are linked via transmission or arise from completely independent events. However, the potentially short time spanned by a local outbreak may be insufficient for substantial genetic variation to accumulate in the viral population; in such cases, other information, such as dates of symptom onset, becomes critical for network reconstruction.

The inference of networks allows for detailed study of how SARS-CoV-2 can spread within a clinical environment. Contacts between patients appear to be crucial, as patients are primarily infected by other patients rather than through transmission from HCWs. This finding has potential implications for the application of protective measures within a hospital environment: whereas face mask usage was enforced for individuals in outpatients and for HCWs in all areas of the hospital, inpatients were not, at the time of data collection, subject to the same precautions. A recent study has suggested SARS-CoV-2 aerosolisation to be high in areas where patients with COVID-19 are coughing (Hamilton, 2021).

Our study is biased in its consideration of the largest clusters of infection identified in CUH wards during the first wave of the pandemic. Examining data from these clusters, we identified a pattern of superspreading, in which a small proportion of individuals were responsible for the majority of nosocomial transmission events. Our result is interesting in the context of previous studies of superspreading (Shen et al., 2004; Kucharski and Althaus, 2015), providing an example of this in a hospital context, and a case in which a small proportion of ‘superspreader individuals’ drive ‘superspreader events’. We note that, while prolonged or increased viral shedding would increase the chance of an individual becoming a superspreader, behavioural and environmental factors may also be influential.

A key feature of SARS-CoV-2 that makes infection prevention and control (IPC) particularly challenging is its significant infectivity prior to the onset of symptoms. This means that isolating staff or patients once symptoms are recognised is not sufficient to prevent transmission. The superspreaders identified here illustrate several principles for limiting the spread of SARS-CoV-2 in hospitals. First, scrupulous adherence to infection control practices, including use of appropriate personal protective equipment (PPE) at all times, even on green wards and in non-clinical hospital areas, is required to limit transmission between asymptomatic patients and staff in whom COVID-19 is not suspected. Use of masks by patients, including on green wards, should be instituted if tolerated, particularly when staff are present in patient bed spaces. Second, healthcare professionals must be vigilant to the possibility of hospital-onset COVID-19 and have a low threshold for testing inpatients, even where an alternative differential diagnosis for the patients' symptoms exists. Third, as soon as positive cases are confirmed, appropriate isolation and PPE precautions should be used, along with contact tracing, testing and isolation. Patients who have been in direct contact with confirmed cases on green wards should be isolated. Fourth, regular screening of asymptomatic individuals can help to identify patients and staff who may be unsuspectingly infected with COVID-19 and infectious, whether pre-symptomatic, pauci-symptomatic or asymptomatic, prompting isolation and contact tracing (Jones et al., 2020). Fifth, ventilation should be improved to reduce the risk of aerosol dispersal. Of note, our recommendations concur with the current UK guidance for COVID-19 infection prevention and control (Public Health England, 2020), which has evolved during the course of the pandemic. Finally, our identification of transmission from patients to HCWs highlights the use of higher-grade respiratory precautions to protect HCWs (such as FFP3 respirators) as an important topic for future research.

The potential for superspreading enhances the difficulty of controlling hospital-acquired infection, particularly as most transmission events from superspreaders to other people inferred in this study occurred within a relatively short time period. By the time a second linked case in a ward is identified, the potential exists for an index case to have infected multiple individuals, making it too late to prevent a broader outbreak. While this study does not allow for a complete characterisation of superspreading individuals, it may suggest possible risk factors in these instances such as immunosuppression (associated with prolonged shedding), more mobile behaviour that may have contributed to increased risk of transmission, and extended symptoms (fever) prior to testing and isolation (due to fever being attributed to an alternative cause).

Our inferences of transmission were performed on the basis of a largely complete dataset. Sampling of infections within wards was likely very close to complete, with sample collection from symptomatic patients and health care workers being conducted in parallel to asymptomatic screening of hospital staff. A screening programme for asymptomatic HCW was set up at CUH in April 2020 (Rivett et al., 2020) and voluntary weekly screening is currently offered to all HCW. SARS-CoV-2 seroprevalence among staff tested in CUH between 10th June and 7th August 2020 was 7.2% (Cooper, 2020). The five outbreaks described here occurred earlier in the pandemic (March to June 2020), when staff seroprevalence would have been lower. The proportion of staff with neutralising antibodies during the outbreaks was therefore low, and likely played a minor role in transmission dynamics. Sequencing was attempted for all positive samples; across the green wards, a high-quality consensus viral genome (>80% unambiguous nucleotides with no more than one ambiguous nucleotide at a variant site), in addition to data describing their location during the period of the study, was available for 55 out of 71 individuals for whom data were collected.

We acknowledge several limitations to our study. There is potential for missing or incomplete data, with some aspects of the data more vulnerable than others to omission. Asymptomatic screening was offered to all staff working on the five wards analysed in the study during the outbreaks. It is theoretically possible that HCW could have caught COVID-19 early on in the outbreaks and cleared the virus quickly, becoming negative at time of testing, or caught the virus asymptomatically after the screening test, or had levels of virus below the detection limit of the assay (and thus have been false negatives). However, levels of SARS-CoV-2 RNA below the assay detection limit are unlikely to be infectious (and thus not significant for the inferred transmission networks), and overall HCW testing coverage was high. Testing of asymptomatic patients varied by ward. Asymptomatic screening was done for all patients on Wards A and B during the outbreaks, and all patients entering Ward E (the only ‘red’ ward included in the study) were known SARS-CoV-2 positives. However, for Wards C and D, systematic asymptomatic screening of all patients on the ward during the outbreaks was not performed, and it is possible some asymptomatic infections (that could have contributed to the transmission networks) were missed. Data describing the wards on which patients were treated is likely to be complete, but the same ward data for HCWs may miss the potential for interactions between workers outside of their base wards, for example in communal non-clinical areas within the hospital. Missing location data would lead to the non-identification of genuine contacts; our approach may therefore underestimate the number of infections caused by transmission between health care workers. Where data were missing, our method does not attempt to identify cases of indirect transmission, for example invoking the presence of unobserved individuals. Only potential cases of transmission that were compatible with a model of direct transmission were included in our networks. The number of superspreader individuals identified in this study (five) is too small to draw general conclusions on superspreader characteristics. Moreover, it is not possible to disentangle whether superspreading was driven mainly by individual factors (such as infectivity or behaviour) or environmental factors (such as patient placement and ventilation at time of peak infectivity), or a combination of these. Ct values can vary for many reasons including the timing of sampling during COVID-19 infection, sampling type and technique, viral transport, sample preparation and variability between PCR runs. The finding that Ct values did not vary significantly between superspreader and non-superspreader individuals should therefore be interpreted with caution.

In conclusion, we have here applied a combined statistical approach to infer and examine SARS-CoV-2 transmission networks within a hospital environment during the first wave of the pandemic in the United Kingdom. For the largest ward outbreaks of hospital-onset COVID-19, the majority of transmission was driven by a small proportion of individuals. Future developments could include exploring the impact of variables that may be associated with an increased transmission risk. Examples would include novel SARS-CoV-2 variants such as B.1.1.7 and B.1.617.2, which appear to be more readily transmissible (Rambaut et al., 2020), patient characteristics such as immunosuppression, which are associated with prolonged viral shedding (Avanzato et al., 2020), and environmental factors such as patient placement and room ventilation. Nevertheless, this unusually comprehensive dataset has provided detailed insights into the processes of hospital-based transmission. Combining data from multiple sources into a single analysis provides increased resolution and insight into the pervasive problem of nosocomial viral transmission.

Author details

Christopher JR Illingworth (contributed equally; for correspondence)
William L Hamilton (contributed equally)
Ben Warne
Matthew Routledge
Ashley Popay
Chris Jackson
Tom Fieldman
Luke W Meredith
Charlotte J Houldcroft
Myra Hosmillo
Aminu S Jahun
Laura G Caller
Sarah L Caddy
Anna Yakovleva
Grant Hall
Fahad A Khokhar
Theresa Feltwell
Malte L Pinckert
Iliana Georgana
Yasmin Chaudhry
Martin D Curran
Surendra Parmar