
Understanding the leading indicators of hospital admissions from COVID-19 across successive waves in the UK

Published online by Cambridge University Press:  04 September 2023

Jonathon Mellor
Affiliation:
UK Health Security Agency, Data, Analytics and Surveillance, Nobel House, London, UK
Christopher E Overton
Affiliation:
UK Health Security Agency, Data, Analytics and Surveillance, Nobel House, London, UK Department of Mathematical Sciences, University of Liverpool, Liverpool, UK Department of Mathematics, University of Manchester, Manchester, UK
Martyn Fyles
Affiliation:
UK Health Security Agency, Data, Analytics and Surveillance, Nobel House, London, UK Department of Mathematics, University of Manchester, Manchester, UK
Liam Chawner
Affiliation:
UK Health Security Agency, Data, Analytics and Surveillance, Nobel House, London, UK
James Baxter
Affiliation:
UK Health Security Agency, Data, Analytics and Surveillance, Nobel House, London, UK
Tarrion Baird
Affiliation:
UK Health Security Agency, Data, Analytics and Surveillance, Nobel House, London, UK Department of Pathology, University of Cambridge, Cambridge, UK
Thomas Ward*
Affiliation:
UK Health Security Agency, Data, Analytics and Surveillance, Nobel House, London, UK
Corresponding author: Thomas Ward; Email: Tom.Ward@ukhsa.gov.uk

Abstract

Following the end of universal testing in the UK, hospital admissions are a key measure of COVID-19 pandemic pressure. Understanding leading indicators of admissions at National Health Service (NHS) Trust, regional, and national geographies helps health services plan for ongoing pressures. We explored the spatio-temporal relationships of leading indicators of hospitalisations across SARS-CoV-2 waves in England. This analysis includes an evaluation of internet search volumes from Google Trends, NHS triage calls and online queries, the NHS COVID-19 app, lateral flow devices (LFDs), and the ZOE app. Data sources were analysed for their feasibility as leading indicators using Granger causality, cross-correlation, and dynamic time warping at fine spatial scales. Google Trends and NHS triages consistently led admissions temporally in most locations, with lead times ranging from 5 to 20 days, whereas an inconsistent relationship was found for the ZOE app, NHS COVID-19 app, and LFD testing, which diminished with spatial resolution, showing cross-correlation leads of between –7 and 7 days. The results indicate that novel surveillance sources can be used effectively to understand the expected healthcare burden within hospital administrative areas, though the temporal and spatial heterogeneity of these relationships is a key determinant of their operational public health utility.

Type
Original Paper
Creative Commons
CC BY
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© The Author(s), 2023. Published by Cambridge University Press

Introduction

The cessation of mass community testing has hampered the ability to produce contemporary estimates of localised COVID-19 growth. Epidemiological prevalence surveillance studies such as the Office for National Statistics (ONS) COVID-19 Infection Survey (CIS) were produced with inconsistent spatial sampling and with reporting lags too long for real-time public health policy. Therefore, the surveillance of hospital admissions and bed occupancy may more reliably capture the true growth of COVID-19 in the community, in addition to the current pressures on health services. Understanding leading indicators of hospital admissions, with improved spatial precision, allows the National Health Service (NHS) to prepare appropriately, and policymakers to plan interventions. Hospital admissions are less influenced by testing ascertainment rates than community testing is; however, they are more affected by the age composition of incidence, due to the age severity gradient, and are therefore less able to detect early growth in younger ages.

A variety of model structures have been employed to predict COVID-19 admission dynamics. For example, incorporating local testing data has been shown to improve admission forecasting at fine spatial scales over autoregressive models [1]. More complex Bayesian structural time series modelling techniques have been developed across several countries to forecast dynamics nationally [2]; however, these univariate time series models struggle at epidemic turning points. In addition, causal approaches aimed at capturing herd immunity effects have been employed [3], though these methods rely on assumption-driven scenarios. Transmission modelling has been used throughout the pandemic; however, such models are fitted at coarse spatial scales to reduce model uncertainty [4], which limits their operational utility.

There has been work exploring indirect surveillance approaches for disease incidence. For example, perturbations in Google Trends search terms have been shown to precede cases and deaths at a national level [5] and have been incorporated in neural network architectures to forecast clinical risk in the UK [6]. Google Trends search queries have also improved model predictive performance for other epidemic metrics of interest, including case rates, hotspot detection [7], and deaths at the US state level [8]. However, the surveillance data sources are not limited to search engine records; during the COVID-19 pandemic, mobility measurements, social media, and wearable technology have been explored to forecast healthcare pressures [9–12]. These applications are broader than just COVID-19 – digitised syndromic surveillance (including search engines, news reports, social media, clinician search queries, and crowdsourcing apps) has been used effectively to monitor other disease pressures, including influenza and Zika [13–17].

Indicator–admissions temporal relationships are not consistent over time, due to changes in behaviour, population immunological response, testing coverage, and antigenic drift/shift. Novel SARS-CoV-2 variants have had distinct epidemiological characteristics affecting this temporal relationship. The extent of a variant's evasion of prior immunity impacts the incidence growth rate and the rate of spatial dispersion [18]. Novel variants have unique severity profiles [19]; for instance, relative to wild type, the Alpha variant was estimated to have a 62% increased risk of hospitalisation (HR 1.62; 95% CI: 1.48–1.78) [20]. This evolving relationship between infection and hospitalisation impacts the temporal relationship between indicators and admissions. The COVID-19 vaccination campaign began in the UK in December 2020, reaching 150 million total doses across the first, second, spring, and autumn booster doses [21, 22]. Vaccination, in combination with high population infection attack rates over successive waves of SARS-CoV-2 incidence, has led to an increasingly complex picture of immunity, at individual and population levels, against SARS-CoV-2 infection [23, 24], which impacts syndromic surveillance efforts to understand the spatio-temporal infection burden.

We have evaluated leading indicators of COVID-19 hospitalisations during the Omicron BA.1, BA.2, and BA.4/5 variant waves of 2021/2022. This analysis has been conducted at the NHS Trust geographic scale (local groups of secondary care providers) [25]. We use a variety of methods to assess temporal relationships between each indicator and COVID-19 hospital admissions at a high spatial resolution, including Granger causality, cross-correlation analysis, and dynamic time warping.

Methodology

The data assessed were available at different geographic designations and with varying quality. Hospital admission counts by date are provided by the NHS England (NHSE) daily COVID-19 hospital situational report [26], which contains Trust-level hospital admissions stratified by age, along with bed occupancy and staff absence counts. Google Trends [27] data were curated to capture search query trends relevant to syndromic surveillance of COVID-19. NHS 111 calls and online pathways [28] were provided by the NHS, with COVID-19-relevant treatment pathways extracted. ZOE Health [29] provided counts of crowdsourced self-reported symptoms from the ZOE app. Lateral flow device (LFD) testing data were accessed from the UK COVID-19 dashboard [30]. Aggregated NHS COVID-19 app [31] metrics were extracted from data made available by the UKHSA.

Several other data sources were explored for feasibility but were excluded from this study as they were determined to be of limited utility in an operational context. Data sources with a transfer/access latency of greater than one week were excluded, as were data sources that lagged admissions. Primary care and general practitioner calls were excluded due to incomplete national spatial coverage. School attendance reports were not evaluated due to substantial reporting lags and limited data availability, which hamper utility in operational settings. PCR testing data were excluded due to changes in mass testing policies, which impacted eligibility and spatial coverage. Care home data were excluded due to highly heterogeneous spatial coverage and a lagging relationship with community transmission. The ONS COVID-19 Infection Survey of community positivity was explored, and we estimated a high correlation with hospital admissions (see Supplementary Section A, Figure S1). However, further analysis of these data was not conducted, as the quantity of sampled tests in this infection study was not informative at smaller geographic levels (like NHS Trust), nor is it released in a timeframe useful for real-time analysis. Data availability is given in Supplementary Table S1. We analysed the data sources in real time; however, for lateral flow device data, the analysis was conducted using specimen date, which is impacted by data correction over time.

Data

Admissions

English hospital admissions, provided by NHS England [26], are reported at a Trust level – a collection of secondary care providers. A COVID-19 admission is defined as a patient who had a positive test upon arrival to hospital, or within the past 24 hours while an inpatient. These counts therefore include admissions for COVID-19, incidental presentations, and hospital-acquired infection. NHSE data are provided daily, with each individual Trust submitting the web form by 11 am for the preceding 24 hours. Due to the fast operational turnaround and the considerable number of hospitals, there are some missing entries per day and occasional inaccurate values reported. The admissions data were reconstructed to provide one record per Trust per day, with remaining missing values imputed using the last observation carried forward. Organisational mergers were coded manually, ensuring records were accurate to the study end date. Hospitals with fewer than 10 admissions in 2022 or clear non-acute specialisations were removed from the analysis, as they represented misreporting, non-COVID specialisations, or purely incidental admissions, leaving 121 acute Trusts. Admissions are presented from 1 October 2021 to 29 August 2022, covering the end of the Delta plateau and the BA.1, BA.2, and BA.4/5 Omicron waves of 2021/2022.
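A minimal sketch of the gap-filling step is shown below, assuming a long-format data frame with hypothetical columns trust, date, and admissions; tidyverse tooling is our assumption, as the paper does not state its implementation.

```r
library(dplyr)
library(tidyr)

# Hypothetical input: one row per Trust per reporting day, with gaps.
# complete() inserts the missing Trust-days; fill() then carries the
# last observed admissions count forward within each Trust.
admissions_clean <- admissions_raw %>%
  group_by(trust) %>%
  complete(date = seq(min(date), max(date), by = "day")) %>%
  arrange(date, .by_group = TRUE) %>%
  fill(admissions, .direction = "down") %>%
  ungroup()
```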

Surveillance data

Google Trends

A large set of potentially predictive Google Trends search terms was analysed in previous research to determine which terms should be evaluated [6]. Initially, over 1,000 terms were captured from the most common phrases used within NHS 111 Pathways telephonic COVID-19 triages, COVID-19 symptoms, over-the-counter medicines, and natural language variations on requests for tests. These terms were screened for relevance and relative occurrence at a national level within the Google Trends web interface. Generalised additive models with a negative binomial error structure, together with dynamic time warping, were then used at a national level to assess each term's relevance to COVID-19 incidence.

From the wide range of potential terms, 84 COVID-19-relevant Google Trends search term volume scores were collected hourly and aggregated to daily scores per lower tier local authority (LTLA) across the UK [6]. An LTLA is an administrative geography within the UK, capturing a district of local government, smaller than a county. The Google Trends scores relate to a ranked relative search volume in a city; we transform the city-level data to LTLAs using a coordinate mapping. The LTLA granularity for the London region is limited, as Google defines London as central London plus outer London areas, reducing precision when mapping to London LTLAs. Due to the considerable number of Google search terms collected, similar terms were grouped together with the aim of increasing the signal-to-noise ratio. COVID-19 symptom search terms were grouped into 'common', 'rare', and 'severe' terms. Terms relating to general symptoms and tests were combined. General COVID-19 terms such as 'coronavirus' were grouped, and terms such as 'tier system', relating to policies no longer in effect, were combined. The terms and processing logic used are outlined in Supplementary Section B, and a schematic of the grouping is sketched below.
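Illustratively, assuming a long-format table trends with hypothetical columns (ltla, date, term, score), the grouping reduces to a lookup and aggregation; the specific terms and group labels below are invented examples, not the full mapping from Supplementary Section B.

```r
# Hypothetical lookup from search term to group.
term_groups <- c(
  "loss of taste" = "symptoms_common",
  "covid tongue"  = "symptoms_rare",
  "chest pain"    = "symptoms_severe",
  "coronavirus"   = "covid_general",
  "tier system"   = "defunct_policy"
)

# Attach the group label and sum scores within each group per LTLA-day.
trends$group <- term_groups[trends$term]
grouped <- aggregate(score ~ ltla + date + group, data = trends, FUN = sum)
```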

NHS 111 pathways

NHS 111 provides non-emergency health advice based on an individual's symptoms [28]. A user follows the triage process, inputs symptoms, and receives healthcare advice (a treatment pathway), through either the online or telephone service. Data are presented as counts by day for given pathway outcomes, stratified by age, gender, and Lower Super Output Area (LSOA), an area more granular than an LTLA. COVID-19 treatments are aggregated into clinical assessment, ambulance, or self-care, stratified by age. Ages were binned into three groups (0–19, 20–59, and 60 and over) to increase the counts per group.

LFD tests

Lateral flow devices (LFDs) are self-administered rapid tests allowing for real-time detection of COVID-19 by an individual. The tests were provided free universally until 1 April 2022, with the data obtained from the UK COVID-19 Dashboard [30]. An aggregation of positive and total test counts was made available daily at the LTLA level. There can be latency in this dataset due to upload delays against the specimen date of a test; however, we did not have access to the historical real-time data, and the analysis therefore used the complete backfilled data. From the LFD data, a test positivity rate was calculated using the positive and total test counts, and a metric of test counts per capita was calculated using the associated population size of an NHS Trust.
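The derived metrics are simple ratios; a sketch with hypothetical column names (positive_tests, total_tests, trust_population) follows.

```r
library(dplyr)

# Hypothetical columns: positive and total test counts per Trust-day,
# plus the weighted Trust population from the LTLA-to-Trust mapping.
lfd <- lfd %>%
  mutate(
    positivity       = positive_tests / total_tests,   # share of reported LFDs positive
    tests_per_capita = total_tests / trust_population  # testing effort per resident
  )
```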

NHS COVID-19 app

The NHS COVID-19 app [31] produces daily aggregated metrics of app events for analysis. These events cover both app-specific metrics, such as downloads, app store ratings, and user counts, and contact tracing-relevant processes, such as notifications of exposure. The data were provided at an LTLA level on a weekly release schedule within the UKHSA; the epidemiologically relevant events are contact exposure notifications of users and positive tests reported via the app interface.

ZOE COVID-19 study app

The ZOE app allows users to self-report COVID-19-relevant symptoms daily, with the aim of capturing an up-to-date picture of the pandemic [32]. The data are stratified by age, ethnicity, sex, healthcare worker status, and LTLA. Due to the sparsity of counts at this high resolution, the counts were aggregated as total symptom counts per LTLA per day. The different symptom counts were combined into the categories 'common', 'severe', 'rare', and 'irrelevant'. Groupings for all variables are further described in Supplementary Section C, Table S2.

Processing

Spatial mapping

NHS 111, ZOE app, LFD test, and NHS COVID-19 app leading indicators are reported in LTLA geographies (or geographies that are strict subregions of LTLAs) and therefore cannot be aggregated directly to hospital admissions at a Trust level. A mapping is therefore required to relate LTLA-level data sources to NHS Trust admissions. Using the methodology of the covid19.nhs.data R package [33], we produced a more contemporary probabilistic mapping using count data from the SUS APC (Secondary Uses Service, Admitted Patient Care) hospital admissions database. The data extracted were from the 6 months preceding the study and contain test-confirmed admissions (at a Trust level) and discharge locations (the associated LTLA). From these records, we calculate the proportion of people from a local area (LTLA) who were admitted to a specified Trust. Using the proportion of people from an LTLA who attend a Trust, we use the residential population size of that LTLA to calculate a weighted population size for the Trust, referred to as the population size in this study. The LTLA residential population counts were obtained from the Office for National Statistics 2019 mid-year population estimates. The same mapping between LTLA and Trust allows us to convert LTLA indicator data sources to the hospital Trust level. The distribution of populations across NHS Trusts is shown in Supplementary Figure S2, compared with counties and LTLAs, showing they are of a comparable scale. For the Google Trends data, a mapping was used which combines London geographies. Leading indicator sources are first aggregated to the LTLA level and then transformed with the Trust–LTLA mapping using a weighted sum to determine the effective impact of an indicator on a Trust.
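A minimal sketch of the weighted aggregation, assuming a hypothetical long-format mapping table (trust, ltla, proportion) and indicator table (ltla, date, value); dplyr is our assumption, not the paper's stated tooling.

```r
library(dplyr)

# Weighted sum of LTLA indicator values into Trust-level values, using
# the proportion of each LTLA's admitted patients who attend the Trust.
indicator_trust <- mapping %>%
  inner_join(indicator_ltla, by = "ltla") %>%
  group_by(trust, date) %>%
  summarise(value = sum(proportion * value), .groups = "drop")

# Weighted Trust population, used for per-capita metrics.
trust_pop <- mapping %>%
  inner_join(ltla_population, by = "ltla") %>%
  group_by(trust) %>%
  summarise(population = sum(proportion * population), .groups = "drop")
```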

Scaling and smoothing

All indicators were smoothed using a locally estimated scatterplot smoothing (LOESS) method to reduce noise and scaled between 0 and 1 by Trust to allow comparison between hospitals. For dynamic time warping, the variables were z-score normalised.
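A minimal sketch of the per-Trust smoothing and scaling for a single daily series; the LOESS span value here is illustrative, as the paper does not report the one used.

```r
# LOESS smooth against time, then min-max scale to [0, 1].
smooth_scale <- function(x, dates, span = 0.2) {
  t   <- as.numeric(dates)
  fit <- predict(loess(x ~ t, span = span))   # fitted values at observed days
  (fit - min(fit, na.rm = TRUE)) /
    (max(fit, na.rm = TRUE) - min(fit, na.rm = TRUE))
}

# For dynamic time warping, the series are z-score normalised instead:
z_scale <- function(x) as.numeric(scale(x))
```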

Evaluation

Data sources were evaluated against present admissions and against admissions in 14 days. We looked at present admissions to understand the use of an indicator as a real-time proxy for admissions. Following discussion with public health practitioners, 14 days was determined to be a meaningful time window in which to act upon leading indicator insight. Consultation on the time length included those creating situational awareness products, senior public health leaders, and colleagues within the NHS. Leading relationships of less than 14 days are still of interest for surveillance.

To understand how the leading relationships have changed over time, the data are analysed across the recent Omicron epidemic waves, with each wave being evaluated independently for the Granger causality and cross-correlation approaches. The timings defined for these waves are given in Supplementary Table S3.

Granger causality

The Granger causality test estimates whether one time series (the indicator) can linearly forecast another time series (admissions), not whether there is a causal relationship [34]. The test uses lagged time series and a combination of t-tests and F-tests to determine if the indicator meaningfully adds explanatory power for predicting admissions. Two regression models are constructed: one contains the explanatory time series (the indicator) and the other does not. The comparison between the two models tells us whether the explanatory time series adds useful information in predicting the response. We have two time series $x_t$ and $y_t$, the indicator and admissions, respectively, at time $t$. We then construct two regression equations, with $y_t$ explained firstly by lags of $y_t$ and lags of $x_t$, and secondly by lags of $y_t$ alone:

(1) $$ y_t = \alpha_0 + \sum_{j=1}^{m} \alpha_j y_{t-j} + \sum_{j=1}^{m} \beta_j x_{t-j} + \epsilon_t, $$
(2) $$ y_t = \alpha_0 + \sum_{j=1}^{m} \alpha_j y_{t-j} + \epsilon_t, $$

where $\alpha_j$ and $\beta_j$ are the regression coefficients for lag $j$. The null hypothesis is

$$ H_0 : \beta_1 = \beta_2 = \dots = \beta_m = 0, $$

and an F-test is performed on the two models (1) and (2) to determine the effect of $x$:

$$ F = \frac{(RSS_1 - RSS_2)/(p_2 - p_1)}{RSS_2/(n - p_2)}, $$

where $n$ is the number of data points, $p_1$ and $p_2$ are the numbers of parameters in (1) and (2), and $RSS$ is the residual sum of squares. We take the maximum lag as $m = 3$; therefore, lags of 1, 2, and 3 days are used, since larger numbers of lags reduce the power of the tests. Due to the spatial variation in trends and behaviours at the hospital level, Granger causality tests were performed per Trust, rather than at higher aggregations. A test is performed between an indicator and current hospital admissions, as well as between the indicator and hospital admissions in 14 days, to test the relationship at a practically useful temporal distance. Using both times, we can understand whether an indicator leads admissions and whether the lead is long enough to be useful.
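As an illustrative sketch, the per-Trust test can be run with the grangertest function from the R lmtest package, which fits the restricted and unrestricted regressions and reports the F-test; this is our choice of tooling, as the paper does not state its implementation, and the column names are hypothetical.

```r
library(lmtest)

# Does the indicator Granger-cause admissions at lags 1..3 for one Trust?
# grangertest() fits models (2) and (1) above and returns the F-test.
test_now <- grangertest(admissions ~ indicator, order = 3, data = trust_df)
p_now    <- test_now$`Pr(>F)`[2]

# Repeat against admissions 14 days ahead to test the operationally
# useful horizon; dplyr::lead() shifts the series forward.
trust_df$adm_14 <- dplyr::lead(trust_df$admissions, 14)
test_14 <- grangertest(adm_14 ~ indicator, order = 3, data = na.omit(trust_df))
p_14    <- test_14$`Pr(>F)`[2]
```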

Cross-correlation

A linear time delay analysis using cross-correlation functions (CCFs) allows us to calculate the cross-correlation between indicators and admissions, producing scores over different lead times [35].

Given two time series $x_t$ and $y_t$, where $t = 0, 1, 2, \dots, N-1$, with respective means $m_x$ and $m_y$, the cross-correlation $R_{xy}$ at delay $d$ is

$$ R_{xy}(d) = \frac{\sum_t (x_t - m_x)(y_{t-d} - m_y)}{\sqrt{\sum_t (x_t - m_x)^2}\, \sqrt{\sum_t (y_{t-d} - m_y)^2}}. $$

We define an 'optimal lead time' as the lead day $d$, within 30 days, with maximum CCF between indicator and admissions. A 30-day limit was selected to avoid detecting periodic effects in the growth–peak–decline–plateau cycle of admissions.
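A minimal sketch of the optimal-lead extraction using base R's ccf, with hypothetical per-Trust vectors; note ccf's sign convention, under which a peak at a negative lag means the first series leads the second.

```r
# Cross-correlate one Trust's indicator with its admissions series.
# In ccf(x, y), the value at lag k is the correlation of x[t + k] with
# y[t], so a peak at a negative lag means the indicator leads admissions.
cc   <- ccf(indicator, admissions, lag.max = 30, plot = FALSE)
lags <- drop(cc$lag)
vals <- drop(cc$acf)

optimal_lead <- -lags[which.max(vals)]  # days by which the indicator leads
ccf_at_14    <- vals[lags == -14]       # strength at the 14-day horizon
```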

Dynamic time warping

Dynamic time warping (DTW) calculates a non-linear alignment between two sequences of values. The algorithm creates a mapping (warping curve) between the sequences, which we analyse to understand how indicators and admissions relate over time. The algorithm aims to find the minimal path along the warping curve which aligns the two time series, applied using the R dtw package [36]. Further details on how the method works are provided in [37].

We aim to find the optimal warping curve $\phi$ between $x_i$ and $y_j$. For DTW, each index of time series $x_i$ must match at least one index of $y_j$, and successive matches must be monotonically non-decreasing in both indices. The algorithm finds the index matching that minimises $d_\phi$, the sum of absolute differences in value for each matched pair of indices:

$$ D(x, y) = \min_{\phi} d_{\phi}(x, y). $$

The warping between time series can be calculated in a multivariate context (multiple pairs of time series) by minimising the joint distance between indices $i$ and $j$ of $x$ and $y$ across the $C$ channels:

$$ d(i, j)^2 = \sum_{c=1}^{C} (x_{ic} - y_{jc})^2. $$

To produce sensible results, we place restrictions on the warping curve. Firstly, a 35-day window (specifically a Sakoe–Chiba band) is used to avoid unrealistically long lead times. An asymmetric step pattern with a P2 slope constraint was chosen to allow for a leading alignment between the time series. The specifics of these parameters are addressed in further detail within the DTW literature [37, 38]. From the DTW, we can understand the alignment and lead/lag relationship between the time series in a non-linear fashion, identifying leads at different points in time and epidemic phases by analysing which index pairs are matched. Open start and end conditions were chosen to best capture the leading relationships, as not all sequence points of the time series are available, which can cause a 'bunching' effect at the beginning or end that distorts calculated lead times [39]. We analyse the alignments by calculating the difference between each index of the indicator and its matched index in the admissions series, which tells us how far the indicator is ahead of admissions – a lead time. Additionally, as part of the DTW algorithm, we can extract the cumulative warping between the indicator and admissions, which is a measure of how much manipulation is needed (how different the time series are) to map between the sequences – the warping distance. As time series can be of different lengths, or have partial matches, we use the normalised distance to compare how well the indicators match admissions, with greater distances corresponding to larger leads.
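As a concrete illustration, the alignment can be computed with the R dtw package cited above; the following is a minimal sketch under the stated constraints, with variable names (indicator_z, admissions_z) of our own invention.

```r
library(dtw)

# Align a z-scored indicator (query) with admissions (reference):
# Sakoe-Chiba band of 35 days, asymmetric P2 step pattern, open ends.
aln <- dtw(indicator_z, admissions_z,
           step.pattern = asymmetricP2,
           window.type  = "sakoechiba",
           window.size  = 35,
           open.begin   = TRUE,
           open.end     = TRUE)

# Lead time per matched pair: how far ahead of its matched admissions
# index each indicator index sits.
lead_times <- aln$index2 - aln$index1

# Length-normalised warping distance, used to compare indicators whose
# series differ in length.
norm_dist <- aln$normalizedDistance
```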

Data operations

To be a useful indicator, a data source must be available in near-real-time, to capitalise on the leading information detected and make decisions before the associated admissions occur. Data sources with substantial reporting latency are not viable candidates for operational indicators. Such a 'leading indicator' may have led the measurement in question, but if it is not available to analyse in real time, it cannot affect real-time decisions; such sources were therefore not assessed. Additionally, completion lags, or backfilling, were considered a major limitation for some data sources, as they make the near-term data unreliable.

Results

The indicators in this study vary in their magnitude and timing relative to the successive Omicron admission waves – examples of this effect are shown in Figure 1.

Figure 1. A time series plot of example variables from each data source’s indicators, aggregated nationally, with reference to national admissions (red dashed line). Indicators that lead admissions well should appear shifted leftward of the admissions line. Indicators and admissions are scaled between 0 and 1 to allow for easy visual comparison of temporal offsets.

Granger causality

Under the assumption of linearity, the Granger tests examine the utility of an indicator in predicting future hospital admissions. A distribution of p-values centred low corresponds to an indicator with a strong leading relationship across many Trusts. Figure 2 shows the distribution of p-values for the leading relationship of each indicator. The NHS 111 and Google Trends indicators show consistent leading relationships across waves for some variables. For other indicators, such as LFD tests and the NHS COVID-19 app, there is high variation in relationship strength across waves. Each wave has unique spatio-temporal indicator relationships; for example, the LFD tests were estimated to have their most robust leading association with hospital admissions during the BA.2 period of growth, whereas the strength of association with the NHS 111 variables reduced during this wave.

Figure 2. The distribution of p-values across Trusts from Granger causality tests. The tests are performed at the Trust level for each separate wave between indicator and admissions. Low p-values tell us the indicator leads admissions well, given the conditions of the test.

Figure 3 shows whether an indicator has a lead of 14 days or greater, which indicates whether actionable insight can be derived for real-world applications. Low distributed p-values indicate a leading relationship at 14 days. Again, a majority of Trusts have low p-values for the Google Trends and NHS 111 variables, and for the waves available, the ZOE symptom data also lead well. However, as the ZOE data are truncated for BA.2, the Granger causality is potentially just capturing the growth phase, which is easier to predict linearly. Even the strongest indicators have multiple Trusts with high p-values, highlighting the spatial variation in the relationship.

Figure 3. The distribution of p-values across Trusts from Granger causality tests. The tests are performed at the Trust level for each separate wave between indicators and admissions in 14 days. Low p-values tell us the indicator leads admissions in 14 days well given the conditions of the test.

Cross-correlation

We used cross-correlation analysis to assess each indicator at a range of temporal offsets against admissions. We extracted optimal lead times and CCF values across Trusts and waves. By calculating the cross-correlation between indicator and admissions under linear relationship assumptions at each time offset, we approximated the optimal lead time (Figure 4) and quantified the strength of the relationship at 14 days (Figure 5). It is important to look at both the 'best lead time' (optimal lead) and the strength at 14 days to understand the nuances of the relationship. Though we are operationally interested in the correlation at 14 days, a given indicator may in fact have a stronger relationship at more than 14 days, which could be more useful. An indicator may have a large optimal lead time, but that large lead is not useful if the correlation for the indicator is small. Conversely, an indicator may have a small optimal lead yet a high CCF at 14 days, due to a wide peak in CCF values across temporal offsets.

Figure 4. The distribution, via boxplot, of optimal lead times between the indicator and admissions across all Trusts for each wave. Optimal lead is defined as the lead time with the maximum non-negative CCF value within a 30-day forward and backward window. Larger optimal leads will be most useful for forecasting; wide ranges in optimal lead correspond to high spatial variation, and smaller variation indicates a more consistent lead. The indicators with higher average (across waves) optimal leads are sorted to the left.

Figure 5. The distribution of CCF values for the indicator and admissions at a 14-day lead across Trusts for each wave. High CCF values correspond to a high correlation between time series, and CCF values centred around zero would show that an indicator does not have a meaningful temporal lead against admissions. High variation in the CCF values shows how consistent leading relationships are across the different Trusts. The indicators with higher average (across waves) CCF values are sorted to the left.

The calculation of optimal lead time is sensitive to its definition and to perturbation when CCF peaks are wide across lead times. As some proportion of Trusts will not have a leading relationship for an indicator, we expect some optimal leads to be either negative (lags) or noisy across waves. An example cross-correlation function visualisation is provided in Supplementary Figure S3, showing the correlation between NHS 111 self-care for ages 0–19 years and hospital admissions.

An optimal lead time does not necessarily imply a strong leading relationship; it indicates only that the specific time offset is the lead of highest correlation. For example, as the maximum window is 30 days, maximum correlations at 30 days may indicate a poor signal or an out-of-phase relationship. From the distribution of optimal lead times in Figure 4, we can see there is substantial variation across the waves for individual indicators. For Google Trends and NHS 111, there is a wide range in optimal lead times for the BA.1 wave, whereas the LFD variables instead show lagged relationships, centred below zero. For a given indicator, the sign of the lead can vary substantially across Trusts. While there is a median lead of >10 days for the ZOE symptom variables, the lead reduces in the truncated BA.2 wave, as shown in Figure 1.

By looking at the cross-correlation at a 14-day offset, we can see how related the indicators are to admissions 14 days in the future. A high CCF value indicates a strong relationship at a fixed temporal offset, though variables are likely to have stronger relationships (peak CCFs) at different offsets. The CCF values for the indicators are shown in Figure 5, where there is high variation between Trusts and variant waves. Some indicators are much more highly correlated than others, though which are strongest varies wave on wave. Most LFD indicators show stronger relationships in the later waves, for example. There are some negative CCF values, such as NHS 111 Clinical 0–19, which may mean the signal lags admissions – though it may also indicate the signals are out of phase at this specific time offset. The NHS COVID-19 app has consistently strong cross-correlation across indicators and waves.

Dynamic time warping

We first explore an example dynamic time warping (DTW) alignment for a single Trust and a single indicator to demonstrate the concept (Figure 6). There is a clear leading relationship between the indicator and admissions, shown by the diagonal index matches, though the effect decreases in the successive waves. For BA.2, a clearer tracking alignment is displayed, with vertical matches and peaks at the same indexes for indicator and admissions.

Figure 6. Dynamic time warping mapping between indicators and admissions, used to generate lead times. The DTW is shown for a single indicator and Trust. The solid time series is the variable being evaluated (the indicator), the dashed time series is admissions, and the lines between them are the aligned sequence pairs. Vertical lines indicate no temporal offset between the time series.

For this analysis, we focus on the multivariate case of the DTW algorithm, with all Trusts analysed simultaneously for a given indicator; an example is shown in Supplementary Figure S4. An initial period of the Delta plateau sequence is included to avoid boundary effects at the start of the series in the DTW algorithm; however, these sequence values are excluded from the reported results. Due to the study time period, an end period following the BA.4/5 wave was not available; we therefore expect boundary effects to be present in this wave, particularly for Google Trends, where the time series is further truncated.

In Figure 7, we show how the alignment lead times are distributed within a wave. Indicators such as Google Trends (Common, Rare, Entity, Symptom + Test) and NHS 111 (Clinical 0–19, Ambulance 0–19 and 20–59) had leads of greater than 10 days for the BA.1 and BA.2 waves. Using the linear cross-correlation method, however, the Google Rare indicator had a smaller lead in BA.2, highlighting how the results are sensitive to the approach. For some LFD variables, the performance increased over successive waves. For variables with strong leads in one wave, the lead was not necessarily replicated in successive waves.

Figure 7. The distribution of lead times calculated from sequence index matching between indicators and admissions across the different epidemic waves of study. The lead times correspond to the optimal time warping between indicator and admissions, with a higher value indicating a larger temporal lead.

The normalised distance is a measure of misalignment between the indicator and admissions produced by the dynamic time warping. While the step pattern chosen does not strictly produce a lead alignment for all types of signals, other analyses in this manuscript have shown linear lead times >0; we therefore assume that the normalised distances produced can be treated as a proxy for lead time length. A smaller normalised distance corresponds to a small warp required to align the indicator and admissions, meaning the indicator tracks the admissions well. Poorly aligned indicators therefore have greater normalised distances and correspondingly larger leads. Using a normalised distance allows comparison of time series of different lengths, such as the ZOE app data and Google Trends, which are truncated. In Figure 8, we see that NHS 111 and Google Trends have larger normalised distances than other indicators – they are not as aligned. This may be because they have a large lead/lag, which requires a large warping, or because their relationship is not consistent through time and location. The NHS COVID-19 app, LFD tests, and ZOE app have smaller normalised distances, indicating a smaller lead/lag or more consistent time shifts across geographies.

Figure 8. The normalised path distance produced by warping the space between the indicator and admissions across all three waves. Produced using multivariate DTW across Trusts. The normalised path distance indicates how much total warping is needed between indicators and admissions, a proxy for how big a lead time there is between time series.

Data operations

To be a useful indicator, data must be available in a timely and complete manner. While leading relationships can be evaluated statistically on historical data, how useful an indicator is in a practical setting depends on how quickly analysis can be made available. If there are significant reporting/completeness lags in the data being presented, they will erode any actionable lead times. The release frequencies and reporting/completeness lags of the indicators explored in this study are given in Table 1.

Table 1. The operational considerations at time of investigation

Note: A 1-day reporting/completeness lag corresponds to yesterday's collected data being available today. More frequent releases allow for more flexibility in times of increased demand for analysis, particularly in a fast-changing epidemic. The reporting/completeness lag 'eats into' lead times, reducing how useful a leading indicator is in practice.

Google Trends data are gathered in real time via an API and are therefore available without latency, which allows for daily analysis. The NHS 111 data are provided via egress from the NHS, which introduces a one-day lag. The weekly release schedules of the LFD testing and NHS COVID-19 app data reduce their operational effectiveness, as an analysis or model may be needed for policymaking at points in the week distant from the data release. The one-off provision of the ZOE data makes its operational effectiveness and limitations unknown without further exploration. How quickly data can be collected and transferred to analysts directly impacts their utility for decision making.

Discussion

Using a variety of statistical approaches, we highlight the potential utility of novel surveillance data, displaying their strengths and limitations for use as leading indicators of hospital admissions with COVID-19 at a high spatial resolution. The strength of the leading indicator relationship changes temporally and spatially across resurgent waves of COVID-19 incidence, and our results imply that a range of indicators should be used to accurately capture epidemic trends.

Consistent with the start of the pandemic [11], we show that Google Trends is an effective leading indicator of the healthcare burden associated with SARS-CoV-2 transmission when extended to fine spatial resolutions. Existing work on COVID-19 indicator analysis at fine spatial scales explores cases [1], which we expand upon to show that there are other novel, viable indicators at these scales. Building on the data collection approach used to determine clinical risk at low spatial geographies [6], this analysis shows that the utility of Google Trends data extends to hospital admissions at hospital Trust, rather than administrative, geographies. We also extend work on forecasting US hospital admissions in 2021 [40], showing the effectiveness of Google Trends as a leading indicator within the Omicron waves.

The ZOE app was created to measure current COVID-19 symptomatic prevalence [32]. We found a temporal lead during the BA.1 wave that degrades by the BA.2 wave, likely due to declining usership. We introduce the NHS COVID-19 app, which, as well as reducing COVID-19 incidence [41], could have utility as a forecasting data source, though its usefulness will depend on continued widespread usage and user notifications [42]. Previous literature shows that univariate epidemic signals have limited reliability across waves [43], which corroborates our findings that there will be differences between waves, admissions, and indicators. This analysis shows that NHS 111 Pathways are a valuable leading indicator of COVID-19 hospital admissions, in agreement with work done within a single NHS Trust [44]. We extend these findings across England, beyond the first COVID-19 wave, and further enrich the data using the 111 Online Pathway.

There is large spatial variation in the indicators' relationships with Trust-level admissions. At higher geographical scales and in aggregate, there are trends, but these trends are not always reliable at a local level, with changing lead times in different waves and locations. The explanation for the change depends on the indicator (such as testing policy for LFDs, media interest for Google Trends, and app uptake for the ZOE and NHS COVID-19 apps), which implies that indicators useful in the most recent wave are not guaranteed to work in the next wave, requiring active monitoring. Even within a wave, there is high variation, as shown by the variety in DTW lead times, which implies that leading indicators may be more performant at different epidemic phases. As the public consciousness and government policies shift away from COVID-19, and other respiratory pathogens resume circulation, we would expect the strength of some of these signals to degrade.

These findings have implications for developing forecasting models using these indicators. The large spatial variation implies that relying on Trust-level indicators alone will perform poorly for a proportion of Trusts. By using pooling techniques via hierarchical modelling, the Trusts with stronger leading indicators could inform those with weaker relationships. The temporal variation shows that we cannot assume a single fixed temporal relationship between indicators and admissions; therefore, to account for the variation, a range of temporal offsets should be considered by a model for use in prediction. The changing relationship between epidemic waves has multiple implications, primarily that no single indicator signal should be considered entirely temporally and spatially reliable for use in a forecasting model fit to historic data. Where the relationships change substantially over time, the effect should either be corrected for, or ad hoc signal exclusions applied. By using a range of indicators, rather than single variables, there is more likely to be a leading relationship somewhere in the set of variables chosen. Since individual indicators can be impacted by external events (policy changes, data collection, etc.), multiple different sources of indicators should be utilised.

Limitations and further study

There are several limitations to this analysis, which reflect either data availability or further research required. The Google Trends and ZOE app data were either unavailable for, or changed collection format during, parts of the study period, which reduces the reliability of the conclusions drawn from these data and prevents direct comparison with other indicators. The analysis of leading relationships included techniques that assume linear relationships (Granger causality and cross-correlation). If non-linearity is present, this assumption could provide only a partial picture of whether there is a leading relationship. Further research should employ more complex methods, such as kernel approaches to Granger causality [45].

The addition of dynamic time warping to this analysis sheds light on the potential non-linear relationships between indicators and admissions. The dynamic time warping showed a positive lead for the indicators with complete data, which supports this study's aim. The scaling and smoothing of the different indicators and admissions will substantially impact the results of the different analyses, since the low-level indicator trends are noisy. Scaling signals across the whole study period will decrease the accuracy of individual Trust-level analysis, since the magnitude of the measured indicator changes over time. To further understand the relationships between indicators and admissions, the signals should be modelled directly and compared; however, modelling was considered beyond the scope of this study and will be explored in future work. In addition, further work should explore factors such as Trust catchment size, acuteness, specialisations, and demographics, which impact the strength of the temporal and spatial relationships.

Work in this area should focus on how to forecast using spatially and temporally granular data streams for potentially large numbers of indicators. Further utilisation of time series methods would help support the understanding of indicator–outcome relationships and how best to harness them. A substantial body of work has looked at leading indicators of healthcare pressures at national or regional levels, but understanding how best to use indicators at a finer spatial scale increases their value in informing public health systems.

Conclusion

This work shows that novel surveillance data sources can be used to reliably understand the expected hospital burden from SARS-CoV-2 transmission. Modelling at a hospital Trust level requires an understanding of the spatial variation between the indicators and admissions, as well as the drift of the indicator–admissions temporal leads. We show that clinical, crowdsourced, and healthcare-seeking behaviour data at a fine spatial level have positive correlations at lead times, but these relationships are not stable between epidemic waves or constant across England. The variables selected for analysis were chosen to be as disaggregated as possible while maintaining sufficient counts, which allowed us to identify indicators with strong relationships at low geographies, rather than aggregated signals at higher geographies. This approach was shown to be valuable in understanding heterogeneity at fine spatial scales. While some data sources and indicators perform better overall, our analysis did not identify an indicator without substantial variability. Therefore, in practical public health operations, spatial and temporal heterogeneity should be considered in a modelling context, avoiding reliance on single signals, which can diverge or degrade from the quantity being forecast.

Supplementary material

The supplementary material for this article can be found at http://doi.org/10.1017/S0950268823001449.

Data availability statement

The UKHSA operates a robust governance process for applying to access protected data that considers:

  • the benefits and risks of how the data will be used

  • compliance with policy, regulatory, and ethical obligations

  • data minimisation

  • how the confidentiality, integrity, and availability will be maintained

  • retention, archival, and disposal requirements

  • best practice for protecting data, including the application of ‘privacy by design and by default’, emerging privacy conserving technologies and contractual controls

Access to protected data is always strictly controlled using legally binding data sharing contracts.

The UKHSA welcomes data applications from organisations looking to use protected data for public health purposes.

To request an application pack or discuss a request for UKHSA data you would like to submit, contact .

Author contribution

Investigation: C.O., J.B., J.M., L.C., M.F., T.B., T.W.; Methodology: C.O., J.M., T.W.; Writing – review & editing: C.O., J.M., T.W.; Formal analysis: J.B., J.M., L.C., M.F., T.B.; Conceptualization: J.M., T.W.; Data curation: J.M.; Project administration: J.M.; Validation: J.M., T.W.; Visualization: J.M., T.B.; Writing – original draft: J.M., T.B., T.W.; Supervision: T.W.

Financial support

The authors were employed by the UKHSA but received no specific funding for this study.

Competing interest

The authors declare none.

References

Meakin, S, Abbott, S, Bosse, N, Munday, J, Gruson, H, Hellewell, J, Sherratt, K, CMMID COVID-19 Working Group, Chapman, LAC, Prem, K, Klepac, P, Jombart, T, Knight, GM, Jafari, Y, Flasche, S, Waites, W, Jit, M, Eggo, RM, Villabona-Arenas, CJ, Russell, TW, Medley, G, Edmunds, WJ, Davies, NG, Liu, Y, Hué, S, Brady, O, Pung, R, Abbas, K, Gimma, A, Mee, P, Endo, A, Clifford, S, Sun, FY, McCarthy, CV, Quilty, BJ, Rosello, A, Sandmann, G, Barnard, RC, Kucharski, AJ, Procter, SR, Jarvis, CI, Gibbs, HP, Hodgson, D, Lowe, R, Atkins, KE, Koltai, M, Pearson, CAB, Finch, E, Wong, KLM, Quaife, M, O’Reilly, K, Tully, DC and Funk, S (2022) Comparative assessment of methods for short-term forecasts of COVID-19 hospital admissions in England at the local level. BMC Medicine 20(1), 115.CrossRefGoogle ScholarPubMed
Feroze, N (2020) Forecasting the patterns of COVID-19 and causal impacts of lockdown in top five affected countries using Bayesian structural time series models. Chaos, Solitons & Fractals 140, 110196.CrossRefGoogle ScholarPubMed
Friston, KJ, Flandin, G and Razi, A (2020) Dynamic causal modelling of COVID-19. Nature Scientific Reports 12, 12419.CrossRefGoogle Scholar
Birrell, P, Blake, J, van Leeuwen, E, Gent, N and de Angelis, D (2021) Real-time nowcasting and forecasting of COVID-19 dynamics in England: The first wave. Philosophical Transactions of the Royal Society B 376(1829), 20200279.
Lampos, V, Majumder, MS, Yom-Tov, E, Edelstein, M, Moura, S, Hamada, Y, Rangaka, MX, McKendry, RA and Cox, IJ (2021) Tracking COVID-19 using online search. NPJ Digital Medicine 4(1), 1–11.
Ward, T, Johnsen, A, Ng, S and Chollet, F (2022) Forecasting SARS-CoV-2 transmission and clinical risk at small spatial scales by the application of machine learning architectures to syndromic surveillance data. Nature Machine Intelligence 4, 814–827.
McDonald, DJ, Bien, J, Green, A, Hu, AJ, DeFries, N, Hyun, S, Oliveira, NL, Sharpnack, J, Tang, J, Tibshirani, R, Ventura, V, Wasserman, L and Tibshirani, RJ (2021) Can auxiliary indicators improve COVID-19 forecasting and hotspot prediction? Proceedings of the National Academy of Sciences 118(51), e2111453118.
Ma, S and Yang, S (2022) COVID-19 forecasts using internet search information in the United States. Nature Scientific Reports 12(1), 1–16.
Gerlee, P, Karlsson, J, Fritzell, I, Brezicka, T, Spreco, A, Timpka, T, Jöud, A and Lundh, T (2021) Predicting regional COVID-19 hospital admissions in Sweden using mobility data. Nature Scientific Reports 11(1), 1–8.
Fox, SJ, Lachmann, M, Tec, M, Pasco, R, Woody, S, Du, Z, Wang, X, Ingle, TA, Javan, E, Dahan, M, Gaither, K, Escott, ME, Adler, SI, Johnston, SC, Scott, JG and Meyers, LA (2022) Real-time pandemic surveillance using hospital admissions and mobility data. Proceedings of the National Academy of Sciences 119(7), e2111870119.
Kogan, NE, Clemente, L, Liautaud, P, Kaashoek, J, Link, NB, Nguyen, AT, Lu, FS, Huybers, P, Resch, B, Havas, C, Petutschnig, A, Davis, J, Chinazzi, M, Mustafa, B, Hanage, WP, Vespignani, A and Santillana, M (2021) An early warning approach to monitor COVID-19 activity with multiple digital traces in near real time. Science Advances 7(10), eabd6989.
Conroy, B, Silva, I, Mehraei, G, Damiano, R, Gross, B, Salvati, E, Feng, T, Schneider, J, Olson, N, Rizzo, AG, Curtin, CM, Frassica, J and McFarlane, DC (2022) Real-time infection prediction with wearable physiological monitoring and AI to aid military workforce readiness during COVID-19. Nature Scientific Reports 12(1), 1–12.
Ginsberg, J, Mohebbi, MH, Patel, RS, Brammer, L, Smolinski, MS and Brilliant, L (2009) Detecting influenza epidemics using search engine query data. Nature 457(7232), 1012–1014.
McGough, SF, Brownstein, JS, Hawkins, JB and Santillana, M (2017) Forecasting Zika incidence in the 2016 Latin America outbreak combining traditional disease surveillance with search, social media, and news report data. PLoS Neglected Tropical Diseases 11(1), e0005295.
Santillana, M, Nsoesie, EO, Mekaru, SR, Scales, D and Brownstein, JS (2014) Using clinicians’ search query data to monitor influenza epidemics. Clinical Infectious Diseases 59(10), 1446–1450.
Smolinski, MS, Crawley, AW, Baltrusaitis, K, Chunara, R, Olsen, JM, Wójcik, O, Santillana, M, Nguyen, A and Brownstein, JS (2015) Flu near you: Crowdsourced symptom reporting spanning 2 influenza seasons. American Journal of Public Health 105(10), 2124–2130.
Lampos, V and Cristianini, N (2010) Tracking the flu pandemic by monitoring the social web. In 2010 2nd International Workshop on Cognitive Information Processing, Elba, Italy. https://ieeexplore.ieee.org/document/5604088.
Jassat, W, Abdool Karim, SS, Mudara, C, Welch, R, Ozougwu, L, Groome, MJ, Govender, N, von Gottberg, A, Wolter, N, Wolmarans, M, Rousseau, P, DATCOV author group, Blumberg, L and Cohen, C (2022) Clinical severity of COVID-19 in patients admitted to hospital during the omicron wave in South Africa: A retrospective observational study. The Lancet Global Health 10(7), e961–e969.
Wrenn, JO, Pakala, SB, Vestal, G, Shilts, MH, Brown, HM, Bowen, SM, Strickland, BA, Williams, T, Mallal, SA, Jones, ID, Schmitz, JE, Self, WH and Das, SR (2022) COVID-19 severity from omicron and Delta SARS-CoV-2 variants. Influenza and Other Respiratory Viruses 16(5), 832–836.
Grint, DJ, Wing, K, Houlihan, C, Gibbs, HP, Evans, SJW, Williamson, E, McDonald, HI, Bhaskaran, K, Evans, D, Walker, AJ, Hickman, G, Nightingale, E, Schultze, A, Rentsch, CT, Bates, C, Cockburn, J, Curtis, HJ, Morton, CE, Bacon, S, Davy, S, Wong, AYS, Mehrkar, A, Tomlinson, L, Douglas, IJ, Mathur, R, MacKenna, B, Inglesby, P, Croker, R, Parry, J, Hester, F, Harper, S, DeVito, NJ, Hulme, W, Tazare, J, Smeeth, L, Goldacre, B and Eggo, RM (2022) Severity of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) alpha variant (B.1.1.7) in England. Clinical Infectious Diseases 75(1), e1120–e1127.
Curtis, HJ, Inglesby, P, Morton, CE, MacKenna, B, Green, A, Hulme, W, Walker, AJ, Morley, J, Mehrkar, A, Bacon, S, Hickman, G, Bates, C, Croker, R, Evans, D, Ward, T, Cockburn, J, Davy, S, Bhaskaran, K, Schultze, A, Rentsch, CT, Williamson, EJ, Rowan, A, Fisher, L, McDonald, HI, Tomlinson, L, Mathur, R, Drysdale, H, Eggo, RM, Wing, K, Wong, AY, Forbes, H, Parry, J, Hester, F, Harper, S, O’Hanlon, S, Eavis, A, Jarvis, R, Avramov, D, Griffiths, P, Fowles, A, Parkes, N, Douglas, IJ, Evans, SJ, Smeeth, L, Goldacre, B and The OpenSAFELY Collaborative (2021) Trends, regional variation, and clinical characteristics of COVID-19 vaccine recipients: A retrospective cohort study in 23.4 million patients using OpenSAFELY. British Journal of General Practice 72(714), e51–e62.
UK Government (2022) COVID-19 Pandemic in the United Kingdom. GOV.UK Coronavirus (COVID-19) in the UK – Vaccinations in the UK. Available at https://coronavirus.data.gov.uk/details/vaccinations (accessed 1 September 2022).
Dan, JM, Mateus, J, Kato, Y, Hastie, KM, Yu, ED, Faliti, CE, Grifoni, A, Ramirez, SI, Haupt, S, Frazier, A, Nakao, C, Rayaprolu, V, Rawlings, SA, Peters, B, Krammer, F, Simon, V, Saphire, EO, Smith, DM, Weiskopf, D, Sette, A and Crotty, S (2021) Immunological memory to SARS-CoV-2 assessed for up to 8 months after infection. Science 371(6529), eabf4063.
Khoury, J, Najjar-Debbiny, R, Elemy, A, Jabbour, A, Haj, J, Abu-Sini, M, Yasin, R, Amin, M, Hellou, E, Nasrallah, N, Saffouri, A and Hakim, F (2022) Immunity waning after COVID vaccine booster vs. infection—Better than expected. Infectious Diseases 54(11), 828–831.
National Health Service. NHS Trust. NHS Data Model and Dictionary. Available at https://www.datadictionary.nhs.uk/nhs_business_definitions/nhs_trust.html?hl=nhs%2Ctrust (accessed 10 July 2023).
NHS England. COVID-19 Hospital Activity. NHS England – Statistical Work Areas. Available at https://www.england.nhs.uk/statistics/statistical-work-areas/covid-19-hospital-activity/ (accessed 8 September 2022).
Google. FAQ about Google Trends Data. Trends Help. Available at https://support.google.com/trends/answer/4365533 (accessed 15 September 2022).
NHS Digital. NHS Pathways. NHS Digital Services. Available at https://digital.nhs.uk/services/nhs-pathways (accessed 8 September 2022).
NHS England. Available at https://www.gov.uk/government/organisations/nhs-england (accessed 8 September 2022).
UK Government. Testing in England. GOV.UK Coronavirus (COVID-19) in the UK. Available at https://coronavirus.data.gov.uk/details/testing?areaType=nation&areaName=England (accessed 1 September 2022).
UK Health Security Agency. NHS COVID-19 App. GOV.UK. Available at https://www.gov.uk/government/collections/nhs-covid-19-app (accessed 13 June 2022).
Menni, C, Valdes, AM, Freidin, MB, Sudre, CH, Nguyen, LH, Drew, DA, Ganesh, S, Varsavsky, T, Cardoso, MJ, el-Sayed Moustafa, JS, Visconti, A, Hysi, P, Bowyer, RCE, Mangino, M, Falchi, M, Wolf, J, Ourselin, S, Chan, AT, Steves, CJ and Spector, TD (2020) Real-time tracking of self-reported symptoms to predict potential COVID-19. Nature Medicine 26(7), 1037–1040.
Meakin, S, Abbott, S and Funk, S (2020) NHS Trust level Covid-19 data aggregated to a range of spatial scales. Available at https://doi.org/10.5281/zenodo.4447465.
Granger, CWJ (1969) Investigating causal relations by econometric models and cross-spectral methods. Econometrica: Journal of the Econometric Society 37, 424–438.
Vio, R and Wamsteker, W (2001) Limits of the cross-correlation function in the analysis of short time series. Publications of the Astronomical Society of the Pacific 113(779), 86.
Giorgino, T (2009) Computing and visualizing dynamic time warping alignments in R: The dtw package. Journal of Statistical Software 31, 1–24.
Senin, P (2008) Dynamic time warping algorithm review. Information and Computer Science Department, University of Hawaii at Manoa, Honolulu, USA 855, 1–23.
Sakoe, H and Chiba, S (1978) Dynamic programming algorithm optimization for spoken word recognition. IEEE Transactions on Acoustics, Speech, and Signal Processing 26(1), 43–49.
Tormene, P, Giorgino, T, Quaglini, S and Stefanelli, M (2009) Matching incomplete time series with dynamic time warping: An algorithm and an application to post-stroke rehabilitation. Artificial Intelligence in Medicine 45(1), 11–34.
Wang, T, Ma, S, Baek, S and Yang, S (2022) COVID-19 hospitalizations forecasts using internet search data. Nature Scientific Reports 12, 9661.
Wymant, C, Ferretti, L, Tsallis, D, Charalambides, M, Abeler-Dörner, L, Bonsall, D, Hinch, R, Kendall, M, Milsom, L, Ayres, M, Holmes, C, Briers, M and Fraser, C (2021) The epidemiological impact of the NHS COVID-19 app. Nature 594(7863), 408–412.
UK Health Security Agency. NHS COVID-19 App Statistics. NHS COVID App Support. Available at https://stats.app.covid19.nhs.uk/#contact-tracing-alerts (accessed 18 September 2022).
O’Brien, DA and Clements, CF (2021) Early warning signal reliability varies with COVID-19 waves. Biology Letters 17(12), 20210487.
Rostami-Tabar, B and Rendon-Sanchez, JF (2021) Forecasting COVID-19 daily cases using phone call data. Applied Soft Computing 100, 106932.
Marinazzo, D, Pellicoro, M and Stramaglia, S (2008) Kernel-Granger causality and the analysis of dynamical networks. Physical Review E, Statistical, Nonlinear, and Soft Matter Physics 77, 056215.
Figure 1. A time series plot of example variables from each data source’s indicators, aggregated nationally, with reference to national admissions (red dashed line). Indicators that lead admissions well should appear shifted to the left of the admissions line. Indicators and admissions are scaled between 0 and 1 to allow easy visual comparison of temporal offsets.
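
For readers reproducing this style of overlay, a minimal sketch of the 0-to-1 scaling on synthetic data is given below (Python; the series and variable names are illustrative assumptions, not the study's pipeline):

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
days = np.arange(200)

# Synthetic national series: an indicator peaking ~10 days before admissions.
indicator = np.exp(-0.5 * ((days - 90) / 20) ** 2) + rng.normal(0, 0.02, days.size)
admissions = np.exp(-0.5 * ((days - 100) / 20) ** 2) + rng.normal(0, 0.02, days.size)

def min_max_scale(x):
    """Scale a series to [0, 1] so temporal offsets can be compared by eye."""
    return (x - x.min()) / (x.max() - x.min())

plt.plot(days, min_max_scale(indicator), label="indicator")
plt.plot(days, min_max_scale(admissions), "r--", label="admissions")
plt.xlabel("day")
plt.legend()
plt.show()
```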

Figure 2. The distribution of p-values across Trusts from Granger causality tests. The tests are performed at the Trust level, for each separate wave, between each indicator and admissions. Low p-values indicate that the indicator carries statistically significant predictive information about admissions, given the conditions of the test.
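
A minimal sketch of a Trust-level test of this kind, using the grangercausalitytests function from statsmodels on synthetic data (the lag order and data-generating process are illustrative assumptions, not the authors' exact specification):

```python
import numpy as np
from statsmodels.tsa.stattools import grangercausalitytests

rng = np.random.default_rng(1)
n = 150

# Synthetic Trust-level series: admissions echo the indicator with a 7-day lag.
indicator = rng.poisson(50, n).astype(float)
admissions = np.concatenate([rng.poisson(50, 7).astype(float), indicator[:-7]])
admissions += rng.normal(0, 2, n)

# Column order matters: the test asks whether column 2 (the indicator)
# Granger-causes column 1 (admissions).
data = np.column_stack([admissions, indicator])
results = grangercausalitytests(data, maxlag=10, verbose=False)

# One p-value per candidate lag; here the SSR-based F test is reported.
for lag, res in results.items():
    p_value = res[0]["ssr_ftest"][1]
    print(f"lag {lag:2d}: p = {p_value:.4f}")
```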

Figure 3. The distribution of p-values across Trusts from Granger causality tests. The tests are performed at the Trust level, for each separate wave, between each indicator and admissions 14 days ahead. Low p-values indicate that the indicator carries statistically significant predictive information about admissions 14 days ahead, given the conditions of the test.
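
The 14-day-ahead variant can be sketched by shifting admissions back by the horizon before testing, so the test asks whether today's indicator explains admissions a fortnight later (again a hedged illustration on synthetic data, not the exact implementation):

```python
import numpy as np
from statsmodels.tsa.stattools import grangercausalitytests

rng = np.random.default_rng(2)
n = 150
horizon = 14

# Synthetic series in which admissions echo the indicator 16 days later.
indicator = rng.gamma(2.0, 10.0, n)
admissions = np.concatenate([rng.gamma(2.0, 10.0, 16), indicator[:-16]])
admissions += rng.normal(0, 2, n)

# Align the indicator at time t with admissions at time t + 14, then test.
pairs = np.column_stack([admissions[horizon:], indicator[:-horizon]])
results = grangercausalitytests(pairs, maxlag=7, verbose=False)
print(results[2][0]["ssr_ftest"][1])  # p-value at lag 2 should be very small
```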

Figure 4. The distribution, shown as boxplots, of optimal lead times between each indicator and admissions across all Trusts for each wave. The optimal lead is defined as the lead time with the maximum non-negative CCF value within a 30-day forward and backward window. Larger optimal leads are the most useful for forecasting; a wide range of optimal leads corresponds to high spatial variation, whereas a narrow range indicates a more consistent lead. Indicators with higher average (across waves) optimal leads are sorted to the left.
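
One hedged way to compute such an optimal lead is to scan Pearson correlations over shifts of −30 to +30 days and take the arg-max, roughly as below (synthetic data; the exact CCF estimator used in the paper may differ):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 250

# Smooth synthetic series; admissions trail the indicator by 12 days.
indicator = rng.normal(0, 1, n).cumsum()
admissions = np.concatenate([indicator[:12], indicator[:-12]]) + rng.normal(0, 0.5, n)

def lagged_corr(x, y, lag):
    """Correlation of x at time t with y at time t + lag (lag > 0: x leads)."""
    m = len(x)
    if lag >= 0:
        return np.corrcoef(x[: m - lag], y[lag:])[0, 1]
    return np.corrcoef(x[-lag:], y[: m + lag])[0, 1]

lags = np.arange(-30, 31)
ccf = np.array([lagged_corr(indicator, admissions, int(k)) for k in lags])
best = lags[np.argmax(ccf)]
print(f"optimal lead: {best} days (CCF = {ccf.max():.2f})")
```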

Figure 5. The distribution of CCF values between each indicator and admissions at a 14-day lead, across Trusts for each wave. High CCF values correspond to strong correlation between the time series; CCF values centred around zero indicate that an indicator has no meaningful temporal lead over admissions. High variation in CCF values shows that the leading relationship is inconsistent across Trusts. Indicators with higher average (across waves) CCF values are sorted to the left.
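
Evaluating the same statistic at a single, fixed 14-day lead reduces to one correlation per Trust, for example (illustrative synthetic data only):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 250
indicator = rng.normal(0, 1, n).cumsum()
admissions = np.concatenate([indicator[:14], indicator[:-14]]) + rng.normal(0, 0.5, n)

lead = 14
# Correlate the indicator at time t with admissions at time t + 14.
ccf_14 = np.corrcoef(indicator[:-lead], admissions[lead:])[0, 1]
print(f"CCF at a 14-day lead: {ccf_14:.2f}")
```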

Figure 6. Dynamic time warping (DTW) mapping between indicators and admissions, used to generate lead times. The DTW alignment is shown for a single indicator and Trust. The solid time series is the indicator being evaluated, the dashed series is admissions, and the connecting lines are the aligned sequence pairs. Vertical lines indicate no temporal offset between the time series.
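
This style of alignment can be reproduced with the dtw-python package (the Python port of the R dtw package cited above); a minimal sketch on synthetic single-Trust series, with parameter choices that are assumptions rather than the study's settings:

```python
import numpy as np
from dtw import dtw  # pip install dtw-python

rng = np.random.default_rng(5)
days = np.arange(120)

# Synthetic single-Trust series: admissions trail the indicator by ~10 days.
indicator = np.sin(days / 15) + rng.normal(0, 0.05, days.size)
admissions = np.sin((days - 10) / 15) + rng.normal(0, 0.05, days.size)

# keep_internals=True retains the cost matrix and warping path for plotting.
alignment = dtw(indicator, admissions, keep_internals=True)

# "Two-way" plot: both series with segments joining aligned sequence pairs,
# in the style of Figure 6 (requires matplotlib).
alignment.plot(type="twoway", offset=2)
```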

Figure 7. The distribution of lead times calculated from sequence index matching between indicators and admissions across the epidemic waves studied. The lead times correspond to the optimal time warping between indicator and admissions, with higher values indicating a larger temporal lead.
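
Under the same assumptions as the sketch above, per-step lead times can be read off the warping path as the difference between matched indices:

```python
import numpy as np
from dtw import dtw  # pip install dtw-python

rng = np.random.default_rng(6)
days = np.arange(120)
indicator = np.sin(days / 15) + rng.normal(0, 0.05, days.size)
admissions = np.sin((days - 10) / 15) + rng.normal(0, 0.05, days.size)

alignment = dtw(indicator, admissions, keep_internals=True)

# index1 indexes the indicator and index2 the admissions series; a positive
# difference means an indicator value is matched to a later admissions value.
leads = alignment.index2 - alignment.index1
print(f"median lead: {np.median(leads):.1f} days")
```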

Figure 8. The normalised path distance produced by warping between each indicator and admissions across all three waves, computed using multivariate DTW across Trusts. The normalised path distance indicates how much total warping is needed to align an indicator with admissions, a proxy for the size of the temporal lead between the time series.
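
The normalised distance itself is exposed directly by the same package (a sketch; the paper's multivariate construction across Trusts is not reproduced here):

```python
import numpy as np
from dtw import dtw  # pip install dtw-python

rng = np.random.default_rng(7)
days = np.arange(120)
indicator = np.sin(days / 15) + rng.normal(0, 0.05, days.size)
admissions = np.sin((days - 10) / 15) + rng.normal(0, 0.05, days.size)

alignment = dtw(indicator, admissions)

# The cumulative alignment cost divided by the warping path length; larger
# values imply more warping is needed to map one series onto the other.
print(f"normalised path distance: {alignment.normalizedDistance:.3f}")
```

Note that dtw also accepts two-dimensional inputs, so stacking Trust-level series column-wise is one plausible route to a multivariate distance of the kind summarised in this figure.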

Table 1. The operational considerations at the time of investigation

Supplementary material: Mellor et al. supplementary material (File, 802 KB).