Retrospective time series analysis of veterinary laboratory data: Preparing a historical baseline for cluster detection in syndromic surveillance

https://doi.org/10.1016/j.prevetmed.2012.10.010

Abstract

The practice of disease surveillance has shifted in the last two decades towards the introduction of systems capable of early detection of disease. Modern biosurveillance systems explore different sources of pre-diagnostic data, such as patients' chief complaints at emergency visits or laboratory test orders. These sources of data can provide more rapid detection than traditional surveillance based on case confirmation, but are less specific, and therefore their use poses challenges related to the presence of background noise and unlabelled temporal aberrations in historical data. The overall goal of this study was to carry out retrospective analysis using three years of laboratory test submissions to the Animal Health Laboratory in the province of Ontario, Canada, in order to prepare the data for use in syndromic surveillance. Daily cases were grouped into syndromes, and counts for each syndrome were monitored daily when the median was higher than one case per day, and weekly otherwise. Poisson regression accounting for day-of-week and month was able to capture the day-of-week effect with minimal influence from temporal aberrations. Applying Poisson regression iteratively, with data points above the predicted 95th percentile of daily counts removed, allowed these aberrations to be removed in the absence of labelled outbreaks while maintaining the day-of-week effect present in the original data. This resulted in the construction of time series that represent the baseline patterns over the past three years, free of temporal aberrations. The final method was thus able to remove temporal aberrations while keeping the original explainable effects in the data, did not need a training period free of aberrations, had minimal adjustment to the aberrations present in the raw data, and did not require labelled outbreaks. Moreover, it was readily applicable to the weekly data by substituting Poisson regression with moving 95th percentiles.
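The iterative procedure summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and several details are assumptions made only for illustration: the function names are hypothetical, the model is fitted by alternating updates on a day-of-week factor and a month factor (one way to solve the Poisson likelihood equations when there are only two categorical effects), counts above the fitted 95th percentile are capped at that limit rather than deleted so the series stays complete, and the procedure stops after at most five passes.

```python
import math
from datetime import date, timedelta


def poisson_quantile(mu, p=0.95):
    """Smallest integer k with P(X <= k) >= p for X ~ Poisson(mu)."""
    if mu <= 0:
        return 0
    k, pmf = 0, math.exp(-mu)
    cdf = pmf
    while cdf < p:
        k += 1
        pmf *= mu / k
        cdf += pmf
    return k


def fit_dow_month(days, counts, sweeps=50):
    """Fitted means for a Poisson model with day-of-week and month factors.

    With two categorical main effects the fitted mean is a[dow] * b[month],
    and the likelihood equations require fitted and observed totals to match
    within every factor level. Alternating the two updates below solves them.
    """
    dow = [d.weekday() for d in days]
    mon = [d.month for d in days]
    by_dow, by_mon = {}, {}
    for i in range(len(days)):
        by_dow.setdefault(dow[i], []).append(i)
        by_mon.setdefault(mon[i], []).append(i)
    a = dict.fromkeys(by_dow, 1.0)
    b = dict.fromkeys(by_mon, 1.0)
    for _ in range(sweeps):
        for k, idx in by_dow.items():
            den = sum(b[mon[i]] for i in idx)
            a[k] = sum(counts[i] for i in idx) / den if den > 0 else 0.0
        for k, idx in by_mon.items():
            den = sum(a[dow[i]] for i in idx)
            b[k] = sum(counts[i] for i in idx) / den if den > 0 else 0.0
    return [a[w] * b[m] for w, m in zip(dow, mon)]


def clean_baseline(days, counts, p=0.95, rounds=5):
    """Refit repeatedly, capping counts above the fitted p-quantile.

    Capping (instead of deleting) and the number of rounds are assumptions.
    """
    cleaned = list(counts)
    for _ in range(rounds):
        mu = fit_dow_month(days, cleaned)
        capped = [min(y, poisson_quantile(m, p)) for y, m in zip(cleaned, mu)]
        if capped == cleaned:  # nothing left above the limit
            break
        cleaned = capped
    return cleaned


# Synthetic example: 6 submissions on weekdays, 1 on weekend days, with an
# artificial five-day aberration starting on Monday 2010-04-12.
days = [date(2010, 1, 4) + timedelta(days=i) for i in range(364)]
counts = [6 if d.weekday() < 5 else 1 for d in days]
counts[98:103] = [30] * 5
cleaned = clean_baseline(days, counts)
print(max(counts[98:103]), "->", max(cleaned[98:103]))
```

In this synthetic series the aberration is pulled down towards the weekday level while the weekday and weekend pattern is left untouched. A production analysis would more likely use a statistical package's Poisson regression routine and validate the stopping rule against the data.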

Introduction

Surveillance has shifted in the last two decades towards systems capable of early detection of disease (Shmueli and Burkom, 2010). Modern biosurveillance systems are designed to take advantage of data assumed to contain signatures of healthcare-seeking behaviours, which are not as specific as diagnosis, but allow for more rapid detection, and can be aggregated as syndromes. Surveillance based on these types of data is therefore referred to as syndromic surveillance (Centers for Disease Control and Prevention (CDC), 2006). A recent review of syndromic surveillance initiatives in veterinary medicine (Dórea et al., 2011) indicated that opportunistic data sources are difficult to find in animal surveillance due to the scarcity of computerized, automatically collected data.

The secondary use of clinical animal data, whether computerized or not, also relies on the voluntary participation of veterinarians and/or producers. One alternative to relying on data shared voluntarily is the exploitation of automatically collected laboratory submission data (Stone, 2007). Laboratory test results have been analyzed retrospectively to detect temporal clustering of bacterial pathogens in public health (Dessau and Steenberg, 1993; Hutwagner et al., 1997; Widdowson et al., 2003) and veterinary medicine (Carpenter, 2002; Zhang et al., 2005). The use of submission data, however, fits the purposes of syndromic surveillance more closely, as test requests are available earlier than test results, although they are less specific. Despite having lower population coverage than clinical data, laboratory data are generally stored in computerized systems and have been available over relatively lengthy periods of time, meaning that historical analyses are usually possible.

When historical computerized data are available, a key challenge involves the construction of outbreak-free baselines, as any outbreaks will typically not be labelled, nor will their shape and magnitude be known (Shmueli and Burkom, 2010). Detection of abnormal behaviour in prospective analysis is based on either modelling and removing expected background (model-driven methods) or comparing profiles to similar data from unaffected populations (data-driven methods) (Yahav and Shmueli, 2007; Shmueli and Burkom, 2010). In both cases, a baseline free of outbreaks is necessary: in the former case to create models of expected behaviour, and in the latter to serve as a comparison to the data being tested. Historical data can provide a baseline for temporal aberration detection algorithms, but data quality and the influence of past outbreaks are challenges to overcome when determining ‘typical’ background behaviour against which the presence of abnormalities can be investigated (Shmueli and Burkom, 2010).

The overall goal of this study was to carry out retrospective analysis using three years of laboratory test submissions, related to health events in cattle, made to the Animal Health Laboratory in the province of Ontario, Canada. These historical data were analyzed for their potential use in syndromic surveillance. The retrospective analysis had two specific objectives. The first was to conduct time series analysis in order to discover explainable patterns in the data, such as day-of-week or seasonal effects as well as global trends. The second objective was to identify a procedure that could adequately describe the “normal behaviour” for each syndrome, separating the background behaviour from temporal aberrations present in the historical laboratory test request data.
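For syndromes monitored weekly, the abstract reports that moving 95th percentiles were substituted for Poisson regression when separating background behaviour from aberrations. The sketch below shows one way such a step could look. The window length (26 weeks, centred on each week), the nearest-rank percentile definition, the choice to cap values at the limit, and the function names are all assumptions for illustration and are not taken from the study.

```python
import math


def percentile(values, p):
    """Nearest-rank percentile (p between 0 and 100) of a non-empty list."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def cap_at_moving_percentile(counts, window=26, p=95):
    """Cap each weekly count at the p-th percentile of a centred window."""
    half = window // 2
    cleaned = []
    for i, y in enumerate(counts):
        lo = max(0, i - half)
        hi = min(len(counts), i + half + 1)
        limit = percentile(counts[lo:hi], p)
        cleaned.append(min(y, limit))
    return cleaned


# Synthetic weekly series with a single artificial aberration in week 25.
weekly = [2, 3, 1, 2, 4, 2, 3, 1, 2, 3] * 5
weekly[25] = 40
baseline = cap_at_moving_percentile(weekly)
print(weekly[25], "->", baseline[25])
```

Because the limit is computed from the raw counts in the window, a long run of elevated weeks would raise its own limit; a shorter or trailing window, or repeated passes, may be preferable depending on the data.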

Section snippets

Data source

The Animal Health Laboratory (AHL) is a full-service veterinary diagnostic laboratory that serves livestock, poultry and companion animal veterinarians in the province of Ontario, Canada. The AHL is part of the University of Guelph and is an integral part of the Ontario Animal Health Surveillance Network (OAHSN).

The AHL has a Laboratory Information Management System (LIMS) that is primarily used for reporting the results of diagnostic tests and for administrative purposes, but can also be used

Case definition and syndromic groups

The complete list of syndromic groups is shown in Table 1. Only syndromes with median counts greater than one submission per day were monitored daily; the remaining syndromes were grouped into weekly counts. Syndromic groups merged into larger groups are also shown in the table, with details provided in the numbered footnotes. The AHL primarily operates on weekdays, with selected emergency testing available outside of usual business hours. Test requests are entered in the
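The rule for choosing between daily and weekly monitoring can be sketched as follows. The syndrome names and counts are hypothetical, and two details are assumptions: the median is taken over every calendar day supplied (including weekend days), and weekly counts are formed by summing within ISO weeks.

```python
from collections import defaultdict
from datetime import date, timedelta
from statistics import median


def weekly_counts(days, counts):
    """Sum daily counts within each ISO (year, week), in calendar order."""
    totals = defaultdict(int)
    for d, y in zip(days, counts):
        iso = d.isocalendar()
        totals[(iso[0], iso[1])] += y
    return [totals[key] for key in sorted(totals)]


def choose_series(days, daily_by_syndrome, threshold=1):
    """Keep daily counts when the median exceeds the threshold, else go weekly."""
    chosen = {}
    for name, counts in daily_by_syndrome.items():
        if median(counts) > threshold:
            chosen[name] = ("daily", list(counts))
        else:
            chosen[name] = ("weekly", weekly_counts(days, counts))
    return chosen


# Two weeks of hypothetical submissions starting on Monday 2010-01-04.
days = [date(2010, 1, 4) + timedelta(days=i) for i in range(14)]
example = {
    "syndrome_a": [3, 4, 2, 5, 3, 0, 0] * 2,  # hypothetical frequent syndrome
    "syndrome_b": [0, 1, 0, 0, 1, 0, 0] * 2,  # hypothetical rare syndrome
}
for name, (frequency, series) in choose_series(days, example).items():
    print(name, frequency, series)
```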

Discussion

Syndromic surveillance operates under the assumption that anomalies indicative of disease outbreaks can be detected when information is monitored continuously (Shmueli and Burkom, 2010). Signatures of outbreaks can be obscured in the data by explainable factors, such as day of the week or seasonal effects, autocorrelation and global trends (Lotze et al., 2008).
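Two of the explainable factors named above, day-of-week effects and autocorrelation, can be inspected with simple summaries before any model is fitted. The sketch below is an illustration on synthetic counts, with hypothetical function names; it is not the exploratory analysis used in the study.

```python
from datetime import date, timedelta
from statistics import mean


def mean_by_weekday(days, counts):
    """Average count for each weekday (0 = Monday ... 6 = Sunday)."""
    groups = {}
    for d, y in zip(days, counts):
        groups.setdefault(d.weekday(), []).append(y)
    return {k: mean(v) for k, v in sorted(groups.items())}


def autocorrelation(counts, lag):
    """Sample autocorrelation of a series at the given positive lag."""
    n = len(counts)
    m = mean(counts)
    denom = sum((y - m) ** 2 for y in counts)
    if denom == 0 or lag >= n:
        return 0.0
    num = sum((counts[i] - m) * (counts[i + lag] - m) for i in range(n - lag))
    return num / denom


# Eight weeks of synthetic counts with a strong weekday/weekend contrast.
days = [date(2010, 1, 4) + timedelta(days=i) for i in range(56)]
counts = [5, 5, 5, 5, 5, 1, 1] * 8
print(mean_by_weekday(days, counts))
print(autocorrelation(counts, 7), autocorrelation(counts, 3))
```

A pronounced peak in autocorrelation at lag 7, together with clearly different weekday means, points to a weekly cycle that should be accounted for before looking for aberrations.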

In this work three years of laboratory test requests from the Animal Health Laboratory at the University of Guelph, Ontario, were

Conclusion

Successful identification of outbreak signatures in population data, the primary goal of syndromic surveillance, depends on identifying and removing explainable variation from the noisy background of normal behaviour. Three years of laboratory test request data from the Animal Health Laboratory in Ontario were analyzed retrospectively in order to identify such explainable factors. Day-of-week and month effects were found to be the only relevant effects that required removal. Poisson regression

Conflict of interest

The authors declare no conflict of interest.

Acknowledgments

This project was supported by the OMAFRA-UG Agreement through the Animal Health Strategic Investment fund (AHSI) managed by the Animal Health Laboratory of the University of Guelph.

References (26)

  • D.L. Buckeridge et al. Algorithms for rapid outbreak detection: a research synthesis. J. Biomed. Inform. (2005)
  • F.C. Dórea et al. Veterinary syndromic surveillance: current initiatives and potential for development. Prev. Vet. Med. (2011)
  • C.A. Bradley et al. BioSense: implementation of a national early event detection and situational awareness system. MMWR. Morb. Mortal. Wkly. Rep. (2005)
  • H.S. Burkom et al. Automated time series forecasting for biosurveillance. Stat. Med. (2007)
  • T.E. Carpenter. Evaluation and extension of the cusum technique with an application to salmonella surveillance. J. Vet. Diagn. Invest. (2002)
  • Centers for Disease Control and Prevention (CDC), 2006. Annotated bibliography for syndromic surveillance. Available...
  • R.B. Dessau et al. Computerized surveillance in clinical microbiology with time series analysis. J. Clin. Microbiol. (1993)
  • Dórea, F.C., Muckle, A.C., Kelton, D., McClure, J.T., McEwen, B.J., McNab, W.B., Sanchez, J., Revie, C.W. Exploratory...
  • Y. Elbert et al. Development and evaluation of a data-adaptive alerting algorithm for univariate temporal biosurveillance data. Stat. Med. (2009)
  • L.C. Hutwagner et al. Using laboratory-based surveillance data for prevention: an algorithm for detecting salmonella outbreaks. Emerg. Infect. Dis. (1997)
  • O. Ivanov et al. Detection of pediatric respiratory and gastrointestinal outbreaks from free-text chief complaints. AMIA Annu. Symp. Proc. (2003)
  • James, D., Hornik, K., 2010. Chron: chronological objects which can handle dates and times. R package version 2.3-39....
  • T. Lotze et al. Implementation and comparison of preprocessing methods for biosurveillance data. ADS (2008)