Keywords
neonatal vital sign measurement, monitoring, heart rate, respiratory rate, accuracy, validation
There is a high risk of mortality during the neonatal period, particularly in resource-constrained settings1. Continuous monitoring of neonatal vital signs enables early detection of physiological deterioration and potential opportunities for lifesaving interventions2–4. The development of innovative, non-invasive, multiparameter continuous physiological monitoring (MCPM) technologies specifically for neonates offers the promise of improving clinical outcomes in this vulnerable population.
A neonate's marked physiological variability, small size, and often fragile condition pose challenges for measuring and monitoring vital signs. The lack of neonatal clinical validation standards further hinders the development of MCPM technologies clinically validated specifically for neonates. Determining the accuracy of new MCPM technologies is an essential step in bringing these technologies to market5,6.
The Evaluation of Technologies for Neonates in Africa (ETNA) platform aims to independently establish the accuracy and feasibility of novel MCPM technologies suitable for use in neonates in resource-constrained settings7. To determine accuracy and agreement, new technologies are compared against existing reference methods or technologies8. However, before the comparison process can proceed, a clinical reference verification step is necessary to determine appropriate accuracy thresholds7. These a priori thresholds determine the target level of agreement required and thus, the success or failure of an investigational technology. This study describes the clinical reference technology verification processes conducted to determine appropriate heart rate (HR) and respiratory rate (RR) thresholds in subsequent accuracy comparisons.
This was a cross-sectional study that characterized the natural variation in neonatal HR and RR in order to identify appropriate accuracy thresholds for use in an accuracy comparison of MCPM technologies.
Study participants were neonates admitted for observation and care in the maternity ward, neonatal intensive care unit, and neonatal high dependency unit at Aga Khan University Hospital in Nairobi, Kenya (AKUHN). Between June and August 2019, caregivers were approached, recruited, and sequentially screened for enrolment by trained study staff during routine newborn intake procedures. To minimize potential selection bias, caregivers were, as much as possible, approached sequentially and introduced to the study using a standardized recruitment script. Final eligibility depended on the medical history, physical examination, an appropriate understanding of the study by the caregiver, and completion of the written informed consent process (Table 1).
The Masimo Rad-97 Pulse CO-Oximeter® with NomoLine Capnography (Masimo Corporation, Irvine, CA, USA) was selected as the reference technology based on its validated oxygen saturation (SpO2) measurement accuracy in neonates9–11. During study participation, trained and experienced study nurses attached the Rad-97 to neonates and conducted manual HR measurements (counting over 60-second epochs) every 10 minutes for the first hour and once per hour thereafter, following World Health Organization (WHO) guidance for HR measurement in neonates12. Photoplethysmographic HR was also measured via the Masimo Rad-97 pulse oximetry skin sensor attached to the neonate's foot. RR was measured by capnography using an infant/pediatric nasal cannula to sample the neonate's exhaled carbon dioxide (CO2). Data collection lasted a minimum of one hour, with no upper limit. Neonates exited the study upon discharge from the ward or by caregiver request.
Using a custom Android (Google, Mountain View, CA, USA) application, raw data were collected from the Masimo Rad-97 in real time through a universal serial bus (USB) asynchronous connection and parsed in C. Instantaneous HR was obtained from the timing of the pulse oximetry signal quality index (PO-SQI). The plethysmogram waveform was sampled at 62.5 Hz, with the PO-SQI identified by the Masimo Rad-97 at the peak of each heartbeat. The CO2 waveform was sampled at approximately 20 Hz from the capnography channel. The parsed output included an accurate time stamp for each entry in the waveform data to facilitate synchronization and analysis. Data were recorded and stored on a secure AKUHN-hosted REDCap server13.
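The conversion from beat-peak timing to an instantaneous HR can be illustrated with a short sketch. This is not Masimo's proprietary algorithm; it simply shows the arithmetic implied above, assuming each PO-SQI marker yields one timestamp per heartbeat:

```python
def instantaneous_hr(beat_times_s):
    """Convert beat-peak timestamps (in seconds) to instantaneous HR (bpm).

    Each HR value is 60 divided by the interval between consecutive beat
    markers. Illustrative only; the device's internal processing differs.
    """
    return [60.0 / (t1 - t0) for t0, t1 in zip(beat_times_s, beat_times_s[1:])]

# A steady 0.5 s interbeat interval corresponds to 120 bpm.
rates = instantaneous_hr([0.0, 0.5, 1.0, 1.5])
```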
We analyzed the CO2 waveform data using a breath detection algorithm developed in MATLAB (MathWorks, Natick, MA, USA) and based on adaptive pulse segmentation14. In addition to providing a RR, the algorithm analyzed the waveform's shape and identified the duration of each breath (waveform trough to trough). From these durations, we calculated a RR based on the median breath duration within the epoch. We developed a custom capnography quality score (CO2-SQI) based on capnography features to assist with data selection. HR and RR counts and medians, along with signal quality metrics from the MATLAB signal detection algorithm, were analyzed using R version 4.0.315. Epochs were selected to align precisely with the clinical observations. Capnogram waveforms were generated with two seconds added at the beginning and end of each epoch to facilitate manual breath counting within the epoch.
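The trough-to-trough logic can be sketched as follows. This is a deliberately simplified stand-in (a plain local-minimum rule on a clean synthetic waveform) for the adaptive-pulse-segmentation detector cited above, which is considerably more robust to noise; the function names and sampling rate below are illustrative:

```python
import statistics

def detect_troughs(co2):
    """Return indices of local minima (troughs) in a CO2 waveform.

    Assumption: a simple local-minimum rule, not the published
    adaptive pulse segmentation algorithm.
    """
    return [i for i in range(1, len(co2) - 1)
            if co2[i] < co2[i - 1] and co2[i] <= co2[i + 1]]

def rr_from_median_duration(trough_idx, fs):
    """RR (breaths/min) from the median trough-to-trough breath duration."""
    durations = [(b - a) / fs for a, b in zip(trough_idx, trough_idx[1:])]
    return 60.0 / statistics.median(durations)

# Synthetic epoch: four triangular "breaths" sampled at 2 Hz,
# each lasting 3 s, i.e. 20 breaths per minute.
co2 = [3, 2, 1, 0, 1, 2] * 4
troughs = detect_troughs(co2)
rr = rr_from_median_duration(troughs, fs=2.0)
```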
One of the authors (JMA, a pediatric anesthesiologist) reviewed the capnogram tracings and discarded plots with marked variability or a significant duration of an artifact that would have made breaths challenging to count. The remaining plots were provided to two trained observers to independently count all breaths within each epoch using a set of predefined rules created by the investigators (Table 2). The two independent counts were averaged, and if the number of breaths counted by the two observers varied by more than three breaths per epoch, a third trained observer independently counted the plot, and the two closest counts were averaged.
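The adjudication rule for the observer counts is straightforward to express in code. The sketch below mirrors the rule as described (average the two counts; if they differ by more than three breaths, bring in a third count and average the two closest); the function name and signature are illustrative, not part of the study's software:

```python
def adjudicate_counts(obs1, obs2, obs3=None, tolerance=3):
    """Average two independent breath counts for an epoch.

    If the counts differ by more than `tolerance` breaths, a third
    observer's count is required, and the two closest counts are averaged.
    """
    if abs(obs1 - obs2) <= tolerance:
        return (obs1 + obs2) / 2
    if obs3 is None:
        raise ValueError("counts differ by more than tolerance; third count needed")
    pairs = [(obs1, obs2), (obs1, obs3), (obs2, obs3)]
    a, b = min(pairs, key=lambda p: abs(p[0] - p[1]))
    return (a + b) / 2
```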
Measurement repeatability was estimated using linear mixed-effects models based on the between- and within-neonate variability for each data source using R version 4.0.316. Agreement between data collection methods was assessed using the method described by Bland-Altman for replicated observations and reported as a mean bias with 95% confidence intervals (CIs), 95% upper and lower limits of agreement (LOA), and as a root mean square deviation (RMSD)17. The aim was to identify practical threshold limits using data from the clinical reference technology verification process.
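The agreement statistics can be illustrated with a minimal sketch. Note an important simplification: this version treats all paired readings as independent, whereas the analysis described above uses the Bland-Altman correction for replicated observations within the same neonate; it is shown only to make the bias, LOA, and RMSD definitions concrete:

```python
import math
import statistics

def bland_altman(method_a, method_b):
    """Mean bias, 95% limits of agreement, and RMSD for paired readings.

    Simplified: assumes independent pairs (no within-neonate
    replication correction, unlike the study's analysis).
    """
    diffs = [a - b for a, b in zip(method_a, method_b)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)
    loa = (bias - 1.96 * sd, bias + 1.96 * sd)
    rmsd = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    return bias, loa, rmsd

bias, loa, rmsd = bland_altman([120, 130, 140, 150], [118, 132, 138, 152])
```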
We estimated that 20 neonates with ten replications each would give a 95% CI around the LOA between two methods of ±0.76 times the standard deviation (SD) of their differences. Sample size estimates for method comparison studies typically depend on the CI required around the LOA, and sample sizes of 100 to 200 provide tight CIs17. We aimed for a sample size of at least 30 neonates to ensure a diverse population and sufficient replications for tight CIs.
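The ±0.76·SD figure is consistent with the standard Bland-Altman large-sample approximation, in which the variance of an estimated limit of agreement is roughly 3·SD²/n, giving a 95% CI half-width of about 1.96·√(3/n)·SD. Assuming n = 20 (the number of neonates) as the effective sample size, this reproduces the quoted value:

```python
import math

def loa_ci_halfwidth(n):
    """95% CI half-width around a limit of agreement, in units of SD.

    Uses the Bland-Altman approximation Var(LOA) ~= 3*SD^2/n,
    so the half-width is about 1.96*sqrt(3/n)*SD.
    """
    return 1.96 * math.sqrt(3.0 / n)

# For n = 20 neonates, the half-width is about 0.76 SD, as in the text.
halfwidth = round(loa_ci_halfwidth(20), 2)
```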
The study was conducted per the International Conference on Harmonisation Good Clinical Practice and the Declaration of Helsinki 2008. The protocol and other relevant study documents were approved by Western Institutional Review Board (20191102; Puyallup, Washington, USA), Aga Khan University Nairobi Research Ethics Committee (2019/REC-02 v2; Nairobi, Kenya), Kenyan Pharmacy and Poisons Board (19/05/02/2019(078)) and Kenyan National Commission for Science, Technology and Innovation (NACOSTI/P/19/68024/30253). Written informed consent was obtained in English or Swahili by trained study staff from each neonate’s caregiver according to a checklist that included ascertainment of caregiver comprehension.
Between June and August 2019, 35 neonates were enrolled, and 297 clinical observations were completed with a mean of 8.4 (SD 1.7) observations per neonate (Table 3; Figure 1) and a median data collection time of 4 hours, 5 minutes (interquartile range (IQR) 3:52-4:45)18. The manual HR measurements were found to have a non-normal distribution with skewness of 0.76 and kurtosis of 3.60 (p<0.001). The median manual HR measurement for all observations was 134 (IQR 126-143) beats per minute (bpm).
| Characteristic | Value |
| --- | --- |
| Sex, n (female / male / other) | 22 / 13 / 0 |
| Age at participation (days), median (IQR) | 2 (0-4) |
| Gestation at birth (weeks), median (range) | 33 (32-34) |
| Weight at birth (grams), median (IQR) | 1500 (1260-1600) |
The manual HR demonstrated a negative bias of -2.4 (-1.8%) compared to the median PO-SQI HR, and a marked spread between the 95% LOA of 40.3 (29.6%). The RMSD was 10.5 (7.7%). Removing data from a single outlier neonate resulted in a smaller bias of -1.4 (-1.0%), a tighter spread between the 95% LOA of 24.7 (18.2%), and a lower RMSD of 6.4 (4.7%) (Table 4; Figure 2).
Moderate repeatability was demonstrated with approximately 62% (95% CI 47%-73%) of the manual HR variability being due to differences between neonates (Table 5, Figure 3A). Since the 95% CI for manual HR crossed 50%, the between- and within-neonate variability appeared to be comparable, with neither causing significantly more variability than the other.
| | Repeatability1 (95% Confidence Intervals) |
| --- | --- |
| Heart rate (n=297 epochs) | |
| Manual HR | 0.62 (0.47-0.73) |
| Median pulse oximetry signal quality index HR | 0.75 (0.62-0.83) |
| Respiratory rate (n=130 epochs) | |
| Manual RR | 0.66 (0.47-0.79) |
| Algorithm-derived median RR | 0.50 (0.28-0.67) |
| Algorithm-derived RR count | 0.66 (0.46-0.79) |
Manual RR measurements from capnograms were found to have a non-normal distribution with skewness of 0.61 and kurtosis of 2.96 (p=0.027). The median manual RR measurement for all observations was 47 (IQR 39-56) breaths per minute. The manual RR compared to the algorithm-derived median RR showed a negative bias of -3.2 (-6.6%) and a marked spread between the 95% LOA of 17.9 (37.3%). The RMSD was 5.5 (11.4%). Comparing the manual RR to the algorithm-derived RR count showed a smaller bias of -0.5 (-1.1%) and a tighter spread between the 95% LOA of 4.4 (9.1%). The RMSD was 1.2 (2.5%).
The repeatability was moderate, with approximately 66% (95% CI 47%-79%) of the manual RR variability due to differences between neonates (Table 5, Figure 3C). Since the 95% CI crossed 50%, the amount of between- and within-neonate variability appeared similar, with neither one resulting in significantly more variability than the other.
This reference technology clinical verification study showed minimal measurement bias with a wide spread of 95% upper and lower LOAs and similar repeatability compared with manual clinical measurements. The agreement results allowed us to identify practical HR and RR thresholds for our subsequent technology comparison evaluation. Specifically, we identified a 30% spread between the 95% upper and lower LOA. These a priori-defined thresholds were based on variability observed ten and sixty minutes apart in the same neonate and considered the natural within-neonate physiologic variability. Variability was found to be more marked in some neonates. In part, the 30% spread between 95% upper and lower LOA was selected based on the idea that thresholds should not be more stringent than the observed physiological variability, and in part, based on results from the different averaging methods (manual RR vs algorithm-derived median RR). Given the large difference in results between the two averaging methods, considerable thought should be given prior to choosing an averaging method. A random selection of real clinical data can provide appropriate guidance for selecting suitable neonatal accuracy thresholds.
Of note, one neonate (PTID9) significantly impacted the LOA for HR. Five of nine of this neonate’s manual HR measurements significantly diverged from the same epoch’s PO-SQI HR values and were significantly lower than their mean PO-SQI HR, despite having acceptable signal quality scores. This irregularity suggests a HR reading or data entry error by the study nurse. Removing this neonate’s data and re-analyzing it resulted in a smaller bias and tighter LOAs (Figure 2B).
Results from this clinical verification highlight the difficulty with existing performance thresholds. Current United States Food and Drug Administration performance thresholds for HR measurement, based on electrocardiogram measurements, may not be applicable for use in neonates or when using photoplethysmography for estimating HR19. The current UNICEF target product profile for RR measurement technology recommends a ±2 breaths per minute threshold, which may be too stringent even for use in adults20,21. Using a ±2 breaths per minute recommendation with our RR data would result in a LOA spread threshold of no more than 5%, which is half the LOA spread of our best performing RR comparison. Furthermore, a ±2 breaths per minute or 5% spread in LOA is smaller than random and natural within-neonate physiologic variability (11.5% in this study [unpublished data]) and would result in unrealistically stringent thresholds.
Selecting a performance threshold is challenging. The threshold cannot be too restrictive or inflexible, thereby stifling innovation and preventing new single parameter or MCPM technologies from reaching the market. However, too lax a threshold could result in an inaccurate representation of the underlying physiological state. One key limitation is that the true underlying HR or RR is unknown, regardless of the measurement method6,22. The primary goal of this reference technology verification study was to establish a priori thresholds as the first step of our technology comparison evaluation while at the same time understanding that the true underlying RR and HR cannot be known and also recognizing the marked physiologic variability between and within neonates.
In this study, we did not attempt to define or detect clinically meaningful events; instead, we focused on describing non-random thresholds that fall outside of normal physiological variability. We defined HR and RR thresholds based on the difference between the 95% upper and lower LOA. Additional studies will be required to determine if these thresholds translate into improved clinical outcomes.
Performance thresholds identified using this method are influenced by the characteristics of the neonates studied, the data selection methods, and the number of comparisons. For this reason, the thresholds we identified may not be applicable in different neonate cohorts, such as those receiving mechanical ventilation or immediately following birth, among others. Variability will be influenced by disturbances in the environment such as routine procedures, feeding, noise, and time of day. To minimize variability in our data set, we used only RR epochs that appeared to be regular based on visual inspection. Although these segments were selected based on predefined criteria, a majority (167/297) were discarded as the extreme variability seen in some recordings would have made reproducible manual counting of breaths impossible.
Appropriate clinical thresholds should be selected a priori when performing accuracy comparisons for HR and RR. The magnitude and importance of sample size, as well as within-neonate variability requires further investigation. A larger sample size could allow the development of an error model that more clearly describes the error due to various factors such as the measurement technology, averaging method, the observer, and the natural variability of neonatal HR and RR. We strongly support the creation of international standards for technology comparison studies in neonates. These standards should include thresholds for HR and RR based on the specific neonatal population studied and provide details of the experimental conditions, data selection methods, and analysis methods used. Together, such standards would lay the groundwork for a robust MCPM technology comparison field.
Dryad: Identification of thresholds for accuracy comparisons of heart rate and respiratory rate in neonates (https://doi.org/10.5061/dryad.1c59zw3vb)18.
This project contains the following underlying data:
- Coleman-2021-ETNA-DemographicData.csv
- Raw data folder (contains all raw capnography and pleth data)
- Coleman-2021-ETNA-ProcessedPulseValues.csv
- Coleman-2021-ETNA-ProcessedRespirationValues.csv
Data are available under the terms of the Creative Commons Zero "No rights reserved" data waiver (CC0 1.0 Public domain dedication).
We thank Dorothy Chomba, Millicent Parsimei, Annah Kasyoka, and the dedicated staff at Aga Khan University-Nairobi Hospital for providing patient care, and the neonatal participants, their caregivers, and the local community in Nairobi, Kenya, for their participation.