Introduction

Patients who initially survive aneurysmal subarachnoid hemorrhage (aSAH) are at risk of further neurological complications, including delayed cerebral ischemia (DCI), hydrocephalus, rebleeding, and seizures [1]. In addition, medical complications such as cardiac injury and healthcare-associated infections (HAIs) can further impact immediate outcomes [2]. Quantifying the degree to which these complications affect outcomes is important for prognostication and for identifying high-impact modifiable factors. Many aSAH severity scores rely primarily on initial imaging and the neurological examination [1, 3, 4], which can be confounded by neurosurgical procedures, sedatives, or paralytics. Though early physiologic data are incorporated in more recent scores [5, 6], physiologic data more specific to the nervous system and modifiable complications beyond the early phase are not included.

With the electronic health record (EHR), large amounts of data are available, providing an opportunity to more accurately predict outcomes. Using data-driven predictive modeling, we sought to identify reproducible clinical parameters during hospitalization that may impact discharge outcomes and subsequent rehabilitation potential, that may serve as targets for intervention, and that have not been included in prior severity scores.

Methods

Patient Cohort

We performed a retrospective study of patients from the Massachusetts General Hospital aSAH database admitted between September 2011 and February 2016, after institutional review board approval. The database includes patients who have high-grade SAH (≥HH3F3) and who undergo continuous electroencephalogram (EEG) or multimodality monitoring. We included cases with an identified aneurysm and excluded non-aneurysmal and traumatic SAH and cases caused by other vascular malformations. All patients routinely underwent computed tomography angiography and conventional angiography.

Our primary objective was to identify modifiable complications that can impact the hospital course and subsequent rehabilitation potential. Our primary outcome measure was discharge Glasgow Outcome Scale (GOS); GOS 1: death; GOS 2: vegetative state; GOS 3: severe disability; GOS 4: moderate disability; GOS 5: good recovery [7]. Two raters (SFZ and ENP) independently abstracted and adjudicated GOS from physician and physical therapy examinations at discharge and initial rehabilitation facility examinations. Both GOS and GOS extended have been used as discharge outcome measures in prior studies [8,9,10]. We chose GOS due to its simple categories and ease of ascertainment. To ensure sufficient numbers and data balance for model fitting, we categorized outcomes as: poor (GOS 1–2), intermediate (GOS 3), and good (GOS 4–5).

Candidate Predictors for Outcomes

We collected demographic and clinical variables from the EHR. A full list of predictor variables is provided in supplementary material (Table e-1) and included admission Glasgow Coma Scale (GCS), APACHE II (Acute Physiology and Chronic Health Evaluation II) score, EEG findings, anti-epileptic drugs (AED), mechanical ventilation, and HAIs. We chose variables based on existing SAH disease severity scores and additional variables that reflect neurological and medical complications.

EEG findings were defined using the American Clinical Neurophysiology Society Nomenclature: sporadic epileptiform discharges, periodic patterns, rhythmic delta activity, and seizures [11]. The institution’s EEG protocol for DCI detection recommends 10 days of monitoring for high-grade (≥HH3F3) patients. In addition, patients with concern for subclinical seizures are monitored as indicated. The standard practice during the study period was to discontinue AED prophylaxis within 24 h of aneurysm coiling or within 7 days of clipping. We included AEDs in our analysis if they were continued beyond 7 days. Our primary indications for AED continuation are clinical or electrographic seizures and, at the treating physician’s discretion, scalp or depth ictal–inter-ictal continuum (IIC) patterns or epileptiform discharges. Levetiracetam is the typical first-line AED. Additional or alternate AEDs, frequently phenytoin and lacosamide, are used for refractory seizures or persistent IIC patterns at the treating physician’s discretion.

We chose APACHE II, instead of APACHE III, as it is included in the Functional Recovery Expected after Subarachnoid Hemorrhage (FRESH) score, allowing for more direct comparison. HAIs were confirmed by positive cultures, or radiographic and clinical evidence of respiratory tract infections. Laboratory data (sodium, potassium, glucose, and white blood cell (WBC) count) and physiological data (heart rate, mean arterial pressure (MAP), temperature, oxygen saturation (SpO2), intracranial pressure (ICP), cerebral perfusion pressure (CPP), respiratory rate, and ventricular drain output) from the first 3 days of admission were collected. Hourly ICP and CPP values were obtained, and other physiologic values were available at a resolution of every 1 to 4 h. For each predictor, we computed the minimum, maximum, median, and variance over six time windows: day 1, day 2, day 3, days 1–2, days 2–3, and days 1–3.
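The windowed summary statistics described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name, the feature naming scheme, the choice of 24-hour blocks counted from admission, and the use of population variance are all assumptions made for the example.

```python
from statistics import median, pvariance

# Day windows named in the text: single days 1, 2, 3 and spans 1-2, 2-3, 1-3.
WINDOWS = {
    "d1": (1, 1), "d2": (2, 2), "d3": (3, 3),
    "d1_2": (1, 2), "d2_3": (2, 3), "d1_3": (1, 3),
}


def window_features(samples, name):
    """Summarize one predictor over each day window.

    samples: list of (hours_since_admission, value) pairs.
    name: hypothetical prefix for the generated feature names.
    """
    features = {}
    for label, (first_day, last_day) in WINDOWS.items():
        # ASSUMPTION: day N covers hours [(N - 1) * 24, N * 24) after admission.
        start, end = (first_day - 1) * 24, last_day * 24
        values = [v for t, v in samples if start <= t < end]
        if not values:
            continue  # no measurements in this window; the feature stays missing
        features[f"{name}_{label}_min"] = min(values)
        features[f"{name}_{label}_max"] = max(values)
        features[f"{name}_{label}_median"] = median(values)
        # ASSUMPTION: population variance, so a single measurement yields 0.
        features[f"{name}_{label}_var"] = pvariance(values)
    return features
```

With four statistics over six windows, each predictor contributes up to 24 candidate features, which helps explain how a few dozen clinical variables can expand into several hundred candidate predictors.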

Two neurologists independently abstracted and adjudicated the presence of DCI following a previously published protocol, with excellent inter-rater agreement (95.83%) [12]. DCI was defined using published guidelines [13].

We excluded duplicate features from analysis and variables that served as surrogates of our outcome measure, e.g., discharge disposition, and duration of hospitalization. Variables with greater than 10% missing data were discarded, and missing values were imputed for the rest.
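As a rough sketch of the missing-data rule above, the following drops variables with more than 10% missing values and fills the remaining gaps. The imputation method is not specified in the text, so median imputation is used here purely as a stand-in; the function name and the dictionary-of-columns layout are also assumptions.

```python
from statistics import median


def drop_and_impute(table, max_missing=0.10):
    """Filter and impute a table of candidate predictors.

    table: dict mapping variable name -> list of numeric values,
           with None marking a missing value (columns must be non-empty).
    """
    cleaned = {}
    for name, column in table.items():
        missing_fraction = sum(v is None for v in column) / len(column)
        if missing_fraction > max_missing:
            continue  # discard variables with more than 10% missing data
        observed = [v for v in column if v is not None]
        # ASSUMPTION: median imputation; the source does not name the method.
        fill = median(observed)
        cleaned[name] = [fill if v is None else v for v in column]
    return cleaned
```

In a nested cross-validation setting, the fill value would ideally be computed from the training portion of each fold only, so that no information from held-out patients enters the model.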

Descriptive Statistics

Mean, median, and inter-quartile ranges were calculated for descriptive analysis. Univariate analysis was performed using a linear regression model, and the significance threshold was set at p < 0.05.

Predictive Modeling: Model Estimation and Validation

In this big data study, we performed predictive, rather than explanatory, modeling to predict new or future observations. Determining causality is neither the primary goal of a predictive model nor a prerequisite for including a variable in it [14]. Complex and potentially uninvestigated associations can be used to generate new hypotheses in predictive modeling. This can potentially improve and provide further pathophysiologic understanding of existing explanatory models [14].

We created predictive models and estimated their performance using nested cross-validation (CV) [15]. Details are provided in supplementary material. In summary, our CV approach included four steps:

  1. Dividing data into testing and training sets (external CV). We used a tenfold external CV to validate model performance. In each fold, 10% of the available data were held out, while the remaining 90% were used for model optimization (feature selection and parameter tuning).

  2. Balancing the training data to cope with class imbalance. To prevent bias toward predicting the predominant outcome, in each fold of external CV we balanced the training data by randomly discarding examples from the dominant class(es) until all classes had equal numbers of examples.

  3. Inner cross-validation for model optimization. For each round of external CV, we conducted an inner CV loop for feature selection and to identify the optimal value of the penalty parameter λ. The level of complexity (λ) that produced the best performance on the internal testing data was then used to train a predictive model on all the training data.

  4. Model evaluation. For each round of external CV, we tested the logistic regression model developed in the inner CV loop on the held-out testing data. Model performance was assessed using the area under the receiver operating characteristic curve (AUROC). Ten AUROC values were obtained for each round of CV. The final reported predictive performance is the mean and standard deviation of the AUROC across the ten folds of CV. This ensures that performance estimates are based entirely on data not used for feature selection or model parameter tuning, avoiding reporting prediction results that are inflated by model overfitting. Conventional approaches that do not enforce strict separation between training and testing data (e.g., fitting a predictive model that includes features with small p values on univariate analysis) are vulnerable to overfitting.
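The building blocks of the outer loop (fold splitting, class balancing by undersampling, and AUROC summarization) can be sketched as below. This is an illustrative sketch under stated assumptions rather than the authors' MATLAB code: all function names are hypothetical, the penalized logistic regression and inner CV loop of step 3 are omitted, and the AUROC is computed with the pairwise (Mann-Whitney) formulation.

```python
import random
from statistics import mean, stdev


def k_fold_indices(n, k, rng):
    """Shuffle 0..n-1 and deal the indices into k roughly equal folds."""
    order = list(range(n))
    rng.shuffle(order)
    return [order[i::k] for i in range(k)]


def undersample(indices, labels, rng):
    """Randomly discard examples from larger classes until all are equal in size."""
    by_class = {}
    for i in indices:
        by_class.setdefault(labels[i], []).append(i)
    smallest = min(len(members) for members in by_class.values())
    balanced = []
    for members in by_class.values():
        balanced.extend(rng.sample(members, smallest))
    return balanced


def auroc(scores, labels):
    """AUROC as the chance that a positive case outscores a negative one.

    Ties count as half a win. Both classes must be present.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def summarize(aurocs):
    """Mean and standard deviation of the per-fold AUROC values."""
    return mean(aurocs), stdev(aurocs)
```

In use, each of the ten folds would serve once as the held-out set; the other nine would be passed through `undersample` before model fitting, and the resulting ten AUROC values would be handed to `summarize`. Only the training data are balanced, so the held-out fold keeps the original class proportions.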

Steps 1–4 were repeated 1000 times to obtain final performance estimates. Each round of bootstrapping involved a different random subset of the available data, yielding different sets of optimal features. We therefore report not a single set of features, but rather the frequency with which features were selected. This more accurately estimates the robustness of each feature for outcome prediction than can be obtained from a single round of tenfold CV.
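Reporting selection frequency rather than a single feature set amounts to a simple tally across model fits. A minimal sketch, assuming each fit yields a collection of selected feature names (the function name and feature names are hypothetical):

```python
from collections import Counter


def selection_frequency(selected_per_run):
    """Fraction of model fits in which each feature was selected.

    selected_per_run: one collection of selected feature names per fit.
    Returns a dict ordered from most to least frequently selected.
    """
    counts = Counter()
    for selected in selected_per_run:
        counts.update(set(selected))  # count a feature at most once per fit
    total = len(selected_per_run)
    return {name: count / total for name, count in counts.most_common()}
```

For example, a feature chosen in every fit would receive a frequency of 1.0, while one chosen in only a handful of the repetitions would fall close to 0, which gives a direct reading of how robust each predictor is to resampling.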

Binary and Multilevel Outcome Prediction Models

We created two types of models using the nested-CV framework. First we created a binary prediction algorithm for predicting in-hospital mortality (GOS 1 vs GOS 2–5) and one for predicting death/dependence versus independence (GOS 1–3 vs GOS 4–5). Second, we created a multiclass prediction algorithm, which predicted poor (GOS 1–2), intermediate (GOS 3), and good (GOS 4–5) outcomes.
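The three prediction targets can be derived from the discharge GOS with a small mapping. The sketch below follows the groupings stated above; the function and key names are hypothetical.

```python
def gos_targets(gos):
    """Map a discharge GOS (1-5) to the three prediction targets."""
    if gos not in (1, 2, 3, 4, 5):
        raise ValueError(f"GOS must be an integer from 1 to 5, got {gos!r}")
    if gos <= 2:
        three_level = "poor"          # GOS 1-2
    elif gos == 3:
        three_level = "intermediate"  # GOS 3
    else:
        three_level = "good"          # GOS 4-5
    return {
        "died": gos == 1,               # GOS 1 vs GOS 2-5
        "dead_or_dependent": gos <= 3,  # GOS 1-3 vs GOS 4-5
        "three_level": three_level,
    }
```

Note that GOS 2 falls in the "poor" group for the multiclass model but counts as survival in the mortality model, so the three targets are not simple refinements of one another.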

All statistical analyses were performed using MATLAB version 2016a (Natick, MA).

Results

Cohort Characteristics

Of 209 medical charts reviewed, 56 were excluded due to the absence of an aneurysm or a clear alternate etiology, and 153 subjects were included. Demographic and clinical variables are summarized in Table 1. The mean age was 58.3 years, and 69.3% (n = 106) were female. The mean APACHE II score was 14.4. The most frequent presentation grades were Hunt and Hess (HH) 4 (n = 39, 25.5%) and Fisher 3 (n = 114, 74.5%). Of the aneurysms, 47.7% (n = 73) were coiled and 41.8% (n = 64) were clipped. The majority of patients had a discharge GOS of 3. Twenty-eight (18%) patients died in the hospital: 27 following withdrawal of life-sustaining therapies, and one after meeting brain death criteria. A total of 138 patients underwent EEG monitoring; epileptiform discharges (n = 65, 47.1%) and rhythmic delta activity (n = 60, 43.4%) were the most frequent abnormalities. Eleven patients had depth EEG monitoring. Of 473 defined candidate predictor variables, 22 were excluded from analysis due to missing values.

Table 1 Clinical and demographic variables

Binary Discrimination: Predictors of Mortality

Significant predictors of mortality on univariate analysis are shown in Fig. 1. These included total number of AEDs, levetiracetam and lorazepam, APACHE II score, aneurysm treatment modality, periodic discharges, and HH Score.

Fig. 1

Significant predictors of mortality on univariate analysis: GOS 1 versus GOS 2–5. Significant predictors of outcome on univariate analysis and their regression coefficients are shown. The p value for each predictor was <0.05. For predictors with negative regression coefficient, presence/higher value was associated with lower GOS. For predictors with positive regression coefficient, presence/higher value was associated with higher GOS

The main predictors of death at discharge in the multivariate model, and the frequency of selection in the bootstrapping method are shown in Table 2. APACHE II and glucose and ICP variance were selected more than 95% of the time. At the point of maximum accuracy on the ROC curve, the sensitivity was 86%, and specificity was 92%; the mean AUC was 0.9198. Performance metrics and ROC curve are shown in supplementary material (Figure e-1 and e-2).

Table 2 Predictors of outcome on multivariate analysis

Binary Discrimination: Predictors of Death/Dependence Versus Independent Status

Significant predictors of death/dependence (GOS 1–3) versus independent status (GOS 4–5) on univariate analysis are shown in Fig. 2. These included total number of AEDs, levetiracetam, APACHE II, admission GCS, and HH score. Epileptiform discharges and rhythmic delta activity were significant EEG findings. The presence of HAIs and hospital-acquired pneumonia were also significant.

Fig. 2

Significant predictors of death/dependence versus independent status on univariate analysis: GOS 1–3 versus GOS 4–5. Significant predictors of outcome on univariate analysis and their regression coefficients are shown. The p value for each predictor was <0.05. For predictors with positive regression coefficient, presence/higher value was associated with lower GOS. For predictors with negative regression coefficient, presence/higher value was associated with higher GOS. EEG: Electroencephalogram; GCS: Glasgow Coma Scale; MAP: mean arterial pressure; PVD: peripheral vascular disease

Table 2 shows the features selected in the multivariate model. Levetiracetam and mechanical ventilation were selected in 100% of the training sessions. The sensitivity and specificity at the point of maximum accuracy on the ROC curve were 94% and 98%, respectively, and the mean AUC was 0.9456. Performance metrics and ROC curve are shown in supplementary material (Figure e-3 and e-4).

Multilevel Discrimination: Predictors of Poor, Intermediate, or Good Outcomes

Predictors of outcome in the multivariate multilevel prediction model are shown in Table 2. Maximum day 1 GCS, minimum day 2–3 GCS, and APACHE II score were the most frequently selected features. Periodic discharges, lacosamide, and rebleed were less frequently selected. Using these features, the model predicted poor and good outcomes with greater than 80% accuracy and intermediate outcome with greater than 70% accuracy (Fig. 3).

Fig. 3

Multiclass model—predicted versus observed outcomes; poor (GOS 1–2), intermediate (GOS 3), good (GOS 4–5). The percentage of accurately predicted discharge GOS is shown. The first stacked bar shows the model accurately predicts poor outcome (GOS 1–2) 87% of the time. The second bar shows the model accurately predicts an intermediate outcome (GOS 3) 74% of the time

Discussion

In this large EHR data-driven predictive model, we identified key features that accurately predicted outcomes in patients with aSAH. Our study highlights the importance of fluctuations and variance in physiologic features, which often more accurately predict outcome than the minimum or maximum value. The predictors identified do not necessarily imply causality. Nevertheless, some of the identified associations suggest plausible causal hypotheses and potentially modifiable risk factors and warrant further investigation in prospective studies.

The hospital mortality rate for aSAH is reported at 20–50% [16], with up to 40% of deaths from extra-cerebral organ failure [2]. Our 18% in-hospital mortality is comparable to that of the Columbia University SAH Outcomes Project [16]. The APACHE II score consistently predicted mortality. APACHE II is a disease severity score incorporating physiologic and laboratory data [17]. The physiological component is also included in the FRESH score [5]. Looking beyond the individual components of the APACHE II, we identified ICP, external ventricular drain (EVD) drainage, and EEG findings as additional predictors. Apart from GCS, the APACHE II does not include physiologic or clinical data that are more specific to the nervous system; hence, addition of these factors can enhance performance of predictive scores.

Variance, maximum, and absolute difference in serum sodium levels predicted outcomes. Sodium derangements correlate with death and disability [2, 16, 18], and fluctuations may have a greater impact on outcomes than hyponatremia itself [18, 19]. Strict sodium control and balancing the effects of salt wasting, syndrome of inappropriate ADH, and hyperosmolar treatment may mitigate the adverse effects. Similarly, serum glucose derangements can increase secondary cerebral injury [20, 21]. While there are conflicting data on the impact of tight glycemic control, our findings, similar to those of prior studies, suggest glucose variability correlates with outcomes in patients with neurological injury [20, 21].

Other laboratory predictors included minimum and maximum WBC count. Leukocytosis has been identified as a predictor of poor outcome in SAH [22, 23] and also as an independent predictor of vasospasm [24]. A rising WBC count warrants vigilance and should raise suspicion for potential vasospasm.

Cardiac and pulmonary complications occur in up to 63% and 80% of aSAH patients, respectively [2]. Blood pressure extremes and heart rate variability can impact outcomes [6, 25, 26]. We found that maximum heart rate and maximum MAP predicted outcomes. Pulmonary predictors included duration of mechanical ventilation and minimum SpO2. These predictors could certainly be surrogates of the underlying disease severity. Alternatively, they may directly impact outcomes; for example, in patients without lung injury who require ventilation for depressed arousal, overventilation can result in hyperoxia or hypocapnia. Oxidative stress may exacerbate cerebral injury and increase the risk of DCI [27]. Hypocapnia is similarly associated with worse outcomes and vasospasm [28]. These potentially modifiable clinical factors can be addressed using a protocolized approach to ventilator management.

Temperature differences also predicted outcomes, and although HAIs have been associated with worse outcomes [2, 29], and were significant in our univariate analysis, they were not identified in our multivariate model. One hypothesis is that earlier physiologic derangements may have greater impact on outcomes.

We also investigated ICP, CPP, and EVD drainage and found ICP variance to be a significant predictor. Data on ICP monitoring in aSAH are limited [30] with widespread practice variation [31]. Elevated ICP is linked to worse outcomes, although this may be a reflection of the disease severity [32], and ICP-derived variables, such as pressure reactivity index and variability, may be more accurate predictors [30]. There are also conflicting data on optimal EVD management [33,34,35]. Continuous drainage may be associated with greater risk of infection but lower risk of vasospasm [33, 34]. We typically use intermittent drainage, transitioning to continuous drainage if clinically indicated. Five of our patients developed meningitis, though this was not identified as an outcome predictor. At the same time, avoiding prolonged weans and avoiding continuous drainage when it is not indicated may help prevent EVD-related complications.

Age, GCS, and rebleeding were significant predictors that are also incorporated in existing scores [1, 5], although rebleeding was only seen in 1% of training sessions. Interestingly, HH score was identified only on univariate and not on multivariate analysis, underscoring the potential advantages of scores that are not limited to the initial examination.

Forty-four percent of our patients had DCI, higher than typical rates cited in the literature (up to 30%) [13]. This may be because the majority of our patients were high grade. Although not identified as a predictor of death, the presence of DCI did help discriminate between dependent and independent status at discharge.

Finally, we studied the impact of EEG features. Epileptiform and periodic discharges were predictors in the multiclass model. Inter-ictal patterns are associated with worse outcomes and DCI, potentially related to increased cerebral blood flow or metabolism [36,37,38]. Larger prospective studies can determine the long-term impact of periodic patterns and implications of treatment. Careful selection of patients for treatment using existing evidence may both limit the impact of these patterns and avoid excess use of AEDs.

Electrographic seizures, seen in only 6 of our patients, were not identified in our model, despite prior studies showing an association with worse outcomes [39]. Our findings raise the hypothesis that inter-ictal scalp activity may signify scalp-negative seizures [40], or that seizures, per se, may fail to predict outcome if treated in a timely fashion. Intriguingly, we found that use of AEDs predicted poor outcomes. This association is not necessarily causal. There are two plausible explanations for this (Figure e-5, supplementary material). AEDs may be a non-causal predictive variable with an “apparent causality” because they are surrogates for the underlying disease severity or EEG findings (confounding by indication). Alternatively, AEDs may have a true, iatrogenic negative causal impact on outcomes. Regardless, we hypothesize that prompt discontinuation of primary AED prophylaxis, and using the lowest dose of monotherapy for secondary prophylaxis, might be beneficial.

Limitations of our study are its retrospective nature and that it is a single-center study. Only including patients with a definite aneurysm resulted in a smaller sample size. Most patients were high-grade aSAH, limiting the generalizability of our findings to lower-grade hemorrhages, and our inclusion of intraventricular hemorrhage was limited to the Fisher score. Variation in AED prescribing practices for secondary prophylaxis and temporal trends in aneurysm treatment modality are additional limitations. The greatest difference in aneurysm treatment, however, was in the first year (2012: 23% coiled, 76% clipped); thereafter, there was an increase in coiling (2013: 50% coiled, 46.9% clipped; 2014: 39% coiled, 50% clipped; 2015: 57.1% coiled, 26.8% clipped; 2016: 58% coiled, 25% clipped).

Conclusions

Attractive features of data-driven approaches to outcome prediction are reproducibility, lack of susceptibility to errors of human judgment, and their ability to take advantage of non-obvious patterns in complex medical data. In our cohort, variability and fluctuations in physiological and laboratory data were important predictors of outcomes that are not readily available in a clinician-documented approach. Early identification of these features may identify patients requiring additional vigilance and facilitate more timely therapeutic interventions, allowing for improved immediate outcomes and rehabilitation potential. Future prospective studies are needed to create a more comprehensive, reliable, and reproducible outcome prediction score using additional features such as ICP, blood glucose levels, EEG findings, and variability in physiologic data. With increasing use of multimodality monitoring, larger prospective studies can help clarify the relationship between variability in physiologic and laboratory data and cerebral metabolism, and the impact of goal-directed treatments on neurologic outcomes.