Introduction

Acute-on-chronic liver failure (ACLF) is an acute decompensation of liver function in patients with chronic liver disease, due either to additional liver injury or to systemic factors leading to multi-organ dysfunction [1]. Acute hepatic dysfunction in the critically ill carries substantial mortality [2–6]. Early hepatic dysfunction has been shown to occur in 11 % of critically ill patients and is associated with in-hospital mortality [2]. Valid prediction models that discriminate patients by risk of death allow the identification of high-risk subgroups that may benefit most from intervention [7]. Several scoring systems are available for primary liver disease [8–10], for cirrhotic patients admitted to an Intensive Care Unit (ICU) [11–14] and for cirrhotic patients with acute decompensation [15, 16], but no laboratory-based score has been developed for ACLF in the critically ill.

To address this issue, we sought to assess the potential of a laboratory-based acute liver failure score, termed the Liver injury and Failure evaluation (LiFe) score, to serve as a predictor of mortality in critically ill patients. Through a survey of ESICM members on acute liver failure in ICU patients, we identified three variables (INR, total bilirubin and arterial lactate) that were considered commonly available and clinically important markers of hepatic synthesis, excretion and metabolism, respectively. Our primary study objective was to derive and validate a risk prediction score based on INR, total bilirubin and arterial lactate and to examine its discrimination for in-hospital mortality. We hypothesized that such an acute liver failure risk prediction score would have robust discrimination for in-hospital mortality in patients with chronic liver disease following critical care admission. To test this hypothesis, we performed a three-center observational cohort study of 1916 adults with chronic liver disease who were hospitalized for critical care from 1997 to 2011.

Materials and methods

Survey

Using an internet-based survey tool, we conducted a descriptive, structured, cross-sectional, self-administered survey of intensivists within the European Society of Intensive Care Medicine (ESICM) from July to October 2013. The survey questions addressed test utility, cut-off levels and test availability and were reviewed and approved by the Metabolism, Endocrinology and Nutrition research committee within the ESICM. Details of the survey design can be found in Supplemental Methods. Of the 157 intensivists who responded, 76 % were from Europe, 16 % from Asia and 8 % from Africa, the Middle East, Australia and the Americas. The majority of responders (72 %) worked in closed intensive care units, with 80 % of beds having ventilator capacity. Thirty-six percent of responders were from university hospitals with specialist Hepatobiliary Services. Responders indicated that INR, total bilirubin and arterial lactate, which reflect the synthetic, excretory and metabolic properties of liver function, respectively, were widely considered useful, obtainable and practical. Further, the responders indicated that increases of INR, total bilirubin and arterial lactate in 2-unit increments were important for discriminating between patients with and without acute liver failure and corresponded to increasing severity of acute liver failure.

Source population and data sources

For the derivation cohort, we abstracted patient-level demographic, administrative and laboratory data from two academic medical centers in Boston, Massachusetts: Brigham and Women’s Hospital (BWH), and Massachusetts General Hospital (MGH). Data on all patients admitted to BWH or MGH between November 3, 1997, and December 30, 2011, were obtained through the Research Patient Data Registry (RPDR), a computerized registry which serves as a central data warehouse for all inpatient and outpatient records at BWH and MGH [7]. For the external validation cohort, consecutive admissions to the Liver Intensive Therapy Unit (LITU) at King’s College Hospital from January 2000 to December 2010 had patient-level demographic, administrative and laboratory data prospectively captured by dedicated auditors [17]. Approval for the study was granted by the Partners Human Research Committee (Institutional Review Board) and by the South East London Research Ethics Committee.

Study population

During the study period, there were 92,886 individual inpatients at BWH or MGH in Boston, age ≥18 years, who were admitted to a medical or surgical ICU [18], who were assigned a Diagnosis Related Group (DRG) classification [19] and had a social security number. DRG classification was used to exclude patients assigned the CPT code 99291 who received Emergency Room care but were not admitted to BWH or MGH. We excluded: 85,838 patients who did not have chronic liver disease [20], 6116 patients who did not have arterial lactate, total bilirubin and INR drawn at ICU admission, and 13 patients who were post-liver transplant or were transplanted during their hospital stay. Thus, the derivation cohort comprised 945 patients (270 from BWH and 675 from MGH) (Fig. 1). During the study period, there were 1032 individual inpatients at King’s College Hospital, London, age ≥16 years, who were admitted to the Liver Intensive Therapy Unit. We excluded 61 King’s College Hospital patients who had acute liver failure, hepatocellular carcinoma, chronic liver disease not consistent with cirrhosis, or malignancy, or who were post-liver transplant or were transplanted during their hospital stay. In the validation cohort, the presence of cirrhosis was determined from clinical, biochemical, radiological or histopathological results. Thus, the validation cohort comprised 971 patients with cirrhosis from King’s College Hospital [17] (Fig. 1).

Fig. 1 Study flow chart

Covariates

Definition and determination of the following derivation cohort covariates are outlined in Supplemental Methods: ICU admission [18], race, medical versus surgical patient admission ‘type’ [21], Deyo–Charlson index [22–24], chronic liver disease [20], chronic kidney disease stage [25], categories of acute organ failure [7, 26], sepsis [27], noncardiogenic acute respiratory failure [28, 29] and acute kidney injury [30, 31]. For severity of illness risk adjustment, we utilized the Acute Organ Failure score [7]. The Acute Organ Failure score is an ICU risk-prediction score derived and validated from demographics (age, race), patient admission ‘type’, and ICD-9-CM code-based comorbidity, sepsis and acute organ failure covariates; its discrimination for 30-day mortality is similar to that of APACHE II [7]. Definition and determination of the following validation cohort covariates are outlined in Supplemental Methods: Acute Physiology and Chronic Health Evaluation (APACHE II) [32], Simplified Acute Physiology Score (SAPS-II) [33], Sequential organ failure assessment (SOFA) [34], chronic liver failure–sequential organ failure score (CLIF-SOFA) [15]. For the diagnostic criteria for acute-on-chronic liver failure (ACLF), we utilized the graded approach of the CLIF Acute-on-Chronic Liver Failure in Cirrhosis (CANONIC) study [15]. The CANONIC study approach was utilized to define the absence of ACLF: no organ failure; a single “non-kidney” organ failure with serum creatinine level <1.5 mg/dL and no hepatic encephalopathy; or single cerebral failure with serum creatinine level <1.5 mg/dL [15].

End points

The primary end point was all-cause in-hospital mortality following ICU admission. Vital status in the derivation cohort was obtained from hospital records and the Social Security Administration Death Master File [35], which has been shown to be a valid approach for ascertaining in-hospital mortality in the BWH/MGH administrative database [18]. Vital status in the validation cohort was determined from hospital records [17]. In-hospital mortality data were available for 100 % of the cohort.

Power calculations and statistical analysis

Previously, acute hepatic dysfunction in the ICU was shown to occur in 10 % of ICU patients and be associated with a 14 % increase in hospital mortality [2]. From these data, we assumed that in-hospital mortality would be 14 % higher among the current patient derivation cohort with ACLF compared to those without. With an alpha error level of 5 % and a power of 80 %, the sample size required for our primary end point (in-hospital mortality) was 705 patients without acute hepatic dysfunction and 71 patients with acute hepatic dysfunction.

The derivation cohort consisted of critically ill patients with chronic liver disease treated at BWH and MGH (n = 945) and the validation cohort comprised critically ill cirrhotic patients treated at King’s College Hospital (n = 971). Categorical variables were described by frequency distribution while continuous variables were examined graphically and by summary statistics. The primary outcome was in-hospital all-cause mortality. Univariate logistic regression was performed to determine the unadjusted association between in-hospital mortality and potential predictors. A clinical prediction model was created based on a logistic regression model describing the risk of in-hospital mortality in the derivation cohort as a function of predictors (arterial lactate 0–1.9, 2.0–3.9, 4.0–5.9, ≥6.0 mg/dL; total bilirubin 0–1.9, 2.0–3.9, 4.0–5.9, ≥6.0 mg/dL; INR 0–1.9, 2.0–3.9, 4.0–5.9, ≥6.0) at ICU admission. The model was transformed to a simplified integer-based score, with a score for each predictor variable assigned by dividing its β-coefficient by the smallest coefficient in the model, multiplying by a factor of 2 and rounding up to the nearest integer. A risk score was then calculated for each patient, and the population was divided into four categories: patients at low risk, patients at intermediate risk, patients at high risk and patients at very high risk for death. The discriminatory ability for in-hospital mortality was quantified using the c-statistic. The DeLong method was used for Area Under ROC (AUC) comparisons [36]. Calibration was assessed using the Hosmer–Lemeshow χ² goodness-of-fit test and the accompanying p value. We tested for effect modification by year of hospitalization by adding an interaction term to the multivariate models. In all analyses, p values are two-tailed and values below 0.05 were considered statistically significant. All analyses were performed using STATA 13.1MP statistical software (StataCorp, College Station, TX, USA).
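The coefficient-to-points conversion described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's code: the β-coefficients are hypothetical placeholders (the fitted values are reported in Supplemental Table 1 and are not reproduced here), the function name `beta_to_points` is our own, and only non-reference levels with positive coefficients are assumed to be passed in.

```python
import math


def beta_to_points(betas, factor=2):
    """Convert logistic-regression beta-coefficients to integer points.

    Each coefficient is divided by the smallest coefficient supplied,
    multiplied by `factor`, and rounded up to the nearest integer.
    """
    smallest = min(betas.values())
    if smallest <= 0:
        raise ValueError("this sketch expects positive coefficients only")
    points = {}
    for name, beta in betas.items():
        scaled = factor * beta / smallest
        # Round away floating-point noise before taking the ceiling, so that
        # a ratio that is mathematically an integer is not pushed up by one.
        points[name] = math.ceil(round(scaled, 9))
    return points


# Hypothetical coefficients, for illustration only (not the study's values).
example_betas = {
    "lactate 2.0-3.9": 0.40,
    "lactate 4.0-5.9": 0.90,
    "lactate >=6.0": 1.50,
}
example_points = beta_to_points(example_betas)
print(example_points)
```

With these placeholder coefficients the three levels receive 2, 5 and 8 points. The reference level (0–1.9) is implicitly assigned 0 points.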

Results

Survival analysis and risk-scoring system

Table 1 shows demographic characteristics of the derivation cohort. The majority of derivation cohort patients were white (77 %), medical admissions (71 %) and male (59 %). The mean age at hospital admission was 60 years. The in-hospital mortality rate was 32 %. Survival improved over the course of the study period, with in-hospital mortality rates of 44 % before 2007 and 23 % afterward. The details of the validation cohort from King’s College Hospital have been previously described [17]. The majority of the validation cohort patients were male (63 %), the mean age at hospital admission was 51 years and the in-hospital mortality rate was 52 %. The validation cohort had a mean APACHE II of 21.8, a mean SOFA of 9.5, a mean SAPS II of 46.4, and a mean CLIF-SOFA of 10.7. ACLF was absent (ACLF grade 0) in 18.9 % of validation cohort patients.

Table 1 Population characteristics of the development cohort and unadjusted association of potential prognostic determinants with in-hospital mortality

Table 1 shows that comorbidities and the development of sepsis and acute organ failure were associated with the risk of death in the derivation cohort. To calculate a risk score, we utilized derivation cohort data and assigned each level of INR, total bilirubin and arterial lactate a number of points proportional to its regression coefficient (Supplemental Table 1). A score was calculated for each patient by adding the points corresponding to the cut points of the risk factors (arterial lactate 0–1.9, 2.0–3.9, 4.0–5.9, ≥6.0 mg/dL; total bilirubin 0–1.9, 2.0–3.9, 4.0–5.9, ≥6.0 mg/dL; INR 0–1.9, 2.0–3.9, 4.0–5.9, ≥6.0). The patients were then divided into 4 categories on the basis of score distribution, which ranged from 0 to 20: a low-risk group (0 points), an intermediate-risk group (1–3 points), a high-risk group (4–8 points), and a very high-risk group (>8 points) (Table 2; Fig. 2).
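The banding of laboratory values and the mapping from total score to risk category can be sketched as below. This is an illustrative sketch only: the function names are hypothetical, the points attached to each band are given in Supplemental Table 1 and are deliberately not reproduced, and assigning values that fall between published bands (for example 1.95) to the lower band is our assumption.

```python
def lab_level(value):
    """Return the band index (0-3) for a laboratory value.

    Bands follow the cut points in the text: 0-1.9, 2.0-3.9, 4.0-5.9, >=6.0.
    Assumption: values between bands (e.g. 1.95) go to the lower band.
    """
    if value < 0:
        raise ValueError("laboratory values cannot be negative")
    return min(int(value // 2), 3)


def life_risk_category(score):
    """Map a total score (0-20) to the four risk categories in the text."""
    if not 0 <= score <= 20:
        raise ValueError("score is expected to lie between 0 and 20")
    if score == 0:
        return "low"
    if score <= 3:
        return "intermediate"
    if score <= 8:
        return "high"
    return "very high"


# Example: band indices for three hypothetical admission values, and the
# category for a hypothetical total score of 9 points.
example_levels = [lab_level(v) for v in (1.4, 3.9, 7.2)]
example_category = life_risk_category(9)
print(example_levels, example_category)
```

Converting band indices to points requires the per-band point values from Supplemental Table 1; the category function then applies unchanged to the summed score.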

Table 2 Risk of in-hospital death in the development and validation cohorts, according to risk category

Fig. 2 Time-to-event curves for mortality for the derivation cohort. Unadjusted all-cause mortality rates were calculated with the use of the Kaplan–Meier methods and compared with the use of the log-rank test. Categorization of risk groups is per the primary analysis. The global comparison log rank p value is <0.001.

In the derivation and validation cohorts, the in-hospital mortality rates for the low, intermediate, high and very high risk categories show similar exposure–response relationships (Table 2). The odds ratios for in-hospital mortality in validation cohort patients in the intermediate, high and very high risk categories were 1.85 (95 % CI 1.06–3.21), 4.21 (95 % CI 2.54–6.98) and 15.52 (95 % CI 9.46–25.48), respectively, relative to patients in the low risk category. The AUC for the continuous risk score point model was 0.74 in the derivation cohort and 0.77 in the validation cohort (Fig. 3). For both the derivation and validation cohorts, the Hosmer–Lemeshow χ² P values indicated good model fit (Fig. 3). Thus, the risk score point model showed good calibration and similarly good discrimination in the validation cohort.
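For readers less familiar with the c-statistic, it is the probability that a randomly chosen patient who died has a higher score than a randomly chosen survivor, with ties counted as one half. A minimal sketch of that definition follows; it uses made-up toy data rather than study data, the function name is hypothetical, and it does not implement the DeLong comparison used in the analysis.

```python
def c_statistic(scores, died):
    """Concordance (area under the ROC curve) by direct pairwise comparison."""
    dead = [s for s, d in zip(scores, died) if d]
    alive = [s for s, d in zip(scores, died) if not d]
    if not dead or not alive:
        raise ValueError("need at least one death and one survivor")
    concordant = 0.0
    for s_dead in dead:
        for s_alive in alive:
            if s_dead > s_alive:
                concordant += 1.0   # death correctly ranked above survivor
            elif s_dead == s_alive:
                concordant += 0.5   # tie counts as half
    return concordant / (len(dead) * len(alive))


# Toy data for illustration only: four patients with scores and outcomes.
toy_scores = [0, 4, 4, 9]
toy_died = [False, True, False, True]
toy_auc = c_statistic(toy_scores, toy_died)
print(toy_auc)
```

In the toy data, three of the four death–survivor pairs are concordant and one is tied, giving 3.5/4 = 0.875. The pairwise loop is quadratic in cohort size, which is acceptable for cohorts of about a thousand patients.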

Fig. 3 Performance of the LiFe score. Area under the ROC curve (C statistic) for the continuous LiFe score point model predicting in-hospital mortality is 0.738 (95 % CI 0.704–0.772) in the derivation cohort (a) and 0.771 (95 % CI 0.742–0.800) in the validation cohort (b). The black line indicates reference values. The unadjusted odds ratio for in-hospital mortality (95 % CI) per one-point increase of LiFe score is 1.20 (1.16–1.24) in the derivation cohort and 1.25 (1.21–1.29) in the validation cohort. The LiFe score showed good calibration in the derivation and validation cohorts (HL χ² 4.87, P = 0.56 and HL χ² 4.67, P = 0.59, respectively).
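Because the continuous point model is linear on the log-odds scale, the per-point odds ratios in the Fig. 3 legend compound multiplicatively over larger score differences. The sketch below illustrates that arithmetic; the function name is hypothetical, the estimates are unadjusted, and the confidence intervals are not propagated.

```python
def odds_ratio_for_increase(per_point_or, points):
    """Odds ratio implied by a rise of `points`, given the per-point odds ratio."""
    if per_point_or <= 0:
        raise ValueError("an odds ratio must be positive")
    return per_point_or ** points


# Per-point odds ratios from the Fig. 3 legend, applied to a 4-point rise.
derivation_or_4 = odds_ratio_for_increase(1.20, 4)
validation_or_4 = odds_ratio_for_increase(1.25, 4)
print(round(derivation_or_4, 2), round(validation_or_4, 2))
```

A 4-point rise thus corresponds to roughly a doubling of the odds of in-hospital death in the derivation cohort (about 2.07) and about 2.44 in the validation cohort.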

When the risk score point model is run in the validation cohort with and without terms for year of hospitalization, the in-hospital mortality estimates are similar in each case, indicating that the risk score–mortality relationship is not materially confounded by year (data not shown). There is significant effect modification of the risk score–mortality association on the basis of year of hospitalization (P interaction <0.001). With regard to year, when the validation cohort is analyzed separately before and after 2007, the directionality and significance of the risk score–mortality association are preserved.

We next assessed the performance of the risk score point model compared to other scoring systems in the validation cohort (Table 3). The APACHE II score had similar accuracy to the risk score (χ² 0.03, P = 0.86), as did the SAPS II score (χ² 0.32, P = 0.57). There was a significant difference in discrimination between the risk score and SOFA (χ² 4.05, P = 0.044), as well as between the risk score and CLIF-SOFA (χ² 11.28, P < 0.001). ACLF was associated with increases in risk score. Using the definitions from the CANONIC study for ACLF grade [15], the mean (SD) risk score differed according to ACLF grade: grade 0 = 3.7 (4.0), grade 1 = 6.7 (4.8), grade 2 = 7.9 (4.6), grade 3 = 10.9 (4.4). There was a statistically significant difference in risk score between ACLF grades as determined by one-way ANOVA (F(9, 961) = 105.7, P < 0.001). Finally, a Pearson’s product-moment correlation shows a significant positive correlation between risk score and ACLF grade, r(969) = 0.49, P < 0.001.

Table 3 Validation of the risk score point model (N = 971)

Discussion

The present study aimed to derive and validate a clinical risk prediction score relevant to ACLF in the critically ill. Our study has several novel findings. First, we show a risk-prediction score created from commonly collected laboratory data has good calibration and discrimination for in-hospital mortality in critically ill patients with chronic liver disease. Further, in the validation cohort of patients with cirrhosis, we show that our laboratory-based risk-prediction score has good discrimination for in-hospital mortality, approaches the performance of physiological-based scoring systems and correlates with ACLF grade.

While liver-related risk prediction scores have been described before [8–16, 37, 38], this is the first study to validate a laboratory-based score in the critically ill. In cirrhotic patients, the SOFA score is noted to have the best predictive accuracy for survival in the ICU [11]. A chronic liver failure-specific modification of the SOFA score (CLIF-SOFA) may have higher accuracy in patients with cirrhosis [39]. The CLIF-SOFA components include PaO2/FiO2 or SpO2/FiO2, INR, hypotension (mean arterial pressure, vasopressor use), hepatic encephalopathy grade, and serum creatinine [15], which may not be available in observational datasets. While the SOFA-based scores have better discrimination for in-hospital mortality in our validation cohort, our simple risk prediction score may be easier to utilize at the bedside and have higher utility in observational datasets that lack physiologic parameters.

We do note that there are potential limitations to our approach. Observational studies may be limited by confounding, bias and/or reverse causation. Importantly, we cannot determine causality in our study. Our derivation and validation cohorts are from large acute care hospitals, and the results may not be generalizable to all critically ill patients with chronic liver disease [40]. Ascertainment bias may be present, as only a small fraction of the patients with chronic liver disease in the parent BWH/MGH ICU cohort was utilized to derive the risk score. As arterial lactate is not routinely measured in BWH/MGH ICU patients, those included in the derivation cohort have a higher severity of illness and higher mortality than the parent BWH/MGH ICU chronic liver disease cohort. In the parent BWH/MGH ICU chronic liver disease cohort, there is a substantial mortality difference between patients with lactate measured on the ICU admission day and those without (in-hospital mortality 30.7 vs. 16.7 %, respectively). Differences with regard to sepsis are also present between derivation cohort patients and the parent BWH/MGH ICU chronic liver disease cohort (43.5 vs. 22.1 %). Thus, our derivation cohort represents a sicker population of patients with chronic liver disease.

INR, total bilirubin and arterial lactate may not be reflective of liver issues alone. Importantly, marked abnormalities could exist in some or all of these markers and be associated with mortality, yet result from illness severity rather than from specific liver injury or dysfunction. Lactate is a marker for tissue hypo-perfusion in both hepatic and non-hepatic endothelial beds and is further elevated with decreased hepatic clearance. Bilirubin can rise in hematological conditions in the critically ill, and by not differentiating between conjugated and unconjugated bilirubin, we may underestimate the role of hemolytic causes. Coagulation disorders are similarly common in trauma, hematological conditions, sepsis, and renal failure as well as in primary hepatic conditions. Overall, whether or not the cause of critical illness in the cohorts under study is primarily acute hepatic failure, the combination of these markers is predictive for short-term mortality and correlates with ACLF grade.

The present study has several strengths. The risk score is simple and easy to calculate from laboratory measures (INR, total bilirubin and arterial lactate) that are commonly available or obtainable in critically ill patients and administrative datasets. The parent dataset from which the derivation cohort was obtained is well studied [7, 18]. The use of CPT code 99291 to identify ICU patients and of ICD-9-CM code combinations to identify sepsis and chronic liver disease has been previously validated in the parent dataset under study [18, 20, 27]. The risk score had good model performance for in-hospital mortality and was comparable to other physiological scores in a validation cohort of critically ill patients with a high proportion of ACLF. Our simple risk score has potential clinical utility in the triage and management of critically ill patients with ACLF. In addition, our risk score can be utilized for outcome studies in observational datasets that include laboratory measures.

Conclusion

In aggregate, these data demonstrate that a risk score based on INR, total bilirubin and arterial lactate drawn at ICU admission is predictive for short-term mortality and correlates with ACLF grade in critically ill patients. This risk score can be quickly, easily and conveniently utilized at the bedside for early risk prediction in patients with chronic liver disease, with performance that approaches but does not match that of SOFA or CLIF-SOFA.