1 Background

As organizational innovation plays an increasingly prominent role in health care system change, non-experimental study design has risen in prominence. Substantial statistical and econometric advances have made it possible to make credible claims for causality from observational studies, including those making secondary use of administrative data. Less attention is sometimes devoted to the quality of the administrative data that is used to support these study designs. Here we describe a policy problem of substantial importance that requires an observational study design using administrative data. We examine several approaches to validating a key variable, allowing comparison of their relative yield.

1.1 Policy context

Over 500,000 Americans receive mechanical ventilation for acute respiratory failure each year, most frequently for pneumonia, chronic obstructive pulmonary disease, acute lung injury, and post-operative respiratory failure (Behrendt 2000). Approximately 5–20% of these patients will survive the first few days of critical illness but develop persistent organ failure resulting in chronic critical illness (Carson and Bach 2002; Martin et al. 2005). These patients frequently require prolonged mechanical ventilation (PMV), typically defined as either 14 or 21 consecutive days of invasive mechanical ventilation (MacIntyre et al. 2005). Multiple factors contribute to the need for PMV, including underlying heart and lung disease, neuro-endocrine changes of critical illness, and ICU-acquired neuromuscular disease (Carson et al. 1999; Hudson and Lee 2003; Vanhorebeek et al. 2006).

Patients requiring PMV consume a disproportionate amount of health care resources. Although totaling less than 20% of all ICU patients, they account for nearly 40% of all ICU expenditures (Wagner 1989). Post-discharge costs are also extremely high: a recent economic analysis estimated the total cost of PMV at over $80,000 per quality-adjusted life year compared to withdrawal of life support in the ICU (Cox et al. 2007a). Six-month survival is approximately 40%, compared to over 80% for all survivors of intensive care (Douglas et al. 2002; Engoren et al. 2004; Gracey et al. 1992). Those who survive experience significant reductions in health-related quality of life (Combes et al. 2003).

1.2 Empirical problem

Recent advances in mechanical ventilation and the aging of the population are expected to result in an increase in the incidence of PMV in the coming years (Angus et al. 2000). At the same time, care models for these patients are evolving, with increasing use of long-term acute care (LTAC) facilities for weaning (Carson 2007). Large-scale research into the epidemiology and resource utilization of this patient population is vital. Unfortunately, efforts are hindered by the lack of a valid method of identifying patients requiring PMV in large administrative datasets. Most administrative data do not include actual duration of mechanical ventilation. The International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM) procedure codes for mechanical ventilation distinguish patients requiring ventilation for 96 or more consecutive hours (code 96.72), but not the longer periods corresponding to clinical practice (MacIntyre et al. 2005). Some previous research has utilized the diagnosis-related group (DRG) for tracheostomy and mechanical ventilation for at least 96 consecutive hours (previously DRGs 483, 541 and 542; now 003 and 004 under the new severity-adjusted DRG system) (Cox et al. 2004). This code is attractive because many patients requiring PMV undergo tracheostomy during their hospitalization and therefore will be classified into this high-reimbursement DRG. However, not all patients requiring PMV receive a tracheostomy, and recent evidence shows that these DRGs fail to effectively capture a homogeneous patient group with the highest resource utilization and post-discharge morbidity (Cox et al. 2007b).

1.3 Study aims

The purpose of this study was to derive and validate a novel algorithm for identifying patients requiring PMV in administrative and claims data. We hypothesized that a combination of elements readily available in most administrative data sets, including ICD-9-CM procedure codes, DRG codes, and lengths of ICU and hospital stay, would reliably and validly identify a population of patients requiring PMV. We also sought to extend usual research into validating disease definitions in administrative data by performing a two-step external validation process: (1) criterion validation in a multi-center clinical registry, and (2) construct validation by analyzing characteristics and outcomes of patients identified with the algorithm in a large-scale administrative database.

2 Materials and methods

2.1 Study design

We performed a retrospective cohort study evaluating novel algorithms for identifying PMV in administrative data using independent derivation and validation datasets. New algorithms were compared to the gold standard of actual duration of mechanical ventilation. We defined PMV as invasive mechanical ventilation for at least 14 or 21 days. Duration of mechanical ventilation ≥21 days is a standard definition endorsed by a recent national consensus conference (MacIntyre et al. 2005). We also evaluated the lower threshold of ≥14 days in order to reflect the increasing use of LTAC hospitals for the management of some patients requiring PMV, as some LTACs may admit PMV patients prior to day 21 (Scheinhorn et al. 2007a). Day 14 is also close to the threshold prompting decisions to perform a tracheotomy to facilitate PMV (Groves and Durbin 2007). All analyses were restricted to short-stay acute care hospitals due to the clinical and policy relevance of PMV in this hospital group.

2.2 Derivation

The algorithms were derived in a dataset of all patients undergoing invasive mechanical ventilation at the Hospital of the University of Pennsylvania (HUP) during calendar year 2006 (henceforth referred to as the “HUP derivation data”). HUP is an 865-bed university-affiliated teaching hospital located in Philadelphia, Pennsylvania. We began with mechanically ventilated patients since previous work has shown that ICD-9-CM procedure codes are extremely sensitive and specific for identifying mechanical ventilation (Quan et al. 2004). The hospital’s administrative database was used to obtain standard administrative data from each patient. Actual duration of mechanical ventilation was obtained from daily resource utilization records, which contain a hospital charge for each day the patient received invasive mechanical ventilation. We excluded patients less than 18 years of age and trauma patients (identified using ICD-9-CM codes for major trauma) due to their unique risk profile and outcomes.
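As a rough illustration of how a gold-standard duration could be read off daily charge records, the sketch below counts distinct calendar days that carry a ventilation charge. The function names and the input shape (a list of charge dates per patient) are assumptions made for illustration; the study does not describe its extraction code, and counting distinct days rather than the longest consecutive run is also an assumption.

```python
from datetime import date


def ventilation_days(charge_dates):
    """Count distinct calendar days with a ventilation charge (hypothetical input)."""
    return len(set(charge_dates))


def meets_gold_standard(charge_dates, threshold_days=14):
    """True when the charged ventilation days reach the PMV threshold (14 or 21)."""
    return ventilation_days(charge_dates) >= threshold_days


# Illustrative patient: charges on 1-14 January 2006, with one duplicate charge line.
charges = [date(2006, 1, d) for d in range(1, 15)] + [date(2006, 1, 14)]
```

Under this sketch the illustrative patient has 14 ventilation days, so the 14 day threshold is met and the 21 day threshold is not.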

We evaluated algorithms based on various combinations of three main variables commonly available in administrative data: ICD-9-CM procedure codes for mechanical ventilation, DRG assignment for mechanical ventilation and tracheostomy, and ICU and hospital length of stay. The rationale for this strategy was that patients requiring PMV would receive either a DRG or ICD-9-CM procedure code indicating mechanical ventilation of 4 or more days and be in the ICU for at least 14 or 21 days. We also evaluated algorithms based on hospital length of stay, since ICU length of stay is not available in all administrative datasets.

We evaluated each of the following administrative definitions of prolonged mechanical ventilation:

  • DRG 483, 541 or 542

  • ICD-9-CM procedure code 96.72

  • ICD-9-CM procedure code 96.72 and ICU length of stay ≥14/21 days

  • ICD-9-CM procedure code 96.72 and hospital length of stay ≥14/21 days

  • (DRG code 483/541/542 or ICD-9-CM procedure code 96.72) and ICU length of stay ≥14/21 days

  • (DRG code 483/541/542 or ICD-9-CM procedure code 96.72) and hospital length of stay ≥14/21 days
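The combined definitions listed above can be sketched as a single predicate. This is a minimal sketch, not the study's code: the function name, argument names and the representation of codes as strings are assumptions. Passing hospital length of stay instead of ICU length of stay gives the hospital-based variant.

```python
PMV_DRGS = {"483", "541", "542"}   # tracheostomy with ventilation DRGs named in the text
PMV_PROCEDURE_CODE = "96.72"       # mechanical ventilation for 96 or more consecutive hours


def meets_pmv_definition(drg, procedure_codes, los_days, threshold_days=14):
    """Apply '(DRG or ICD-9-CM 96.72) and length of stay >= threshold'.

    drg: DRG assigned to the stay, as a string (hypothetical field).
    procedure_codes: iterable of ICD-9-CM procedure codes on the record.
    los_days: ICU (or hospital) length of stay in days.
    """
    has_code = drg in PMV_DRGS or PMV_PROCEDURE_CODE in set(procedure_codes)
    return has_code and los_days >= threshold_days
```

For example, a stay coded 96.72 with 14 ICU days satisfies the 14 day definition, while the same stay with 13 ICU days does not, and neither satisfies the 21 day definition.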

Performance characteristics of the algorithms were evaluated by examining sensitivity, specificity, positive predictive value and negative predictive value compared against the gold standard of actual duration of mechanical ventilation. The algorithms with the highest sensitivity and specificity were selected for external validation.
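For reference, the four performance characteristics follow directly from the 2×2 table of algorithm result against gold standard. The sketch below uses made-up counts purely for illustration; they are not the study's data, and confidence intervals are not computed.

```python
def performance(tp, fp, fn, tn):
    """Standard diagnostic accuracy measures from a 2x2 table of counts."""
    return {
        "sensitivity": tp / (tp + fn),  # true cases the algorithm flags
        "specificity": tn / (tn + fp),  # non-cases the algorithm leaves unflagged
        "ppv": tp / (tp + fp),          # flagged patients who are true cases
        "npv": tn / (tn + fn),          # unflagged patients who are true non-cases
    }


# Illustrative counts only (hypothetical, not taken from Table 2).
example = performance(tp=90, fp=60, fn=10, tn=840)
```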

2.3 External criterion validation

We performed external criterion validation of the final algorithm by determining its sensitivity, specificity, positive predictive value and negative predictive value among patients receiving mechanical ventilation for community-acquired pneumonia in the Premier Hospital dataset (henceforth referred to as the “Premier validation data”). Premier is a voluntary association of United States hospitals that pool clinical and administrative data for benchmarking and quality improvement purposes. The Premier validation data contain standard administrative variables as well as daily resource utilization codes for each individual service, making it possible to directly identify duration of mechanical ventilation. We used data from 377 hospitals participating in Premier during calendar year 2004. Patients with community-acquired pneumonia were selected because they are a standard population at high risk for chronic critical illness and PMV. Community-acquired pneumonia was defined as having a discharge diagnosis of pneumonia, a chest X-ray on each of the first two days of hospitalization, and antibiotic administration typical for community-acquired lung infections. Due to limitations on data use, this validation step was only performed for the 14 day definition.

2.4 External construct validation

Next, we evaluated the performance of the final algorithms in the Pennsylvania hospital discharge database from fiscal years 2005 and 2006 (henceforth referred to as the “Pennsylvania state validation data”). The purpose of this step was to describe the incidence, demographic characteristics and outcomes of patients identified as requiring PMV in a typical administrative dataset. This step offers several unique advantages over the typical criterion validation step described in Sect. 2.3. First, it allows us to directly observe the incidence and outcomes of the disease at the population level and compare whether they are similar to expected values. Second, it allows us to test specific hypotheses about patients meeting the administrative definition, further supporting the ability of the algorithm to identify the target patient population.

All 155 general medical-surgical hospitals in Pennsylvania were included in this analysis. We identified patients requiring mechanical ventilation using ICD-9-CM codes 96.70, 96.71 and 96.72 (mechanical ventilation-time unspecified, <96 consecutive hours, and ≥96 consecutive hours, respectively) (Quan et al. 2004). We excluded patients less than 18 years of age and patients with major trauma by ICD-9-CM diagnosis code. Co-morbid illnesses were defined using the method of Elixhauser (Quan et al. 2005). Demographic and clinical characteristics of patients meeting the administrative definition in the Pennsylvania state discharge database were compared to all patients requiring mechanical ventilation. Additionally, we evaluated the relationship between the percentage of ventilated patients requiring PMV at each hospital and that hospital’s academic status and annual volume of mechanically ventilated patients. We defined academic hospitals as those with a resident to bed ratio ≥0.2 (Volpp et al. 2007). We hypothesized a priori that academic hospitals and hospitals with a large number of mechanically ventilated patients would have a higher proportion requiring PMV, both because high volume hospitals tend to have better survival and high volume hospitals tend to care for more complex patients (Kahn et al. 2006). These associations were evaluated graphically (for annual volume) and with an unpaired t-test (for academic status).
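The hospital-level comparison by academic status can be sketched as follows. The 0.2 resident-to-bed threshold comes from the text; everything else is an assumption for illustration, including the function names, the toy proportions, and the choice of the pooled-variance (Student) form of the unpaired t statistic, since the text does not say which variant was used. The sketch stops at the t statistic and does not compute a P value.

```python
from math import sqrt
from statistics import mean, variance

ACADEMIC_RATIO = 0.2  # resident-to-bed ratio threshold stated in the text


def is_academic(residents, beds):
    """Classify a hospital as academic by its resident-to-bed ratio."""
    return residents / beds >= ACADEMIC_RATIO


def pooled_t_statistic(a, b):
    """Unpaired two-sample t statistic with pooled variance (assumed variant)."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled_var * (1 / na + 1 / nb))


# Hypothetical per-hospital proportions of ventilated patients meeting the PMV definition.
academic = [0.16, 0.15, 0.17]
non_academic = [0.10, 0.11, 0.09]
t_stat = pooled_t_statistic(academic, non_academic)
```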

2.5 Sensitivity analysis accounting for inter-hospital transfers

Any algorithm based on length of stay might underestimate the incidence of PMV if patients are transferred to other acute care hospitals or LTACs prior to the cutoff day. Because neither the HUP derivation dataset nor the Premier validation dataset has data on patients after inter-hospital transfer, we could not formally evaluate the effect of transfers on algorithm sensitivity and specificity. Instead, we performed a sensitivity analysis using the Pennsylvania state validation dataset, evaluating the maximum possible impact of transfers on PMV incidence. To evaluate how early transfer to another short-stay hospital might affect PMV incidence, we identified potential “missed cases” in which a patient might have met the PMV definition had they not been transferred. Missed cases were defined as cases in which a patient was transferred from one acute care hospital to another, mechanical ventilation occurred on both sides of the transfer, and the total ICU length of stay combining the two hospitalizations was greater than or equal to the cutoff. Many people meeting this definition would not have PMV, making this a conservative estimate of missed cases. For this analysis we identified ICU transfers by directly observing transfers in the claims as previously described (Iwashyna et al. 2009).
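The missed-case rule for acute care transfers can be written as a small predicate. This is a sketch of the rule as stated, with hypothetical argument names; how the two hospitalizations are linked in the claims is outside its scope.

```python
def potential_missed_transfer_case(ventilated_before, ventilated_after,
                                   icu_los_before, icu_los_after,
                                   threshold_days=14):
    """Flag a transfer pair as a potential missed PMV case.

    ventilated_before / ventilated_after: ventilation coded at the sending and
    receiving hospital. icu_los_before / icu_los_after: ICU days at each hospital.
    """
    combined_icu_los = icu_los_before + icu_los_after
    return (ventilated_before and ventilated_after
            and combined_icu_los >= threshold_days)
```

A patient ventilated at both hospitals with 6 and 9 ICU days would be flagged under the 14 day cutoff (15 combined days) but not under the 21 day cutoff.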

To evaluate how early LTAC transfers might affect PMV prevalence we defined potential missed cases as patients requiring mechanical ventilation transferred to an LTAC before the cutoff date and where ICU length of stay equaled the hospital length of stay. This approach would identify patients most likely to receive mechanical ventilation on transfer since they were directly transferred to an LTAC from the ICU and is also conservative. For this analysis we defined transfer to an LTAC using the discharge location field in the records.
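The LTAC rule can be sketched in the same way. Argument names are hypothetical, and reading "before the cutoff date" as a hospital length of stay shorter than the threshold is an assumption.

```python
def potential_missed_ltac_case(ventilated, discharged_to_ltac,
                               icu_los_days, hospital_los_days,
                               threshold_days=14):
    """Flag an early ICU-to-LTAC discharge as a potential missed PMV case."""
    left_before_cutoff = hospital_los_days < threshold_days
    went_straight_from_icu = icu_los_days == hospital_los_days
    return (ventilated and discharged_to_ltac
            and left_before_cutoff and went_straight_from_icu)
```

For instance, a ventilated patient discharged to an LTAC after 10 days spent entirely in the ICU would be flagged, whereas one who spent only 8 of those 10 days in the ICU would not.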

3 Results

3.1 Derivation

A total of 1,748 patients underwent invasive mechanical ventilation at HUP during the study period. Of those, 1,500 were adult non-trauma patients (Table 1). One hundred and forty-four patients (9.6%) received mechanical ventilation for ≥14 days and 82 patients (5.5%) received mechanical ventilation ≥21 days. Compared to all patients receiving mechanical ventilation, those receiving PMV were older, had much higher in-hospital costs, and were much more likely to be discharged to a rehabilitation hospital, skilled nursing facility or LTAC than to home.

Table 1 Characteristics of patients receiving invasive mechanical ventilation in the HUP derivation data

Table 2 shows the performance characteristics of the administrative definitions in the derivation dataset. As expected, DRG 541/542 alone and ICD-9-CM procedure code 96.72 alone had poor sensitivity. The best definition was the combination of DRG 541/542 or ICD-9-CM 96.72, combined with an ICU length of stay ≥14 or 21 days. A schematic of this definition giving hypothetical clinical examples of negative and positive results is shown in Fig. 1. The negative predictive values of these definitions were excellent [≥14 days: 99.1% (95% CI: 98.3–99.5%); ≥21 days: 99.9% (95% CI: 99.5–100%)], indicating that we were identifying nearly all PMV patients. The positive predictive values were lower but still acceptable for a relatively rare disease [≥14 days: 62.1% (95% CI: 55.3–68.7%); ≥21 days: 61.1% (95% CI: 52.2–69.5%)]. An identical definition except using hospital length of stay had similar sensitivity but lower specificity, since duration of mechanical ventilation is more tightly correlated with ICU than hospital length of stay.

Table 2 Performance characteristics of administrative definitions of prolonged mechanical ventilation in the HUP derivation data. N refers to the number of patients meeting gold standard definition or the number of patients identified by each algorithm
Fig. 1 Schematic of administrative definition of prolonged mechanical ventilation (14 days). Shaded areas represent time on ventilator. Non-shaded areas represent time in ICU not on a ventilator. Hypothetical patient scenarios: Patient A is a true PMV patient captured by the administrative definition. Patient B was ventilated for 7 days and then died; they would not be captured by the administrative definition and are a true negative. This case could also be viewed as indeterminate under an alternate gold standard, since had they survived they might have gone on to PMV. Patient C was ventilated for 7 days and then was transferred to another acute care facility; this patient would not be captured by the algorithm and may be either a true negative or false negative, depending on their length of ventilation at the new facility. Patient D was ventilated for 4 days and remained in the ICU for 9 days following ventilation; this patient would not be captured by the algorithm and is a true negative. Patient E was ventilated for 7 days and remained in the ICU for 7 days following ventilation; this patient would be captured by the algorithm and is a false positive

Characteristics of patients meeting the best administrative definition in the derivation dataset are shown in Table 3, both for all patients and divided into true positives and false positives. Compared to patients meeting the clinical definition (shown in Table 1), patients meeting the administrative definition had somewhat lower in-hospital costs and post-discharge skilled care facility utilization. Otherwise they were very similar, with nearly identical age and gender distributions. Importantly, among the false positives costs were high and the proportion of patients discharged home was low. Few of the false positives had a very short duration of mechanical ventilation: the 25th percentile was 9 days in the 14 day group and 13 days in the 21 day group.

Table 3 Characteristics of patients meeting administrative definition of PMV in the HUP derivation data

3.2 External criterion validation

In the first validation step we evaluated 20,370 adult patients who met our definition of community-acquired pneumonia in the Premier validation data. Of those, 4,762 (23.4%) required mechanical ventilation for ≥14 days. Our final algorithm was able to identify 14 day PMV patients with very good sensitivity (87.6%, 95% CI: 86.7–88.6%) and specificity (88.5%, 95% CI: 88.0–89.0%). The negative predictive value remained excellent (95.9%, 95% CI: 95.6–96.2%) and the positive predictive value was improved (69.9%, 95% CI: 68.7–71.0%) due to the higher incidence of PMV in this population.

3.3 External construct validation

In the second validation step we evaluated 62,383 adult non-trauma patients receiving mechanical ventilation in the Pennsylvania state validation data (Table 4). Of those, 8,878 patients (14.2%) met the administrative definition for PMV at 14 days and 4,300 (6.9%) met the administrative definition for PMV at 21 days. Demographics, clinical characteristics, costs and hospital outcomes were similar to those of patients meeting the clinical definition in the derivation dataset. These patients also had extremely high utilization of skilled-care facilities after hospital discharge: over 30% were discharged to a rehabilitation hospital or skilled-nursing facility and over 15% were discharged to an LTAC hospital. As hypothesized, hospitals with a large annual volume of mechanically ventilated patients tended to have a higher proportion of patients meeting the definition of PMV (Fig. 2). Academic hospitals also tended to have a higher proportion of patients requiring PMV (14 days: 15.6% vs. 10.4%, P < 0.001; 21 days: 8.7% vs. 4.4%, P < 0.001).

Table 4 Characteristics of ventilated patients in the Pennsylvania state validation data
Fig. 2 Relationship between each hospital’s number of eligible mechanically ventilated patients and the proportion of patients receiving prolonged mechanical ventilation (Pennsylvania state validation data)

3.4 Sensitivity analysis accounting for inter-hospital transfers

Potentially missed cases due to episodes of mechanical ventilation interrupted by an inter-hospital transfer were generally few. In the Pennsylvania state validation data, accounting for potential missed cases under a worst-case scenario (i.e., the greatest number of cases potentially missed due to inter-hospital ICU transfer) would add 567 cases for the 14 day definition beyond the 8,878 detected (absolute prevalence increase: 0.9%) and 393 cases under the 21 day definition beyond the 4,300 detected (absolute prevalence increase: 0.6%). The same was true for early discharges to LTACs. Accounting for potentially missed cases under a worst-case scenario (i.e., the greatest number of cases potentially missed due to early LTAC transfer) would add 428 cases for the 14 day definition (absolute prevalence increase: 0.7%) and 684 cases for the 21 day definition (absolute prevalence increase: 1.0%). Of note, in the HUP derivation dataset no patients receiving mechanical ventilation were transferred before day 14.
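The absolute prevalence increases for inter-hospital transfers appear to be the added cases expressed as a percentage of the 62,383 eligible ventilated patients. The short check below reproduces the two inter-hospital transfer figures under that assumed denominator; the function name is illustrative.

```python
ELIGIBLE_PATIENTS = 62_383  # adult non-trauma ventilated patients, Pennsylvania data


def absolute_prevalence_increase(added_cases, denominator=ELIGIBLE_PATIENTS):
    """Added cases as a percentage of all eligible ventilated patients."""
    return 100 * added_cases / denominator


increase_14_day = round(absolute_prevalence_increase(567), 1)  # inter-hospital, 14 day
increase_21_day = round(absolute_prevalence_increase(393), 1)  # inter-hospital, 21 day
```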

4 Discussion

An algorithm for identifying a population of patients requiring PMV in administrative and claims data was both sensitive and specific, reliably identifying patients with extremely high hospital costs and post-discharge care needs. The algorithm works by identifying patients either classified into DRG 541/542 (or their equivalents) or receiving ICD-9-CM procedure code 96.72, then limiting this group to those with an ICU length of stay greater than or equal to either 14 or 21 days, depending on the preferred clinical definition. This definition had very good performance characteristics both in the HUP derivation data and the independent Premier validation data. The algorithm performed as expected in the Pennsylvania state validation data, identifying an appropriate number of patients with typical demographic and clinical characteristics for this population.

The final algorithm is easily applied to most administrative datasets, which uniformly contain a range of ICD-9-CM procedure codes and the patient’s DRG. The algorithm requires ICU length of stay, which is not consistently available in all administrative data. We considered an algorithm that uses hospital length of stay; however, this algorithm had a high false positive rate, resulting in insufficient specificity. Notably, ICU length of stay is available in a large number of state discharge databases and in the Medicare Provider Analysis and Review (MedPAR) file. These databases are routinely used to address critical care health services research questions in the United States (Angus et al. 2001; Birkmeyer et al. 1999; Martin et al. 2003). A valid and reliable definition of PMV applicable to MedPAR opens the door for an extensive body of work evaluating clinical outcomes, risk factors, and resource utilization of this high-risk patient population (Wunsch et al. 2005).

Several factors necessitate ongoing research into patients requiring PMV. First, critical care use is extremely costly. Current critical care expenditures in the United States now exceed $13 billion or roughly 0.6% of the entire gross domestic product (Halpern et al. 2004). Targeting a patient population with the highest resource utilization may contribute significantly to curtailing future costs. Second, the aging of the population means that demand for critical care is expected to rise (Angus et al. 2000). This trend should result in an increase in the incidence of PMV. Third, in recent years there have been dramatic changes to the way we provide care for patients requiring PMV. Increasingly care is shifted from ICUs in short-stay hospitals to LTAC hospitals which operate as dedicated facilities oriented to weaning patients requiring PMV from the mechanical ventilator (Scheinhorn et al. 2007b). LTACs are the most rapidly growing segment of hospital medicine in the United States, increasing at a rate of 12% per year and accounting for $3.1 billion in Medicare spending in 2004 (Medicare Payment Advisory Commission 2004). Our newly developed and validated algorithm will facilitate research into this evolving care model.

Although our algorithm is useful for identifying patients requiring PMV, caution should be used in interpreting the incidence of PMV as a marker for quality. A low incidence of PMV could mean high quality (because chronic critical illness is avoided through reduced complications and use of evidence-based practices) or low quality (because patients die before entering the chronic critical illness phase). PMV should not be used as a quality measure independent of mortality. This problem would occur with both the administrative definition and the gold standard, and is shared with many functional measures of quality which can only be measured in survivors.

Our study provides insight into a method of validating administrative definitions of clinical diseases and syndromes. At their most basic, algorithm validation studies can divide a single-center patient sample into development and validation sets. Although this method frequently suggests excellent performance, the observed accuracy of the algorithm may be overly optimistic due to insufficient independence between the derivation and validation data sets. Idiosyncratic features of the study site are preserved across the derivation and validation steps, limiting generalizability. External validation in a distinctly different clinical setting, as we have done, is likely to result in more realistic estimates of algorithm performance and improved generalizability. For example, in this case both sensitivity and specificity of the algorithm were reduced in the Premier validation data (sensitivity fell from 93.4% to 87.6%; specificity from 94.0% to 88.5%). Also of note, the positive predictive value was higher in the Premier validation data. Unlike sensitivity and specificity, which are entirely dependent on test characteristics, positive and negative predictive value are highly dependent on the incidence of disease. As the incidence of PMV rose from the HUP derivation data to the Premier validation data, the positive predictive value of the test improved.
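The dependence of predictive values on incidence can be made concrete with Bayes' rule. The sketch below takes the sensitivity, specificity and PMV incidence reported for the Premier validation data and recovers predictive values close to the reported 69.9% and 95.9%; the function name is illustrative, and small differences from the published figures are expected because the inputs are rounded.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) implied by sensitivity, specificity and prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    return true_pos / (true_pos + false_pos), true_neg / (true_neg + false_neg)


# Premier validation data: 4,762 of 20,370 patients ventilated for >= 14 days.
premier_prevalence = 4762 / 20370
ppv, npv = predictive_values(0.876, 0.885, premier_prevalence)

# Same test characteristics at a lower, hypothetical incidence of 10%.
ppv_low, _ = predictive_values(0.876, 0.885, 0.10)
```

Holding sensitivity and specificity fixed, lowering the incidence to the hypothetical 10% reduces the positive predictive value, which is the pattern described above.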

We also tested our algorithm in real world conditions, using a large multi-payor state discharge database (the Pennsylvania state validation data). This step, infrequently performed in validation studies, allows us to examine the characteristics of patients identified using the algorithm and whether they are similar to those expected. As we hypothesized, the incidence of PMV was higher in academic hospitals and high volume hospitals. In this way, a formal evaluation of whether specific hypotheses about the incidence of our disease hold true in the administrative dataset can lend additional support to the algorithm. This external validation step further provides a baseline for other users, allowing them to confirm the adequacy of their implementation.

Our algorithms have several limitations. The requirement for ICU length of stay may limit generalizability. ICU length of stay can depend on factors other than severity of illness, including bed availability and local resource utilization practices. In hospitals in which patients remain in the ICU for long periods following the discontinuation of mechanical ventilation the algorithm will lose specificity. Additionally, the algorithm is dependent on exactly how ICU days are tabulated in administrative data. In the current MedPAR file days in true critical care units are not easily distinguished from days in intermediate care units (Halpern et al. 2007). Our algorithms will remain sensitive in hospitals that provide mechanical ventilation in intermediate care units, such as step-down and weaning units (Hoffman et al. 2006). The algorithms may lose specificity in hospitals which provide mechanical ventilation only in ICUs, since patients in intermediate care units will have an ICU length of stay significantly exceeding the duration of mechanical ventilation. As with any test for a rare disease, the positive predictive values are less than optimal. Nonetheless, even the false positives represented a patient population with high resource utilization and poor outcomes, reflecting true chronic critical illness. Overall the definition identified a patient group with similar clinical characteristics to other cohorts of patients requiring PMV (Combes et al. 2003; Douglas et al. 2002; Engoren et al. 2004; Gracey et al. 1992). In the absence of a specific ICD procedure code for prolonged mechanical ventilation our algorithms remain a useful surrogate, yet they should be applied with caution in areas where extremely high positive predictive value is important.

Additionally, the algorithms will not identify mechanically ventilated patients who are transferred to another acute-care hospital or LTAC prior to day 14. These patients are classified as negative and may be either true negatives or false negatives, depending on the duration of mechanical ventilation at the receiving hospital (see Fig. 1). To the degree that these patients are false negatives, our algorithm will underestimate the incidence of PMV at hospitals that tend to transfer mechanically ventilated patients to other hospitals. We expect these patients to be rare, especially in light of Centers for Medicare and Medicaid Services regulations that limit hospital reimbursements for PMV patients transferred to other acute care hospitals early in their course. Indeed, in the derivation dataset no ventilated patients were transferred to LTACs prior to day 14. As shown in our sensitivity analyses, the maximum potential impact of these patients on our algorithms is relatively small, although our inability to evaluate transfers across states limited how fully we could assess the impact of transfers. Future research using national datasets such as MedPAR should address this potential problem.

We also did not evaluate the performance of these algorithms in non-short-stay acute care hospitals. Our algorithms should not be extended to other settings such as LTACs or skilled nursing facilities without further evaluation. Finally, as in all administrative definitions of clinical conditions, our algorithms may be sensitive to variation in coding practices. They will only identify PMV patients who received DRG 541/542 (or their newly developed equivalents 003/004) or ICD-9-CM procedure code 96.72 (mechanical ventilation ≥96 consecutive hours). Both of these codes have been utilized with success in past research and there is no reason to suspect that variation in coding will affect this definition more than other administrative definitions of disease.

As the hospital information technology infrastructure evolves it will be possible to better identify populations of patients requiring PMV and other critical illness syndromes in administrative data (Wunsch et al. 2005). As in the Premier validation data, daily resource utilization codes for mechanical ventilation are a potential method of identifying PMV with accuracy. Unfortunately these variables are not currently available in most administrative data sources. In the absence of fully integrated administrative and clinical data or a specific ICD procedure code for PMV, the algorithm we have developed for identifying patients requiring PMV is both reliable and valid. Future research efforts can use this definition to investigate the costs and outcomes of patients requiring PMV in large populations of hospitalized patients.