The multiple organ dysfunction syndrome (MODS) is the defining syndrome of critical illness. It is the raison d’être for the intensive care unit—to provide support of otherwise lethal organ system dysfunction in anticipation of reversibility and ultimate survival. It is also the expression of a complex network of altered biologic processes that, in aggregate, create the phenotype of critical illness. It is, therefore, highly desirable that we have the necessary metrics to measure organ dysfunction quantitatively, and to describe its evolution over time. The tools we currently have, however, are dated and imperfect, and in need of refinement.

MODS can be defined as acute, potentially reversible physiologic organ dysfunction that arises in the wake of a significant homeostatic insult. Its characterization as a syndrome speaks to the prevalent hypothesis that common underlying biologic factors contribute to its pathogenesis. The vagaries of its formulation point to the reality that current understanding of these, if indeed they exist, is limited.

Why measure MODS?

To understand a clinical phenomenon, one must be able to measure it. A quantitative descriptor permits the study of associations between the phenomenon and concurrent or subsequent events, and so enables conclusions about prognosis, pathogenesis, and treatment.

Prognosis

It is a universal finding of studies of MODS that prognosis is directly related to severity, whether measured as the number of failing systems [1] or the aggregate degree of dysfunction across multiple systems [2,3,4]. For the individual patient, significant organ dysfunction implies a need for greater intensity of supportive care, and so for greater staffing intensity. Reference to individual systems provides information regarding what supports are needed and, even more, alerts the clinician to what further support the patient may need in the future. That being said, this information is generally readily available from routine clinical data, and collecting it through the use of a dedicated organ dysfunction measurement tool is unnecessary.

Quantification of the severity of organ dysfunction has greater value in describing patient populations, for example in characterizing the nature and degree of physiologic derangement at baseline in a clinical trial to prognosticate outcomes or to confirm the effectiveness of randomization. Some interventions may be more effective in patients with greater [5] or lesser [6] degrees of organ dysfunction, and so the baseline severity of MODS may be used for both predictive and prognostic enrichment in clinical trials. More commonly, organ dysfunction within a single system (the lung [7], kidney [8], or coagulation systems [9], for example) is used to define an at-risk population for study of an intervention that supports the function of that system.

Pathogenesis

Phenotypic characterization is a prerequisite to understanding pathogenesis. Our current understanding of the pathogenesis of MODS is limited, and descriptors of the syndrome focus on the resulting physiologic and functional derangements, rather than on the discrete pathologic processes that produce them. Moreover, the physiologic or therapeutic criteria used to characterize organ dysfunction can result from a heterogeneous group of causes. For example, the acute pulmonary dysfunction of the acute respiratory distress syndrome (ARDS) may be caused by inflammatory changes in the lung that can plausibly be considered to be elements of MODS. However, hypoxemia can be induced acutely by factors such as volume overload, atelectasis, pneumothorax, or pulmonary embolism: insults that, while morbid and potentially life-threatening, are not necessarily reflective of an underlying systemic disorder.

The simultaneous dysfunction of two or more systems may provide greater insight into underlying processes. Early descriptions of MODS, for example, emphasized that the development of remote organ dysfunction was often evidence of undiagnosed and untreated infection [1], and prompted some to recommend that unexplained organ dysfunction was a valid indication for laparotomy [10]. It has not been possible to define discrete patterns of organ dysfunction that might point to a common underlying pathologic process, although this may reflect shortcomings of descriptive systems rather than the absence of more informative patterns of organ dysfunction.

Regardless of the criteria used to define dysfunction, the de novo development of acute organ dysfunction is often the harbinger of an occult infectious complication such as an anastomotic leak or an evolving pneumonia.

Clinical outcome

Finally, the construct of evolving or resolving organ dysfunction becomes a useful metric to track the complex and often contradictory clinical course of critical illness. While pulmonary or cardiovascular derangements improve, renal and hematologic dysfunction, for example, may be worsening. At the level of the individual patient, such an insight into illness trajectory can suggest emerging management priorities and point to the need to tailor treatment to minimize individual risk; at a population level within a clinical trial, this information may provide evidence of differential therapeutic efficacy, and point to discrete subpopulations of patients most likely to benefit from specific therapies. It has been a recurring finding that corticosteroid therapy can accelerate the resolution of cardiovascular dysfunction [11, 12], suggesting that its utility is better understood as a means of hemodynamic support, rather than as broad-spectrum anti-inflammatory therapy. Secondary analysis of data from a trial of a monoclonal antibody to tumor necrosis factor (TNF) showed that the biggest impact of treatment was on hastening the resolution of neurologic dysfunction [13].

Each of these potential uses points to performance features that can optimize the selection and calibration of specific measures of organ dysfunction—both within a given system, and aggregated within an organ dysfunction scale or score [14].

How should MODS be measured?

The tools most commonly used to quantify organ system dysfunction are strikingly similar. They also reflect thinking from a quarter of a century ago, and are better seen as a starting point in an exercise that can benefit from updating. Implicit in their structure is a series of assumptions that merit re-evaluation, because they may affect the conclusions drawn; these assumptions represent key questions for the next generation of measurement tools (Table 1).

Table 1 Issues for consideration in the future development of organ dysfunction scales

Which organ systems?

Contemporary organ dysfunction scales evaluate organ dysfunction in six organ systems—respiratory, renal, cardiovascular, central nervous system, hepatic, and hematologic [14]. The selection of these six systems as emblematic of MODS reflects a number of factors—the prevalence of specific dysfunction, the ease of measurement, the availability of supportive measures, and the perceived risk to survival posed by dysfunction. Gastrointestinal dysfunction, for example, is not included because of the challenges of measuring it. Systems that might be more causally linked to the disorder such as the immune system or endothelium are not included for the same reason.

Which descriptors?

We have previously proposed criteria for an optimal descriptor of individual organ system dysfunction (Table 2; [15]). Some of these are self-evident—that the measure be specific for the function of the system, readily and reliably measured, and reproducible across broad populations of patients. Others are worthy of debate. For example, selection of a measure that is continuous, and abnormal in one direction only (i.e., only pathologic if either too high or too low) enables the variable to be used as a continuous variable, so that mean values over time can be plotted. Examples of such variables include creatinine for renal function, PO2/FIO2 ratio for pulmonary function, and Glasgow Coma Score for neurologic function.

Table 2 Characteristics of an ideal descriptor of organ dysfunction. From [24]

Similarly, the suggestion that the measure be minimally altered by therapy reduces the likelihood that an intervention that causes harm is conflated with intrinsic patient status. For example, a measure of respiratory dysfunction that is based on whether the patient is receiving invasive mechanical ventilation confuses a patient state (respiratory insufficiency) with a clinical response (intubation and mechanical ventilation). As hypoxemic patients are increasingly managed with non-invasive strategies such as high flow oxygen, the use of a treatment-dependent variable such as intubation and mechanical ventilation to describe organ dysfunction risks biasing measures of organ dysfunction by underestimating the degree of intrinsic derangement in patients in whom less intensive supportive measures have been used. The use of vasopressor dose as a measure of cardiovascular dysfunction poses a similar risk. Recent work has suggested that vasopressor therapy to support blood pressure may actually have deleterious consequences [16]; if the intervention causes harm, is its measurement a reflection of patient risk, or of a poor clinical decision? It is for this reason that the Multiple Organ Dysfunction Score [2] chose to describe cardiovascular dysfunction in MODS using a measure that, by analogy with the PO2/FIO2 ratio, integrates treatment and deranged physiology: the pressure-adjusted heart rate (PAR), calculated as the product of the heart rate and central venous pressure, divided by the mean arterial pressure:

$$\mathrm{PAR}=\frac{\mathrm{HR}\times \mathrm{CVP}}{\mathrm{MAP}}$$

Like the PO2/FIO2 ratio, the PAR adjusts physiology for treatment, and so measures alterations in hemodynamic status that are not responsive to fluid therapy. As fluids are administered, the CVP increases: if the heart rate and MAP do not respond accordingly, the value of the PAR rises. A normal value is <10, whereas deranged values are 30 or greater (Infobox 1). While measurement of the PAR assumes knowledge of the CVP, if this is not available, an assumed normal value of 8 can be used.
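The calculation can be sketched in a few lines of Python. This is an illustrative sketch only, not part of any published scoring tool: the function names are hypothetical, and the estimate of mean arterial pressure as the diastolic pressure plus one third of the pulse pressure is a common bedside approximation that is assumed here (it reproduces the MAP values used in Infobox 1). The fallback CVP of 8 is the assumed normal value mentioned above.

```python
from typing import Optional

# Assumed normal CVP, used only when no measured value is available.
ASSUMED_NORMAL_CVP = 8.0


def mean_arterial_pressure(systolic: float, diastolic: float) -> float:
    """Approximate MAP as diastolic pressure plus one third of the pulse pressure.

    This approximation is an assumption of the sketch, not part of the PAR definition.
    """
    return diastolic + (systolic - diastolic) / 3


def pressure_adjusted_heart_rate(
    heart_rate: float, mean_arterial: float, cvp: Optional[float] = None
) -> float:
    """Return PAR = (heart rate x CVP) / MAP."""
    if mean_arterial <= 0:
        raise ValueError("mean arterial pressure must be positive")
    if cvp is None:
        cvp = ASSUMED_NORMAL_CVP
    return heart_rate * cvp / mean_arterial


# The three states described in Infobox 1: admission, fluid-responsive, and
# persistent shock on pressors.
for hr, bp, cvp in [(120, (80, 50), 2), (90, (100, 70), 8), (100, (95, 65), 15)]:
    par = pressure_adjusted_heart_rate(hr, mean_arterial_pressure(*bp), cvp)
    print(f"HR {hr}, BP {bp[0]}/{bp[1]}, CVP {cvp}: PAR = {par:.1f}")
```

Keeping the MAP estimate in a separate function means a directly measured MAP can be passed in instead when one is available.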

When should abnormality be measured?

If organ dysfunction is conceptualized as stable postresuscitation deranged physiology, then it follows that it should be measured not by the most extreme values of the component variables, but by representative values. This is an important difference between the Multiple Organ Dysfunction (MOD) score and the Sequential Organ Failure Assessment (SOFA) score (Table 3). Selection of the worst value also introduces a measurement bias, in that the more frequently a variable is measured, the more likely it is that an extreme abnormal value is obtained, and so the higher the score will be. On the other hand, data may not be available at a given time point, and greater degrees of clinically relevant dysfunction may be missed if all data are recorded at a common time point.
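The measurement bias that comes with selecting the worst value can be illustrated with a small simulation. The sketch below is purely hypothetical: the readings are random numbers drawn from an arbitrary distribution, not patient data, and higher values are simply assumed to be more abnormal. It takes one simulated day of hourly readings and reports the worst value that would be recorded if the variable were sampled once, four times, or twenty-four times over that day.

```python
import random


def worst_value_by_frequency(readings, counts):
    """Return the worst (highest) value seen when sampling n evenly spaced readings.

    Each n in counts must divide len(readings) evenly.
    """
    worst = {}
    for n in counts:
        if n <= 0 or len(readings) % n != 0:
            raise ValueError("each count must divide the number of readings")
        step = len(readings) // n
        worst[n] = max(readings[::step])
    return worst


# Hypothetical hourly readings for one day; the mean and spread are arbitrary.
rng = random.Random(0)
readings = [rng.gauss(100, 15) for _ in range(24)]

worst = worst_value_by_frequency(readings, counts=(1, 4, 24))
for n, value in worst.items():
    print(f"{n:>2} measurements per day: worst value {value:.1f}")
```

Because each sparser sampling schedule is a subset of the denser one, the recorded worst value can only stay the same or rise as the number of measurements increases, even though the underlying physiology is identical.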

Table 3 Comparison of Sequential Organ Failure Assessment (SOFA) and Multiple Organ Dysfunction (MOD) scores

How should variables be weighted?

Existing organ dysfunction scores assume that the impact of dysfunction within individual systems is similar, and so apply a similar score across the spectrum of derangement in each system. The validity of this assumption has not been tested. It is possible, for example, that the mortality impact of respiratory dysfunction is much greater than that of hepatic dysfunction. Similarly, the costs associated with the management of renal dysfunction may be significantly greater than those associated with the management of hematologic dysfunction. These considerations may impact the conclusions drawn using a score; should they impact its calibration?

How should variables be calibrated?

Finally, a core question arises: if organ dysfunction reflects potentially reversible morbidity, against what criterion should it be calibrated? Mortality is an attractive calibrator, though one could equally argue for a more remote measure that reflects the quality of long-term survival [17]. And if we calibrate measures of acute organ dysfunction against survival, when do we measure that survival—at a landmark time point such as 60 or 180 days, or at a time point such as hospital discharge that reflects progress through the health care system?

Organ dysfunction scores

A number of similar models for the measurement of organ dysfunction are available. The most widely used is the Sequential Organ Failure Assessment (SOFA) score, developed through a European expert consensus process in 1994 (Table 3; [18]). The Multiple Organ Dysfunction (MOD) Score was developed in 1992 through a process that involved explicit criteria for the selection of systems and descriptor variables, and their calibration on the basis of their individual association with ICU mortality [2]. Variables in the MOD score are all single measures of altered physiology without reference to clinical support (Table 3). Other scales such as the Logistic Organ Dysfunction (LOD) Score [19], the Brussels Score [20], and the Denver Score [21] use similar approaches to measuring organ dysfunction, differing to minor degrees in the choice and calibration of included variables.

Organ dysfunction scales enable the aggregate degree of organ dysfunction to be quantified, both at a point in time and over time. Typically they are treated as ordinal scales for which each range is assigned a value from 0 (normal function) to 4 (markedly abnormal function). However, if the variables describing dysfunction in an individual system are single continuous variables, mean values of each can be followed, a particular advantage in detecting effects on the biology of specific systems.

These scales can be used in a variety of ways (Table 4).

Table 4 Modeling organ dysfunction as a measure of acute illness

Prognosis and severity stratification: baseline scores

Although organ dysfunction scores were developed to measure organ dysfunction as an outcome, their calculation at an early time point—at ICU admission or at the time of randomization into a clinical trial—can provide reliable prognostic information, reflecting risk at baseline. A clinical trial, for example, might designate a minimal degree of organ dysfunction as an entry criterion [5], or compare baseline organ dysfunction scores between study arms as a measure of illness severity and the effectiveness of randomization.

Baseline scores provide a measure of the severity of organ dysfunction prior to the initiation of treatment, and so provide a starting point to evaluate either new onset (and therefore potentially modifiable) organ dysfunction, or the rapidity of resolution of existing derangements. Organ dysfunction scores serve as severity measures because illness severity, regardless of how measured, is a risk factor for adverse outcome (and organ dysfunction scores are calibrated against mortality). However, an organ dysfunction score should by its very nature provide less prognostic information than a dedicated severity score, for it measures only the risk associated with organ dysfunction, and not that associated with an acute lethal event such as a pulmonary embolism or myocardial infarction.

Temporal progression of organ dysfunction: daily scores

Just as organ dysfunction scores can be calculated at baseline, they can also be calculated on a daily basis as a means of tracking the evolution or resolution of organ dysfunction. The score on a given day is simply the sum of the scores in each of the systems within the score. Calculation of daily scores enables an assessment of both the trajectory of change—towards resolution or exacerbation—and the rate of change, reflected in the slope, or change over time of the score [22].
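As a minimal sketch of these two calculations, the Python below sums the six component scores for a given day and estimates the slope of the daily totals. The system names, the function names, and the use of an ordinary least-squares fit for the slope are assumptions made for illustration; other ways of expressing the rate of change are equally possible. The component scores are taken as already assigned on the 0 to 4 scale, and all values shown are hypothetical.

```python
# The six systems evaluated by contemporary scales (key names are illustrative).
SYSTEMS = (
    "respiratory",
    "renal",
    "cardiovascular",
    "neurologic",
    "hepatic",
    "hematologic",
)


def daily_score(system_scores):
    """Sum the component scores (each 0 to 4) of the six systems for one day."""
    total = 0
    for system in SYSTEMS:
        value = system_scores[system]
        if not 0 <= value <= 4:
            raise ValueError(f"{system} score must lie between 0 and 4")
        total += value
    return total


def score_slope(daily_totals):
    """Least-squares slope of the daily total score, in points per day."""
    n = len(daily_totals)
    if n < 2:
        raise ValueError("at least two daily scores are needed")
    mean_day = (n - 1) / 2
    mean_score = sum(daily_totals) / n
    numerator = sum(
        (day - mean_day) * (score - mean_score)
        for day, score in enumerate(daily_totals)
    )
    denominator = sum((day - mean_day) ** 2 for day in range(n))
    return numerator / denominator


# Hypothetical component scores on one day, and hypothetical totals over three days.
day_one = {"respiratory": 3, "renal": 1, "cardiovascular": 4,
           "neurologic": 1, "hepatic": 0, "hematologic": 1}
print(daily_score(day_one))
print(score_slope([10, 8, 6]))
```

A negative slope indicates resolving organ dysfunction and a positive slope indicates worsening.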

Organ dysfunction over time: aggregate scores

Studies of the epidemiology of organ dysfunction have typically been agnostic to the timing of that dysfunction. Yet the time course of the evolution of organ dysfunction varies by system, with respiratory and cardiovascular dysfunction typically occurring early in the course of the disease, and renal and hepatic dysfunction, if only because of the time required for their measures to show abnormality, occurring later. An aggregate organ dysfunction score combines the worst scores over a defined period of time for each organ system—for example, the respiratory score on day 2 and the hepatic score on day 5—and so provides a measure of the full burden of organ dysfunction over the time interval of interest. This may be the ICU stay, the first 7 days, or whatever time window is appropriate for use.

An aggregate score measures the severity or burden of organ dysfunction over time, and so is primarily of use as an outcome measure.
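The aggregate calculation can be sketched as follows. The function name, the data layout (one dictionary of component scores per day), and the example values are hypothetical; a system that was not scored on a given day is simply skipped for that day, which is an assumption of this sketch rather than a rule from any published scale.

```python
def aggregate_score(daily_scores):
    """Sum, across systems, the worst component score recorded on any day.

    daily_scores is a list with one dict per day, mapping system name to a
    component score from 0 to 4.
    """
    if not daily_scores:
        raise ValueError("at least one day of scores is needed")
    systems = set().union(*daily_scores)  # every system scored on any day
    return sum(
        max(day[system] for day in daily_scores if system in day)
        for system in systems
    )


# Hypothetical course: respiratory dysfunction peaks early, hepatic dysfunction later.
course = [
    {"respiratory": 1, "hepatic": 0},
    {"respiratory": 3, "hepatic": 1},
    {"respiratory": 1, "hepatic": 3},
]
print(aggregate_score(course))
```

In this example no single day has a total above 4, yet the aggregate score is 6, because the worst respiratory and hepatic scores occur on different days.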

New onset organ dysfunction: the delta score

Measurement of an aggregate organ dysfunction score enables the calculation of a delta score, reflecting new onset (and therefore potentially modifiable) organ dysfunction occurring over time:

$$\Delta \text{Score}=\text{Aggregate score}-\text{Baseline score}$$

Delta scores, which reflect worsening organ dysfunction over time, provide a measure of potentially preventable new organ dysfunction, and so are of primary value for clinical trials or quality improvement initiatives.
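A minimal sketch of the delta score calculation is shown below. The function name and data layout are hypothetical, and the sketch assumes that the baseline day is included in the window over which the worst component scores are taken, so that the delta score cannot be negative.

```python
def delta_score(baseline, later_days):
    """Return the aggregate score minus the baseline score.

    baseline is a dict mapping system name to a component score (0 to 4);
    later_days is a list of such dicts, one per subsequent day.
    Assumption: the baseline day counts toward the aggregate score.
    """
    days = [baseline, *later_days]
    systems = set().union(*days)
    aggregate = sum(
        max(day.get(system, 0) for day in days) for system in systems
    )
    return aggregate - sum(baseline.values())


# Hypothetical patient: respiratory dysfunction present at baseline improves,
# while renal dysfunction develops after enrollment.
baseline = {"respiratory": 2, "renal": 0}
later_days = [{"respiratory": 1, "renal": 3}]
print(delta_score(baseline, later_days))
```

Here the respiratory dysfunction already present at baseline contributes nothing to the delta score; only the new renal dysfunction is counted.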

Integration of mortality and morbidity: the mortality-adjusted score

A problem associated with the use of organ dysfunction scores as outcome measures in ICU-based clinical trials is censoring by death: patients who die early do not survive long enough to develop greater degrees of organ dysfunction. This limitation can be overcome through the calculation of mortality-adjusted scores, for which patients who die over the interval of study are assigned a maximal score +1, reflecting the clinical assumption that survival despite maximal degrees of organ dysfunction is preferable to death.
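The adjustment can be sketched in Python as follows. The names are hypothetical, and the maximal score of 24 assumes a scale of six systems each scored from 0 to 4; a scale with a different structure would need a different constant.

```python
# Assumption: six organ systems, each scored 0 to 4, give a maximal score of 24.
MAX_SCORE = 6 * 4


def mortality_adjusted_score(score, died):
    """Return the maximal score plus 1 for patients who died, else the observed score."""
    if not 0 <= score <= MAX_SCORE:
        raise ValueError("score lies outside the range of the scale")
    return MAX_SCORE + 1 if died else score


# Hypothetical patients: (observed score, died during the interval of study).
patients = [(6, True), (24, False), (10, False)]
print([mortality_adjusted_score(score, died) for score, died in patients])
```

A patient who dies early with a low observed score is therefore ranked above a survivor with the highest possible score, consistent with the assumption that survival despite maximal organ dysfunction is preferable to death.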

Conclusions

Organ dysfunction has become both the raison d’être for the ICU, and the measure of its successes and failures. It is also the determining element in contemporary definitions of disorders relevant to the ICU, such as sepsis [23]. Measuring organ dysfunction, therefore, is critical to understanding the pathophysiology of acute illness, its evolution over time, and its response to conventional and experimental interventions. Tools to accomplish this aim have been developed, but are rudimentary and in need of updating as knowledge of biology advances and patterns of ICU-related morbidity change. As the focus of ICU care gradually shifts from salvaging lives to enhancing the long-term quality of life of survivors of critical illness [17], the formulation of reliable tools to measure the progression of acute illness assumes even greater importance.

Infobox 1 The pressure-adjusted heart rate: an example

A middle-aged man is admitted in septic shock as a consequence of perforated diverticulitis and acute peritonitis. His heart rate is 120, his blood pressure 80/50, and his CVP 2.

His PAR is calculated as follows:

$$\mathrm{PAR}=\frac{\text{Heart rate}\times \mathrm{CVP}}{\mathrm{MAP}}$$
$$=\frac{120\times 2}{60}$$
$$=4$$

Assume that he responds to volume replacement, with his heart rate dropping to 90, his CVP rising to 8, and his blood pressure increasing to 100/70.

His PAR now becomes:

$$\mathrm{PAR}=\frac{90\times 8}{80}=9$$

Both values are within the normal range of less than 10.

On the other hand, if his shock state does not fully resolve following fluids, and his heart rate remains elevated at 100 despite a CVP of 15, and a blood pressure of 95/65 on pressors, his PAR is now:

$$\mathrm{PAR}=\frac{100\times 15}{75}=20$$