Introduction

Prior to the discovery of X-rays by Wilhelm Röntgen in 1895, the diagnosis of spinal injury was based primarily on clinical observation. Because of the striking clinical presentation of spinal injury patients with concomitant neurological deficits, spinal injuries were frequently categorized as injuries with or without spinal cord injury (SCI) [19]. Böhler [14] published the first categorization of spinal injuries in 1929, based on plain radiographic examinations of spinal injury patients treated during World War I. Since then, there has been an enormous growth in the number of injury classification systems in the spinal trauma literature, ranging from general spinal injury classifications [47, 71] to specific classifications of lateral mass injuries of the subaxial cervical spine [59, 63].

Out of the numerous spinal injury classification systems, only a few have been evaluated for reliability or validity [22]. Recently, the Spine Trauma Study Group (STSG) introduced multidimensional injury classification systems for the subaxial cervical and thoracolumbar spine. These classification systems have been evaluated for content validity and subjected to reliability testing by the STSG expert committee [74, 99, 101, 102]. However, the generalizability and practicability of these classifications outside of the expert committee have not been reported to date.

The introduction, evaluation, validation, and clinical and/or scientific implementation of a new classification system is a challenging and time-consuming process [7]. To date, however, no review evaluating the methodological fundamentals of a spinal injury classification system has been published. The aim of this study is threefold: first, to review the methodological principles of spinal injury classifications; second, to critically appraise the contents of current spinal injury classifications; and third, to discuss future directions of spinal injury classifications.

The principles of spinal injury classification systems

A spinal injury classification system should be clinically relevant, reliable, and accurate. A classification can only be labeled as ‘valid’ once it has been proven to fulfill these three crucial criteria. The properties of these three criteria will be reviewed in detail in the following paragraphs.

Clinical relevance of spinal injury classification systems

Clinical relevance

Spinal injury classification systems are used as a prognostic tool to determine natural history outcomes, guide treatment decision-making, and predict the risk of complications [35, 68]. In clinical research, spinal injury classifications are also used to compare different treatments for identical injury types and similar treatments for different injury types [104, 105]. Classification categories that perfectly guide treatment decision-making have excellent construct validity. Construct validity refers to how well a measurement conforms to theoretical constructs. For instance, if a single morphological injury characteristic is theoretically believed to discriminate between two treatment options, a measure of this characteristic that has construct validity would demonstrate this influence on the choice of treatment [51, 108].

Content and face validity

A spinal injury classification is considered clinically relevant if it comprises the most relevant items. The content of the classification should be valid. Content validity examines how well the classification represents all aspects of the phenomena under study. Content validity of a classification system is often established through subjective judgments, i.e., face validity, about whether the relevance and applicability of a diagnostic item seem reasonable [51]. To illustrate, the development of both the STSG subaxial cervical and the thoracolumbar spine injury classification systems was preceded by extensive review of the literature and consensus achieved at expert meetings [99, 101].

The contents of published spinal injury classifications vary considerably. Both incomprehensive classifications based solely on the location of a fracture line, like the Anderson and D’Alonzo [4] odontoid process fracture classification, and comprehensive classifications based on neurologic function grade, spinal canal deformity, and spinal biomechanical stability, like the Tsou et al. [97] thoracic and lumbar spine injury severity classification, have been introduced. In a review of thoracic and lumbar fracture classifications, Mirza et al. [73] summarized the expectations of an ideal spinal injury classification system. These expectations include descriptions of injury severity, pathogenesis, and causal biomechanical forces, in addition to clinical, neurological, and radiographic characteristics of the injury (see Table 1).

Table 1 The range of reported expectations for an ideal spinal injury classification system as proposed by Mirza et al. [73], reprinted with permission

Critical appraisal of the contents of spinal injury classifications

It is clear that creating an ‘ideal’ spinal injury classification which includes elements like the ones proposed by Mirza et al. [73] remains an unachievable objective. There are currently no guidelines which specify the minimally required contents of a spinal injury classification. However, in line with the underlying philosophy of the Müller AO Classification of Fractures in Long Bones [75], we believe that the characterization and categorization of spinal injuries should primarily be based on characteristics that can be reliably identified on diagnostic images. The following three key issues clearly illustrate that the content of a spinal injury classification system does not need to be comprehensive at all:

Defining spinal injury: clear semantics, clear focus

The spine (syn.: vertebral column) is defined as “the series of vertebrae that extend from the cranium to the coccyx, providing support and forming a flexible bony case for the spinal cord [93].” Although this definition appears straightforward, a clear distinction must be made: the spinal cord itself is not part of the spine. The spine is a multisegmental osseous structure that encloses the spinal cord. Given this definition, spinal trauma can result in two closely related yet clearly distinguishable injuries: spinal injury and spinal cord injury (SCI).

Therefore, the content of a spinal injury classification should only include injury characteristics of the vertebral column. Distinguishing spinal injury patients with SCI from those without SCI is based on a categorization of the clinical presentation rather than on a classification of the spinal injury itself [19]. In the same way that soft tissue injuries are classified separately from fractures of long bones [83], spinal cord injuries should be classified separately from spinal injuries [18]. Continuing with this parallel, just as the Gustilo-Anderson [41] classification is commonly used to categorize the severity of soft tissue injury, the American Spinal Injury Association (ASIA) International Standards for Neurological Classification of Spinal Cord Injury have become the standardized and routinely adopted classification for traumatic SCI [1, 94].

Diagnostic work-up of spinal injuries: principal determinant of classification accuracy

A dramatic shift in the diagnostic work-up of spinal injuries has occurred over the past several years. In contrast to a decade ago, computed tomography (CT) scans of the spine are now routinely performed during the diagnostic work-up in approximately 80% of level I trauma centers [40, 96], and not without reason. Increasing evidence supports the use of (multi-detector) CT of the spine rather than conventional radiography in the diagnostic work-up of patients at both high and low risk of spinal injuries [9, 17, 27, 34, 39, 45, 48, 70]. Moreover, with respect to clearance of the spine, magnetic resonance imaging (MRI) of the spine is currently not indicated in the absence of CT abnormalities and neurological deficits [23, 46, 50, 60, 86]. CT images have been shown to be more accurate in the visualization of potentially prognostic spinal injury characteristics than conventional radiographs [8, 11, 20, 24, 55, 69] and MRI [49, 57]. In addition, physicians may come to different treatment decisions after examining additional CT images [24, 54].

Given this brief summary of recent advances in the diagnostic work-up of spinal injuries, it is clear that the contents of a contemporary spinal injury classification should primarily be based on CT imaging, which is now performed on a routine basis.

Classifications, severity measures, and treatment algorithms

Following on from the first two issues, our proposed concept of a spinal injury classification is an incomprehensive one. It should be based on characteristics identified on diagnostic images of the vertebral column only. This concept, however, does not include all prognostic factors associated with spinal injury treatment outcomes. Therefore, other instruments that facilitate case management, communication, and education in the diagnostic-therapeutic pathway of spinal injury can be used in addition to the initial classification (see Fig. 1).

Fig. 1

Flowchart including three instruments in spinal injury management: classification systems, severity measures, and treatment algorithms. *Including spinal cord injury. ATLS advanced trauma life support

Once a spinal injury has been categorized, the true extent of the injury’s severity needs to be evaluated. During this process, other relevant injury classifications, like the ASIA International Standards for Neurological Classification of Spinal Cord Injury [1], can be integrated into a spinal injury severity measure to direct treatment and determine prognosis. Although controversy remains regarding its effectiveness, surgical spinal decompression is increasingly regarded as indicated in the presence of concomitant neurologic deficits [30]. Once the effectiveness of surgical spinal decompression becomes clearer, a spinal injury severity measure or scale can be adapted without altering the underlying spinal injury classification system [81]. In addition to the presence of concomitant neurologic deficits, vascular injuries or even more general injury severity measures can also be considered for integration into a spinal injury severity measure [10, 29].

Even more so than a classification system, a severity measure should guide treatment decision-making. However, spinal injury treatment decisions are not based entirely on injury severity measures. Pre-existing comorbidities have already been shown to be significant prognostic factors for mortality in the general blunt trauma population [12, 72, 103]. Together with the spinal injury severity measure, these systemic aspects should be integrated into a spinal injury treatment algorithm. In addition, special attention should be given to the prognostic implications of the various treatment options, including the risk of complications. Compared with the other two instruments, the treatment algorithm is the one most likely to require adaptation as a result of the steady evolution of treatment options and the increasing evidence of their efficacy.

The three spinal injury management instruments are characterized by increasing grades of clinical relevance. The initial classification systems currently receive the most scientific attention in clinical research. Although potentially useful for clinical decision-making, the recently published STSG spinal injury severity scales are considered to be of limited value because they lack descriptive and communicative dimensions [21]. Even though the STSG subaxial cervical and thoracolumbar spine injury severity scales may have shown excellent evidence of construct validity, the successful scientific implementation of the underlying classification systems remains to be seen [99, 101].

Epidemiological properties of spinal injury classification systems

Reliability

Reliability, or precision, is the extent to which repeated measurements of the same case under similar conditions agree with one another [25]. In general, there are three potential sources of variation during the classification process: (1) the patient, (2) the diagnostic instrument, and (3) the physician [110]. As spinal injury classification systems are primarily based on diagnostic imaging, the potential variability of the immobilized, supine patient and of the diagnostic instrument is normally minimal. If diagnostic images appear to be of suboptimal quality, new images should be obtained, for the sake of patient safety, before being evaluated by a physician [80]. Physician variability is therefore the factor most likely to affect the reliability of the classification.

Two types of physician, or observer, variation are commonly distinguished in fracture classification: inter-rater reliability and intra-rater reliability. Inter-rater reliability assesses the reliability, or agreement, of the classification system when it is applied by different raters under similar conditions. Intra-rater reliability assesses the reliability, or reproducibility, of the classification system when it is applied more than once by the same rater. From a clinical perspective, the inter-rater reliability of a classification system is considered more important than the intra-rater reliability.
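The reliability figures cited later in this review are reported as κ-values. For readers unfamiliar with the statistic, κ corrects the observed proportion of agreement for the agreement expected by chance alone; a minimal formulation for two raters (Cohen's κ) is:

$$\kappa = \frac{p_o - p_e}{1 - p_e}$$

where $p_o$ is the observed proportion of agreement and $p_e$ is the proportion of agreement expected by chance. A value of $\kappa = 1$ indicates perfect agreement, whereas $\kappa = 0$ indicates agreement no better than chance; the κ-values reported for the classification systems discussed below (e.g., 0.33 and 0.883) should be read against this scale.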

Accuracy

Accuracy is the degree to which the classification system actually represents what it is intended to represent. The accuracy of a measurement is best assessed by comparing it, whenever possible, with a ‘reference standard’ technique that is considered to accurately represent the truth. Although no studies correlating CT-detected fracture and dislocation patterns with intraoperative findings have been published so far, CT is currently regarded as the de facto reference standard, with very high sensitivity rates reported [9, 17, 27, 34, 39, 45, 48, 70].
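As a point of reference for these sensitivity rates (the notation below is generic and not drawn from the cited studies), the diagnostic performance of an imaging modality judged against a reference standard is conventionally summarized as:

$$\text{sensitivity} = \frac{TP}{TP + FN}, \qquad \text{specificity} = \frac{TN}{TN + FP}$$

where TP, FN, TN, and FP denote true-positive, false-negative, true-negative, and false-positive findings relative to the reference standard; a highly sensitive modality misses few true injuries.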

Reliability, accuracy, and error

The principal difference between reliability and accuracy is that reliability concerns reproducibility and agreement, whereas accuracy concerns representativeness of reality. An unreliable classification system is unlikely to be accurate because of its inherent variability. Conversely, a classification system can be shown to be reliable and yet not be accurate. For instance, Bach et al. [8] reported higher inter-observer agreement in the detection of cervical spine fractures with plain radiographs than with CT images; nevertheless, CT was more sensitive than plain radiography in detecting cervical spine fractures [8].

Although a clear distinction between these types of error exists, many of the strategies to increase reliability will also improve accuracy [51].

Factors that influence the reliability of spinal injury classification systems

Observation and conversion processes

As previously mentioned, it is physician or observer variability that is most likely to affect a fracture classification’s reliability. Wright and Feinstein identified two physician-related components of the classification process: the first step is the observation process and the second step is the conversion process (see Fig. 1) [110]. During the observation process, the physician assesses the extent of the injury by discerning its characteristics on the available diagnostic images, ideally with the use of predefined process criteria.

The STSG has published three valuable review articles on diagnostic imaging measurement techniques for spinal injuries [15, 16, 56]. They concluded that most of the currently available measurement techniques have not been tested for reliability, accuracy, and validity. Nonetheless, the standardization of observational process criteria may considerably improve observer reliability [110].

Once the properties of the spinal injury have been determined on diagnostic images, the second phase starts: the conversion process. The criteria used to categorize observational data are called conversion criteria [110]. It is these conversion criteria that can make or break the reliability of a spinal injury classification system. The ideal properties of spinal injury classification conversion criteria, or categories, are shown in Table 2 and are summarized below.

Table 2 Ideal properties of spinal injury classification categories

Ideal properties of spinal injury classification categories

1. Clear definitions without ambiguity or freedom of interpretation. If a category description includes subjective terms like minimal, intermediate, or severe dislocation, observers will interpret the severity of a dislocation based on their individual experience [67]. An ideal classification system should result in minimal variability between experienced and inexperienced observers. In order to increase reproducibility and agreement, explanation and elaboration documents can be formulated [1].

2. All-inclusive and mutually exclusive. A classification system should ideally cover all injuries of clinically relevant structures. Which structures are clinically relevant is a matter of content validity, as previously outlined. Spinal injury patterns should fit into one category only. Once proven to be reliable and valid, quantifiable (measurement) criteria can be applied as an effective categorization tool [110].

3. Clearly distinguishable, representative graphic illustrations. Since fracture classifications are primarily based on diagnostic images, graphic illustrations have proven to be an effective means of simplification and clarification [75]. Chapman and colleagues [22] published a valuable reference work providing detailed illustrations of each category in the spinal injury classifications available to date.

4. Straightforward and practicable for daily use. A classification system should ideally not consist of numerous parameters, each requiring different, comprehensive, or cost-ineffective diagnostic interventions. In addition, each injury category should preferably be summarized in a single phrase.

5. Limited number of categories. In the search for the ideal classification system, there has always been a tension between the multitude of possible patterns of spinal injury, the reduction of information, and clinical relevance. The number of categories reported in thoracolumbar spine injury classifications varies from 6 [69] to 55 [66]. Blauth and colleagues showed that the reliability of the Magerl-AO classification system decreased as the number of subcategories increased [13, 66]. For a clear hierarchical understanding, subcategories should ideally comprise more detailed injury characteristics than main categories.

6. Characterized by increasing grades of severity. The clinical utility of a spinal injury classification improves when its categories are arranged in order of increasing severity. Higher grades may indicate the need for more demanding therapy, a poorer prognosis, or an increased risk of complications.

7. Each (sub)category alphanumerically coded. The application of an alphanumeric coding system is the ultimate method of condensing information on injury characteristics [75]. The strength of an alphanumeric coding system is that it allows physicians to visualize injury categories from only a few characters.

8. Injury characteristics easily discernible on diagnostic images. To discern an injury literally means “to distinguish between physiological and posttraumatic findings with the eyes”. The phase of discerning, or detecting, spinal injuries on diagnostic images corresponds to the observation process as described by Wright and Feinstein [110]. Two types of injury characteristics have commonly been used in spinal injury classifications: morphological and biomechanical spinal injury characteristics.

Critical appraisal of current spinal injury characteristic concepts

Morphologic injury characteristics: a study of structure or form

Because of the central role of diagnostic imaging in the diagnosis of spinal injury, descriptions of morphological characteristics have been reported most often in spinal injury classifications. Morphology literally means “a study of structure or form”. In contemporary literature, spinal structures are commonly subcategorized into osseous and disco-ligamentous structures [99, 101].

Böhler [14] was the first to categorize thoracolumbar spinal injuries morphologically, based on plain radiographic examinations. Spinal injuries were classified into two main categories: fractures of the vertebral body and fractures of the neural arch. These two morphological, anatomical categories were each further subcategorized as with or without paralysis.

In 1949, Nicoll [76] applied a more detailed morphological approach to categorize thoracolumbar injuries. Fractures were classified into four main types: (1) anterior wedge fractures, (2) lateral wedge fractures, (3) fracture-dislocations, and (4) isolated fractures of the neural arch. These morphological characteristics can be distinguished by discerning vertebral body contours, displacement, and/or fracture lines on radiographic (or other diagnostic) images without necessarily interpreting them. Interestingly, Nicoll’s four categories were further classified as stable or unstable on the basis of the risk of increased deformity and possible cord injury during functional activities. This means that the secondary (‘severity’) categorization was based on the initial classification of observational data. Unfortunately, in 1963, Sir Frank Holdsworth, who is recognized as one of the fathers of spinal trauma, continued Nicoll’s post-traumatic spinal stability concept without considering the underlying morphological principles of spinal injuries [47].

Two decades later, Aebi and Nazarian [2] reintroduced a comprehensive, morphologically and anatomically based classification of cervical spine injuries. They concluded that mechanism-of-injury-based classifications lack clinical relevance because of the limited relationship between the biomechanical causes of, and the treatment options for, spinal injuries. Nevertheless, probably because of its complexity, the Aebi and Nazarian classification did not gain worldwide acceptance [2, 13].

While the first morphological descriptions of spinal injuries focused mainly on the integrity and alignment of osseous structures, the evaluation of disco-ligamentous integrity gained increasing interest after the introduction of MRI technology. During the 1990s, the use of MRI in detecting ligamentous spinal injury received much scientific attention and showed promising results for future clinical implementation [28, 78, 79, 85, 95]. More recently, however, conflicting evidence concerning its reliability [37, 42, 87] and accuracy [62, 84, 92, 107] has been published. As suggested in previous reports [28, 62, 64, 78], the true additional value of MRI in the treatment decision-making of spinal injury patients without concomitant neurological deficits has not yet been proven [23]. Because of these current controversies, we do not recommend the use of disco-ligamentous characteristics in spinal injury classifications.

As diagnostic imaging technology continuously evolves and treatment options steadily increase, established and implemented spinal injury classifications may become outdated over time. Classification modifications will then be necessary, as will renewed testing for reliability and validity prior to clinical and scientific implementation [7].

Biomechanical spinal injury characteristics: speculative causal interpretations

In 1939, Watson-Jones [106] was the first to categorize thoracolumbar vertebral body and facet joint injuries in a combined biomechanical and morphological manner. Although the three main fracture type categories consisted of morphological descriptions ((1) simple wedge fracture, (2) comminuted fracture, and (3) fracture-dislocation), all three types were considered to be flexion-compression fractures of the vertebral body.

Mainly inspired by Watson-Jones’s and Holdsworth’s initial concepts, the spinal injury classifications introduced during the second half of the twentieth century can be characterized by their predominantly hypothetical biomechanical causal descriptions [3, 26, 31, 36, 43, 109]. Spinal injury descriptions that depend on the physician’s interpretation are still in use [53, 65, 100].

In 2005, the STSG introduced the thoracolumbar injury severity scale (TLISS) [102]. This scale is based on three major injury characteristics: (1) the mechanism of injury, (2) the integrity of the posterior ligamentous complex, and (3) the patient’s neurological status. Despite the excellent construct validity of the TLISS as a whole, the interobserver agreement for the injury mechanism was marginal, with κ-values of up to 0.33 reported [44, 82, 98]. These disappointing values were the main justification for introducing a modification of the TLISS, the Thoracolumbar Injury Classification and Severity Score (TLICS) [101]. In the TLICS, the mechanism of injury has been replaced by a description of the morphological injury characteristics seen on the injury’s radiographic images.

In a study evaluating the reliability and validity of both the TLISS and the TLICS, Whang et al. found considerably higher agreement for the TLISS injury mechanism than previously reported, and almost equally high agreement for the TLICS injury morphology category, with κ-values of 0.636 and 0.626, respectively. Interestingly, based on these data and the significantly stronger construct validity, the authors suggested that the mechanism of trauma may be a more valuable parameter than fracture morphology for the classification and treatment of thoracolumbar injuries [108].

We do not share this point of view. Although the STSG did modify the TLISS mechanism of injury category into the TLICS morphological characteristics category, the subcategories and textual descriptions are almost identical. It is, in fact, this slight (and incorrect) semantic change that can be considered the main reason for the minimal differences in agreement presented by Whang et al. [108].

To our knowledge, no mechanism-based spinal injury classification with clear, unambiguous definitions and mutually exclusive categories exists. Several cadaveric studies have confirmed the difficulty of retrospectively interpreting the causal biomechanical forces leading to spinal injury. Shono et al. [89] showed that identical vectors and magnitudes of forces applied to the skull resulted in different types of fractures and/or dislocations. Moreover, once the integrity of the spinal column is disrupted at the initial moment of injury, altered injury vectors during subsequent moments of the injury make the mechanical forces leading to contiguous or non-contiguous injuries difficult, if not impossible, to interpret [77]. Because of this demonstrably speculative nature, we do not recommend the use of biomechanical characteristics in spinal injury classifications [88].

Two crucial spinal injury characteristics: location and morphology

As shown in Table 3, two principal characteristics should ideally be detectable without difficulty on diagnostic images: the location of the spinal injury and the morphology of the injured spinal structure. The location of the spinal injury can be categorized at one of five levels of accuracy: (1) a non-specified location of the spinal injury, (2) injury of a spinal region, (3) injury of a spinal (articulating) level, (4) injury of an anatomical structure, and (5) injury of a region within an anatomical structure (see Table 3).

Table 3 Two injury characteristics easily discerned by diagnostic imaging: location and morphology

Morphological characteristics should also be easy to discern on diagnostic images. In essence, three major morphological characteristics can be identified: (1) the configuration of the fracture line, (2) the extent of tissue involvement (osseous or disco-ligamentous), and (3) the presence of displacement (see Table 3). These three morphological characteristics are crucial in determining the spinal injury’s severity and stability. As the estimation of posttraumatic spinal stability is based primarily on the consideration and interpretation of the discerned morphological characteristics, we think that a stability concept should be integrated not into the initial spinal injury classification, but rather into a spinal injury severity measure. Nonetheless, injuries should ideally be characterized by increasing grades of severity in the initial spinal injury classification system. Prior to the implementation of morphological characteristics in a classification system, the diagnostic imaging measurements necessary to quantify these characteristics should ideally have been tested for reliability, accuracy, and validity [15, 16, 56].

The classification of lower cervical spine injuries (CSISS) recently developed by Anderson addresses these two principal injury characteristics perfectly [5, 74]. Using only the spinal injury’s location and true morphological descriptions, excellent agreement (κ-value: 0.883) and reproducibility (κ-value: 0.977) can be obtained. Zehnder et al. [111] confirmed these findings in an external validation study. One should keep in mind, however, that excellent reproducibility of a classification system says nothing about its content and construct validity, nor about its clinical utility. For a classification to be clinically relevant and scientifically valid, a validation pathway should ideally be completed successfully.

Validation pathway of spinal injury classification systems

As the aim of this study was to review the methodological aspects faced during the development phase of a spinal injury classification system, the process of validation and clinical implementation is not described in detail here. In 2005, Audigé et al. [7] proposed a three-phase validation concept for general orthopedic fracture classifications (see Fig. 2). During the first phase, as described in detail in this study, classification categories are defined following extensive literature research and expert consensus meetings. To maximize the likelihood of future success, pilot agreement studies assessing both reliability and accuracy should also be performed during this phase. After the development of a fracture classification, a multicenter agreement study should be conducted among a representative group of future users of the classification. Finally, the prognostic value of the classification needs to be assessed in prospective clinical studies investigating patient outcomes of different treatments.

Fig. 2

Three-phase validation process for fracture classification systems as proposed by Audigé et al. [7], reprinted with permission

This methodological pathway has already been shown to result in successful implementation of fracture classification systems, in particular for the development and validation of the AO pediatric long-bone fracture classification system [90, 91]. Currently, the AOSpine Classification Group is developing new spinal injury classification systems using the same validation pathway.