Original article
Reliability of detection of lumbar lateral shift

https://doi.org/10.1016/S0161-4754(03)00104-0

Abstract

Background and purpose

The poor reliability of lateral shift detection has been attributed to lack of rater training, biologic variation, and test reactivity. This study aimed to remove potential confounding from biologic variation and test reactivity and to control for the level of rater experience and training in judgments of lateral shift.

Subjects

One hundred forty-eight raters, representing 3 levels of clinical physical therapy experience and training in the McKenzie method, participated.

Method

The raters viewed photographic slides of 45 patients with low back pain and judged each slide on a numerical scale for the presence and direction of a shift. Intrarater reliability was evaluated using the intraclass correlation coefficient (ICC), and interrater reliability was evaluated using both the ICC and the κ statistic.
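The "ICC [2,1]" notation in the Results is, in the usual Shrout and Fleiss convention, a two-way random-effects model for a single rater with absolute agreement. The sketch below computes that coefficient from a slides × raters matrix of ratings; it is an illustration, not the authors' analysis code. It assumes complete data (every rater scores every slide). The function names `icc_2_1` and `fleiss_band` are hypothetical, as are the example ratings and their sign convention; the bands are the Fleiss cut-offs (below 0.40 poor, 0.40 to 0.75 fair to good, above 0.75 excellent) that the article's Results refer to.

```python
import numpy as np


def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, single rater, absolute agreement.

    `ratings` is an (n_targets, k_raters) array with no missing values,
    e.g. one row per slide and one column per rater.
    """
    x = np.asarray(ratings, dtype=float)
    n, k = x.shape
    grand = x.mean()

    # Sums of squares from a two-way ANOVA without replication.
    ss_rows = k * ((x.mean(axis=1) - grand) ** 2).sum()  # between targets
    ss_cols = n * ((x.mean(axis=0) - grand) ** 2).sum()  # between raters
    ss_total = ((x - grand) ** 2).sum()
    ss_err = ss_total - ss_rows - ss_cols                # residual

    bms = ss_rows / (n - 1)             # between-targets mean square
    jms = ss_cols / (k - 1)             # between-raters (judges) mean square
    ems = ss_err / ((n - 1) * (k - 1))  # residual mean square

    return (bms - ems) / (bms + (k - 1) * ems + k * (jms - ems) / n)


def fleiss_band(icc):
    """Map an ICC value to the Fleiss descriptive bands."""
    if icc < 0.40:
        return "poor"
    if icc <= 0.75:
        return "fair to good"
    return "excellent"


# Hypothetical shift ratings on an assumed scale (negative = shift to one
# side, positive = shift to the other): 5 slides (rows) x 3 raters (columns).
ratings = [
    [-2, -1, -2],
    [0, 0, 1],
    [3, 2, 3],
    [1, -1, 0],
    [-3, -3, -2],
]
icc = icc_2_1(ratings)
print(f"ICC(2,1) = {icc:.2f} ({fleiss_band(icc)})")
```

Because the model treats raters as a random sample and measures absolute agreement, a rater who is consistently offset from the others (for example, always scoring one point further toward one side) lowers ICC(2,1) even if the rank order of slides is identical.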

Results

Reliability of shift judgments was only moderate for all groups (eg, ICC [2,1] values ranged from 0.48 to 0.64).

Conclusion

Lateral shift judgments have only moderate reliability, even when trained raters judge stable stimuli. We propose that this photographic model be used to explore the sources of error in this process.

Introduction

A recent survey of physical therapists in the United States1 reported that a McKenzie evaluation was one of the most common evaluations performed for patients with low back pain (LBP) and that almost half the therapists viewed the McKenzie method as the most useful management approach for LBP. Similar results have been reported for British and Irish physiotherapists.2, 3

The method has been supported as an effective LBP treatment by a systematic review of activity prescription for back pain4 and by Danish clinical practice guidelines,5 both based on the 2 existing clinical trials.6, 7 After both reviews were completed, Cherkin et al8 published a clinical trial comparing 3 approaches (chiropractic manipulation, McKenzie therapy, and an educational booklet) and found that chiropractic manipulation and McKenzie therapy had similar effects and costs. However, both treatments produced only marginally better outcomes than the educational booklet.8 Against this background, further information is needed about the use of the method's basic criteria.

The principal aim of the McKenzie assessment is to determine which patients are suitable for treatment with this approach. Suitable patients must fit one of 3 syndromes: postural, dysfunction, or derangement.9 The derangement syndrome is further divided into 7 subsyndromes on the basis of pain location, the behavior of the pain in response to repeated spinal movements, and the presence or absence of deformities, including a lateral shift. Because classification determines the specific treatment the clinician uses, accurate classification is considered essential for the effective management of patients with LBP.

In the McKenzie method, the presence of a lateral shift is determined by visual inspection when the patient's posture is evaluated. If a lateral shift is judged to be present, lateral glide movements are performed to assess whether they alter the patient's symptoms. If they do, the shift is classified as “relevant” and directs the initial treatment approach.9 The initial step of detecting a shift is therefore of paramount importance, because the relevance of a shift is assessed only after the shift has been identified.

A lateral shift is defined as a lateral displacement of the trunk in relation to the pelvis.9 The prevalence of a lateral shift has proved hard to establish, probably because of problems in measuring this attribute. Porter and Miller10 suggest that it is uncommon, citing a prevalence of 5.6%; however, later studies report approximate prevalences of 20%11 and 80%.12

The reliability with which therapists determine the presence of a lateral shift has been evaluated in 6 studies to date. In the study by Kilby et al,13 2 physiotherapists with some training in the McKenzie method simultaneously evaluated 41 patients. They agreed on the presence or absence of a lateral shift in only 55% of cases, a result similar to that of Nelson et al,14 who reported high interobserver error in detecting lumbar tilt (lateral shift). However, neither study reported κ values, and insufficient data were provided to calculate this statistic.
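Percent agreement cannot simply be converted to κ, because κ corrects for the agreement expected by chance, and that correction depends on how often each rater calls a shift present (the marginal totals of the rater-by-rater table). The sketch below computes Cohen's κ from such a table. The two 2 × 2 tables are hypothetical, not data from Kilby et al or Nelson et al; they are constructed so that both show the same 55% agreement yet yield different κ values, which is why the full table, not just the agreement percentage, is needed.

```python
def cohens_kappa(table):
    """Cohen's kappa for two raters from a square contingency table.

    table[i][j] = number of patients rater A placed in category i
    and rater B placed in category j.
    """
    k = len(table)
    total = sum(sum(row) for row in table)
    p_observed = sum(table[i][i] for i in range(k)) / total
    row_totals = [sum(table[i]) for i in range(k)]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]
    # Chance agreement: product of the two raters' marginal proportions,
    # summed over categories.
    p_chance = sum(row_totals[i] * col_totals[i] for i in range(k)) / total**2
    return (p_observed - p_chance) / (1 - p_chance)


# Hypothetical tables (rows: rater A shift yes/no; columns: rater B shift yes/no).
# Both have 55 of 100 cases on the diagonal, i.e. 55% agreement.
balanced = [[30, 20], [25, 25]]
skewed = [[50, 40], [5, 5]]

for name, table in [("balanced", balanced), ("skewed", skewed)]:
    print(name, round(cohens_kappa(table), 3))
# balanced 0.1
# skewed 0.022
```

In the skewed table, rater A calls a shift present in 90% of cases, so chance alone would produce 54% agreement and the observed 55% is barely better than chance.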

Riddle and Rothstein11 examined the intertester reliability of assessments of LBP patients made by physical therapists using the McKenzie method. They also aimed to determine whether training in the McKenzie method influenced reliability. Forty-nine physical therapists from 8 clinics examined 363 patients. Sixteen of the therapists had attended at least 1 postgraduate course in the McKenzie method. The paired assessments were completed consecutively, with a time interval between examinations. They found a high error rate in the determination of the presence of a lateral shift (60% agreement, κ = 0.26) and concluded that this was a possible source of error in the determination of the syndrome classifications.

Donahue et al12 attempted to improve the reliability of determining the presence and direction of a lateral shift by using a simple measuring device, but the reported κ value indicated very poor reliability. McLean et al15 investigated 3 different techniques for measuring trunk list and concluded that a plumb line provided the most reliable measures; however, no summary reliability statistic was reported to allow comparison with other studies.

Improved reliability in determining the presence of a lateral shift (78% agreement, κ = 0.52) was demonstrated by Razmjou et al16 for therapists observing the same patient assessment. The 2 physical therapists involved in this study were both trained extensively in the McKenzie method and assessed the patients simultaneously in an attempt to reduce the error related to repeated examinations. They visually determined whether a lateral shift deformity was present for each patient.
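Because κ is defined from the observed agreement $p_o$ and the chance-expected agreement $p_e$, a reported agreement percentage together with κ fixes $p_e$. Rearranging the definition and substituting the rounded values reported by Riddle and Rothstein and by Razmjou et al (so the results are approximate) gives:

```latex
\begin{aligned}
\kappa &= \frac{p_o - p_e}{1 - p_e} \;\Longrightarrow\; p_e = \frac{p_o - \kappa}{1 - \kappa}\\[4pt]
\text{Riddle and Rothstein:}\quad p_e &= \frac{0.60 - 0.26}{1 - 0.26} = \frac{0.34}{0.74} \approx 0.46\\[4pt]
\text{Razmjou et al:}\quad p_e &= \frac{0.78 - 0.52}{1 - 0.52} = \frac{0.26}{0.48} \approx 0.54
\end{aligned}
```

In both studies, chance alone would account for roughly half of the cases agreeing, which is why the κ values are considerably lower than the raw agreement percentages.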

Based on the research to date, it remains unclear whether a lateral shift can be detected with acceptable reliability. The measuring devices tested so far do not seem to improve reliability, and reliability estimates range from poor to moderate. It is therefore worthwhile to explore the sources of disagreement.

Two hypotheses have been offered for the poor reliability observed:

  1. The attribute is inherently unstable and changes with repeated examination.16

  2. The attribute is subtle, and clinical experience and training are necessary to measure a lateral shift reliably.17

One way to explore the first hypothesis is to use a model of clinical practice that allows for greater control than would be possible in the clinic, for example, the use of photographs as the stimuli to be rated rather than real patients. This method avoids the potentially confounding effect of the biologic variation of the shift, allows for an unlimited number of repetitions of the same stimuli, and also allows for a much larger panel of raters than is practical in a traditional clinical reliability study.

To explore the effect of clinical experience and training, we selected a cross section of raters, including first-year undergraduate students, graduate physiotherapists with no formal training in the McKenzie method, and graduate physiotherapists with a minimum of 70 hours of training in the McKenzie method.

The aims of the study were to investigate:

  • the intrarater and interrater reliability of judgments of lateral shift made by inspecting photographs of patients with low back pain.

  • whether interrater reliability and discriminability were influenced by level of education in the McKenzie method.


Project overview

The design of the experiment required raters to inspect a set of photographic slides of patients with low back pain and to judge whether a shift was present. The photographs had been taken by the first author on the same day that she performed a full clinical examination of these patients. On the same visit, demographic and clinical data were recorded for each patient.

Patients with low back pain

Patients attending a private physiotherapy clinic for low back pain were invited to participate in the study.

Results

Intrarater reliabilities, expressed as ICC values with 95% CIs, are shown in Table 2. The ICC values ranged from 0.48 to 0.59, which falls within the range Fleiss19 describes as representing fair to good reliability. The interrater reliabilities, expressed as ICC values, ranged from 0.49 to 0.64, again in the range representing fair to good reliability. For both intrarater and interrater reliability, inspection of the 95% CIs reveals that the McKenzie group had

Discussion

Despite using a simplified model of clinical practice that removed any potential for reactivity and biologic variation, the reliability of shift detection remained unacceptably low. Although the McKenzie-trained raters were more reliable in judging a shift than the other 2 groups of raters, the absolute difference between groups was small and reached statistical significance because of the study's high power. Our study had unusually high power because we used a model that allowed for

Conclusion

Although the task of judging the presence or absence of a lateral shift was simplified by removing biologic variation and test reactivity, the reliability of the raters in this study was unacceptable. We recommend that this model, using photographs of LBP patients, be used to further study the features of the lateral shift that influence a rater's decision about its presence and direction. Once these features have been established, it may be possible to develop a protocol to improve the

Acknowledgements

This study was approved by the Human Research Ethics Committee of the University of Sydney.

References (23)

  • C. Maher et al.

    Prescription of activity for low-back pain: what works?

    Aust J Physiother

    (1999)
  • J. Kilby et al.

    The reliability of back pain assessment by physiotherapists, using a “McKenzie algorithm”

    Physiotherapy

    (1990)
  • A. Chiradejnant et al.

    Objective manual assessment of lumbar PA stiffness is now possible

    J Manipulative Physiol Ther

    (2003)
  • M.C. Battie et al.

    Managing low-back pain: attitudes and treatment preferences of physical therapists

    Phys Ther

    (1994)
  • N.E. Foster et al.

    Management of nonspecific low-back pain by physiotherapists in Britain and Ireland

    Spine

    (1999)
  • D.A. Hurley et al.

    Biopsychosocial screening questionnaire for patients with low-back pain: preliminary report of utility in physiotherapy practice in Northern Ireland

    Clin J Pain

    (2000)
  • Low-back pain. Frequency, management and prevention from an HTA perspective

    Danish Health Technol Assess

    (1999)
  • R. Stankovic et al.

    Conservative treatment of acute low-back pain. A 5-year follow-up study of two methods of treatment

    Spine

    (1995)
  • G. Nwuga et al.

    Relative therapeutic efficacy of the Williams and McKenzie protocols in back pain management

    Physiother Pract

    (1985)
  • D. Cherkin et al.

    A comparison of physical therapy, chiropractic manipulation, and provision of an educational booklet for the treatment of patients with low-back pain

    N Engl J Med

    (1998)
  • McKenzie RA. The lumbar spine: mechanical diagnosis and therapy. Waikanae, New Zealand: Spinal Publication Limited;...