Introduction

The management of patients with degenerative lumbar disorders (DLD) requires reliable measures of functional impairment. Today, patient-reported outcome measures (PROMs) are regarded as the gold standard for outcome assessment in spine surgery [1,2,3]. Apart from subjective PROMs, objective measures of function are receiving increasing attention in research and clinical practice, as they help to monitor and compare treatment results over time and across populations [4].

The 6-min walking test (6WT) is increasingly applied as an objective outcome measure in patients with lumbar DLD [5]. We recently developed a smartphone app-based version of the 6WT, which demonstrated excellent reliability [6, 7]. The 6WT can be self-performed by the patient in his/her home environment, and results are standardized with respect to age and sex [6, 8, 9]. By allowing patients to be monitored remotely, digital outcome measures are valuable tools for physicians and patients. In a recent study, three out of four patients favoured the smartphone-based 6WT over traditional paper-based PROMs for the assessment of spine-related symptoms [10]. This trend is likely to accelerate at a time when a global pandemic discourages avoidable “face-to-face” consultations, which might endanger elderly or particularly fragile patients [11, 12].

The 6WT’s primary outcome measure is the maximum distance a patient can walk within 6 min (6WD = 6-min walking distance, measured in metres) [6]. In addition, the 6WT-app allows patients to push a “flash button” and records both the time (TTFS = time to first symptoms, measured in seconds) and the distance (DTFS = distance to first symptoms, measured in metres) at which first symptoms of neurogenic claudication appear. While the 6WD expresses the result of walking restrictions that occur over the whole duration of the test, TTFS and DTFS may give more granular information about the severity or urgency of a patient’s symptoms. Studies have shown the 6WD to be a reliable, valid, and responsive measure of functional impairment [6]. The added value of TTFS and DTFS over the 6WD, however, is still unclear.

This study aims to analyse the psychometric properties of DTFS and TTFS as determined by the 6WT. We hypothesize that the pre- to postoperative change in both measures may help to differentiate between successful and unsuccessful treatment in patients with lumbar DLD, and we compare their responsiveness to that of the traditional 6WD outcome.

Methods

All adult patients with lumbar DLD scheduled for elective spine surgery between May 2019 and March 2020 with one of the following diagnoses were prospectively screened for study enrolment at the XX, XX, XX: (1) lumbar disc herniation (LDH), (2) lumbar spinal stenosis (LSS) or (3) DLD with or without instability requiring lumbar fusion. A detailed description of the app-based outcome measures and PROMs used in this study is provided as Online Resource 1.

Inclusion and exclusion criteria for the study cohort

Patients fulfilling all of the following inclusion criteria were considered for this study:

  • Male or female subject ≥ 18 years;

  • Written informed consent.

Patients were not enrolled if any of the following exclusion criteria were met:

  • Pregnancy;

  • Inability to walk (extreme pain or severe neurological deficits);

  • Severe chronic obstructive lung disease (COPD) corresponding to ≥ Gold III;

  • Severe heart failure corresponding to ≥ NYHA III;

  • Lung cancer and diffuse parenchymal lung disease;

  • Other medical reasons interfering with the patient’s ability to walk and perform the 6WT (e.g. osteoarthritis of the lower extremities, Parkinson’s disease, heart failure, hip or knee prosthesis, peripheral artery disease causing intermittent claudication, etc.);

  • Unavailability for follow-up and/or inability to complete assessment (planning to move, no smartphone, etc.).

The 6WT-app

The 6WT-app measures the maximum distance (in m) walked within six minutes (6WD) using global positioning system (GPS) coordinates, which is the primary test result [13]. Both distance walked and time elapsed are continuously displayed on the screen while the 6WT is conducted. Patients were instructed to press a "flash" button on the app’s user interface if leg or back pain appeared or significantly worsened for the first time during the test. This marks their time (= TTFS in s) and walking distance (= DTFS in m) to first symptoms (Fig. 1). Patients were instructed to continue walking until the six minutes had elapsed, whenever possible. Completed measurements are saved on the patient’s smartphone device with a date and time stamp and may be transferred to a secure online database.

Fig. 1 The 6-min walking test (6WT) smartphone application starts measuring the walking distance based on GPS coordinates once the “start” button is pressed. Patients are instructed to press a “flash” button on the app’s user interface in case of appearance and/or first-time significant aggravation in leg or back pain during the test. This records their time (= TTFS in s) and walking distance (= DTFS in m) to first symptoms. Patients are instructed to continue walking until the six minutes have elapsed, whenever possible. Completed measurements are saved on the patient’s smartphone device with a date and time stamp and may be transferred to a secure online database.
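The app's internal distance algorithm is not described in this section. Purely as an illustration of how a GPS-based 6WD, TTFS and DTFS could be derived, the following minimal sketch assumes that great-circle (haversine) distances between consecutive GPS fixes are summed. The function names, the fix format and the rule that the DTFS is read at the last fix at or before the button press are hypothetical and not taken from the 6WT-app.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two GPS fixes given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def summarise_walk(fixes, flash_time_s=None, duration_s=360):
    """Summarise one run.

    fixes: list of (t_seconds, lat, lon), sorted by time, at least one entry.
    flash_time_s: time of the "flash" button press, or None if never pressed.
    Returns a dict with the 6WD and, if the button was pressed, TTFS and DTFS.
    """
    total = 0.0
    dtfs = None
    prev = fixes[0]
    for cur in fixes[1:]:
        if cur[0] > duration_s:
            break  # ignore fixes recorded after the 6-min window
        # Assumption: DTFS is the distance accumulated up to the last fix
        # at or before the button press (no interpolation between fixes).
        if flash_time_s is not None and dtfs is None and cur[0] > flash_time_s:
            dtfs = total
        total += haversine_m(prev[1], prev[2], cur[1], cur[2])
        prev = cur
    if flash_time_s is not None and dtfs is None:
        dtfs = total  # button pressed at or after the last usable fix
    return {"6WD_m": total, "TTFS_s": flash_time_s, "DTFS_m": dtfs}
```

Real GPS traces are noisy, so a production implementation would likely need filtering or smoothing that this sketch omits.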

Data collection and PROMs

All patients underwent detailed subjective and objective (6WT) assessments before surgery and six weeks postoperatively (W6). The subjective assessment included the following PROMs:

  1. The visual analogue scale (VAS) for pain

  2. The Zurich Claudication Questionnaire (ZCQ), with its two main scores:

    a. ZCQ symptom severity (ZCQ SS)

    b. ZCQ physical function (ZCQ PF)

  3. The Core Outcome Measures Index (COMI) Back

Ethical considerations

The study was approved by the local ethics committee (XX, EKOS–2019–01,209) and was registered (http://clinicaltrials.gov identifier: XX). All patients provided written informed consent prior to initiation of the data collection.

Statistical considerations

Data are presented as mean ± standard deviation (SD) for continuous variables and as count (percentage) for categorical variables. 6WT results are the raw 6WD (in m) as well as the DTFS (in m) and the TTFS (in s). We present the percentage (%) of patients who experienced the appearance and/or significant aggravation of leg or back pain during their pre- and postoperative 6WT. If a patient did not indicate symptoms in the app, the DTFS was defined as the 6WD of the same run and the TTFS as the maximum walking time of 360 s (6 min).
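The substitution rule for runs without indicated symptoms can be sketched as follows. This is a minimal illustration under the assumption that a missing button press is stored as an empty value; the class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

TEST_DURATION_S = 360.0  # maximum walking time of the 6WT (6 min)


@dataclass
class WalkResult:
    six_wd_m: float                  # 6-min walking distance in metres
    dtfs_m: Optional[float] = None   # distance to first symptoms, None if not indicated
    ttfs_s: Optional[float] = None   # time to first symptoms, None if not indicated


def impute_first_symptoms(run: WalkResult) -> WalkResult:
    """If no symptoms were indicated, set DTFS to the 6WD and TTFS to 360 s."""
    if run.dtfs_m is None and run.ttfs_s is None:
        return WalkResult(run.six_wd_m, run.six_wd_m, TEST_DURATION_S)
    return run


def percent_with_symptoms(runs):
    """Percentage of runs in which the patient indicated symptoms (before imputation)."""
    flagged = sum(1 for r in runs if r.ttfs_s is not None)
    return 100.0 * flagged / len(runs)
```

The percentage has to be computed on the raw runs, because after substitution an asymptomatic run can no longer be told apart from one in which symptoms were indicated at exactly 360 s.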

The intraclass correlation coefficient (ICC) according to Shrout and Fleiss was used to determine the agreement of repeated pre- and postoperative 6WT measurements [14]. The ICC was deemed good (0.75–0.9) or excellent (> 0.9) in accordance with prior research [15]. The standard error of measurement (SEM) was calculated as the SD multiplied by the square root of 1 minus the intrarater ICC. The SEM represents a “grey zone” of uncertainty between two patient scores, as described by Stratford & Goldsmith [16].
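The SEM calculation described above can be written out as a short sketch. It assumes that the SD is the sample SD of the observed scores, which the text does not specify; the function names, the label "below good" and the example values are illustrative only and are not study data.

```python
import math
import statistics


def standard_error_of_measurement(scores, icc):
    """SEM = SD * sqrt(1 - ICC), using the sample SD of the scores (assumption)."""
    if not 0.0 <= icc <= 1.0:
        raise ValueError("ICC is expected to lie between 0 and 1")
    return statistics.stdev(scores) * math.sqrt(1.0 - icc)


def icc_label(icc):
    """Interpretation bands as stated in the text: good 0.75-0.9, excellent > 0.9."""
    if icc > 0.9:
        return "excellent"
    if icc >= 0.75:
        return "good"
    return "below good"  # hypothetical label; the text defines no lower band


# Illustrative walking distances in metres (not study data)
example_scores = [400, 450, 500, 550, 600]
example_sem = standard_error_of_measurement(example_scores, icc=0.75)
```

A difference between two scores that is smaller than the SEM falls within the measurement uncertainty and should be interpreted with caution.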

Paired-sample t-tests were used to evaluate the changes between pre- and postoperative outcomes. Pearson correlation coefficients (r) were used to quantify the relationship between pre- and postoperative 6WT results and subjective outcome measures (PROMs).
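For readers less familiar with these two statistics, a minimal sketch using only the Python standard library is given here. The study itself used R; the function names are hypothetical, and the p-value (which requires the t-distribution) is deliberately left out.

```python
import math
import statistics


def paired_t_statistic(pre, post):
    """Return (t, degrees of freedom) for paired pre/post measurements."""
    diffs = [b - a for a, b in zip(pre, post)]
    n = len(diffs)
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)            # sample SD of the paired differences
    return mean_d / (sd_d / math.sqrt(n)), n - 1


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)
```

A negative r between a change in walking distance and a change in a pain or disability score corresponds to the inverse correlation expected when longer distances accompany lower symptom scores.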

The internal responsiveness of the 6WT results was assessed using a standardized effect size, the standardized response mean (SRM), defined as the mean change in score from baseline to follow-up divided by the SD of the score change. In accordance with prior research, SRM values were deemed small (> 0.20), moderate (> 0.50), or large (> 0.80).
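The SRM definition translates directly into code. The following sketch assumes the sample SD of the change scores; the function names and the label "negligible" for values of 0.20 or below are hypothetical additions.

```python
import statistics


def standardized_response_mean(pre, post):
    """SRM = mean(post - pre) / SD(post - pre), from paired raw scores."""
    changes = [b - a for a, b in zip(pre, post)]
    return statistics.mean(changes) / statistics.stdev(changes)


def srm_from_summary(mean_change, sd_change):
    """SRM from an already reported mean change and SD of the change."""
    return mean_change / sd_change


def srm_label(srm):
    """Bands as stated in the text: small > 0.20, moderate > 0.50, large > 0.80."""
    magnitude = abs(srm)
    if magnitude > 0.80:
        return "large"
    if magnitude > 0.50:
        return "moderate"
    if magnitude > 0.20:
        return "small"
    return "negligible"  # hypothetical label; the text defines no band here
```

Applied to the mean changes and SDs given in the Results, for example `srm_from_summary(94, 109)` for the 6WD, the ratio is consistent with the SRM of 0.86 reported there.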

The external responsiveness of 6WT results was evaluated using receiver operating characteristics (ROC) curves. A reference standard indicating successful versus unsuccessful treatment was created by grouping results of the ZCQ patient satisfaction subscale (range of 1–4) into a binary variable of satisfied (combined scores ≤ 2, including 1 = “completely satisfied” to 2 = “somewhat satisfied”) versus dissatisfied (combined score > 2, including 3 = “somewhat dissatisfied” to 4 = “completely dissatisfied”). External responsiveness determined the probability that the pre- to postoperative change in 6WT result correctly classified patients who were satisfied or unsatisfied with the treatment result. An area under the curve (AUC) of 0.5 indicates no discrimination (no better than chance), whereas an AUC of 1.0 indicates perfect discrimination [17].
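The dichotomization of the reference standard and the meaning of the AUC can be illustrated with a short sketch. It computes the AUC as the probability that a randomly chosen satisfied patient shows a larger change score than a randomly chosen unsatisfied patient, with ties counted as one half, which is equivalent to the area under the empirical ROC curve. The function names are hypothetical, and the confidence intervals reported in the study are not reproduced by this sketch.

```python
def is_satisfied(zcq_satisfaction_score):
    """Binary reference standard: ZCQ satisfaction score <= 2 counts as satisfied."""
    return zcq_satisfaction_score <= 2


def roc_auc(change_scores, satisfied_flags):
    """AUC via pairwise comparison of satisfied vs. unsatisfied patients."""
    pos = [c for c, s in zip(change_scores, satisfied_flags) if s]
    neg = [c for c, s in zip(change_scores, satisfied_flags) if not s]
    if not pos or not neg:
        raise ValueError("both satisfied and unsatisfied patients are required")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0      # satisfied patient improved more
            elif p == n:
                wins += 0.5      # ties count as half
    return wins / (len(pos) * len(neg))
```

With this definition, a value of 0.5 means that the change score ranks satisfied and unsatisfied patients no better than chance, and 1.0 means that every satisfied patient improved more than every unsatisfied patient.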

Analyses were carried out using “R version 3.6.3” for Mac (R Core Team, 2020, RStudio: Integrated Development for R. RStudio, Inc., Boston, Massachusetts, http://www.rstudio.com/). P values < 0.05 were considered significant.

Results

Study cohort

A total of 50 consecutive patients undergoing surgery for DLD were enrolled in this study. One patient dropped out due to incomplete follow-up assessments. The final analysis therefore included 49 patients (41% female) with a mean age of 55.5 ± 15.8 years. Table 1 summarizes demographic and clinical variables of the study population.

Table 1 Patient baseline characteristics

Indication of symptoms

Table 2 displays the number of patients who experienced the appearance and/or significant aggravation of leg or back pain during their pre- and postoperative 6WT. Out of 35 patients who experienced symptoms preoperatively, 21 (60%) no longer indicated symptoms in the 6WT postoperatively.

Table 2 Indication of pre- and 6 weeks postoperative (W6) DTFS/TTFS during the 6WT

Test–retest reliability

For pre- and postoperative 6WD values, ICC was good (β = 0.82, 95% CI 0.75–0.87, p < 0.001), with a SEM of 58 m. ICC was similar for DTFS values (β = 0.83, 95% CI 0.77–0.88, p < 0.001, SEM = 85 m) and for TTFS values (β = 0.79, 95% CI 0.72–0.85, p < 0.001, SEM = 59 s).

Pre- and 6 weeks postoperative results

Table 3 contains the mean scores for each subjective and objective outcome measure preoperatively and at W6. There was a significant (p < 0.001) improvement in each subjective and objective outcome measure from baseline to W6. The 6WD improved by 94 m (SD 109 m), the DTFS by 205 m (SD 218 m) and the TTFS by 112 s (SD 134 s).

Table 3 Pre- and 6 weeks postoperative (W6) subjective and objective outcome measures

Convergent validity

Correlation coefficients between pre- or postoperative 6WT values and PROMs are outlined in Table 4. Changes in the 6WD and DTFS inversely correlated with changes in PROMs. Changes in the TTFS showed a generally weaker inverse correlation with PROMs. The 6WD showed a stronger correlation with the DTFS than with the TTFS.

Table 4 Convergent validity: Pearson correlation [95% CI] of pre- and postoperative 6WD, DTFS and TTFS measurements

Responsiveness

Internal responsiveness analysis showed the highest SRM for DTFS (0.94), followed by 6WD (0.86) and TTFS (0.84). Based on the ZCQ patient satisfaction subscale, 40 (82%) patients in our cohort were identified as responders to surgery at W6. Evaluation of external responsiveness revealed that the change in DTFS differentiated better between satisfied (82%) and unsatisfied patients (18%) than the change in 6WD, with an AUC of 0.75 (95% CI 0.53–0.98) vs. an AUC of 0.70 (95% CI 0.52–0.90; Fig. 2). Change in TTFS did not demonstrate a meaningful capability to differentiate between the two groups (AUC = 0.59, 95% CI 0.34–0.83).

Fig. 2 ROC curves for 6WD change (solid lines) and DTFS change (dotted lines) in the 6WT. The DTFS exceeded the capability of the 6WD to differentiate between satisfied (82%) and unsatisfied patients (18%), with an AUC of 0.75 (95% CI 0.53–0.98) vs. an AUC of 0.70 (95% CI 0.52–0.90).

In a subgroup analysis of the 14 patients who indicated symptoms during both their pre- and postoperative 6WT, the change in DTFS again showed a greater capability to differentiate between satisfied (10 patients) and unsatisfied patients (4 patients) than the change in 6WD, with an even greater AUC of 0.88 (95% CI 0.68–1.00) vs. an AUC of 0.58 (95% CI 0.14–1.00; Online Resource 2: Suppl. Figure 1).

Discussion

This study analysed whether two sub-scores of the smartphone-based 6WT, namely DTFS and TTFS, provide additional information to the main outcome score (6WD). The rationale for the study was that the 6WD as main score is somewhat insensitive to symptom onset and symptom severity. As long as patients can “tough it out”, they might be able to reach a high 6WD despite an early onset of back and leg pain, whereas both DTFS and TTFS account better for these aspects. Several interesting findings emerged. First, the study population demonstrated significant improvements not only in the overall 6WD, but also in DTFS and TTFS at W6. Secondly, both DTFS and TTFS showed strong convergent validity with each other and the 6WD, but only weak to moderate correlation with PROMs. Lastly, DTFS exceeded the capability of the main outcome score (6WD) to differentiate between satisfied and unsatisfied patients after surgery.

Restricted walking distance due to pain and/or neurological deficits is one of the most frequently reported and disabling symptoms related to lumbar DLD. While the 6WD is an already established and thoroughly validated outcome measure [5, 6], here we analyse two new measures that might account for different aspects of a patient’s functional impairment. How far a patient can walk until symptoms impede continuation is a question routinely posed during spinal consultation with the intention of quantifying impairment as precisely as possible. There is a solid body of evidence demonstrating that patients have difficulties estimating this distance and hence answering this question [18, 19]. Disabling symptoms may start to worsen considerably earlier during physical activity and may not necessarily result in discontinuation or even slower walking in every patient. While the 6WD uses the overall walking distance covered as an absolute measure of functional impairment, the DTFS and TTFS indicate the distance and time at which first symptoms occur, which can be of significant relevance in the daily life of a patient.

As shown in this study, not every patient experiences the appearance or significant aggravation of symptoms during the 6WT. This is especially true postoperatively after successful surgery. In fact, 60% of patients in our study cohort who experienced symptoms preoperatively no longer indicated symptoms postoperatively. In order to also analyse patients who did not or no longer experienced symptoms, we opted for a pragmatic approach and defined the DTFS as the 6WD of the same run and the TTFS as 360 s in case of missing symptoms. While this may generate a ceiling effect for the TTFS, this effect is similar to that of most PROM questionnaires [4].

Reliability and validity

The reliability of repeated DTFS and TTFS self-measurements proved to be good. Overall, the results showed a moderate inverse correlation of changes in walking distances (6WD and DTFS) with subjective measures, indicating that patients with more pain and/or disability walked shorter distances without relevant symptoms. In line with our findings, previous studies have shown a weak to moderate correlation of different objective outcome measures, like the timed up-and-go (TUG) test or the motorized treadmill test (MTT), with PROMs [20, 21]. Subjective and objective assessments in lumbar DLD patients do not always seem to align in a linear fashion, indicating that the 6WT is not a mere objectification of PROM questionnaires and should be considered a separate dimension in the outcome assessment of spine patients [6]. Interestingly, the TTFS demonstrated a generally weaker correlation with PROMs, which might be explained by the fact that the TTFS does not consider walking speed. A patient with high functional impairment may, as a result of the disability, walk more slowly but trigger the app button for TTFS later. The findings suggest that both 6WD and DTFS may be more relevant and more directly related to a patient’s disability than the time span until symptoms worsen.

External responsiveness and prediction of treatment success

External responsiveness reflects the relevance of a detected change compared to the overall change in a patient’s clinical status. Changes in the DTFS exceeded the capability of the 6WD to differentiate between satisfied and unsatisfied patients, with AUCs of 0.75 vs 0.70, respectively. In contrast, and in line with its weaker correlation with PROMs, change in TTFS did not demonstrate a meaningful capability to differentiate between patients who were satisfied and those who were unsatisfied with surgical treatment. This indicates that the distance a patient can ambulate without a noticeable aggravation in symptoms may be more relevant for treatment satisfaction than the overall distance he/she can ultimately walk within a certain time frame.

Interestingly, similar objective tests quantifying a patient’s walking distance, such as the MTT or the self-paced walking test (SPWT), previously failed to demonstrate external responsiveness in patients with lumbar spinal stenosis [22]. In contrast to the 6WT, both measures are designed with no time limit to test a patient’s maximum physical capability. Given this lack of responsiveness to perceived treatment success, one can speculate that the patient’s maximum physical capacity is not the measure most relevant to the patient.

Two additional advantages of the smartphone-based 6WT are that it does not require additional equipment, such as a treadmill, and that the patient may walk in a familiar setting without being accompanied by a health-care professional. Both factors have previously been reported to influence the walking distance [22,23,24]. In turn, the smartphone-based 6WT may lead to a more accurate measurement of the individual’s disability in a real-world scenario.

Future prospects

Our study indicates that the DTFS, in contrast to the TTFS, might aid in the discrimination of satisfied and unsatisfied DLD patients after surgery. However, the cohort in our study included individuals with different spinal pathologies. While these pathologies share common characteristics, usually a form of mobility restriction resulting from back and/or leg pain, the predominant complaint (back pain vs. radicular leg pain vs. neurogenic claudication) often differs between specific spinal pathologies. An approach like the present one, in which we included a range of degenerative spinal pathologies, may neglect these differences. In future studies, we therefore aim to include larger patient cohorts with specific diseases, which will allow us to analyse fine differences in test responsiveness and validity for each spinal pathology separately. The TTFS, for instance, may show a greater responsiveness in LSS than in LDH, as symptoms of neurogenic claudication generally do not start at the beginning of an exercise, whereas patients with lumbar radicular pain typically report pain immediately after mobilization. We are confident that the growing availability and usage of smart devices, even among the elderly, will help us to increasingly apply digital objective outcome assessments in spine patients.

Strengths and limitations

The strengths of this study lie in its prospective design and the comprehensive patient evaluation using the 6WT, the two innovative sub-scores DTFS and TTFS, as well as several well-validated PROMs. The main limitation is the relatively small sample size, which might limit the generalizability to other patient cohorts. However, existing studies on objective outcome measures mostly analyse small patient cohorts of fewer than 20 patients, which we exceed by far [5]. As mentioned before, our results also need to be interpreted in light of a heterogeneous patient cohort with various degenerative spinal pathologies. Secondly, we determined the ability of the 6WT’s DTFS and TTFS to change over the prespecified time frame of 6 weeks. Recovery may still continue beyond 6 weeks after surgery, especially in patients who underwent multi-level surgery or fusion procedures. Further long-term data will have to shed more light on the dynamics of postoperative recovery as measured by an objective test like the 6WT and its effect on the responsiveness of different 6WT metrics.

Conclusions

The DTFS demonstrated both a higher external responsiveness and a better correlation with subjective outcome measures than the TTFS. Change in DTFS can differentiate between satisfied and unsatisfied patients after spine surgery. Digital outcome measures based on the 6WT provide spine surgeons and researchers with a means to assess their patients’ functional disability and response to surgical treatment in DLD.