Introduction

Chronic neck pain is a musculoskeletal condition with major clinical repercussions in the adult population worldwide [1] and is associated with high levels of disability. Disability is one of the most important clinical measures in epidemiological studies and in monitoring both the course of the condition and the results of the treatments these patients receive [2, 3]. Accordingly, patient-reported outcome instruments are widely used to measure disability. Among these instruments, the Neck Disability Index (NDI) is the most used in clinical contexts and has the largest number of publications on patients with neck pain [4]. It was originally proposed in 1991, based on the Oswestry Disability Index, with 10 items and 6 response options per item [5]. The NDI has been adapted into several languages, including Brazilian Portuguese [6].

However, the internal structure of the 10-item NDI has been questioned in several previous studies [7,8,9,10]. In this context, Walton and MacDermid [8] proposed a 5-item short form (SF-NDI) comprising the items personal care, concentration, work, driving, and recreation. For the Brazilian Portuguese version, Barreto et al. [7] found the internal structure of the 5-item SF-NDI to be appropriate, with a total score ranging from 0 to 25 points.

Despite this important initiative by Barreto et al. [7], the other measurement properties of the Brazilian Portuguese version of the SF-NDI have not yet been investigated, so the use of this short version in clinical practice and research lacks full scientific support. Therefore, hypothesizing that the SF-NDI has adequate measurement properties, the aim of this study was to assess the test–retest reliability, internal consistency, construct validity, and presence of ceiling and floor effects of the Brazilian version of the SF-NDI in patients with chronic neck pain.

Methods

Type of study and ethical aspects

This is a cross-sectional questionnaire validation study conducted according to the COnsensus-based Standards for the selection of health Measurement INstruments (COSMIN) [11]. Data were collected both face-to-face and online. Face-to-face data collection took place in São Luís (MA) and São Carlos (SP), cities in different Brazilian states, while online data collection included people from all over the country. Participants were included only after they had read and signed the informed consent form (or, in the online data collection, downloaded and read it and ticked the “I agree to take part” box). All research procedures were approved by the institution's human research ethics committee (protocol number 3.182.525).

Participants

The study was publicized through social media and the institution’s website. People interested in taking part contacted the responsible researchers, who recorded their contact details and later assessed them against the eligibility criteria. Participants were recruited from August 2019 to May 2021.

A minimum sample size of 100 participants was adopted [12]. The inclusion criteria were: age between 18 and 60 years; either sex; neck pain for more than 90 days; and neck pain intensity ≥ 3 on the Numerical Pain Rating Scale (NPRS) [13, 14]. Because all instruments were self-reported outcome measures, participants also had to be able to read and write in Portuguese and could not have a diagnosed cognitive impairment. For the online data collection, participants needed access to the internet and to the Google Forms platform (Google, Mountain View, CA, USA).

The non-inclusion criteria were: chronic specific neck pain; history of spinal surgery and/or vertebral fracture; radiculopathy and/or disk herniation with neurological repercussions confirmed by diagnosis; physical therapy treatment for neck pain in the last 90 days or medication use in the last seven days; medical diagnosis of cancer; and self-reported severe neurological or psychiatric illness.

Data collection

Because of the COVID-19 pandemic and the need for social distancing, data were collected face-to-face from August 2019 to February 2020 and online, through the Google Forms platform, from April 2020 to May 2021. In both settings, the researchers did not influence the participants' answers.

This study comprised two independent samples. Participants in sample 1 completed, on a single occasion, the subjective assessment of clinical and demographic characteristics together with the NPRS, the NDI, the Short-Form Neck Disability Index (SF-NDI) [7], the Tampa Scale of Kinesiophobia (TSK) [15], the Pain Catastrophizing Scale (PCS) [16], and the 36-Item Short-Form Health Survey (SF-36) [17]. Participants in sample 2 first completed the subjective assessment, the SF-NDI, and the NPRS. Seven days later, in a second data collection, participants in sample 2 completed the SF-NDI, the NPRS, and the Global Rating of Change Scale (GRCS).

Assessment tools

The NPRS quantifies pain intensity on an 11-point numerical sequence in which 0 represents “no pain” and 10 represents “worst pain imaginable.” Pain intensity was assessed at rest and after active neck movements (as instructed in the online form). The scale has been validated for Portuguese [14].

The SF-NDI is the 5-item version of the NDI, with a valid internal structure for the Brazilian population [7], and measures disability in individuals with neck pain (Supplement 1). Each item has six response options scored from 0 to 5, so the total score ranges from 0 to 25 points; higher values indicate greater disability [6, 18].
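The scoring rule above is a plain sum with a fixed range. The following minimal sketch makes it explicit; the function and constant names are hypothetical, and the sketch assumes complete responses, because the text does not describe how missing items are handled.

```python
from typing import Sequence

# Hypothetical constants: five SF-NDI items (personal care, concentration,
# work, driving, recreation), each scored from 0 to 5.
SF_NDI_ITEMS = 5
ITEM_MIN, ITEM_MAX = 0, 5


def sf_ndi_total(item_scores: Sequence[int]) -> int:
    """Return the SF-NDI total (0-25); higher totals mean greater disability.

    Assumes all five items were answered; no missing-item rule is applied.
    """
    if len(item_scores) != SF_NDI_ITEMS:
        raise ValueError(f"expected {SF_NDI_ITEMS} item scores, got {len(item_scores)}")
    for score in item_scores:
        if not ITEM_MIN <= score <= ITEM_MAX:
            raise ValueError(f"item score {score} is outside {ITEM_MIN}-{ITEM_MAX}")
    return sum(item_scores)


print(sf_ndi_total([1, 2, 0, 3, 2]))  # prints 8
```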

The TSK is a scale validated for the Brazilian population that assesses fear of movement [15]. It is a self-administered instrument with 17 items, each with four response options scored in ascending order: totally disagree (1 point), partially disagree (2 points), partially agree (3 points), and totally agree (4 points). The scores of items 4, 8, 12, and 16 are inverted before the final score is calculated; the final score ranges from 17 to 68, and higher scores indicate greater kinesiophobia.
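The inversion step is the only non-obvious part of TSK scoring. On a 1–4 scale, inverting a score maps 1↔4 and 2↔3, which is the same as computing 5 − score. The sketch below applies this to items 4, 8, 12, and 16 as stated in the text. The function name is hypothetical.

```python
from typing import Sequence

# Items whose scores are inverted before summing (1-based, as listed in the text).
TSK_REVERSED_ITEMS = {4, 8, 12, 16}
TSK_ITEMS = 17


def tsk_total(item_scores: Sequence[int]) -> int:
    """Return the TSK total (17-68) from 17 item scores coded 1-4."""
    if len(item_scores) != TSK_ITEMS:
        raise ValueError(f"expected {TSK_ITEMS} item scores, got {len(item_scores)}")
    total = 0
    for number, score in enumerate(item_scores, start=1):
        if not 1 <= score <= 4:
            raise ValueError(f"item {number} has score {score}, expected 1-4")
        # Inverting on a 1-4 scale maps 1<->4 and 2<->3, i.e. 5 - score.
        total += 5 - score if number in TSK_REVERSED_ITEMS else score
    return total
```

Answering "totally agree" on every item therefore yields 13 × 4 + 4 × 1 = 56, not the maximum of 68. The maximum requires "totally disagree" on the four inverted items.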

The PCS was used to assess pain catastrophizing. It consists of 13 items divided into three domains, and scores are computed by domain: helplessness (0 to 24), magnification (0 to 12), and rumination (0 to 16). Higher scores indicate higher levels of catastrophizing, according to the version adapted for the Brazilian population [16].
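The domain ranges above are consistent with items scored 0–4 (6, 3, and 4 items, respectively). The sketch below computes the three domain scores under two assumptions that the text does not state. First, items are scored 0–4. Second, the item-to-domain mapping is the one used in the original English PCS (helplessness 1–5 and 12; magnification 6, 7, and 13; rumination 8–11). Check both against the Brazilian version [16] before relying on the sketch. The names are hypothetical.

```python
from typing import Dict, Sequence

# ASSUMPTION: item-to-domain mapping of the original English PCS (1-based item numbers).
PCS_DOMAINS = {
    "helplessness": (1, 2, 3, 4, 5, 12),
    "magnification": (6, 7, 13),
    "rumination": (8, 9, 10, 11),
}


def pcs_domain_scores(item_scores: Sequence[int]) -> Dict[str, int]:
    """Return PCS domain scores from 13 item scores (assumed to be coded 0-4)."""
    if len(item_scores) != 13:
        raise ValueError(f"expected 13 item scores, got {len(item_scores)}")
    if any(not 0 <= score <= 4 for score in item_scores):
        raise ValueError("item scores must be between 0 and 4")
    return {
        domain: sum(item_scores[item - 1] for item in items)
        for domain, items in PCS_DOMAINS.items()
    }
```

With every item at 4, the sketch returns 24, 12, and 16, which matches the maximum domain scores given in the text.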

The SF-36 is an instrument validated for the Brazilian population [17] that consists of 36 items assessing eight quality-of-life domains: functional capacity, physical limitation, pain, general health status, vitality, social aspects, emotional aspects, and mental health. The score for each domain ranges from 0 to 100, and higher scores indicate better quality of life.
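In standard SF-36 scoring, each domain reaches the 0–100 range through a linear transformation of its raw score: (raw − lowest possible raw) / (possible raw range) × 100. The sketch below shows only that transformation. The function name is hypothetical. The example uses the functional capacity domain, which in standard scoring has 10 items coded 1–3 (raw range 10–30). Item recoding and the raw limits for the other domains follow the SF-36 scoring manual and are not shown.

```python
def transform_to_0_100(raw: float, lowest: float, highest: float) -> float:
    """Linearly map a raw SF-36 domain score onto the 0-100 scale."""
    if highest <= lowest:
        raise ValueError("highest must be greater than lowest")
    if not lowest <= raw <= highest:
        raise ValueError(f"raw score {raw} is outside {lowest}-{highest}")
    return (raw - lowest) / (highest - lowest) * 100


# Functional capacity: 10 items coded 1-3, so raw scores range from 10 to 30.
print(transform_to_0_100(20, 10, 30))  # prints 50.0
```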

The 11-point Global Rating of Change Scale (GRCS) was used to check the clinical stability of sample 2 [19] in the interval between assessments. It has been validated for Brazilian Portuguese [20] and is scored from −5 (much worse) to 5 (much better), with zero meaning no change at all. The question in the form was “Comparing with the very first pain episode, how would you describe your neck pain in the present days?”

Statistical analysis

In the descriptive analysis, quantitative variables are presented as mean and standard deviation (SD) and qualitative variables as absolute number and percentage. All analyses were performed in SPSS version 17.0 (SPSS Inc., Chicago, IL, USA), and a 5% significance level was adopted. Data from sample 1 were used to assess construct validity and ceiling and floor effects, while data from sample 2 were used to assess test–retest reliability and internal consistency.

Data normality was checked with the Kolmogorov–Smirnov test. Construct validity was assessed with Spearman's correlation coefficient (magnitude of the correlation between the SF-NDI and the other questionnaires). Internal consistency was evaluated with Cronbach's α, and values between 0.70 and 0.95 were considered to indicate good internal consistency [12]. Test–retest reliability over the 7-day interval was assessed with the intraclass correlation coefficient (ICC), the standard error of measurement (SEM), and the minimal detectable change (MDC) [21]. The SEM was calculated as SEM = SD × √(1 − ICC), and the MDC as MDC = 1.96 × SEM × √2. The NPRS and GRCS were applied at test and retest to confirm the clinical stability of the participants' symptoms.
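The reliability statistics above can be reproduced outside SPSS. The following is a minimal pure-Python sketch, not the authors' analysis. The text does not specify the ICC model, so the sketch assumes ICC(2,1) (two-way random effects, absolute agreement, single measurement). The text also does not say which SD enters the SEM formula, so `sd` is left as a parameter. All function names are hypothetical.

```python
import math
from statistics import fmean, variance
from typing import Sequence


def cronbach_alpha(responses: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for a participants x items matrix of scores."""
    k = len(responses[0])
    if k < 2:
        raise ValueError("need at least two items")
    item_variance_sum = sum(variance(item) for item in zip(*responses))
    total_variance = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - item_variance_sum / total_variance)


def icc_2_1(scores: Sequence[Sequence[float]]) -> float:
    """ASSUMED model ICC(2,1) for a participants x sessions matrix (e.g. test, retest)."""
    n, k = len(scores), len(scores[0])
    grand = fmean(x for row in scores for x in row)
    row_means = [fmean(row) for row in scores]
    col_means = [fmean(col) for col in zip(*scores)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)  # between participants
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)  # between sessions
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


def sem(sd: float, icc: float) -> float:
    """Standard error of measurement: SD x sqrt(1 - ICC)."""
    return sd * math.sqrt(1 - icc)


def mdc95(sem_value: float) -> float:
    """Minimal detectable change at 95% confidence: 1.96 x SEM x sqrt(2)."""
    return 1.96 * sem_value * math.sqrt(2)
```

For example, with an illustrative SD of 2.0 and an ICC of 0.75, `sem(2.0, 0.75)` gives 1.0 and `mdc95(1.0)` gives about 2.77 points. These numbers are arbitrary and are not the study's results.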

Ceiling and floor effects were also assessed. By definition, these effects are present when more than 15% of participants obtain the minimum or maximum possible score of the questionnaire, which compromises the assessment of the instrument's responsiveness.
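The 15% criterion translates directly into a count over the observed scores. The sketch below uses the SF-NDI range (0–25) as its default bounds. The function name is hypothetical. In the usage line, only the count of minimum scores (11 of 156) comes from the Results; the other 145 values are placeholders.

```python
from typing import Dict, Iterable


def floor_ceiling_effects(
    scores: Iterable[int], minimum: int = 0, maximum: int = 25, threshold: float = 0.15
) -> Dict[str, float]:
    """Share of participants at the scale minimum/maximum and whether it exceeds the threshold."""
    values = list(scores)
    if not values:
        raise ValueError("no scores supplied")
    n = len(values)
    floor_share = sum(score == minimum for score in values) / n
    ceiling_share = sum(score == maximum for score in values) / n
    return {
        "floor_share": floor_share,
        "ceiling_share": ceiling_share,
        "floor_effect": floor_share > threshold,
        "ceiling_effect": ceiling_share > threshold,
    }


# 11 of 156 participants at the minimum (about 7.1%) stays below the 15% threshold.
print(floor_ceiling_effects([0] * 11 + [10] * 145)["floor_effect"])  # prints False
```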

ICC values were interpreted according to the Fleiss classification: values below 0.40 indicated low reliability; between 0.40 and 0.75, moderate; between 0.75 and 0.90, substantial; and above 0.90, excellent [22]. For construct validity, we hypothesized a correlation magnitude greater than 0.50 with the original NDI (similar construct), between 0.30 and 0.50 with the functional capacity domain of the SF-36 (related but different constructs), and less than 0.30 with the other research instruments (unrelated constructs) [21].
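The interpretation rules above are threshold checks. The following sketch also includes a Spearman coefficient computed as the Pearson correlation of average ranks. It makes two assumptions that the text leaves open. First, a value of exactly 0.75 is placed in the "substantial" band. Second, the hypothesized bands apply to the absolute value of rho, because higher SF-36 scores mean better health and therefore correlate negatively with the SF-NDI. All names are hypothetical.

```python
import math
from typing import List, Sequence


def classify_icc(icc: float) -> str:
    """Fleiss-based bands as described in the text (0.75 assigned to 'substantial')."""
    if icc < 0.40:
        return "low"
    if icc < 0.75:
        return "moderate"
    if icc <= 0.90:
        return "substantial"
    return "excellent"


def _average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho as the Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rx, ry = _average_ranks(x), _average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


def hypothesized_band(rho: float) -> str:
    """Map |rho| to the construct-validity bands hypothesized in the text."""
    magnitude = abs(rho)
    if magnitude > 0.50:
        return "similar construct"
    if magnitude >= 0.30:
        return "related but different construct"
    return "unrelated construct"
```

Applied to values reported in this paper, `classify_icc(0.844)` returns "substantial" and `hypothesized_band(0.859)` returns "similar construct".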

Results

A total of 241 individuals were recruited, of whom 34 were excluded (Fig. 1). Thus, the final sample consisted of 207 participants, divided into sample 1 (n = 156) and sample 2 (n = 51). In sample 1, data from 86 participants were collected face-to-face and from 70 online; all data in sample 2 were collected online. Sample 1 data were used to assess construct validity and ceiling and floor effects, while sample 2 data were used to assess reliability and internal consistency.

Fig. 1 Flowchart

Table 1 shows the anthropometric and clinical characteristics of the study participants. Table 2 shows the values obtained for each scale and questionnaire used. Table 3 shows substantial reliability (ICC = 0.844) and adequate internal consistency (Cronbach's alpha = 0.778) of the SF-NDI. Symptoms were clinically stable, with similar mean NPRS values at test and retest: 4.92 (SD = 2.19) and 4.07 (SD = 2.66), respectively. In addition, the mean GRCS scores at test and retest were −0.39 (SD = 2.58) and −0.35 (SD = 2.54), respectively.

Table 1 Clinical and demographic characteristics
Table 2 Pain measures and quality of life of the sample 1 (n = 156)
Table 3 Reliability of the Short-Form Neck Disability Index (SF-NDI) in the sample 2 (n = 51)

Regarding construct validity, we observed significant correlations (p < 0.05) with a magnitude greater than 0.80 between the SF-NDI and the original NDI, between 0.30 and 0.50 with the TSK and with the functional capacity and pain domains of the SF-36, and below 0.30 with the other study instruments (Table 4). Eleven participants (7.1%) obtained the minimum SF-NDI score, and no participant reached the maximum score. Therefore, no ceiling or floor effects were observed.

Table 4 Correlation between the Short-Form Neck Disability Index (SF-NDI), pain measures, and quality of life (n = 156)

Discussion

Our results show that the SF-NDI has adequate reliability, internal consistency, and construct validity, without ceiling or floor effects, indicating that the SF-NDI is a reliable tool to assess disability in patients with chronic neck pain.

Regarding reliability, the Brazilian version of the original 10-item NDI showed internal consistency with a Cronbach's alpha of 0.74 [6], slightly below the value we found for the SF-NDI (Cronbach's alpha = 0.778). The same authors reported a reliability index of 0.48 for a retest after 7 days, whereas we observed substantial reliability over the same 7-day interval (ICC = 0.844) [12, 23]. Similar ICC values have been reported for other versions: the first version of the SF-NDI (ICC = 0.91) [8] and the Mexican Spanish (ICC = 0.86) [24], Nepali (ICC = 0.87) [25], Malay (ICC = 0.79) [26], and Norwegian (ICC = 0.84) [27] versions of the NDI.

For construct validity, the Brazilian version of the original 10-item NDI showed correlations of magnitude between 0.13 and 0.41 with the SF-36 domains [6]. We found stronger correlations between the SF-NDI and the SF-36 domains, ranging from −0.205 to −0.475. We also observed correlations of magnitude between 0.225 and 0.334 with the NPRS, TSK, and PCS. In addition, the SF-NDI correlated highly with the original NDI (rho = 0.859); that is, even with five fewer items, the short version correlates very well with the full instrument.

Construct validity has also been assessed in several adapted versions of the NDI. The first version of the SF-NDI found higher correlations than ours with the NPRS (r = 0.67), TSK (r = 0.54), and PCS (r = 0.64) [8]. The Taiwanese version found adequate correlations of the NDI with the Visual Analogue Scale (VAS) (rho = 0.38) and the PCS (rho = 0.55) [28]. With correlation magnitudes close to ours, the German version found significant correlations of the NDI with the VAS at rest (r = 0.22), the VAS during movement (r = 0.39), and the SF-36 domains (r = −0.0 to −0.45) [29].

Two systematic reviews have investigated the measurement properties of the NDI. Yao et al. [30] analyzed its cross-cultural adaptations and observed that most versions have adequate internal consistency and reliability, with ICCs greater than 0.70; they also noted that the Arabic, Italian, and Thai versions were of higher quality than the other versions. In turn, MacDermid et al. [31] found that most studies support acceptable reliability of the NDI, although ICCs ranged from 0.50 to 0.98, and acceptable construct validity, with correlations between 0.30 and 0.70 between the NDI and the SF-36, PCS, and Visual Analogue Scale.

The present study has some limitations that must be considered. Data were collected in two different ways, face-to-face and online. Although a recent study showed similarities between these forms of data collection [32], any difficulties participants had during online data collection could not be clarified immediately, which may have affected the time they took to answer the questionnaire. Moreover, the measurement properties assessed here are specific to the Brazilian version of the SF-NDI in patients with chronic neck pain. We suggest further research on the short version of the instrument in other languages, in patients with acute neck pain, or in patients with specific neck pain, such as whiplash-associated pain. Finally, clinicians and researchers should bear in mind that, although the use of the SF-NDI is supported from a clinimetric point of view, it measures disability based on only five daily activities.

Conclusion

The 5-item SF-NDI has adequate measurement properties in Brazilian patients with chronic neck pain.