Cochrane Database of Systematic Reviews Protocol - Intervention

Arthroplasty versus fusion for single‐level cervical degenerative disc disease

Abstract

This is a protocol for a Cochrane Review (Intervention). The objectives are as follows:

The goal of this updated systematic review is to determine whether discectomy with fusion or discectomy with arthroplasty provides the best clinical, functional, and radiological outcomes, with the least adverse effects, for patients with symptomatic single level cervical degenerative disc disease.

Background

See Appendix 1 for a glossary of terms.

Description of the condition

Cervical degenerative disc disease can cause axial pain, radicular pain, sensory loss or motor weakness due to neural compression, and headache (Grob 1998; Persson 1997; Polston 2007). Possible causes are a bulging or herniated intervertebral disc, foraminal narrowing due to loss of disc space height or osteophyte formation, or a combination of these. The majority of patients respond well to conservative treatment options, such as physical therapy, traction, use of a collar, analgesic or anti‐inflammatory medication (Kuijper 2009). With such a treatment regimen, up to 60% of patients are satisfied or very satisfied at six weeks (Kuijper 2009a); up to 83% of the patients who undergo conservative treatment report good or excellent results at a mean follow‐up of two years (Saal 1996). Surgical treatment can be considered in cases of insufficient relief of symptoms with these conservative treatment options (Gore 1984; Gore 1998; Klein 2000; White 1973). While surgery is a common option in cases of radiculopathy or myelopathy, controversy exists regarding the operative treatment of axial neck pain, headache, or both (Wieser 2007). Therefore, we will focus on surgical intervention in the case of radiculopathy, myelopathy, or a combination. Primary goals of surgical intervention are to relieve radiating arm pain in the case of radiculopathy, and to prevent progression of neurologic deficit in the case of myelopathy. A systematic review of the trials comparing conservative treatment with surgical intervention for cervical degenerative disc disease is the subject of a separate review (Nikolaidis 2010).

The mean annual incidence of cervical radiculopathy is reported to be about 83 per 100,000 people, with a peak incidence of 202 per 100,000 for people aged 50 to 54 years. The male to female ratio is about 1.7 to 1. The prevalence is about 3.5 per 100,000 people, also with a peak in the sixth decade. In contrast to the higher incidence reported in males, a consistently higher age‐specific prevalence is reported in females. After the age of 60 years, the incidence declines rapidly, more markedly in females than in males (Radhakrishnan 1994; Salemi 1996).

The reported proportion of patients who are eventually treated surgically ranges from 8% to 35% (Radhakrishnan 1994; Saal 1996; Wolff 2002). The reason for surgical treatment varies from patient to patient but generally falls within one of two groups: a group where surgery is elective, and a group where there is an absolute indication. Surgery is generally considered elective for patients who present with intractable pain despite maximum conservative treatment, or whose complaints persist beyond an arbitrary period of conservative treatment of two to eight weeks. Significant or progressive neurological deficits, or both, are considered an absolute indication for surgical treatment (Saal 1996; Wolff 2002).

Description of the intervention

The most commonly performed procedure is discectomy, either with or without fusion (for the rest of the document, these will be referred to as fusion and simple discectomy, respectively). In fusion, a bony fusion between the two vertebrae is promoted by interposition of a cage, autograft, or allograft material in the disc space, plate and screw fixation, or a combination of these. In simple discectomy, all or part of an intervertebral disc is removed (Jacobs 2011). Posterior laminoforaminotomy is less common, and falls outside the scope of this review (Henderson 1983). Fusion and simple discectomy have good clinical results, with short term (up to six to eight weeks) satisfactory or good results in 90% to 100% of the patients (Dowd 1999; Nandoe 2007; Wirth 2000). In the long term, patient satisfaction slowly decreases. This decrease is the result of a substantial portion of the patients who develop recurrent symptoms in the years after surgery, usually at a level adjacent to the previously operated segment. Numbers for recurrent pain vary between studies. In one study, around 68% of the patients reported no recurrent pain at 20‐year follow‐up, while another study reported satisfactory results in 68% after a mean follow‐up of seven years (Gore 1998; Nandoe 2007). Yet another study reported satisfactory to excellent outcomes in 96% of treated patients after a mean follow‐up of 11 years; this included five patients (4%) who were operated on at an adjacent level due to recurrent symptoms (Palma 2010). For both types of surgery, a high rate of fusion, up to 80% in simple discectomy and over 95% in fusion, has been reported (Dowd 1999; Fountas 2007; Gore 1998; Grob 1998; Persson 1997).

The first part of the surgical procedure for simple discectomy, fusion, and arthroplasty is almost identical: the discectomy. An incision is made on the left or right side of the front of the neck, after which the subcutis and platysma are opened. This is followed by blunt dissection of muscles, oesophagus, trachea, and arteries to reach the anterior aspect of the vertebral column. The longus colli muscles are dissected, and a retractor is placed to ensure good visualisation. The correct level is identified, usually by means of fluoroscopy. The anterior longitudinal ligament and intervertebral disc are incised, after which the contents of the disc space are removed. If necessary, additional decompression of nerve roots and spinal cord is accomplished by removing disc material, osteophytes, or both, usually with microscopy. In simple discectomy, the wound is then closed. In fusion, some form of interposition material is then placed inside the disc space to promote fusion of the adjacent vertebrae. The procedures most commonly performed are variations on the techniques described by Smith and Robinson (Smith 1958), and by Cloward (Cloward 1958; Emery 1994). These variations are primarily based on the amount and type of bony resection, the origin, shape and material of the interposition graft, and the use of additional anterior plating.

In arthroplasty, moveable disc prostheses are inserted instead of a rigid interposition material. Since their introduction in the 1980s (Cummins 1998), several types of artificial cervical discs have been developed, with different materials and movement characteristics (Galbusera 2008; Seo 2008).

There are small but definite risks associated with surgical therapy (Anderson 2008; Daniels 2008; Fountas 2007; Pickett 2006). The most common complication of anterior cervical spine surgery is dysphagia. A recent systematic review reported that the postoperative incidence of dysphagia ranges from 1% to 79% (Riley 2010). The authors attributed this large range to the fact that there is no universally accepted method to determine the presence of dysphagia. Six months after surgery, the incidence had dropped to 8% to 22%, while at 12 months and 24 months, incidences of 13% to 21% and 13.6%, respectively, were reported (Lee 2007; Riley 2010). Female gender and multilevel surgery were identified as risk factors (Riley 2010). One study reported a significantly lower incidence of dysphagia in arthroplasty when compared to fusion; the rate of symptom resolution was significantly higher as well (McAfee 2010). Serious and potentially life‐threatening complications are relatively rare. Spinal cord injury or increased myelopathy has been reported in 0.2% to 0.4% and oesophageal perforation in 0.3% of surgeries. A prevertebral or epidural hematoma was reported in 0.2% to 5.6% of cases, a portion of which required surgical exploration. No vertebral artery lesions were reported in this type of surgery. Reported mortality was very low, at 0 to 0.1%. A lesion of the recurrent laryngeal nerve was reported in 0.05% to 8.3% of cases (Anderson 2008; Fountas 2007; Pickett 2006). The true incidence of recurrent laryngeal nerve palsy is probably much higher, since only 35% of the people diagnosed with it through laryngoscopy were found to exhibit symptoms (Jung 2005). Wound infections are rare, with superficial infections being reported in 0.1% to 1.8% of cases. Leakage of cerebrospinal fluid was reported in 0 to 1% of cases. Hardware failure (screw back‐out, migration) has been reported in 0.1% to 1% of cases. No deep infections have been reported, although one fatality was reported due to mediastinitis following an oesophageal tear. Patients generally have a good chance of recovery in the case of a complication (Fountas 2007).

How the intervention might work

Depending on the symptoms of a patient, the primary goal of surgery is to relieve radicular pain, prevent progression of neurologic deficit due to spinal cord compression, or both. Theoretically, since a discectomy is performed to achieve both goals, the direct effect on preoperative symptoms should be equal.

In addition to the surgical decompression, interposition material is added to retain or restore normal cervical lordosis, increase foraminal height, and improve the effects of decompression (Bartels 2001). Lastly, the interposition material leads to a higher fusion rate with subsequent segmental stability and alignment (Dowd 1999). There are reports in the literature that suggest that fusion of a segment would result in reabsorption or blunting of osteophytes (Bailey 1960; Martel 1962). However, this has been contradicted in another publication (Stevens 1993).

In addition to the advantages of fusion, arthroplasty has the supposed advantage of motion sparing in the operated segment (Goffin 2003). This is thought to reduce aforementioned stress on adjacent motion segments, which in turn should lead to less adjacent segment degeneration.

Why it is important to do this review

Currently, there is much debate about whether fusion is a contributing factor to clinical outcomes, and whether it causes accelerated degeneration of adjacent segments, so‐called adjacent segment degeneration. If this adjacent segment degeneration causes symptoms, it is referred to as adjacent segment disease. This may be the result of increased intradiscal pressure and increased mobility at the segments adjacent to a fused segment (Eck 2002). However, it may simply be the result of natural progression of degeneration in an aging spine. Therefore, it remains unclear whether fusion should be regarded as a goal of treatment, or should be prevented (Seo 2008). The yearly risk of adjacent segment disease is calculated to be around 2.9%. Up to 75% of these cases end up undergoing additional surgery (Gore 1984; Gore 1998; Hilibrand 1999; Wirth 2000; Van Eck 2014).

In order to reduce chances of adjacent segment disease as a presumed long‐term effect of fusion, arthroplasty has become increasingly popular. The value of arthroplasty in reducing adjacent segment disease remains a matter of discussion (Seo 2008). In one systematic review, the authors concluded that there was no evidence of a reduction of adjacent level degeneration in arthroplasty compared to fusion (Botelho 2010). The maximum duration of follow‐up in this review was 24 months. Another meta‐analysis found 5.1% of adjacent segment disease after arthroplasty in studies reporting data at a follow up of 12 to 24 months (Shriver 2016). No comparison to fusion was made in that study.

Trials with long‐term follow‐up, and a direct comparison to fusion, are required to evaluate the effectiveness of arthroplasty in preventing adjacent segment disease (Goffin 2003; Seo 2008).

It is important to realize that the possible benefits of preserving motion in the long term will only be useful if arthroplasty is at least as successful in resolving pain and disability as simple discectomy or fusion in the short term. In the original Cochrane review on this subject, we concluded that there were no clinically relevant differences at a maximum of 12 to 24 months follow‐up, although several outcomes were statistically in favour of arthroplasty (Boselie 2012; Boselie 2013).

Although the intended long‐term benefits of arthroplasty compared to fusion should lead to less recurrent disability, pain, and concomitant surgery, it is important to realize that these benefits are not proven. The maximum follow‐up in the previous review was one to two years. This cannot be seen as long‐term follow‐up, and will be referred to as mid‐term in this updated review. Studies reporting longer term results (longer than four years) are becoming available, which should enable better conclusions about the long‐term comparative effects of these treatments, as well as the occurrence of adverse events.

As was the case in the original review, a preliminary search only identified randomised controlled trials that compared fusion and arthroplasty; there were none that compared all of the aforementioned treatment arms combined, or simple discectomy compared to arthroplasty alone. A systematic review of the trials comparing simple discectomy and several methods of fusion in cervical degenerative disc disease has been the subject of a separate review (Jacobs 2011). Studies comparing simple discectomy, fusion, and arthroplasty are being planned or conducted at the moment (Arts 2010). Therefore, adding simple discectomy to the comparison should be possible in future updates of this review, in which case it would be interesting to investigate the possibility of combining data with the previously mentioned review (Jacobs 2011).

This protocol replaces the previous version of the review (Boselie 2012; Boselie 2013).

Objectives

The goal of this updated systematic review is to determine whether discectomy with fusion or discectomy with arthroplasty provides the best clinical, functional, and radiological outcomes, with the least adverse effects, for patients with symptomatic single level cervical degenerative disc disease.

Methods

Criteria for considering studies for this review

Types of studies

We will only include randomised controlled trials.

Types of participants

We will include studies with adult patients (18 years of age or older), male or female, with symptomatic single level cervical degenerative disc disease of C3‐C4, C4‐C5, C5‐C6 or C6‐C7. We define symptomatic as the presence of radicular pain, myelopathy, or both, corresponding to the afflicted level. Duration of symptoms must be at least six weeks (with the exception of progressive myelopathy, which requires earlier treatment) and there must be insufficient relief of symptoms with conservative therapy. We will exclude studies with patients with metabolic bone disease (e.g. osteoporosis), more than one pathologic level, previous surgery of the cervical spine, inflammatory spinal arthritis, malignancy or radiotherapy of the cervical spine region.

Types of interventions

We will include studies in which single level anterior cervical discectomy with fusion (either by plate, cage, autograft, allograft material, or a combination) is directly compared to anterior cervical discectomy with arthroplasty (with any type of movable cervical disc prosthesis).

Types of outcome measures

We will consider clinical, functional, and radiological outcome parameters, with a minimum follow‐up of 12 months.

We will collect all outcome measures at short‐term follow‐up (six weeks to three months), mid‐term follow‐up (12 to 24 months), and long‐term follow‐up (48 months and longer). Data for follow‐up moments between these time points (i.e. greater than 3 months to less than one year, and greater than 24 months to less than 48 months) will be omitted unless there is no available follow‐up data for that study population for a follow‐up moment for which data is extracted. In that case we will regard any follow‐up less than 12 months as short‐term, 12 to 36 months as mid‐term follow‐up, and more than 36 months as long‐term follow‐up.
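The time frames above can be sketched in code. The following is a minimal illustration only, not part of the review methods: the function name `classify_follow_up`, the `fallback` flag, and the conversion of six weeks to 1.5 months are assumptions made for the example. The decision of when the wider fallback windows apply to a study population is left to the caller.

```python
def classify_follow_up(months, fallback=False):
    """Return 'short', 'mid', 'long', or None for a follow-up moment in months.

    Hypothetical helper: names and the 1.5-month lower bound are assumptions.
    """
    if fallback:
        # Wider windows, intended only for study populations that have no
        # data inside the primary windows.
        if months < 12:
            return "short"
        if months <= 36:
            return "mid"
        return "long"
    if 1.5 <= months <= 3:   # six weeks (taken as 1.5 months) to three months
        return "short"
    if 12 <= months <= 24:
        return "mid"
    if months >= 48:
        return "long"
    return None              # between windows: omitted by default


print(classify_follow_up(6))                 # None
print(classify_follow_up(6, fallback=True))  # short
```

Returning `None` for moments that fall between the windows keeps the default rule (omit) separate from the fallback rule (reassign).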

Primary outcomes

We will primarily evaluate:

Benefits:

  • Arm and neck pain expressed on a visual analogue or similar scale (e.g. Visual Analogue Scale (VAS), Numerical Rating Scale (NRS));

  • Neck‐related functional status, expressed on a neck‐specific scale (e.g. Neck Disability Index (NDI)).

Harms:

  • Adverse effects (subdivided into: death, neurological worsening (increase in sensory or motor deficits, or both), extrusion or migration of implant material, infection, persisting hoarseness (longer than six weeks), persisting dysphagia (longer than six weeks), symptomatic pseudoarthrosis);

  • Revision surgery at operated segment, or secondary surgery at adjacent segments.

Secondary outcomes

  • Neurological outcomes (presence of sensory or motor deficits, or both), compared to preoperative status;

  • Patient satisfaction according to Odom's outcome criteria (four‐level assessment of success of surgery in relieving preoperative symptoms; Appendix 2);

  • Radiological signs of fusion (bridging of trabecular or cortical bones, or both, absence of detectable movement on flexion and extension lateral radiographs, absence of lucencies around the cage or disc prosthesis, if present);

  • Global health status (Short Form‐36 (SF‐36));

  • Mobility on flexion‐extension radiographs, measured as segmental range of motion in the sagittal plane (sROM), at target segment and adjacent segments;

  • Work disability (rate of health‐related absenteeism or unemployment);

  • Cost, primary (e.g. cost of surgery, hospital stay, medication), and secondary (e.g. absenteeism, follow‐up examinations, long‐term use of medication, such as opioids, related to the surgery or underlying pathology).

Search methods for identification of studies

Electronic searches

We will search the following electronic databases from inception to current with no language restrictions:

  • Cochrane Back and Neck (CBN) Group Trials Register (through CENTRAL, the Cochrane Register of Studies (CRS), or both);

  • Cochrane Central Register of Controlled Trials (CENTRAL; Cochrane Library, current issue);

  • Epub ahead of print, In‐process and other non‐indexed citations, MEDLINE Daily and MEDLINE (OvidSP, 1946 to current);

  • EMBASE (OvidSP, 1980 to current).

Instead of searching MEDLINE alone, we will search the Epub ahead of print, In‐process and other non‐indexed citations, MEDLINE Daily and MEDLINE database, which gives access to multiple MEDLINE databases in one search interface via OvidSP. As we no longer have access to Evidence‐Based Medicine (EBM), this database will not be searched. Search strategies from the previous review can be found in Appendix 3. They will be updated as necessary.

Searching other resources

We will search the System for Information on Grey Literature (SIGLE) database through OpenGrey, subheading Biological and Medical Sciences, to search for trials that might have been missed by the other sources. We will search the FDA database on medical devices for references to trials concerning disc prostheses, and ClinicalTrials.gov to identify trials in progress. We will also screen the reference lists of potentially relevant papers. Lastly, we will search the World Health Organization (WHO) International Clinical Trials Registry Platform (ICTRP).

Data collection and analysis

One author (TB) will conduct the literature search, retrieve the full‐text articles to be evaluated, and contact authors for additional information if necessary.

Selection of studies

Three authors (TB, PW, and HVS) will independently perform the first screening, based on title and abstract, and subsequently apply the selection criteria to the articles that appear to be relevant or for which more information is needed to reach a decision. If there is disagreement, we will initially try to resolve it through discussion; if consensus cannot be reached, we will consult a fourth author (RDB). We will only use full‐text articles for the final decision to include studies.

Data extraction and management

We will prepare and use a standardised data extraction form to extract relevant data from selected articles. This will be done independently by two authors (TB and HVS). If consensus cannot be reached between these two authors, we will ask a third author to arbitrate. One author (TB) will enter the data into RevMan 5.3 (Review Manager (RevMan)), after which another author (HVS) will check it. In case of missing information, we will contact the author(s) of the study.

Extracted data will include:

  • study design (e.g. inclusion and exclusion criteria, method of randomisation, blinding (patient, practitioner, analyst), study period, duration of follow‐up);

  • characteristics of the study population (e.g. number of participants, age, gender, baseline scores of the outcome measures described in an earlier paragraph);

  • characteristics of the intervention (e.g. type of implant used, technique used);

  • outcomes in each group (for each outcome: number of subjects, mean, standard deviation, 95% confidence intervals);

  • duration of follow‐up (and follow‐up moments);

  • loss to follow‐up for each group (at different follow‐up moments);

  • sponsorship of the trial (e.g. industry sponsored).

Assessment of risk of bias in included studies

We will use the 13 criteria recommended in the 2015 updated Cochrane Back and Neck Group guidelines (Furlan 2015; Table 1; Table 2), which are based on the updated Cochrane Handbook (Higgins 2011) to assess risk of bias. Two authors (TB and HVS) will independently apply them to the selected studies and score the criteria as 'low risk of bias', 'high risk of bias', or 'unclear' in a 'Risk of bias' table. In case of disagreement, the two authors will discuss it until consensus is reached. If consensus cannot be reached, they will consult a third author (PW).

Table 1. Sources of Risk of Bias

Bias Domain | Source of Bias | Possible Answers
Selection | (1) Was the method of randomisation adequate? | Yes/No/Unsure
Selection | (2) Was the treatment allocation concealed? | Yes/No/Unsure
Performance | (3) Was the patient blinded to the intervention? | Yes/No/Unsure
Performance | (4) Was the care provider blinded to the intervention? | Yes/No/Unsure
Detection | (5) Was the outcome assessor blinded to the intervention? | Yes/No/Unsure
Attrition | (6) Was the drop‐out rate described and acceptable? | Yes/No/Unsure
Attrition | (7) Were all randomised participants analysed in the group to which they were allocated? | Yes/No/Unsure
Reporting | (8) Are reports of the study free of suggestion of selective outcome reporting? | Yes/No/Unsure
Selection | (9) Were the groups similar at baseline regarding the most important prognostic indicators? | Yes/No/Unsure
Performance | (10) Were co‐interventions avoided or similar? | Yes/No/Unsure
Performance | (11) Was the compliance acceptable in all groups? | Yes/No/Unsure
Detection | (12) Was the timing of the outcome assessment similar in all groups? | Yes/No/Unsure
Other | (13) Are other sources of potential bias unlikely? | Yes/No/Unsure

Table 2. Criteria for a Judgment of ‘‘Yes’’ for the Sources of Risk of Bias

(1) A random (unpredictable) assignment sequence. Examples of adequate methods are coin toss (for studies with 2 groups), rolling a dice (for studies with 2 or more groups), drawing of balls of different colours, drawing of ballots with the study group labels from a dark bag, computer‐generated random sequence, pre‐ordered sealed envelopes, sequentially‐ordered vials, telephone call to a central office, and pre‐ordered list of treatment assignments. Examples of inadequate methods are: alternation, birth date, social insurance/security number, date on which they are invited to participate in the study, and hospital registration number.

(2) Assignment generated by an independent person not responsible for determining the eligibility of the patients. This person has no information about the persons included in the trial and has no influence on the assignment sequence or on the decision about eligibility of the patient.

(3) Index and control groups are indistinguishable for the patients, or if the success of blinding was tested among the patients and it was successful.

(4) Index and control groups are indistinguishable for the care providers, or if the success of blinding was tested among the care providers and it was successful.

(5) Adequacy of blinding should be assessed for each primary outcome separately. This item should be scored ‘yes’ if the success of blinding was tested among the outcome assessors and it was successful or:

  • for patient‐reported outcomes in which the patient is the outcome assessor (e.g. pain, disability): the blinding procedure is adequate for outcome assessors if participant blinding is scored ‘yes’

  • for outcome criteria assessed during scheduled visit and that supposes a contact between participants and outcome assessors (e.g. clinical examination): the blinding procedure is adequate if patients are blinded, and the treatment or adverse effects of the treatment cannot be noticed during clinical examination

  • for outcome criteria that do not suppose a contact with participants (e.g. radiography, magnetic resonance imaging): the blinding procedure is adequate if the treatment or adverse effects of the treatment cannot be noticed when assessing the main outcome

  • for outcome criteria that are clinical or therapeutic events that will be determined by the interaction between patients and care providers (e.g. co‐interventions, hospitalisation length, treatment failure), in which the care provider is the outcome assessor: the blinding procedure is adequate for outcome assessors if item ‘4’ (caregivers) is scored ‘yes’

  • for outcome criteria that are assessed from data of the medical forms: the blinding procedure is adequate if the treatment or adverse effects of the treatment cannot be noticed on the extracted data

(6) The number of participants who were included in the study but did not complete the observation period or were not included in the analysis must be described and reasons given. If the percentage of withdrawals and drop‐outs does not exceed 20% for short‐term follow‐up and 30% for long‐term follow‐up, and does not lead to substantial bias a ‘yes’ is scored. (N.B. these percentages are arbitrary, not supported by literature).

(7) All randomised patients are reported and analysed in the group to which they were allocated by randomisation for the most important moments of effect measurement (minus missing values) irrespective of noncompliance and co‐interventions.

(8) All the results from all prespecified outcomes have been adequately reported in the published report of the trial. This information is either obtained by comparing the protocol and the report, or in the absence of the protocol, assessing that the published report includes enough information to make this judgment.

(9) Groups have to be similar at baseline regarding demographic factors, duration and severity of complaints, percentage of patients with neurological symptoms, and value of main outcome measure(s).

(10) If there were no co‐interventions, or they were similar between the index and control groups.

(11) The reviewer determines if the compliance with the interventions is acceptable, based on the reported intensity, duration, number and frequency of sessions for both the index intervention and control intervention(s). For example, physiotherapy treatment is usually administered for several sessions; therefore it is necessary to assess how many sessions each patient attended. For single‐session interventions (e.g. surgery), this item is irrelevant.

(12) Timing of outcome assessment should be identical for all intervention groups and for all primary outcome measures.

(13) Other types of biases. For example:

  • When the outcome measures were not valid. There should be evidence from a previous or present scientific study that the primary outcome can be considered valid in the context of the present.

  • Industry‐sponsored trials. The conflict of interest (COI) statement should explicitly state that the researchers have had full possession of the trial process from planning to reporting without funders with potential COI having any possibility to interfere in the process. If, for example, the statistical analyses have been done by a funder with a potential COI, usually ‘unsure’ is scored.

We will define a study as having a 'low' overall risk of bias when seven or more of the 13 criteria have been met and there are no serious methodological flaws.
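The threshold rule above can be expressed as a short function. This is an illustrative sketch only: the function name, the lowercase answer strings, and the `serious_flaw` flag are assumptions, and labelling every study that is not 'low' as 'high' is a simplification that the protocol does not spell out.

```python
def overall_risk_of_bias(answers, serious_flaw=False):
    """Summarise 13 criterion answers into an overall judgement.

    answers: dict mapping criterion number (1 to 13) to 'yes', 'no', or 'unsure'.
    serious_flaw: reviewer judgement that a serious methodological flaw exists.
    """
    if set(answers) != set(range(1, 14)):
        raise ValueError("expected answers for criteria 1 to 13")
    criteria_met = sum(1 for answer in answers.values() if answer == "yes")
    # 'low' requires seven or more criteria met and no serious flaw;
    # everything else is labelled 'high' here (an assumption).
    if criteria_met >= 7 and not serious_flaw:
        return "low"
    return "high"
```

Whether a flaw counts as serious remains a reviewer judgement, which is why it enters as an explicit argument and is not derived from the answers.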

We will explore the impact of including studies with a high risk of bias by conducting a sensitivity analysis.

Measures of treatment effect

Dichotomous data

For dichotomous data (e.g. fused segment or not, neurological worsening compared to preoperative status, mortality), we will use the odds ratio (OR) with 95% confidence intervals (CI). A segment will be regarded as fused if segmental range of motion is less than 2°, or if there is cortical bridging of the segment.
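As a worked illustration of the effect measure, the odds ratio and its 95% CI for a single trial can be computed from a 2×2 table. The sketch below uses the standard log odds ratio (Woolf) interval. The function name, the argument names, and the 0.5 continuity correction for zero cells are assumptions for the example, not procedures stated in this protocol.

```python
import math


def odds_ratio(events_a, total_a, events_b, total_b, z=1.96):
    """Odds ratio of group A versus group B with a Wald CI on the log scale."""
    a, b = events_a, total_a - events_a   # group A: events, non-events
    c, d = events_b, total_b - events_b   # group B: events, non-events
    if 0 in (a, b, c, d):
        # Assumed continuity correction so the ratio and SE stay defined.
        a, b, c, d = (x + 0.5 for x in (a, b, c, d))
    or_ = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(or_) - z * se_log_or)
    upper = math.exp(math.log(or_) + z * se_log_or)
    return or_, lower, upper


# Hypothetical counts: 10 of 100 events with arthroplasty, 20 of 100 with fusion.
print(odds_ratio(10, 100, 20, 100))
```

With these made-up counts the odds ratio is 4/9 (about 0.44), and the upper confidence limit lies just above 1.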

Continuous data

For continuous data (e.g. VAS, NDI, sROM), we will use the mean difference (MD) if outcomes are measured in the same way between trials. We will use the standardized mean difference (SMD) to combine trials that measure the same outcome, but use different methods.

We will calculate a 95% confidence interval for each outcome.
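The two continuous effect measures can be illustrated for a single trial as follows. This is a sketch under stated assumptions: the SMD is computed as Hedges' adjusted g with a common large-sample approximation of its standard error, so results may differ slightly from those produced by review software. Function and argument names are hypothetical.

```python
import math


def mean_difference(m1, sd1, n1, m2, sd2, n2, z=1.96):
    """Difference in means (group 1 minus group 2) with a 95% CI."""
    md = m1 - m2
    se = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return md, md - z * se, md + z * se


def standardized_mean_difference(m1, sd1, n1, m2, sd2, n2, z=1.96):
    """Hedges' adjusted g with an approximate 95% CI (assumed formula)."""
    df = n1 + n2 - 2
    sd_pooled = math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df)
    d = (m1 - m2) / sd_pooled
    g = d * (1 - 3 / (4 * df - 1))        # small-sample correction factor
    se = math.sqrt((n1 + n2) / (n1 * n2) + g ** 2 / (2 * (n1 + n2)))
    return g, g - z * se, g + z * se


# Hypothetical VAS scores: mean 3.0 (SD 2.0, n = 50) versus mean 4.0 (SD 2.0, n = 50).
print(mean_difference(3.0, 2.0, 50, 4.0, 2.0, 50))
print(standardized_mean_difference(3.0, 2.0, 50, 4.0, 2.0, 50))
```

The MD keeps the original units of the scale, whereas the SMD expresses the difference in pooled standard deviations, which is what allows different scales to be combined.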

Unit of analysis issues

Due to the nature of the compared interventions, we do not expect study designs other than randomised parallel group trials with repeated observations in the same participants. We will attribute different moments of follow‐up to different time frames; either short‐term (six weeks to three months after surgery), mid‐term (12 to 24 months after surgery), or long‐term (48 months and longer). If there is more than one follow‐up moment within the short‐term and mid‐term time frames, we will extract the outcomes for the moment closest to three months and 24 months, respectively. For the long‐term follow‐up, we will extract data for the longest follow‐up that is available, with a maximum of six years to minimize heterogeneity within that group.
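The selection rule for repeated observations can be sketched as below. This is only an illustration: the function name is hypothetical, six weeks is taken as 1.5 months, and the six-year maximum is interpreted as excluding follow-up moments beyond 72 months, which is one possible reading of the rule.

```python
def select_follow_up_moments(available_months):
    """Pick one follow-up moment (in months) per time frame for a study."""
    short = [m for m in available_months if 1.5 <= m <= 3]
    mid = [m for m in available_months if 12 <= m <= 24]
    # Assumed reading: long-term moments beyond six years are not extracted.
    long_term = [m for m in available_months if 48 <= m <= 72]
    return {
        "short": min(short, key=lambda m: abs(m - 3)) if short else None,
        "mid": min(mid, key=lambda m: abs(m - 24)) if mid else None,
        "long": max(long_term) if long_term else None,
    }


# Hypothetical trial reporting at 1.5, 3, 6, 12, 24, 48, 60 and 84 months.
print(select_follow_up_moments([1.5, 3, 6, 12, 24, 48, 60, 84]))
# {'short': 3, 'mid': 24, 'long': 60}
```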

Dealing with missing data

If possible, we will contact the original investigators to request the missing information. We will explore the impact of including studies with high levels of missing data (15% or greater of the participants lost to follow‐up) in the overall assessment of treatment effect by using a sensitivity analysis.

Assessment of heterogeneity

We will assess the presence of heterogeneity between the trials primarily on a visual inspection of the overlap of confidence intervals in the forest plots, followed by an I² analysis to describe the proportion of total variance that can be explained by heterogeneity. If I² is greater than 50%, we will interpret this as an indication that there may be substantial heterogeneity, and will not combine the results in a meta‐analysis.
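For reference, I² is commonly derived from Cochran's Q as I² = max(0, (Q − df) / Q) × 100%, where df is the number of trials minus one. The sketch below shows this calculation; the function name and the input format (one effect estimate and one variance per trial) are assumptions made for the example.

```python
def i_squared(effects, variances):
    """Return Cochran's Q and I² (in percent) using inverse-variance weights."""
    weights = [1 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return q, i2


# Two hypothetical trials with effects 0.0 and 1.0, each with variance 0.1.
q, i2 = i_squared([0.0, 1.0], [0.1, 0.1])
print(round(q, 6), round(i2, 6))  # 5.0 80.0
```

In this made-up example I² is 80%, which exceeds the 50% threshold named above.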

Assessment of reporting biases

We will use funnel plots to identify possible publication bias for all outcomes for which we have retrieved data for at least 10 study populations. We will place no language restrictions on the search terms to reduce language, and possibly location, bias. We will scan the publications for the possibility of duplicate reports on the same group of subjects.

Data synthesis

Regardless of whether there are sufficient data available to use quantitative analyses to summarize the data, we will assess the overall quality of the evidence for each outcome. To accomplish this, we will use an adapted GRADE approach, as recommended by the updated Cochrane Back and Neck Group method guidelines (Furlan 2015, Appendix 4). The quality of the evidence on a specific outcome is based on the performance against several factors. These factors are risk of bias (study design and methodological quality), inconsistency of the results, indirectness of the results (generalisability), imprecision of the results (sufficient data), and publication bias, across all studies that measure that particular outcome. The quality starts at high when RCTs with a low risk of bias provide results for the outcome, and we will downgrade the quality of the evidence for a specific outcome according to the performance of the studies against these factors as follows.

Meta‐analysis will only be conducted where studies are judged to be sufficiently homogeneous with respect to the included population, treatments, and outcome measure. Random‐effects meta‐analysis will be used to combine quantitative data.
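The protocol does not name a specific random‐effects estimator. Purely as a sketch, the widely used DerSimonian and Laird method is assumed below to show how a between‐study variance (tau²) widens the weights; the function name `random_effects_pool` is hypothetical.

```python
import math
from typing import Sequence, Tuple


def random_effects_pool(effects: Sequence[float],
                        std_errors: Sequence[float]) -> Tuple[float, float, float]:
    """Return (pooled estimate, its standard error, tau^2).

    Assumption: DerSimonian-Laird estimator of between-study variance.
    """
    k = len(effects)
    if k != len(std_errors) or k < 2:
        raise ValueError("need at least two studies with matching inputs")
    variances = [se ** 2 for se in std_errors]
    w = [1.0 / v for v in variances]            # fixed-effect weights
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    # Between-study variance, truncated at zero.
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    # Random-effects weights add tau^2 to each within-study variance.
    w_star = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se_pooled = math.sqrt(1.0 / sum(w_star))
    return pooled, se_pooled, tau2
```

When tau² is zero the result coincides with the fixed‐effect estimate; as tau² grows, the weights become more equal across studies and the confidence interval widens.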

Results will be presented in a summary of findings table (outline example in Table 3). Outcomes will be compared at short‐term (six weeks to three months after surgery), mid‐term (12 to 24 months after surgery), and long‐term (48 months and longer) follow‐up. Data for follow‐up moments between these time points (i.e. greater than three months to less than 12 months, and greater than 24 months to less than 48 months) will be omitted unless no follow‐up data are available for that study population at one of the specified time points. In that case we will regard any follow‐up of less than 12 months as short‐term, 12 to 36 months as mid‐term, and more than 36 months as long‐term. The table will include the following outcomes:

  • Arm pain (continuous outcome, VAS or NRS scores, out of a maximum score of 10);

  • Neck pain (continuous outcome, VAS or NRS scores, out of a maximum score of 10);

  • Neck‐related functional status (continuous outcome, Neck Disability Index (NDI) or equivalent, out of a maximum score of 100%);

  • Neurological worsening (dichotomous outcome, 'yes' is defined as any increase in sensory or motor deficits compared to preoperatively);

  • Revision surgery at the operated level (dichotomous outcome, 'yes' is defined as any secondary surgery at the operated level, regardless of the time since initial surgery);

  • Secondary surgery at adjacent segments (dichotomous outcome, 'yes' is defined as any secondary surgery at adjacent levels, regardless of the time since initial surgery).

Table 3. Outline summary of findings

Arthroplasty compared with fusion for single‐level cervical degenerative disc disease

Patient or population: patients with single‐level cervical degenerative disc disease between C4 and C7, eligible for anterior cervical discectomy.

Settings: hospital

Intervention: anterior cervical discectomy with arthroplasty ('arthroplasty')

Comparison: anterior cervical discectomy with fusion ('fusion')

| Outcomes | Outcome type | Outcome measure | Comments * |
| --- | --- | --- | --- |
| Arm pain | Continuous | VAS or NRS (maximum score of 10) | at short‐term (six weeks to three months after surgery), mid‐term (12 to 24 months after surgery), and long‐term (48 months and longer) |
| Neck pain | Continuous | VAS or NRS (maximum score of 10) | at short‐term (six weeks to three months after surgery), mid‐term (12 to 24 months after surgery), and long‐term (48 months and longer) |
| Neck‐related functional status | Continuous | NDI or equivalent neck‐specific score (maximum score of 100%) | at short‐term (six weeks to three months after surgery), mid‐term (12 to 24 months after surgery), and long‐term (48 months and longer) |
| Neurological worsening | Dichotomous | number of patients with neurological worsening; 'yes' is defined as any increase in sensory or motor deficits compared to preoperatively | at short‐term (six weeks to three months after surgery), mid‐term (12 to 24 months after surgery), and long‐term (48 months and longer) |
| Revision surgery at operated level | Dichotomous | number of patients with secondary surgery at the operated level; 'yes' is defined as any secondary surgery at the operated level, regardless of the time since initial surgery | at short‐term (six weeks to three months after surgery), mid‐term (12 to 24 months after surgery), and long‐term (48 months and longer) |
| Secondary surgery at adjacent level | Dichotomous | number of patients with secondary surgery at the adjacent level; 'yes' is defined as any secondary surgery at adjacent levels, regardless of the time since initial surgery | at short‐term (six weeks to three months after surgery), mid‐term (12 to 24 months after surgery), and long‐term (48 months and longer) |

* Data for follow‐up moments between these time points (i.e. greater than three months to less than 12 months, and greater than 24 months to less than 48 months) will be omitted unless no follow‐up data are available for that study population at one of the specified time points. In that case we will regard any follow‐up of less than 12 months as short‐term, 12 to 36 months as mid‐term, and more than 36 months as long‐term.

GRADE Working Group grades of evidence
High quality: Further research is very unlikely to change our confidence in the estimate of effect.
Moderate quality: Further research is likely to have an important impact on our confidence in the estimate of effect and may change the estimate.
Low quality: Further research is very likely to have an important impact on our confidence in the estimate of effect and is likely to change the estimate.
Very low quality: We are very uncertain about the estimate.
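The follow‐up windows described for the summary of findings table can be expressed as a small classification rule. The sketch below is illustrative only: it assumes follow‐up is recorded in months, approximates six weeks as 1.5 months, and uses a hypothetical `fallback` flag for the case where a study has no data at the specified time points.

```python
from typing import Optional

# Assumption: six weeks is approximated as 1.5 months.
SIX_WEEKS_IN_MONTHS = 1.5


def classify_follow_up(months: float, fallback: bool = False) -> Optional[str]:
    """Map a follow-up duration to a reporting window.

    Returns None when the moment falls between the primary windows
    and the fallback windows are not in use (such data are omitted).
    """
    if months < 0:
        raise ValueError("follow-up duration cannot be negative")
    if fallback:
        # Wider windows, used only when no data exist at the primary time points.
        if months < 12:
            return "short-term"
        if months <= 36:
            return "mid-term"
        return "long-term"
    if SIX_WEEKS_IN_MONTHS <= months <= 3:
        return "short-term"
    if 12 <= months <= 24:
        return "mid-term"
    if months >= 48:
        return "long-term"
    return None
```

For example, a six‐month measurement is dropped under the primary windows but counts as short‐term when the fallback applies.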

Subgroup analysis and investigation of heterogeneity

A negative influence of smoking on fusion rates and patient‐reported outcomes in spinal fusion surgery has been reported for both the lumbar and the cervical spine (Andersen 2001; Arnold 2010; Glassman 2000). This negative influence has not been reported in lumbar disc arthroplasty (Bertagnoli 2006). Therefore, we will do a subgroup analysis to identify whether smoking is a possible confounding variable.

In theory, we can expect different results from a patient suffering solely from myelopathy and a patient suffering solely from radiculopathy (e.g. VAS score for arm pain will usually decrease postoperatively in the radiculopathy group, in contrast with a VAS score for arm pain that will already be low in most patients in the myelopathy group). It has also been postulated that arthroplasty may maintain microtrauma to the spinal cord, although a study comparing arthroplasty and fusion in myelopathy due to degenerative disc disease showed similar results for both treatment modalities (Riew 2008). This could potentially influence results, especially when there are large differences in the percentage of patients with myelopathy between study groups. Therefore, we plan to do a subgroup analysis to identify if the presence of myelopathy is a confounding variable.

Sensitivity analysis

We will perform sensitivity analyses to assess the influence of risk of bias on the results: one that evaluates the results when studies with a high risk of bias are excluded, and one that evaluates the results when studies with more than 15% loss to follow‐up are excluded. We will perform further sensitivity analyses if deemed appropriate.
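The two planned exclusions can be sketched as simple filters applied before re‐running the analysis. Everything named here is hypothetical: the `Study` record, the boolean `high_risk_of_bias` field (how a study earns that label is not defined in this section), and the trial names in the usage example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Study:
    name: str
    high_risk_of_bias: bool     # assumed to be judged elsewhere in the review
    loss_to_follow_up: float    # fraction of participants lost, 0.0 to 1.0


def exclude_high_risk(studies: List[Study]) -> List[Study]:
    """Keep only studies not judged to be at high risk of bias."""
    return [s for s in studies if not s.high_risk_of_bias]


def exclude_high_attrition(studies: List[Study],
                           threshold: float = 0.15) -> List[Study]:
    """Drop studies with more than 15% loss to follow-up."""
    return [s for s in studies if s.loss_to_follow_up <= threshold]


if __name__ == "__main__":
    trials = [
        Study("Trial A", high_risk_of_bias=False, loss_to_follow_up=0.05),
        Study("Trial B", high_risk_of_bias=True, loss_to_follow_up=0.10),
        Study("Trial C", high_risk_of_bias=False, loss_to_follow_up=0.22),
    ]
    print([s.name for s in exclude_high_risk(trials)])
    print([s.name for s in exclude_high_attrition(trials)])
```

Comparing the pooled estimate from each filtered set with the estimate from all studies indicates how sensitive the conclusions are to those studies.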

Table 1. Sources of Risk of Bias

| Bias Domain | Source of Bias | Possible Answers |
| --- | --- | --- |
| Selection | (1) Was the method of randomisation adequate? | Yes/No/Unsure |
| Selection | (2) Was the treatment allocation concealed? | Yes/No/Unsure |
| Performance | (3) Was the patient blinded to the intervention? | Yes/No/Unsure |
| Performance | (4) Was the care provider blinded to the intervention? | Yes/No/Unsure |
| Detection | (5) Was the outcome assessor blinded to the intervention? | Yes/No/Unsure |
| Attrition | (6) Was the drop‐out rate described and acceptable? | Yes/No/Unsure |
| Attrition | (7) Were all randomised participants analysed in the group to which they were allocated? | Yes/No/Unsure |
| Reporting | (8) Are reports of the study free of suggestion of selective outcome reporting? | Yes/No/Unsure |
| Selection | (9) Were the groups similar at baseline regarding the most important prognostic indicators? | Yes/No/Unsure |
| Performance | (10) Were co‐interventions avoided or similar? | Yes/No/Unsure |
| Performance | (11) Was the compliance acceptable in all groups? | Yes/No/Unsure |
| Detection | (12) Was the timing of the outcome assessment similar in all groups? | Yes/No/Unsure |
| Other | (13) Are other sources of potential bias unlikely? | Yes/No/Unsure |

Table 2. Criteria for a Judgment of ‘Yes’ for the Sources of Risk of Bias

(1) A random (unpredictable) assignment sequence. Examples of adequate methods are coin toss (for studies with 2 groups), rolling a die (for studies with 2 or more groups), drawing of balls of different colours, drawing of ballots with the study group labels from a dark bag, computer‐generated random sequence, pre‐ordered sealed envelopes, sequentially‐ordered vials, telephone call to a central office, and pre‐ordered list of treatment assignments. Examples of inadequate methods are alternation, birth date, social insurance/security number, date on which they are invited to participate in the study, and hospital registration number.

(2) Assignment generated by an independent person not responsible for determining the eligibility of the patients. This person has no information about the persons included in the trial and has no influence on the assignment sequence or on the decision about eligibility of the patient.

(3) Index and control groups are indistinguishable for the patients, or the success of blinding was tested among the patients and it was successful.

(4) Index and control groups are indistinguishable for the care providers, or the success of blinding was tested among the care providers and it was successful.

(5) Adequacy of blinding should be assessed for each primary outcome separately. This item should be scored ‘yes’ if the success of blinding was tested among the outcome assessors and it was successful or:

  • for patient‐reported outcomes in which the patient is the outcome assessor (e.g. pain, disability): the blinding procedure is adequate for outcome assessors if participant blinding is scored ‘yes’

  • for outcome criteria assessed during a scheduled visit that supposes contact between participants and outcome assessors (e.g. clinical examination): the blinding procedure is adequate if patients are blinded, and the treatment or adverse effects of the treatment cannot be noticed during clinical examination

  • for outcome criteria that do not suppose contact with participants (e.g. radiography, magnetic resonance imaging): the blinding procedure is adequate if the treatment or adverse effects of the treatment cannot be noticed when assessing the main outcome

  • for outcome criteria that are clinical or therapeutic events that will be determined by the interaction between patients and care providers (e.g. co‐interventions, hospitalisation length, treatment failure), in which the care provider is the outcome assessor: the blinding procedure is adequate for outcome assessors if item ‘4’ (caregivers) is scored ‘yes’

  • for outcome criteria that are assessed from data of the medical forms: the blinding procedure is adequate if the treatment or adverse effects of the treatment cannot be noticed on the extracted data

(6) The number of participants who were included in the study but did not complete the observation period or were not included in the analysis must be described and reasons given. If the percentage of withdrawals and drop‐outs does not exceed 20% for short‐term follow‐up and 30% for long‐term follow‐up, and does not lead to substantial bias, a ‘yes’ is scored. (N.B. these percentages are arbitrary, not supported by literature).

(7) All randomised patients are reported and analysed in the group to which they were allocated by randomisation for the most important moments of effect measurement (minus missing values) irrespective of noncompliance and co‐interventions.

(8) All the results from all prespecified outcomes have been adequately reported in the published report of the trial. This information is either obtained by comparing the protocol and the report, or in the absence of the protocol, assessing that the published report includes enough information to make this judgment.

(9) Groups have to be similar at baseline regarding demographic factors, duration and severity of complaints, percentage of patients with neurological symptoms, and value of main outcome measure(s).

(10) There were no co‐interventions, or they were similar between the index and control groups.

(11) The reviewer determines if the compliance with the interventions is acceptable, based on the reported intensity, duration, number and frequency of sessions for both the index intervention and control intervention(s). For example, physiotherapy treatment is usually administered for several sessions; therefore it is necessary to assess how many sessions each patient attended. For single‐session interventions (e.g. surgery), this item is irrelevant.

(12) Timing of outcome assessment should be identical for all intervention groups and for all primary outcome measures.

(13) Other types of biases. For example:

  • When the outcome measures were not valid. There should be evidence from a previous or present scientific study that the primary outcome can be considered valid in the context of the present.

  • Industry‐sponsored trials. The conflict of interest (COI) statement should explicitly state that the researchers have had full possession of the trial process from planning to reporting without funders with potential COI having any possibility to interfere in the process. If, for example, the statistical analyses have been done by a funder with a potential COI, usually ‘unsure’ is scored.
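The numeric part of criterion 6 above (drop‐outs not exceeding 20% at short‐term and 30% at long‐term follow‐up) lends itself to a quick check. The sketch below covers only those thresholds; whether drop‐outs were described with reasons, and whether they lead to substantial bias, remain reviewer judgments that code cannot settle. The function name `dropout_acceptable` is hypothetical.

```python
def dropout_acceptable(randomised: int, dropouts: int, long_term: bool) -> bool:
    """Check the drop-out percentage against the criterion 6 thresholds.

    Thresholds: at most 20% for short-term and 30% for long-term follow-up.
    """
    if randomised <= 0 or not 0 <= dropouts <= randomised:
        raise ValueError("invalid participant counts")
    limit_percent = 30 if long_term else 20
    # Integer arithmetic avoids floating-point edge cases at the boundary.
    return dropouts * 100 <= limit_percent * randomised
```

A study with 100 randomised participants and 25 drop‐outs would therefore fail the short‐term threshold but pass the long‐term one.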
