Introduction

Differentiating Parkinson’s disease (PD) from the various forms of atypical parkinsonism (AP), such as multiple system atrophy (MSA), progressive supranuclear palsy (PSP), and corticobasal syndrome (CBS), can be challenging, especially in early disease stages. Clinical diagnostic criteria are suboptimal or only partially validated [1]. Clinical-pathological studies show that misdiagnosis rates during life can be as high as 24% [2–6]. However, a correct and timely diagnosis is important for both patients (e.g., counseling) and clinicians (e.g., being alert for specific disease complications, such as nocturnal stridor in MSA). It is, therefore, common practice to request ancillary investigations to improve the differentiation between PD and AP.

Brain magnetic resonance imaging (MRI) is the most widely used ancillary test and can be used, for example, to search for cerebrovascular disease or normal pressure hydrocephalus [7].

Routine brain MRI studies, including T1, T2, T2 FLAIR and proton density sequences, are typically normal in PD [8, 9]. In contrast, many signs have been described for the various APs, but these changes are usually seen in advanced disease stages [10]. Well-known brain MRI abnormalities include: putaminal atrophy and signal changes in MSA-P; atrophy of the pons and cerebellum and the hot cross bun sign in MSA-C; atrophy of the midbrain in PSP; and asymmetric cortical atrophy in CBS [8]. However, the added diagnostic value of these brain MRI abnormalities over and above the clinical diagnosis remains unknown.

Our objective here was to evaluate the diagnostic value of routine brain MRI relative to the clinically based differentiation between PD and the various forms of AP. A specific new element was our evaluation of whether brain MRI improved the diagnostic accuracy, taking into account the level of certainty about the clinically based diagnosis. For this purpose, we performed a prospective 3-year follow-up study in a large cohort of patients with an uncertain diagnosis, and used the ‘silver standard’ diagnosis at follow-up (i.e., based on rate of disease progression, new neurological signs and response to treatment) for subsequent comparisons with the baseline MRI results.

Patients and methods

Study group

We performed a prospective observational study in 113 patients with various forms of parkinsonism but without a clinically definite diagnosis at inclusion. The inclusion criterion was the presence of clinical signs and symptoms of parkinsonism. Exclusion criteria were age under 18 years, prior brain surgery, and unstable co-morbidity. Patients with dystonic tremor and a normal DAT scan were identified by careful clinical assessment and excluded from the study [11]. Consecutive patients were recruited from the outpatient department of our movement disorder center between 2003 and 2006. The study was approved by the medical ethics committee of our center and all participants gave written informed consent.

Study design

Patients were clinically assessed (history taking and neurological examination) at baseline and after 3 years of follow-up. All examinations were performed by one neurologist specialized in movement disorders (WFA). The assessments at baseline included the Unified Parkinson’s Disease Rating Scale (UPDRS-III; assessing severity of motor symptoms) [12], Mini-Mental State Examination (MMSE; global cognitive status) [13], Hoehn and Yahr staging scale (H&Y; disease severity) [14] and the clinical effect of levodopa administration. At baseline all patients had a brain MRI scan, IBZM-SPECT, anal sphincter EMG, and comprehensive CSF analysis.

After completion of the study, the diagnosis at baseline and the silver standard diagnosis at 3-year follow-up were made during a consensus meeting with two experienced movement disorders experts (BRB and RAJE). For the baseline clinical diagnosis only data from the initial history taking and neurological examination were used. All diagnoses were made according to international diagnostic criteria [15–20].

Our primary interest was in separating PD from AP, and therefore all forms of AP were grouped together. The level of diagnostic certainty after the baseline clinical neurological examination was scored on a visual analogue scale ranging from 0 (completely uncertain) to 100 (completely certain). The clinical diagnosis was classified as ‘low certainty’ when the certainty score was below 80% and as ‘high certainty’ when it was 80% or higher.

The silver standard diagnosis was made using the data obtained during 3 years of follow-up, including rate of disease progression, new neurological signs on repeated neurological examination, and response to treatment. The level of diagnostic certainty was again scored.

There was no inter-rater disagreement regarding the nature of the diagnoses at baseline or after follow-up, but there were occasionally differences regarding the level of certainty about the diagnosis. In case of such a discrepancy, a consensus diagnosis was made.

We hypothesized that MRI at baseline would have additional diagnostic value for increasing the degree of certainty of the clinical diagnosis at baseline, using the follow-up diagnosis at 3 years as silver standard.

Brain MRI

All patients had a brain MRI at first presentation, performed on a 1 Tesla (66 patients) or 1.5 Tesla MRI scanner (44 patients). The scanning protocols were not standardized, reflecting daily clinical practice, and included: axial T1 spin echo, T2 turbo spin echo, T2 FLAIR, and proton density sequences. Half of the scanning protocols also included a sagittal T1 or T2 image.

The brain MRI studies were evaluated in a standardized way by two neuroradiologists (FJAM and BG) blinded to the clinical symptoms and diagnosis. The signs and abnormalities were selected based on a literature search [8–10]. Selection criteria were that an abnormality should be validated for the evaluation of parkinsonism, visible on routine brain MRI, and easy to score. The following MRI changes were scored: putaminal T2 hypo-intensity; putaminal rim sign; putaminal atrophy; frontal and parietal lobe atrophy; lateral, third and fourth ventricle dilatation; midbrain and pontine atrophy; hummingbird sign; atrophy of the cerebellum and cerebellar vermis; atrophy of the medulla oblongata; pontine T2 hyperintensity and hot cross bun sign; white matter changes; and lacunar infarction. For standardization, the scoring system proposed by Yekhlef [10] was used. White matter changes were scored according to the age-related white matter changes (ARWMC) criteria [21].

Statistical analysis

Inter-observer agreement was evaluated with the kappa coefficient in a sample of 60 patients. We evaluated the discriminative power of each individual parameter. As all parameters were dichotomous, we calculated their sensitivity and specificity. Next, we used multivariate logistic regression with forward selection to investigate whether particular combinations of parameters would lead to better discrimination. Such an approach results in a score consisting of a weighted sum of parameters. Because this score is not dichotomous, we used the area under the receiver operating characteristic curve (AUC) to evaluate its discriminative power. When scores are constructed using parameter selection methods, the AUC tends to be overestimated, particularly when many candidate parameters are used (optimism). We used cross-validation to estimate this optimism and present both the raw AUCs and the AUCs corrected for optimism. Subgroup analyses were performed for patients with a short (<36 months) or longer duration of symptoms, and for patients with ‘high certainty’ or ‘low certainty’ about the initial clinical diagnosis.
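The per-parameter and combined-score measures described above can be sketched in a few lines of Python. This is a minimal illustration, not the study's analysis code: the toy data, the regression weights and intercept, and the function names (`sensitivity_specificity`, `linear_score`, `predicted_probability`, `auc`) are hypothetical. The AUC is computed as the Mann–Whitney probability that a randomly chosen AP patient scores higher than a randomly chosen PD patient (ties counting one half), which equals the area under the empirical ROC curve.

```python
import math
from typing import Sequence


def sensitivity_specificity(sign: Sequence[bool], is_ap: Sequence[bool]) -> tuple[float, float]:
    """Sensitivity and specificity of one dichotomous MRI sign, with AP as the positive class."""
    pairs = list(zip(sign, is_ap))
    tp = sum(1 for s, a in pairs if s and a)
    fn = sum(1 for s, a in pairs if not s and a)
    tn = sum(1 for s, a in pairs if not s and not a)
    fp = sum(1 for s, a in pairs if s and not a)
    return tp / (tp + fn), tn / (tn + fp)


def linear_score(signs: Sequence[int], weights: Sequence[float], intercept: float) -> float:
    """Weighted sum of 0/1 parameters: the linear predictor of a logistic model."""
    return intercept + sum(w * x for w, x in zip(weights, signs))


def predicted_probability(score: float) -> float:
    """Logistic transform of the linear predictor; monotone, so it leaves the AUC unchanged."""
    return 1.0 / (1.0 + math.exp(-score))


def auc(scores: Sequence[float], is_ap: Sequence[bool]) -> float:
    """Probability that a random AP patient scores higher than a random PD patient (ties = 1/2)."""
    pos = [s for s, a in zip(scores, is_ap) if a]
    neg = [s for s, a in zip(scores, is_ap) if not a]
    if not pos or not neg:
        raise ValueError("need at least one AP and one PD patient")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 3 AP and 3 PD patients, two 0/1 MRI signs each.
is_ap = [True, True, True, False, False, False]
signs = [[1, 1], [1, 0], [0, 1], [0, 0], [0, 0], [1, 0]]
weights = [0.8, 1.2]  # hypothetical regression weights
intercept = -1.0      # hypothetical intercept
scores = [linear_score(s, weights, intercept) for s in signs]

print(sensitivity_specificity([s[0] == 1 for s in signs], is_ap))
print(round(auc(scores, is_ap), 3))
```

Because the logistic transform is monotone, ranking patients by the weighted sum or by the predicted probability yields the same AUC.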

Results

Diagnoses

Thirteen patients were excluded because a diagnosis other than PD or AP was made (n = 8) or because patients were lost to follow-up (n = 5). One hundred patients were, therefore, included in the final statistical analyses.

After 3 years of follow-up, the silver standard diagnoses were: PD (n = 43), MSA (n = 27), PSP (n = 7), LBD (n = 1), CBS (n = 1) and vascular parkinsonism (n = 21) (Table 1). Mean age of patients diagnosed with an AP was higher than for patients with PD. Disease severity as measured by UPDRS-III was slightly higher in AP.

Table 1 Patient characteristics

At baseline, ‘low certainty’ about the clinical diagnosis was present in 46% of patients ultimately diagnosed with PD and in 39% of patients ultimately diagnosed with AP (p = 0.278). These proportions were similar for patients with short (<36 months) and longer (>36 months) duration of symptoms at presentation. After 3 years of follow-up, the final diagnosis differed from the baseline clinical diagnosis in 21% of patients: six patients initially diagnosed with AP were finally diagnosed with PD, and 15 patients initially diagnosed with PD were finally diagnosed with AP.
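The reclassification counts above determine the full 2 × 2 table of baseline clinical diagnosis against the silver standard (100 patients, 43 with PD and, by subtraction, 57 with AP). The sketch below treats AP as the positive class and derives the sensitivity, specificity and accuracy of the baseline clinical diagnosis from those counts. It also uses the fact that a single yes/no classifier has one interior ROC point, so its trapezoidal AUC is the mean of sensitivity and specificity; whether the clinical AUC reported in this paper was computed this way is not stated, so that step is an assumption.

```python
# Counts reported in the text (AP is the positive class).
n_total = 100
final_pd = 43
final_ap = n_total - final_pd   # 57 patients with a final AP diagnosis
ap_called_pd = 15               # final AP, baseline clinical diagnosis PD (false negatives)
pd_called_ap = 6                # final PD, baseline clinical diagnosis AP (false positives)

tp = final_ap - ap_called_pd    # final AP, baseline AP (true positives)
tn = final_pd - pd_called_ap    # final PD, baseline PD (true negatives)

sensitivity = tp / final_ap
specificity = tn / final_pd
accuracy = (tp + tn) / n_total

# ROC points (0, 0), (1 - specificity, sensitivity), (1, 1): the trapezoidal
# area under this curve reduces to the mean of sensitivity and specificity.
binary_auc = (sensitivity + specificity) / 2

print(f"sensitivity {sensitivity:.3f}, specificity {specificity:.3f}")
print(f"accuracy {accuracy:.2f}, single-threshold AUC {binary_auc:.3f}")
```

The result (sensitivity ≈ 0.74, specificity ≈ 0.86, accuracy 0.79, single-threshold AUC ≈ 0.80) is consistent with the AUC of 0.80 reported for clinical evaluation alone.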

Inter-observer agreement MRI changes

Inter-observer agreement differed across the various MRI changes. Putaminal atrophy, putaminal T2 hypo-intensity, and frontal and parietal lobe atrophy showed low inter-observer agreement (κ < 0.3). Good inter-observer agreement (κ = 0.6–0.8) was seen for lateral ventricle dilatation, third and fourth ventricle dilatation, the hummingbird sign, medulla oblongata atrophy and white matter changes. The hot cross bun sign showed excellent inter-observer agreement (κ = 0.85).
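Cohen's kappa compares the observed agreement between the two neuroradiologists with the agreement expected by chance from each rater's marginal frequencies, κ = (p_o − p_e) / (1 − p_e). A minimal Python sketch follows; the ratings are hypothetical scores for one dichotomous sign in ten patients, not data from this study, and `cohen_kappa` is an illustrative helper.

```python
from collections import Counter
from collections.abc import Hashable, Sequence


def cohen_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two raters who scored the same patients."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: fraction of patients given the same score by both raters.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's marginal category frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    categories = set(freq_a) | set(freq_b)
    p_expected = sum(freq_a[c] * freq_b[c] for c in categories) / n**2
    if p_expected == 1.0:
        raise ValueError("kappa is undefined when both raters use one identical category")
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical scores for one dichotomous sign (1 = present) in ten patients.
radiologist_1 = [1, 1, 0, 0, 1, 0, 0, 0, 1, 0]
radiologist_2 = [1, 0, 0, 0, 1, 0, 0, 1, 1, 0]
kappa = cohen_kappa(radiologist_1, radiologist_2)
print(f"kappa = {kappa:.2f}")
```

For these ratings, observed agreement is 0.80 and chance agreement 0.52, giving κ ≈ 0.58, just below the 0.6–0.8 range described above as good agreement.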

Diagnostic value of combinations of MRI changes

For the whole group, the AUC of a combination of MRI changes did not exceed 0.74 (0.71 after correction for optimism), whereas clinical evaluation alone resulted in an AUC of 0.80. Combining clinical evaluation with MRI changes did not increase the AUC (0.80). For patients with low certainty about the initial clinical diagnosis, the AUC of the clinical evaluation was 0.67 (sensitivity 59%, specificity 75%; Fig. 1). For the combination of clinical findings and MRI results, the AUC increased to 0.81 (0.77 after correction for optimism). The MRI parameters responsible for this additional discriminative power were cerebellar and putaminal atrophy. In patients with low certainty about the clinical diagnosis, combining a clinical diagnosis of AP with cerebellar atrophy increased sensitivity to 68% and specificity to 86%.
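The gap between raw and corrected AUCs (0.74 versus 0.71 for the whole group, 0.81 versus 0.77 in the low-certainty subgroup) is the estimated optimism. The sketch below illustrates the principle under simplifying assumptions: instead of forward-selection logistic regression, the hypothetical `select_sign` step simply picks the single MRI sign with the best AUC, and k-fold cross-validation with interleaved folds repeats that selection on each training part and scores only held-out patients. The paper does not specify its cross-validation scheme, so these choices are illustrative; here the optimism-corrected AUC is simply the cross-validated AUC.

```python
from typing import Sequence

# Each row holds one patient's dichotomous MRI signs (0 = absent, 1 = present).
Rows = Sequence[Sequence[int]]


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Mann-Whitney AUC with AP (label 1) as the positive class; ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("each evaluated set needs both AP and PD patients")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def select_sign(rows: Rows, labels: Sequence[int]) -> int:
    """Hypothetical selection step: index of the single sign with the best AUC."""
    return max(range(len(rows[0])), key=lambda j: auc([r[j] for r in rows], labels))


def apparent_auc(rows: Rows, labels: Sequence[int]) -> float:
    """Select and evaluate on the same patients (the optimistic, raw AUC)."""
    j = select_sign(rows, labels)
    return auc([r[j] for r in rows], labels)


def cross_validated_auc(rows: Rows, labels: Sequence[int], k: int) -> float:
    """k-fold CV: redo the selection on each training part, score held-out patients only."""
    held_out_scores = [0.0] * len(rows)
    for fold in range(k):
        train = [i for i in range(len(rows)) if i % k != fold]
        test = [i for i in range(len(rows)) if i % k == fold]
        j = select_sign([rows[i] for i in train], [labels[i] for i in train])
        for i in test:
            held_out_scores[i] = rows[i][j]
    # Pool the held-out scores of all folds and compute a single AUC.
    return auc(held_out_scores, labels)


# Hypothetical data: sign 0 agrees with the diagnosis only in odd-indexed patients,
# sign 1 only in even-indexed patients, so neither generalizes across folds.
labels = [1, 1, 0, 0, 1, 1, 0, 0]
rows = [[0, 1], [1, 0], [1, 0], [0, 1], [0, 1], [1, 0], [1, 0], [0, 1]]
raw = apparent_auc(rows, labels)
corrected = cross_validated_auc(rows, labels, k=2)
optimism = raw - corrected
print(raw, corrected, optimism)
```

In this deliberately adversarial toy data each fold selects the sign that happens to fit its training patients, so the apparent AUC (0.5) exceeds the cross-validated AUC (0.0); with real data the gap is typically smaller, as in the values reported above.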

Fig. 1

ROC analyses. a ROC of the initial clinical evaluation alone for patients with an uncertain initial clinical diagnosis, resulting in an AUC of 0.67 (sensitivity 59%, specificity 75%). b ROC of the initial clinical evaluation combined with MRI findings (putaminal and cerebellar atrophy) for patients with an uncertain initial clinical diagnosis, resulting in an AUC of 0.81. Point 1 represents cerebellar atrophy (sensitivity 68%, specificity 86%); point 2 represents putaminal atrophy (sensitivity 59%, specificity 100%).

Diagnostic value of individual MRI changes

Except for atrophy of the medulla oblongata, all MRI signs and abnormalities were seen in PD as well as AP (Table 2). Atrophy of the midbrain, pons, cerebellum, medulla oblongata and T2 signal intensity changes in the pons and putamen showed high specificity for the diagnosis of AP, but limited sensitivity. Subgroup analysis in patients with duration of symptoms more than 36 months showed the same high specificity and moderate to low sensitivity for the diagnosis of AP.

Table 2 Frequency of brain MRI abnormalities and ability of brain MRI to identify atypical parkinsonism

For patients with low certainty about the initial clinical diagnosis (42 patients), putaminal atrophy, the putaminal rim sign, the hummingbird sign and lacunar infarction were seen in a minority of patients with a final diagnosis of AP, but not in any patient with a final diagnosis of PD (Table 3). Because these signs produced no false-positive AP classifications, their positive predictive value was high, although their sensitivity was limited.
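The link between "absent in every PD patient" and a high positive predictive value follows directly from the 2 × 2 table: with no false positives, PPV = TP / (TP + 0) = 1 however few AP patients show the sign, while low sensitivity keeps the negative predictive value modest. A short Python sketch with hypothetical counts (not the values in Table 3); `diagnostic_summary` is an illustrative helper.

```python
def _ratio(numerator: int, denominator: int) -> float:
    """Proportion, or NaN when the denominator is zero (measure undefined)."""
    return numerator / denominator if denominator else float("nan")


def diagnostic_summary(tp: int, fp: int, fn: int, tn: int) -> dict[str, float]:
    """Sensitivity, specificity, PPV and NPV with AP as the positive class."""
    return {
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
        "ppv": _ratio(tp, tp + fp),
        "npv": _ratio(tn, tn + fn),
    }


# Hypothetical counts: a sign present in 4 of 20 AP patients and in none of 20 PD patients.
summary = diagnostic_summary(tp=4, fp=0, fn=16, tn=20)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

With these counts, sensitivity is 0.20, specificity and PPV are 1.00, and NPV is about 0.56, so a positive sign strongly supports AP whereas its absence does little to exclude it.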

Table 3 Ability of brain MRI to diagnose atypical parkinsonism in a subgroup of patients with low certainty about the initial clinical diagnosis (<80%, n = 42)

In differentiating between the forms of atypical parkinsonism, atrophy and signal changes of the pons and putamen were relatively specific for MSA, and midbrain atrophy was relatively specific for PSP.

Discussion

We studied the diagnostic value of routine brain MRI for the differentiation between PD and AP. A new element of this study was our analysis of brain MRI results relative to the clinical diagnosis at presentation, taking into account the degree of certainty about the initial clinical diagnosis, and using a carefully defined silver standard diagnosis made by experts in the field after 3 years of follow-up. Moreover, we did not perform cerebral MRI in patients with advanced and established disease (where its added value is presumably more limited), but earlier in the course of the disease, when clinical certainty was lower and the need for additional diagnostic information from ancillary studies was therefore greater. To reach the silver standard diagnosis, we followed all patients for 3 years, allowing a more certain clinical diagnosis (based on repeated neurological examination, monitoring for new disease signs, information about disease progression, and treatment responsiveness). Our study confirms earlier reports that routine brain MRI can identify abnormalities with a high specificity but limited sensitivity for diagnosing AP [8, 10, 22]. These abnormalities include atrophy of the midbrain, pons, cerebellum and medulla oblongata, T2 hypo-intensity changes of the putamen, and the hot cross bun sign. The new finding from the present prospective follow-up study is that the added diagnostic value of brain MRI is greatest in patients in whom baseline diagnostic certainty is lowest.

Our study also demonstrates that the clinically based diagnosis is good, at least in the hands of experienced movement disorders specialists. The degree of certainty about the clinical diagnosis was more important in predicting the diagnosis at follow-up than the duration of symptoms alone. For the whole group, brain MRI did not improve the differentiation between PD and AP. However, when the degree of certainty about the clinical diagnosis was low (<80%), brain MRI did have some added diagnostic value: in these patients, cerebellar and putaminal atrophy on routine brain MRI improved the AUC for the differentiation between PD and AP. We, therefore, conclude that routine brain MRI adds little to clinical neurological evaluation for the differentiation between PD and AP, except when the clinical diagnosis is uncertain. A practical implication is that brain MRI should be reserved for patients with an ambiguous clinical presentation. This could lead to substantial cost reductions, because various clinical guidelines recommend the more or less routine use of cerebral MRI in all patients presenting with parkinsonism [23].

The proportions of patients with a diagnosis of PD or AP in our study population differ from what would be expected based on published work. The high proportion of MSA patients reflects the tertiary nature of our referral centre, which is a national centre of excellence for movement disorders, so relatively more cases of atypical parkinsonism would be expected than in the general population. Since our centre is also part of the European MSA consortium, we attract relatively many patients with MSA. The proportions of PD and AP seen in our centre therefore do not represent an accurate epidemiological estimate, but this is not problematic for the purpose of the present study, which is to separate AP from PD. For this purpose, we needed a sufficiently large group of patients with AP.

There are some limitations to our study. First, patients were scanned on a 1 or 1.5 Tesla MRI scanner, and we cannot exclude that standard use of 1.5 or 3 Tesla MRI might give better diagnostic accuracy [22, 30]. However, 1 or 1.5 Tesla MRI represents daily clinical neurological practice in most hospitals. Moreover, sensitivity and specificity did not differ significantly between patients scanned on a 1 Tesla and a 1.5 Tesla MRI scanner. Second, inter-observer agreement differed for the various MRI changes. Low inter-observer agreement was seen for T2 hypo-intensity changes and atrophy of the putamen, probably because of the low spatial resolution of the 1 Tesla MRI studies and the relative subjectivity in scoring these abnormalities. Third, we did not have post-mortem brain examination to reach a gold standard diagnosis. However, we can reasonably argue that our final diagnosis approached the best diagnosis one can reach during life. Specifically, the final diagnosis was made during a consensus meeting between two experienced movement disorder specialists, and was based upon an extensive neurological examination (performed by a single neurologist in all patients) after a clinical follow-up of 3 years. This follow-up also provided information about the rate of progression and the effectiveness of dopaminergic medication. Although high rates of misdiagnosis have been reported for the clinical diagnosis, recent pathological studies show high accuracy (>90%) for the clinical diagnosis when it was made by movement disorder specialists after a minimum follow-up of 2 years [3].

Diagnostic accuracy may be improved by modifying conventional sequences or by applying advanced MRI techniques. The sensitivity of MRI changes may increase with T2*-weighted gradient echo sequences, susceptibility weighted imaging (SWI) [24, 25], or inversion recovery sequences [26]. Furthermore, a 3 or 7 Tesla MRI scanner may offer greater diagnostic value. On 3 Tesla scans, however, a putaminal rim is a normal finding and not indicative of AP [27]. The diagnostic value of the putaminal rim sign presented above should therefore be interpreted with caution, taking into account the field strength of the MRI scanner. Other work suggests that diffusion weighted imaging (DWI) in particular improves the diagnostic accuracy in differentiating between PD and AP [28–32]. The value of other advanced MRI techniques, such as diffusion tensor imaging (DTI), magnetization transfer imaging (MTI), magnetic resonance spectroscopy (MRS) and functional MRI (BOLD), remains to be established.

Most of these advanced MRI techniques have thus far been studied in patients with advanced disease where the diagnosis is already clear using clinical examination alone. The challenge now is to apply these novel techniques to large cohorts of patients in early disease stages where clinicians are uncertain about the diagnosis, and to correlate the baseline findings to the silver (or even gold) standard diagnosis at follow-up, as we did in the present study.