Introduction

Ménière’s disease (MD) is an inner ear condition characterised by the clinical presentation of low- to mid-frequency hearing loss, episodic vertigo and fluctuating aural symptoms. The increased worldwide application of delayed post-gadolinium magnetic resonance imaging (MRI) for the diagnosis of MD has resulted in a shift in the diagnostic paradigm. The detection of MD-affected ears is based on the presence of various qualitative or semi-quantitative MRI descriptors and grading scales which define expansion of the endolymphatic space (ES), termed endolymphatic hydrops (EH). However, there is no universal consensus on which specific MRI features best distinguish affected ears [1].

Previous studies have generally applied small numbers of MRI descriptors within individual grading systems [2,3,4,5,6], with only a few authors comparing the diagnostic performance of different grading systems in the same patient cohort [2, 7,8,9,10]. Thus, the evaluation of differences in diagnostic performance across the range of MRI descriptors has involved pooled comparisons between heterogenous studies, and this approach is subject to bias [1]. This matter may be further explored by directly comparing the diagnostic performance of a comprehensive range of different MRI descriptors within a single large cohort. Furthermore, there remains uncertainty as to how MRI descriptors may be optimally combined in order to corroborate the diagnosis of MD and only limited combinations have been evaluated. Whilst the presence of both vestibular EH and increased perilymphatic enhancement (PLE) has demonstrated improved diagnostic performance [6, 11], further combinations of MR descriptor variables remain untested [1].

There is also the potential for new MRI descriptors to enhance the prediction of MD provided by existing imaging features [1]. Whilst it has been postulated that the vestibular aqueduct (VA) is integral to the pathophysiology of MD [12,13,14,15] and imaging abnormalities have been described in MD [16,17,18], there is limited data on its post-gadolinium MRI appearances [19,20,21,22,23]. Variation in the visibility and signal of the VA [18, 21,22,23] may act as potential biomarkers for MD ears, and their added value should be assessed.

This study aimed to explore and quantify the reliability, performance and optimal combination of MRI descriptors, in order to detect MD ears on delayed post-gadolinium MRI. In addition, it intended to evaluate the diagnostic accuracy of novel MRI features of VA and their potential contribution to the diagnosis of MD.

Methods

Patients

The study was approved by the institutional ethical committee (GSTT Electronic Record Research Interface, IRAS ID: 257283, Rec Reference: 20/EM/0112). This retrospective case-controlled cross-sectional study evaluated consecutively examined patients who underwent delayed post-gadolinium MRI for the evaluation of endolymphatic hydrops between December 2017 and March 2023. Patients presented with symptoms of inner ear hydrops such as episodic vertigo, sudden-onset or fluctuating sensorineural hearing loss (SNHL), aural fullness and tinnitus. Patient details were retrieved using the PACS system search function (Sectra AB). A priori exclusion criteria were previous inner ear operations, inadequate contemporary clinical details for diagnostic classification and technically inadequate MR imaging (Table 1; Fig. 1).

Table 1 Summary of the demographics of the study sample
Fig. 1
figure 1

Flowchart demonstrating the selection process for the 96 Ménière’s disease and 78 control ears

Clinical data and classification

Two observers (S.C., I.P.) reviewed the contemporary clinical and audiometric data by consensus, blinded to imaging findings. Clinical review was required to be within 6 months and audiometry within 12 months prior to the MRI study. Definite MD was classified according to the current 2015 Barany Society criteria (Supplementary Table 1) [24]. Definite MD was not diagnosed if there was a clinical diagnosis of secondary hydrops. Control ears were obtained from patients without any criteria for either MD or “atypical MD” [25,26,27,28,29,30] and who had (a) no Ménière’s-type vertigo, (b) normal hearing (thresholds ≤ 20 dBHL (decibels hearing loss) at 0.5, 1, 2 and 4 kHz) or (c) isolated high-frequency sensorineural hearing loss ≥20 dBHL at ≥ 6 kHz (Supplementary Table 1). Intratympanic therapy (ITT) with gentamicin or corticosteroid prior to the MRI study was recorded.

MRI protocol and technique

MRI for inner ear EH was performed on a 3-Tesla Siemens Magnetom® Skyra scanner with a 64-channel head coil. MRI was performed 4 h after the intravenous administration of gadoterate meglumine (0.2 mmol/kg) with an isotropic (0.7mm voxel) high-resolution three-dimensional inversion recovery (IR) sequence with real reconstruction. An additional T2-weighted sampling perfection with application-optimised contrasts using different flip angle evolution (SPACE) sequences was performed. Siemens product sequences were used with parameters as set out in Table 2.

Table 2 MRI parameters for the 3D-real IR and T2-SPACE sequence

Extraction and definition of qualitative MRI descriptors

The MRI grading scales or unique individual EH descriptors which had been previously applied to ≥ 3 published diagnostic accuracy studies of MD were collated [1]. This yielded four grading scales (Nakashima, Barath, Bernaerts and Kahn) (Supplementary Table 2), and one additional descriptor of asymmetric PLE [1, 3,4,5,6]. The MRI descriptors extracted were classified according to whether they evaluated the cochlea, superior vestibular endolymphatic space (VES) or inferior VES (Table 3). Single cochlear grades were applied since they only slightly differ between grading systems. Since “saccule as large as the utricle” cannot be accurately evaluated when “confluent with the utricle”, these were combined as “saccule as large as or confluent with the utricle” to avoid missing values. An exploratory saccular descriptor was added as “saccule absent, as large as or confluent with the utricle” (Fig. 2) [2]. MRI descriptors pertaining to the decreased conspicuity of the VA were newly defined as “non-visualised VA”, “incompletely visualised VA” or “no hyperintensity VA” (Table 3; Figs. 3 and 4).

Table 3 Description of MRI descriptors and associated grading systems
Fig. 2
figure 2

Delayed post-gadolinium 3D-real IR axial images angled through the inferior vestibule demonstrating saccule-related MRI descriptors. a Normal appearances with the smaller saccule (filled arrow) anterior to the utricle (open arrow). b Absent saccule with the expected location indicated (filled arrow). c Saccule confluent with the utricle (filled arrow). d Saccule (filled arrow) larger than the utricle (open arrow)

Fig. 3
figure 3

Vestibular aqueduct appearances on axial and sagittal images. a T2w SPACE and (b) delayed post-gadolinium 3D-real IR axial images showing a “fan shaped” vestibular aqueduct which is well demonstrated on a single axial section (filled arrows). c Delayed post-gadolinium 3D-real IR axial image in a patient with left-sided unilateral MD. The asymptomatic right ear shows a short segment of the “tubular” vestibular aqueduct (filled arrow). The symptomatic left ear demonstrates vestibulo-cochlear endolymphatic hydrops and replacement of the vestibular perilymph (curved filled arrow), with no vestibular aqueduct visualised (open arrow). Delayed post-gadolinium 3D-real IR sagittal images of the (d) right ear and (e) left ear in the same patient. The “tubular” vestibular aqueduct is better depicted on the right due to its angulation (filled arrow in d) whilst the left vestibular aqueduct remains non-visualised (open arrow in e)

Fig. 4
figure 4

Vestibular descriptor analysis. a Delayed post-gadolinium 3D-real IR axial image demonstrates a right vestibular aqueduct (filled vertical arrow) to be as hyperintense as the enhancing nasal mucosa (horizontal filled arrow) whilst the left vestibular aqueduct (open arrow) is not hyperintense. b Delayed post-gadolinium 3D-real IR axial image shows an axial image with a completely visualised vestibular aqueduct. Note: The medial component (filled arrow) indicates it to be classified as hyperintense, despite the intermediate signal lateral portion (open arrow). c Delayed post-gadolinium 3D-real IR sagittal image illustrates a completely visualised vestibular aqueduct (filled arrow) which is not hyperintense. d Delayed post-gadolinium 3D-real IR sagittal image shows a hyperintense vestibular aqueduct; however, its relationship to the posterior petrous bone is better demonstrated in sagittal (e) T2w SPACE and (f) fused 3D-real IR and T2w SPACE images, which allow it to be more confidently classified as incompletely visualised (open arrow indicating posterior petrous bone)

Imaging analysis

The delayed post-gadolinium 3D-real IR sequence was sequentially reformatted to the planes as defined in Table 3 and Supplementary Table 2 on a PACS workstation (Sectra Workstation, Sectra AB), with T2w SPACE images used for anatomical correlation as required.

The images were reviewed with standardised magnification and window settings. Two radiologists (P.T. and S.C.), with 6 and 24 years of subspecialty head and neck radiology experience, independently evaluated the MRI studies. The qualitative MRI descriptors were analysed according to the defined criteria (Table 3) and the four grading scales were also scored based on the observed descriptors. Imaging review was performed whilst blinded to clinical diagnosis, and the two observers achieved consensus when different scores were obtained.

A quantitative analysis of PLE was measured as the signal intensity ratio of the cochlear basal turn relative to the ipsilateral middle cerebellar peduncle [11] as an internal reference (Fig. 5). A mean of the values obtained by the two observers was used for further analysis.

Fig. 5
figure 5

Quantitative analysis of cochlea PLE. The mean of three 1 mm2 regions of interest (ROI) in the scala tympani of the inferior segment of basal turn (black circles and indicated by open arrow) and the mean value was divided by a reference measurement from a 30 mm2 ROI in the middle cerebellar peduncle (white circle)

After review of the VA descriptors on delayed post-gadolinium 3D-real IR, the observers referred to the T2w SPACE sequence and any sub-millimetric computed tomography (CT) imaging to corroborate the anatomical location of the VA.

Statistics

Statistical analyses were performed using SPSS® Statistics 25.0 (IBM®). Inter-rater reliability of the qualitative MRI descriptors was evaluated with Cohen’s kappa test. Since the variable was not normally distributed on the Shapiro-Wilk’s test (p < 0.05), the correlation between quantitative cochlear measurements for the two observers was evaluated with Spearman’s linear weights rank-order correlation. Agreement between the two observers for the four grading scales was performed with a weighted kappa (κw) with linear weights.

The 2 × 2 contingency tables were created for the presence of MRI descriptors in definite MD and control ears, with the chi-squared test of independence conducted (significance level of p < 0.00294 to account for multiple comparisons). The diagnostic odds ratios (DORs), sensitivity and specificity were calculated. A Mann-Whitney U test determined any difference in the quantitative PLE measurements in MD ears. The chi-squared test of independence compared the scores for the four grading scales between MD ears and controls.

A binomial forward stepwise logistic regression was performed to determine the optimal combination of MRI predictors of definite MD. Descriptors with reliability kappa score < 0.6 and variance inflation factor > 5 were excluded. According to Vittinghoff and McCulloch [32], the sample size allowed the inclusion of all the MRI-based predictors in the model. At each step, the most highly correlated remaining variable was added, and the p-value threshold of 0.05 was used to set a limit on the number of variables used in the final model. Area under the receiver operating characteristic (ROC) curve was calculated. The forward stepwise logistic regression was repeated after removing MD ears which had undergone ITT.

Results

Descriptive data of cohort

There were 238 consecutive patients referred by our tertiary ENT unit for delayed post-gadolinium MRI. Patients with previous inner ear surgery (n = 2) and technically inadequate MR imaging (n = 9) were excluded from the study (Fig. 1).

Of the remaining 227 patients (128 female, 99 males; age mean 48.3 ± 14.6 years; duration of symptoms mean 88.3 ± 101 months), there were 87 patients with definite MD. This comprised 78 patients with unilateral MD, of which five also had contralateral “atypical” MD ears, and nine patients with bilateral MD. Hence, there were a total of 96 definite MD ears, and nine of these had undergone prior ITT. There were ten patients with conditions associated with secondary hydrops and 47 patients with “atypical” MD (40 unilateral, 7 bilateral) (Supplementary Table 1) [24,25,26,27,28,29,30]. The remaining 83 patients presented with audio-vestibular symptoms without diagnostic criteria or either definite or “atypical” MD, and from these, there were 78 control ears derived according to the previously stated conditions.

Inter-rater reliability

The inter-rater reliability for the MRI descriptors (n = 454) is documented in Table 4. “Saccule as large as or confluent with the utricle” demonstrated the highest kappa value of κ = 0.920 (0.881–0.959) with very good agreement. The poorest agreements were recorded for grade 1 cochlea with κ = 0.511 (95% confidence interval (CI), 0.429–0.593) and “33–50% VES relative to total vestibular (TV) area (superior)” with κ = 0.505 (95% CI, 0.417–0.593). There was a positive correlation between the two observers for quantitative cochlear PLE with rs = 0.858 (p < 0.001). All VA descriptors demonstrated very good reliability of κ = 0.801–0.868. The weighted kappa (κw) values were 0.727 (0.672–0.783) for the Nakashima scale, 0.844 (0.787–0.900) for the Barath scale, 0.840 (0.796–0.884) for the Bernaerts scale and 0.818 (0.775–0.862) for the Kahn scale.

Table 4 Inter-rater reliability kappa values, sensitivity/specificity and DOR for the MRI descriptors

Diagnostic performance of individual descriptors and parameters

All MRI descriptors were associated with the presence of definite MD ears (p < 0.001). The sensitivity, specificity and DORs of the MRI descriptors for the detection of definite MD ears are shown in Table 4. The highest DOR of 292.6 was achieved by “saccule absent, as large as or confluent with the utricle” (Fig. 2).

The DOR for the novel VA descriptors ranged from 7.761 to 18.1 and was highest for incompletely visualised VA (Figs. 3 and 4). There were 45/96 MD (16 CT, 19 T2w MRI, 10 both) and 58/78 control (10 CT, 34 T2w MRI, 14 both) ears in which either CT or T2w MRI clearly demonstrated VA for anatomical verification.

The quantitative cochlear PLE measurements were significantly increased in definite MD ears compared to control ears according to the Mann-Whitney U test (U = 6072; z = 7.04; p < 0.001) with AUC = 0.811 (0.748–0.873) indicating excellent discrimination.

The Nakashima (χ2 = 60.796, p < 0.001), Barath (χ2 = 72.762, p < 0.001), Bernaerts (χ2 = 87.029, p < 0.001) and Kahn score (χ2 = 86.059, p < 0.001) grades were all significantly associated with the presence of definite MD ears.

Logistic regression

Of the initial 17 MRI descriptors and one quantitative measure, two were excluded from the analysis due to kappa values < 0.6 (grade 1 cochlea and “33–50% VES relative to TV area (superior)”) and three due to collinearity (“> 50% VES area relative to TV area (inferior)”, “saccule as large as or confluent with the utricle”, “saccule confluent with the utricle”). Four standardised residuals with values of greater than 2.5 standard deviations (SDs) were kept in the analysis.

Of the remaining 13 MRI variables, only three were statistically significant (Table 5): “saccule absent, as large as or confluent with the utricle” (χ2 = 129.957, p < 0.001), “asymmetric cochlear PLE” (χ2 = 10.422, p < 0.001) and “incompletely visualised VA” (χ2 = 8.280, p < 0.001). The model explained 76.9% (Nagelkerke R2) of the variance in definite MD and correctly classified 90.2% of cases (sensitivity 84.4%, specificity 97.4%, PPV 97.6%, NPV 83.5%). The AUC was 0.938 (95% CI, 0.900–0.976) (Fig. 6) which is an outstanding level of discrimination [33]. Repeating the logistic regression after removing MD ears which had undergone ITT yielded the same three significant MRI descriptors, correctly classifying 90.9% of cases (sensitivity 85.1%, specificity 97.4%, PPV 97.4%, NPV 85.4%).

Table 5 Variables in the equation showing the contribution of each of the three MRI descriptors to the model and its statistical significance
Fig. 6
figure 6

Receiver operating characteristic (ROC curve) for the model comprising “saccule absent, as large as or confluent with the utricle”, “asymmetric cochlear PLE” and “incompletely visualised vestibular aqueduct”

Discussion

All MRI descriptors were recorded with good to excellent reliability apart from grade 1 cochlear and “33–50% vestibular endolymphatic space relative to total vestibular area (superior)”. Both are derived from the Nakashima grading system (κw = 0.727) which was less reliable than those of Barath, Bernaerts and Kahn (κw = 0.818–0.844). The DORs for the MRI descriptors ranged from 1.215 to 292.6. Those derived from the inferior vestibule performed best, and “saccule absent, as large as or confluent with the utricle” had the largest DOR of 292.6 (95% CI, 38.305–2235.058), although there were wide confidence intervals. The newly defined VA descriptors all had excellent reliability and demonstrated DORs of 7.761 (95% CI, 3.517–17.125) to 18.1 (95% CI, 8.445–39.170). A forward stepwise logistic regression demonstrated the presence of “saccule absent, as large as or confluent with the utricle” or “asymmetric cochlear PLE” and “incompletely visualised VA” to be the statistically significant variables, correctly classifying 90.2% of cases (sensitivity 84.4%, specificity 97.4%, PPV 97.6%, NPV 83.5%).

There has been no previous integrated comparison of reliability and performance applied to a comprehensive range of delayed post-gadolinium MRI descriptors for the diagnosis of MD. A recent systematic review found only 3/72 studies [1, 2, 7, 8] applying more than one vestibular grading scale. These three studies evaluated both SURI (inversion of the saccule to utricle area ratio) and the Nakashima grading scale, but sample sizes were small (n ≤ 30). Two further recent studies have compared the reliability and diagnostic performance of different vestibular grading systems, but not individual descriptors. The Bernaerts scale was found to be more sensitive than Barath [9], whilst Xiao et al found the authors’ own 18-point scoring scale to have superior diagnostic performance to the Bernaerts grade [10].

The consideration of newly defined VA descriptors was motivated by previous histopathological [13, 14] and radiological studies [16,17,18,19,20] which have highlighted associations between a reduction in size and conspicuity of the VA and endolymphatic sac in MD ears. Attyé et al evaluated the visibility of the VA in a cohort of 20 unilateral MD ears compared to 20 healthy control patients and found a bilateral reduction in visibility in unilateral MD [19]. Our study applied similar imaging analysis and confirmed that “incompletely visualised” was reliably evaluated and added value to established MRI descriptors by its contribution to the logistic regression model. In addition, we demonstrated the absence of hyperintensity within the VA to be a predictor of MD ears with an odds ratio of 11.205 (5.009–25.068). Hyperintensity due to VA enhancement has been observed on early post-gadolinium MRI, in both normal ears [21] and in immune-mediated and viral inner ear diseases [22, 23]. It is speculated that our observation of absent VA hyperintensity is due to reduced gadolinium enhancement, and this may be related to the fibrotic changes which develop within the richly vascularised [34] peri-saccular loose connective tissue [14] in MD.

Our study outcomes build on previous work [4, 11] identifying SURI or higher grade EH and PLE as key MRI predictors for definite MD. These previous studies applied the asymptomatic contralateral ear in unilateral MD as the control, which is not ideal since EH has been observed to be present in up to 23.3% of asymptomatic contralateral ears in MRI studies [35] as well as in histopathology studies [36]. The control ears in our study were all from asymptomatic ears in patients without any audiometric or vestibular features suggestive of MD. The potential for our exploratory descriptor “absent saccule” as a marker of MD was first highlighted by Attyé [2] but it has not been evaluated in a large cohort for its diagnostic potential. Whilst an absent saccule due to rupture or fistula is infrequently associated with definite MD ears [37], our study has demonstrated it to provide improved diagnostic performance when added to the other saccular descriptors.

This study has limitations that should be considered. Firstly, there is innate bias in patient selection introduced by the case-controlled study design [38]. Although inappropriate exclusions such as bilateral MD and ITT interventions were avoided, nine ears were excluded post hoc due to technically inadequate imaging, which may distort outcomes. Secondly, the retrospective nature of the study led to an interval between the imaging index test and the clinical MD reference standard. These concerns may be addressed by future application of a prospective study design. Thirdly, although the MRIs were interpreted blinded to the clinical diagnosis, there was potential bias due to the unavoidable viewing of multiple MRI descriptors simultaneously within each ear. Fourthly, the generalisability of the results may be restricted since the performance of the MRI descriptors may depend on the specifics of the MRI sequence and technique. There is also no guarantee that the selection of optimal MRI descriptors would be the same in patients with atypical phenotypes of MD. Subgroup analysis was limited by sample size in the current study. In addition, a number of the estimates of diagnostic accuracy demonstrated wide confidence intervals, which should hence be interpreted with caution. Fifthly, the range of MR descriptors studied was not exhaustive. It was decided a priori to study a single cochlear hydrops grading system since there are no substantial differences between the different grading scales, and descriptors employed in less than three studies were excluded. Finally, with respect to the novel VA descriptors, anatomical correlation was not always possible due to CT not being available or poor visualisation on T2w MRI. Considering it was hypothesised that a reduction in conspicuity was associated with MD ears, the exclusion of these ears from the analysis may have led to partial verification bias.

A direct comparison of MRI descriptors for the diagnosis of definite MD ears found 15/17 to be demonstrated with good to excellent reliability and with “saccule absent, as large as or confluent with the utricle” demonstrating the best diagnostic performance. The three VA MRI descriptors were reliable and demonstrated high DORs. “Incompletely visualised VA” was included in the optimal combination of three descriptors for the diagnosis of MD ears, indicating its added value. The radiologist should focus on these three highly performing descriptors in combination when reporting delayed post-gadolinium MRI, to best corroborate the diagnosis of MD. As guidelines develop which incorporate MRI in their diagnostic criteria for MD [39], specific imaging criteria may now be advised.