Introduction

The clinical course of Graves’ orbitopathy (GO) is heterogeneous. The natural course is benign in a considerable proportion of GO cases. However, some GO cases may present with significant severe sequelae. The GO ocular manifestations include eyelid retraction, proptosis, chemosis, periorbital edema, altered ocular motility, and loss of vision with significant functional, social, and cosmetic consequences.1, 2, 3, 4 As there are few clues currently available to assist clinicians in estimating GO prognosis, it is very difficult to predict which patients will suffer from such severe sequelae. Recently, Eckstein et al5 reported that persistently high TSH-receptor antibody (TRAb) levels were associated with a severe course of GO by demonstrating that TRAb levels at 12–24 months after diagnosis were significantly higher in patients with a severe course of GO than in those with a mild outcome. In their report, TRAb was measured using the second-generation thyrotropin-binding inhibitor immunoglobulin (TBII) assay.

Currently, there are two assays that are able to measure TRAb: one is the TBII assay and the other is a functional thyroid-stimulating immunoglobulin (TSI) bioassay. The TBII assay utilizes the ability of TRAb to inhibit the binding of radiolabeled TSH to TSH receptors. Across its first, second, and third generations, the TBII assay has been modified to enhance its diagnostic sensitivity and specificity for Graves’ disease (GD).6, 7, 8 Classical TBII assays are limited by their inability to discriminate between thyroid-stimulating antibodies (TSAbs) and thyroid-blocking antibodies (TBAbs). However, the most recently developed third-generation TBII assay measures inhibition of the binding of a labeled TSAb (monoclonal Ab clone #M22), rather than labeled TSH, to the TSH receptor.9, 10 This results in enhanced sensitivity and specificity vs earlier assays using radiolabeled TSH.6, 8, 11, 12 The TSI bioassay measures cyclic adenosine monophosphate production after TSAb binds to the TSH receptor.13, 14, 15 Thus, this method enables the identification of the functional characteristics of TRAb. With the development of the Mc4-CHO cell line, the cell culture protocols of the TRAb bioassays were simplified.15, 16, 17 The superiority of the Mc4-CHO TSI bioassay has been reported with respect to GD diagnosis,17 and in the differential diagnosis of GD vs painless thyroiditis.16

Recent reports evaluated the relationship between Mc4-CHO TSI and GO. Lytton et al17 reported the superiority of the Mc4-CHO TSI bioassay in reflecting GO clinical activity and severity at a given time point, vs the second-generation TBII assay. Ponto et al18 also showed that the Mc4-CHO TSI assay results were better associated with GO clinical features than those of the second-generation TBII assay in their cross-sectional study. The correlation between TRAb levels and GO symptomology at certain time points may be of scientific importance, but may be limited in improving patient clinical management.

We investigated whether serum TRAb levels in newly diagnosed, untreated GO patients are predictive of disease course beyond 1 year after initial GO diagnosis. In addition, we compared the prognostic performance of two newly developed TRAb assays (a third-generation TBII assay and a Mc4-CHO TSI bioassay) in predicting GO disease course.

Patients and methods

Subjects

This was a retrospective, consecutive, and observational study. Subjects were consecutive Korean GO patients attending the Department of Ophthalmology at the Severance Hospital of Yonsei University, College of Medicine, between January 2008 and May 2011. All subjects were newly diagnosed with GO, and no one had been previously treated with steroids or radiation before first presentation. All subjects met the following criteria: (1) the duration of GO ocular symptoms at initial diagnosis was no longer than 6 months; (2) both the third-generation TBII and Mc4-CHO TSI assays were performed simultaneously at initial GO diagnosis; and (3) patients were evaluated at regular intervals for longer than 1 year by a single ophthalmologist. Data from 112 patients were available for analysis. Past medical history, smoking history, GO duration, GD duration, type of current GD treatment, family history, presence of increased thyroid size, and peripheral thyroid function (initial free T4 level) were assessed. During this study, all patients with a clinical activity score (CAS; detailed below) of 4 or greater were prescribed steroid treatment. Steroid therapy usually began with oral prednisolone (1 mg/kg of body weight for adults and 0.5 mg/kg for children), which was tapered according to GO clinical activity and severity. Oral steroid administration was the preferred dosing regimen in our institute, because it was not necessary for patients to visit the clinic once a week.19 For severe cases, such as vision-threatening compressive optic neuropathy, intravenous steroids were given (methylprednisolone 500 mg/day for 3 days), and then substituted with the tapered oral steroid regimen.

The study adhered to the Declaration of Helsinki, and approval to conduct this study was obtained from the Institutional Review Board of Severance Hospital of Yonsei University. Informed consent was obtained from all participants.

Ocular examinations

At each visit, patients underwent best-corrected visual acuity testing, intraocular pressure measurement, and exophthalmometry with a Hertel exophthalmometer. In patients with diplopia, a Hess screen test, a binocular single vision test, and a computed tomography scan were performed. Monocular excursion was evaluated with the Hess screen test. GO activity was assessed using the 7-point modified CAS described by Mourits et al.20 GO severity was assessed using the modified NOSPECS classification suggested by Eckstein et al.5, 21 The sum of scores in each class constitutes the total NOSPECS score. Patient CAS and NOSPECS scores were evaluated by a single examiner who was blinded to the TRAb data (SYJ).

Grouping subjects

All patients were assigned to either the mild or severe GO category according to their ocular manifestations more than 1 year after the initial visit. Patient grouping was done as described by Eckstein et al.5, 21 Patients with a CAS score of less than 4 and a NOSPECS score of less than 5 were assigned to the mild course group (Group 1), while patients with a CAS score of 4 or more and/or a NOSPECS score of 5 or more were defined as the severe course group (Group 2). Group 1 patients were in an inactive phase and displayed mild symptoms of GO, while Group 2 patients were considered to be in an active phase (GO activity was poorly controlled) and/or displayed severe GO sequelae. Patient grouping was performed by one clinician who was blinded to the TRAb results.
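The grouping rule above can be written as a short function. This is a minimal sketch rather than the study's actual procedure: the function name `classify_go_course` is hypothetical, and it assumes the severe threshold for NOSPECS is 5 or more, i.e. the complement of the mild definition (CAS below 4 and NOSPECS below 5).

```python
def classify_go_course(cas: int, nospecs: int) -> str:
    """Assign a patient to the mild (Group 1) or severe (Group 2) course group.

    Assumption: severe means CAS >= 4 and/or NOSPECS >= 5, which is the
    complement of the mild definition (CAS < 4 and NOSPECS < 5).
    """
    if cas < 0 or nospecs < 0:
        raise ValueError("scores must be non-negative")
    if cas >= 4 or nospecs >= 5:
        return "severe"  # Group 2: active and/or severe sequelae
    return "mild"        # Group 1: inactive with mild symptoms


# Hypothetical example scores, not taken from the study data
print(classify_go_course(cas=2, nospecs=3))
print(classify_go_course(cas=4, nospecs=1))
```

Because the two criteria are joined by "and/or", a single score above its threshold is enough to place a patient in Group 2.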

TSH-R antibody detection assays

TRAb levels measured at initial GO diagnosis (initial TRAb levels) were used for analysis. Schematics depicting the TRAb assays are shown in Figure 1. TRAb was measured with a third-generation TBII assay using the automated Cobas electrochemiluminescence immunoassay (Elecsys, Roche Diagnostics GmbH, Penzberg, Germany) according to the manufacturer’s instructions. Mc4-CHO TSI in patient sera was measured with the Thyretain TSI Reporter BioAssay (Diagnostic Hybrids Inc., Athens, OH, USA) according to the manufacturer’s instructions.

Figure 1

Schematic representation showing the principles of the third-generation TBII assay and Mc4-TSI bioassay. AC, adenylate cyclase; CRE, cAMP response element; LH-CG, lutropin-choriogonadotropin; TSH, thyroid-stimulating hormone; TRAb, TSH-receptor autoantibody; pTSH-R, porcine thyroid-stimulating hormone receptor; TSI, thyroid-stimulating immunoglobulin; TBI, thyroid-blocking immunoglobulin.

Data analysis and statistics

Normality of data distribution for all variables was tested using the Kolmogorov–Smirnov test. We used independent t-tests and Pearson χ2 analysis to compare primary data such as demographics and laboratory data between the two subject groups. Wilcoxon rank-sum test was used to compare median values of initial CAS score, duration of GO, duration of GD, duration of anti-thyroid drug (ATD) treatment, and follow-up periods between the two groups.

We investigated the relationships between initial TRAb levels and GO outcome using multiple logistic regression analysis, allowing for adjustment of possible factors influencing GO course. The TBII and TSI assay results were entered into models I and II, respectively. Both models were adjusted for possible risk factors influencing GO course such as smoking, type of GD treatment, thyroid size, and steroid use. If there was a significant correlation (R>0.5) between risk factors, only one factor was entered into the model to avoid multicollinearity. Receiver operating characteristic (ROC) plot analysis was performed to obtain the area under the ROC curve (AUC) using the third-generation TBII and Mc4-TSI assay values as independent variables, and the severe course of GO as the dependent variable. Comparison of AUCs was performed using the DeLong method. The sensitivity and specificity of both assays for discriminating between the severe and non-severe groups were examined at the optimal cutoff point on each ROC curve, which was determined by the Youden index.22 Statistical analysis was conducted using SAS software (SAS version 9.1; SAS Institute Inc., Cary, NC, USA) and PASS (version 11, NCSS, LLC.; Kaysville, UT, USA), and P-values <0.05 were considered statistically significant.
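The ROC steps described above (an AUC and a cutoff chosen by the Youden index) can be illustrated with a small pure-Python sketch. It is not the SAS or PASS procedure used in the study. The function names and the toy antibody values are hypothetical, a result is treated as positive when it is at or above the cutoff, and the DeLong comparison of two AUCs is not implemented here.

```python
from typing import List, Tuple


def roc_points(scores: List[float], labels: List[int]) -> List[Tuple[float, float, float]]:
    """Return (cutoff, sensitivity, specificity) for every observed value used as a cutoff.

    labels: 1 = severe course, 0 = mild course. A score >= cutoff counts as positive.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = []
    for cutoff in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= cutoff)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < cutoff)
        points.append((cutoff, tp / n_pos, tn / n_neg))
    return points


def youden_cutoff(scores: List[float], labels: List[int]) -> Tuple[float, float, float]:
    """Pick the cutoff that maximizes J = sensitivity + specificity - 1."""
    return max(roc_points(scores, labels), key=lambda p: p[1] + p[2] - 1)


def auc(scores: List[float], labels: List[int]) -> float:
    """AUC as the probability that a severe case scores higher than a mild case (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy values, NOT study data
toy_scores = [2.0, 3.5, 5.0, 11.0, 4.0, 12.5, 20.0, 30.0]
toy_labels = [0, 0, 0, 0, 1, 1, 1, 1]

print(auc(toy_scores, toy_labels))            # 0.875
print(youden_cutoff(toy_scores, toy_labels))  # (12.5, 0.75, 1.0)
```

The pairwise AUC is quadratic in sample size, which is acceptable for a cohort of this size; a rank-based formula would be preferable for large data sets.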

Results

Clinical description

Of a total of 112 patients, 73 patients (65.2%) were determined as having mild GO (Group 1) and 39 patients (34.8%) were classified as having severe GO (Group 2) more than 1 year after the initial visit. As shown in Table 1, the mean age, the median duration of ocular symptoms at the initial GO diagnosis, type of GD treatment prescribed, and initial mean fT4 level were not different between Groups 1 and 2. The proportions of males and smokers were higher in Group 2 than in Group 1, although not significantly so. At initial GO diagnosis, the median prior duration of GD was 8 and 10 months in patients with mild and severe courses, respectively (P=0.308). Of the 73 patients with mild GO, 66 patients (90.4%) had been treated for GD, using ATDs in 64 patients (87.7%) and radioiodine in 2 patients (2.7%). The median duration of ATD treatment in mild GO at the initial GO diagnosis was 6 months. Among the 39 patients with severe GO, 30 patients (76.9%) had been treated with ATDs, 4 patients (10.3%) underwent radioiodine treatment, and 1 patient (2.6%) underwent thyroidectomy. The median duration of ATD treatment in severe GO at the first visit was 10 months. Steroid treatment differed significantly between the two groups (P=0.003), with a much greater proportion of subjects with severe disease (27/39 patients, 69.2%) receiving steroids than of subjects with a mild course (22/73 patients, 30.1%). The median initial CAS scores also differed between the two groups (P<0.001).

Table 1 Comparison of clinical and biochemical characteristics between mild (Group 1) and severe (Group 2) Graves’ orbitopathy

Initial TRAb levels (TRAb measured at initial GO diagnosis) in patients with mild and severe GO

The mean third-generation TBII level was significantly higher in the severe disease Group 2 vs Group 1 (21.19 and 7.07 IU/l, respectively, P<0.001). Likewise, the mean Mc4-CHO TSI level, evaluated by the more specific bioassay, was significantly elevated in Group 2 vs Group 1 (597.19 and 320.22%, respectively, P<0.001, Table 1). The distribution of initial TRAb levels in the mild and severe groups is shown in Figures 2a and b.

Figure 2

Distribution of TSH receptor autoantibody levels according to mild and severe Graves’ orbitopathy, and comparison of the receiver operating characteristic analysis between two different TSH-receptor antibody assays. (a) Distribution of third-generation TBII assay. (b) Distribution of Mc4-TSI bioassay. (c) Comparison of the areas under the curve by the DeLong method showed that the predictive power of the third-generation TBII assay (AUC=0.817, 95% CI=0.732–0.902) was equivalent to that of the Mc4-TSI bioassay (AUC=0.868, 95% CI=0.803–0.934) in predicting severe patient disease outcome (P=0.203).

Comparison of the performance levels of the third-generation TBII assay and Mc4-TSI bioassays in predicting the GO course

The cutoff values for the prediction of severe course of the third-generation TBII and Mc4-TSI assays were 10.67 IU/l and 555.10 (SRR%) with respective specificities of 84.9 and 89.0% and sensitivities of 66.7 and 69.2% (Table 2). The predictive power of the third-generation TBII assay (AUC=0.817, 95% confidence interval (CI)=0.732–0.902) was equivalent to that of the Mc4-TSI bioassay (AUC=0.868, 95% CI=0.803–0.934) in predicting severe patient disease outcome (P=0.203) (Table 2 and Figure 2c).
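As a rough consistency check, the reported sensitivities and specificities can be converted back into approximate patient counts and a Youden index (J = sensitivity + specificity - 1) using the group sizes given in the Results (39 severe, 73 mild). The counts below are back-calculated by rounding and are an assumption of this sketch, not figures reported in the tables; the helper name `summarize` is hypothetical.

```python
N_SEVERE, N_MILD = 39, 73  # group sizes reported in the Results

# (sensitivity, specificity) at the reported cutoffs
REPORTED = {
    "TBII (10.67 IU/l)": (0.667, 0.849),
    "Mc4-TSI (555.10 SRR%)": (0.692, 0.890),
}


def summarize(sens: float, spec: float, n_pos: int, n_neg: int) -> dict:
    """Back-calculate approximate 2x2 counts and the Youden index from rates."""
    tp = round(sens * n_pos)  # severe patients at or above the cutoff
    tn = round(spec * n_neg)  # mild patients below the cutoff
    return {
        "tp": tp,
        "fn": n_pos - tp,
        "tn": tn,
        "fp": n_neg - tn,
        "youden_j": round(sens + spec - 1, 3),
    }


for assay, (sens, spec) in REPORTED.items():
    print(assay, summarize(sens, spec, N_SEVERE, N_MILD))
```

Under these assumptions the TBII cutoff corresponds to about 26 of 39 severe and 62 of 73 mild patients correctly classified (J of about 0.516), and the Mc4-TSI cutoff to about 27 of 39 and 65 of 73 (J of about 0.582).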

Table 2 Cutoff value for the prediction of severe Graves’ orbitopathy course of the third-generation TBII and Mc4-TSI bioassay using Youden method

Multiple logistic regression analysis including factors that are associated with the course of GO

Patients with higher initial TRAb levels had a greater risk of severe disease outcomes. As shown in Table 3, this relationship was independent from age, steroid use, type of GD treatment, fT4, and thyroid size for both the third-generation TBII (P=0.001) and for the Mc4-TSI (P<0.001) assays.

Table 3 Multiple logistic regression analysis including factors that are associated with the course of Graves’ orbitopathy

Discussion

We investigated whether the levels of TRAb measured with the third-generation TBII assay and the Mc4-TSI bioassay were predictive of the clinical course of GO. Of note, TRAb levels were measured at the time of GO diagnosis, whereas the clinical course of GO was assessed at least 12 months after the initial diagnosis. We selected newly diagnosed, untreated patients whose duration of ocular symptoms was not longer than 6 months before TRAb measurement. This enabled us to assess the clinical relevance of TRAb at the early stage of GO in terms of using TRAb levels to develop an accurate prognosis of the disease outcome. Interestingly, we found that TRAb levels were significantly higher in patients with a severe course compared with patients with a mild course of GO. Importantly, patients with higher initial levels of TRAb had a significantly higher risk of severe outcomes upon multiple logistic regression analysis. As GO pathogenesis remains incompletely understood and because there are few clues that are currently available to assist clinicians in estimating the course of GO,23, 24, 25, 26, 27 identifying novel, objectively measurable risk factors that could predict severe GO outcomes is of scientific and clinical importance. In patients with high levels of TRAb, it would be reasonable to monitor GO activity and severity more closely during follow-up. Regarding treatment, although the TRAb level alone cannot be an indication for a particular GO treatment, more aggressive anti-inflammatory treatment other than steroids, including radiotherapy28 or rituximab29 treatment, might be helpful to prevent progression of the disease.

Reporting a positive correlation between changes in TSI and CAS over time, Dragan et al14 suggested that serial evaluation of TSI levels would help in deciding which GO patients should get aggressive treatment to prevent a severe inflammatory phase from occurring. Eckstein et al5 reported that persistently high second-generation TBII levels were associated with a severe course of GO. Accordingly, serial monitoring of TRAb levels will likely prove to be helpful in the assessment and management of GO. Although we agree that serial measurements of TRAb might provide valuable information, our results suggested that a single TRAb measurement, taken immediately after initial GO diagnosis, would also provide clinicians with a clear impression of whether a patient will present with a mild or severe course in the future.

However, such an association between high TRAb levels and a severe course of GO was not found by Eckstein et al5 for patients at an early disease stage (within 4 months of first GO symptoms). Based on the results of Laurberg et al30 demonstrating that the largest decrease in TRAb levels occurred within the first 6 months of ATD treatment, we assumed that the discordance between our results and those of Eckstein et al might be owing to the difference in prior GD duration between the study populations. In Eckstein et al’s report,5 the median GD duration at GO onset in the group with a mild course was 0 months, and 2.4 months in the group with a severe course, indicating that almost all patients required immediate GD treatment initiation. In our study, however, the median duration of thyroid disease was 8 months in the group with mild GO and 10 months in the group with severe GO. In our study, 101 patients (90.2%) of the 112 total subjects had already received definitive hyperthyroidism treatment at the time of their initial visit to the ophthalmic clinic. Therefore, TRAb levels that remain high even after ATD treatment might be associated with GO development and/or progression towards severe GO. This interpretation is supported by the notion that TRAb consistently mediates the autoimmune process in the orbit.5 Measuring TRAb levels before initiating ATD treatment might help to elucidate the impact of serial changes in TRAb during ATD treatment on the GO clinical course.

There are several reports that TSI is more closely associated and reflective of severity of ophthalmopathy than the second-generation TBII assay.17, 18 A previous report conducted in Singapore31 showed that TSI was significantly higher in newly diagnosed GD patients with ophthalmic-dominant symptomology, whereas thyroid-dominant patients had higher TBII levels. In the present study, ROC plot analysis compared the power of the third-generation TBII and Mc4-TSI bioassays to predict GO progression. Interestingly, the AUC of the third-generation TBII assay was 0.817 and that of the Mc4-TSI assay was 0.868, which means that both of these TRAb assays have equivalently high power in predicting severe GO disease course. Smith et al12 showed the higher sensitivity of the M22-based ELISA than the TSH-based ELISA in diagnosing GD. The human monoclonal TSAb (M22)9, 10 allows a third-generation TBII assay system where the TRAb inhibits binding of a labeled TSAb rather than labeled TSH to the TSH receptor. Considering that different regions of the TSH receptor are involved in binding TRAb with TSH agonist and antagonist properties,32 it was not surprising that the performance level of the third-generation TBII assay was comparable to that of the TSI bioassay.

During the GO active phase, patients should receive immunosuppressive treatment, such as corticosteroids or radiotherapy, to prevent progression to more severe disease and to shorten the active phase.33, 34 As steroid use could potentially affect the disease course, we considered it necessary to standardize steroid administration in order to accurately evaluate it as a GO risk factor. In our study, no GO patients had been previously treated with steroids at the time of their initial visit. By standardizing our steroid administration protocol, we were able to definitively demonstrate that the association of TRAb with GO course severity is independent of steroid use. Although it is well documented that smoking adversely affects the course of GO,25, 26, 27 our results did not elucidate this relationship. This might be because there were only 27 smokers and only 39 patients with severe GO in our study. Ethnic differences affect the prevalence and severity of GO,35 thus the prevalence of severe course of GO in our study (34.8%) was relatively lower than that in Eckstein et al’s (53%).5 Although the difference did not reach statistical significance owing to small number of smokers in our study, smoking was more common in patients with a severe vs a mild GO course.

In conclusion, both types of TRAb assays, the third-generation TBII assay and the Mc4-TSI assay, were clinically relevant as predictors of a severe course of GO. The predictive power of the third-generation TBII assay was as strong as that of Mc4-TSI bioassay. This is the first report to compare the predictive power of two different laboratory tests for TRAb on GO course severity. We recommend measuring TRAb at the first establishment of a GO diagnosis, in order to better organize future disease management strategies for these patients. Additional studies defining the clinical importance of TRAb in GO are necessary. These include assessing the relationship between TRAb levels and patient responses to medical or surgical GO treatments, and evaluating the utility of TRAb measurement in predicting GO symptom recurrence after rehabilitative surgery.