Introduction

The incidence and prevalence of diabetes mellitus have been increasing. New figures indicate that, if no urgent action is taken, the number of people living with diabetes will rise from 366 million in 2011 to 552 million by 2030 [1]. In adults, type 2 diabetes mellitus accounts for 90–95% of all diagnosed cases of diabetes in the USA [2]. Approximately 40% of people with diabetes develop diabetic nephropathy (DN), which has become the leading cause of end-stage renal disease (ESRD) in developed countries [3]. According to United States Renal Data System reports, the adjusted rate of prevalent ESRD due to diabetes rose 2.2% to 647 per million people in 2009, and the total medical care expenditure for ESRD rose 3.1%, reaching US$29 billion [4].

Kidney biopsy can discriminate DN from non-diabetic renal disease (NDRD), but it is invasive and not suitable for every patient. NDRD is rare in type 1 diabetes mellitus, particularly in patients with a history of diabetes of >10 years [5]; however, reports of the prevalence of NDRD in type 2 diabetes mellitus have varied widely from 10% to 85% [6–10]. One joint analysis of available data on the prevalence of NDRD among patients with type 2 diabetes mellitus revealed that NDRD was evident on kidney biopsy in ∼22% of European and 26.7% of Asian patients [11]. Furthermore, the treatment and prognosis of DN and NDRD are different. Research has shown that diabetic patients with NDRD have significantly better renal outcomes than patients with biopsy-proven DN, since many NDRDs are treatable, and even remittable [12].

Assessment of diabetic retinopathy (DR) is inexpensive and could be routinely performed during outpatient screening for chronic complications of diabetes. Indeed, previous literature has shown that DR may be helpful in distinguishing the type of kidney pathology in patients with type 2 diabetes mellitus and renal disease [13–15]. However, the results of these studies are inconsistent, and the predictive value of DR has varied between series. In addition, most of the available data are from retrospective studies with small samples that lack a quantified standard. Therefore, it is worth comprehensively reviewing the data on the predictive role of DR in biopsy studies. This meta-analysis focuses on both prospective and retrospective biopsy studies, and aims to estimate the overall capacity of DR for predicting DN in patients with type 2 diabetes mellitus and renal disease.

Methods

Search strategy

The databases searched were Medline and Embase, from the time of their inception to February 2012. The medical subject headings (MeSH) were ‘Biopsy’ or ‘Pathology’ and ‘Diabetic nephropathy/diagnosis/aetiology/pathology’. The references from retrieved articles and reviews identified in the search were manually inspected to identify further articles. One reviewer (F.H.) performed the search, while a second (X.X.) confirmed the process.

Study selection

The search yielded 3,361 articles, which were assessed using titles, abstracts and/or full articles. Only papers published in English were included. The inclusion criteria were: (1) patients with type 2 diabetes mellitus and renal disease; (2) identification of renal diseases based on kidney biopsy findings; (3) presence of DR and numbers of patients classified in each renal disease group. We included the latest publication when more than one paper was published on a study. While screening the citations, two reviewers (F.H. and X.X.) independently reviewed the search results to determine article inclusion. In cases of discord, a consensus was reached through discussion with the senior author (F.X.H.).

Data extraction and quality assessment

The same investigators (F.H. and X.X.) each retrieved data using standardised forms, obtaining information on study design, author, publication year, percentage of men, duration of diabetes, presence of baseline proteinuria, methods of evaluating DR, and the inclusion criteria to select patients. Data were collected at baseline in the case of longitudinal studies. The numbers of true-positive, false-positive, true-negative and false-negative results were calculated for each study. Study quality was assessed with the quality assessment of studies of diagnostic accuracy included in systematic reviews (QUADAS) checklist (maximum score 14) [16]. The checklist is structured as a list of 14 questions that should be answered ‘yes,’ ‘no’ or ‘unclear.’

Statistical analysis

Sensitivity, specificity, positive predictive value, negative predictive value and diagnostic odds ratios (DORs) were calculated for each study after construction of a 2 × 2 table. Cells with a value of ‘0’ in the 2 × 2 tables were replaced with ‘0.5’ for pooling purposes. The pooled estimates with 95% CIs were calculated using a random-effects model [17]. A summary receiver operating characteristic (sROC) curve was constructed to assess the interaction between sensitivity and specificity. A weighted AUC was obtained to estimate the diagnostic performance. The I² statistic was used to quantify the degree of heterogeneity among studies, with I² values of 25%, 50% and 75% being tentatively considered low, moderate and high heterogeneity, respectively [18]. The potential presence of publication bias was assessed using the Egger test [19].
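As an illustration of the per-study calculations described above, the following is a minimal sketch in Python. It reads the zero-cell rule literally (only cells equal to 0 are replaced with 0.5); the class name, function names and the counts in the usage note are hypothetical and are not taken from any included study.

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    """Counts for one study: DR status (index test) against biopsy result (reference)."""
    tp: float  # DR present, biopsy shows DN
    fp: float  # DR present, biopsy shows NDRD
    fn: float  # DR absent, biopsy shows DN
    tn: float  # DR absent, biopsy shows NDRD


def _fix_zero(cell: float) -> float:
    """Replace a zero cell with 0.5; leave any other value unchanged."""
    return 0.5 if cell == 0 else cell


def accuracy_measures(table: TwoByTwo) -> dict:
    """Return sensitivity, specificity, PPV, NPV and DOR for one 2 x 2 table."""
    tp, fp = _fix_zero(table.tp), _fix_zero(table.fp)
    fn, tn = _fix_zero(table.fn), _fix_zero(table.tn)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "dor": (tp * tn) / (fp * fn),
    }
```

For example, hypothetical counts of 30 true positives, 10 false positives, 20 false negatives and 40 true negatives give a sensitivity of 0.60, a specificity of 0.80 and a DOR of 6.0.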

Analyses were performed using Stata statistical software v.11.0 for Windows (Stata Corp, College Station, TX, USA) and Meta-DiSc software (Madrid, Spain) [20]. Statistical tests were two-sided and used a significance level of p < 0.05.
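The random-effects pooling and the I² statistic described in this section can be sketched as follows. The text does not name the estimator, so this sketch assumes the DerSimonian-Laird method applied to log DORs, with the variance of each log DOR approximated by the sum of the reciprocal cell counts. It is not the Stata or Meta-DiSc implementation, and all function names are hypothetical.

```python
import math


def log_dor(tp: float, fp: float, fn: float, tn: float):
    """Log diagnostic odds ratio and its approximate variance for one study."""
    effect = math.log((tp * tn) / (fp * fn))
    variance = 1 / tp + 1 / fp + 1 / fn + 1 / tn
    return effect, variance


def dersimonian_laird(effects, variances):
    """Pool study effects with a DerSimonian-Laird random-effects model (assumed estimator)."""
    k = len(effects)
    if k < 2:
        raise ValueError("at least two studies are needed")
    w = [1.0 / v for v in variances]                                # fixed-effect weights
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))   # Cochran's Q
    df = k - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)                                   # between-study variance
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0             # I² as a percentage
    w_re = [1.0 / (v + tau2) for v in variances]                    # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    return {
        "pooled": pooled,
        "ci95": (pooled - 1.96 * se, pooled + 1.96 * se),
        "tau2": tau2,
        "i2": i2,
        "q": q,
    }
```

When log DORs are pooled in this way, the pooled estimate and its confidence limits are exponentiated to return to the DOR scale.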

Results

Literature search results and study characteristics

The literature search initially identified 3,361 articles, which were reduced to 48 after titles and abstracts had been read. After full-text evaluation, 26 papers remained for analysis [8–10, 12, 14, 15, 21–40], including nine prospective studies and 17 retrospective ones.

The 26 articles identified above involved 2,012 participants. The quality of the original studies was checked according to QUADAS, and all of the studies had high scores (≥11). Details of the study characteristics and their corresponding QUADAS scores are summarised in Table 1. The proportion of men in the studies ranged between 47% and 94% (weighted average 62%). The mean duration of diabetes in the studies was 5–12 years. In prospective studies, the participants all had proteinuria when the biopsies were performed, mostly macroalbuminuria, except in one study, which was limited to microalbuminuria [21]. Different methods were used to assess DR: ophthalmoscopy after mydriasis in four studies [21, 22, 24, 30, 36]; fundus photography after mydriasis in one study [27]; ophthalmoscopy without mydriasis in four studies [15, 23, 26, 28, 29, 33, 34, 40]; and in nine studies no relevant details were provided [8, 10, 25, 31–33, 35, 38, 39]. Of the nine prospective studies assessed, five were screening studies conducted on consecutive patients with type 2 diabetes mellitus and proteinuria [21–25], while the remaining four were conducted on selected type 2 diabetes mellitus populations using criteria for the biopsy for type 1 diabetes mellitus (microhaematuria, absence of DR, atypical change in renal function, or immunological abnormalities) [26–29]. However, in most retrospective studies, renal biopsy indications were based on clinical suspicion of NDRD (microhaematuria, atypical change in renal function, overt proteinuria, without unified and clear criteria).

Table 1 Characteristics of the 26 studies included in the meta-analysis

DR predicting DN

Forest plots of the pooled sensitivity and specificity are shown in Fig. 1. In the five screening prospective studies, the sensitivity ranged from 0.59 to 1.00 (pooled sensitivity 0.70, 95% CI 0.63, 0.76), and the specificity ranged from 0.41 to 1.00 (pooled specificity 0.73, 95% CI 0.65, 0.80). For the remaining four selected samples of the prospective studies, the sensitivity ranged from 0.35 to 0.71 (pooled sensitivity 0.50, 95% CI 0.40, 0.60), and the specificity ranged from 0.27 to 0.88 (pooled specificity 0.77, 95% CI 0.69, 0.84). The overall pooled sensitivity and specificity for all the prospective studies were 0.64 (95% CI 0.58, 0.69) and 0.75 (95% CI 0.70, 0.80), respectively. Finally, taking all 26 prospective and retrospective studies together, the total pooled sensitivity and specificity were 0.65 (95% CI 0.62, 0.68) and 0.75 (95% CI 0.73, 0.78), respectively.

Fig. 1

Forest plots of pooled sensitivity (a) and specificity (b) of DR predicting biopsy-proven DN in separate and combined groups. T2DM, type 2 diabetes mellitus. T1DM, type 1 diabetes mellitus. The black circles and horizontal lines represent the study-specific index of diagnosis and corresponding 95% CI, respectively. The diamond and vertical lines/brackets represent the pooled estimate with 95% CI

Forest plots of the pooled DOR are shown in Fig. 2. In the five screening prospective studies, the DOR ranged from 1.00 to 45.84 (pooled DOR 8.98, 95% CI 2.01, 40.19). For the remaining four selected samples of the prospective studies, the DOR ranged from 0.36 to 10.93 (pooled DOR 2.28, 95% CI 0.46, 11.22). The overall pooled DOR for all the prospective studies was 4.65 (95% CI 1.62, 13.35). When all 26 prospective and retrospective studies were taken together, the total pooled DOR was 5.67 (95% CI 3.45, 9.34). The pooled positive predictive value of DR predicting DN for all 26 papers was 0.72 (95% CI 0.68, 0.75) (Fig. 3a), while the pooled negative predictive value was 0.69 (95% CI 0.67, 0.72) (Fig. 3b).
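As a rough plausibility check, a DOR can also be computed directly from a sensitivity and specificity pair. The sketch below uses the standard definition of the DOR; the function name is hypothetical. Because the pooled DOR reported above was pooled separately across studies, it need not equal the value implied by the pooled sensitivity and specificity.

```python
def dor_from_sens_spec(sensitivity: float, specificity: float) -> float:
    """Diagnostic odds ratio implied by one sensitivity/specificity pair."""
    odds_positive_in_diseased = sensitivity / (1 - sensitivity)
    odds_negative_in_non_diseased = specificity / (1 - specificity)
    return odds_positive_in_diseased * odds_negative_in_non_diseased


# Overall pooled sensitivity (0.65) and specificity (0.75) for all 26 studies
implied_dor = dor_from_sens_spec(0.65, 0.75)  # about 5.57
```

The implied value of about 5.57 is close to the pooled DOR of 5.67 for all 26 studies.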

Fig. 2

Forest plots of pooled DORs of DR predicting biopsy-proven DN in separate and combined groups. T2DM, type 2 diabetes mellitus; T1DM, type 1 diabetes mellitus. The black circles and horizontal lines represent the study-specific index of diagnosis and corresponding 95% CI, respectively. The diamond and vertical lines/brackets represent the pooled estimate with 95% CI

Fig. 3

Forest plots of pooled positive (a) and negative (b) predictive values of DR predicting biopsy-proven DN in separate and combined groups. T2DM, type 2 diabetes mellitus; T1DM, type 1 diabetes mellitus. The black circles and horizontal lines represent the study-specific index of diagnosis and corresponding 95% CI, respectively. The diamond and vertical lines/brackets represent the pooled estimate with its 95% CI

The sROC curves showing sensitivity vs 1 − specificity from individual studies were not positioned near the desirable upper left corner (Fig. 4). The AUC was 0.82 (SEM 0.08) in the prospective consecutive screening subgroup, 0.73 (SEM 0.08) in the nine prospective studies, and 0.75 (SEM 0.03) in all 26 studies.

Fig. 4

sROC curves of DR predicting biopsy-proven DN in separate and combined groups. (a) Symmetric sROC curve of the screening subgroup; AUC, 0.8194; SE (AUC), 0.0881; Q*, 0.7530; SE (Q*), 0.0791; (b) sROC curve of the nine prospective studies; AUC, 0.7288; SE (AUC), 0.0797; Q*, 0.6760; SE (Q*), 0.0647; (c) sROC curve of all 26 studies included in the meta-analysis; AUC, 0.7532; SE (AUC), 0.0344; Q*, 0.6961; SE (Q*), 0.0285. Q* (i.e. the Q value) is the maximum joint sensitivity and specificity
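For a symmetric sROC curve (one with a constant DOR along the curve), Q* and the AUC are both functions of that DOR, so each can be recovered from the other. The sketch below applies these standard relationships to the Q* values in the Fig. 4 caption; it assumes that all three curves are symmetric, which the caption states only for panel (a), and the function names are hypothetical.

```python
import math


def dor_from_q_star(q_star: float) -> float:
    """DOR of a symmetric sROC curve whose point of equal sensitivity and specificity is Q*."""
    root_dor = q_star / (1 - q_star)  # Q* = sqrt(DOR) / (1 + sqrt(DOR))
    return root_dor ** 2


def auc_symmetric_sroc(dor: float) -> float:
    """Area under a symmetric sROC curve with constant DOR."""
    if abs(dor - 1.0) < 1e-12:
        return 0.5  # an uninformative test lies on the diagonal
    return dor / (dor - 1) ** 2 * ((dor - 1) - math.log(dor))


for label, q_star in [("a", 0.7530), ("b", 0.6760), ("c", 0.6961)]:
    auc = auc_symmetric_sroc(dor_from_q_star(q_star))
    print(f"panel ({label}): Q* = {q_star:.4f}, implied AUC = {auc:.4f}")
```

Under this assumption, the implied AUCs agree with the reported values of 0.8194, 0.7288 and 0.7532 to about three decimal places.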

The I² test detected moderate to high heterogeneity among studies. Therefore, random-effects models were used for the meta-analysis. The Egger test showed no significant publication bias (p = 0.77).

Proliferative DR (PDR) predicting DN

Figure 5 shows the sROC curve and the forest plots of the pooled sensitivity, specificity, positive predictive value and negative predictive value of PDR predicting biopsy-proven DN. Only four studies evaluated DR graded as simple or proliferative DR [8, 21, 24, 28]. The sensitivity ranged from 0.03 to 0.57 (pooled sensitivity 0.25, 95% CI 0.16, 0.35), while the specificity ranged from 0.93 to 1.00 (pooled specificity 0.98, 95% CI 0.92, 1.00). The pooled positive predictive value was 0.96 (95% CI 0.79, 1.00), while the pooled negative predictive value was 0.48 (95% CI 0.39, 0.57). The sROC curve showing sensitivity vs 1 − specificity from individual studies was positioned near the desirable upper left corner with an AUC of 0.99 (SEM 0.05). The I² test detected high heterogeneity among studies for sensitivity (I² = 85.3%, p < 0.001) and negative predictive value (I² = 93.2%, p < 0.001), but heterogeneity was low for specificity (I² = 0.5%, p = 0.389) and positive predictive value (I² = 45.9%, p = 0.136).
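The combination of a high positive predictive value with a low negative predictive value follows from Bayes' theorem when specificity is high and sensitivity is low. The sketch below illustrates this with the pooled PDR sensitivity and specificity; the function name is hypothetical and the prevalence of 0.5 is an assumed illustrative value, not a figure reported by the included studies.

```python
def predictive_values(sensitivity: float, specificity: float, prevalence: float):
    """Return (PPV, NPV) from sensitivity, specificity and disease prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Pooled PDR sensitivity and specificity; the prevalence of DN is an assumed value
ppv, npv = predictive_values(sensitivity=0.25, specificity=0.98, prevalence=0.5)
```

With these inputs the PPV is about 0.93 and the NPV about 0.57: a positive finding is highly informative, whereas a negative finding does little to exclude DN.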

Fig. 5

(a) Forest plots of pooled sensitivity, specificity, positive predictive value and negative predictive value. The black circles and horizontal lines represent the study-specific index of diagnosis and corresponding 95% CI, respectively. The diamond and vertical lines/brackets represent the pooled estimate with its 95% CI. (b) sROC curve of proliferative DR predicting biopsy-proven diabetic nephropathy; AUC, 0.9887; SE (AUC), 0.0459; Q*, 0.9508; SE (Q*), 0.1008. Q* (i.e. the Q value) is the maximum joint sensitivity and specificity

Discussion

This meta-analysis shows that the pooled sensitivity and specificity for the presence of DR differentiating DN from NDRD among patients with type 2 diabetes mellitus and renal diseases were 0.65 and 0.75, respectively. Meanwhile, the pooled positive and negative predictive values were 0.72 and 0.69. In addition, DOR was 5.67 and AUC was 0.75. With regard to the subgroup analysis, DR had a higher predictive value in the screening studies than in those with selected patients based on biopsy criteria for type 1 diabetes mellitus. Moreover, predictive results of DR between prospective and retrospective studies were not very different. Our data also show that PDR (an advanced stage of DR) had a high pooled specificity (0.98) and high pooled positive predictive value (0.96), although the pooled sensitivity and negative predictive value were 0.25 and 0.48, respectively. Furthermore, the AUC of 0.99 represented good discrimination.

The strength of this analysis is that it is the first meta-analysis combining data from previous prospective and retrospective biopsy studies on the predictive accuracy of DR for the clinical differentiation of DN. Guidelines for diabetes and chronic kidney disease (CKD) summarised the predictive value of DR for diagnosis of diabetic kidney disease in biopsy studies until 2005 [3]. Estimates of renal disease in diabetic patients are defined by functional abnormalities, such as microalbuminuria. However, it is important to exclude NDRD because some cases require targeted therapy. Kidney biopsy is the gold standard method for identifying DN, but it cannot be performed on all patients because of contraindications. Furthermore, at least 24 h of observation is recommended after a percutaneous kidney biopsy to assess potential complications [41]. In contrast, assessment of DR is very convenient and is routinely performed as part of a physical examination in outpatient departments. DR and DN are both microvascular complications of diabetes, and some authors have identified correlations of anatomical measures between them. Retinopathy severity was found to be associated with renal anatomical measures when other risk factors were controlled for in patients with type 1 diabetes mellitus [42]. Although DR was found to be an important predictor, it is not known whether its presence can completely differentiate DN from NDRD. Along these lines, our previous study showed that the absence of DR together with a short duration of diabetes may be a useful indication for renal biopsy in patients with type 2 diabetes mellitus and overt proteinuria [34]. One study suggested that diabetes mellitus duration of >10 years together with retinopathy did not exclude NDRD in patients with type 2 diabetes mellitus [37]. A differential diagnostic model composed of five clinical indices (diabetes duration, systolic blood pressure, HbA1c, haematuria and DR) had an advantage in the clinical prediction of DN, with a sensitivity of 90.0% and a specificity of 92.0% [25]. The present study indicates that DR alone has an imperfect predictive value, and that more clinical features may need to be incorporated to construct more precise diagnostic criteria for distinguishing DN from NDRD.

The 26 studies identified for our meta-analysis varied in certain characteristics. For instance, studies included patients with microalbuminuria (20–200 mg/day) [21], macroalbuminuria with ranges defined as from >300 mg/day to >3 g/day [9, 14, 15, 22–29, 31, 34, 36–39], or not defined clearly. In addition, there was significant diversity in the methods used to assess DR, and only seven studies applied gold standards for DR screening [21, 22, 24, 27, 30, 36, 37, 43]. A third issue was the different categories of renal pathology between studies. Most studies divided patients into two groups (DN group and NDRD group). Otherwise, a near-normal renal structure was taken as NDRD in one study for analysis [21], and one study omitted a small proportion of overlapping cases (1.8%) and an ambiguous case [25], whereas in the other studies these overlapping cases were classified in the NDRD groups.

We also found significant heterogeneity, which may be explained by the following limitations. First, only articles published in English were included, although the Egger test did not indicate publication bias. Second, baseline risk factors were not standardised between studies. Of these, ethnic origin affected the susceptibility to DR development even after adjustment for other risk factors [44, 45], although DR showed no significant difference in diagnostic accuracy among the Asian, European and African-American populations (data not shown) in our analysis. The duration of diabetes, a strong risk factor for the development of DR, varied widely from 5 to 12 years in all the studies [46]. Moreover, hyperglycaemia, hypertension and dyslipidaemia have all been confirmed to have an effect on DR [47–49]. However, individual patient data on these risk factors were not available to allow us to explore the heterogeneity in more detail. Third, our results showed that PDR was a highly specific indicator for the diagnosis of DN. However, the findings should be interpreted cautiously because they were based on a small sample (169 patients).

In conclusion, current evidence suggests a potential role for DR in predicting DN in type 2 diabetes mellitus with renal disease. Although the overall test performance was not as high as expected, measuring DR may be considered useful for predicting DN in the light of its simplicity and non-invasiveness.