Introduction

Type 2 diabetes is a heterogeneous condition with respect to the progression of dysglycaemia, the occurrence of consequences and the response to therapies. Previous studies have grouped individuals with adult-onset type 2 diabetes into subtypes based on selected clinical and biological variables [1]. Several approaches have been proposed [2, 3, 4]. The most replicated approach used six clinical variables (age at diagnosis, GADA, BMI, HbA1c, an index of insulin resistance and an index of insulin secretion) and identified five diabetes subtypes: severe autoimmune diabetes (SAID), severe insulin-deficient diabetes (SIDD), severe insulin-resistant diabetes (SIRD), mild obesity-related diabetes (MOD) and mild age-related diabetes (MARD) [5]. We previously replicated this analysis in the Outcome Reduction With Initial Glargine Intervention (ORIGIN) trial and showed that the incidence of renal outcomes and the response to insulin varied between the five diabetes subtypes [6]. Other studies have investigated the molecular characteristics of these clusters and suggested that they are associated with different biological pathways [7, 8, 9, 10, 11]. We hypothesised that circulating biomarkers might be used to distinguish between subtypes, and that a set of identified biomarkers might be used as a diagnostic tool for diabetes subtyping.

Methods

Study population

The design of the ORIGIN trial has been described previously [6]. Briefly, between 2003 and 2005, a total of 578 clinical sites in 40 countries enrolled 12,537 participants aged ≥50 years with established or newly detected diabetes, impaired glucose tolerance or impaired fasting glucose levels, and additional cardiovascular risk factors. Following random allocation of participants to two therapies using a factorial design (either one daily injection of insulin glargine or standard care; and either omega-3 fatty acid supplement or placebo), participants were monitored for a median duration of 6.2 years for cardiovascular events and other health outcomes. A subset of 8494 participants also consented to provide blood samples at baseline for further analyses.

Biomarker measurements

After completion of the ORIGIN trial, 1 ml of a baseline fasting frozen serum aliquot from each participant was transported to Myriad RBM (Austin, Texas, USA) to quantify 284 biomarkers per sample using Luminex technology (Austin, Texas, USA). Components of the biomarker panel were selected based on their role in physiological systems relevant to cardiometabolic disease (e.g. inflammation, coagulation, endothelial function, renal function, oxidative stress, adipocyte biology, angiogenesis, beta cell biology, tissue repair, lipid metabolism and iron metabolism), with the aim of identifying biomarkers that could independently provide better estimates of the risk of future outcomes than could be obtained from routinely measured clinical and biochemical data alone [12]. A total of 237 biomarkers from 8401 participants were deemed suitable for analysis (electronic supplementary material [ESM] Table 1). Biomarkers that were not normally distributed were first log transformed [12]. All biomarkers were subsequently standardised to a mean of 0 and an SD of 1, except for biomarkers with a high proportion of low or undetectable concentrations, which were analysed as ordinal variables. Biomarkers for which the levels were equimolar to the C-peptide level, such as insulin, total proinsulin and active proinsulin, were excluded from the current analyses. A total of 7017 participants with established or newly diagnosed diabetes at baseline and available data for clustering, measures of C-peptide and 233 protein biomarkers were included in these analyses.
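The preprocessing described above (log transformation of skewed biomarkers, then standardisation to a mean of 0 and an SD of 1) can be illustrated with a minimal Python sketch. This is not the authors' code, which was written in R. The function name `standardise`, the use of the natural logarithm, the sample (n − 1) SD and the example concentrations are all assumptions made for illustration; the text does not specify them.

```python
import math
import statistics


def standardise(values, log_transform=False):
    """Return z-scores (mean 0, SD 1), optionally after a log transform.

    Assumptions (not stated in the text): natural log, sample SD (n - 1).
    """
    if log_transform:
        # Only valid for strictly positive concentrations
        values = [math.log(v) for v in values]
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


# Hypothetical serum concentrations for one right-skewed biomarker
example = [1.2, 0.8, 3.5, 10.1, 2.2]
z = standardise(example, log_transform=True)
```

After this step, a one-unit change in `z` corresponds to one SD of the (log-transformed) biomarker, which is the scale on which the odds in the heatmap are reported.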

Diabetes subtypes

Using five clinical variables (GADA, age at diabetes diagnosis, BMI, HbA1c and fasting C-peptide level), we previously categorised these 7017 ORIGIN participants into the five subtypes [6]. The SAID subtype included participants who were positive for GADA (n=241). We used the sex-specific nearest centroid approach, using the coordinates of each predefined cluster (age at diagnosis, BMI, HbA1c and C-peptide) provided by Ahlqvist et al [5], to assign the remaining participants into the four subtypes, SIDD (n=1594), SIRD (n=914), MOD (n=1595), and MARD (n=2673). Baseline characteristics of each cluster have been described elsewhere [6] (ESM Table 2).
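The assignment rule above can be sketched as follows. This is a hedged illustration only: the centroid coordinates below are hypothetical placeholders on a standardised scale, not the values published by Ahlqvist et al [5], and the use of Euclidean distance and the names `CENTROIDS` and `assign_subtype` are assumptions.

```python
import math

# HYPOTHETICAL sex-specific centroids on a standardised scale, ordered as
# (age at diagnosis, BMI, HbA1c, C-peptide). NOT the published coordinates.
CENTROIDS = {
    "female": {
        "SIDD": (-0.3, -0.4, 1.5, -0.5),
        "SIRD": (0.6, 0.5, -0.4, 1.6),
        "MOD": (-0.9, 1.2, -0.1, 0.1),
        "MARD": (0.6, -0.5, -0.4, -0.4),
    },
    "male": {
        "SIDD": (-0.2, -0.3, 1.4, -0.6),
        "SIRD": (0.5, 0.4, -0.5, 1.5),
        "MOD": (-1.0, 1.1, 0.0, 0.2),
        "MARD": (0.5, -0.4, -0.5, -0.3),
    },
}


def assign_subtype(sex, features, gada_positive=False):
    """Assign SAID if GADA positive, else the nearest sex-specific centroid."""
    if gada_positive:
        return "SAID"
    centroids = CENTROIDS[sex]
    # Euclidean distance is an assumption; the text only says "nearest centroid"
    return min(centroids, key=lambda name: math.dist(features, centroids[name]))


subtype = assign_subtype("female", (0.5, 0.6, -0.3, 1.4))
```

GADA status is checked first, so a GADA-positive participant is classified as SAID regardless of the other four variables, mirroring the order of steps in the paragraph above.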

Statistical analyses

To identify biomarker serum concentrations that could independently distinguish one subtype from the others, we performed a comprehensive screening of the 233 biomarkers with available data, using forward-selection logistic regression models separately within each subtype (vs others), retaining significant biomarkers at p<0.05/(233 × 5)=4.3 × 10⁻⁵. The covariates age, sex, ethnicity, diabetes duration and glucose-lowering medication usage (coded as a categorical variable with 0 = no medication, 1 = metformin alone, 2 = sulfonylureas alone, 3 = combination of both metformin and sulfonylureas, or other glucose-lowering agents) at blood collection were included in the null model and constituted the minimally adjusted model. A second model, further adjusted for C-peptide level, was fitted to identify biomarkers that were independent of C-peptide levels. A third model, also adjusted for BMI, was fitted as a sensitivity analysis. Next, differences in the strength of association of significant and independent biomarkers, measured as the change in odds per biomarker SD level, were graphically represented in a hierarchical heatmap, which also enabled us to rank the biomarkers from the most to the least determinant for subtyping. Then, we assessed the performance of adding identified biomarkers (one at a time, from the most discriminant to the least) to predict each type 2 diabetes subtype vs the others. The performance of the predictive models, which included only identified biomarkers and no additional clinical covariates, was estimated for each subtype (vs others) using the area under the receiver operating characteristic curve (AUC ROC) [13, 14]. All statistical analyses were conducted using R software (R version 3.6.0 [15]; packages: StepReg v.1.4.1 [16], gplots v.3.0.3 [17] and PredictABEL v.1.2-4 [18]). Two-tailed p values <0.05 were considered statistically significant, with adjustments for multiple hypothesis testing applied, as appropriate.
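Two quantities in this section lend themselves to a short worked example: the Bonferroni-corrected significance threshold and the AUC ROC. The Python sketch below is illustrative only and is not the authors' implementation (they used the R packages cited above). The function names and the toy scores and labels are hypothetical; the AUC is computed through its rank-based interpretation, i.e. the probability that a randomly chosen member of the subtype receives a higher predicted score than a randomly chosen non-member.

```python
def bonferroni_threshold(alpha, n_biomarkers, n_subtypes):
    """Significance threshold after correcting for all biomarker-subtype tests."""
    return alpha / (n_biomarkers * n_subtypes)


def auc_roc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Threshold used in the text: 0.05 / (233 biomarkers x 5 subtypes)
threshold = bonferroni_threshold(0.05, 233, 5)

# Toy example: predicted probabilities for one subtype (1) vs the others (0)
scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
labels = [1, 1, 1, 0, 0, 0]
auc = auc_roc(scores, labels)
```

In the toy data, 8 of the 9 positive-negative pairs are ordered correctly, so `auc` equals 8/9 (about 0.89). The pairwise loop is quadratic in the number of participants; it is written this way for clarity rather than efficiency.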

Results

Through forward-selection models that included C-peptide level as a covariate, we identified 25 biomarkers among the 233 biomarkers tested that were significantly and independently associated with the four subtypes, SIDD, SIRD, MOD and MARD. No biomarkers were associated with the SAID subtype. Specifically, 13 biomarkers were independent determinants of SIDD, 2 of SIRD, 7 of MOD and 11 of MARD, including 5 biomarkers overlapping between subtypes (all p<4.3 × 10⁻⁵) (Table 1, ESM Tables 3–6). The 25 biomarkers were then introduced, one at a time, from the most discriminant to the least (based on their effect size), into sets of biomarkers. The performance of these biomarker sets (comprising 1 to 25 biomarkers) in differentiating SIDD, SIRD, MOD or MARD from the others was assessed through AUC ROC, and ranged from 0.611 to 0.734, 0.723 to 0.861, 0.672 to 0.742, and 0.651 to 0.751, respectively (Fig. 1 and ESM Table 7). A value of 0.7 to 0.8 is considered acceptable, whereas 0.8 to 0.9 is considered excellent [19].

Table 1 Biomarkers independently associated with each type 2 diabetes subtype
Fig. 1 Diagnostic performance of the 25 identified biomarkers to distinguish one type 2 diabetes subtype from the others. (a) Hierarchical heatmap representing the strength of associations between the 25 identified biomarkers and each of the type 2 diabetes subtypes. Values are given as the change in odds per SD biomarker level in each subtype vs the others. Red corresponds to an increased level and blue to a decreased level; darker colour corresponds to a stronger relationship. Columns represent type 2 diabetes subtypes; rows represent biomarkers and are ordered from the most discriminant biomarkers (top) to the least (bottom) based on association strength. Dendrograms reflect the distance or similarity between rows (i.e. biomarkers) and columns (subtypes). (b) Graph depicting the AUC ROC estimates for each of the 25 predictive models (comprising 1 to 25 biomarkers, added one at a time, from the most discriminant to the least) distinguishing each of the four subtypes (SIDD, SIRD, MOD and MARD) from the others. SAID is not represented as no circulating protein biomarkers were identified as determinants of this subtype. The x-axis represents the number of biomarkers included in the predictive models (from 1 to 25 biomarkers); the y-axis represents the AUC ROC values for each predictive model

Discussion

This study identified a total of 25 independent circulating biomarkers that differentiate four type 2 diabetes subtypes. The performance of biomarker sets to distinguish one subtype from the others was only modest for SIDD, MOD and MARD, but was more impressive for SIRD, mostly driven by circulating leptin levels. No biomarker other than GADA detected SAID.

Similar to the reports from the IMI-Rhapsody study [8] and the Qatar Biobank study [10] (ESM Table 8), which both investigated associations of more than 1000 circulating proteins assayed using aptamer technology, leptin was the most consistently replicated biomarker within the clusters, followed by pancreatic polypeptide, neuronal cell adhesion molecule (NCAM), fatty acid-binding protein adipocyte (FABP), C-reactive protein (CRP) and sex hormone-binding globulin (SHBG). Associations of haemopexin, matrix metalloproteinase (MMP)-7, creatine kinase (CK)-MB, IGF-binding protein (IGFBP)-1 and Galectin-3 with clusters replicated the results seen in the IMI-Rhapsody study, but not those in the Qatar Biobank study. However, there is notable heterogeneity across these cohorts with regard to population structure, biomarker assays and statistical methods, which limits the comparability between these studies. For instance, methylglyoxal, myoglobin, IGFBP-2, alpha-1-microglobulin, gastric inhibitory polypeptide (GIP), pepsinogen I and MMP-3 were not assayed in the other studies, and therefore our findings for these biomarkers need further replication. While lower postprandial GIP levels have been associated with increased risk of type 2 diabetes [20], no large epidemiological study has investigated the association between fasting GIP levels and type 2 diabetes risk. In addition, our model identified TNF receptor-1 (TNFR1) as a determinant of SIRD, although this result was not consistent with the finding from the IMI-Rhapsody study. Whether a change in TNFR1 is a marker of, or plays a causal role in, insulin resistance (and possibly kidney function, as suggested in previous reports [21]) remains uncertain. Finally, these results also suggest that, from a biological perspective, SIRD is a more distinguishable subtype than the others, given its higher level of prediction using biomarkers.

We acknowledge that this study has some limitations. First, as the analysis was restricted to the biomarkers included in the assay panel, more comprehensive multiplex platforms may discover additional biomarkers. Second, despite its large sample size, this study may have been underpowered to detect associations with other biomarkers. Third, our biomarker panel was designed for disease risk prediction and could not provide extensive insights into disease pathophysiology. Fourth, cluster identification is dependent on the variables used as inputs for the clustering analyses. Thus, using circulating biomarkers as inputs for clustering analysis would undoubtedly identify clusters whose characteristics and disease trajectories differ from those of the clusters identified using five clinical variables. Finally, additional studies are needed for validation, for further translation to clinical practice, and to investigate whether the identified biomarkers have a causal effect on type 2 diabetes consequences. In summary, this study provides evidence that circulating biomarkers might be used as a diagnostic tool for type 2 diabetes subtyping, although better characterisation of type 2 diabetes is still needed to better predict its course, consequences and response to treatment.