Abstract
Aims/hypothesis
Individuals with diabetes can be clustered into five subtypes using up to six routinely measured clinical variables. We hypothesised that circulating protein levels might be used to distinguish between these subtypes. We recently used five of these six variables to categorise 7017 participants from the Outcome Reduction with an Initial Glargine Intervention (ORIGIN) trial into these subtypes: severe autoimmune diabetes (SAID, n=241), severe insulin-deficient diabetes (SIDD, n=1594), severe insulin-resistant diabetes (SIRD, n=914), mild obesity-related diabetes (MOD, n=1595) and mild age-related diabetes (MARD, n=2673).
Methods
Forward-selection logistic regression models were used to identify, among 233 cardiometabolic protein biomarkers, the subset that were independent determinants of one subtype vs the others. We then assessed the performance of adding identified biomarkers (one at a time, from the most discriminant to the least) to predict each subtype vs the others using area under the receiver operating characteristic curve (AUC ROC). Models were adjusted for age, sex, ethnicity, C-peptide level, diabetes duration and glucose-lowering medication usage at blood collection.
Results
A total of 25 biomarkers were independent determinants of subtypes, including 13 for SIDD, 2 for SIRD, 7 for MOD and 11 for MARD (all p<4.3 × 10−5). The performance of the biomarker sets (comprising 1 to 25 biomarkers), assessed through the AUC ROC, ranged from 0.611 to 0.734, 0.723 to 0.861, 0.672 to 0.742, and 0.651 to 0.751, for SIDD, SIRD, MOD and MARD, respectively. No biomarkers other than GAD antibodies were determinants of SAID.
Conclusions/interpretation
We identified 25 serum biomarkers that were independent determinants of type 2 diabetes subtypes and that could be combined into a diagnostic test for subtyping.
Trial registration
ORIGIN trial, ClinicalTrials.gov NCT00069784.
Introduction
Type 2 diabetes is a heterogeneous condition with respect to the progression of dysglycaemia, occurrence of consequences, and response to therapies. Previous studies have grouped individuals with adult-onset type 2 diabetes into subtypes based on selected clinical and biological variables [1]. Several approaches have been proposed [2, 3, 4]. The most replicated approach used six clinical variables (age at diagnosis, GADA, BMI, HbA1c, an index of insulin resistance and an index of insulin secretion) and identified five diabetes subtypes: severe autoimmune diabetes (SAID), severe insulin-deficient diabetes (SIDD), severe insulin-resistant diabetes (SIRD), mild obesity-related diabetes (MOD) and mild age-related diabetes (MARD) [5]. We previously replicated this analysis in the Outcome Reduction with an Initial Glargine Intervention (ORIGIN) trial and showed that the incidence of renal outcomes and the response to insulin varied between the five diabetes subtypes [6]. Other studies have also investigated the molecular characteristics of these particular clusters, and suggested that they are associated with different biological pathways [7, 8, 9, 10, 11]. We hypothesised that circulating biomarkers might be used to distinguish between subtypes, and that a set of identified biomarkers might be used as a diagnostic tool for diabetes subtyping.
Methods
Study population
The design of the ORIGIN trial has been described previously [6]. Briefly, between 2003 and 2005, a total of 578 clinical sites in 40 countries enrolled 12,537 participants aged ≥50 years with established or newly detected diabetes, impaired glucose tolerance or impaired fasting glucose levels, and additional cardiovascular risk factors. Following random allocation of participants to two therapies using a factorial design (either one daily injection of insulin glargine or standard care; and either omega-3 fatty acid supplement or placebo), participants were monitored for a median duration of 6.2 years for cardiovascular events and other health outcomes. A subset of 8494 participants also consented to provide blood samples at baseline for further analyses.
Biomarker measurements
After completion of the ORIGIN trial, 1 ml of a baseline fasting frozen serum aliquot from each participant was transported to Myriad RBM (Austin, Texas, USA) to quantify 284 biomarkers per sample using Luminex technology (Austin, Texas, USA). Components of the biomarker panel were selected based on their role in physiological systems relevant to cardiometabolic disease (e.g. inflammation, coagulation, endothelial function, renal function, oxidative stress, adipocyte biology, angiogenesis, beta cell biology, tissue repair, lipid metabolism and iron metabolism), with the objective of identifying biomarkers that could independently provide better estimates of the risk of future outcomes than could be estimated from routinely measured clinical and biochemical data alone [12]. A total of 237 biomarkers from 8401 participants were deemed suitable for analysis (electronic supplementary material [ESM] Table 1). Biomarkers that were not normally distributed were first log transformed [12]. All biomarkers were subsequently standardised to a mean of 0 and an SD of 1, except for biomarkers with a high proportion of low or undetectable concentrations, which were analysed as ordinal variables. Biomarkers for which the levels were equimolar to the C-peptide level, such as insulin, total proinsulin and active proinsulin, were excluded from the current analyses. A total of 7017 participants with established or newly diagnosed diabetes at baseline and available data for clustering, measures of C-peptide and 233 protein biomarkers were included in these analyses.
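The transformation described above can be illustrated with a minimal Python sketch. It is not the authors' code (their analyses were run in R); the function name `standardise`, the use of the natural logarithm, the sample SD and the example concentrations are all assumptions made for illustration.

```python
import math
import statistics


def standardise(values, log_transform=False):
    """Optionally log-transform, then rescale to mean 0 and SD 1."""
    if log_transform:
        # Assumes strictly positive concentrations; natural log is an assumption.
        values = [math.log(v) for v in values]
    mean = statistics.fmean(values)
    # Sample SD (n - 1 denominator); the text does not say which SD was used.
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


# Hypothetical right-skewed concentrations for one biomarker (not ORIGIN data).
raw = [1.2, 0.8, 15.0, 2.5, 3.1, 40.0]
z = standardise(raw, log_transform=True)
print([round(v, 2) for v in z])
```

Because a change of logarithm base only multiplies every value by a constant, the standardised values are the same whichever base is chosen. Biomarkers analysed as ordinal variables would bypass this step.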
Diabetes subtypes
Using five clinical variables (GADA, age at diabetes diagnosis, BMI, HbA1c and fasting C-peptide level), we previously categorised these 7017 ORIGIN participants into the five subtypes [6]. The SAID subtype included participants who were positive for GADA (n=241). We used the sex-specific nearest centroid approach, using the coordinates of each predefined cluster (age at diagnosis, BMI, HbA1c and C-peptide) provided by Ahlqvist et al [5], to assign the remaining participants into the four subtypes, SIDD (n=1594), SIRD (n=914), MOD (n=1595), and MARD (n=2673). Baseline characteristics of each cluster have been described elsewhere [6] (ESM Table 2).
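As a rough illustration of this assignment rule, the sketch below places GADA-positive participants in SAID and assigns everyone else to the closest sex-specific centroid. The centroid coordinates are invented placeholders on a standardised scale, not the values published by Ahlqvist et al, and the use of Euclidean distance and the function name `assign_subtype` are assumptions.

```python
import math

# Placeholder centroids: (age at diagnosis, BMI, HbA1c, C-peptide), standardised.
# These numbers are invented for illustration and are NOT the published coordinates.
CENTROIDS = {
    "male": {
        "SIDD": (-0.3, -0.2, 1.8, -0.6),
        "SIRD": (0.4, 0.5, -0.5, 1.6),
        "MOD": (-0.9, 1.4, -0.1, 0.1),
        "MARD": (0.6, -0.5, -0.4, -0.4),
    },
    "female": {
        "SIDD": (-0.2, -0.3, 1.7, -0.7),
        "SIRD": (0.5, 0.6, -0.6, 1.5),
        "MOD": (-1.0, 1.5, -0.2, 0.2),
        "MARD": (0.5, -0.6, -0.3, -0.5),
    },
}


def assign_subtype(sex, gada_positive, features):
    """Return SAID for GADA-positive participants, else the nearest centroid's label."""
    if gada_positive:
        return "SAID"
    centroids = CENTROIDS[sex]
    # Euclidean distance is assumed here as the measure of closeness.
    return min(centroids, key=lambda name: math.dist(features, centroids[name]))


print(assign_subtype("male", False, (-0.2, 0.0, 2.0, -0.5)))
print(assign_subtype("female", True, (0.0, 0.0, 0.0, 0.0)))
```

In practice the participant's four variables would need to be scaled in the same way as the reference centroids before distances are compared.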
Statistical analyses
To identify biomarker serum concentrations that could independently determine one subtype from the others, we performed a comprehensive screening of the 233 biomarkers with available data, using forward-selection logistic regression models separately within each subtype (vs others), retaining significant biomarkers at p<0.05/(233 × 5)=4.3 × 10−5. The covariates age, sex, ethnicity, diabetes duration and glucose-lowering medication usage (coded as a categorical variable with 0 = no medication, 1 = metformin alone, 2 = sulfonylureas alone, 3 = combination of both metformin and sulfonylureas, or other glucose-lowering agents) at blood collection were included in the null model and constituted the minimally adjusted model. A second model, further adjusted for C-peptide level, was fitted to identify biomarkers that were independent of C-peptide levels. A third model, also adjusted for BMI, was fitted as a sensitivity analysis. Next, differences in the strength of association of significant and independent biomarkers, measured as the change in odds per biomarker SD level, were graphically represented in a hierarchical heatmap, which also enabled us to rank the biomarkers from the most to the least determinant for subtyping. Then, we assessed the performance of adding identified biomarkers (one at a time, from the most discriminant to the least) to predict each type 2 diabetes subtype vs the others. The performance of the predictive models, including only identified biomarkers and without additional clinical covariates, was estimated for each subtype (vs others) using the area under the receiver operating characteristic curve (AUC ROC) [13, 14]. All statistical analyses were conducted using R software (R version 3.6.0 [15]; packages: StepReg v.1.4.1 [16], gplots v.3.0.3 [17] and PredictABEL v.1.2-4 [18]). Two-tailed p values <0.05 were considered statistically significant, with adjustments for multiple hypothesis testing applied, as appropriate.
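Two of the quantities in this section are simple enough to check by hand, and the Python sketch below does so. It is illustrative only: the analyses themselves were run in R, the function names are hypothetical, and the scores and labels are made-up values rather than study data. The AUC is computed from its rank-based definition, the probability that a randomly chosen member of the subtype receives a higher predicted score than a randomly chosen non-member, with ties counted as one half.

```python
def bonferroni_threshold(alpha, n_biomarkers, n_subtypes):
    """Significance threshold after correcting for every biomarker-by-subtype test."""
    return alpha / (n_biomarkers * n_subtypes)


def auc_roc(scores, labels):
    """Rank-based AUC: share of (case, non-case) pairs ranked correctly, ties = 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


threshold = bonferroni_threshold(0.05, 233, 5)
print(f"{threshold:.2e}")  # 4.29e-05, reported in the text as 4.3 x 10^-5

# Made-up predicted probabilities for one subtype (1) vs the others (0).
scores = [0.9, 0.8, 0.35, 0.6, 0.4, 0.2]
labels = [1, 1, 1, 0, 0, 0]
auc = auc_roc(scores, labels)
print(round(auc, 3))  # 0.778 (7 of 9 pairs ranked correctly)
```

The pairwise loop is quadratic in the number of participants, which is acceptable for a sketch; a dedicated statistics package would be the usual choice for a cohort of several thousand.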
Results
Through forward-selection models that included C-peptide level as a covariate, we identified 25 biomarkers among the 233 biomarkers tested that were significantly and independently associated with the four subtypes, SIDD, SIRD, MOD and MARD. No biomarkers were associated with the SAID subtype. Specifically, 13 biomarkers were independent determinants of SIDD, 2 of SIRD, 7 of MOD and 11 of MARD, including 5 biomarkers overlapping between subtypes (all p<4.3 × 10−5) (Table 1, ESM Tables 3–6). The 25 biomarkers were then introduced, one at a time, from the most discriminant to the least (based on their effect size), into sets of biomarkers. The performance of these biomarker sets (comprising 1 to 25 biomarkers) in differentiating SIDD, SIRD, MOD or MARD from the others was assessed through AUC ROC, and ranged from 0.611 to 0.734, 0.723 to 0.861, 0.672 to 0.742, and 0.651 to 0.751, respectively (Fig. 1 and ESM Table 7). A value of 0.7 to 0.8 is considered acceptable, whereas 0.8 to 0.9 is considered excellent [19].
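To make the interpretation of these ranges concrete, the short Python sketch below applies the two bands quoted above to the highest AUC reported for each subtype. The function name `describe_auc` is hypothetical, and treating each band as closed at its lower bound and open at its upper bound is an assumption, since the text does not say where a value of exactly 0.8 belongs.

```python
def describe_auc(auc):
    """Label an AUC using only the two bands quoted in the text."""
    if 0.7 <= auc < 0.8:
        return "acceptable"
    if 0.8 <= auc < 0.9:
        return "excellent"
    return "outside the quoted bands"


# Lowest and highest AUC ROC reported for each subtype's biomarker sets.
REPORTED = {
    "SIDD": (0.611, 0.734),
    "SIRD": (0.723, 0.861),
    "MOD": (0.672, 0.742),
    "MARD": (0.651, 0.751),
}

best = {subtype: describe_auc(high) for subtype, (low, high) in REPORTED.items()}
print(best)
```

On this reading, only the SIRD biomarker set reaches the upper band, while the best sets for SIDD, MOD and MARD fall in the lower one.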
Discussion
This study identified a total of 25 independent circulating biomarkers that differentiate four type 2 diabetes subtypes. The performance of biomarker sets to distinguish one subtype from the others was only modest for SIDD, MOD and MARD, but was more impressive for SIRD, mostly driven by circulating leptin levels. No biomarker other than GADA detected SAID.
Consistent with the reports from the IMI-Rhapsody study [8] and the Qatar Biobank study [10] (ESM Table 8), which both investigated associations of more than 1000 circulating proteins assayed using aptamer technology, leptin was the most consistently replicated biomarker within the clusters, followed by pancreatic polypeptide, neuronal cell adhesion molecule (NCAM), fatty acid-binding protein adipocyte (FABP), C-reactive protein (CRP) and sex hormone-binding globulin (SHBG). Associations of haemopexin, matrix metalloproteinase (MMP)-7, creatine kinase (CK)-MB, IGF-binding protein (IGFBP)-1 and Galectin-3 with clusters replicated the results seen in the IMI-Rhapsody study, but not in the Qatar Biobank study. However, there is notable heterogeneity across these cohorts with regard to population structure, biomarker assays and statistical methods, which limits the comparability between these studies. For instance, methylglyoxal, myoglobin, IGFBP-2, alpha-1-microglobulin, gastric inhibitory polypeptide (GIP), pepsinogen I and MMP-3 were not assayed in the other studies, and therefore our findings for these biomarkers need further replication. While lower postprandial GIP levels have been associated with increased risk of type 2 diabetes [20], no large epidemiological study has investigated the association between fasting GIP levels and type 2 diabetes risk. In addition, our model identified TNF receptor-1 (TNFR1) as a determinant of SIRD, although this result was not consistent with the finding from the IMI-Rhapsody study. Whether a change in TNFR1 is a marker of, or plays a causal role in, insulin resistance (and possibly kidney function, as suggested in previous reports [21]) remains uncertain. Finally, these results also suggest that, from a biological perspective, SIRD is a more distinguishable subtype than the others, given its higher level of prediction using biomarkers.
We acknowledge that this study has some limitations. First, as the analysis was restricted to the biomarkers included in the assay panel, more comprehensive multiplex platforms may discover additional biomarkers. Second, despite its large sample size, this study may have been underpowered to detect associations with other biomarkers. Third, our biomarker panel was designed for disease risk prediction and could not provide extensive insights into disease pathophysiology. Fourth, cluster identification is dependent on the variables used as inputs for the clustering analyses. Thus, using circulating biomarkers as inputs for clustering analysis would undoubtedly identify clusters with characteristics and disease trajectories different from those of the clusters identified using five clinical variables. Finally, additional studies are needed for validation, further translation to clinical practice, and also to investigate whether identified biomarkers have a causal effect on type 2 diabetes consequences. Overall, this study provides evidence that circulating biomarkers might be used as a diagnostic tool for type 2 diabetes subtyping, although better characterisation of type 2 diabetes is still needed to better predict its course, consequences and response to treatment.
Data availability
No additional data available.
Abbreviations
- AUC ROC: Area under the receiver operating characteristic curve
- GIP: Gastric inhibitory polypeptide
- IGFBP: IGF-binding protein
- MARD: Mild age-related diabetes
- MMP: Matrix metalloproteinase
- MOD: Mild obesity-related diabetes
- ORIGIN: Outcome Reduction with Initial Glargine Intervention
- SAID: Severe autoimmune diabetes
- SIDD: Severe insulin-deficient diabetes
- SIRD: Severe insulin-resistant diabetes
- TNFR1: TNF receptor-1
References
Ahlqvist E, Prasad RB, Groop L (2021) 100 YEARS OF INSULIN: towards improved precision and a new classification of diabetes mellitus. J Endocrinol 252(3):R59–R70. https://doi.org/10.1530/JOE-20-0596
Udler MS, Kim J, von Grotthuss M et al (2018) Type 2 diabetes genetic loci informed by multi-trait associations point to disease mechanisms and subtypes: a soft clustering analysis. PLoS Med 15(9):e1002654. https://doi.org/10.1371/journal.pmed.1002654
Dennis JM, Shields BM, Henley WE, Jones AG, Hattersley AT (2019) Disease progression and treatment response in data-driven subgroups of type 2 diabetes compared with models based on simple clinical features: an analysis using clinical trial data. Lancet Diabetes Endocrinol 7(6):442–451. https://doi.org/10.1016/S2213-8587(19)30087-7
Wagner R, Heni M, Tabák AG et al (2021) Pathophysiology-based subphenotyping of individuals at elevated risk for type 2 diabetes. Nat Med 27(1):49–57. https://doi.org/10.1038/s41591-020-1116-9
Ahlqvist E, Storm P, Käräjämäki A et al (2018) Novel subgroups of adult-onset diabetes and their association with outcomes: a data-driven cluster analysis of six variables. Lancet Diabetes Endocrinol 6(5):361–369. https://doi.org/10.1016/S2213-8587(18)30051-2
Pigeyre M, Hess S, Gomez MF et al (2021) Validation of the classification for type 2 diabetes into five subgroups: a report from the ORIGIN trial. Diabetologia 65:206–215. https://doi.org/10.1007/s00125-021-05567-4
Mansour Aly D, Dwivedi OP, Prasad RB et al (2021) Genome-wide association analyses highlight etiological differences underlying newly defined subtypes of diabetes. Nat Genet 53:1534–1542. https://doi.org/10.1038/s41588-021-00948-2
Slieker RC, Donnelly LA, Fitipaldi H et al (2021) Distinct molecular signatures of clinical clusters in people with type 2 diabetes: an IMI-RHAPSODY study. Diabetes 70(11):2683–2693. https://doi.org/10.2337/db20-1281
Herder C, Maalmi H, Strassburger K et al (2021) Differences in biomarkers of inflammation between novel subgroups of recent-onset diabetes. Diabetes 70(5):1198–1208. https://doi.org/10.2337/db20-1054
Zaghlool SB, Halama A, Stephan N et al (2022) Metabolic and proteomic signatures of type 2 diabetes subtypes in an Arab population. Nat Commun 13(1):7121. https://doi.org/10.1038/s41467-022-34754-z
Peng X, Huang J, Zou H et al (2022) Roles of plasma leptin and resistin in novel subgroups of type 2 diabetes driven by cluster analysis. Lipids Health Dis 21(1):7. https://doi.org/10.1186/s12944-022-01623-z
Gerstein HC, Paré G, McQueen MJ, Lee SF, Hess S, ORIGIN Trial Investigators (2017) Validation of the ORIGIN cardiovascular biomarker panel and the value of adding troponin I in dysglycemic people. J Clin Endocrinol Metab. https://doi.org/10.1210/jc.2017-00273
Pencina MJ, D’Agostino RB, D’Agostino RB, Vasan RS (2008) Evaluating the added predictive ability of a new marker: from area under the ROC curve to reclassification and beyond. Stat Med 27(2):157–172; discussion 207-212. https://doi.org/10.1002/sim.2929
Kundu S, Aulchenko YS, van Duijn CM, Janssens ACJW (2011) PredictABEL: an R package for the assessment of risk prediction models. Eur J Epidemiol 26(4):261–264. https://doi.org/10.1007/s10654-011-9567-4
R Core Team (2019) R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria
Li J, Lu X, Cheng K, Liu W (2020) StepReg: stepwise regression analysis. R package version 1.4.1. Available from https://CRAN.R-project.org/package=StepReg
Warnes GR, Bolker B, Bonebakker L et al (2020) gplots: various R programming tools for plotting data. R package version 3.0.3. Available from https://CRAN.R-project.org/package=gplots
Kundu S, Aulchenko YS, Janssens ACJW (2020) PredictABEL: assessment of risk prediction models. R package version 1.2-4. Available from https://CRAN.R-project.org/package=PredictABEL
Mandrekar JN (2010) Receiver operating characteristic curve in diagnostic test assessment. J Thorac Oncol 5(9):1315–1316. https://doi.org/10.1097/JTO.0b013e3181ec173d
Guccio N, Gribble FM, Reimann F (2022) Glucose-dependent insulinotropic polypeptide-A postprandial hormone with unharnessed metabolic potential. Annu Rev Nutr 42:21–44. https://doi.org/10.1146/annurev-nutr-062320-113625
Liu C, Debnath N, Mosoyan G et al (2022) Systematic review and meta-analysis of plasma and urine biomarkers for CKD outcomes. J Am Soc Nephrol 33(9):1657–1672. https://doi.org/10.1681/ASN.2022010098
Acknowledgements
We are thankful to all the participants for contributing to this project. These data were presented as a poster (1102-P) at the 2022 American Diabetes Association meeting.
Authors’ relationships and activities
MP reports no conflicts of interest. HG holds the McMaster-Sanofi Population Health Institute Chair in Diabetes Research and Care. He reports research grants from Eli Lilly and Company, AstraZeneca, Merck, Novo Nordisk and Sanofi; honoraria for speaking from AstraZeneca, Boehringer Ingelheim, Eli Lilly, Novo Nordisk, DKSH, Zuellig, Roche, Sanofi, Jiangsu Hanson and Carbon Brand; and consulting fees from Abbott, AstraZeneca, Boehringer Ingelheim, Eli Lilly, Novo Nordisk, Sanofi, Kowa, Pfizer, Hanmi and Viatris. EA has received consulting fees from Novo Nordisk and support for research from AstraZeneca. SH is an employee and shareholder of Sanofi. GP has received consulting fees from Sanofi, Bristol Myers Squibb, Lexicomp and Amgen and support for research through his institution from Sanofi. The study funder was not involved in the design of the study or the collection and analysis of data, and did not impose any restrictions regarding the publication of the report. An employee of the funder (SH) proposed using the biomarker sub-study in the ORIGIN cohort for further diabetes subgroup classification and contributed to the interpretation and writing of the report.
Contribution statement
SH proposed using the biomarker sub-study in the ORIGIN cohort for further diabetes subgroup classification. MP, GP and HG designed the study, planned the analyses, interpreted the results and wrote the manuscript. MP performed the statistical and bioinformatics analyses. EA contributed to analysis and interpretation of data. SH suggested including C-peptide in the biomarker panel and coordinated the biomarker screen at Myriad RBM, Inc (Austin, TX, USA). SH and EA contributed substantially to revising the manuscript. All authors contributed to the critical reading and revision of the manuscript. All authors have approved the submitted version of this manuscript. HG is the guarantor of this work.
Funding
The ORIGIN trial and biomarker project were supported by Sanofi. The biomarker project was led by ORIGIN investigators at the Population Health Research Institute (Hamilton, Canada) with the active collaboration of Sanofi scientists. Sanofi directly compensated Myriad RBM for measurement of the biomarker panel and the Population Health Research Institute for scientific, methodological and statistical work. MP was supported by the E.J. Moran Campbell Internal Career Research Award from McMaster University. GP is supported by the Canada Research Chair in Genetic and Molecular Epidemiology and the CISCO Professorship in Integrated Health Biosystems. EA is supported by the Swedish Research Council (2020-02191) and the Novo Nordisk foundation (NNF21OC0070457).
Supplementary information
ESM
(XLSX 224 kb)
Cite this article
Pigeyre, M., Gerstein, H., Ahlqvist, E. et al. Identifying blood biomarkers for type 2 diabetes subtyping: a report from the ORIGIN trial. Diabetologia 66, 1045–1051 (2023). https://doi.org/10.1007/s00125-023-05887-7
DOI: https://doi.org/10.1007/s00125-023-05887-7