Introduction

Second-generation sulfonylureas are a mainstay of type 2 diabetes mellitus (T2DM) management. Although their use is slowly declining1, this pharmacologic class remains the most common add-on to metformin2,3. Cardiovascular effects of sulfonylureas continue to be debated. Findings from the recently completed Cardiovascular Outcome Trial of Linagliptin Versus Glimepiride in Type 2 Diabetes (CAROLINA)4 may eventually assuage concerns about glimepiride-associated all-cause and cardiovascular death5 reported in prior clinical trials and meta-analyses6,7,8,9,10, but little is known about the cardiovascular safety of other sulfonylureas. Further, despite pre-existing data on sulfonylureas’ effects on cardiac physiology11,12,13,14, neither CAROLINA nor the Thiazolidinediones or Sulfonylureas Cardiovascular Accidents Intervention Trial (TOSCA.IT) provided data on serious arrhythmogenic endpoints like sudden cardiac arrest (SCA) and ventricular arrhythmia (VA), and the Glycemia Reduction Approaches in Diabetes (GRADE) study will not provide them. Thus, there exists a major knowledge gap regarding the arrhythmogenic effects of individual second-generation sulfonylureas. We therefore set out to assess the real-world comparative safety of glyburide, glimepiride, and glipizide by conducting two incident user cohort studies, in independent United States (US) populations of Medicaid and commercial health insurance beneficiaries, to elucidate associations with SCA and VA.

Our decision to ask this clinical question in independent healthcare claims datasets acknowledges the importance of replicability in pharmacoepidemiology15,16,17,18. A historic lack of transparency in reporting detailed study methods and key operational decisions has hindered direct (i.e., reproducing a study result using the same database) and conceptual (i.e., reproducing a study finding using a different database and subsequently a different population) replication16 and confused decision makers. We focused on conceptual replication because of its broader impact and potential to advance the T2DM evidence base16. Yet, conceptual replication can be challenging because of inherent differences in populations under study and/or databases used; data source choice can substantively affect results of nonexperimental population-based studies19. While examples of both randomized and nonrandomized studies with the same research question but conflicting results abound20,21,22,23,24, few have assessed probable causes of their inconsistencies. We therefore evaluated the same clinical question in independent populations of second-generation sulfonylurea users, compared this to our prior results on the topic25, and explored potential reasons for similarities and differences among findings.

Results

Baseline characteristics

We identified 268,094 glipizide users, 124,354 glimepiride users, and 231,958 glyburide users in Medicaid and 206,034, 151,229, and 134,677 users, respectively, in Optum (Supplementary Tables 1 and 2). Users were similar in age across datasets (median = 57.8 years in Medicaid, 58.2 years in Optum) (Table 1). The majority of users in Medicaid (59.8%), but not Optum (43.9%), were female. Users of white race accounted for 56.9% of the Optum cohort and 34.6% of the Medicaid cohort. Substantial proportions of users had diagnoses of hypertension (57.9% in Medicaid, 60.5% in Optum), dyslipidemia (42.5% in Medicaid, 56.7% in Optum), ischemic heart disease (20.9% in Medicaid, 14.3% in Optum), and depression (24.6% in Medicaid, 13.9% in Optum). Small proportions had pre-existing cardiomegaly (5.8% in Medicaid, 3.0% in Optum), cardiac conduction disorders (2.0% in Medicaid, 1.6% in Optum), and congenital heart anomalies (1.3% in Medicaid, 0.4% in Optum). Episodes of serious hypoglycemia during baseline were uncommon (2.2% in Medicaid, 0.6% in Optum).

Table 1 Characteristics of second-generation sulfonylurea users in Medicaid and Optum.

Follow-up time and crude incidence rates

During 201,183 person-years (p-y) of follow-up in Medicaid (median follow-up of 46 days per user), we identified 714 SCA/VAs (crude incidence = 3.55 per 1,000 p-y), 375 (52.5%) of which were fatal. During 197,848 p-y of follow-up in Optum (median follow-up of 76 days per user), we identified 385 SCA/VAs (1.95 per 1,000 p-y). In secondary analyses restricted to the first 30 days of follow-up, we identified 325 SCA/VAs (7.52 per 1,000 p-y) in Medicaid and 110 (3.00 per 1,000 p-y) in Optum.
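The crude incidence rates reported here follow directly from the event counts and person-time. As a minimal arithmetic check (the function name is illustrative and is not taken from the study's own analysis code), the calculation can be sketched as:

```python
def crude_rate_per_1000_py(events: int, person_years: float) -> float:
    """Crude incidence rate expressed per 1,000 person-years."""
    if person_years <= 0:
        raise ValueError("person_years must be positive")
    return 1000.0 * events / person_years


# Event counts and person-years as reported in the text.
medicaid_rate = crude_rate_per_1000_py(714, 201_183)
optum_rate = crude_rate_per_1000_py(385, 197_848)
fatal_share = 375 / 714  # proportion of Medicaid events that were fatal

print(f"Medicaid: {medicaid_rate:.2f} per 1,000 p-y")
print(f"Optum:    {optum_rate:.2f} per 1,000 p-y")
print(f"Fatal share (Medicaid): {fatal_share:.1%}")
```

The 30-day rates are not recomputed because the person-time for that restricted window is not reported in the text.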

Modeled findings

The propensity score included 482 covariates in Medicaid (55 predefined and 427 empirically identified by the high-dimensional propensity score [hdPS] algorithm) and 529 covariates in Optum (70 predefined and 459 empirically identified by the hdPS algorithm) (Supplementary Tables 3–5). Crude and adjusted hazard ratios (aHRs) are presented in Table 2. Notably, the dataset-specific associations between glimepiride (vs. glipizide) and SCA/VA fell on opposite sides of the null, and neither excluded it (Medicaid aHR 1.17, 95% confidence interval [CI] 0.96–1.42; Optum aHR 0.84, 95% CI 0.65–1.08). Similarly, the dataset-specific associations between glyburide (vs. glipizide) and SCA/VA fell on opposite sides of the null, and neither excluded it (Medicaid aHR 0.87, 95% CI 0.74–1.03; Optum aHR 1.11, 95% CI 0.86–1.42). In the Medicaid-only analysis of the secondary outcome, glimepiride (vs. glipizide) was associated with an elevated risk of sudden cardiac death (SCD)/fatal VA (aHR 1.33, 95% CI 1.02–1.75). Results from secondary analyses were generally consistent with the primary analysis (Supplementary Table 6), and we identified no dose-response relationships (Supplementary Table 7).

Table 2 Outcomes, incidence rates, and effect estimates for primary analysis.

Discussion

This study examined the comparative risk of SCA/VA among users of individual second-generation sulfonylureas in two independent US populations. The crude incidence rates of SCA/VA in the Medicaid (3.55 per 1,000 p-y) and Optum (1.95 per 1,000 p-y) populations are similar to those reported in other diabetic populations26,27 and higher than those reported in general populations27,28, potentially explained by the two to fourfold increase in risk of SCA with diabetes26,29. Although analyses of both populations found non-statistically significant differences in the risk of SCA/VA among users of individual second-generation sulfonylureas, the effect estimates were on opposite sides of the null. These results demonstrate the sensitivity of study findings to the specific data source used despite using similar study methods, and the importance of the conceptual replication of findings in multiple populations.

Differences in data availability between databases are an obvious potential source of discordant results. There were two notable data dimensions available in Optum but not in Medicaid. First, Optum provides laboratory results for a subset of beneficiaries. In order to more robustly characterize baseline T2DM severity and SCA/VA risk for Optum beneficiaries, we pre-specified relevant laboratory values to be included in the propensity scores (e.g., blood glucose, HbA1c, serum creatinine, hematocrit, and hemoglobin). Furthermore, we incorporated laboratory values as a data dimension in the hdPS algorithm, allowing for the empirical selection of laboratory values as potential proxies for unmeasured confounders. Laboratory findings were unavailable in Medicaid, and thus were not included as pre-specified or empirically identified covariates in the propensity score estimation. To understand the effect of incorporating laboratory results in the propensity score, we performed a post hoc sensitivity analysis in Optum in which we estimated the propensity score without pre-specified and empirically identified laboratory covariates. The study results were consistent with the primary analysis, suggesting that discrepancies between the Medicaid and Optum study findings were due to other causes. Second, socioeconomic variables (education level, housing, household income, and net worth of primary customer) were available for Optum but not Medicaid beneficiaries. We performed a post hoc sensitivity analysis removing these covariates (in addition to laboratory covariates) from the propensity score estimation; the results were unchanged.

Death dates were available for Medicaid but not Optum beneficiaries, so censoring on death was limited to the Medicaid analyses. Optum beneficiaries who died (for reasons other than the outcome of interest) would likely have been censored on their disenrollment date. This may be evident in differences in censoring reasons by dataset; censoring due to disenrollment occurred in 12.7% of Optum vs. 6.6% of Medicaid beneficiaries, though some beneficiaries will have disenrolled for reasons other than death. Optum beneficiaries under study may have had a lag between dates of death and disenrollment (e.g., if benefits terminate at the end of the calendar month in which a beneficiary dies). This may be reflected in longer observed median follow-up times among Optum vs. Medicaid beneficiaries (76 vs. 46 days). Exposure misclassification would be present during this lag period, since deceased individuals cannot be sulfonylurea-exposed. This misclassification could have resulted in longer follow-up times and, consequently, underestimated incidence rates of the primary outcome among Optum beneficiaries.

Each database represents a distinct underlying population with different characteristics that may contribute to disparate findings. Optum comprises privately-insured, employed individuals across the US, while Medicaid comprises publicly-insured low-income adults, elderly persons, children, pregnant women, and persons with disabilities30. Thus, Medicaid beneficiaries tend to have more comorbidities and to be more socioeconomically disadvantaged and more vulnerable overall than commercially insured beneficiaries. Clinical characteristics of the two study populations reflect this; sulfonylurea users in Medicaid (vs. Optum) were more likely to have ischemic heart disease (20.9% vs. 14.3%), heart failure/cardiomyopathy (12.8% vs. 5.9%), cardiomegaly (5.8% vs. 3.0%), kidney disease (16.2% vs. 13.6%), and prior episodes of serious hypoglycemia (2.2% vs. 0.6%). These conditions are recognized determinants of SCA risk in persons with diabetes11; the dissimilar profiles of users may therefore portend differential imbalance in unmeasured confounders and, in turn, residual confounding. However, use of data-adaptive hdPS methods should have mitigated this.

International Classification of Diseases, 9th Revision, Clinical Modification (ICD-9-CM) codes are commonly used to measure outcomes in healthcare database studies but vary in their ability to validly capture the diagnoses they represent31. To capture our primary outcome as accurately as possible, we used a validated algorithm for identifying outpatient-originating SCA/VA events. This algorithm was validated against medical records in a Medicaid population, yielding an overall positive predictive value (PPV) of 85.3% and event-specific PPVs of 92.3% for SCA and 74.4% for VA32. We can expect this algorithm to perform similarly in our Medicaid study, but its transportability to a commercial claims dataset like Optum is undetermined because the algorithm’s specificity and sensitivity are unknown. Since the algorithm’s overall PPV can change with a varied outcome prevalence or a varied distribution of outcome-defining events (i.e., proportion of SCA events vs. VA events), the algorithm may perform differently in Optum. In fact, both overall outcome prevalence (0.08% vs. 0.11%) and proportion of outcome-defining SCA events (59.7% vs. 79.0%) differed in Optum compared to Medicaid, suggesting an altered PPV for identifying SCA/VA in Optum. Varied PPVs between the two databases could have led to differing rates of outcome misclassification, a potential contributor to discrepant results. Additionally, the International Classification of Diseases, 10th Revision, Clinical Modification (ICD-10-CM) component of the algorithm, used in the Optum but not Medicaid analysis, has not been validated and may have contributed to a different PPV. However, only 9% of total events in Optum were captured using ICD-10-CM codes, suggesting that this may not have been a principal driver of the discrepancy.
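The dependence of PPV on outcome prevalence and on the mix of outcome-defining events can be illustrated with a short sketch. It rests on loudly stated assumptions. The sensitivity and specificity values are hypothetical, since the text states they are unknown. The case-mix calculation also assumes that the event-specific PPVs validated in Medicaid (92.3% for SCA, 74.4% for VA) carry over unchanged, which the study does not establish. Function and variable names are illustrative.

```python
def ppv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Positive predictive value from Bayes' theorem."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


def mix_weighted_ppv(share_sca: float,
                     ppv_sca: float = 0.923,
                     ppv_va: float = 0.744) -> float:
    """Overall PPV as a case-mix weighted average of event-specific PPVs."""
    return share_sca * ppv_sca + (1.0 - share_sca) * ppv_va


# HYPOTHETICAL sensitivity and specificity, chosen only for illustration.
SENS, SPEC = 0.80, 0.9999

# Outcome prevalence as reported in the text (0.11% Medicaid, 0.08% Optum).
for name, prev in [("Medicaid", 0.0011), ("Optum", 0.0008)]:
    print(f"{name}: PPV at prevalence {prev:.2%} = {ppv(SENS, SPEC, prev):.3f}")

# Reported share of outcome-defining SCA events (79.0% Medicaid, 59.7% Optum).
# ASSUMPTION: event-specific PPVs are identical in both databases.
for name, share in [("Medicaid", 0.790), ("Optum", 0.597)]:
    print(f"{name}: mix-weighted PPV = {mix_weighted_ppv(share):.3f}")
```

Under fixed sensitivity and specificity, a lower prevalence yields a lower PPV, and a smaller share of SCA events pulls the mix-weighted PPV toward the lower VA value. Both effects point in the direction the paragraph describes, though the magnitudes here depend entirely on the assumed inputs.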

Other potential reasons for incongruous findings may be subtle; thus, assessing their impact is difficult. Formulary considerations, for example, could affect results if medication coverage restrictions were made for health-related reasons. Among second-generation sulfonylurea users under study, there were notably fewer glyburide users in Optum (27.4%) vs. Medicaid (37.1%). This imbalance may have been driven by differing coverage restrictions on glyburide between insurance plans, possibly due to glyburide’s increased comparative risk of serious hypoglycemia among second-generation sulfonylurea agents in elderly patients33. Such a restriction could have resulted in a channeling effect in which a specific subset of beneficiaries was less likely to be exposed to glyburide. Evidence of such formulary considerations, however, is not directly attainable from the data.

Notwithstanding differences in availability of laboratory, socioeconomic, and death data, Medicaid and Optum both provided data on medical diagnoses, procedures, and medication dispensings, allowing for similar methodologic approaches to be used in the two studies. Nonetheless, there may be differences in documentation completeness within each of these data dimensions, as only health services billed to the particular insurance plan were recorded. If beneficiaries of one insurance plan had fewer health services billed to that insurance (e.g., beneficiaries using secondary insurance coverage or paying out-of-pocket) than beneficiaries of the other plan, there would be differential capture of healthcare information between the two databases.

We previously examined the comparative risk of SCA/VA with individual second-generation sulfonylureas in 1999–2010 Medicaid data using similar methods25. Results were generally consistent with the current Medicaid study containing two more years of data, with only two minor differences. First, the primary outcome effect estimate for glyburide shifted slightly towards the null (aHR 0.82, 0.69–0.98 in prior study vs. 0.87, 0.74–1.03 in current study), no longer reaching statistical significance in the current study. This may have been due to differences in variables included in the propensity score model (e.g., the inclusion of the adapted Diabetes Complications Severity Index in the current but not prior study) or changes in prescribing patterns and confounding by indication introduced by the additional two years of data. Second, the secondary outcome effect estimate for glimepiride reached statistical significance in the current study but not in the prior study—potentially due to the increase in sample size. The general agreement between the two Medicaid studies further supports that discrepancies in the current findings may be attributed to differences between databases rather than issues with methodologic reproducibility.

Our studies have limitations. First, a lack of access to biosamples prevented examination of genetic determinants of SCA/VA risk. Second, family history of disease was likely under-ascertained because it relied on diagnostic coding, limiting our ability to adjust for it. Third, sulfonylurea exposure was defined using prescription dispensings and may not reflect ingestion. To partly address this, we conducted sensitivity analyses in which we modified grace periods between prescriptions. Fourth, outcomes may have been under-ascertained due to the inability to capture fatal events that did not result in hospital presentation, potentially biasing effect estimates towards the null. However, prior work suggests that 69–80% of persons experiencing an out-of-hospital cardiac arrest34,35 and up to 88% of persons experiencing a witnessed ventricular tachycardia survive to hospital admission36, although recent registry data from the Cardiac Arrest Registry to Enhance Survival (CARES) suggest poorer survival-to-admission rates (18–49%, depending on presenting characteristics)37. Lastly, although we explored potential reasons for discrepancies in findings between databases, we were unable to directly confirm these effects on study results.

Healthcare claims databases contain both obvious and subtle differences in data availability, population characteristics, and documentation completeness. Our results demonstrate the potential impact of these differences on study findings, despite similarity in study design and analytic methodology. With a growing emphasis on the replication of pharmacoepidemiologic study findings in multiple datasets to inform regulatory decision making and clinical practice, future studies investigating the same clinical question may arrive at differing conclusions. Rather than discount these results, investigators should focus more effort on assessing the specific causes of discrepancies to better elucidate study findings.

Methods

Overview and study populations

We conducted two hdPS-adjusted cohort studies to determine rates of SCA/VA among second-generation sulfonylurea users aged 30–75 years. We excluded younger users because SCA/VA is rare and unlikely to be due to prescription drugs in this population38, and older users because competing comorbidities may mimic SCA/VA. The cohorts consisted exclusively of person-time exposed to glimepiride, glipizide, or glyburide. Data sources included demographic, enrollment, and healthcare claims from: (1) 1999–2012 Medicaid programs of California, Florida, New York, Ohio, and Pennsylvania (~40% of the national Medicaid population), supplemented with Medicare claims for dual-enrollees, and linked to the Social Security Administration Death Master File; and (2) 2000–2016 Optum Clinformatics commercial health insurance data, which includes >71 million commercially insured and Medicare Advantage beneficiaries of the largest US-based private health insurer by market share39. Optum, but not Medicaid, included laboratory results for a subset of individuals. Medicaid, but not Optum, included death dates.

Defining the cohort

We defined cohort entry upon incident use of a second-generation sulfonylurea, i.e., we required a preceding 12-month baseline period devoid of any sulfonylurea dispensing. We excluded observations with the following baseline events: (1) interruption in insurance plan enrollment; (2) any-setting SCA or VA diagnosis (broader than the outcome definition below, to minimize the inclusion of recurrent events); and (3) pregnancy, to avoid channeling bias (pregnant sulfonylurea-treated patients almost exclusively receive glyburide)40. We included only the first observation meeting these criteria, per user.

Follow-up began at cohort entry and continued until the first occurrence of: (1) an outcome (defined below); (2) a diagnosis suggestive of the outcome, but otherwise failing to meet the outcome definition; (3) death (Medicaid only); (4) a >15-day therapy gap for the cohort-defining sulfonylurea; (5) dispensing of a sulfonylurea different from that defining cohort entry (i.e., switching); (6) dispensing of a drug with a known risk of torsade de pointes41; (7) insurance plan disenrollment; (8) pregnancy; or (9) the end of the dataset. We did not censor upon a non-outcome hospitalization, but excluded hospitalized time from follow-up to minimize immeasurable time bias42.
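The end-of-follow-up rule can be sketched as taking the earliest of several candidate dates, one of which is derived from the dispensing history. The sketch below is illustrative only and makes assumptions the text does not confirm: that days' supply is not carried forward from early refills, and that the therapy-gap date is placed at the end of supply plus the 15-day grace period. All names and example dates are hypothetical.

```python
from datetime import date, timedelta


def gap_censor_date(dispensings, grace_days=15):
    """Date on which a therapy gap first exceeds `grace_days`.

    `dispensings` is a list of (dispensing_date, days_supply) tuples for the
    cohort-defining drug. ASSUMPTION: early refills are not carried forward.
    """
    ordered = sorted(dispensings)
    first_date, first_supply = ordered[0]
    covered_until = first_date + timedelta(days=first_supply)
    for disp_date, supply in ordered[1:]:
        if disp_date > covered_until + timedelta(days=grace_days):
            break  # gap longer than the grace period: exposure episode ends
        covered_until = max(covered_until, disp_date + timedelta(days=supply))
    return covered_until + timedelta(days=grace_days)


def follow_up_end(candidate_dates):
    """Earliest non-missing date among competing end-of-follow-up events."""
    observed = {k: d for k, d in candidate_dates.items() if d is not None}
    reason = min(observed, key=observed.get)
    return reason, observed[reason]


# Hypothetical user: two contiguous fills, then a long gap before a third.
fills = [(date(2010, 1, 1), 30), (date(2010, 2, 5), 30), (date(2010, 4, 15), 30)]
gap_date = gap_censor_date(fills)

reason, end = follow_up_end({
    "outcome": None,
    "therapy_gap": gap_date,
    "switch": None,
    "disenrollment": date(2010, 6, 30),
    "end_of_data": date(2016, 12, 31),
})
print(reason, end)
```

Returning the reason alongside the date makes it straightforward to tabulate censoring reasons by dataset. Excluding hospitalized time from follow-up is not handled in this sketch.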

Exposure and covariate ascertainments

We defined exposure by the second-generation sulfonylurea (glimepiride, glipizide, or glyburide) dispensed on the cohort entry date. We excluded sulfonylurea-unexposed individuals from the study. This allowed for direct comparisons between the three exposures of interest, and minimized the potential for selection bias and confounding by indication and other unmeasured patient characteristics, further improving the performance of hdPS43,44. We pre-specified glipizide as an active comparator referent, as animal models showed that it lacks a direct effect on myocardial contractility45,46,47,48,49.

Potential confounders included pre-specified variables and those identified via empiric methods, both of which informed the propensity score. Pre-specified variables included demographics, putative risk factors for SCA/VA, and measures of intensity of health care utilization (e.g., numbers of prescription drugs dispensed)50. Empiric variables included those identified during baseline via a high-dimensional data-adaptive approach51,52 which ranks and selects potential confounders based on their empiric associations with the exposure and outcome as described below.

Outcome ascertainment

The primary outcome was an incident outpatient-originating SCA/VA event precipitating hospital presentation. Outcomes were identified in emergency department or inpatient claims having at least one discharge diagnosis code of interest in the principal or first-listed position (indicative of the reason for presentation/admission) (Supplementary Table 8). The ICD-9-CM component of this algorithm was validated against primary medical records in a Medicaid population and had a PPV of ~85% for identifying outpatient-originating SCA/VA not due to extrinsic causes32. We did not study inpatient-originating SCA/VA because: (1) sulfonylureas are rarely used in the hospital setting; (2) in-hospital arrhythmogenic events are often attributable to causes other than ambulatory drug exposures; and (3) neither Medicaid nor Optum record inpatient drug exposures. The secondary outcome was the subset of primary outcomes that were fatal, i.e., SCA/VA events in which the person died on the day of or the day after the event occurrence (Medicaid only).

Statistical analysis

We calculated descriptive statistics for baseline variables, crude incidence rates, and unadjusted association measures, the latter via Cox proportional hazards models. We utilized a semi-automated, data-adaptive hdPS approach (an algorithm identifying proxies for important confounders53) to reduce the impact of measured and unmeasured potential confounders. We used pairwise hdPS to identify potential confounders for each sulfonylurea of interest vs. glipizide (the referent) and included all such empirically identified variables (and pre-specified variables) in a multinomial propensity score model. We first identified the 200 most prevalent diagnosis, procedure, and drug codes in each identified dimension to assess their associations with the sulfonylurea of interest (vs. glipizide) and with the outcome. We then used these associations to select the top 500 codes with the largest potential for confounding. Then, the union of all confounders arising from the two sets of 500 hdPS-identified variables (one for each sulfonylurea of interest vs. glipizide) was included in the multinomial PS, modeled using multinomial logistic regression53. Differences in selected covariates by exposure group were assessed using weighted conditional standardized differences54. We included the propensity score and calendar year of cohort entry in each Cox proportional hazards regression outcome model, then calculated PS-adjusted conditional HRs and 95% CIs. Analyses were conducted using SAS version 9.4 (SAS Institute, Cary, NC). The institutional review board of the University of Pennsylvania approved this research.
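The ranking and union steps of the pairwise hdPS procedure can be sketched as follows. The text does not name the ranking formula. This sketch assumes the Bross multiplicative bias term commonly used in published hdPS implementations, which may differ from what the authors' SAS code applied. The covariate names, prevalences, and relative risks are invented toy values, and `top_k` is set to 2 here rather than the 500 described in the text.

```python
import math


def bross_bias(p_c1: float, p_c0: float, rr_cd: float) -> float:
    """Bross multiplicative bias for a binary covariate.

    p_c1, p_c0: covariate prevalence among users of the drug of interest
                and among referent (glipizide) users.
    rr_cd:      covariate-outcome relative risk.
    """
    if rr_cd < 1.0:
        rr_cd = 1.0 / rr_cd  # treat protective covariates symmetrically
    return (p_c1 * (rr_cd - 1.0) + 1.0) / (p_c0 * (rr_cd - 1.0) + 1.0)


def select_covariates(candidates: dict, top_k: int) -> list:
    """Keep the top_k covariates ranked by |log(bias)|, largest first."""
    ranked = sorted(
        candidates,
        key=lambda name: abs(math.log(bross_bias(*candidates[name]))),
        reverse=True,
    )
    return ranked[:top_k]


# TOY inputs: name -> (p_c1, p_c0, rr_cd). Not study data.
glimepiride_vs_glipizide = {
    "dx_A": (0.30, 0.10, 2.0),
    "rx_B": (0.20, 0.20, 3.0),
    "px_C": (0.05, 0.15, 1.5),
}
glyburide_vs_glipizide = {
    "dx_A": (0.12, 0.10, 2.0),
    "rx_B": (0.40, 0.20, 3.0),
    "lab_D": (0.10, 0.11, 0.5),
}

selected_1 = select_covariates(glimepiride_vs_glipizide, top_k=2)
selected_2 = select_covariates(glyburide_vs_glipizide, top_k=2)

# Union of both pairwise selections feeds the multinomial propensity score.
union = sorted(set(selected_1) | set(selected_2))
print(union)
```

Taking the union means a covariate that is a strong confounder for only one of the two pairwise contrasts still enters the shared multinomial model. The preceding prevalence filter (the 200 most prevalent codes per dimension) and the model fitting itself are outside the scope of this sketch.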