Metformin and sulphonylurea (second‐ or third‐generation) combination therapy for adults with type 2 diabetes mellitus

Kasper S Madsen; Pernille Kähler; Lise Katrine Kähler; Sten Madsbad; Maria‐Inti Metzendorf; Bernd Richter; Bianca Hemmingsen

doi:10.1002/14651858.CD012368


Version published: 26 September 2016


Abstract

This is a protocol for a Cochrane Review (Intervention). The objectives are as follows:

To assess the effects of metformin and sulphonylurea (second‐ or third‐generation) combination therapy for adults with type 2 diabetes mellitus.

Background

A number of medical organisations have developed guidelines or recommendations for treatment of type 2 diabetes mellitus (T2DM). Most people with T2DM are initially advised to reduce calorie intake and increase physical activity in order to improve glycaemic control (ADA 2016). However, in order to achieve and maintain specific glycaemic targets, the majority of people with T2DM will require pharmacological glucose‐lowering interventions. Metformin is currently the first‐line glucose‐lowering drug for people with T2DM because of its postulated benefits, including absence of weight gain, or even weight loss, and lack of hypoglycaemia (Inzucchi 2012; Nathan 2009). If behavioural interventions such as diet and exercise and maximum tolerated doses of one oral glucose‐lowering drug fail to achieve the glycaemic target, other glucose‐lowering drugs are often added (ADA 2016). As T2DM is a progressive condition, a substantial proportion of people with T2DM will, with time, require insulin. Some guidelines recommend continuing metformin in this situation (ADA 2016).

As described below, people with T2DM have an elevated risk of developing macrovascular as well as microvascular complications (Almdal 2004). In the treatment of people with T2DM, consideration has been given to which glycaemic target is appropriate in order to lower the risk of these complications. One hypothesis so far has been that lower glycosylated haemoglobin A1c (HbA1c) values are associated with reduced macrovascular and microvascular complications. However, this paradigm has been challenged by a Cochrane review investigating intensive glycaemic control compared to conventional glycaemic control in people with T2DM (Hemmingsen 2011). In this review, authors found insufficient information to confirm or exclude a risk reduction in macrovascular as well as microvascular complications with intensive compared to conventional glycaemic control.

Description of the condition

Worldwide, the number of people with diabetes was estimated to be 177 million in 2000 and is foreseen to rise to 366 million in 2030 (Wild 2004). T2DM accounts for 90% of people with diabetes and is associated with excess bodyweight and physical inactivity (WHO 2015). T2DM is characterised by hyperglycaemia, insulin resistance and impaired insulin secretion (LeRoith 2002). Although the definition of T2DM relies on elevated blood glucose, T2DM occurs not in isolation but as part of a complex metabolic‐cardiovascular syndrome that includes dyslipidaemia, hypertension, obesity, clotting abnormalities, microalbuminuria and accelerated atherosclerosis; not every one of these disorders occurs in every person with T2DM (DeFronzo 1999). People with T2DM have an elevated risk of developing macrovascular disease (such as cardiovascular death, myocardial infarction, stroke and peripheral ischaemia) as well as microvascular complications (such as retinopathy, nephropathy and neuropathy) (Almdal 2004).

Description of the intervention

Since the introduction of the sulphonylureas in the 1950s, these glucose‐lowering drugs have been the mainstay in the treatment of T2DM. The first to be introduced on the market were the first‐generation sulphonylureas (acetohexamide, carbutamide, chlorpropamide, tolazamide and tolbutamide). Later, the second and third generations of sulphonylureas were introduced, and have now almost completely replaced the first‐generation sulphonylureas (Harrower 2000). The second‐generation sulphonylureas (e.g. glibenclamide (in the US: glyburide), glipizide and gliclazide) and the third‐generation sulphonylureas (gliclazide modified release (MR), glipizide gastrointestinal therapeutic system (GITS) and glimepiride) are thought to have a better safety profile (Harrower 2000). In the late 1950s the biguanide metformin was introduced as another glucose‐lowering drug (Bailey 1996).

Metformin is usually the first‐choice glucose‐lowering drug if diet and exercise are insufficient to control T2DM. However, in case of metformin intolerance or contraindications, sulphonylureas might be prescribed as monotherapy. Sulphonylureas are mostly prescribed as part of combination therapy with other glucose‐lowering drugs, especially metformin (ADA 2016). All sulphonylureas are administered orally. The daily dose recommended in people with T2DM depends on the specific sulphonylurea compound. The sulphonylureas have different pharmacokinetic profiles due to differences in binding to the sulphonylurea receptor in the pancreatic β‐cells. Chlorpropamide has a half‐life of 36 hours, whereas glimepiride has a half‐life of around 5 hours (McCall 2001). Metformin has an estimated plasma half‐life of 1.5 to 4.9 hours (Bailey 1996). Because of variations in the half‐life of the different sulphonylureas, some have to be taken once daily and others two or three times daily. For glimepiride, the recommended dose is up to 6 mg/d (Langtry 1998). For gliclazide the daily dose is 30 mg to 120 mg (Deacon 2015; Harrower 2000a). Metformin is likewise administered orally. Titration of metformin begins with a low dose (500 mg) taken once or twice per day with meals. The maximum recommended dose is up to 1000 mg twice daily (Nathan 2009).

Adverse effects of the intervention

All sulphonylureas have the potential to cause hypoglycaemia. The risk of hypoglycaemia differs between the different types of sulphonylureas and some agents like glibenclamide are prone to cause prolonged hypoglycaemia. The risk of hypoglycaemia is more pronounced for the first‐generation sulphonylureas than the newer generations of sulphonylureas (Harrower 2000). In people with T2DM receiving metformin, gastrointestinal adverse effects, including abdominal discomfort and diarrhoea, are the most common adverse events, occurring in 20% to 30% of patients. Because metformin does not increase insulin secretion, hypoglycaemia is uncommon in people with T2DM taking metformin monotherapy (DeFronzo 1999). Previously, metformin was considered to be contraindicated in many chronic conditions, due to an increased risk of lactic acidosis. However, a Cochrane review has concluded that there is no evidence that metformin is associated with an increased risk of lactic acidosis and the list of contraindications for metformin use should be reassessed (Salpeter 2010).

The University Group Diabetes Program (UGDP) trial suggested that tolbutamide was associated with adverse cardiovascular effects compared with placebo and insulin in people with T2DM (UGDP 1976). Later, other randomised clinical trials did not demonstrate clear evidence of an increased risk of cardiovascular events with sulphonylurea use compared with other glucose‐lowering drugs in people with T2DM (ADOPT 2006; UKPDS‐33 1998). Several observational studies have indicated increased mortality and risk of cardiovascular disease with sulphonylurea monotherapy compared with metformin monotherapy in people with T2DM (Morgan 2014; Roumie 2012; Schramm 2011). However, the risk seems to depend on the type of sulphonylurea (Pantalone 2012; Schramm 2011). Moreover, because confounding factors in observational studies may be uncontrolled, undetected or both, the results of these studies have to be verified by RCTs (Deeks 2003).

A UKPDS substudy showed that the early addition of metformin in sulphonylurea‐treated patients was associated with an increased risk of mortality compared with continuation of sulphonylurea alone (UKPDS‐34 1998). Several observational studies have investigated the association between the combination of metformin and sulphonylureas and the risk of cardiovascular disease and mortality. Overall, these studies show conflicting results (Evans 2006; Gulliford 2004; Johnson 2002; Kahler 2007).

How the intervention might work

The primary mechanism of action for the sulphonylureas is to stimulate insulin release from the pancreatic β‐cells. Sulphonylureas increase pancreatic insulin release by closing adenosine triphosphate‐sensitive potassium (K‐ATP) channels in the β‐cells (Harrower 2000; Scott 2012). Metformin is thought to increase insulin sensitivity, which may lead to a variety of metabolic effects. Inhibition of hepatic glucose production (through increased hepatic sensitivity to insulin) is regarded as the principal mechanism through which metformin lowers blood glucose (Krentz 2005). The enzyme adenosine 5´‐monophosphate‐activated protein kinase (AMPK) has been identified as a target of the drug. Through phosphorylation of key proteins affecting energy production, AMPK regulates and coordinates cellular glucose and lipid metabolism (Krentz 2005).

Why it is important to do this review

Several studies have investigated the combination therapy of metformin and sulphonylureas and the risk of cardiovascular disease and mortality (Evans 2006; Gulliford 2004; Johnson 2002; Kahler 2007; UKPDS‐34 1998). However, the data are primarily based on observational studies and show conflicting results. Therefore, it is still unclear whether metformin and sulphonylurea in combination increase the risk of cardiovascular disease and mortality. Guidelines suggest flexibility in choosing the next drug after metformin monotherapy failure (ADA 2016). It therefore remains to be clarified which drug class is the most suitable second line, since most people with T2DM will need a combination therapy over time in order to achieve glycaemic targets. This systematic review aims to evaluate whether sulphonylureas are the best choice of combination therapy with metformin.

Objectives

To assess the effects of metformin and sulphonylurea (second‐ or third‐generation) combination therapy for adults with type 2 diabetes mellitus.

Methods

Criteria for considering studies for this review

Types of studies

We will include randomised controlled clinical trials (RCTs).

Types of participants

Adults aged 18 years or older with type 2 diabetes mellitus (T2DM).

Diagnostic criteria for diabetes mellitus

In order to be consistent with changes in the classification of and diagnostic criteria for diabetes mellitus over the years, the diagnosis should be established using the standard criteria valid at the time the trial commenced (for example ADA 2003; ADA 2008; WHO 1998). Ideally, the diagnostic criteria should have been described. We will use the trial authors' definition of diabetes mellitus if necessary. We plan to subject diagnostic criteria to a sensitivity analysis.

Types of interventions

We plan to investigate the following comparisons of intervention versus control/comparator.

Intervention

Metformin plus sulphonylurea (second‐ or third‐generation) combination therapy.

Comparator

Metformin plus another glucose‐lowering intervention as a combination therapy (e.g. metformin plus dipeptidyl peptidase‐4 inhibitor, metformin plus insulin).
Metformin plus placebo.
Metformin monotherapy.

Concomitant interventions will have to be the same in both the intervention and comparator groups to establish fair comparisons.

If a trial includes multiple arms, we will include any arm that meets the review's inclusion criteria.

Minimum duration of intervention

We will include trials with a minimum duration of intervention of 52 weeks. We are primarily interested in patient‐important outcomes and therefore focus on longer‐term trials, because macrovascular and microvascular complications develop over time.

Minimum duration of follow‐up

The minimum duration of follow‐up will be 52 weeks. Extended follow‐up periods (also called open‐label extension studies), defined as follow‐up of participants after the original trial has ended as specified in its power calculation, are frequently of an observational nature and will only be evaluated for adverse events (Buch 2011; Megan 2012).

Summary of specific exclusion criteria

We will exclude combinations of more than two glucose‐lowering agents.
We will exclude people diagnosed with gestational diabetes.
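
The eligibility rules stated above can be sketched as a simple screening check. This is an illustrative sketch only and not part of the review methods: the `TrialRecord` fields and the `exclusion_reasons` function are hypothetical names, and only the thresholds (RCT design, adults aged 18 years or older, at least 52 weeks of intervention and of follow‐up, no more than two glucose‐lowering agents, no gestational diabetes) are taken from the text.

```python
from dataclasses import dataclass
from typing import List

# Thresholds taken from the eligibility criteria in the text.
MIN_AGE_YEARS = 18
MIN_WEEKS = 52
MAX_GLUCOSE_LOWERING_AGENTS = 2


@dataclass
class TrialRecord:
    """Hypothetical screening record; the field names are illustrative only."""
    is_rct: bool
    youngest_participant_age: int
    includes_gestational_diabetes: bool
    glucose_lowering_agents: int  # number of agents combined in the arm of interest
    intervention_weeks: int
    follow_up_weeks: int


def exclusion_reasons(trial: TrialRecord) -> List[str]:
    """Return the reasons a trial fails the stated criteria (empty list = eligible)."""
    reasons = []
    if not trial.is_rct:
        reasons.append("not a randomised controlled trial")
    if trial.youngest_participant_age < MIN_AGE_YEARS:
        reasons.append("participants younger than 18 years")
    if trial.includes_gestational_diabetes:
        reasons.append("gestational diabetes")
    if trial.glucose_lowering_agents > MAX_GLUCOSE_LOWERING_AGENTS:
        reasons.append("more than two glucose-lowering agents")
    if trial.intervention_weeks < MIN_WEEKS:
        reasons.append("intervention shorter than 52 weeks")
    if trial.follow_up_weeks < MIN_WEEKS:
        reasons.append("follow-up shorter than 52 weeks")
    return reasons
```

Returning the list of reasons, rather than a single yes/no value, keeps a record of why a trial was excluded.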

Types of outcome measures

We will not exclude a trial only on the basis that one or several of our primary or secondary outcome measures were not reported in the publication. If none of our primary or secondary outcomes were reported, we will not include the trial but will provide some basic information about it in an additional table.

Primary outcomes

All‐cause mortality
Health‐related quality of life
Serious adverse events

Secondary outcomes

Cardiovascular mortality
Non‐fatal myocardial infarction
Heart failure
Non‐fatal stroke
Amputation of lower extremity
Blindness or severe vision loss
End‐stage renal disease
Non‐serious adverse events
Hypoglycaemia
Socio‐economic effects

Additional explorative outcomes

Weight
Glycosylated haemoglobin A1c (HbA1c)

Method and timing of outcome measurement

All‐cause mortality: defined as death from any cause and measured at any time after participants were randomised to intervention/comparator groups.
Health‐related quality of life: defined as mental and physical health‐related quality of life as separate and combined, evaluated by a validated instrument such as Short‐Form 36. Measured at the end of the intervention and at the end of follow‐up.
Serious adverse events: defined according to the International Conference on Harmonization Guidelines as any event that leads to death, is life‐threatening, requires in‐patient hospitalisation or prolongation of existing hospitalisation, results in persistent or significant disability, and any important medical event which may jeopardise the patient or require intervention to prevent it (ICH 1997), or as reported in trials. Measured at any time after participants were randomised to intervention/comparator groups.
Cardiovascular mortality: defined as death from myocardial infarction, heart failure or stroke. Measured at the end of the intervention and at the end of follow‐up.
Non‐fatal myocardial infarction, heart failure, non‐fatal stroke, amputation of lower extremity, blindness or severe vision loss, hypoglycaemia (mild, moderate, serious): defined as reported in trials. Measured at the end of the intervention and at the end of follow‐up.
End‐stage renal disease: defined as dialysis, renal transplantation or death due to renal disease. Measured at the end of the intervention and at the end of follow‐up.
Non‐serious adverse events: defined as number of participants with any untoward medical occurrence not necessarily having a causal relationship with the intervention. Measured at any time after participants were randomised to intervention/comparator groups.
Weight and HbA1c: measured in kg and %, respectively, at the end of the intervention and at the end of follow‐up.
Socio‐economic effects: for example costs of the intervention, absence from work, medication consumption. Measured at the end of the intervention and at the end of follow‐up.

Specification of key prognostic variables

Ethnicity
Obesity
Hypertension
Previous gestational diabetes
Age
Existing cardiovascular disease
Kidney disease

Summary of findings

We will present a 'Summary of findings' table to report the following outcomes, listed according to priority.

All‐cause mortality.
Cardiovascular mortality.
Serious adverse events (including serious/severe hypoglycaemia).
Non‐fatal stroke.
Non‐fatal myocardial infarction.
Microvascular complications (end‐stage renal disease, blindness or severe vision loss, amputation of lower extremity).
Health‐related quality of life.

Search methods for identification of studies

Electronic searches

Recently, the Agency for Healthcare Research and Quality (AHRQ) published a systematic review and meta‐analysis (Maruthur 2016) based on an extensive AHRQ report (Bolen 2016), in which the authors evaluated the comparative effectiveness and safety of glucose‐lowering interventions for people with T2DM, including metformin‐based combination therapies. This report included search results from several databases up to April 2015 and a further update of the MEDLINE search up to December 2015.

We will base our search on the results of this systematic AHRQ report and add new references identified by a revised search strategy from 2015 onwards, in the following literature databases. We will place no restrictions on the language of publication.

Cochrane Central Register of Controlled Trials (CENTRAL) via the Cochrane Register of Studies Online (CRSO).
MEDLINE Ovid (Epub Ahead of Print, In‐Process & Other Non‐Indexed Citations, Ovid MEDLINE(R) Daily and Ovid MEDLINE(R); from 1946 onwards).
Embase Ovid (from 1974 onwards).

Additionally we will search the following trials registers and will place no restrictions on the date of the record.

ClinicalTrials.gov.
World Health Organization International Clinical Trials Registry Platform (ICTRP) (www.who.int/trialsearch/).

We will continuously apply a MEDLINE (Ovid SP) email alert service to identify newly published trials using the same search strategy as described for MEDLINE (for details on search strategies, see Appendix 1). Should we identify new trials for inclusion, we will evaluate these, incorporate the findings into our review and resubmit another review draft (Beller 2013). If we detect additional relevant keywords during any electronic or other searches, we will modify the electronic search strategies to incorporate these terms and will document the changes to the search strategy.

Searching other resources

We will try to identify other potentially‐eligible trials or ancillary publications by searching the reference lists of included trials, systematic reviews, meta‐analyses and health technology assessment reports. In addition we will contact authors of included trials to identify any additional information on the retrieved trials and to determine if further trials exist that we may have missed.

We will also search manufacturers' websites and the databases of regulatory agencies (European Medicines Agency (EMA), US Food and Drug Administration (FDA)) (Hart 2012; Schroll 2015).

We will not use abstracts or conference proceedings for data extraction because this information source does not fulfil the requirements of the Consolidated Standards of Reporting Trials (CONSORT), which is "an evidence‐based, minimum set of recommendations for reporting randomized trials" (CONSORT; Scherer 2007).

Data collection and analysis

Selection of studies

Two review authors (KSM and PK or LK) will independently scan the abstract, title, or both, of every record we retrieve in the literature searches, to determine which trials we should assess further. We will obtain the full text of all potentially‐relevant records. We will resolve any disagreements through consensus or by recourse to an additional review author (BH). If we cannot resolve a disagreement, we will categorise the trial as a 'study awaiting classification' and contact the trial authors for clarification. We will present an adapted PRISMA flow diagram to show the process of trial selection (Liberati 2009).

Data extraction and management

For trials that fulfil inclusion criteria, two review authors (KSM and PK or LK) will independently extract key participant and intervention characteristics. We will report data on efficacy outcomes and adverse events using standard data extraction sheets from the Cochrane Metabolic and Endocrine Disorders (CMED) Group. We will resolve any disagreements by discussion or, if required, by consultation with an additional review author (BH).

We will provide information about potentially‐relevant ongoing trials including trial identifier in the 'Characteristics of ongoing trials' table and in a joint appendix 'Matrix of trial outcome (publications and trial documents)'. We will try to find the protocol for each included trial and will report primary, secondary and other outcomes in comparison with data in publications in a joint appendix.

We will email all authors of included trials to enquire whether they would be willing to answer questions regarding their trials. We will present the results of this survey in an appendix. We will thereafter seek relevant missing information on the trial from the primary author(s) of the article, if required.

Dealing with duplicate and companion publications

In the event of duplicate publications, companion documents or multiple reports of a primary trial, we will maximise the information yield by collating all available data and will use the most complete dataset aggregated across all known publications. We will list duplicate publications, companion documents, multiple reports of a primary trial and trial documents of included trials (such as trial registry information) as secondary references under the study identifier (ID) of the included trial. Furthermore, we will also list duplicate publications, companion documents, multiple reports of a trial and trial documents of excluded trials (such as trial registry information) as secondary references under the study ID of the excluded trial.

Data from clinical trial registers

If data from included trials are available as study results in clinical trial registers such as ClinicalTrials.gov, we will make full use of this information and extract the data. If there is also a full publication of the trial, we will collate and critically appraise all available data. If an included trial is marked as a completed study in a clinical trial register but no additional information is available, we will add this trial to the table 'Characteristics of studies awaiting classification'.

Assessment of risk of bias in included studies

Two review authors (KSM and PK or LK) will independently assess the risk of bias of each included trial. We will resolve any disagreements by consensus, or by consultation with an additional review author (BH). If disagreement persists, we will consult the rest of the group and make a judgement based on consensus. If adequate information is not available from trial reports, trial protocols or both, we will contact trial authors for missing data on 'Risk of bias' items.

We will use the Cochrane 'Risk of bias' assessment tool (Higgins 2011a; Higgins 2011b) and will judge 'Risk of bias' criteria as either 'low', 'high', or 'unclear' risk. We will evaluate individual bias items as described in the Cochrane Handbook for Systematic Reviews of Interventions (Higgins 2011a) where any of the specified criteria for a judgement of 'low', 'unclear' or 'high' risk of bias justifies the associated categorisation.

Random sequence generation (selection bias due to inadequate generation of a randomised sequence) ‐ assessment at trial level

We will describe for each included trial the method used to generate the allocation sequence in sufficient detail to allow an assessment of whether it should produce comparable groups.

Low risk of bias: sequence generation was achieved using computer random number generation or a random number table. Drawing of lots, tossing a coin, shuffling cards or envelopes, and throwing dice are adequate if performed by an independent person not otherwise involved in the trial. Use of the minimisation technique will be considered as equivalent to being random.
Unclear risk of bias: insufficient information about the sequence generation process.
High risk of bias: the sequence generation method was non‐random (e.g. sequence generated by odd or even date of birth; sequence generated by some rule based on date (or day) of admission; sequence generated by some rule based on hospital or clinic record number; allocation by judgement of the clinician; allocation by preference of the participant; allocation based on the results of a laboratory test or a series of tests; allocation by availability of the intervention).

Allocation concealment (selection bias due to inadequate concealment of allocation prior to assignment) ‐ assessment at trial level

We will describe for each included trial the method used to conceal allocation to interventions prior to assignment and will assess whether intervention allocation could have been foreseen in advance of, or during recruitment, or changed after assignment.

Low risk of bias: central allocation (including telephone, interactive voice‐recorder, web‐based and pharmacy‐controlled randomisation); sequentially‐numbered drug containers of identical appearance; sequentially‐numbered, opaque, sealed envelopes.
Unclear risk of bias: insufficient information about the allocation concealment.
High risk of bias: using an open random allocation schedule (e.g. a list of random numbers); assignment envelopes were used without appropriate safeguards; alternation or rotation; date of birth; case record number; any other explicitly unconcealed procedure.

We will also evaluate trial baseline data to incorporate assessment of baseline imbalance into the 'Risk of bias' judgement for selection bias (Corbett 2014). Chance imbalances might also affect judgements on the risk of attrition bias. In case of unadjusted analyses we will distinguish between studies rated as at low risk of bias on the basis of both randomisation methods and baseline similarity, and studies rated as at low risk of bias on the basis of baseline similarity alone (Corbett 2014). We will reclassify judgements of unclear, low or high risk of selection bias as specified in Appendix 2.

Blinding of participants and study personnel (performance bias due to knowledge of the allocated interventions by participants and personnel during the trial) ‐ assessment at outcome level

We will evaluate the risk of detection bias separately for each outcome (Hróbjartsson 2013). We will note whether outcomes were self‐reported, investigator‐assessed or adjudicated outcome measures (see below).

Low risk of bias: blinding of participants and key study personnel ensured, and unlikely that the blinding could have been broken; no blinding or incomplete blinding, but the review authors judge that the outcome is not likely to be influenced by lack of blinding.
Unclear risk of bias: insufficient information about the blinding of participants and study personnel; the trial did not address this outcome.
High risk of bias: no blinding or incomplete blinding, and the outcome is likely to be influenced by lack of blinding; blinding of trial participants and key personnel attempted, but likely that the blinding could have been broken, and the outcome is likely to be influenced by lack of blinding.

Blinding of outcome assessment (detection bias due to knowledge of the allocated interventions by outcome assessors) ‐ assessment at outcome level

Low risk of bias: blinding of outcome assessment ensured, and unlikely that the blinding could have been broken; no blinding of outcome assessment, but the review authors judge that the outcome measurement is not likely to be influenced by lack of blinding.
Unclear risk of bias: insufficient information about the blinding of outcome assessors; the trial did not address this outcome.
High risk of bias: no blinding of outcome assessment, and the outcome measurement is likely to be influenced by lack of blinding; blinding of outcome assessment, but likely that the blinding could have been broken, and the outcome measurement is likely to be influenced by lack of blinding.

Incomplete outcome data (attrition bias due to amount, nature or handling of incomplete outcome data) ‐ assessment at outcome level

We will describe for each included trial, and for each outcome, the completeness of data including attrition and exclusions from the analysis. We will state whether attrition and exclusions were reported and the number included in the analysis at each stage (compared with the number of randomised participants per intervention/comparator groups), if reasons for attrition or exclusion were reported, and whether missing data were balanced across groups or were related to outcomes. We will consider the implications of missing outcome data per outcome such as high drop‐out rates (e.g. above 15%) or disparate attrition rates (e.g. difference of 10% or more between trial arms).

Low risk of bias: no missing outcome data; reasons for missing outcome data unlikely to be related to true outcome (for survival data, censoring unlikely to be introducing bias); missing outcome data balanced in numbers across intervention groups, with similar reasons for missing data across groups; for dichotomous outcome data, the proportion of missing outcomes compared with observed event risk not enough to have a clinically‐relevant impact on the intervention effect estimate; for continuous outcome data, plausible effect size (difference in means or standardised difference in means) among missing outcomes not enough to have a clinically‐relevant impact on observed effect size; appropriate methods, such as multiple imputation, were used to handle missing data.
Unclear risk of bias: insufficient information to assess whether missing data in combination with the method used to handle missing data were likely to induce bias; the trial did not address this outcome.
High risk of bias: reason for missing outcome data likely to be related to true outcome, with either imbalance in numbers or reasons for missing data across intervention groups; for dichotomous outcome data, the proportion of missing outcomes compared with observed event risk enough to induce clinically‐relevant bias in intervention effect estimate; for continuous outcome data, plausible effect size (difference in means or standardised difference in means) among missing outcomes enough to induce clinically‐relevant bias in observed effect size; ‘as‐treated’ or similar analysis done with substantial departure of the intervention received from that assigned at randomisation; potentially inappropriate application of simple imputation.
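
The example attrition thresholds mentioned above (drop‐out above 15%, or a difference of 10% or more between trial arms) can be sketched as a small check. This is an illustrative sketch under stated assumptions, not the review authors' procedure: the function name `attrition_summary` and the arm labels are hypothetical, the 15% threshold is applied per arm, and the 10% difference is read as 10 percentage points between the highest and lowest arm‐level drop‐out rates. Exact fractions are used so that boundary values such as exactly 10% are not affected by floating‐point rounding.

```python
from fractions import Fraction
from typing import Dict

# Example thresholds quoted in the text; their interpretation is an assumption.
DROPOUT_LIMIT = Fraction(15, 100)     # "above 15%", applied per arm
DIFFERENCE_LIMIT = Fraction(10, 100)  # "difference of 10% or more", in percentage points


def attrition_summary(randomised: Dict[str, int], analysed: Dict[str, int]) -> dict:
    """Compute per-arm drop-out rates and flag high or disparate attrition."""
    rates = {
        arm: Fraction(randomised[arm] - analysed[arm], randomised[arm])
        for arm in randomised
    }
    return {
        "rates": rates,
        "high_dropout": any(rate > DROPOUT_LIMIT for rate in rates.values()),
        "disparate_attrition": (
            max(rates.values()) - min(rates.values()) >= DIFFERENCE_LIMIT
        ),
    }
```

A flag raised here would only prompt closer inspection; the judgement of low, unclear or high risk still depends on the reasons for missing data and how they were handled, as described in the criteria above.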

Selective reporting (reporting bias due to selective outcome reporting) ‐ assessment at trial level

We will assess outcome reporting bias by integrating the results of the appendix 'Matrix of trial outcomes (publications and trial documents)' (Boutron 2014; Mathieu 2009), with those of the appendix 'High risk of outcome reporting bias according to ORBIT classification' (Kirkham 2010). This analysis will form the basis for the judgement of selective reporting.

Low risk of bias: the trial protocol is available and all of the trial’s prespecified (primary and secondary) outcomes that are of interest in the review have been reported in the prespecified way; the study protocol is not available but it is clear that the published reports include all expected outcomes (ORBIT classification).
Unclear risk of bias: insufficient information about selective reporting.
High risk of bias: not all of the trial’s prespecified primary outcomes have been reported; one or more primary outcomes is reported using measurements, analysis methods or subsets of the data (e.g. subscales) that were not prespecified; one or more reported primary outcomes were not prespecified (unless clear justification for their reporting is provided, such as an unexpected adverse effect); one or more outcomes of interest in the review are reported incompletely so that they cannot be entered in a meta‐analysis; the trial report fails to include results for a key outcome that would be expected to have been reported for such a trial (ORBIT classification).

Other bias (bias due to problems not covered elsewhere) ‐ assessment at trial level

Low risk of bias: the trial appeared to be free of other sources of bias.
Unclear risk of bias: insufficient information to assess whether an important risk of bias existed; insufficient rationale or evidence that an identified problem introduced bias.
High risk of bias: had a potential source of bias related to the specific trial design used; has been claimed to have been fraudulent; had some other serious problem.

We will present a 'Risk of bias' graph and a 'Risk of bias' summary figure.

We will distinguish between self‐reported, investigator‐assessed and adjudicated outcome measures.

We define the following outcomes as self‐reported.

Health‐related quality of life.
Non‐serious adverse events.
Hypoglycaemia, if reported by participants.
Weight, if reported by participants.

We define the following outcomes as investigator‐assessed.

All‐cause mortality.
Serious adverse events.
Cardiovascular mortality.
Non‐fatal myocardial infarction.
Heart failure.
Non‐fatal stroke.
Amputation of lower extremity.
Blindness or severe vision loss.
End‐stage renal disease.
Hypoglycaemia, if measured by trial personnel.
Socio‐economic effects.
Weight, if measured by trial personnel.
HbA1c.

Summary assessment of risk of bias

Risk of bias for a trial across outcomes: some risk of bias domains, such as selection bias (sequence generation and allocation sequence concealment), affect the risk of bias across all outcome measures in a trial. In case of high risk of selection bias, all endpoints investigated in the associated trial will be marked as high risk. Otherwise, we will not perform a summary assessment of the risk of bias across all outcomes for a trial.

Risk of bias for an outcome within a trial and across domains: we will assess the risk of bias for an outcome measure by including all entries relevant to that outcome (i.e. both trial‐level entries and outcome‐specific entries). We consider low risk of bias to denote a low risk of bias for all key domains, unclear risk to denote an unclear risk of bias for one or more key domains and high risk to denote a high risk of bias for one or more key domains.
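
As an illustration only, the within‐trial rule above can be sketched as a small function. The domain names in the usage example are hypothetical, and giving 'high' precedence over 'unclear' when both occur is our reading of the rule rather than something the protocol states explicitly.

```python
def outcome_risk_of_bias(domain_judgements):
    """Combine key-domain judgements for one outcome in one trial.

    domain_judgements maps a domain name to 'low', 'unclear' or 'high'.
    Assumption: 'high' takes precedence over 'unclear' when both occur.
    """
    values = set(domain_judgements.values())
    if "high" in values:
        return "high"      # high risk in one or more key domains
    if "unclear" in values:
        return "unclear"   # unclear risk in one or more key domains
    return "low"           # low risk in all key domains


# Hypothetical example: trial-level and outcome-specific entries together
example = {
    "sequence generation": "low",
    "allocation concealment": "low",
    "blinding of outcome assessment": "unclear",
}
summary = outcome_risk_of_bias(example)
```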

Risk of bias for an outcome across trials and across domains: these are our main summary assessments that we will incorporate into our judgements about the quality of evidence in the 'Summary of findings' tables. We define outcomes as at low risk of bias when most information comes from trials at low risk of bias, unclear risk when most information comes from trials at low or unclear risk of bias, and high risk when a sufficient proportion of information comes from trials at high risk of bias.

Measures of treatment effect

When at least two trials are available for a comparison and a given outcome we will try to express dichotomous data as risk ratio (RR) or odds ratio (OR) with 95% confidence interval (CI). For continuous outcomes measured on the same scale (e.g. weight loss in kg) we will estimate the intervention effect using the mean difference (MD) with 95% CI. For continuous outcomes measuring the same underlying concept (e.g. health‐related quality of life) but using different measurement scales, we will calculate the standardised mean difference (SMD). We will express time‐to‐event data as hazard ratio (HR) with 95% CI.
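
For orientation, a minimal sketch of a risk ratio with a 95% CI for a single two‐arm comparison follows. It uses the usual large‐sample standard error of the log risk ratio; the function name and the example counts are hypothetical, and the analyses themselves would be run in review software rather than with this code.

```python
import math


def risk_ratio(events_t, n_t, events_c, n_c, z=1.96):
    """Risk ratio and CI from 2x2 counts (z=1.96 gives roughly 95%)."""
    rr = (events_t / n_t) / (events_c / n_c)
    # Standard error of log(RR); assumes no zero cells
    se_log = math.sqrt(1 / events_t - 1 / n_t + 1 / events_c - 1 / n_c)
    lower = math.exp(math.log(rr) - z * se_log)
    upper = math.exp(math.log(rr) + z * se_log)
    return rr, lower, upper


# Hypothetical counts: 10/100 events versus 20/100 events
rr, lower, upper = risk_ratio(10, 100, 20, 100)
```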

The scales measuring health‐related quality of life may go in different directions. Some scales increase in values with improved health‐related quality of life, whereas other scales decrease in values with improved health‐related quality of life. To adjust for the different directions of the scales, scales reporting better health‐related quality of life with decreasing values will be multiplied by −1.
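
The direction adjustment can be sketched as follows. This is a simplified example that computes a standardised mean difference from a pooled standard deviation without a small‐sample correction, which may differ from what the review software applies; the function and parameter names are hypothetical.

```python
import math


def smd_aligned(mean_t, sd_t, n_t, mean_c, sd_c, n_c, higher_is_better=True):
    """Standardised mean difference, sign-flipped for 'lower is better' scales."""
    pooled_sd = math.sqrt(
        ((n_t - 1) * sd_t ** 2 + (n_c - 1) * sd_c ** 2) / (n_t + n_c - 2)
    )
    smd = (mean_t - mean_c) / pooled_sd
    # Multiply by -1 when decreasing values mean better quality of life
    return smd if higher_is_better else -smd


same_direction = smd_aligned(10, 2, 20, 8, 2, 20)
flipped = smd_aligned(10, 2, 20, 8, 2, 20, higher_is_better=False)
```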

Unit of analysis issues

We will take into account the level at which randomisation occurred, such as cross‐over trials, cluster‐randomised trials and multiple observations for the same outcome. If more than one comparison from the same trial is eligible for inclusion in the same meta‐analysis, we will either combine groups to create a single pair‐wise comparison or appropriately reduce the sample size so that the same participants do not contribute multiply (splitting the 'shared' group into two or more groups). While the latter approach offers some solution to adjusting the precision of the comparison, it does not account for correlation arising from the same set of participants being in multiple comparisons (Higgins 2011a).

We will attempt to reanalyse cluster‐randomised trials that have not appropriately adjusted for potential clustering of participants within clusters in their analysis. The variance of the intervention effects will be inflated by a design effect (DEFF). Calculation of a DEFF involves estimation of an intra‐cluster correlation (ICC). Estimates of ICCs will be obtained through contact with authors, or imputed using estimates from other included studies that report ICCs, or using external estimates from empirical research (e.g. Bell 2013). We plan to examine the impact of clustering using sensitivity analyses.
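
A minimal sketch of the design‐effect adjustment, assuming the commonly used formula DEFF = 1 + (m − 1) × ICC, where m is the average cluster size. The protocol does not state the formula, so this is our assumption, and the numbers below are invented for illustration.

```python
import math


def design_effect(mean_cluster_size, icc):
    """DEFF = 1 + (m - 1) * ICC (assumed standard formula)."""
    return 1 + (mean_cluster_size - 1) * icc


def inflate_standard_error(se, mean_cluster_size, icc):
    """Inflating the variance by DEFF multiplies the SE by sqrt(DEFF)."""
    return se * math.sqrt(design_effect(mean_cluster_size, icc))


# Hypothetical trial: average of 21 participants per cluster, ICC of 0.05
deff = design_effect(21, 0.05)
adjusted_se = inflate_standard_error(0.1, 21, 0.05)
```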

Dealing with missing data

If possible, we will obtain missing data from trial authors and will carefully evaluate important numerical data such as the numbers of screened and randomly‐assigned participants as well as the intention‐to‐treat (ITT), as‐treated and per‐protocol populations. We will investigate attrition rates (e.g. drop‐outs, losses to follow‐up, withdrawals), and we will critically appraise issues concerning missing data and imputation methods (e.g. last observation carried forward (LOCF)).

In trials where the standard deviation of the outcome is not available at follow‐up or cannot be recreated, we will standardise by the average of the pooled baseline standard deviation from those trials in which this information was reported.

Where included trials do not report means and standard deviations (SDs) for outcomes and we do not receive the needed information from trial authors, we will impute these values by estimating the mean and variance from the median, range, and the size of the sample (Hozo 2005).
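
The sketch below illustrates this kind of imputation. The formulae and sample‐size thresholds are given as we recall them from Hozo 2005 and should be checked against that paper before use; the function names are hypothetical.

```python
import math


def hozo_mean(low, median, high, n):
    """Estimate the mean from the minimum, median, maximum and sample size."""
    if n <= 25:
        return (low + 2 * median + high) / 4
    return median  # assumed adequate for larger samples


def hozo_sd(low, median, high, n):
    """Estimate the SD from the minimum, median, maximum and sample size."""
    if n <= 15:
        variance = ((low - 2 * median + high) ** 2 / 4 + (high - low) ** 2) / 12
        return math.sqrt(variance)
    if n <= 70:
        return (high - low) / 4
    return (high - low) / 6


# Hypothetical trial arm: minimum 2, median 5, maximum 12
mean_small = hozo_mean(2, 5, 12, 20)
sd_medium = hozo_sd(2, 5, 12, 50)
```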

We will investigate the impact of imputation on meta‐analyses by performing sensitivity analyses, and we will report per outcome which trials were included with imputed SDs.

Assessment of heterogeneity

In the event of substantial clinical or methodological heterogeneity, we will not report trial results as the pooled effect estimate in a meta‐analysis.

We will identify heterogeneity (inconsistency) by visually inspecting the forest plots and by using a standard Chi² test with a significance level of α = 0.1. In view of the low power of this test, we will also consider the I² statistic, which quantifies inconsistency across trials to assess the impact of heterogeneity on the meta‐analysis (Higgins 2002; Higgins 2003); an I² statistic ≥ 75% indicates a considerable level of heterogeneity (Higgins 2011a).
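
As a rough sketch, Cochran's Q (the Chi² statistic) and I² can be computed from trial effect estimates and their standard errors using inverse‐variance weights. The P value for Q, obtained from a chi‐squared distribution with k − 1 degrees of freedom, is omitted here; the input values are invented.

```python
def cochran_q_and_i2(effects, std_errors):
    """Return (Q, degrees of freedom, I-squared in percent)."""
    weights = [1 / se ** 2 for se in std_errors]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    # I-squared = (Q - df) / Q, truncated at zero
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return q, df, i2


# Hypothetical effects (e.g. log risk ratios) from two trials
q, df, i2 = cochran_q_and_i2([0.0, 1.0], [0.5, 0.5])
```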

When we find heterogeneity, we will attempt to determine possible reasons for it by examining individual trial and subgroup characteristics.

Assessment of reporting biases

If we include 10 or more trials investigating a particular outcome, we will use funnel plots to assess small‐trial effects. Several explanations may account for funnel plot asymmetry, including true heterogeneity of effect with respect to trial size, poor methodological design (and hence bias of small trials) and publication bias. Therefore we will interpret results carefully (Sterne 2011).

Data synthesis

We plan to undertake (or display) a meta‐analysis only if participants, interventions, comparisons and outcomes are judged to be sufficiently similar to ensure an answer that is clinically meaningful. Unless good evidence shows homogeneous effects across trials, we will primarily summarise low risk of bias data using a random‐effects model (Wood 2008). We will interpret random‐effects meta‐analyses with due consideration to the whole distribution of effects, ideally by presenting a prediction interval (Higgins 2009). A prediction interval specifies a predicted range for the true treatment effect in an individual trial (Riley 2011). For rare events such as event rates below 1% we will use Peto's odds ratio method, provided that there is no substantial imbalance between intervention and comparator group sizes and intervention effects are not exceptionally large. In addition, we will also perform statistical analyses according to the statistical guidelines presented in the Cochrane Handbook for Systematic Reviews of Interventions (Higgins 2011a).
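
To make the random‐effects model and the prediction interval concrete, a sketch follows. It assumes the DerSimonian‐Laird estimator of the between‐trial variance, which the protocol does not name, and it takes the critical value of the t distribution (k − 2 degrees of freedom) as an input supplied by the user. All numbers are invented.

```python
import math


def dersimonian_laird(effects, std_errors):
    """Random-effects pooled estimate, its SE and tau-squared (DL estimator)."""
    k = len(effects)
    w = [1 / se ** 2 for se in std_errors]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - (k - 1)) / c)
    w_re = [1 / (se ** 2 + tau2) for se in std_errors]
    mu = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se_mu = math.sqrt(1 / sum(w_re))
    return mu, se_mu, tau2


def prediction_interval(mu, se_mu, tau2, t_crit):
    """Interval for the true effect in a new trial; t_crit has k - 2 df."""
    half_width = t_crit * math.sqrt(tau2 + se_mu ** 2)
    return mu - half_width, mu + half_width


# Three hypothetical trials; 12.706 is the 97.5% t quantile for 1 df
mu, se_mu, tau2 = dersimonian_laird([0.0, 1.0, 2.0], [0.5, 0.5, 0.5])
pi_low, pi_high = prediction_interval(mu, se_mu, tau2, t_crit=12.706)
```

With so few trials the prediction interval is very wide, which is one reason to interpret it cautiously.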

Trial sequential analyses

In a single trial, sparse data and interim analyses increase the risk of type I and type II errors. To avoid type I errors, group sequential monitoring boundaries are applied to decide whether a trial could be terminated early because of a sufficiently small P value, that is the cumulative Z‐curve crosses the monitoring boundaries (Lan 1983). Likewise, before reaching the planned sample size of a trial, the trial may be stopped due to futility if the cumulative Z‐score crosses the futility monitoring boundaries (Higgins 2011a). Sequential monitoring boundaries for benefit, harm, or futility can be applied to meta‐analyses as well, called trial sequential monitoring boundaries (Higgins 2011; Wetterslev 2008). In a trial sequential analysis (TSA), the addition of each trial in a cumulative meta‐analysis is regarded as an interim meta‐analysis and helps to clarify if significance is reached or futility is reached or whether additional trials are needed (Wetterslev 2008).

TSA combines a calculation of the diversity‐adjusted required information size (cumulated meta‐analysis sample size to detect or reject a specific relative intervention effect) for meta‐analysis with the threshold of data associated with statistics. We will perform TSA on all outcomes included in the 'Summary of findings' table (see above) (Pogue 1997; Wetterslev 2008).

The idea in TSA is that if the cumulative Z‐curve crosses the boundary for benefit or harm before a diversity‐adjusted required information size is reached, a sufficient level of evidence for the anticipated intervention effect has been reached with the assumed type I error and no further trials may be needed. If the cumulative Z‐curve crosses the boundary for futility before a diversity‐adjusted required information size is reached, the assumed intervention effect can be rejected with the assumed type II error and no further trials may be needed. If the Z‐curve does not cross any boundary, then there is insufficient evidence to reach a conclusion. To construct the trial sequential monitoring boundaries, the required information size is needed and is calculated as the least number of participants needed in a well‐powered single trial and subsequently adjusted for diversity among the included trials in the meta‐analysis (Wetterslev 2008). We will apply TSA as it decreases the risk of type I and II errors due to sparse data and multiple updating in a cumulative meta‐analysis, and it provides us with important information in order to estimate the risks of imprecision when the required information size is not reached. Additionally, TSA provides important information regarding the need for additional trials and the required information size of such trials (Wetterslev 2008).

We will apply trial sequential monitoring boundaries according to an estimated clinically important effect. The required information size will be based on an a priori effect corresponding to a 10% relative risk reduction for beneficial effects of the intervention and a 30% relative risk increase for harmful effects of the interventions.

TSA for continuous outcomes will be performed with mean differences, by using the trials applying the same scale to calculate the required sample size. For the continuous outcomes we will test the evidence for the achieved differences in the cumulative meta‐analyses.

For the heterogeneity adjustment of the required information size we will use the diversity (D²) estimated in the meta‐analyses of included trials. If diversity is zero in a meta‐analysis, we will perform a sensitivity analysis using an assumed diversity of 20%, because heterogeneity among trials may change when future trials are included.
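
A sketch of a diversity‐adjusted required information size for a dichotomous outcome is given below. It assumes the usual two‐group sample size formula divided by (1 − D²); the control group event proportion, the type I error (5%) and the type II error (20%) are illustrative assumptions that this section does not specify.

```python
from statistics import NormalDist


def required_information_size(p_control, rrr, alpha=0.05, beta=0.20, diversity=0.0):
    """Participants needed to detect a relative risk reduction (rrr).

    Use a negative rrr for a relative risk increase (e.g. -0.30).
    diversity is D-squared as a proportion (0.20 for 20%).
    """
    p_exp = p_control * (1 - rrr)
    p_mean = (p_control + p_exp) / 2
    delta = p_control - p_exp
    z_sum = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(1 - beta)
    unadjusted = 4 * z_sum ** 2 * p_mean * (1 - p_mean) / delta ** 2
    return unadjusted / (1 - diversity)


# Hypothetical 10% control event proportion, 10% relative risk reduction
ris = required_information_size(0.10, 0.10)
ris_d20 = required_information_size(0.10, 0.10, diversity=0.20)
```

Under these assumptions the unadjusted size is roughly 27,000 participants, and an assumed diversity of 20% increases it by a factor of 1.25.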

Quality of evidence

We will present the overall quality of the evidence for each outcome specified under 'Types of outcome measures: Summary of findings' according to the GRADE approach, which takes into account issues related not only to internal validity (risk of bias, inconsistency, imprecision, publication bias) but also to external validity, such as directness of results. Two review authors (KSM and PK or LK) will independently rate the quality of evidence for each outcome. We will present a summary of the evidence in a 'Summary of findings' table. This will provide key information about the best estimate of the magnitude of the effect, in relative terms and as absolute differences, for each relevant comparison of alternative management strategies, numbers of participants and trials addressing each important outcome and rating of overall confidence in effect estimates for each outcome. We will create the 'Summary of findings' table based on the methods described in the Cochrane Handbook for Systematic Reviews of Interventions by means of Review Manager's table editor (RevMan 2014). We will include an appendix titled 'Checklist to aid consistency and reproducibility of GRADE assessments' (Meader 2014) to help with standardisation of the 'Summary of findings' tables (Higgins 2011a). Alternatively, we will use the GRADEpro Guideline Development Tool (GDT) software (GRADEproGDT 2015) and present evidence profile tables as an appendix. We will present results for the outcomes as described in the Types of outcome measures section. If meta‐analysis is not possible, we will present the results in a narrative format in the 'Summary of findings' table. We will justify all decisions to downgrade the quality of studies using footnotes, and we will make comments to aid the reader's understanding of the Cochrane review where necessary.

Subgroup analysis and investigation of heterogeneity

We expect the following characteristics to introduce clinical heterogeneity, and we plan to carry out the following subgroup analyses including investigation of interactions (Altman 2003).

Trials with a long duration (≥ 2 years) versus trials with a short duration (< 2 years).
Trials including obese participants versus trials including non‐obese participants.
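
A test of interaction between two subgroup estimates can be sketched as a simple z‐test on the difference between the estimates, in the spirit of Altman 2003. Estimates for ratio measures are assumed to be on the log scale, and the numbers are invented.

```python
import math
from statistics import NormalDist


def interaction_test(est_a, se_a, est_b, se_b):
    """Difference between two independent subgroup estimates, z and two-sided P."""
    diff = est_a - est_b
    se_diff = math.sqrt(se_a ** 2 + se_b ** 2)
    z = diff / se_diff
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return diff, z, p_value


# Hypothetical log risk ratios for long-duration versus short-duration trials
diff, z, p_value = interaction_test(-0.4, 0.1, -0.1, 0.1)
```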

Sensitivity analysis

We plan to perform sensitivity analyses to explore the influence of the following factors (when applicable) on effect sizes, by restricting the analysis to the following.

Published trials.
Taking into account risk of bias, as specified in the 'Assessment of risk of bias in included studies' section.
Very long or large trials to establish the extent to which they dominate the results.
Trials using the following filters: diagnostic criteria, imputation, language of publication, source of funding (industry versus other), or country.

We will also test the robustness of results by repeating the analyses using different measures of effect size (RR, OR, etc) and different statistical models (fixed‐effect and random‐effects models).
