Introduction

In the area of biological drug treatments for patients with psoriasis, several comparative studies have already dealt with the effectiveness of these agents [1–5], but no such studies have been conducted on their safety [6].

In the present study, we examined comparative safety data from randomized trials evaluating adalimumab, ustekinumab and etanercept in patients with moderate to severe psoriasis. We applied Bayesian network meta-analysis to synthesize this information, determine the statistical significance of differences between active treatments, and rank the treatments according to safety end-points.

Methods

Clinical Material

The clinical material for our analysis was derived from published randomized trials in which adalimumab, ustekinumab or etanercept was used to treat patients with moderate to severe psoriasis. Only dosages consistent with the summary of product characteristics were considered. Two safety end-points (adverse events, AE) were evaluated: (a) any serious AE and (b) any infectious AE.

Literature Search

The literature search, conducted in PubMed, covered the last 10 years. Only randomized controlled trials (RCTs), as defined by PubMed, that evaluated the safety of adalimumab, ustekinumab or etanercept were eligible for our analysis. The search string was “(ustekinumab OR adalimumab OR etanercept) AND safety”.

Data Synthesis and Analysis

We employed Bayesian network meta-analysis [7–9]. For combining direct and indirect comparisons, this “all-in-one” approach is increasingly used and can now be considered the current standard. Its main advantage over the traditional frequentist approach [9] is that all the treatments being compared are incorporated into a single model, whereas most frequentist approaches (e.g. the Bucher method [9]) require as many separate analyses as there are comparisons. A further advantage of the Bayesian approach is that it allows the treatments to be ranked. Finally, instead of the confidence intervals of frequentist analysis, the Bayesian output reports credible intervals, which can be interpreted directly: a 95 % credible interval is the range within which the true effect lies with 95 % posterior probability.
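For reference, the Bucher adjusted indirect comparison [9] of two active agents A and B through a common comparator C (e.g. placebo) combines two separate pairwise estimates on an additive scale such as the log odds ratio or the risk difference. The standard form is sketched below; this frequentist calculation was not part of our analysis.

```latex
\hat{d}_{AB} = \hat{d}_{AC} - \hat{d}_{BC}, \qquad
\operatorname{Var}(\hat{d}_{AB}) = \operatorname{Var}(\hat{d}_{AC}) + \operatorname{Var}(\hat{d}_{BC}), \qquad
95\%\ \text{CI} = \hat{d}_{AB} \pm 1.96\sqrt{\operatorname{Var}(\hat{d}_{AB})}
```

Each indirect comparison requires its own calculation of this kind, which is why the frequentist approach multiplies the number of separate analyses.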

Bayesian analysis formally combines a prior probability distribution, which reflects prior beliefs about the possible values of the effect of interest, with the likelihood of the observed data, to obtain a posterior distribution. In the absence of prior data, probabilities are assigned by using vague, flat or non-informative priors (generally small numbers between 0 and 3).
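The combination of prior and likelihood can be illustrated with a minimal Python sketch for a single adverse-event probability p. It assumes hypothetical counts (3 patients with an AE among 100 treated, not taken from the trials analysed) and a flat prior, multiplies prior and binomial likelihood on a grid, normalizes the product to obtain the posterior, and reads off an equal-tailed 95 % credible interval.

```python
# Grid approximation of Bayes' rule for one adverse-event probability p.
# Hypothetical data (illustrative only, not from the trials analysed here).

def grid_posterior(r, n, grid_size=10_001, prior=lambda p: 1.0):
    """Return (grid, normalised posterior weights) for p in [0, 1].

    The default prior is flat (uniform), i.e. non-informative.
    """
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    # Unnormalised posterior = prior x binomial likelihood kernel p^r (1 - p)^(n - r).
    unnorm = [prior(p) * p**r * (1 - p) ** (n - r) for p in grid]
    total = sum(unnorm)
    return grid, [w / total for w in unnorm]


def credible_interval(grid, weights, lower=0.025, upper=0.975):
    """Equal-tailed credible interval read off the cumulative posterior."""
    cum, lo, hi = 0.0, None, None
    for p, w in zip(grid, weights):
        cum += w
        if lo is None and cum >= lower:
            lo = p
        if hi is None and cum >= upper:
            hi = p
    return lo, hi


r, n = 3, 100  # hypothetical: 3 patients with an AE among 100 treated
grid, post = grid_posterior(r, n)
mean = sum(p * w for p, w in zip(grid, post))
lo, hi = credible_interval(grid, post)
# With a flat prior the exact posterior is Beta(r + 1, n - r + 1), mean (r + 1) / (n + 2).
exact_mean = (r + 1) / (n + 2)
print(f"posterior mean {mean:.4f} (exact {exact_mean:.4f}); 95% CrI {lo:.3f} to {hi:.3f}")
```

The grid is used only for transparency; with a flat prior the same posterior is available in closed form as a Beta distribution, which the comparison with `exact_mean` confirms.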

The Bayesian model adopted for our analysis [7–9] was developed by the NICE Decision Support Unit (UK) and is available in fixed-effect and random-effects versions for the WinBUGS software. Both are fitted by Markov chain Monte Carlo (MCMC) simulation, which generates chains of random draws from the posterior distribution. Each chain must be run long enough for the model to converge, and the initial iterations (the burn-in) are discarded before posterior probabilities are estimated. We ran the fixed-effect model using, as a binary outcome, the number of AE (any serious AE and any infectious AE) in each arm of each study. Randomization within each study was preserved by specifying each arm of each study separately, thus accounting for the effect of the comparator.
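The following Python sketch shows how an arm-based fixed-effect binomial model with study-specific baselines can be fitted by MCMC. It is not the WinBUGS code used in our analysis. The network (drugs "A" and "B" versus placebo), the event counts, the vague normal priors with standard deviation 100 and the component-wise random-walk Metropolis sampler are all assumptions of the illustration. Three chains start from different initial values, the burn-in is discarded, and the Gelman–Rubin statistic (R-hat close to 1 suggests convergence) is computed.

```python
import math
import random

# Hypothetical network (illustrative counts only): (treatment, events, patients);
# the first arm of each study is its baseline.
STUDIES = [
    [("placebo", 10, 100), ("A", 25, 100)],
    [("placebo", 12, 120), ("A", 33, 120)],
    [("placebo", 8, 90), ("B", 15, 90)],
]
TREATMENTS = ["A", "B"]  # placebo is the reference, d = 0


def log_posterior(mu, d):
    """logit(p_ik) = mu_i + d[t_ik] - d[t_i1]; binomial likelihood + N(0, 100^2) priors."""
    lp = 0.0
    for i, study in enumerate(STUDIES):
        base = study[0][0]
        for treat, r, n in study:
            eta = mu[i] + d.get(treat, 0.0) - d.get(base, 0.0)
            # log p = -log(1 + e^-eta); log(1 - p) = -log(1 + e^eta)
            lp -= r * math.log1p(math.exp(-eta)) + (n - r) * math.log1p(math.exp(eta))
    for x in list(mu) + list(d.values()):
        lp -= 0.5 * (x / 100.0) ** 2
    return lp


def run_chain(init, n_iter=10_000, burn_in=2_000, step=0.3, seed=0):
    """Component-wise random-walk Metropolis; returns post-burn-in draws of d."""
    rng = random.Random(seed)
    mu = [init] * len(STUDIES)
    d = {t: init for t in TREATMENTS}
    current = log_posterior(mu, d)
    draws = {t: [] for t in TREATMENTS}
    for it in range(n_iter):
        # Update each study baseline mu_i in turn.
        for i in range(len(mu)):
            old = mu[i]
            mu[i] = old + rng.gauss(0.0, step)
            proposed = log_posterior(mu, d)
            if math.log(1.0 - rng.random()) < proposed - current:
                current = proposed
            else:
                mu[i] = old
        # Update each treatment effect (log odds ratio versus placebo) in turn.
        for t in TREATMENTS:
            old = d[t]
            d[t] = old + rng.gauss(0.0, step)
            proposed = log_posterior(mu, d)
            if math.log(1.0 - rng.random()) < proposed - current:
                current = proposed
            else:
                d[t] = old
        if it >= burn_in:  # keep only iterations after the burn-in
            for t in TREATMENTS:
                draws[t].append(d[t])
    return draws


def gelman_rubin(chains):
    """Potential scale reduction factor (R-hat) for equal-length chains."""
    m, n = len(chains), len(chains[0])
    means = [sum(c) / n for c in chains]
    grand = sum(means) / m
    b = n / (m - 1) * sum((x - grand) ** 2 for x in means)
    w = sum(sum((x - mc) ** 2 for x in c) / (n - 1) for c, mc in zip(chains, means)) / m
    return math.sqrt(((n - 1) / n * w + b / n) / w)


chains = [run_chain(init, seed=s) for s, init in enumerate([-2.0, 0.0, 2.0])]
results = {}
for t in TREATMENTS:
    pooled = [x for c in chains for x in c[t]]
    results[t] = (sum(pooled) / len(pooled), gelman_rubin([c[t] for c in chains]))
    print(f"d[{t}] posterior mean log OR vs placebo: {results[t][0]:.2f}, R-hat: {results[t][1]:.3f}")
```

Because each arm enters the likelihood separately and the baseline `mu_i` is study-specific, treatment effects are informed only by within-study contrasts. Risk differences can then be derived from the same draws once a baseline risk is specified.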

We planned to run both the fixed-effect and the random-effects model and to choose between them on the basis of the deviance information criterion (DIC), a measure of goodness of fit penalized for model complexity that is computed by WinBUGS; the model with the lower DIC is preferred. Results were expressed as risk differences (RD). We accounted for between-study heterogeneity by applying meta-regression techniques and by generating an index of heterogeneity.
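As an illustration of how DIC trades fit against complexity, the Python sketch below compares two deliberately simple binomial models on hypothetical arm-level counts: a pooled model with one common AE probability and a model with a separate probability per arm. It uses the standard definitions DIC = D̄ + pD, with D̄ the posterior mean deviance and pD = D̄ − D(θ̄) the effective number of parameters. Posterior draws come from conjugate Beta updates under flat Beta(1, 1) priors, an assumption that avoids MCMC here.

```python
import math
import random

EVENTS = [5, 30, 8, 25]  # hypothetical events per arm (illustrative only)
PATIENTS = [100, 100, 100, 100]


def deviance(probs):
    """-2 x binomial log-likelihood; constant terms omitted (they cancel in comparisons)."""
    return -2.0 * sum(r * math.log(p) + (n - r) * math.log(1.0 - p)
                      for r, n, p in zip(EVENTS, PATIENTS, probs))


def dic(draws):
    """draws: list of per-arm probability vectors sampled from the posterior."""
    d_bar = sum(deviance(p) for p in draws) / len(draws)  # posterior mean deviance
    k = len(draws[0])
    theta_bar = [sum(p[j] for p in draws) / len(draws) for j in range(k)]
    p_d = d_bar - deviance(theta_bar)  # effective number of parameters
    return d_bar + p_d, p_d


rng = random.Random(1)
S = 4000
r_tot, n_tot = sum(EVENTS), sum(PATIENTS)
# Pooled model: one probability shared by all arms.
pooled = []
for _ in range(S):
    p = rng.betavariate(1 + r_tot, 1 + n_tot - r_tot)
    pooled.append([p] * len(EVENTS))
# Separate model: an independent probability for each arm.
separate = [[rng.betavariate(1 + r, 1 + n - r) for r, n in zip(EVENTS, PATIENTS)]
            for _ in range(S)]

dic_pooled, pd_pooled = dic(pooled)
dic_sep, pd_sep = dic(separate)
print(f"pooled:   DIC {dic_pooled:.1f}, pD {pd_pooled:.2f}")
print(f"separate: DIC {dic_sep:.1f}, pD {pd_sep:.2f}")
```

With these strongly heterogeneous counts the separate model pays roughly three extra effective parameters but fits much better, so its DIC is lower; the same logic underlies the choice between fixed-effect and random-effects models.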

Both direct and indirect comparisons were considered. Direct comparisons are those for which at least one trial directly comparing the two treatments was available, whereas indirect comparisons are those for which no such trial has been conducted. Each RD was reported with its 2.5–97.5 % credible interval (i.e. the 95 % credible interval), which corresponds to a two-sided significance level of 5 %. Finally, as a sensitivity analysis, we changed the initial values from which each Markov chain Monte Carlo simulation began, as is customary in the Bayesian framework [7–9].
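The link between a credible interval and the 5 % significance threshold can be sketched in Python for a single pairwise comparison. The counts below are hypothetical, and flat Beta(1, 1) priors on each arm's AE risk are an assumption of the illustration: posterior draws of the two risks are subtracted to give draws of the RD, whose 2.5th and 97.5th percentiles form the 95 % credible interval. An interval that contains zero corresponds to a non-significant difference.

```python
import random


def percentile(sorted_xs, q):
    """Linear-interpolation percentile of an already-sorted list, q in [0, 1]."""
    pos = q * (len(sorted_xs) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_xs) - 1)
    return sorted_xs[lo] + (sorted_xs[hi] - sorted_xs[lo]) * (pos - lo)


def rd_credible_interval(r1, n1, r0, n0, draws=20_000, seed=0):
    """Posterior RD = p1 - p0 under flat Beta(1, 1) priors; returns (median, 2.5 %, 97.5 %)."""
    rng = random.Random(seed)
    rd = sorted(rng.betavariate(1 + r1, 1 + n1 - r1) - rng.betavariate(1 + r0, 1 + n0 - r0)
                for _ in range(draws))
    return percentile(rd, 0.5), percentile(rd, 0.025), percentile(rd, 0.975)


# Hypothetical comparison with a small difference: the interval should include zero.
median, lower, upper = rd_credible_interval(r1=4, n1=200, r0=3, n0=200)
print(f"RD {median:.4f} (95% CrI {lower:.4f} to {upper:.4f}); includes zero: {lower <= 0.0 <= upper}")

# Hypothetical comparison with a large difference: the interval should exclude zero.
median2, lower2, upper2 = rd_credible_interval(r1=40, n1=200, r0=10, n0=200)
print(f"RD {median2:.4f} (95% CrI {lower2:.4f} to {upper2:.4f}); includes zero: {lower2 <= 0.0 <= upper2}")
```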

Recent advances in computing power and the development of sophisticated software have greatly facilitated the use of Bayesian statistics. All of our analyses were conducted by using the software package WinBUGS 1.4.3 (Cambridge, UK) in combination with the meta-analysis code developed by the National Institute for Health and Care Excellence [10].

Results

Literature Search and Identification of Included Studies

Our literature search, summarized in Fig. S1, retrieved a total of 192 citations. To assess eligibility further, we examined the full text of 20 articles, of which 13 RCTs met our inclusion criteria. Of these studies, three evaluated adalimumab [11–13], five ustekinumab (45 and 90 mg) [14–18], four etanercept at low and high dose [19–22], and one compared high-dose etanercept with ustekinumab (45 and 90 mg) [23]. All of these trials had a double-blind design and analysed the safety of these treatments in terms of any serious AE or any infectious AE.

Bayesian Network Meta-Analysis

Tables S1 and S2 report, for each drug, the raw incidence data for the end-points of any serious AE and any infectious AE, respectively, derived from the RCTs included in our analysis [11–23].

All of these trials used placebo as the common comparator, with the exception of the study by Griffiths et al. [23], in which the end-point of any serious AE was compared between high-dose etanercept and ustekinumab (45 and 90 mg).

For both end-points, the Bayesian analysis (fixed-effect model) showed no significant difference in any of the indirect head-to-head comparisons between active agents: as shown in Tables S3 and S4, all of the 95 % credible intervals for these comparisons (six for any serious AE and ten for any infectious AE) included zero. The results of the Bayesian random-effects model were nearly identical, but its goodness of fit was slightly worse (data not shown).
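The numbers of comparisons between active agents follow from the number k of active treatments in each network: k = 4 for any serious AE (adalimumab, ustekinumab 45 mg, ustekinumab 90 mg and high-dose etanercept) and k = 5 for any infectious AE (the same four plus low-dose etanercept), as listed in the rankings reported in this section.

```latex
\binom{k}{2} = \frac{k(k-1)}{2}, \qquad
\binom{4}{2} = \frac{4 \cdot 3}{2} = 6, \qquad
\binom{5}{2} = \frac{5 \cdot 4}{2} = 10
```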

Figure 1 shows the results for the end-point of any serious AE, calculated with the Bayesian model for all possible direct and indirect comparisons; the left panel shows the forest plot, and the right panel shows the rankogram, in which the five treatments are compared with one another according to their safety. Table S3 shows the numerical values of the risk differences (with 95 % credible intervals).

Fig. 1

End-point of any serious AE. Left panel: risk differences (with 95 % credible intervals) for all direct and indirect comparisons, calculated with the Bayesian fixed-effect model. Right panel: rankogram comparing the five treatments; rank 1 indicates the lowest safety and rank 5 the highest safety.
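A rankogram of this kind can be derived from posterior draws of each treatment's AE risk: in every draw the treatments are ranked, and the proportion of draws in which a treatment occupies each rank is its rank probability. The Python sketch below follows the figure's convention (rank 1 = lowest safety, i.e. highest AE risk). The treatment labels T1 to T3 and their counts are hypothetical, the posterior draws come from flat-prior Beta distributions, and this is not the WinBUGS code used in our analysis.

```python
import random


def rankogram(draws_by_treatment):
    """Map treatment -> list of rank probabilities (index 0 = rank 1 = lowest safety).

    draws_by_treatment: dict of treatment name -> posterior AE-risk draws (equal length).
    """
    names = list(draws_by_treatment)
    n_draws = len(next(iter(draws_by_treatment.values())))
    counts = {name: [0] * len(names) for name in names}
    for s in range(n_draws):
        # Order treatments from highest to lowest AE risk in this draw,
        # so position 0 corresponds to rank 1 (lowest safety).
        order = sorted(names, key=lambda nm: draws_by_treatment[nm][s], reverse=True)
        for rank_index, name in enumerate(order):
            counts[name][rank_index] += 1
    return {name: [c / n_draws for c in row] for name, row in counts.items()}


rng = random.Random(42)
hypothetical = {  # (events, patients): illustrative only
    "T1": (12, 300),
    "T2": (6, 300),
    "T3": (9, 300),
}
draws = {name: [rng.betavariate(1 + r, 1 + n - r) for _ in range(10_000)]
         for name, (r, n) in hypothetical.items()}
probs = rankogram(draws)
for name, row in probs.items():
    print(name, [round(p, 2) for p in row])
```

Each row sums to one across ranks and each rank sums to one across treatments; plotting the rows as histograms gives the rankogram.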

Figure 2 shows the corresponding results for the end-point of any infectious AE, again for all possible direct and indirect comparisons, with the forest plot in the left panel and the rankogram in the right panel. Table S4 shows the numerical values of the risk differences (with 95 % credible intervals).

Fig. 2

End-point of any infectious AE. Left panel: risk differences (with 95 % credible intervals) for all direct and indirect comparisons, calculated with the Bayesian fixed-effect model. Right panel: rankogram comparing the six treatments; rank 1 indicates the lowest safety and rank 6 the highest safety.

For the end-point of any serious AE, the overall ranking (from highest to lowest safety) was: ustekinumab 45 mg and ustekinumab 90 mg (at the same rank), placebo, and adalimumab and high-dose etanercept (at the same rank). For the end-point of any infectious AE, the overall ranking was: low-dose etanercept, placebo, ustekinumab 45 mg and ustekinumab 90 mg, adalimumab, and high-dose etanercept.

Discussion

Our study provided a synthesis of the safety data of the subcutaneous biological drugs available for the treatment of moderate to severe psoriasis, determined the statistical significance of differences between active treatments, and defined their respective rankings. Since five different subcutaneous treatments are available and have been tested in RCTs, this comprehensive picture of the current therapeutic evidence may be of interest from several viewpoints.

In our view, the information on relative rankings (together with the underlying probabilistic analysis) is our most interesting result. In particular, for the end-point of any serious AE, ustekinumab at both dosages ranked first, with probabilistic results close to those observed with placebo. For the end-point of any infectious AE, low-dose etanercept ranked better than the other treatments and high-dose etanercept ranked last; however, none of these indirect comparisons between active agents reached statistical significance. Bayesian models offer two complementary ways of interpreting the results. On the one hand, the statistical tests derived from Bayesian models can be interpreted in the same way as in frequentist analysis (e.g. the dichotomy between significant and non-significant results); on the other hand, the probabilistic analysis on which the ranking histograms are based provides another way of interpreting the results, in which the descriptive component tends to prevail over the statistical component.

The strengths of our study include, first, the originality of the methodological approach, inasmuch as this is the first ‘all-in-one’ Bayesian meta-analysis conducted on this specific topic. A second strength is our choice to evaluate all the biologicals currently available for subcutaneous use, rather than focusing on a single agent (as other published papers have done).

Our study had some limitations. First, since we adopted the end-point definitions employed in the original studies, we cannot rule out differences in these definitions across studies. From an examination of the included studies, the definitions of any serious AE proved to be quite consistent across trials, whereas the definitions of any infectious AE showed more between-study heterogeneity. This is consistent with our finding that the credible intervals were generally wider in Fig. 2 than in Fig. 1. Second, end-points other than those examined in our analysis (e.g. the incidence of allergic reactions) could contribute to the safety profile of these treatments.

In conclusion, our results provide original information for a better interpretation of the safety profile of these five treatments. Overall, our findings indicate that the expected incidence of AE should not be the main criterion for selecting a specific agent, since the differences tend to be small and lack statistical significance. Other criteria should therefore prevail, including the rapidity of effect [6], the dosing schedule and, last but not least, the cost.