Introduction

It has been estimated that almost 14% of all adolescents will meet criteria for a depressive disorder before the age of 18 [1]. Depression in children and adolescents does not only lead to personal suffering in those affected and their families, but it is also associated with increased suicide risk [2] and functional impairment at home, school and society [3, 4]. Several important negative health outcomes in adulthood have been associated with depression in children and adolescents, including poorer self-perceived general health, higher health care utilization and increased work impairment due to physical health [5]. With an estimated prevalence of 2.6% [6] and a much higher and increasing prevalence rate during adolescence, depression is undoubtedly a major public health challenge.

Psychological treatments are considered to be one of the main treatment options for youth depression and meta-analyses have shown that these treatments are indeed effective, although the effects are modest, [7] and smaller than those in adults, especially in younger children [8]. Although most studies have focused on cognitive behaviour therapy (CBT) and to a smaller extent interpersonal therapy (IPT), several other types of therapy have also been examined in randomised controlled trials, including behavioural activation [9], problem-solving therapy [10] and family therapy [11].

Most meta-analyses, however, report the effects of psychotherapies in terms of standardized mean differences (SMD), such as Cohens’ d and Hedges’ g, indicating the difference between the therapy and a control group after the treatment in terms of standard deviation. For patients, their families and clinicians, however, clinical significant change is much more important, because it indicates the chance of getting better after a treatment, and to compare that with the chance of getting better without treatment. The SMD is not very informative in this respect and cannot be seen as an indicator of clinical relevance, because it is still a statistical concept [12, 13].

Binary outcomes, such as response or remission are easier to understand, because they indicate how many patients get substantially better after treatment. Such outcomes are often presented as Relative Risks (RRs) or Odds Ratios (ORs) and indicate the relative benefit of a treatment in comparison to a control condition or another treatment. This is easier to interpret than effect sizes, but these outcomes still do not indicate the chance of getting better when receiving treatment [13].

Simply knowing the chance of getting better with or without a treatment is the most informative outcome for many patients and their families as well as for clinicians. Meta-analyses rarely report these outcomes, however, because heterogeneity is typically very high when proportions are pooled. Nevertheless, the clinical relevance of these outcomes is so high, that we believe that pooling them is still important. Pooling of binary data is also done in other important areas where high levels of heterogeneity are found, such as meta-analyses of prevalence rates [14,15,16].

Unfortunately, most randomised trials examining psychotherapy for depression in youth, usually do not report binary outcomes, but only means and standard deviation of the treatment and control groups. However, there is a well-validated method to estimate binary outcomes in psychotherapy and control conditions using estimates based on the means at baseline, and the means, standard deviations and N at post-test [17]. This method estimates how many patients are scoring above or below a cut-off assuming a normal distribution of the outcome. For example, the cut-off value for response (50% reduction of depressive symptoms from baseline to post-test) can be estimated from the baseline means, by simply taking 50% of the score at baseline. Then, it can be estimated with the means, standard deviation and N at post-test how many participants reached this cut-off value for response, assuming a normal distribution of the outcome measure. This method can also be used to estimate other binary outcomes, as long as they can be estimated with the baseline and post-test measures. In a previous meta-analysis, we found a correlation of 0.94 between the response and remission rates reported in the paper and the estimated rates using this method [18]. This method not only allows to estimate binary outcomes, but also to calculate numbers-needed-to-treat (NNTs), indicating how many patients have to be treated to have one more positive outcome compared to the comparison group [19].

This method also allows to calculate negative effects of psychotherapies. It is now broadly acknowledged that negative effects are a core issue in research and practice of psychological intervention in general [20]. Although it has long been assumed that no harm can be done, because psychotherapy is “only talking”, much research has by now shown that some patients do deteriorate during therapy [20]. To the best of our knowledge, however, negative effects in psychotherapies for depression in youth have hardly been examined. The method described above to estimate binary outcomes based on means at baseline and post-test, and the N and standard deviation at post-test, can also be used to estimate clinically significant deterioration and get a first rough estimate of negative effects of these therapies.

We decided to conduct a meta-analysis of psychological treatments of depression in children and adolescents, aimed at examining binary outcomes using the validated method to estimate these outcomes.

Methods

Search strategy and selection criteria

The protocol for this meta-analysis was registered at the Open Science Framework (https://osf.io/84xka) [21]. We used an existing database of randomised trials on the psychological treatment of depression, which includes trials in adults and in children and adolescents [22]. The database is continuously updated and was developed through a comprehensive literature search (up to Jan 1st, 2021). For this database, we searched four major bibliographical databases (PubMed, PsycINFO, Embase, Cochrane Library) by combining index and free terms indicative of depression and psychotherapies, with filters for randomized controlled trials. The full search string for PubMed is available in Supplement 1 and all search strings can be found at the project’s website (www.metapsy.org). Trials in children and adolescents were also identified through a recent other meta-analysis of psychotherapies in youth [7, 23]. All records were screened by two independent researchers, and all papers that could possibly meet inclusion criteria according to one of the researchers were retrieved as full text. The decision to include or exclude a study in the database was also done by two independent researchers, and any disagreements were solved through discussion and consensus.

For the current meta-analysis, we included (a) randomized trials (b) in which a psychological treatment (c) for depression in children and adolescents (d) was compared with a control group (waitlist, CAU, other control). We included studies in which the presence of a depressive disorder was established using a diagnostic interview as well as studies in which participants had to score above a cut-off on a self-report depression scale. Studies which included both adolescents and adults were excluded from this meta-analysis. No language restrictions were applied.

Quality assessment and data extraction

We assessed the validity of included studies using four criteria of the Cochrane ‘Risk of bias assessment tool [24]: allocation sequence generation; concealment of allocation to conditions; prevention of knowledge of the allocated intervention (masking of assessors); and dealing with incomplete outcome data (this was assessed as low risk when intention-to-treat analyses were conducted). Items were dichotomized as low or high/unclear risk. These assessments were conducted by two independent researchers, and disagreements were solved through discussion.

We also coded participant characteristics, study characteristics, and the time from baseline to outcome.

Outcome measures

Treatment response (50% reduction in depressive symptomatology between baseline and post-test) was the primary outcome [18]. We retrieved all response rates at all time points that were reported in the included studies, but we focused the main analyses on response rates at 2 (± 1) months after baseline, because this was the post-test for most studies and most interventions ended at that time. We clustered the studies according to the time from baseline to post-test, because absolute rates of outcomes are also influenced by spontaneous recovery rates and pooling different times of outcome would introduce considerable heterogeneity. When more than one outcome measure was reported, we selected the outcome according to an algorithm that has been used in previous meta-analyses (meaning that when more than one outcome measure was used, we selected the outcomes with priority for: Hamilton Depression Rating Scale (HAM-D), Beck Depression Inventory I or II (BDI), Children's Depression Inventory (CDI), and the revised Reynolds Adolescent Depression Scale (RADS-R) [8]. If the response rate was not reported in the paper, we estimated it with the well-validated method using estimates based on the means at baseline, and the means, standard deviations and N at post-test [17]. If neither the response rate, nor the data to estimate it were reported, the study was excluded.

The main outcome was the response rate at post-test, assuming that all study drop-outs were non-responders, because this was considered to be the most conservative estimate. We also conducted two sensitivity analyses in which: (a) all participants lost to follow-up were considered as responders, and (b) only study completers were included. We categorized the response rates according to the time between baseline and post-test and selected the post-test at 2 (±1) months after baseline as the main outcome, but also calculated response rates at later follow-up times.

We also calculated the Reliable Change Index, which is a psychometric criterion used to evaluate whether the change between baseline and post-test is considered statistically significant (the difference between baseline and post-test means divided by the standard error of the difference between the two scores is greater than 1.96, conservatively assuming a Cronbach’s alpha of 0.75) [25]. We used the same method to calculate the Reliable Deterioration Index, indicating whether a patient reliably deteriorated, as an indicator of negative effects. Because there were several studies in which all participants met criteria for a depressive disorder at baseline according to a diagnostic interview, we also calculated recovery (the proportion of participants not meeting criteria for a disorder at post-test anymore).

Meta-analyses

We first pooled rates for response, reliable change, reliable deterioration and recovery using the “metaprop” command of the “meta” package in R (version 3.6.3). In these analyses, we synthesized the binomial outcome data by random-effects pooling models after transforming to a logit scale. The pooled summary results were converted to the raw proportion scale, and the estimates and their 95% confidence intervals (CIs) are presented. Because we expected considerable heterogeneity, we employed a random effects pooling model in all analyses, according to the DerSimonian-Laird method. As indicator of heterogeneity, we calculated the I2 statistic and its 95% CI [26]. In addition, we calculated the prediction interval, which indicates the range in which the true effect size of 95% of future studies will fall.

First, we meta-analysed response rates for psychotherapies and control conditions separately at 2 (±1 month) follow-up (our primary outcome). We also pooled response rates assuming that all dropouts are responders, as well as the rates for the completers of the study. To improve the interpretation of the results, we also generated an l’Abbé plot with the response rate in the control group at the horizontal axis and the response rates in the treatment group at the vertical axis [27]. The 45° line indicates no effect.

We then examined the risk of small study effects by testing asymmetry through Egger’s test and adjusted the rates for the small study effects through Duval and Tweedie trim-and-fill procedure (R0 estimator) [28]. We also conducted sensitivity analyses by excluding outliers, defined as studies whose 95% CI of the response rate does not overlap with the 95% CI of the response rate of the pooled studies, by limiting the analyses to those studies with low risk of bias, and by limiting the analyses to those studies that reported response rates in the papers.

In the next step, we meta-analysed the Relative Risk (RR) of response. Then, we calculated the NNT using the pooled RR and the response rate in the control group, as recommended by the Cochrane Collaboration [29].

To examine potential sources of heterogeneity, we conducted subgroup analyses with age category (adolescents; children), recruitment (only clinical samples; other recruitment), diagnosis (diagnosed depressive disorder; subthreshold depression; scoring above a cut-off); type of psychotherapy (CBT; IPT; other), format (individual; group; guided self-help), risk of bias (low; other), control condition (waiting list; usual care; other). These subgroup analyses were conducted separately for the response rates in the psychotherapy conditions, the response rates in the control condition and the RRs.

We conducted the analyses for the pooled response rates at different follow-up times, as well as for reliable change and reliable deterioration.

Results

Selection and inclusion of studies

After examining 27,133 records (19,612 after removal of duplicates), we retrieved 3239 full-text papers for further consideration and excluded 3199 of these. The PRISMA flowchart, including the reasons for exclusion, is presented in Fig. 1. Forty studies including 3779 participants (2029 in the treatment groups and 1750 in the control groups) met our inclusion criteria. References of the included studies are given in Appendix [1].

Fig. 1
figure 1

PRISMA flowchart

Characteristics of included studies

Key characteristics of the included studies are presented in Table 1. Eleven studies were aimed at children up to 12 years of age, while 29 studies were aimed at adolescents (between 12 and 18 years). In 10 studies, all participants were recruited from clinical samples, while the other 30 studies recruited participants through the community or mixed sources. Seventeen studies were aimed at youth with a diagnosed depressive disorder, 16 used a cut-off score on a self-rating depression scale to include participants, and 7 were aimed at youth with a subthreshold depression (depressive symptoms, but not meeting criteria for a diagnosed depressive disorder). The proportion of girls ranged from 32 to 100% (median 60%). The 40 studies included 46 psychotherapy conditions that were compared with a control condition (6 studies included 2 psychotherapy arms). A total of 31 of the 46 psychotherapies were CBT, 6 were IPT and 9 were characterised as another type of therapy (including problem-solving therapy, behavioural activation, family therapy, supportive therapy, among others). Eleven therapies used an individual format, 26 used a group format, 3 a guided self-help format and the other 6 used a mixed format. The number of sessions ranged from 4 to 41 with a median of 12. The control condition used in the 40 studies included waitlist control groups (12 studies), usual care (16 studies) or other control conditions (12 studies). Twenty-five studies were conducted in the United States, 9 in Europe (including the UK), and 6 in other countries. The response rates were reported in only two studies, in the other studies the response rates were estimated with the method of Furukawa and colleagues [17].

Table 1 Selected characteristics of randomised trials on psychotherapies for children and adolescents

Risk of bias was considerable. 21 of the 40 studies reported an adequate sequence generation (52.5%); 11 reported allocation to conditions by an independent party (27.5%); 21 reported using blinded outcome assessors (52.5%) while 14 used only self-report outcomes (35.0%). In 25 studies, intent-to-treat analyses were conducted (62.5%). Eight studies (20.0%) met all quality criteria, 19 studies (47.5%) met 2 or 3 criteria, and 13 met no or only one criterion (32.5%).

Response rates in psychotherapy and control conditions at 2 (±1) months after baseline

The response rate (50% symptom reduction from baseline to post-treatment) at 2 (±1) months after baseline was available for 38 psychotherapy conditions and resulted in an overall response rate of 0.39 (95% CI: 0.34–0.45) (Table 2). The response rate was somewhat higher when it was calculated only on study completers (0.43; 95% CI: 0.37–0.49), and still higher when all study drop-outs were considered responders (0.47; 95% CI: 0.40–0.53). Exclusion of outliers did not materially affect the response rate. However, the sample of studies with low risk of bias resulted in a considerably smaller response rate (0.28; 95% CI: 0.18–0.42), although the confidence interval was wide because of low power. Heterogeneity was high in all analyses, except when outliers were excluded. The prediction interval of the response rate ranged from 0.63 to 4.45.

Table 2 Response rates, relative risks (RRs) and numbers-needed-to-be-treated (NNT) of psychotherapies versus control groups

The pooled response rate in the 32 control conditions was 0.24 (95% CI: 0.19–0.28). It was marginally higher in the completers’ samples (0.25; 95% CI: 0.21–0.30) and somewhat higher when all dropouts were considered responders (0.30; 95% CI: 0.25–0.36). Excluding outliers and adjusting for publication bias resulted again in very comparable rates and it was considerably lower in studies with low risk of bias (0.16; 95% CI: 0.11–0.23). Heterogeneity was moderate to high in all analyses, except after excluding outliers.

The RRs of the therapy versus control conditions for the response rates and the NNTs are also reported in Table 2. The RR for response across all psychotherapies compared to control was 1.67 (95% CI: 1.42–1.96), and the NNT was 6.2 (95% CI: 4.3–9.9). The sensitivity analyses indicated broadly comparable outcomes, although the NNT was considerably larger in the studies with low risk of bias and to a lesser extent after adjustment for publication bias. The forest plot for the RR is presented in Fig. 2. The l’Abbé plot is given in Fig. 3.

Fig. 2
figure 2

Forest plot: RR of response in psychotherapy versus control

Fig. 3
figure 3

L’Abbé plot

Other outcomes

The seven studies which lasted longer than 2 (±1) months but ended between 4 and 6 months after baseline, resulted in a lower response rate for the therapy conditions (0.29; 95% CI: 0.21–0.39) as well as for the control groups (0.19; 95% CI: 0.14–0.25). The relative risk and the NNT was not significant, possibly also because of the small number of studies.

Twelve studies that reported outcomes at 2 (±1) months follow-up also reported outcomes at 6–12 months and this reported in a response rate of 0.44 (95% CI: 0.37–0.51) for the therapy conditions, and 0.33 (95% CI: 0.24–0.43) for the control conditions. With a RR of 1.39 (95% CI: 1.11–1.74), the resulting NNT was 7.8. At 13–24 months follow-up, the response rates in therapy and control groups were very comparable, the RR was almost 1 and was not significant, nor was the NNT.

We were able to calculate the reliable change and deterioration in all 38 studies. The rate for reliable improvement was 0.54 (95% CI: 0.46–0.62) in the psychotherapy conditions and 0.32 (95% CI: 0.26–0.39) in the control conditions, with an RR of 1.59 (95% CI: 1.35–1.88) and a NNT of 5.3 (95% CI: 3.6–8.9). The deterioration rate was low in the therapy condition (0.06; 95% CI: 0.05–0.08) and was 0.13 in the control conditions (95% CI: 0.11–0.16). The RR was 0.40 (95% CI: 0.28–0.57) and the NNT was 12.8 (95% CI: 10.7–17.9), indicating that therapy reduced the chance of clinically significant deterioration.

Six studies (8 comparisons) included participants with a depressive disorder at baseline and reported the proportion no longer meeting these criteria. The proportion not meeting criteria at post-test was 0.58 (95% CI: 0.49–0.67) in the treatment group and 0.36 (95% CI: 0.22–0.54) in the control groups. The RR was 1.84 (95% CI: 0.99–3.44; n.s.) and the NNT was 3.3.

The subgroup analyses are reported in Table 3. No significant differences for the response rates within the psychotherapy conditions were found for any of the subgroup analyses. We did find significant differences in two subgroup analyses of the control conditions: one for diagnosis (the response rate was considerably higher in studies with youth meeting criteria for a depressive disorder or subthreshold depression, compared with studies in which participants had to score above a cut-off on a self-rating scale; p=0.04) and one for risk of bias (studies with low risk of bias resulted in a significantly lower response rate for control groups compared with other studies; p=0.01). For the RRs of the response rate versus control conditions, the only significant difference was found for type of control group (p=0.02).

Table 3 Subgroup analysesa

Discussion

We examined rates for response, reliable improvement, reliable deterioration and recovery for psychotherapies aimed at children and adolescents with depression. We found that on average 39% of children and adolescents respond after getting treatment (at 1(±2) months after randomisation), while 24% respond in control conditions. The RR of responding in psychotherapy versus control was 1.66. The corresponding NNT was 6.3, which roughly means that a total of six patients need to receive therapy to have one more positive outcome, compared to the control group. Sensitivity analyses broadly supported these findings, although the rates were somewhat higher depending on whether dropouts were considered responders or not. Response rates were considerably lower in studies with low risk of bias, both within the psychotherapy and the control conditions and the NNT was considerably higher in these studies (NNT=11.2). Heterogeneity was high in the meta-analyses of response rates within the psychotherapy conditions and within the control conditions, but excluding outliers resulted in a comparable response rate and low heterogeneity.

Overall, the response rates are moderate, with about 60% of those receiving therapy not responding within 2 months. In the control conditions, this was considerably lower, but the additional benefit of therapies above the control condition is still modest. This means that the majority of children and adolescents do not respond to the therapies tested in these studies to date, and a considerable number would also have responded without therapy. These findings make clear that new, more effective treatment are needed to further reduce the burden of depression in these age groups. Future research should also examine potential reasons why children and adolescents do not respond and whether for example enhancing treatment fidelity, optimizing delivery methods, combination treatments, personalised approaches or sequential treatments may increase response rates. It should be noted that it is also important that future studies not only report continuous outcomes, but also binary outcomes such as response and remission, because of the clinical relevance of such outcomes.

The outcomes for reliable change and recovery are somewhat better than those for response, but still almost half of those receiving therapy do not reliably improve. We also found that the effects on response were retained at 6–12-month follow-up, which is encouraging because the effects do not disappear right after the end of therapy.

Effect sizes such as Cohen’s d and Hedges’ g are important indicators of the effectiveness of interventions, indicating the difference between treatment and control groups at post-test in terms of standard deviations. However, a disadvantage of effect sizes is that they do not indicate how many patients get better after treatment and how many in control conditions, although this is exactly the information that patients, parents, and clinicians want to know. This meta-analysis did present such numbers, which made it clear that many children and adolescents do not respond to treatment and that a considerable proportion respond in control groups.

One of the strong points of this study was that we could estimate clinically significant deterioration with the same method across all included studies. To the best of our knowledge, no previous meta-analysis has estimated negative effects of treatments for depression in youth. We found that 6% of youth receiving psychotherapy deteriorated, which was significantly lower than the 13% in the control conditions (NNT=5.1). It is encouraging that deterioration rates are lower in treatment than in control conditions, but 6% is still a large proportion. It is important that clinicians are aware of the fact that a considerable number of children and adolescents deteriorate while receiving treatment, and that strategies should be developed to handle deterioration.

In another study, we examined response rates in psychotherapies for adult depression [30]. We found an overall response rate of psychotherapies of 0.41 at 2 (±1) months follow-up, which is very comparable to the response rate of 0.39 found in the current study. The response rate in the control conditions was somewhat lower in the studies among adults (0.17) than in the studies in children and adolescents (0.24). This suggests that more children and adolescents get better in the control conditions than adults. This could explain that the effects of psychotherapies for depression are smaller in children and adolescents than in adults [8]. These findings have to be considered with caution, however, because of the high heterogeneity of these findings.

This study has several important limitations that should be taken into account when interpreting the effects. The most important limitation is that heterogeneity was very high, especially when estimating the response rates (less so for the RRs). This may be related to characteristics of the included studies that we did not examine in subgroup and metaregression analyses, such as treatment provider and proportion of participants using antidepressants. A complete review of all relevant characteristics is also beyond the scope of this study. Furthermore, such characteristics are often not consistently reported in the papers and reporting on the subsets of studies with clear characteristics could have produced an incomplete and perhaps invalid picture. However, the estimated rates appeared to be relatively robust and resulted in very comparable outcomes, in a series of sensitivity analyses. We also think, as we explained in the introduction that the clinical relevance of these outcomes is substantial and that pooling them is still important, as is also done for example in meta-analyses to estimate the prevalence of mental disorders [14,15,16]. Second, the response rates and the rates for clinically significant improvement and deterioration were based on estimates, using means, standard deviations and N at baseline and post-test. Although this method has been validated and correlates very highly with reported response rates, these are still estimates that may not reflect the actual response rates. “Individual patient data” meta-analyses could have calculated response rates directly. Third, response as outcome has been criticized, because it is depending on the baseline severity score, which may be unreliable [31]. Because other outcomes, such as remission cannot be standardized across different outcome measures, we do think that despite its weaknesses, response is the best measure to make a preliminary estimate of binary outcomes of treatment. Fourth, risk of bias was considerable in the large majority of trials. It was also notable that the studies with low risk of bias resulted in a considerably larger NNT.

Despite the limitations, this study showed that psychotherapies for depression in children and adolescents are effective compared to control conditions, but that still more than half of patients receiving therapy do not respond. Furthermore, a considerable number of those in control groups also respond. More effective treatments and treatments for those not responding to a first treatment are clearly needed.