Introduction

Within the past decade, the individual contribution of distinct etiologies to the burden of liver diseases has switched from viral hepatitis towards non-alcoholic fatty liver disease (NAFLD) [1]. NAFLD constitutes a complex metabolic disorder that manifests with fat accumulation in the cytoplasm of the hepatocyte in the absence of significant alcohol consumption or other causes of liver diseases [2]. Moreover, NAFLD severity ranges from simple steatosis to non-alcoholic steatohepatitis (NASH), usually accompanied by different stages of liver fibrosis, ballooning, and overall chronic inflammation status. This disease has surprisingly become a leading cause of liver cirrhosis and hepatocellular carcinoma worldwide. The risk of liver-related mortality exponentially grows with an increase in fibrosis stages [3]; therefore, the diagnosis of liver fibrosis is one of the first steps when stratifying patients prior to the inclusion in clinical trials. Besides, NAFLD plays a catalytic role in the development of metabolic comorbidities in these multimorbid patients. Significant fibrosis but not simple steatosis or NASH predicts type 2 diabetes mellitus (T2DM) and arterial hypertension in these patients [4].

Despite its enormous prevalence, no therapeutic option has been approved by regulatory agencies yet [5]; therefore, the cornerstone of NAFLD management still relies on lifestyle interventions [6]. In the context of clinical trials, identifying patients at risk of liver-related and non-liver-related complications is difficult. Besides, identifying the most appropriate therapy for NAFLD patients remains a challenge. Recent studies have pointed out that the ideal treatment should address liver fibrosis and NASH jointly. To date, clinical trials have fallen short when testing the efficacy of novel molecular targets due to changes in some uncontrolled variables, such as dysmetabolic comorbidities and/or daily habits, which are neither reported nor adequately represented [7]. Considering hepatic fibrosis as a crucial factor of clinical prognosis and reinforcing the role of inflammation and disease activity as key players in the maintenance of chronicity in this disease, both factors should be taken into account as primary clinical trial endpoints.

Data obtained from NAFLD clinical trials have shown suboptimal results, particularly for liver fibrosis, despite the robust preclinical development of the therapies. Therefore, in this setting, we carried out a meta-analysis to assess the histological response after the experimental treatment versus placebo (including NASH resolution and fibrosis improvement  ≥ 1 stage) and, as a main novelty, the clinical benefits in delaying disease progression.

Methods

Study identification and selection

We conducted our review according to the PRISMA reporting guideline for systematic reviews [8]. One of the reviewers (JA), who had experience in database searches, designed the search strategy, which was subsequently revised by three other investigators (RG, DM, AR). They independently searched MEDLINE (using PUBMED as the search engine), EMBASE, and Cochrane databases and collected all results separately. Disagreements between them were resolved by another investigator (MRG) or by consensus. Databases were used to identify suitable studies that were published up to 1 May 2021. MeSH terms and keywords were used, and the search terms were as follows: NAFLD, MAFLD, NASH, non-alcoholic fatty liver disease, non-alcoholic steatohepatitis, fatty liver, liver fat, steatosis, clinical trial, treatment, therapy, drug, and a combination of those MeSH terms by using the appropriate Boolean logic. The searches were limited to English-language publications with human subjects. A manual search was conducted using the references listed in the original articles and review articles retrieved. Only fully published articles and oral presentations subjected to the same assessment as regular articles (AASLD and EASL meetings) were considered; abstracts and posters were excluded. The inclusion criteria were as follows: (a) randomized clinical trial; (b) placebo-controlled clinical trial; (c) Phase II and Phase III clinical trial; (d) paired biopsy; (e) adults (≥ 18 years old). The exclusion criteria were as follows: (a) duplicate reports; (b) case reports, comments, and letters to the editors; (c) systematic reviews or meta-analyses; (d) botanical products, herbal medicines, or antioxidants; (e) lifestyle intervention.

Data extraction and quality assessment

The following data were extracted: author, year, population selection criteria, sample size, experimental drug, histological endpoint (NASH, NAFLD Activity Score (NAS), fibrosis stage, steatosis, lobular inflammation, ballooning), biochemical response (AST, ALT), age, sex, body mass index (BMI), and T2DM. When the same population was published in several journals, we retained only the most informative article or the most complete study to avoid duplication. We also asked the investigators for additional information, and if we received no answer, “unreported” items were treated as “unclear” or “not available”. In addition, four investigators (AG, SG, RM, RM) independently assessed the quality of the studies using the “Quality in Prognostic Studies (QUIPS)” tool [9].

Outcome measures

Given that the ultimate goal of NASH treatment is to slow, halt, or reverse disease progression and improve clinical outcomes, we selected the histological response after experimental treatment or placebo as the primary outcome: (a) NASH resolution, with no worsening of fibrosis when available; (b) fibrosis improvement ≥ 1 stage, with no worsening of NAS when available. Because a clinical benefit can also be verified by demonstrating superiority to placebo in delaying disease progression, we additionally considered: (a) worsening of NAS; (b) worsening of liver fibrosis ≥ 1 stage, including the progression to cirrhosis on histopathology. In addition, other histological outcomes were assessed as secondary endpoints as follows: (a) NAS improvement ≥ 2 points, irrespective of fibrosis improvement; (b) improvement of steatosis, lobular inflammation, and ballooning. The occurrence of cirrhosis complications was also analyzed. Finally, the biochemical response (ALT, AST) was assessed.

Statistical analysis

We used STATA version 16 (Stata Corp; College Station, TX). All statistical tests were two-sided, with P-values ≤ 0.05 denoting statistical significance. Confidence intervals (CIs) of individual studies were determined from the available data. For the dichotomous variables, the effect denotes odds ratio (OR) and corresponding 95% CIs, while we used the difference in means to specifically provide measures of the absolute difference between the mean values of the explored variables. To estimate the pooled prevalence, the prevalence rates were combined in a random-effects meta-analysis.
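The pooling step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the text does not say which random-effects estimator was used in STATA, so the DerSimonian-Laird estimator is assumed here, and the function names and the trial counts are hypothetical.

```python
import math

def log_odds_ratio(events_trt, n_trt, events_plc, n_plc):
    """Return (log OR, variance) from a 2x2 table.

    A 0.5 continuity correction is applied when any cell is zero
    (an assumption; the source does not describe zero-cell handling).
    """
    a, b = events_trt, n_trt - events_trt
    c, d = events_plc, n_plc - events_plc
    if 0 in (a, b, c, d):
        a, b, c, d = (x + 0.5 for x in (a, b, c, d))
    return math.log((a * d) / (b * c)), 1 / a + 1 / b + 1 / c + 1 / d

def dersimonian_laird(effects, variances):
    """Pool effects with the DerSimonian-Laird random-effects model.

    Returns (pooled estimate, lower 95% limit, upper 95% limit, tau^2),
    all on the scale of the inputs (here: log odds ratios).
    """
    w = [1 / v for v in variances]
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = len(effects) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    w_re = [1 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    return pooled, pooled - 1.96 * se, pooled + 1.96 * se, tau2

# Hypothetical trials: (responders on drug, N drug, responders on placebo, N placebo)
trials = [(30, 100, 15, 100), (12, 60, 8, 58), (45, 150, 20, 148)]
pairs = [log_odds_ratio(*t) for t in trials]
est, lo, hi, tau2 = dersimonian_laird([p[0] for p in pairs], [p[1] for p in pairs])
print(f"Pooled OR {math.exp(est):.2f} (95%CI {math.exp(lo):.2f}-{math.exp(hi):.2f})")
```

Pooling is done on the log scale and the result is exponentiated afterwards, which is why the confidence interval of a pooled OR is asymmetric around the point estimate.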

The assumption of heterogeneity was tested for each planned analysis using the Cochran Q test and the I2 statistic (significant heterogeneity defined as an I2 value > 50%) [10]. The random-effects model was applied to pool results from studies. We planned a priori subgroup analyses according to the following criteria: trials with cirrhotic versus non-cirrhotic population (three studies included 100% of cirrhotic patients, while two others included 50% and one 11%; however, they were considered as a cirrhotic population because separate data were not available), trials with ≥ 60% versus < 60% of diabetic population, trials with mean NAS less than 5 versus 5 or greater, treatment duration 48 weeks or less versus greater than 48 weeks, and therapeutic class (Supplementary Table 1). Additionally, significant heterogeneity for primary outcomes was explored by univariable meta-regression, and a sensitivity analysis was performed to determine whether any single study exerted undue influence on the results of the combined studies [11, 12]. Finally, the potential publication bias was assessed by Egger’s test and graphically by a funnel plot when there was an adequate number of studies (> 10 studies).
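For readers less familiar with these statistics, the following minimal Python sketch shows how Cochran's Q and I2 are usually computed from study-level effects and variances. The function name and the input values are hypothetical, and the P-value of Q (a chi-square test with k − 1 degrees of freedom) is omitted because it needs a statistics library.

```python
def heterogeneity(effects, variances):
    """Return (Cochran's Q, degrees of freedom, I2 in percent).

    effects   : study-level estimates, e.g. log odds ratios
    variances : their within-study variances
    """
    weights = [1 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    # Q is the weighted sum of squared deviations from the fixed-effect mean
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    # I2 is the share of total variation attributable to between-study heterogeneity
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return q, df, i2

# Hypothetical log odds ratios and variances for three trials
q, df, i2 = heterogeneity([0.85, 0.20, 0.60], [0.05, 0.04, 0.09])
print(f"Q = {q:.2f} on {df} df, I2 = {i2:.1f}%")
```

Under the threshold stated above, an I2 above 50% would be read as significant heterogeneity.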

Results

Eligible study characteristics and quality assessment

The flowchart diagram details the article selection process for this meta-analysis (Supplementary Fig. 1), which ended with 27 studies included. The characteristics of the eligible studies are listed in Table 1. Supplementary Table 2 shows the quality assessment of the clinical trials by QUIPS.

Table 1 Characteristics of the studies included in the meta-analysis

Data analyses about NASH

NASH resolution was assessed by 26 clinical trials (N = 7239 patients). The pooled efficacy for NASH resolution obtained by patients treated with any experimental drug was 19% (95%CI 15–23; I2 96.2%) compared with 10% for placebo (95%CI 7–12; I2 85.8%) (Supplementary Fig. 2). The treatment difference between therapy and placebo was larger in the studies that additionally evaluated the lack of worsening of fibrosis (N = 17) (OR 2.32 (95%CI 1.67–3.23); I2 4.9%) than in the total of studies (N = 26) (OR 1.66 (95%CI 1.24–2.21); I2 57.8%) (Fig. 1a). The subgroup analysis showed that NASH resolution was more difficult to achieve in cirrhotic than in non-cirrhotic patients for both experimental therapy (4% (95%CI 1–8; I2 80.1%) versus 22% (95%CI 17–28; I2 95.7%)) and placebo (2% (95%CI 0–4; I2 48%) versus 12% (95%CI 9–14; I2 65.8%)) (Supplementary Fig. 2). In addition, the experimental drug showed higher efficacy in clinical trials with a mean NAS < 5 versus NAS ≥ 5, with a duration > 48 versus ≤ 48 weeks, with less than 60% of diabetic population, and when it was based on antimetabolic mechanisms, targeting de novo lipogenesis (DNL) and FXR agonist (Fig. 1).

Fig. 1

The effect of the experimental drug on: A NASH resolution; B NASH resolution, according to subgroup analyses

Improvement of NAS by ≥ 2 points is another typical endpoint of clinical trials. A total of 19 studies assessed this outcome (N = 3798 patients). Overall, receiving an experimental treatment increased the likelihood of achieving this outcome (37% (95%CI 28–46; I2 96.1%)) compared to placebo (23% (95%CI 16–30; I2 89%)) (OR 1.72 (95%CI 1.23–2.41); I2 71%) (Supplementary Fig. 3). Similar to NASH resolution, the improvement of NAS ≥ 2 points was higher in non-cirrhotic than in cirrhotic patients for the experimental (40% (95%CI 30–51; I2 96.8%) and 24% (95%CI 14–34; I2 57.4%), respectively) and placebo arms (26% (95%CI 18–34; I2 90.3%) and 13% (95%CI 7–20; I2 31.8%), respectively) (Supplementary Fig. 4a,b).

NASH worsening, in turn, was assessed in four clinical trials (N = 695). Patients treated with experimental therapy displayed a significantly lower rate of NASH worsening (14% (95%CI 5–23); I2 83.2%) than those taking placebo (25% (95%CI 20–30); I2 0%) (Supplementary Fig. 5a,b), thus showing a protective effect of the medication (OR 0.57 (95%CI 0.39–0.84); I2 67%) (Supplementary Fig. 6).

Data analysis about fibrosis

Fibrosis improvement ≥ 1 stage was assessed by 27 clinical trials (N = 7151 patients). This analysis showed that the experimental therapy was superior, achieving this outcome in 26% (95%CI 22–29; I2 90%) of patients versus 18% (95%CI 15–21; I2 59%) with placebo. The beneficial effect of the drug was similar in the studies that additionally assessed no worsening of NASH (N = 16) (OR 1.30 (95%CI 1.12–1.51); I2 25.8%) in comparison with the total of the studies (N = 27) (OR 1.34 (95%CI 1.13–1.58); I2 25.4%) (Fig. 2a). Although the efficacy was lower in patients with advanced liver disease, the therapy was superior to placebo in both non-cirrhotic (28% (95%CI 24–33; I2 91%) versus 20% (95%CI 17–23; I2 61.7%)) and cirrhotic patients (16% (95%CI 11–21; I2 59.3%) versus 12% (95%CI 8–17; I2 23%)) (Supplementary Fig. 7a,b). In addition to non-cirrhotic patients, the experimental therapy showed higher efficacy in trials with a duration > 48 versus ≤ 48 weeks, in studies showing < 60% of diabetic population, and when based on antimetabolic drugs and FXR agonists.

Fig. 2

The effect of the experimental drug on: A Fibrosis improvement ≥ 1 stage; B Fibrosis improvement ≥ 1 stage, according to subgroup analyses

Sixteen clinical trials (N = 3459 patients) assessed the fibrosis progression showing that patients receiving an experimental drug were more protected against this outcome (17% (95%CI 13–22); I2 89.1%) than individuals under placebo (24% (95%CI 19–29); I2 69.7%) (Supplementary Fig. 8a,b) (OR 0.65 (95%CI 0.46–0.92); I2 61.9%) (Fig. 3a). Finally, when separating between fibrosis progression and progression towards cirrhosis, a similar protective role of the therapy was found [(OR 0.62 (95%CI 0.39–1.00); I2 71.6%) and (OR 0.72 (95%CI 0.51–1.00); I2 0%), respectively] (Fig. 3b).

Fig. 3

The effect of the experimental drug on: A Overall fibrosis progression; B Fibrosis progression versus progression to cirrhosis

Other secondary endpoints

We also assessed the improvement of individual components of NAS (14 studies, 2876 patients). Regarding steatosis, the experimental treatment was associated with a higher rate of improvement (47% (95%CI 36–58); I2 96%) than placebo (24% (95%CI 17–31); I2 86.7%) (OR 2.84 (95%CI 1.80–4.47); I2 80%) (Fig. 4a). Also, ballooning decreased more frequently in patients receiving the experimental treatment (40% (95%CI 29–52); I2 96.5%) versus placebo (28% (95%CI 23–33); I2 67.6%) (OR 1.68 (95%CI 1.11–2.56); I2 78.1%) (Fig. 4b). Besides, the pooled efficacy of achieving lobular inflammation improvement was higher in patients receiving the drug (41% (95%CI 35–46); I2 81.3%) than placebo (30% (95%CI 25–34); I2 61%) (OR 1.55 (95%CI 1.19–2.01); I2 47.9%) (Fig. 4c). On the other hand, steatosis (OR 0.34 (95%CI 0.22–0.52); I2 0%) had a lower likelihood to progress in patients receiving any experimental therapy than placebo, although this did not occur with ballooning (OR 0.87 (95%CI 0.45–1.67); I2 69%) and lobular inflammation (OR 0.71 (95%CI 0.34–1.46); I2 79.8%). Otherwise, the occurrence of cirrhosis complications was not prevented when using an experimental treatment (N = 3) (OR 1.41 (95%CI 0.86–2.32); I2 29.4%).

Fig. 4

The effect of the experimental drug on: A Steatosis; B Ballooning; C Lobular inflammation

Finally, the necro-inflammatory activity also improved when taking an experimental therapy. AST levels were significantly decreased in these individuals compared to those receiving placebo (mean difference –10.1 IU/L (95%CI (–14.7 to –5.4); I2 79.4%)) (Fig. 5a). Similarly, ALT levels were found to be diminished under treatment (mean difference –13.8 IU/L (95%CI (–23.5 to –4.1); I2 92.3%)) (Fig. 5b).

Fig. 5

The effect of the experimental drug on: A AST levels; B ALT levels

Heterogeneity assessment and publication bias

The leave-one-out sensitivity analysis did not identify any single study that significantly contributed to the between-studies variability for NASH resolution (Supplementary Table 3), fibrosis improvement (Supplementary Table 4), and fibrosis progression (Supplementary Table 5). On the other hand, meta-regression showed no evidence of a differential effect of study-level characteristics on the impact of the outcomes, apart from baseline NAS for NASH resolution (P = 0.010) (Supplementary Table 6).
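The leave-one-out procedure can be illustrated with a short Python sketch: each study is omitted in turn and the remaining studies are pooled again. As a simplifying assumption, the default pooler below is a fixed-effect inverse-variance mean rather than the random-effects model used in this meta-analysis; the `pool` argument lets a random-effects function be passed instead. All names and numbers are hypothetical.

```python
import math

def inverse_variance_pool(effects, variances):
    """Fixed-effect inverse-variance mean (a stand-in for the random-effects model)."""
    weights = [1.0 / v for v in variances]
    return sum(w * y for w, y in zip(weights, effects)) / sum(weights)

def leave_one_out(labels, effects, variances, pool=inverse_variance_pool):
    """Re-pool the log odds ratios k times, omitting one study each time.

    Returns a dict mapping the omitted study's label to the pooled OR
    of the remaining studies. Inputs are expected to be lists.
    """
    results = {}
    for i, label in enumerate(labels):
        kept_effects = effects[:i] + effects[i + 1:]
        kept_variances = variances[:i] + variances[i + 1:]
        results[label] = math.exp(pool(kept_effects, kept_variances))
    return results

# Hypothetical log odds ratios and variances for four trials
labels = ["Trial A", "Trial B", "Trial C", "Trial D"]
log_ors = [0.85, 0.20, 0.60, 0.45]
variances = [0.05, 0.04, 0.09, 0.06]
for omitted, pooled_or in leave_one_out(labels, log_ors, variances).items():
    print(f"Omitting {omitted}: pooled OR {pooled_or:.2f}")
```

A single influential study would show up as one pooled OR that departs markedly from the others.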

Publication bias was assessed by Egger’s test and funnel plot asymmetry. There was no formal evidence of publication bias for NASH resolution (P = 0.453) (Supplementary Fig. 9), improvement of NAS by ≥ 2 points (P = 0.101) (Supplementary Fig. 10), fibrosis improvement ≥ 1 stage (P = 0.451) (Supplementary Fig. 11), fibrosis progression (P = 0.105) (Supplementary Fig. 12), steatosis improvement (P = 0.312), ballooning improvement (P = 0.496), lobular inflammation improvement (P = 0.232), and biochemical response (P = 0.812 and P = 0.957 for AST and ALT, respectively).
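As a rough illustration of the regression behind Egger's test, the sketch below regresses each study's standardized effect (effect divided by its standard error) on its precision (1 divided by the standard error) using ordinary least squares. An intercept far from zero suggests funnel plot asymmetry. The function name and inputs are hypothetical; the P-value would come from a t distribution with k − 2 degrees of freedom applied to intercept / SE, which is left out here to keep the example free of external libraries.

```python
import math

def egger_regression(effects, std_errors):
    """Return (intercept, SE of intercept, slope) of Egger's regression.

    Requires at least three studies with differing standard errors.
    """
    x = [1 / se for se in std_errors]                    # precision
    y = [e / se for e, se in zip(effects, std_errors)]   # standardized effect
    k = len(x)
    mean_x, mean_y = sum(x) / k, sum(y) / k
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
    intercept = mean_y - slope * mean_x
    # Residual variance with k - 2 degrees of freedom
    resid_var = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y)) / (k - 2)
    se_intercept = math.sqrt(resid_var * (1 / k + mean_x ** 2 / sxx))
    return intercept, se_intercept, slope

# Hypothetical log odds ratios and standard errors for five trials
intercept, se_int, slope = egger_regression(
    [0.85, 0.20, 0.60, 0.45, 0.30], [0.22, 0.20, 0.30, 0.24, 0.15]
)
print(f"Egger intercept {intercept:.2f} (SE {se_int:.2f})")
```

When every study estimates the same effect regardless of its size, the standardized effects fall on a line through the origin and the intercept is zero.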

Discussion

Over the past five years, many clinical trials testing new drugs for NAFLD have been published [13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37]. The scientific community initially received the preliminary data with enthusiasm and the final publications with relative skepticism because of the suboptimal results, particularly for liver fibrosis improvement as an endpoint [38]. Our meta-analysis observed that taking an experimental therapy versus placebo increased the likelihood of resolving NASH and regressing liver fibrosis. Despite the fact that only 5 of 26 studies demonstrated a beneficial effect of the therapy on NASH, the likelihood of NASH resolution was 60% higher than with placebo. In liver fibrosis, the likelihood of improving at least one stage was 30% higher with the therapy, although only 3 out of the 27 studies showed an individual benefit. Also, the individual components of NAS (steatosis, ballooning, and lobular inflammation), as well as the necro-inflammatory activity (evaluated by AST and ALT levels), significantly improved with the therapy. Of note, the percentage of NASH resolution and fibrosis regression for placebo was similar to that published in the literature [39], although a lower fibrosis progression rate has recently been suggested in this group, probably related to the number of patients without fibrosis [40]. Despite the global positive results, we found that the percentage of NASH resolution and fibrosis improvement was 19–28%, respectively, for experimental therapies based on biological plausibility, which is far from desirable. This fact mirrors the complexity of the mechanisms underlying the pathogenesis of NAFLD, a disease of multifaceted nature [41].

We found that some baseline variables and features related to the clinical trial design influenced the likelihood of achieving the outcomes. First, FXR agonists and anti-metabolic drugs (including anti-diabetic therapies and PPAR agonists), and DNL-targeting therapies for NASH resolution, showed the highest efficacy for inflammation and fibrosis improvement. These findings have been documented in some studies [42, 43], although those studies had limitations, such as not assessing the clinical benefit in delaying disease progression or the baseline features impacting the efficacy of the drugs. Second, NASH resolution was easier to achieve in trials with non-cirrhotic patients, with baseline NAS < 5, with a low proportion of diabetic patients, and with a longer length of therapy. These data agree with the variables associated with NASH resolution after lifestyle intervention, supporting a group of features that define more difficult-to-treat patients [44]. These characteristics were also more frequently associated with fibrosis improvement, with the exception of NAS. These results should make us reflect on the design of NAFLD clinical trials and on the adequacy of the endpoints, balancing them with their prognostic relevance. On the one hand, the experimental therapy appeared to require at least 1 year to be effective. Thus, a duration longer than 48 weeks is preferred in NAFLD clinical trials. On the other hand, the drug effect was superior to placebo in achieving NASH resolution and fibrosis improvement, but not in trials including more severe patients. Therefore, NASH resolution should be required for the experimental therapy in non-cirrhotic patients and in those with a baseline NAS < 5, but it is questionable for individuals showing a NAS ≥ 5 and, especially, for cirrhotic patients, since most of them have lost some of the single components of NAS [45]. Similarly, although desirable, fibrosis improvement is not a realistic aim for cirrhotic patients using the current experimental therapies, according to our results. Instead, clinical trials on cirrhotic patients should focus on preventing portal hypertension, hepatocellular carcinoma occurrence, and mortality, extending the treatment course, rather than on the regression of liver disease. Therefore, we should make efforts to redirect the design and selection criteria of clinical trials because some potentially useful drugs could be discarded too early.

NAFLD clinical trials should report a minimum of information about all relevant aspects that could impact the efficacy of the experimental drug tested [7]. In this setting, results about efficacy tend to focus on the histological improvement (e.g., NASH or fibrosis), but frequently fail to mention data associated with the prevention of its progression. For example, in our meta-analysis, only 4 and 16 of 27 clinical trials reported information about worsening of NASH and fibrosis, respectively. Within these data, our results indicate that patients receiving therapy were protected against NASH worsening and/or fibrosis progression. In other liver diseases, the treatment aims mainly to eliminate (e.g., hepatitis C) or control the etiology (e.g., hepatitis B, autoimmune hepatitis) but does not reverse liver fibrosis, which is a consequence that requires an extended follow-up [46,47,48]. Instead, NAFLD clinical trials require an early resolution of NASH or fibrosis improvement to be considered a success. Given the uncertainty about in whom fibrosis regression can be expected and how quickly it occurs, we should consider halting the disease as a relevant outcome and, thus, complementary to the improvement of liver disease. Therefore, we encourage NAFLD clinical trials to report essential information about the progression of the disease to provide an overall view of the efficacy of experimental drugs.

Beyond its strengths, our meta-analysis also has some limitations. First, the interpretation of some results could be challenging because of the different mechanisms of action of the included drugs. However, this kind of approach has been used in other therapeutic areas (e.g., biologics in ulcerative colitis [49]) and could provide additional data to guide the NAFLD drug pipeline properly. Second, studies reporting cirrhosis-related outcomes were scarce, precluding robust analyses. Third, some baseline variables, such as T2DM or NAS, were categorized. This common practice allows subgroup analyses or a meta-regression in the absence of individual patient data, but it limits the interpretation.

In conclusion, developing therapeutic strategies to revert or, at least, slow down steatohepatitis and fibrosis progression as much as possible in NAFLD is an unmet need. This meta-analysis provides information about the efficacy of the therapy versus placebo by comparing different and combined trial outcomes such as NASH resolution, fibrosis improvement, and NAS and fibrosis worsening. Given that novel pharmacological agents focused on NASH resolution and liver fibrosis regression are expected to be available in the upcoming years, changes in the experimental design and selection criteria of the clinical trials may increase the ability to demonstrate efficacy.