Introduction

Nonalcoholic fatty liver disease (NAFLD) is a chronic liver disease related to metabolic disorders. The estimated global prevalence of NAFLD in the general population is ~25% [1]. Nonalcoholic steatohepatitis (NASH) is a serious form of NAFLD [2]. Although it has an insidious onset, it can eventually progress to potentially fatal conditions, such as decompensated cirrhosis and hepatocellular carcinoma, and may necessitate liver transplantation [3, 4]. The estimated global prevalence of NASH is 3%–5%, and the complications related to NASH impose a considerable morbidity and mortality burden [5, 6]. NAFLD is associated with an increased risk of cardiovascular disease, given its close association with metabolic syndrome, including obesity, hypertension, atherogenic dyslipidemia, and insulin resistance [7, 8]. In a recent study of patients with biopsy-proven NAFLD, the fibrosis score was the only independent predictor of cardiovascular disease [9]. NAFLD is also associated with an increased risk of developing extrahepatic cancers [10, 11]. In a Swedish biopsy-confirmed NAFLD cohort, patients had significantly increased overall mortality (15.3%), which was due primarily to the increased incidence of cancer [12].

Currently, there are no effective drugs approved for the treatment of NASH [13]. Therefore, research and development of novel drugs for treatment of NASH is a key imperative to reduce the disease burden and decrease the incidence of severe complications in the late stage of NASH. Although several drugs with different mechanisms have completed phase IIb/III clinical trials, the reported efficacy has been limited [14,15,16,17]. Owing to the complex pathogenetic mechanism of NASH and the considerable heterogeneity, histological evaluation based on liver biopsy is necessary for distinguishing NASH from NAFLD. Moreover, histological evaluation plays an important role in the research for development of new drugs for NASH [13]. To achieve a breakthrough in the development of NASH therapeutic drugs, the role of liver biopsy needs to be emphasized in the trial design.

Percutaneous fine-needle liver biopsy is a simple, reliable procedure that was introduced into clinical practice in the late 1950s [18]. Liver biopsy is considered the “gold standard” for assessment of hepatic disease. The American Association for the Study of Liver Diseases considers liver biopsy useful for establishing a diagnosis, staging the underlying liver disease, and directing management based on the underlying histology [19]. In both clinical practice and scientific research on NASH, liver biopsy provides much more essential information than the mere presence or absence of NASH. Liver biopsy remains the only diagnostic modality that allows identification of the cause, classification of the disease, and prognostic evaluation.

Liver biopsy has been widely applied in the development of new NASH drugs. From the early PIVENS and FLINT studies to the several phase IIb/III studies carried out in recent years, liver biopsy has been used for patient screening and evaluation of therapeutic efficacy, which is a key link in the development of new drugs [20, 21]. The U.S. Food and Drug Administration (FDA), the European Medicines Agency (EMA), and the Chinese National Medical Products Administration (NMPA) recommend liver histological evaluation as the main inclusion criterion and primary efficacy/endpoint in the new NASH drug development. However, the failure of several phase III trials has raised concerns about the possible limitations of pathological assessment based on liver biopsy in clinical trials and ways to improve the situation. In this article, we review the value of liver biopsy in the development of new NASH drugs and the associated challenges.

The value of liver biopsy in new NASH drug development

Liver biopsy used as NASH inclusion criteria in new drug development

Liver biopsy is recommended by drug regulatory authorities as the inclusion criteria for new NASH drug development

In 1980, Ludwig et al. proposed the concept of NASH for the first time [22]. To date, histology is still an indispensable method to distinguish NASH from NAFLD. The main histologic features of NASH include steatosis, lobular inflammation, and ballooning hepatocytes [23]. The typical pattern of NASH fibrosis consists of delicate strands of collagen around the sinusoids and hepatocytes, which usually present as a “chicken wire” pattern involving the centrilobular region.

The first proposed grading and staging system for lesions of NASH in adults was the “Brunt system”. The system enables semi-quantitative evaluation of the typical pathological characteristics of NASH. Based on a comprehensive evaluation, the lesions are classified as mild, moderate, or severe [24]. The NAFLD Activity Score (NAS) was proposed by the NASH Clinical Research Network (NASH-CRN) in 2005 and is currently used as the criterion for pathological diagnosis of NASH in clinical trials [25]. The NAS scoring system semi-quantitatively evaluates steatosis (0–3), lobular inflammation (0–3), and ballooning (0–2). According to the NAS, the disease can be classified into three categories: NASH, not NASH, and borderline. This system can be applied to the entire spectrum of NAFLD and can also be used in pediatric NAFLD [26, 27]. The SAF/FLIP system was developed by the European FLIP Pathology Consortium and is currently used for clinical diagnosis [28, 29]. This system also covers the entire disease spectrum of NAFLD; however, it cannot be used in children. The SAF/FLIP and NASH-CRN systems are consistent in the evaluation of steatosis (0–3) and fibrosis (F0–F4). However, the diagnostic criteria for lobular inflammation (0–2 in the SAF system) and ballooning (0–2 in the SAF system) are slightly different. In the SAF/FLIP system, the combination of the semi-quantitative values of these two components is regarded as the “activity score”; a separate subscore is therefore reported for each component of the SAF (steatosis + activity + fibrosis), which is the primary advantage of this system [30].
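
The NAS arithmetic described above can be sketched in a few lines of Python. This is only an illustration: the component ranges come from the text, whereas the cut-offs used to assign the three categories (a total of 5 or more for "NASH", 2 or less for "not NASH") are an assumption that the text does not state, and the names NasComponents, nas_total, and nas_category are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NasComponents:
    """Semi-quantitative NAS components with the ranges given in the text."""
    steatosis: int             # 0-3
    lobular_inflammation: int  # 0-3
    ballooning: int            # 0-2

    def __post_init__(self):
        # Reject grades outside the ranges of the NAS scoring system.
        limits = {"steatosis": 3, "lobular_inflammation": 3, "ballooning": 2}
        for name, upper in limits.items():
            value = getattr(self, name)
            if not 0 <= value <= upper:
                raise ValueError(f"{name} must be in 0..{upper}, got {value}")


def nas_total(c: NasComponents) -> int:
    """NAS is the unweighted sum of the three component grades (0-8)."""
    return c.steatosis + c.lobular_inflammation + c.ballooning


def nas_category(c: NasComponents, nash_cutoff: int = 5, not_nash_cutoff: int = 2) -> str:
    """Map a NAS total to one of the three categories.

    ASSUMPTION: the default cut-offs are illustrative and are not given in
    the text; adjust them to the protocol actually in use.
    """
    total = nas_total(c)
    if total >= nash_cutoff:
        return "NASH"
    if total <= not_nash_cutoff:
        return "not NASH"
    return "borderline"
```

For example, nas_category(NasComponents(2, 2, 1)) sums the grades to 5 and, under the assumed cut-offs, returns "NASH".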

Based on the important role of liver biopsy in the diagnosis of NASH, drug regulatory authorities (including the FDA, EMA, and NMPA) have approved histology as the main inclusion criterion for phase II/III clinical trials of NASH [31,32,33]. In 2018, the FDA issued a draft guideline for developing therapeutic drugs for NASH [31]. For patient inclusion in phase III development, the FDA has accepted a NAFLD activity score (NAS) ≥4 with at least 1 point each for inflammation and ballooning, along with a NASH-CRN fibrosis score greater than stage 1 but less than stage 4. To determine how liver biopsy findings and the NAS are used as entry criteria in the different phases of trials, we retrieved 348 currently ongoing NASH-related clinical trials from ClinicalTrials.gov. A total of 188 trials use histological findings as the inclusion criteria, of which 79 trials use the NASH-CRN system as the diagnostic criteria and 2 trials use the SAF/FLIP system (Table 1). In the early development stage, findings of non-invasive tests were the main inclusion criteria. However, histological evaluation is the major inclusion criterion in the phases of dose exploration and efficacy evaluation. For the NAS, there is currently no uniform standard. In a majority of the phase II–III trials, NAS ≥4 has been used as the inclusion criterion (Table 1).
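
As a minimal sketch, the phase III histological entry rule quoted above can be written as a single predicate. The function name meets_phase3_histology_criteria is hypothetical, and treating the fibrosis stage as an integer from 0 to 4 is a simplifying assumption (substages are ignored).

```python
def meets_phase3_histology_criteria(
    steatosis: int,
    lobular_inflammation: int,
    ballooning: int,
    fibrosis_stage: int,
) -> bool:
    """Check the phase III histological entry rule described in the text.

    Rule: NAS >= 4, at least 1 point each for inflammation and ballooning,
    and a NASH-CRN fibrosis stage greater than 1 but less than 4.
    ASSUMPTION: fibrosis_stage is an integer 0-4 (substages are ignored).
    """
    nas = steatosis + lobular_inflammation + ballooning
    return (
        nas >= 4
        and lobular_inflammation >= 1
        and ballooning >= 1
        and 1 < fibrosis_stage < 4
    )
```

A biopsy graded steatosis 3, inflammation 1, ballooning 0 has a NAS of 4 but would not qualify under this rule, because ballooning is absent.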

Table 1 Number of NASH clinical trials in different phases and the corresponding inclusion criteria.

Liver biopsy is an accurate method to diagnose and differentiate NASH

The cardinal advantage of liver biopsy is that it allows for direct histological evaluation of the lesions, making it a reliable procedure for the diagnosis of NASH [34]. Clinical biochemical parameters such as serum aminotransferase can indicate disease activity to a certain extent, but their levels may not always be consistent with the disease state. A recent study evaluated 534 adults with biopsy-proven NAFLD and normal aminotransferase levels; the prevalence of NASH with F2–F3 fibrosis and cirrhosis was 19% and 7%, respectively [35]. Normal aminotransferase levels do not necessarily indicate the absence of significant fibrosis or even cirrhosis; therefore, histopathological examination is a more accurate method for diagnosis. Liver biopsy can also differentiate NAFLD from other chronic liver diseases. In an earlier study, among the 24 patients diagnosed with NASH before liver biopsy, the diagnosis was corrected in four patients after liver biopsy (three were diagnosed as normal/nonspecific and one was diagnosed as primary sclerosing cholangitis) [36]. In the study by Adams et al., 45 out of 51 (88%) NAFLD patients with positive autoantibodies met the diagnostic criteria for “probable” or “definite” autoimmune hepatitis (AIH). After liver biopsy, only four patients met the AIH diagnostic criteria [37]. Liver biopsy provides clinically meaningful information independent of purely clinical parameters. Therefore, use of liver biopsy as the diagnostic criterion for NASH is a more accurate method for the development of new NASH drugs.

Liver biopsy provides more morphological information than non-invasive tests

First, only histological examination can detect lobular inflammation and ballooning, which cannot be determined by current non-invasive diagnostic methods. Second, in the evaluation of steatosis, MRI-proton density fat fraction (PDFF) imaging can evaluate liver fat content relatively accurately. However, this test cannot reflect the distribution pattern of steatosis or the size of fat vesicles. Only liver biopsy can provide this important disease information. The distribution of steatosis has been shown to be related to disease activity and to correlate with fibrosis [38]. Diffuse microvesicular steatosis can be a manifestation of other etiologies, including acute alcohol-induced liver disease, drug-induced liver disease, congenital metabolic disorders, and acute fatty liver of pregnancy. Third, for the diagnosis of NASH fibrosis, previous studies showed a certain degree of correlation of image-based examination or serological markers with NASH fibrosis. However, non-invasive modalities do not allow for morphological characterization of fibrosis. Sun et al. established the P-I-R classification (Beijing classification) based on the morphology of the liver fibrosis septa to evaluate the effect of antiviral therapy in chronic hepatitis B [39]. This new classification based on liver biopsy can help optimize therapeutic strategies, which cannot be achieved through non-invasive examinations. Compared with non-invasive tests, liver biopsy can therefore provide more comprehensive disease information and is more suitable as an inclusion criterion for new NASH drug development.

Liver biopsy used as therapeutic efficacy/endpoint in new NASH drug development

Liver biopsy is recommended by drug regulatory authorities as the therapeutic efficacy/endpoint for new NASH drug development

The ultimate goal of NASH treatment is to slow, halt, or reverse disease progression and improve clinical outcomes. However, considering the slow progression of NASH, it takes a long time to observe the hard clinical endpoints of mortality and morbidity. Therefore, the drug regulatory authorities have allowed the use of surrogate endpoints to accelerate the drug approval process. The ideal surrogate endpoint should not only reflect the long-term clinical benefit but also be flexible and available in the short term.

Pathological evaluation of NASH based on liver biopsy meets the requirements of surrogate treatment endpoints for new drug development. Previous studies have shown a close relation between the pathological manifestations of NASH and prognosis. A recent cohort substudy analyzed 446 adult patients with NAFLD who underwent two liver biopsies over a period of 9 years. By comparing the findings of the two liver biopsies, the authors found that improvement or worsening of disease activity was associated with fibrosis regression or progression [40]. In several studies based on histological findings and long-term follow-up, the degree of fibrosis was a predictor of liver-related endpoint events [40,41,42]. A meta-analysis of 13 studies had a combined study population of 4428 patients with NAFLD, of whom 2875 were reported to have NASH. The results showed an association of biopsy-confirmed fibrosis with the risk of mortality and liver-related morbidity [43]. A recent study developed a new machine learning (ML) algorithm based on the classical histological features and hepatic venous pressure gradient (HVPG) in patients with NASH and compensated cirrhosis [44]. The ML HVPG score differentiated patients with normal HVPG, elevated HVPG (5.5–9.5 mmHg), and clinically significant portal hypertension (HVPG ≥ 10 mmHg), categories that were closely related to liver-related endpoint events. This new AI technology also demonstrated the association between histological features and clinical disease progression. Therefore, the close correlation between the histological manifestations of NASH and the clinical hard endpoints makes histology a reasonable surrogate endpoint for NASH clinical trials.

Based on the advantages of pathological evaluation, several drug regulatory authorities recommend liver biopsy as the surrogate efficacy/endpoint in the new NASH drug development [31,32,33, 45]. The 2019 Liver Forum also provided recommendations for the endpoint selection of NASH trials [34]. In phase IIb/III clinical trials, the following criteria can be used as therapeutic endpoints: Resolution of steatohepatitis and no worsening of liver fibrosis based on the NASH-CRN fibrosis score; or improvement in liver fibrosis by ≥1 stage and no worsening of steatohepatitis; or both resolution of steatohepatitis and improvement in fibrosis. Resolution of steatohepatitis is defined as absence of signs of fatty liver disease or presence of isolated or simple steatosis without steatohepatitis and NAS scores of 0–1 for inflammation, 0 for ballooning, and any value for steatosis [31, 34].
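
The endpoint definitions above lend themselves to a small worked sketch comparing a baseline and a posttreatment biopsy. The names below (Biopsy, endpoints_met, and the helper functions) are hypothetical. The definition of resolution follows the text; the definition of "no worsening of steatohepatitis" as no increase in any NAS component is an assumption that the text does not spell out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Biopsy:
    steatosis: int             # NAS component, 0-3
    lobular_inflammation: int  # NAS component, 0-3
    ballooning: int            # NAS component, 0-2
    fibrosis_stage: int        # NASH-CRN stage, treated here as an integer 0-4


def steatohepatitis_resolved(post: Biopsy) -> bool:
    # From the text: inflammation 0-1, ballooning 0, any value for steatosis.
    return post.lobular_inflammation <= 1 and post.ballooning == 0


def fibrosis_not_worse(base: Biopsy, post: Biopsy) -> bool:
    return post.fibrosis_stage <= base.fibrosis_stage


def fibrosis_improved(base: Biopsy, post: Biopsy) -> bool:
    # Improvement by at least one stage.
    return base.fibrosis_stage - post.fibrosis_stage >= 1


def steatohepatitis_not_worse(base: Biopsy, post: Biopsy) -> bool:
    # ASSUMPTION: "no worsening" means no increase in any NAS component.
    return (
        post.steatosis <= base.steatosis
        and post.lobular_inflammation <= base.lobular_inflammation
        and post.ballooning <= base.ballooning
    )


def endpoints_met(base: Biopsy, post: Biopsy) -> dict:
    """Evaluate the three alternative histological endpoints."""
    resolved = steatohepatitis_resolved(post)
    improved = fibrosis_improved(base, post)
    return {
        "resolution_without_fibrosis_worsening": resolved and fibrosis_not_worse(base, post),
        "fibrosis_improvement_without_nash_worsening": improved and steatohepatitis_not_worse(base, post),
        "resolution_and_fibrosis_improvement": resolved and improved,
    }
```

Keeping the three endpoints as separate keys mirrors the "or" structure of the criteria: a patient whose fibrosis improves by one stage while inflammation and ballooning are unchanged meets the second endpoint only.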

New drug development with the efficacy of resolution of steatohepatitis

The resolution of NASH is used as the primary endpoint in many NASH therapeutic trials. Among the trials that are registered on ClinicalTrials.gov with results posted, the most commonly used criterion for NASH resolution is a decrease in NAS by two or more points without worsening of fibrosis. A phase II trial assessed the efficacy of silymarin in NASH patients for 12 months [46]. The primary endpoint of improvement in NAS by ≥2 points was reached by 4/27 (15%) participants in the 700 mg dose arm, 5/26 (19%) in the 420 mg dose arm, and 3/25 (12%) in the placebo arm. Another phase II study also used the resolution of NASH as the primary endpoint to assess the efficacy of volixibat, an inhibitor of the apical sodium-dependent bile acid transporter [47]. On histological analysis, a greater proportion of participants in the placebo arm (38.5%) met the primary endpoint compared with the volixibat arm (30.0%). A pilot study of metformin treatment used improvement in NAS by 3 points as the primary endpoint, and 30% of participants achieved a histological response [48].

In addition to NASH resolution as the primary endpoint, some studies have also used fibrosis improvement as a co-primary endpoint or secondary outcome. In the PIVENS study, vitamin E therapy was associated with a higher rate of improvement in NASH (improvement in NAS score by ≥2 points); however, neither vitamin E nor pioglitazone was associated with improvement in fibrosis scores which was the secondary endpoint [20]. In the FLINT study, 45.5% of participants in the obeticholic acid group reached the primary endpoint of improvement in NAS score by ≥2 and 35.3% of participants reached the secondary endpoint of improvement in fibrosis [21]. In the phase II trial of semaglutide, a higher percentage of patients in the semaglutide arm showed NASH resolution compared with the placebo arm. However, the trial did not show a significant difference in the percentage of patients with improvement in fibrosis stage (Table 2) [49].

Table 2 Clinical trials in NASH patients with histological outcomes (phase II–III with results).

New drug developments with the efficacy endpoint of improvement in liver fibrosis

Trials of new drugs that target fibrosis tend to use improvement of fibrosis as the primary endpoint. In the recently published data of combination therapy of selonsertib (SEL), cilofexor (CILO), and firsocostat (FIR) in NASH patients with bridging fibrosis and cirrhosis, the primary endpoint was a ≥1-stage improvement in fibrosis without worsening of NASH, which was achieved in 11% of placebo-treated patients versus 21% of patients treated with CILO/FIR [50]. The phase III trials of SEL in patients with NASH and bridging fibrosis (F3, STELLAR-3) or compensated cirrhosis (F4, STELLAR-4) used ≥1-stage improvement in fibrosis as the primary endpoint. Unfortunately, SEL did not show any antifibrotic effects in the two studies (Table 2) [14].

Advantages of liver biopsy over non-invasive tests as a surrogate endpoint in trials

Non-invasive diagnosis of NASH and the associated fibrosis is a contemporary research hot spot. Some non-invasive biomarkers have been used in clinical trials. MRI-PDFF has been used as the primary therapeutic endpoint in the early drug development stages, such as proof-of-concept studies, especially for drugs whose main target is reducing liver steatosis [33, 51]. For the evaluation of fibrosis, non-invasive tests such as vibration-controlled transient elastography (VCTE), the FibroScan-AST score, and the enhanced liver fibrosis test have been used as evaluation methods before liver biopsy to increase the success rate of screening [52, 53]. Magnetic resonance elastography has been used as a secondary endpoint in clinical trials because of its good performance in predicting fibrosis [54, 55].

However, irrespective of how good the predictive efficacy is, the area under the receiver operating characteristic curve (AUROC) will never reach 1.0. Although non-invasive imaging modalities offer some advantages, they do not provide in-depth characterization of the pathological changes. Furthermore, there is inadequate evidence that non-invasive testing can reflect the long-term prognosis of NASH and predict clinical benefit with reliability and consistency. Drug regulatory authorities and international guidelines have approved non-invasive testing as the main endpoint of the early (phase IIa) stage of drug development, according to the target of the drug [3, 33, 45]. In the pivotal (phase III) trials whose main purpose is to confirm efficacy, histology is still recognized as a relatively reliable efficacy endpoint. Non-invasive testing can be used as a secondary or exploratory endpoint in phase III trials [33, 56].

Challenges of liver biopsy in new NASH drug development

Suboptimal reliability of pathological evaluation

Histological diagnosis obtained by liver biopsy is not perfect in all aspects. The reliability of pathological diagnosis is currently a major challenge. A recent study evaluated 678 biopsies from 339 patients as read by three experienced hepatopathologists [57]. The inter-reader agreement was just fair to moderate (weighted kappa: steatosis 0.609, fibrosis 0.484, lobular inflammation 0.328, and ballooning 0.517). Although the same diagnostic criteria (NASH-CRN) were used, the agreement of pathological diagnosis was suboptimal. In the study by Pavlides et al., three pathologists evaluated biopsy specimens from 65 patients using NASH-CRN criteria; the kappa value for the fibrosis stage was 0.54. In another study, liver biopsy samples of 100 consecutive adult patients with suspected NAFLD were randomly assigned to four pathologists. Inter-observer agreement was acceptable for steatosis, lobular inflammation, and fibrosis, but not for hepatocyte ballooning (ICC: 0.012) [58]. The intra-observer agreement in this study was acceptable on all scales, although it was only 0.42 for lobular inflammation. Another study yielded similar results; the kappa value for lobular inflammation and ballooning was only 0.23–0.37 [59].
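
For readers unfamiliar with the weighted kappa statistic quoted above, the following is a minimal sketch of Cohen's weighted kappa for two readers grading the same biopsies on an ordinal scale. The cited studies do not state their weighting scheme here, so the choice between linear and quadratic weights is an assumption, and the function name weighted_kappa is hypothetical. With three readers, such a statistic would typically be computed pairwise or replaced by a multi-rater measure.

```python
def weighted_kappa(rater_a, rater_b, n_categories, weights="linear"):
    """Cohen's weighted kappa for two raters on an ordinal 0..n_categories-1 scale.

    ASSUMPTION: weights is "linear" or "quadratic"; the scheme used in the
    cited studies is not specified in the text.
    """
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("both raters must grade the same non-empty set of biopsies")
    if n_categories < 2:
        raise ValueError("at least two categories are required")
    if weights not in ("linear", "quadratic"):
        raise ValueError("weights must be 'linear' or 'quadratic'")

    n = len(rater_a)
    k = n_categories

    # Joint proportion table: observed[i][j] = share of biopsies graded i by A and j by B.
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[a][b] += 1.0 / n

    row = [sum(observed[i]) for i in range(k)]                       # marginals of A
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]  # marginals of B

    def disagreement(i, j):
        d = abs(i - j) / (k - 1)
        return d if weights == "linear" else d * d

    observed_dis = sum(disagreement(i, j) * observed[i][j] for i in range(k) for j in range(k))
    expected_dis = sum(disagreement(i, j) * row[i] * col[j] for i in range(k) for j in range(k))
    if expected_dis == 0:
        raise ValueError("kappa is undefined when both raters use a single category")
    return 1.0 - observed_dis / expected_dis
```

A value of 1 indicates perfect agreement and 0 indicates agreement no better than chance, which gives context for values such as 0.328 for lobular inflammation.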

A high degree of discordance in liver histology interpretation may weaken the reliability of pathological evaluation and adversely affect clinical trials. In terms of screening of patients, pathological findings are the core inclusion criteria. An unreliable pathological evaluation may allow patients who do not meet histological criteria to be included and eligible patients to be excluded. In terms of efficacy evaluation, it is hard to accurately identify the true responders, which may attenuate the treatment effects. Ensuring the quality of liver biopsy specimens can help improve the consistency of evaluation. In addition, training of pathologists is another potential way to improve consistency. The FDA recommends establishing an adjudication committee of central pathologists to read baseline and posttreatment specimens together. The committee should consist of at least two experienced and trained pathologists to decide how each of the components will be interpreted [56].

Potential risks of invasive inspection and sample variability

Liver biopsy is an invasive procedure associated with potential risks such as pain and hemorrhage. The procedure is liable to cause patient discomfort. Therefore, liver biopsy may not be suitable for large-scale use, but is more suitable for use in patients with a high risk of progression [60]. Two or three liver biopsies are usually included in the design of phase IIb/III NASH clinical trials, which is also one of the main obstacles to patient enrollment. However, the overall incidence of adverse reactions is relatively low. The reported incidence of intraperitoneal hemorrhage is only 0.03%–0.7%, and that of hemobilia and bile peritonitis is 0.006%–0.2% and 0.03%–0.22%, respectively [61]. Ultrasound-guided biopsy is associated with a lower incidence of complications and improved safety of liver biopsy, especially for patients with known specific mass lesions or a history of intra-abdominal surgery [62].

Sample variability is another limitation of liver biopsy. Histologic lesions of NASH are unevenly distributed throughout the liver parenchyma, and one or two specimens may not be adequately representative of the entire liver. In the study by Ratziu et al., 51 patients with NAFLD underwent percutaneous liver biopsy with two samples collected to assess the sampling error of liver biopsy. The negative predictive value of a single biopsy for the diagnosis of NASH was at best 0.74. Six of seventeen patients with bridging fibrosis (35%) had only mild or no fibrosis on single sample reading [63]. In clinical trials, sampling variability can bias enrollment, disease classification, and efficacy evaluation.

Insufficient evaluation of dynamic changes in fibrosis

In the posttreatment liver biopsy, improvement in liver fibrosis by ≥1 stage is one of the main histological efficacy endpoints of NASH drugs. However, similar to the progress of fibrosis, the reversal of fibrosis is also a gradual and slow process. Therefore, within the limited treatment time of 48 to 72 weeks, the existing pathological staging system may not be sensitive enough to detect subtle changes in fibrosis. In addition, the fibrosis classification is merely a static assessment and does not suggest the dynamic changes in fibrosis morphology, which provides limited information about the direction of disease development. According to the “hepatic repair complex” concept proposed by Wanless et al., even a single biopsy can provide a sense of progression or regression [64]. For example, wide/broad, loosely aggregated collagen fibers and thin, densely compacted stroma can convey different disease information. In NASH clinical trials, we speculate that even in the absence of any posttreatment decrease in the overall fibrosis stage, change in the morphology of the fibrosis scar to a thin and dense form is also a sign of reversal. Recently, several phase III clinical trials that failed to meet the primary efficacy endpoint challenged the pathological evaluation system of NASH, which points to the need for a more nuanced and dynamic understanding of the disease beyond a mere static assessment of scar.

Weak evidence of the correlation between histological improvement and hard endpoints

Another concern about liver biopsy is whether a drug that has been approved by reaching the histological surrogate endpoint can truly achieve the hard endpoint in the long term. Currently, there is a paucity of data to assess whether improvement in histology can predict clinical endpoints. Recently, a study analyzed data from two clinical trials of patients with NASH and compensated cirrhosis. The results showed an association of cirrhosis regression with a lower risk of liver-related endpoint events compared with non-regression [65]. However, this study only included patients with compensated cirrhosis, the number of patients with fibrosis regression was relatively small, and the follow-up time was not long enough. Therefore, additional studies are required to establish the relationship between histological improvement and the hard clinical endpoints. The current FDA draft guidance recommends that premarketing trials for NASH that plan to evaluate the histological surrogate endpoint ensure a treatment period of at least 12–18 months [31]. Furthermore, the FDA’s current thinking recommends efficacy evaluations of 2 or more years to confirm clinical benefit because of the subtle and slow changes in histologic features [56]. An investigational drug that is approved through an accelerated pathway based on the histologic efficacy endpoints requires a phase IV clinical outcomes trial to verify long-term benefits. The hard endpoints that truly reflect clinical benefit include progression to cirrhosis, decompensation events, an increase in the Model for End-Stage Liver Disease score from ≤12 to ≥15, all-cause mortality, and need for liver transplantation. After completing pivotal trials and post-marketing trials, the product can be fully approved.

Inadequate identification of the complex heterogeneity of NASH

NAFLD is a multifactorial disease characterized by complex heterogeneity [66]. Several factors contribute to the heterogeneity of this disease, including age, sex, genetic variants, type 2 diabetes mellitus, reproductive and hormonal status, metabolic health, and physical activity [67, 68]. These factors affect the progression of NAFLD through different mechanisms; thus, the subphenotypes may have a distinct natural history and prognosis. The failure of several recent NASH trials has been attributed to disease heterogeneity [69]. Patients in different subgroups may show different responses to the drug. As mentioned above, histology is the main efficacy/endpoint for phase IIb/III clinical trials of NASH; however, the existing NASH histological criteria may not accurately capture this complex heterogeneity. The potential relationship between NASH histological manifestations and dominant disease mechanisms is still not very clear [67, 68]. Thus, new NASH drug trials should take cognizance of clinical heterogeneity in the study design. In the future, more research is required to explore the relationship between disease heterogeneity and histopathological characteristics and to make better use of liver biopsy to identify the therapeutic effect in different subgroups of patients.

The future of liver biopsy in new NASH drug development

AI technology may improve the reliability of pathological evaluation

Recent advances in artificial intelligence technology have provided support for improving the reliability of pathological diagnosis. Second harmonic generation/two photon excitation fluorescence (SHG/TPEF) microscopy is a new technology that can fully quantify fibrosis and permit identification of individual collagen fibers without staining (Fig. 1). It can help precisely quantify the parameters of collagen fibers, such as their length, width, diameter, and cross-linkages. A study established the Fibrosis-SHG index based on SHG/TPEF technology to evaluate the level of liver fibrosis in chronic liver disease [70]. Wang et al. used the SHG/TPEF technology for evaluation of NASH fibrosis [71]. They developed quantitation of fibrosis-related parameters (q-FPs) and applied four main q-FPs to distinguish the F0–F4 stages of NASH fibrosis, which showed a good diagnostic performance (AUROC 0.81–0.93). Chang et al. developed another fibrosis index (SHG B-index) comprising 14 unique SHG-based collagen parameters [72]. An SHG B-index score of >1.76 showed an overall accuracy of 98.5% in predicting the presence of bridging fibrosis. A recent study developed and validated qFIBS, a computational algorithm that quantifies key histological features of NASH. qFIBS included qFibrosis, qInflammation, qBallooning, and qSteatosis, which showed a satisfactory correlation with each respective component and good AUROC values (0.708–0.986) [73]. The SHG/TPEF-based quantitative method can quantify the cardinal histological features of NASH independently of operators and experimental conditions. Therefore, this technology can allow for a more objective evaluation, especially when the diagnoses of different pathologists are inconsistent.

Fig. 1: Pathological images of Masson Trichrome staining and SHG/TPEF for liver specimen of a NASH patient with fibrosis.

SHG/TPEF technology for quantification of NASH fibrosis. a. Masson Trichrome staining. b. SHG/TPEF technology for quantification of NASH.

Good-quality specimens may reduce sampling variability of pathological evaluation

Qualified liver biopsy specimens are the basis of pathological evaluation. Unqualified biopsy specimens affect the pathological evaluation, which in turn will affect enrollment screening and efficacy evaluation. Common factors that affect the quality of liver biopsy are the length and the integrity of the specimen, the number of portal areas sampled, and the clarity of staining. For pathological assessment, 25 mm is considered the ideal liver biopsy length to fully evaluate NASH lesions. However, 15 mm is also acceptable to balance the potential invasive injury and adequate evaluation [74]. For the width of the specimen, most studies recommend 16G cutting biopsy needles to provide a sufficient observation area. In addition, research centers need to avoid fragmentation of specimens as much as possible, as the integrity of specimens can also improve the reliability of diagnosis. For digital images, the sharpness of scanning is also a factor that can affect interpretation. Sponsors and investigators should deploy high-definition scanning equipment to ensure each specimen’s clarity, and provide qualified scanned images for pathologists to read.
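
The length guidance above can be expressed as a simple quality-control check, sketched below. Treating 25 mm and 15 mm as minimum thresholds, the label used for shorter specimens, and the function name classify_specimen_length are all assumptions made for illustration; a real adequacy check would also consider fragmentation, the number of portal areas, and staining quality.

```python
def classify_specimen_length(length_mm: float,
                             ideal_mm: float = 25.0,
                             acceptable_mm: float = 15.0) -> str:
    """Classify a liver biopsy specimen by length only.

    ASSUMPTION: the 25 mm and 15 mm figures from the text are treated as
    minimum thresholds; the label for shorter specimens is illustrative.
    """
    if length_mm < 0:
        raise ValueError("length_mm must be non-negative")
    if length_mm >= ideal_mm:
        return "ideal"
    if length_mm >= acceptable_mm:
        return "acceptable"
    return "below acceptable length"
```

For example, an 18 mm specimen would be classified as "acceptable" under these assumed thresholds.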

Precise morphological assessment may improve identification of dynamic changes in fibrosis

The existing classification system categorizes fibrosis of NASH into five stages (0–4) [25]. Nevertheless, fibrosis/scarring in the same stage may show different morphology [64]. Future research should focus on the evaluation of precise morphological characteristics of fibrosis to predict the direction of disease development. The “Beijing classification” provides a potential new approach to the evaluation of dynamic changes in fibrosis [39]. As mentioned above, this new fibrosis quality classification focuses on the morphological balance between progressive and regressive fibrosis/scarring. Building on the traditional fibrosis evaluation system, this new classification system categorizes patients as predominantly progressive (P), indeterminate (I), or predominantly regressive (R), according to the morphology of fibrosis after antiviral therapy for chronic hepatitis B. A recent study of chronic hepatitis C used this new classification to assess the antiviral efficacy of direct-acting antiviral agents [75]. However, there are distinct differences between the patterns of fibrotic changes in viral hepatitis and NASH. Zone 3 pericellular fibrosis is the typical feature of NASH fibrosis [76]. Further studies are required to assess whether the “Beijing classification” can be used for the evaluation of NASH fibrosis (progress or reversal).

Conclusions

Histological evaluation based on liver biopsy is of great value in the development of new NASH drugs. It is used as the inclusion criterion, classification basis, and therapeutic efficacy/endpoint in clinical trials. Due to the close correlation between histological findings and long-term prognosis, liver biopsy can be used as a surrogate efficacy/endpoint in clinical trials of new drugs. Despite the inherent challenges in liver biopsy, including variable reliability of pathological evaluation, potential risk of invasive injury, sampling variability, insufficient evaluation of dynamic changes in fibrosis, and inadequate identification of the complex heterogeneity, liver biopsy is still irreplaceable in the development of new NASH drugs. These challenges could hopefully be addressed through new technology and good experimental design in the future.