Introduction: The Growing Importance of Data in Health Care

Over recent decades, oncology research has been accompanied by important health care innovations driven by cutting-edge technology. In the era of information technology, a remarkable amount of biological and clinical data has been generated, and as the data accumulated, the demand for bioinformatics surged. Advanced data science has rapidly emerged in oncology research, where it is used for the analysis and interpretation of biological data for cancer diagnosis and clinical treatment planning (Bayat 2002). The paradigm of cancer treatment has shifted from chemotherapy to immunotherapy, evolving into ‘precision oncology’ guided by highly sophisticated biomarkers. The growing precision medicine market is predicted to exceed USD 119 billion by 2026 (Ugalmugle and Swain 2020). Advances in technologies such as data science are leading the fourth industrial revolution. Data are not only pushing technology forward but are also seen as a key factor for success in the fourth industrial revolution, and they are expected to advance the development of new technologies and industries such as artificial intelligence (AI). Health care is one of the most commonly addressed applications of this technological data revolution. Beyond contributing to the development of new technologies, data such as big data or real-world data (RWD) are also expected to help provide scientific and systematic evidence to policymakers by combining all available evidence.

By way of example, in the United States, the 21st Century Cures Act, enacted in 2016, was designed to help accelerate medical product development and bring new innovations and advances to the patients who need them faster and more efficiently (FDA 2020a). The Act also placed additional focus on the use of RWD to support regulatory decision making, including the approval of new indications for approved drugs (FDA 2020b). Three years later, the US Food & Drug Administration (FDA) approved palbociclib for treating male breast cancer based on RWD from electronic health records (Wedam et al. 2020). For rare diseases such as male breast cancer, it is difficult to obtain evidence from randomised controlled trials (RCTs), which is a major hindrance to drug development. In this case, however, RWD rather than RCTs served as the primary evidence for the FDA’s approval. By using RWD, the FDA enabled the early approval of a new health technology that addresses an unmet medical need. This example shows that data utilisation affects not only the early stages of drug development but the whole development process, including drug approval and reimbursement decision making.

Data have therefore become a central topic in health care. The objective of this chapter is to look at the opportunities and challenges of using RWD in health technology assessment (HTA), in particular by using examples from NICE technology appraisals. To do so, we first look at the definitions of RWD and big data in the section “Real-World Data and Big Data: Some Definitions”. The following sections, “Health Technology Assessment in the Era of Information Technology” and “Opportunities Related to Using RWD in HTA”, describe the opportunities and challenges of using RWD in HTA, with detailed examples of how manufacturers, evidence review groups (ERGs) and appraisal committees have used RWD in appraisals. The last section briefly emphasises the need for careful deliberation when using RWD, based on an understanding of its potential and limitations.

Real-World Data and Big Data: Some Definitions

When looking at current trends in health care data, there are two key concepts: big data and RWD. The terms are commonly used interchangeably, but the relationship between them is not that straightforward. Although there is no consensus on their definitions or on the boundary between them, the two terms are not identical. The Heads of Medicines Agencies (HMA) and the European Medicines Agency (EMA) set up a joint task force for the best use of big data, including RWD such as electronic health records and data from patient registries, in 2019 (HMA, EMA 2019). Broadly speaking, big data usually refers to “the explosion in quantity (and sometimes, quality) of available and potentially relevant data, largely the result of recent and unprecedented advancements in data recording and storage technology” (Gibbs and McKendrick 2015, 235). The characteristics of big data are summarised as volume (massive amounts), velocity (high-speed processing) and variety (heterogeneous data), the so-called 3Vs of big data (2020). RWD, on the other hand, is data relating to patient health status and/or the delivery of health care that is routinely collected from a variety of sources, including electronic health records, medical claims and billing, product and disease registries, as well as health-related data from mobile devices (FDA 2018). NHS electronic hospital data, cancer registry data, claims data, and even patient-reported information collected from wearable devices are all RWD.

Big data and RWD are inherently complex, encompassing a wide range of information specific to the health and everyday life of individuals. These are extensive definitions that leave room for interpretation. For example, NHS electronic hospital data and cancer registry data are routinely collected, large and unprocessed data that can be used in potentially different ways; they can therefore be RWD as well as big data. Nonetheless, it is incorrect to say that all RWD is big data, even if the attributes of big data seem substantially transferable to RWD. Conversely, since big data collectively covers data of all kinds, it is not necessarily RWD. For example, the Big Data Institute at the University of Oxford is working on clinical AI for the patient-centred management and treatment of chronic disease (Oxford Big Data Institute 2019). In order to understand the complexity of disease, it draws on clinical trials, NHS hospital data and all available forms of data from all over the world. Whereas these collective data are considered big data, it would be wrong to classify them as RWD. Similarly, RWD is not always big data. Data from a Compassionate Use Programme (CUP) are an example. A CUP is a scheme that allows patients who cannot enter a clinical trial to use an unauthorised medicine under strict conditions (EMA n.d.-a). Medicines in development can thereby be made available to patients who are not eligible for clinical trials, allowing them to use therapies that are not yet approved (EMA n.d.-b). By definition, CUP data are RWD: they form a retrospective observational cohort routinely collected in the real world. CUP data were used in appraising the clinical effectiveness and cost-effectiveness of idelalisib for treating refractory follicular lymphoma by the National Institute for Health and Care Excellence (NICE) in the UK. In the NICE appraisal of idelalisib (TA604), CUP data collected in the UK and Ireland (n = 65) were submitted in order to complement the evidence. 
These data generate valuable information, falling within the category of RWD, on patient populations with unmet needs, and they inform future RWD use (Balasubramanian et al. 2016, 251). While the study provided real-world evidence, its characteristics deviated from the attributes of big data in terms of volume, velocity, and veracity. In order to understand the issues around RWD more comprehensively, this chapter focuses only on RWD.

Health Technology Assessment in the Era of Information Technology

In this era of information technology, “evidence-based practice” has been a keyword. Evidence-based practice is the “integration of best research evidence with clinical expertise and patient values” (De Brún 2013, 3). When integrating the evidence, it is necessary to consider all available data in an unbiased, transparent and scientific manner. HTA is an example of evidence-based practice that aims to provide the best evidence to health care decision-makers. As briefly mentioned above, HTA is a systematic evaluation of the short- and long-term safety, clinical effects and cost-effectiveness of a health technology, and of technology-related social, economic and ethical issues in terms of health care resource use (Henshall et al. 1997; Potter et al. 2008; WHO 2015). Over the last decades, HTA has become more critical as evidence-based decision making has become more prominent in the health system. Specialised HTA bodies such as NICE have worked hard to enhance the methods of synthesising evidence. Based on the evidence, NICE produces guidance, including technology appraisal guidance (TA guidance), and advice for health, public health and social care practitioners (National Institute for Health and Care Excellence (NICE) n.d.). In evidence-based medicine, randomised controlled trials (RCTs) are regarded as the highest level of evidence because they are designed to minimise the risk of systematic bias (Burns et al. 2011). In the drug approval process, RCTs have mainly been required to show efficacy and safety compared with a control group. Whilst RCTs are the gold standard of evidence for establishing efficacy, they are sometimes difficult to conduct. For example, RCTs of medicines for rare diseases are hampered by the lack of an appropriate trial design and of proper measurements to complement it, and by difficulties in selecting the correct sample and in recruiting participants ethically (Augustine et al. 2013). 
Moreover, trial-based economic evaluation raises questions about generalisability, that is, how representative the trial is of the patient population (Sculpher et al. 2004, 2). Health economic models require a range of data, not all of which are available from RCTs. How RWD can supplement and enrich the evidence in the arena of HTA is being explored (Makady et al. 2017a). Consequently, HTA is being reshaped to incorporate RWD as evidence, and critical questions are being posed about the most appropriate ways to do so. While NICE has already been committed to embracing all available evidence to appraise innovative health technologies, it has set out its ambition to increase and extend the use of data, including RWD, in the development and evaluation of NICE guidance. In February 2020, NICE announced in a statement of intent that a broader range of data will be utilised to address evidence gaps, including electronic health record data and RWD “looking at health and social care practice outside of trials, such as registries and clinical audits” (NICE 2020a, b). In November 2020, NICE launched a consultation on the review of its methods for health technology evaluation (NICE 2020b). In its proposal, NICE explicitly stated its preference for RCTs but also emphasised the role of a comprehensive evidence base, including non-RCTs and real-world evidence. NICE stated that “This type of evidence (real-world evidence) is an important topic, and NICE health technology evaluations are ambitious in ensuring that we make the most of this valuable resource” (Aggarwal et al. 2017; NICE 2020a).

Opportunities Related to Using RWD in HTA

RWD can be used in HTA in several ways. First, RWD can help to extrapolate the long-term survival curve beyond the trial period for economic evaluation. NICE makes appraisal recommendations based on cost-effectiveness, that is, the estimated costs of interventions in relation to their expected health benefits over the lifetime of patients (NICE n.d.). The health benefit, usually survival in oncology, therefore has to be extrapolated from clinical trials, which only show health outcomes over a limited time period. As the clinical evidence from trials is often limited, the extrapolation is likely to be biased. In that case, RWD can usefully supplement the extrapolation of survival by documenting patients’ characteristics and clinical practice over a longer observation period. For example, in the NICE technology appraisal of pembrolizumab with carboplatin and paclitaxel or nab-paclitaxel for untreated metastatic squamous non-small-cell lung cancer (NSCLC) (NICE TA600 issued in 2019c), the manufacturer used RWD to extrapolate overall survival (OS) in its submission. In the appraisal, the OS at cancer stage 4 was not available due to the small number of surviving patients. Therefore, the company stated that:

It was considered necessary to assess longer-term OS for the trial chemotherapy arm using available population data for squamous NSCLC patients and compare to results from parametric fitting. (p: 141, company submission)

In order to assess survival, the company analysed real-world registry data from the US Surveillance, Epidemiology and End Results (SEER) database. The SEER database is an authoritative source for cancer statistics in the United States and is considered the gold standard for data quality amongst cancer registries in the US and globally (Duggan et al. 2016, 4). As a real-world, large-scale registry, SEER is a population-based sample that represents over one-quarter of the US population and has long follow-up periods. The company compared OS beyond 12 months as projected by parametric fitting of the trial data with OS in the SEER population data, in order to examine the potential bias of using the SEER registry. Because SEER provided long-term data, the company was able to observe that mortality risks within SEER gradually declined until around year 10 and then appeared to stabilise at roughly a 10% risk per year. The company pointed to the potential over-estimation of long-term mortality when only the available clinical trial data were used for the best-fitting parametric extrapolation model. Therefore, the company’s model used SEER data in both the intervention and comparator arms. In NICE TA600 the committee preferred a model that did not use the SEER database, because the database lacked information on second-line treatment and the model’s assumptions were too optimistic; nevertheless, the example shows how registry data can be used to estimate long-term survival in HTA.
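The extrapolation logic described above can be illustrated with a minimal sketch. It assumes a Weibull curve fitted to trial data and then switches, at a chosen cut-off, to a constant annual mortality risk taken from a registry. The shape, scale and switch year are invented for illustration and are not taken from TA600; only the figure of roughly 10% risk per year echoes the SEER observation reported above. The function names are hypothetical.

```python
import math

def weibull_survival(t_years, shape, scale):
    """Survival probability S(t) = exp(-(t / scale) ** shape) for a Weibull model."""
    return math.exp(-((t_years / scale) ** shape))

def annual_risk_to_hazard(annual_risk):
    """Convert an annual probability of death into a constant hazard rate per year."""
    return -math.log(1.0 - annual_risk)

def extrapolated_survival(t_years, shape, scale, switch_year, registry_annual_risk):
    """Trial-based Weibull curve up to switch_year, registry-based constant hazard afterwards."""
    if t_years <= switch_year:
        return weibull_survival(t_years, shape, scale)
    hazard = annual_risk_to_hazard(registry_annual_risk)
    survival_at_switch = weibull_survival(switch_year, shape, scale)
    return survival_at_switch * math.exp(-hazard * (t_years - switch_year))

# Hypothetical inputs, chosen only to make the example run.
SHAPE, SCALE = 0.9, 1.2   # assumed Weibull parameters fitted to trial data
SWITCH_YEAR = 10.0        # assumed point at which the registry tail takes over
REGISTRY_RISK = 0.10      # roughly 10% mortality risk per year, as observed in the registry

for year in (1, 5, 10, 15, 20):
    trial_only = weibull_survival(year, SHAPE, SCALE)
    combined = extrapolated_survival(year, SHAPE, SCALE, SWITCH_YEAR, REGISTRY_RISK)
    print(f"year {year:>2}: trial-only {trial_only:.6f}, with registry tail {combined:.6f}")
```

With these assumed parameters the trial-only curve implies a higher long-term hazard than the registry, so the registry tail yields higher survival beyond the switch year. This is the kind of discrepancy that prompts a check of trial-based extrapolations against long-term population data.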

The second way RWD is used in HTA is to provide information about comparators, such as the choice of relevant comparators and their treatment effects. As new health technologies become more sophisticated, they can have several comparators, and as the number of potential comparators increases, it becomes less likely that head-to-head RCTs comparing their clinical effectiveness exist. It then becomes unavoidable to synthesise evidence on the comparators from other RCTs or other types of research. In the technology appraisal of cabozantinib for treating previously treated advanced renal cell carcinoma (NICE TA463 issued in 2017b), four comparators, namely axitinib, everolimus, nivolumab and best supportive care, were selected. However, the METEOR phase 3 trial of cabozantinib only included everolimus in the comparator arm, since nivolumab was approved while the trial was under way. Due to the lack of comparator data, an indirect comparison was conducted based on available RCTs (Fig. 1). Combining RWD and RCTs in evidence synthesis has the potential to support the findings from RCTs, increase precision and enhance the decision-making process (Efthimiou et al. 2017).

Fig. 1
The evidence network for clinical outcomes of TA463 (flow chart with nodes for nivolumab, everolimus, cabozantinib, placebo and sorafenib)
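The principle behind an anchored indirect comparison can be sketched for the simplest case of two trials that share a common comparator. The example below uses the standard adjusted indirect comparison (often called the Bucher method): the log hazard ratios are subtracted and their variances added. It is not necessarily the method used in TA463, where a network of several trials would normally call for a network meta-analysis, and all hazard ratios and standard errors below are hypothetical.

```python
import math

def bucher_indirect(log_hr_ab, se_ab, log_hr_cb, se_cb):
    """Indirect comparison of A versus C through a common comparator B.

    log HR(A vs C) = log HR(A vs B) - log HR(C vs B); the variances add.
    Returns the hazard ratio with an approximate 95% confidence interval.
    """
    log_hr_ac = log_hr_ab - log_hr_cb
    se_ac = math.sqrt(se_ab ** 2 + se_cb ** 2)
    lower = math.exp(log_hr_ac - 1.96 * se_ac)
    upper = math.exp(log_hr_ac + 1.96 * se_ac)
    return math.exp(log_hr_ac), lower, upper

# Hypothetical trial results (not taken from any appraisal):
# trial 1 compares A with B, trial 2 compares C with B.
hr_ac, ci_low, ci_high = bucher_indirect(
    log_hr_ab=math.log(0.60), se_ab=0.12,
    log_hr_cb=math.log(0.80), se_cb=0.10,
)
print(f"indirect HR, A vs C: {hr_ac:.2f} (95% CI {ci_low:.2f} to {ci_high:.2f})")
```

Because the variances add, the indirect estimate is always less precise than either of the direct estimates it is built from, which is one reason why additional evidence sources are attractive when head-to-head trials are missing.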

Furthermore, in order to provide fast access to novel treatments, drug regulatory agencies such as the FDA and the EMA grant accelerated approval based on the results of single-arm trials (Miller and Joffe 2011). The single-arm trial is the simplest trial design for obtaining evidence of the efficacy of a treatment among individuals with the targeted medical condition, without randomisation or a control arm (Evans 2010, 1). Single-arm trials are still clinical trials designed to test the efficacy or safety of an intervention, but they do so without a comparator. When information on the comparator is not available from the same trial, the efficacy of comparators mostly comes from other data sources such as RWD or historical controls, which exploit the data of previously conducted RCTs (Zhang et al. 2010). NICE published an appraisal of axicabtagene ciloleucel (NICE TA559 issued in 2019a), in which an observational cohort study was used to provide data for the comparators. Axicabtagene ciloleucel is recommended for use within the Cancer Drugs Fund as an option for treating diffuse large B-cell lymphoma and primary mediastinal large B-cell lymphoma after two or more systemic therapies. It is an autologous anti-CD19 chimeric antigen receptor (CAR) T-cell therapy, an innovative technology based on genetically modified T cells. As this technology was approved on the basis of ZUMA-1, a single-arm study, comparator data needed to be taken from an alternative source, SCHOLAR-1. SCHOLAR-1 is a retrospective patient-level study with pooled data from two observational cohorts and the follow-up of two large phase 3 RCTs. As the differences in patient characteristics between the two data sources were likely to affect the clinical outcome, the company adjusted for patient performance status in order to exclude patients in SCHOLAR-1 who would not have been eligible for ZUMA-1. 
Even though it was noted that comparative-effectiveness results from single-arm studies were prone to bias, the committee concluded that using two single-arm studies was suitable and that it would consider the results of these studies in its decision making given the population characteristics (poor prognosis and vulnerability) and potential difficulties with randomisation. Another example is the guidance on venetoclax for treating chronic lymphocytic leukaemia (NICE TA487 issued in 2017d). The company set two comparators: best supportive care and palliative care. Since the clinical evidence for venetoclax came from one phase I (M12-175) and two phase II (M13-982, M14-032) single-arm trials, the data for the comparators had to come from a different source. The company chose the UK CLL (chronic lymphocytic leukaemia) Forum registry data for the survival outcome of palliative care in its submission. In the appraisal, the ERG commented that palliative care was not a valid comparator, as patients suitable for palliative care have more advanced disease than those for whom venetoclax is an option. The committee concluded that best supportive care was the more appropriate comparator, and the evidence on palliative care was excluded from the final decision making. These examples show that RWD can help to fill evidence gaps in some instances.

The third way RWD is used in HTA is to supplement information on the generalisability of evidence. Eichler et al. (2011) pointed out the limitation of RCTs with respect to the efficacy-effectiveness gap, that is, the gap between the therapeutic efficacy of medicines observed in tightly controlled RCT settings and their effectiveness in the real world. In the appraisal of afatinib for treating epidermal growth factor receptor mutation-positive locally advanced or metastatic non-small-cell lung cancer (NICE TA310 issued in 2014), the ERG highlighted that:

in view of the important uncertainties around the roles played by specific mutations (singly or in combination) in determining clinical benefit from TKIs (tyrosine kinase inhibitors), it would be most valuable to have data from a long-term clinical registry of all UK patients treated with TKIs. Such a data source could provide a basis for research and audit to inform future assessments of TKIs in a UK specific population.

This implies that RWD such as clinical registries can help generalise the results of RCTs by capturing some of the uncertainties, complexity and non-linearity that characterise the efficacy of a treatment in a real-world setting. RWD can also give additional information that reflects current clinical practice. In the appraisal of oncology medicines, the choice of comparators and subsequent treatments is important for populating the cost-effectiveness model, as it affects not only survival outcomes but also costs. Usually, the clinical guideline sets out the treatment lines and clearly indicates which drugs are available in each line. However, treatments are not used equally in clinical practice: some may be used more frequently than others because of better compliance or clinical prognosis. There is also a lack of an established standard of care in the later lines of treatment. In these cases, RWD can provide a snapshot of drug usage. In the technology appraisal guidance for ibrutinib for treating Waldenstrom’s macroglobulinaemia (NICE TA491 issued in 2017e), the company demonstrated that physician’s choice was the most relevant comparator due to the lack of a standard of care in the clinical guideline. To delineate the composition of physician’s choice, a pan-European medical chart review of Waldenstrom’s macroglobulinaemia (WM) patients was used. It generated data on epidemiology, treatment patterns and efficacy outcomes for WM over a prolonged period of time, specifically for subsequent lines of treatment (i.e. 3L and 4L), because WM patients tend to receive multiple lines of treatment during their lifetimes. RWD can thus help to maintain the validity and generalisability of the evidence by capturing current clinical practice.

Lastly, RWD is used in HTA to appraise treatments for rare diseases or conditions, the so-called orphan medicines (EMA n.d.-c). For orphan medicines, it is difficult to gather the information needed to populate the economic evaluation model because of the small patient population, and it is challenging to conduct good-quality RCTs. In most cases, the model assumptions are based on clinical experts’ opinions. RWD can provide a wide range of the information required for cost-effectiveness analysis, such as the treatment effect of the comparator or resource use data such as the frequency of hospitalisation. For example, among the 1930 people diagnosed with follicular lymphoma annually in the UK, only 52 double-refractory patients are eligible for idelalisib (NICE TA604 issued in 2019d). The manufacturer of idelalisib submitted DELTA, a single-arm trial, as primary clinical evidence along with a comparator cohort created from registry data (HMRN; Haematological Malignancy Research Network). The committee acknowledged that HMRN was likely to be the only source of comparative data available for the UK population, and agreed to accept the estimate of progression-free survival from HMRN even though the HMRN data had limitations. RWD can also inform the choice of survival model for a rare cancer. The choice of survival distribution has a large impact on the estimate of survival, so it is critical to know how the hazard changes over time. However, clinical trials of treatments for a rare cancer are less likely to provide the full picture of the changing hazard because of the small size of the trial population. RWD such as registries, in which long-term observed outcomes are available, can help validate the model assumptions, including the choice of survival model. Likewise, in the appraisal of precision medicines, RWD might be able to fill the evidence gap created by difficulties in showing statistical significance in small populations.

To summarise, RWD is used in HTA for four main reasons:

  • To supplement the information when extrapolating the long-term survival curve after the trial period for economic evaluation.

  • To help provide information about the comparators such as the choice of relevant comparators reflecting clinical practice and treatment effects.

  • To help supplement the information on the generalisability of evidence, which is difficult to capture in RCTs.

  • To help appraise treatments for rare diseases or conditions.

Challenges of Using RWD in HTA

Despite the opportunities of using RWD in HTA described in the previous section, and the growing hype regarding big data, RWD is not a panacea for the paucity of evidence in HTA. Indeed, we lack an understanding of the potential benefits, risks and limitations of using RWD. The first important challenge is that, despite increasing interest in RWD worldwide, there is no consensus on the precise contours of what constitutes RWD, and many different definitions can be found (Makady et al. 2017b). This is one of the significant obstacles to using RWD in HTA. Indeed, while a flexible definition of RWD allows it to represent different concepts or types of information, it also limits the potential role of RWD in HTA (Berger et al. 2017, 3). With the objective of strengthening the use of RWD in HTA, Makady and colleagues (2017a, b) proposed four broad categories of RWD definitions: (1) data collected in a non-RCT setting, (2) data collected in a non-interventional/non-controlled setting, (3) data collected in a non-experimental setting, and/or (4) remainders. Among these four categories, ‘data collected in a non-RCT setting’ was the most commonly used definition of RWD. These definitions focus on the setting in which data are collected. The FDA’s definition, by contrast, highlights the way in which data are collected. As an umbrella term, the FDA defines RWD as data relating to patient health status and/or the delivery of health care that are routinely collected from a variety of sources (FDA 2018). In most cases the two definitions agree; however, some studies can be interpreted differently depending on the definition chosen. For example, the study by Lloyd and colleagues on health state utility values is frequently used in NICE appraisals. In this study, the authors interviewed members of the general public to elicit societal preferences about the treatment of metastatic breast cancer. 
The study was designed to include 100 people in order to represent the preferences of the general public, and data were collected once during the study period (Lloyd et al. 2006). Whilst the health utility values were collected outside a clinical trial, the data about health status were not routinely collected. Depending on which definition is chosen, this study may or may not count as RWD. Under the FDA’s definition, the data from this study are not RWD because they were collected only once, albeit outside a clinical trial. On the other hand, they can be regarded as RWD because the study is not an RCT. Without consistency in the definition of RWD, the potential benefits of using RWD in HTA are weakened.

One of the main concerns about using RWD in appraisals is confounding. In statistics, a confounding variable is a variable, other than the independent variables of interest, that may affect the dependent variable. It can lead to erroneous conclusions about the relationship between the independent and dependent variables (McDonald 2009). RWD is prone to distortion and bias from residual confounding, since without randomisation it is hard to control for all confounding factors, both explicit and underlying (Grieve et al. 2016). Such data are inadequate for distinguishing between the effect of the treatment, a placebo effect, and the effect of natural history (Evans 2010, 2). For example, a patient’s health status, such as cancer stage and underlying health conditions, is highly likely to influence clinical outcomes. As the response rate to second-line treatment differs from that to first-line treatment, it is critical to understand patient characteristics for a precise assessment. The appraisal of ibrutinib for treating relapsed or refractory mantle cell lymphoma (NICE TA502 issued in 2018) included HMRN audit data as comparator data, since the main clinical evidence was a single-arm trial. The HMRN data consisted of evidence from a unified clinical network operating across 14 hospitals in Northern England (Yorkshire). The company used data on the benefit of the comparator (R-chemo; rituximab + chemotherapy) from the HMRN audit of 118 patients with mantle cell lymphoma who had received first-line treatment. However, the ERG was concerned that the HMRN audit did not specifically relate to patients with relapsed or refractory mantle cell lymphoma. The ERG also highlighted that:

It is also noteworthy that since this is not a trial, differences in outcomes between patients receiving R-chemo and those receiving chemotherapy alone may be subject to confounding. The HR (hazard ratio) reported in the audit includes adjustments only for age and sex. (ERG document, 67)
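A small numerical sketch may help to show why such a mismatch matters. It assumes two data sources that differ in their mix of first-line and relapsed patients, and compares crude response rates with rates standardised to a common case mix. All counts are hypothetical and are not taken from TA502 or the HMRN audit; the function names are illustrative only.

```python
# Hypothetical counts per stratum: (responders, patients).
# The trial population is mostly relapsed; the registry population is mostly first-line.
trial = {"first_line": (6, 10), "relapsed": (27, 90)}
registry = {"first_line": (54, 90), "relapsed": (3, 10)}

def crude_rate(groups):
    """Overall response rate, ignoring the stratum each patient belongs to."""
    responders = sum(r for r, _ in groups.values())
    patients = sum(n for _, n in groups.values())
    return responders / patients

def standardised_rate(groups, reference):
    """Direct standardisation: stratum-specific rates weighted by the reference case mix."""
    total_reference = sum(n for _, n in reference.values())
    return sum(
        (groups[stratum][0] / groups[stratum][1]) * (reference[stratum][1] / total_reference)
        for stratum in reference
    )

print(f"crude rate, trial:        {crude_rate(trial):.2f}")
print(f"crude rate, registry:     {crude_rate(registry):.2f}")
print(f"registry at trial's mix:  {standardised_rate(registry, trial):.2f}")
```

In this constructed example the response rate within each stratum is identical in both sources, yet the crude comparison suggests a large difference that is entirely due to the line of treatment. Standardisation removes it here only because the confounder is measured; unmeasured factors would remain as residual confounding.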

Another set of challenges to the use of RWD in HTA arises from unanchored comparisons. An unanchored treatment comparison results when the network of studies is disconnected or when the evidence comes from single-arm studies (Phillippo et al. 2016). When treatment outcomes come from single-arm studies such as phase 1/2 clinical studies or observational studies, the comparison is unanchored. An unanchored comparison is highly likely to give misleading results, as it is confounded by the differences between the two populations. Since the number of technologies for which single-arm trials are the primary clinical evidence for drug approval and reimbursement assessment has increased, population adjustment methods such as matching-adjusted indirect comparison (MAIC) and simulated treatment comparison (STC) have received attention (Phillippo et al. 2016). These methods assume that all effect modifiers and prognostic factors are accounted for and controlled; if this assumption fails, the conclusion will be biased. In the appraisal of cemiplimab for treating metastatic or locally advanced cutaneous squamous cell carcinoma (NICE TA592 issued in 2019b), only two single-arm trials of cemiplimab were available, and the comparator data were very limited. Therefore, a non-UK retrospective chart review study was included in the company’s base case. The study evaluated the outcomes of patients who received systemic therapy by reviewing patient hospital records (Jarkowski et al. 2016). The company tried to use STC and MAIC for the indirect treatment comparison. However, it ultimately chose the naïve comparison because of the uncertainty around unmeasured prognostic factors and concerns about the validity of the survival curve, which stemmed from the significantly reduced effective sample size (65% of the original sample size). The committee noted that the naïve comparison was not methodologically recommended because outcomes were likely to be confounded by differences between the populations of the studies (Fig. 2).

Fig. 2
Indirect treatment comparison of TA592 (the cemiplimab evidence, from the phase 1 and phase 2 EMPOWER-CSCC 1 studies, is disconnected from the chemotherapy and BSC evidence, which comes from the chart review study)
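The effective sample size mentioned in the cemiplimab example can be illustrated with a short sketch. In a MAIC, each patient in the individual-level dataset receives a weight so that the weighted population resembles the comparator population; the commonly used approximation for the effective sample size is the squared sum of the weights divided by the sum of the squared weights. The weights below are hypothetical and do not reproduce the figures of TA592.

```python
def effective_sample_size(weights):
    """Approximate effective sample size: (sum of weights) squared / sum of squared weights."""
    total = sum(weights)
    sum_of_squares = sum(w * w for w in weights)
    return total * total / sum_of_squares

# Equal weights leave the sample size unchanged.
equal_weights = [1.0] * 10
print(effective_sample_size(equal_weights))

# Hypothetical MAIC weights: a few patients dominate after matching.
maic_weights = [0.2, 0.5, 1.0, 1.0, 1.5, 3.0]
ess = effective_sample_size(maic_weights)
print(f"ESS = {ess:.2f} out of {len(maic_weights)} patients "
      f"({100 * ess / len(maic_weights):.0f}% of the original sample)")
```

The more the two populations differ, the more uneven the weights become and the smaller the effective sample size, which widens confidence intervals and can make the adjusted survival curves unstable.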

In addition, when RWD are incorporated into HTA, different approaches should be applied because the contents of RWD vary. As RWD include diverse types of data, each dataset has different attributes, which means that not all RWD provide the same information. For example, the Korean health care claims data collected by the Health Insurance Review and Assessment Service (HIRA) are nationwide data on over 50 million people (Kim et al. 2020). The database includes information on age, gender, diagnosis and the volume of medical interventions used (diKhi n.d.). Although the data contain useful information on resource use, clinical data on patients are not available. Inevitably, clinical research using the claims data, such as survival analysis, relies on operational definitions set by the individual researcher. It is a strong assumption that effect modifiers are adjusted for by operationally defined variables, and this could ultimately bring uncertainty into the appraisal. Therefore, approaches to incorporating RWD according to their characteristics should be discussed so that RWD can be used in appraisals without distortion.

Moreover, the quality of RWD calls into question the reliability of the outcomes as evidence. To evaluate the quality of RWD, we need to know precisely how the data have been collected and how they have been used in HTA. Owing to the characteristics of observational studies, RWD have limitations in the quantity and quality of information. In the aforementioned venetoclax technology appraisal guidance (NICE TA487), the quality of the data was an issue for their inclusion as key evidence for decision making. The target population for the decision problem was stratified by 17p deletion/TP53 mutation status and by failure of a B-cell receptor pathway inhibitor (BCRi). Therefore, information on chromosomal abnormality and disease staging is essential. While the registry data have information on the time from BCRi treatment failure to death, the staging information is not complete. The lack of staging information introduced a significant mismatch between the comparator group and the intervention group. The company submission reported that:

As patients without the deletion have a better prognosis than patients with the deletion, and given the fact that UK CLL forum data were not stratified by del(17p)/TP53 mutation, this may contribute to overestimating the survival of palliative care which appears better than BSC on the long term. (CS, 145)

Another challenge is that of generalisation. RCTs provide efficacy and safety data with relatively high internal validity, but their results may not be readily generalisable to a broader, more heterogeneous population (Makady et al. 2017b). RWD is expected to provide more information that reflects clinical practice. Despite the expectation that RWD will improve on the external validity of RCTs, RWD has limitations in terms of the representativeness of the population. It is highly questionable whether all RWD fully capture a holistic picture of reality. For example, in the evaluation of sorafenib for treating advanced hepatocellular carcinoma (NICE TA474 issued in 2017c), the GIDEON (Global Investigation of therapeutic DEcisions in hepatocellular carcinoma and Of its treatment with sorafeNib) study, which predominantly includes an Asian population, was used to represent the general UK population. Since the treatment effect of sorafenib differed across global regions, using GIDEON data to predict the treatment effect in the UK population is questionable. Another example is ceritinib for previously treated anaplastic lymphoma kinase positive non-small-cell lung cancer (NICE TA395 issued in 2016). The manufacturer of ceritinib submitted additional real-world evidence (Gainor et al. 2015), in which medical records were reviewed to determine OS and PFS (progression-free survival) in patients who were treated with sequential crizotinib and ceritinib between 2008 and 2014. The ERG criticised this retrospective non-randomised study for not clearly showing how similar its participants were to those in the ceritinib studies. In the appraisal of nivolumab for treating relapsed or refractory classical Hodgkin lymphoma (NICE TA462 issued in 2017a), the generalisability of RWD to UK practice was questioned. The company used the Cheah et al. study as evidence for the comparator's clinical outcome estimates, OS and PFS. The data used in the study came from an American hospital database (Cheah et al. 2016). The committee considered whether the population and composition of treatments in the Cheah et al. study reflected clinical practice in the UK. It considered that the study population partially matched the population of interest. Furthermore, it deemed that the study might not reflect UK practice, notably regarding the rates of subsequent allogeneic stem cell transplant. Even if RWD is collected from routine practice, the context of data collection can differ by country or region. Such differences are likely to introduce bias in representativeness. As an observational study design does not in itself guarantee the generalisability of the evidence, the clinical and social context should be carefully considered when using RWD.

To summarise, these are some of the key challenges related to the use of RWD in HTA:

  • There is no consensus on the precise contours and definition of what constitutes RWD.

  • RWD carries an inherent risk of confounding, which makes it difficult to establish causality.

  • It is challenging to estimate relative treatment effects from RWD because they are disconnected from other clinical studies.

  • Each dataset needs to be understood separately, with the caveat that individual datasets categorised as RWD have different characteristics.

  • The quality of RWD, for example their incompleteness, is often questioned.

  • RWD is not necessarily generalisable, as it does not always reflect the whole patient population or up-to-date practice.

What Is Next: Do We Want RWD in HTA to Be the Cynosure of All Eyes?

The hope for advances in health care through the use of big data, and more specifically RWD, is getting stronger. The pharmaceutical industry has been using RWD for decades to conduct post-market research, inform its decision-making, respond to requests from external stakeholders, and improve market positioning (Mckinsey&Company 2020). Advances in digital technology and advanced analytics allow RWD to be used more widely in health care ecosystems. The quality of RWD itself has also steadily improved as research using RWD has increased drastically in the last decade (Booth et al. 2019; Evans et al. 2021). Growing interest in RWD clearly creates more and more opportunities to generate new evidence, from drug development to post-approval studies (Rudrapatna and Butte 2020). In the era of digitalised RWD, progress in the use of RWD has led the public to place great hope in its ability to transform the entire health care system (Berger et al. 2015). The FDA approval of palbociclib shows that RWD can take a central role in the regulatory process. While the FDA has accepted RWD for limited uses, such as informing the prognosis or natural history of a disease, this was the first approval based on RWD in oncology. The FDA shows confidence that leveraging data such as RWD with modern techniques will unlock new insights and provide state-of-the-art tools to enhance public health. Such potential has inundated public discourse with optimistic narratives of cutting-edge scientific innovation and hopes for a complete cure for devastated cancer patients, praising the progress of modern science and putting forward opportunities for accessing innovative technology at the earliest possible time. However, whether RWD can replace RCTs is still questioned (Ramagopalan et al. 2020). Without appropriate consideration of how RWD are used, the regulatory body has to accept a level of uncertainty that outweighs the benefit of accelerating patient access to treatment.

In the HTA process, diverse types of data such as RWD are already incorporated into the evidence. As interest in RWD grows, the use of RWD to estimate treatment effects receives more attention. But as discussed above, using RWD in HTA is intricate. Indeed, we have seen that there are many uncertainties related to what RWD can actually bring to HTA in particular, and to health care systems in general. RWD are complex, difficult to grasp and to define, and it is also difficult to evaluate their quality. In addition, RWD have limitations in assessing relative treatment effects because of confounders and disconnected evidence networks. Therefore, it is crucial to think about the extent to which RWD can be incorporated in HTA, to which parts of evidence synthesis they can actually contribute, and what their limitations are. Diverse definitions and forms of data allow RWD to be used widely, but also present challenges of consistency, quality, generalisation, and purpose. It is important to critically scrutinise RWD and the hope they might convey, and to carefully examine the quality of RWD as a source of evidence. RWD is not, and should not be considered as, an easy fix to the complex question of how to assess health technologies. Arguably, a more systematic approach to RWD could help enhance its robustness when used in HTA. In this chapter, I questioned the idea of RWD as a corrective measure by critiquing the effectiveness of RWD utilisation in HTA. By critically questioning the drawbacks, limitations and challenges of using RWD in HTA, we can expect a more balanced and responsible use of RWD in the future, one that does not overpromise results that are unachievable in a context of high uncertainty and complexity. It would also help form more realistic expectations of what RWD can and cannot bring to HTA and health care in general. And fundamentally, before expanding the use of RWD in technology assessment, we should think about exactly what RWD are, for what purpose we want to use them, and how we can meaningfully evaluate their quality. From that line of critical questions, we would be able to think more realistically about the practical benefits and challenges of incorporating these data into evidence synthesis, and about the extent to which RWD actually contribute to the evaluation of new technology.