Main

Over recent decades, technological innovations have transformed the healthcare domain with the ever-growing availability of clinical data supporting diagnosis and care. Medicine is moving towards gathering multimodal patient data, especially in the context of age-related chronic diseases such as cancer1,2. Integrating different data modalities can enhance our understanding of cancer3,4 and pave the way for precision medicine, which promises individualized diagnosis, prognosis, treatment and care1,5,6.

Increasingly, we are moving from the traditional one-size-fits-all approach to more targeted testing and treatment. Although molecular pathology revolutionized precision oncology, the first Food and Drug Administration (FDA)-cleared companion diagnostic assays relied on simpler molecular methods, and most focused on a single gene of interest7,8. However, advances in next-generation sequencing (NGS) now allow for multitarget companion diagnostic assays, which are becoming more prevalent8,9. Continuing cost reductions make it possible to profile thousands of genomic regions simultaneously, hinting that multitarget panels could soon be run at a price point similar to that of testing five to ten targets individually10. Multitarget tests not only conserve time and tissue but also have the potential to identify complex genetic interactions, thereby enhancing our understanding of tumour biology. While NGS adoption is still in full swing, a third wave of technologies featuring single-molecule, long-read and real-time sequencing is already on the rise. Platforms from Pacific Biosciences and Oxford Nanopore Technologies enable the assembly and exploration of genomes at unprecedented resolution and speed11. This technology was recently used in a clinical setting to diagnose rare genetic diseases with a turnaround time of only eight hours12. As cancer is often multicausal, precision oncology benefits greatly from these developments.

At the same time, histopathology and radiology have long been critical tools in clinical decision-making during cancer management13,14. Histopathological evaluation enables the study of tissue architecture and remains the gold standard for cancer diagnosis15. More recently, notable progress in whole-slide imaging (WSI) has led to a transition from traditional histopathology methods towards digital pathology16. Digital pathology, the process of ‘digitizing’ conventional glass slides into virtual images, has many practical advantages over more traditional approaches, including speed, more straightforward data storage and management, remote access and shareability, and highly accurate, objective and consistent readouts. At the other end of the spectrum is radiographic imaging, a non-invasive method for detecting and classifying cancer lesions. In particular, computed tomography and magnetic resonance imaging (MRI) scans are useful for generating three-dimensional images of (pre)malignant lesions.

Ongoing improvements in artificial intelligence (AI) and advanced machine learning (ML) techniques have had major impacts on these cancer-imaging ecosystems, especially in diagnostic and prognostic disciplines17. Current annotation of histopathological slides relies on specialized pathologists. Leveraging image-based AI applications would not only alleviate pathologists’ workload but could also enable more efficient, reproducible and accurate spatial analysis, capturing information beyond visual perception17,18,19. Radiomics and pathomics are fields focusing on the quantitative analysis of radiological and histopathological digital images, respectively, with the aim of extracting quantitative features that can be used for clinical decision-making20. This extraction used to be done with standard statistical methods, but more advanced deep learning (DL) frameworks such as convolutional neural networks, deep autoencoders and vision transformers are now available for automated, high-throughput feature extraction21,22,23,24. Automatic assessment of deterministic, objective features has enabled the quantification of tumour microenvironments (TMEs) at unprecedented speed and scale. In addition to quantifying known handcrafted salient features without inter-observer variability, DL can discover unknown features and relationships that provide biological insights and improve disease characterization25. A notable radiomics study in lung cancer found that DL features captured prognostic signatures, both within and beyond the tumour region, that correlated with cell cycle and transcriptional processes26. Despite the diverse capacity of DL, one of its main challenges is the need for large datasets to train, test and validate algorithms. However, owing to ethical restrictions and the labour-intensive nature of annotating clinical images, most studies have only limited access to large cohorts containing ground-truth-labelled data27.
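As a concrete illustration of DL-based feature extraction, the minimal sketch below uses an ImageNet-pretrained convolutional network to turn a single histopathology tile into a compact feature vector; the file name, image size and choice of backbone are illustrative assumptions rather than a recommended pipeline.

```python
# Hedged sketch: deep feature extraction from one histopathology tile with a
# pretrained CNN, as a stand-in for handcrafted radiomics/pathomics features.
import torch
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image

# Load an ImageNet-pretrained ResNet and drop its classification head, keeping
# the global-average-pooled 512-dimensional embedding. (Older torchvision
# versions use pretrained=True instead of the weights argument.)
backbone = models.resnet18(weights="IMAGENET1K_V1")
encoder = torch.nn.Sequential(*list(backbone.children())[:-1]).eval()

preprocess = T.Compose([
    T.Resize((224, 224)),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

tile = Image.open("tile_0001.png").convert("RGB")  # hypothetical WSI tile
with torch.no_grad():
    features = encoder(preprocess(tile).unsqueeze(0)).flatten(1)  # shape: (1, 512)
```

Such per-tile vectors can then be aggregated per slide or per patient before any downstream modelling.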

Under the 21st Century Cures Act28, the FDA set a goal to advance precision medicine in which the patient is at the centre of care. The act defines timelines for discovery, development and delivery, and requires the fusion of evidence across modalities, with the provision that this must include real-world data and patient experience. Technological advances have initiated an era in which clinical data are being captured from multiple sources at an unprecedented pace, ranging from medical images to genomics data and patient-generated health data. Together with successes in AI, this creates both the opportunity and the necessity to analyse many data types with these advanced tools to better inform decision-making and improve patient care. So far, the FDA has cleared and approved several AI-based software products as medical devices29. With the publication of its recent AI/ML white paper30, the FDA has signalled its intention to develop a regulatory framework for these highly iterative, autonomous and continuously learning algorithms, as well as for the specific data types necessary to assure safety and effectiveness. Some proposed considerations for data inclusion are (1) relevance to the clinical problem and current clinical practice, (2) data acquisition in a consistent, generalizable and clinically relevant manner, (3) appropriate definition and separation of training, tuning and test sets, and (4) an appropriate level of transparency of the algorithm and its output to users.

Integration of AI functionalities in medical applications has increased in recent years31. However, so far, most methods have focused on only one specific data type at a time, leading to slow progress in approaches that integrate complementary data types, with many remaining questions about the technical, analytical and clinical aspects of multimodal integration32,33,34,35. To advance precision oncology, healthcare AI must not only inform on cancer incidence and tumour growth but also identify the optimal treatment path, accounting for treatment-related side effects, socioeconomic factors and care goals. Precision medicine can therefore be achieved only by merging complex and diverse multimodal data that span space and time. Single data modalities can be noisy or incomplete, but combined with redundant signals from other modalities they support more sensitive and robust diagnosis, prognosis and treatment assignment. Multimodal data are now being collected, providing a resource for biomarker discovery36,37,38,39. For cancer, both prognostic and predictive biomarkers are of interest: prognostic biomarkers provide information on the patient’s diagnosis and overall outcome, whereas predictive biomarkers inform treatment decisions and response40.

Here, we argue that several sources of routinely collected medical data are not used to their full potential for diagnosing and treating patients with cancer, because they are studied mostly in isolation instead of in an integrated fashion. These are: (1) electronic health records (EHRs), (2) molecular data, (3) digital pathology and (4) radiographic images. When combined, these data modalities provide a wealth of complementary, redundant and harmonious information that can be exploited to better stratify patient populations and provide individualized care (Fig. 1). In the next sections, we discuss both challenges and opportunities for multimodal biomarker discovery as it applies to patients with cancer. We cover strategies for data fusion and examine approaches to address data sparsity and scarcity, data orchestration and model interpretability.

Fig. 1: Generation and processing of routinely collected biomedical modalities in oncology.

Before data fusion, different steps are needed to go from the raw data to workable data representations for each modality (for example, EHRs, molecular data and medical images). Icon credits: microarray, Guillaume Paumier, under a Creative Commons licence CC BY-SA 3.0; EHR, data processing, DNA and encoder icons, the Noun Project (https://thenounproject.com/); DNA sequencer, MRI machine and stethoscope, created with Biorender.com; genomic circular circos plot, ref. 166, Cold Spring Harbor Laboratory Press; tissue and brain slices, created using TCGA data originally published by the National Cancer Institute.

The need for multimodal data fusion in oncology

Despite huge investments in cancer research and improved diagnosis and treatments, cancer prognosis is still bleak. Predictive models based on single modalities offer a limited view of disease heterogeneity and might not provide sufficient information to stratify patients and capture the full range of events that take place in response to treatment41,42. For example, although immunotherapeutic methods such as antibody–drug conjugates and adoptive cell therapy (for example, T-cell receptor and chimeric antigen receptor T-cell therapy) have shown promise, response rates vary markedly depending on the tumour subtype43 and the TME44. Various TME elements play a role in both tumour development and therapeutic response. Furthermore, the cellular composition of the TME evolves dynamically with tumour progression and in response to anticancer treatments45,46. The increasing application of immunotherapy underlines the need for (1) a deeper understanding of the TME and (2) multimodal approaches that allow longitudinal TME monitoring during disease progression and therapeutic intervention47.

Currently, biomarker discovery is mainly based on molecular data48. Increasing implementation of genomic and proteomic technologies in clinical settings has led to growing availability, but also growing complexity, of molecular data8. Large consortia such as The Cancer Genome Atlas (TCGA) and Genomic Data Commons have gathered and standardized large datasets, accumulating petabytes of genomic, expression and proteomics data37,49,50. Barriers to NGS assay development, validation and routine implementation remain owing to many factors, such as tumour heterogeneity, sampling bias and interpretation of results. Clinically accepted performance requirements are also often cancer-specific and depend on where in the care trajectory and for what specific purpose (for example, diagnosis, stratification, drug response or treatment decision) tests are used51. As relevant as molecular data are for precision medicine, they discard tissue architecture as well as spatial and morphological information.

Although lower in resolution than genomic information, both WSIs and radiographic images potentially capture orthogonal and complementary information. Digital pathology with WSIs presents the cellular and morphological architecture in a visual form that pathologists can interpret, and image analysis and spatial statistics can extract key information about the spatial heterogeneity of the TME52. Similarly, radiographic images such as MRI or computed tomography scans provide visual data on tissue morphology and three-dimensional structure53.

Integration of data modalities that cover different scales of a patient has the potential to capture synergistic signals that identify both intra- and inter-patient heterogeneity critical for clinical predictions54,55,56. For example, the 2016 World Health Organization classification of tumours of the central nervous system revisited the guidelines to classify diffuse gliomas, recommending histopathological diagnosis in combination with molecular markers (for example, isocitrate dehydrogenase 1 and 2 (IDH1/2) mutation status), as each modality alone is insufficient to explain patient outcome variance32,33. Of late, some reports also suggest the use of DNA-methylation-based classification of central nervous system tumours34,35.

The need for integrative modelling is increasingly emphasized. In 2015, a report from Ritchie et al.57 highlighted that “approaches to combine multiple data types provide a more comprehensive understanding of complex genotype–phenotype associations than analysis of one dataset”. In recent years, there have been several attempts to develop multimodal approaches, to a great degree stimulated by community-driven competitions such as DREAM (http://dreamchallenges.org/) and Kaggle (https://www.kaggle.com/). However, more work is needed to integrate routinely collected data modalities into clinical decision systems.

Data fusion strategies for multimodal biomarker discovery

The age of precision medicine demands powerful computational techniques to handle high-dimensional multimodal patient data. Each data source has strengths and limitations in its creation, analysis and interpretation that must be addressed.

Medical images, whether two-dimensional in histopathology or three-dimensional in radiology, contain dense information encoded at multiple scales. Importantly, they exhibit high spatial correlation, which any successful approach needs to take into account58. So far, the best performing methods have been based on DL, specifically convolutional neural networks59,60,61. Continuous improvement in detection, segmentation, classification and spatial characterization means that these methods are becoming a crucial part of cancer biomarker algorithms.
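To make the role of spatial weight sharing explicit, the following minimal sketch (assuming PyTorch and toy input sizes) defines a small three-dimensional convolutional classifier of the kind used on radiology volumes; it is illustrative only and far smaller than the architectures cited above.

```python
# Toy 3D CNN for a radiology volume; convolutions exploit local spatial correlation
# by sharing weights over neighbouring voxels. Sizes are illustrative assumptions.
import torch
import torch.nn as nn

class Small3DCNN(nn.Module):
    def __init__(self, n_classes: int = 2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool3d(2),
            nn.Conv3d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool3d(2),
            nn.AdaptiveAvgPool3d(1),                      # global pooling over the volume
        )
        self.classifier = nn.Linear(32, n_classes)

    def forward(self, x):                                 # x: (batch, 1, depth, height, width)
        return self.classifier(self.features(x).flatten(1))

logits = Small3DCNN()(torch.randn(2, 1, 64, 64, 64))      # e.g. two 64^3 CT patches
```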

EHRs comprise various data types, ranging from structured data such as medications, diagnosis codes, vital signs and lab tests, to unstructured data in the form of clinical notes, patient emails and detailed clinical processes. Natural language processing (NLP) algorithms that can extract useful clinical information from structured and unstructured EHR data are being developed. A recent study showed the feasibility and power of such ML tools in a lung cancer cohort, reliably extracting important prognostic factors embedded in the EHRs62. Structured EHR sources are the easiest to process: usually, these data are embedded into a lower-dimensional vector space and fed as input to a recurrent neural network (RNN). Long short-term memory (LSTM) and gated recurrent unit (GRU) networks are the most popular RNN architectures for this purpose63,64,65. While structured EHR data have obvious value, integration with insights from unstructured clinical data has been shown to greatly improve clinical phenotyping66. Fortunately, advances in NLP now make it possible to mine the unstructured narratives of patient records. One way to process these data is to convert free text to medical concepts and create lower-dimensional ‘concept embeddings’. Older methods such as Word2Vec67 and global vectors for word representation (GloVe)68 have largely been overtaken by ‘contextualized embeddings’ such as embeddings from language models (ELMo)69 and bidirectional encoder representations from transformers (BERT)70,71,72. While ELMo uses RNNs, BERT is based on transformers, a neural architecture that has revolutionized the NLP field since its inception73. To unlock the full potential of EHRs, more appropriate techniques are needed that combine structured and unstructured information while accounting for the noise and inaccuracies common to these data74. In this regard, transfer learning for extracting clinical information from EHRs has gained considerable traction75.
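The structured-EHR pipeline described above can be sketched as follows, under the assumption of an integer-coded visit sequence per patient; the vocabulary size, embedding dimension and choice of a GRU are illustrative and not taken from any cited study.

```python
# Hedged sketch: visit-level code sequences are embedded into a low-dimensional
# space and summarized by a GRU into one vector per patient.
import torch
import torch.nn as nn

class EHRSequenceEncoder(nn.Module):
    def __init__(self, n_codes: int = 5000, emb_dim: int = 64, hidden_dim: int = 128):
        super().__init__()
        self.embed = nn.Embedding(n_codes, emb_dim, padding_idx=0)  # 0 reserved for padding
        self.rnn = nn.GRU(emb_dim, hidden_dim, batch_first=True)

    def forward(self, code_seq):          # code_seq: (batch, max_codes) of integer codes
        _, h_n = self.rnn(self.embed(code_seq))
        return h_n.squeeze(0)             # (batch, hidden_dim) patient representation

# Toy usage: two patients, each a padded sequence of eight diagnosis/procedure codes.
patients = torch.randint(1, 5000, (2, 8))
patient_vectors = EHRSequenceEncoder()(patients)
```

The resulting patient vectors are the kind of representation that can later be fused with embeddings from other modalities.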

Effective fusion methods must integrate high-dimensional multimodal biomedical data, ranging from quantitative features to images and text76. Representing raw data in a workable format remains challenging, as ML methods do not readily accept unvectorized data. A multimodal representation thus poses many difficulties: different modalities measure distinct, unmatched features with different underlying distributions and dimensionalities, and not all modalities and observations have the same level of confidence, noise or information quality77. Multimodal fusion also often has to contend with wide feature matrices, in which few samples carry many features across modalities. Advanced feature extraction methods such as kernel-based methods, graphical models or neural networks are therefore often needed before, or as part of, the data fusion process to reduce dimensionality while preserving most of the salient biological signal77,78,79,80. Meaningful feature descriptions are the critical backbone of any model.
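As one example of such dimensionality reduction, the sketch below compresses a wide expression matrix with a simple autoencoder before fusion; the gene count, layer sizes and reconstruction loss are assumptions chosen purely for illustration.

```python
# Illustrative autoencoder that maps a wide gene-expression profile to a compact
# embedding (z) that can later be used for data fusion.
import torch
import torch.nn as nn

class ExpressionAutoencoder(nn.Module):
    def __init__(self, n_genes: int = 20000, latent_dim: int = 64):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_genes, 512), nn.ReLU(),
                                     nn.Linear(512, latent_dim))
        self.decoder = nn.Sequential(nn.Linear(latent_dim, 512), nn.ReLU(),
                                     nn.Linear(512, n_genes))

    def forward(self, x):
        z = self.encoder(x)                    # low-dimensional representation used for fusion
        return self.decoder(z), z

model = ExpressionAutoencoder()
x = torch.randn(16, 20000)                     # toy batch of expression profiles
x_hat, z = model(x)
loss = nn.functional.mse_loss(x_hat, x)        # reconstruction objective
```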

A major design decision is the modelling stage at which data fusion takes place: (1) early, (2) intermediate or (3) late (Fig. 2)81,82,83. Early fusion concatenates the feature vectors of different data modalities and requires the training of only a single model (Fig. 2c). In contrast, late fusion develops a model for each data modality separately and integrates their individual predictions through averaging, weighting or other mechanisms (Fig. 2e). Late fusion not only allows the use of a different, often more suitable, model for each modality but also makes it more straightforward to handle situations in which some modalities are missing. However, fusion at this late stage ignores possible synergies between modalities84.
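The contrast between these two model-agnostic strategies can be made concrete with a toy sketch: early fusion concatenates per-modality features into a single model, whereas late fusion averages the predictions of separate per-modality models. The data, model choice and averaging rule below are placeholders, not a recommended pipeline.

```python
# Schematic early versus late fusion on two synthetic modalities.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X_omics, X_image = rng.normal(size=(100, 50)), rng.normal(size=(100, 30))
y = rng.integers(0, 2, size=100)

# Early fusion: concatenate per-modality features, train one model.
early_model = LogisticRegression(max_iter=1000).fit(np.hstack([X_omics, X_image]), y)

# Late fusion: train one model per modality, then average predicted probabilities.
m_omics = LogisticRegression(max_iter=1000).fit(X_omics, y)
m_image = LogisticRegression(max_iter=1000).fit(X_image, y)
late_probs = (m_omics.predict_proba(X_omics) + m_image.predict_proba(X_image)) / 2
```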

Fig. 2: Overview of different fusion strategies for multimodal data.

a, Raw data are processed into workable formats. b, For each modality, features are extracted using dedicated encoder algorithms. c, Early fusion. d, Intermediate fusion. e, Late fusion. Icon credits: a, DNA icon, the Noun Project (https://thenounproject.com/); tissue and brain slices, created using TCGA data originally published by the National Cancer Institute; b, encoder, the Noun Project (https://thenounproject.com/); c, model icon, the Noun Project (https://thenounproject.com/).

While both early and late fusion approaches are model agnostic, they are not specifically designed to cope with, or take full advantage of, multiple modalities. Anything between early and late fusion is defined as intermediate or joint data fusion84. Intermediate fusion neither merges the input data nor develops separate models for each modality; instead, it involves inference algorithms that generate a joint multimodal low-level feature representation retaining the signal and properties of each individual modality (Fig. 2d). Although dedicated inference algorithms must be developed for each model type, this approach attempts to exploit the advantages of both early and late fusion79,83. One key difference from early fusion is that the loss is propagated back to the inference algorithms during training, creating updated feature representations at each training iteration84. Although this allows complex interactions between modalities to be modelled, techniques need to be in place to prevent overfitting on the training cohort. Importantly, there is currently no decisive evidence that one fusion strategy is superior, and the choice of a specific approach is usually based empirically on the available data and task84.
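A minimal sketch of intermediate fusion is given below: modality-specific encoders feed a shared prediction head and are trained end to end, so the loss updates the per-modality representations at every step. The two modalities and all dimensions are illustrative assumptions.

```python
# Hedged sketch of intermediate (joint) fusion trained end to end.
import torch
import torch.nn as nn

class IntermediateFusionModel(nn.Module):
    def __init__(self, omics_dim=20000, ehr_dim=128, latent=64, n_classes=2):
        super().__init__()
        self.omics_enc = nn.Sequential(nn.Linear(omics_dim, 256), nn.ReLU(),
                                       nn.Linear(256, latent))
        self.ehr_enc = nn.Sequential(nn.Linear(ehr_dim, latent), nn.ReLU())
        self.head = nn.Linear(2 * latent, n_classes)   # joint representation -> prediction

    def forward(self, omics, ehr):
        joint = torch.cat([self.omics_enc(omics), self.ehr_enc(ehr)], dim=1)
        return self.head(joint)

model = IntermediateFusionModel()
logits = model(torch.randn(8, 20000), torch.randn(8, 128))
loss = nn.functional.cross_entropy(logits, torch.randint(0, 2, (8,)))
loss.backward()                                        # gradients flow into both encoders
```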

Advances in multimodal biomarkers for patient stratification

Multi-omics data fusion

Although a single omics technology provides insights into the profile of a tumour, one technique alone does not fully capture the underlying biology. The increasing collection of large multi-omics cancer cohorts has spurred several efforts to fuse multi-omics data to fully grasp the tumour profile, and several models for survival and risk prediction have been proposed4,6,56,85,86,87,88,89,90,91,92,93. The TCGA research network has also published numerous papers investigating the integration of genomic, transcriptomic, epigenomic and proteomic data for multiple cancer types94,95,96. Additionally, for therapy response and drug combination predictions, multi-omics ML methods have proved their value over traditional unimodal models97,98,99,100. Although various multi-omics fusion strategies now exist, no single method will be optimal for all research questions and data types, and adding more omics layers can sometimes even negatively impact performance101. Each strategy has its own strengths and weaknesses, and effective approaches should be selected carefully on the basis of the purpose and available data types57.

Multiscale data fusion

Efforts similar to those in multi-omics data fusion have been explored for multiscale data89,102,103,104,105,106,107. For example, Cheerla and Gevaert48 used an intermediate fusion strategy to integrate histopathology, clinical and expression data to predict patient survival across multiple cancer types. For each modality, an unsupervised encoder compressed the data into a single feature vector per patient; these vectors were then aggregated into a joint representation that tolerates the absence of one or more modalities48. Similarly, another study proposed a late fusion strategy to classify lung cancer: using RNA sequencing, microRNA sequencing, WSI, copy number variation and DNA methylation, it achieved better performance than any individual modality108. A few examples show the potential of radiology to further refine patient stratification109,110,111. However, owing to its high dimensionality and computational demands, most studies have so far avoided including it112.
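The idea of a joint representation that tolerates missing modalities can be sketched, in a loose and simplified form that is not a reimplementation of ref. 48, by averaging per-modality embeddings over only those modalities present for each patient.

```python
# Simplified masked aggregation of per-modality embeddings into one patient vector.
import torch

def aggregate_embeddings(embeddings: torch.Tensor, present: torch.Tensor) -> torch.Tensor:
    """embeddings: (batch, n_modalities, dim); present: (batch, n_modalities) 0/1 mask."""
    mask = present.unsqueeze(-1).float()            # broadcast mask over the feature dimension
    summed = (embeddings * mask).sum(dim=1)
    counts = mask.sum(dim=1).clamp(min=1.0)         # avoid division by zero
    return summed / counts                          # joint patient representation

emb = torch.randn(4, 3, 64)                         # 4 patients, 3 modalities, 64-d embeddings
present = torch.tensor([[1, 1, 1], [1, 0, 1], [0, 1, 0], [1, 1, 0]])
joint = aggregate_embeddings(emb, present)          # shape: (4, 64)
```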

Imaging genomics and radiogenomics

Where possible, molecular tumour information is now used in cancer prognosis and treatment decisions. Interestingly, multiple studies have shown that phenotypes derived from medical images can act as proxies or biomarkers of molecular phenotypes, such as epidermal growth factor receptor (EGFR) mutations in lung cancer113,114,115. This discovery gave rise to the emerging field of ‘radiogenomics’, the study of directly linking image features to underlying molecular properties116. For example, Itakura et al.117 used MRI phenotypes to define subtypes of glioblastoma associated with molecular pathway activity. The value of radiogenomics for risk prediction and better subtype stratification has also been shown in breast cancer118,119,120.

Current challenges and future directions for multimodal data fusion

Use of multimodal data models is probably the only way to advance precision oncology, but many challenges must be overcome to realize their full potential. Although data availability is the main driver of multimodal data fusion, it also poses the major barrier. DL requires large amounts of data, and both data sparsity and scarcity present serious challenges, especially for biomedical data. In clinical practice, different types of data are often missing between patients, as not all patients have all modalities, owing to cost, insurance coverage, material availability and lack of systematic collection procedures, among other factors. To become relevant in an oncology setting, methods need to be able to handle different patterns of missing modalities. Fortunately, various interpolation, imputation and matrix completion algorithms have already been applied successfully to clinical data. These range from basic methods, including mean/median substitution, regression, k-nearest neighbours and tree-based methods, to more advanced algorithms such as multiple imputation, multivariate imputation by chained equations, or neural networks such as RNNs, LSTMs and generative adversarial networks121,122,123. Also, with the recent successes in DL techniques, dedicated fusion approaches are becoming available that allow joint representations to handle incomplete or missing modalities48,124,125,126,127,128,129.
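For tabular clinical features, the basic imputation routes mentioned above are readily available in standard libraries; the toy example below shows mean substitution and k-nearest-neighbour imputation with scikit-learn, with the caveat that modality-level missingness usually calls for the dedicated approaches cited above.

```python
# Sketch of two standard imputation routes for tabular clinical features (toy data).
import numpy as np
from sklearn.impute import KNNImputer, SimpleImputer

X = np.array([[63.0, 1.2, np.nan],
              [71.0, np.nan, 0.8],
              [np.nan, 0.9, 1.1],
              [58.0, 1.4, 0.7]])

X_mean = SimpleImputer(strategy="mean").fit_transform(X)  # mean substitution
X_knn = KNNImputer(n_neighbors=2).fit_transform(X)        # k-nearest-neighbour imputation
```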

However, two major hurdles stand in the way of these efforts. First, there is an imbalance between data depth and cohort size: many observables per patient are routinely generated and stored, but typical patient cohorts are relatively small. Emerging evidence highlights that these cohorts are often biased, representing patients of higher socioeconomic status with continuous access to care and high levels of patient engagement130,131. Limiting analyses to patients with complete data will lead to model overfitting, bias and poor generalization. Second, large ‘gold-standard’ labelled cohorts with matched multimodal data are lacking, mainly because of the intensive labour required to annotate cancer datasets, combined with privacy concerns. Fortunately, DL algorithms are also starting to be developed to address this. One popular approach is data augmentation132,133,134,135, which can include basic data transformations as well as generation of synthetic data, but other strategies such as semi-supervised learning136,137,138,139, active learning140,141, transfer learning139,142,143,144 and automated annotation145,146 have proved to be promising avenues to overcome labelled-data scarcity.
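As a small example of the data augmentation strategy listed above, the sketch below generates several stochastic views of a single tile with torchvision transforms; the file name and transform parameters are illustrative assumptions.

```python
# Basic image augmentation: several stochastic views of one histopathology tile.
import torchvision.transforms as T
from PIL import Image

augment = T.Compose([
    T.RandomHorizontalFlip(p=0.5),
    T.RandomVerticalFlip(p=0.5),
    T.RandomRotation(degrees=15),
    T.ColorJitter(brightness=0.1, contrast=0.1),   # mimics staining/acquisition variation
    T.ToTensor(),
])

tile = Image.open("tile_0001.png").convert("RGB")   # hypothetical tile
augmented_views = [augment(tile) for _ in range(4)] # four randomized views of the same tile
```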

Despite its potential, a critical roadblock to the widespread adoption of DL in a clinical setting is the lack of well-defined methods for model interpretation. While DL can extract predictive features from complex data, these are usually abstract, and it is not always apparent whether they are clinically relevant147. To be useful in clinical decision-making, models need to undergo extensive testing, be interpretable, and have their predictions accompanied by confidence or uncertainty measures148,149. Only then will they be relevant for, and adopted by, clinical practitioners.

Interpretation of black-box models is a heavily investigated topic, and some methods for post hoc explanations have been proposed147,150. In histopathology, most work focuses on extracting the most informative tiles by selecting those with the highest model confidence or by visualizing the tiles most relevant to the final prediction (Fig. 3a). To interpret model predictions at higher resolution, the most relevant regions can be highlighted using gradient-based interpretation methods such as gradient-weighted class activation mapping (Grad-CAM) (Fig. 3b)151. Similarly, for molecular data, predictive features can be determined and visualized via Shapley additive explanation (SHAP)-based methods (Fig. 3d,e)150,152,153,154. Multimodal data add further complexity, and appropriate methods need careful evaluation before being scaled to multimodal interpretability. However, multimodal approaches are starting to emerge, with encouraging solutions not only for interpretability but also for the discovery of associations between modalities147,150. Note that the aforementioned methods specify why a model makes a specific decision but do not explain the features used. Additional strategies could be leveraged to further unravel biological insights; for example, selected tiles could be overlaid with Hover-Net155 to segment and classify nuclei and evaluate predominant cell types (Fig. 3c, unpublished results on TCGA data).
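To illustrate the gradient-based interpretation described above, the following self-contained sketch computes a Grad-CAM-style relevance map for a toy CNN and a random input; the model, target layer and input are placeholders, and in practice a tested implementation would be applied to the trained model of interest.

```python
# Hedged Grad-CAM sketch: channel importances from gradients, weighted activation map.
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy classifier; the second convolution is the layer whose activations we explain.
model = nn.Sequential(
    nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
    nn.Conv2d(8, 16, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 2),
)
target_layer = model[2]

acts, grads = {}, {}
target_layer.register_forward_hook(lambda m, i, o: acts.update(a=o))
target_layer.register_full_backward_hook(lambda m, gi, go: grads.update(g=go[0]))

x = torch.randn(1, 3, 224, 224)                       # stand-in for a histopathology tile
logits = model(x)
logits[0, logits.argmax()].backward()                 # gradient of the top-scoring class

weights = grads["g"].mean(dim=(2, 3), keepdim=True)   # channel-wise importance weights
cam = F.relu((weights * acts["a"]).sum(dim=1))        # coarse relevance map, shape (1, H, W)
cam = F.interpolate(cam.unsqueeze(1), size=x.shape[-2:],
                    mode="bilinear", align_corners=False)
```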

Fig. 3: Examples of model interpretability methods for histopathology and gene expression.

a–c, Histopathology. a, Examples of informative tiles for predicting the presence of TP53 mutations from histopathology images in prostate cancer (unpublished results on TCGA data). b, Visualization of regions within tiles most relevant to the prediction, derived via Grad-CAM151. c, Individual cells within informative tiles are segmented and classified by Hover-Net155. For a fine-grained interpretation of relevant cells (black annotations), pertinent cells within the tile are encircled by calculating the contours from regions highlighted by Grad-CAM. d,e, Gene expression. d, Examples of SHAP visualization152 of hypothetical gene importance according to a unimodal model (top) and a joint multimodal model (bottom) for cancer survival prediction. e, Example of pathway importance visualization based on the respective gene SHAP values in unimodal (top) versus joint multimodal (bottom) models with respect to cancer survival prediction154. SeMet, selenomethionine; Sec, selenocysteine; MeSec, methylselenol; H2Se, hydrogen selenide; GLI, glioma-associated oncogene family zinc finger 1; HH, hedgehog; TCF, T cell factor; LEF, lymphoid enhancer factor; CTNNB1, catenin beta 1; Ras, rat sarcoma; PI3K, phosphatidylinositol 3-kinase; RUNXC3, runt-related transcription factor 3; Wnt, wingless/integrated; TYSND1, trypsin-like peroxisomal matrix peptidase 1; DAG, diacylglycerol; EIF2AK1, eukaryotic translation initiation factor 2 alpha kinase 1; HRI, heme-regulated inhibitor; NGF, nerve growth factor; TRKA, tropomyosin receptor kinase A; SLBP, stem-loop binding protein; APEX1, apurinic/apyrimidinic endodeoxyribonuclease 1; NTRK2/3, neurotrophic receptor tyrosine kinase 2/3; BCL2L11, B cell lymphoma 2-like 11; BIM, B cell lymphoma 2 interacting mediator of cell death; SHC, src homology 2 domain containing transforming protein; IGF1R, insulin-like growth factor 1 receptor; RAC1, ras-related C3 botulinum toxin substrate 1; CDKN1a, cyclin-dependent kinase inhibitor 1A; STAT5, signal transducer and activator of transcription 5; PTK6, protein tyrosine kinase 6; TNFR1, tumor necrosis factor receptor 1; PP2a, protein phosphatase 2A; GRB7, growth factor receptor bound protein 7; ERBB2, v-erb-b2 avian erythroblastic leukemia viral oncogene homolog 2; ERK, extracellular signal-regulated kinase.

Standardization will lead to more uniform and complete datasets, which are easier to process and fuse with other sources and much more interpretable on their own. TCGA is probably the best known and most used resource37, but many other initiatives are underway to structurally capture clinical, genomic, imaging and pathological data for oncology, such as The Cancer Imaging Archive36 and the Genomics Pathology Imaging Collection38. Together, these efforts share the aim of processing, analysing and sharing data using a community-embraced standard in a FAIR (findable, accessible, interoperable and reusable) way156. This will not only promote reproducibility and transparency but also encourage reutilization and optimization of existing work. However, the volume and complexity of multimodal biomedical data make it increasingly difficult to produce and share FAIR data, and current solutions often require specific expertise and resources157. Furthermore, some modalities such as EHRs are not only extremely difficult to standardize and share but also very expensive for researchers to obtain158,159. Efforts such as the Observational Medical Outcomes Partnership (OMOP) aim to tackle this issue by harmonizing EHR data across institutes and countries160,161. To make progress in multimodal studies, there is a dire need for data orchestration platforms157, as well as appropriate regulatory frameworks to preserve patients’ privacy162.

The importance of biomedical multimodal data fusion becomes increasingly apparent as more clinical and experimental data become available. To tackle multimodal-specific obstacles, multiple methods and frameworks have been proposed and are being heavily explored. While often still problem specific and experimental, the field is accumulating the knowledge needed to evaluate and define which methods excel under specific conditions and data modalities. DL approaches have so far touched only a limited range of potential applications, mainly because of the challenges inherent to the current state of healthcare data discussed above, again emphasizing the need for large collaborative data standardization and sharing efforts. In this space, competitions such as DREAM and Kaggle have proved an effective means of making standardized multimodal data available. Importantly, these initiatives also facilitate the exchange of ideas and code, reproducibility, innovation and unbiased evaluation163,164. It is our expectation that such efforts will considerably advance the development of robust multimodal approaches.

Ultimately, the goal is to advance precision oncology through rigorous clinical validation of successful models in larger independent cohorts to prove clinical utility. So far, most efforts have focused on multimodal cancer biomarkers to refine risk stratification, but with dedicated strategies, multimodal data fusion could also assist in treatment decisions or drug-response prediction. However, outcome data for real-world patients often lag behind those from clinical trials, and the lack of follow-up data hinders the evaluation of efficacy. Fortunately, efforts are underway to capture treatment response in automated, scalable ways using NLP on clinical notes165. With careful study design, ongoing improvements in data collection and sharing methods, and the decreasing cost and increasing availability of disease-monitoring technologies, DL algorithms are a promising means to further accelerate the field of precision oncology in this direction.