Systematic identification of mutations and copy number alterations associated with cancer patient prognosis

Version of Record

Accepted for publication after peer review and revision.


Version of Record published: December 11, 2018
Accepted: November 12, 2018
Received: June 14, 2018



Abstract

Successful treatment decisions in cancer depend on the accurate assessment of patient risk. To improve our understanding of the molecular alterations that underlie deadly malignancies, we analyzed the genomic profiles of 17,879 tumors from patients with known outcomes. We find that mutations in almost all cancer driver genes contain remarkably little information on patient prognosis. However, copy number alterations (CNAs) in these same driver genes harbor significant prognostic power. Focal CNAs are associated with worse outcomes than broad alterations, and CNAs in many driver genes remain prognostic when controlling for stage, grade, TP53 status, and total aneuploidy. By performing a meta-analysis across independent patient cohorts, we identify robust prognostic biomarkers in specific cancer types, and we demonstrate that a subset of these alterations also confer specific therapeutic vulnerabilities. In total, our analysis establishes a comprehensive resource for cancer biomarker identification and underscores the importance of gene copy number profiling in assessing clinical risk.

https://doi.org/10.7554/eLife.39217.001

eLife digest

Cancers are not created equal: even when the disease affects the same organ, it can run different courses between individuals. For example, amongst people with early-stage bowel cancer who undergo surgery, 60% will go on to live cancer-free but the remaining patients will see the illness come back within a few years. These differences in outcome are still poorly understood, but they may find their roots in the genetic changes present in tumor cells.

Comparing the genomes of healthy and cancerous cells can help to understand which genetic modifications make a cell go ‘rogue’ and start to multiply uncontrollably. Often, this happens because of a mutation, a change in the letters that compose our genetic code. In addition, looking at genetic differences between cancerous cells from different patients, or different tumors, can shed light on how certain genetic changes make the disease deadlier or more likely to recur.

Smith and Sheltzer looked into the genomes of 17,879 tumors from patients whose clinical information was also available. The analysis revealed that specific genetic alterations were more common in either deadly or treatable cancers. Most of these changes were not mutations that affected a few DNA letters; instead, they were copy number alterations, whereby large portions of the genetic code are repeated or deleted. These results suggest that while mutations certainly drive the development of the disease, other changes such as copy number alterations can tell us which cancers will be deadlier. Through this approach, Smith and Sheltzer were also able to identify copy number alterations that were associated with patients responding well to certain drugs.

These findings now need to be confirmed on a different set of data. If they hold, new technologies may be developed so that the approach can be used cheaply and easily in the clinic. Ultimately, being able to examine copy number alterations in tumors may help physicians to tailor treatment for a particular cancer, or even a specific tumor.

https://doi.org/10.7554/eLife.39217.002

Introduction

Cancers that arise from the same tissue can exhibit vast differences in clinical behavior. For instance, among individuals diagnosed with early-stage colorectal cancer, about 60% of patients will be cured by surgery alone, while the remaining 40% will experience a recurrence that is frequently fatal (Mäkelä et al., 1995). Various pathological and molecular biomarkers are typically analyzed in order to assess patient risk and aid clinical decision-making. In general, these biomarkers are divided into two classes: predictive and prognostic (Nalejska et al., 2014). Predictive biomarkers identify patients who are likely to respond to specific therapies, like the EGFR mutations that sensitize lung tumors to EGFR inhibition (Paez et al., 2004). In contrast, prognostic biomarkers provide information on cancer aggressiveness and the likelihood of patient death. Tumor de-differentiation and lymph-node infiltration serve as prototypical prognostic biomarkers due to their strong association with poor outcomes (Connolly et al., 2003). Yet, these pathology-based biomarkers can suffer from low levels of inter-observer concordance (Allsbrook et al., 2001; Coons et al., 1997; Elmore et al., 2015; Gilks et al., 2013), and even perfect pathological assessment yields incomplete information on a patient’s most likely clinical course (Bijker et al., 2013; Nofech-Mozes et al., 2005; Young, 2003; Zaniboni et al., 2004). New methods to identify aggressive tumors could lead to improvements in the stratification of patient risk, better clinical management, and a decrease in dangerous and unnecessary over-treatment (Esserman et al., 2013).

Advances in high-throughput technologies have yielded unprecedented insight into the diverse array of genomic changes found within every cancer cell. Projects like The Cancer Genome Atlas (TCGA) and the International Cancer Genome Consortium (ICGC) have characterized methylation, mutation, copy number, and gene expression patterns across cancer types. As a result of these studies, many of the genomic differences between normal and transformed cells have been identified and characterized. However, we lack a similar understanding of the genomic differences between indolent tumors and aggressive malignancies. As the cost of DNA sequencing continues to drop, it has become increasingly feasible for hospitals to implement routine targeted and/or genome-wide analyses of patient tumors (Gagan and Van Allen, 2015; Sholl et al., 2016; Zehir et al., 2017). But, while several DNA-based, therapy-specific predictive biomarkers have been discovered, the prognostic information contained within tumor genomes is much less clear.

Previous genome-wide efforts to discover novel prognostic biomarkers have largely focused on the gene expression changes associated with patient mortality (Anaya, 2016; Anaya et al., 2015; Gentles et al., 2015; Uhlen et al., 2017). These studies have identified a set of transcripts that encode proteins involved in cell cycle progression that correlate with recurrence and death in several cancer types (Cuzick et al., 2011; Dancik and Theodorescu, 2015; Gentles et al., 2015; Mosley and Keri, 2008; Venet et al., 2011; Wang et al., 2012; Wistuba et al., 2013). Comparatively less is known about how changes at the DNA level affect patient survival. Outcome-associated analyses of genetic mutations have predominantly been conducted on a limited number of known oncogenes from single cancer types and have come to divergent conclusions. Reports in the literature commonly suggest that mutations in driver oncogenes are associated with poor outcomes, including, for instance, KRAS mutations in lung cancer (Guan et al., 2013; Marabese et al., 2015; Sun et al., 2013), PIK3CA mutations in breast cancer (Li et al., 2006; Oshiro et al., 2015), and BRAF mutations in colorectal cancer (Richman et al., 2009; Roth et al., 2010; Tol et al., 2009). Other studies of the same genes in the same cancer types have failed to observe any significant associations with outcome (Bozhanov et al., 2010; Gonzalez-Angulo et al., 2009; Hutchins et al., 2011; Pang et al., 2014; Scoccianti et al., 2012). In general, mutation-based biomarker studies may be confounded by small sample sizes, post-hoc hypothesis testing, imprecise clinical endpoints, and the so-called ‘file drawer’ problem, in which negative findings are less likely to be published (Aronson, 2005; Ensor, 2014; Goossens et al., 2015; Rosenthal, 1979; Scargle, 1999). The prognostic information captured by sequencing driver oncogenes remains unknown, and a pan-cancer, exome-wide analysis of outcome-associated mutations has not been conducted.

Previous investigations into the prognostic importance of DNA copy number alterations (CNAs) have indicated that highly-aneuploid tumors tend to have worse outcomes than diploid tumors (Friedlander et al., 1984; Kallioniemi et al., 1987; Kokal et al., 1986; Merkel and McGuire, 1990; Zimmerman et al., 1987). However, these analyses have largely focused either on arm-length changes (Davoli et al., 2017; Roy et al., 2016) or on alterations that affect single oncogenes or tumor suppressors (Deming et al., 2000; Shi et al., 2012; Srividya et al., 2011). The functional importance of copy number changes in most genes at the single-gene level is unknown, and a pan-cancer, gene-by-gene analysis of prognostic copy number alterations has not been conducted. In order to gain a global understanding of the genomic features in a primary tumor that influence cancer prognosis, we collected and analyzed molecular profiles from a ‘discovery’ set of 9442 patients and a ‘validation’ set of 8437 patients with solid tumors (17,879 patients in total). Our comprehensive, gene-centric analysis sheds light on the genomic changes that drive aggressive disease and will provide a useful resource for the development of strategies to improve clinical risk assessment. Additionally, we provide a web portal to facilitate community access to this rich biomarker dataset at http://survival.cshl.edu.

A cross-platform, pan-cancer analysis of cancer survival data

To determine the differences between benign and fatal tumors, we first analyzed multiple classes of genomic data from 9442 patients with 16 types of cancer from the TCGA (outlined in Figure 1—figure supplement 1A; abbreviations are defined in Figure 1—figure supplement 1B). For every tumor type and every dataset, we generated univariate Cox proportional hazards models linking the presence or expression of a particular feature with clinical outcome (described in Supplemental Text 1). We report the Z score for each model, which encodes both the directionality and significance of a particular association. A Z score > 1.96 corresponds to a P value < 0.05 and indicates that the presence of a mutation or copy number amplification is significantly associated with patient death (Figure 1—figure supplement 2A–C). In contrast, a Z score < −1.96 indicates that the presence of a mutation is significantly associated with survival, or that a gene deletion is significantly associated with patient death.
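The link between a Z score and its P value can be illustrated with a short sketch. It assumes the reported Z score is the Wald statistic of a Cox model (the log hazard ratio divided by its standard error), which is the usual convention in Cox regression output; the function names are illustrative and are not taken from the authors' pipeline.

```python
import math


def wald_z(beta: float, se: float) -> float:
    """Wald Z score: Cox coefficient (log hazard ratio) divided by its standard error.

    A positive value means that presence of the feature (or higher copy number)
    is associated with a higher hazard of death."""
    return beta / se


def two_sided_p(z: float) -> float:
    """Two-sided P value for a standard-normal Z statistic.

    P(|N(0, 1)| >= |z|) = erfc(|z| / sqrt(2))."""
    return math.erfc(abs(z) / math.sqrt(2.0))


# Example: |Z| = 1.96 sits right at the conventional P = 0.05 cutoff.
p_at_cutoff = two_sided_p(1.96)
```

Because the P value depends only on |Z|, the sign of Z is what carries the direction of the association (death vs. survival).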

We extracted mutation, copy number, gene expression, and clinical information from 16 TCGA cohorts (summarized in Supplementary file 1 and discussed in additional detail in Supplemental Text 2). To assess the validity of our data analysis pipeline, as well as the accuracy of the reported patient annotations, we first examined the overall survival curves for the 16 tumor types that we profiled. As expected, we observed significant differences in clinical outcome according to a cancer’s tissue-of-origin (Figure 1—figure supplement 2D). Prostate cancer had the least aggressive clinical course, with a median survival time that was not reached in this dataset (>4600 days), while pancreatic cancer conferred the worst prognosis (median survival time: 444 days). Overall, the 5 year survival frequencies of patients in the TCGA were highly similar to the national averages reported by NCI-SEER (R = 0.83, p < 0.0001), suggesting that the patients included in this analysis are broadly representative of the general population (Figure 1—figure supplement 2E). Next, we inferred patient sex on the basis of chromosome-specific gene expression patterns (Gentles et al., 2015; van den Berge and Sijen, 2017). Our analysis exhibited >99% concordance with patients’ self-reported sex, further verifying the overall accuracy of the clinical annotations and our data processing pipeline (Figure 1—figure supplement 2F).
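Median survival, and the case where it is "not reached", follows from the Kaplan-Meier product-limit estimator. The sketch below is a generic stdlib implementation for illustration, assuming each patient is described by a follow-up time and an event flag (1 = death, 0 = censored); it is not the authors' code.

```python
from typing import List, Optional, Sequence, Tuple


def kaplan_meier(times: Sequence[float], events: Sequence[int]) -> List[Tuple[float, float]]:
    """Product-limit survival estimate, returned as (time, survival) at each event time."""
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all patients whose follow-up ends at this time point.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            surv *= 1.0 - deaths / n_at_risk
            curve.append((t, surv))
        # Deaths and censored patients both leave the risk set.
        n_at_risk -= removed
    return curve


def median_survival(times: Sequence[float], events: Sequence[int]) -> Optional[float]:
    """Earliest time at which estimated survival drops to 0.5 or below.

    Returns None when the curve never reaches 0.5, i.e. the median is 'not reached'."""
    for t, s in kaplan_meier(times, events):
        if s <= 0.5:
            return t
    return None
```

When most patients are censored alive (as in the prostate cohort), the estimate never falls to 0.5 and the median is reported as not reached.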

Cancer mutations convey limited prognostic information

We first set out to discover whether coding mutations in cancer genomes were associated with patient outcome. We extracted non-silent mutations in each tumor, and then we identified all genes that were mutated in ≥2% of patients in each of the 16 cohorts (discussed in Supplemental Text 1). We next performed Cox proportional hazards analysis to compare the survival times for patients harboring mutant or wild-type copies of each gene. This analysis uncovered very few mutations that were significantly associated with patient outcome (Figure 1 and Supplementary file 2A-B). We first focused on known oncogenes and tumor suppressors, and found that among the 30 most-frequently mutated cancer driver genes, only two (EGFR and TP53) were associated with prognosis in more than two tumor types (Figure 1C). TP53 mutations were linked to outcome in five of 16 cancer types, though the differences in patient survival were generally small (Figure 1—figure supplement 3A–B). In contrast, many other cancer driver genes were not associated with survival time in any tumor type. While mutations in KRAS, PIK3CA, CDKN2A, BRAF, KMT2D, ATM, SMAD4, and many other genes were frequently observed, they were never significantly linked with patient outcome (Figure 1C).
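The per-gene comparison of mutant and wild-type patients can be sketched as a one-covariate Cox model fitted by Newton-Raphson on the log partial likelihood, with ties handled by the Breslow approximation. This is an illustrative stdlib version, not the authors' implementation; in practice a dedicated survival package would be used. It assumes the covariate varies across patients and is not perfectly separated by survival time, otherwise the coefficient diverges. The helper `frequently_mutated` is hypothetical and mirrors the ≥2% frequency filter.

```python
import math
from typing import Dict, List, Sequence, Set, Tuple


def frequently_mutated(mutated_patients: Dict[str, Set[str]], n_patients: int,
                       min_frac: float = 0.02) -> List[str]:
    """Genes mutated in at least `min_frac` of the patients in a cohort."""
    return [g for g, pts in mutated_patients.items() if len(pts) / n_patients >= min_frac]


def cox_univariate(times: Sequence[float], events: Sequence[int], x: Sequence[float],
                   iters: int = 50, tol: float = 1e-9) -> Tuple[float, float, float]:
    """Fit a one-covariate Cox model; returns (beta, standard error, Wald Z).

    x[i] is the covariate (e.g. 1 = mutant, 0 = wild-type); events[i] is 1 for death."""
    n = len(times)
    beta = 0.0
    info = 0.0
    for _ in range(iters):
        score = 0.0  # first derivative of the log partial likelihood
        info = 0.0   # observed information (negative second derivative)
        for i in range(n):
            if not events[i]:
                continue
            # Risk set: all patients still under observation at this event time.
            s0 = s1 = s2 = 0.0
            for j in range(n):
                if times[j] >= times[i]:
                    w = math.exp(beta * x[j])
                    s0 += w
                    s1 += w * x[j]
                    s2 += w * x[j] * x[j]
            mean = s1 / s0
            score += x[i] - mean
            info += s2 / s0 - mean * mean
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    se = 1.0 / math.sqrt(info)
    return beta, se, beta / se
```

A positive Z from this model means that mutant patients die sooner than wild-type patients in the same cohort.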

Figure 1 (with 7 supplements).


Single base-pair mutations convey limited prognostic information.

(A) Schematic of TP53 mutations and patient survival in the BLCA patient cohort. Red dots indicate missense mutations, blue dots indicate frameshift mutations, and purple dots indicate nonsense mutations. (B) Schematic of RB1 mutations and patient survival in the BLCA patient cohort. Red dots indicate missense mutations, blue dots indicate frameshift mutations, purple dots indicate nonsense mutations, and green dots indicate splice-site mutations. Note that while 17 patients harbor RB1 mutations, 19 mutations are displayed on the lollipop plot, as two patients harbor two mutations in the RB1 gene. (C) A heatmap of significant survival associations among the 30 most frequently mutated cancer driver genes in 16 tumor types from the TCGA. Z scores were calculated by comparing survival times between patients harboring wild-type and mutant copies of each gene that was mutated in ≥2% of samples in a given tumor type. For visualization purposes, only significant Z scores are displayed. The complete list of Z scores is presented in Supplementary file 2A. (D) The number of genes mutated in ≥2% of samples in each tumor type. (E) The number of genes significantly associated with patient outcome at a 5% false-discovery rate in each tumor type.

https://doi.org/10.7554/eLife.39217.003

We next considered the possibility that mutations in specific codons could have prognostic significance not captured when all mutations in a gene are pooled together. To test this, we identified the 30 most-frequently mutated amino acid positions in the TCGA cohorts, and then asked whether patients harboring these alterations had different outcomes than those who did not. IDH1^c132 mutations were significantly associated with a favorable prognosis in glioma, but other recurrently-mutated codons (KRAS^c12, PIK3CA^c1047, TP53^c273, etc.) were largely uninformative (Figure 1—figure supplement 4A–D). Then, we identified ‘hotspot’ residues that were mutated in at least five different patients across all cohorts. Considering only these ‘hotspot’ mutations in each gene also failed to uncover robust survival associations (Figure 1—figure supplement 4E). Finally, we identified cancer type-specific recurrent mutations, but these alterations (FGFR3^c249 in BLCA, CTNNB1^c37 in UCEC, etc.) were similarly uninformative (Figure 1—figure supplement 4F).
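Identifying ‘hotspot’ residues amounts to counting distinct patients per mutated position. A minimal sketch, assuming mutations are available as (patient, gene, codon) records; using a set per position ensures that a patient with two mutations at the same codon is counted once. The record layout and function name are illustrative assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def find_hotspots(mutations: Iterable[Tuple[str, str, int]],
                  min_patients: int = 5) -> Set[Tuple[str, int]]:
    """Return (gene, codon) positions mutated in at least `min_patients` different patients.

    mutations: (patient_id, gene, codon) records for non-silent mutations,
    pooled across all cohorts."""
    patients_at: Dict[Tuple[str, int], Set[str]] = defaultdict(set)
    for patient, gene, codon in mutations:
        patients_at[(gene, codon)].add(patient)  # each patient counted once per position
    return {pos for pos, pts in patients_at.items() if len(pts) >= min_patients}
```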

Next, we sought to test whether the use of targeted therapies had blunted the deleterious effects of certain driver mutations (e.g., in BRAF or EGFR). However, due to the time-frame of sample collection, very few patients were treated with BRAF or EGFR inhibitors, and removing those patients who had received these therapies failed to significantly affect Z scores (Figure 1—figure supplement 5A). Hyper-mutation within a subset of cancers could increase mutational ‘noise’ and decrease our ability to identify prognostic signatures, but excluding patients with hyper-mutated tumors had minimal effect on the prognostic significance of driver gene mutations (Figure 1—figure supplement 5B and Supplementary file 2C). We then asked whether the presence of mutations in multiple cancer driver genes might cooperate to confer a worse clinical outcome. We found that, in general, patients harboring mutations in two cancer driver genes that were not prognostic alone had the same risk of death as patients with wild-type copies of one or both genes (Figure 1—figure supplement 5C). Lastly, we considered the possibility that the clonality of a mutation might affect its prognostic significance. We calculated the variant allele frequency (VAF) for each cancer mutation and tested whether mutations present at clonal levels in single tumors were more likely to be associated with outcome. We found that restricting our analysis to mutations with high VAFs failed to identify more prognostic genes, indicating that patient stratification is unlikely to be improved by assessing only clonal mutations (Figure 1—figure supplement 6).
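The variant allele frequency of a mutation is the fraction of sequencing reads at that site carrying the variant allele. The sketch below computes it and filters for high-VAF mutations. The 0.2 cutoff is a hypothetical value chosen for illustration (the text does not state the threshold used), and this simple ratio does not correct for tumor purity or local copy number.

```python
from typing import Iterable, List, Tuple


def variant_allele_frequency(alt_reads: int, ref_reads: int) -> float:
    """Fraction of reads at a site that carry the variant allele."""
    depth = alt_reads + ref_reads
    if depth == 0:
        raise ValueError("no reads cover this position")
    return alt_reads / depth


def high_vaf_mutations(calls: Iterable[Tuple[str, int, int]],
                       min_vaf: float = 0.2) -> List[str]:
    """Genes whose mutation VAF is at least `min_vaf`.

    calls: (gene, alt_reads, ref_reads). The default cutoff is hypothetical."""
    return [gene for gene, alt, ref in calls
            if variant_allele_frequency(alt, ref) >= min_vaf]
```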

These analyses suggested that, in general, cancer driver gene mutations lacked significant patient stratification power. This led us to investigate whether mutations in genes other than recurrently mutated oncogenes and tumor suppressors could affect prognosis. We therefore expanded our analysis to include all genes mutated in ≥2% of patients with a particular tumor type. To account for the greatly expanded number of genes tested, we applied a Benjamini-Hochberg correction with a 5% false-discovery rate to the P values corresponding to the individual Z scores. We uncovered several genes that were linked with prognosis in glioma, but found very few genes significantly associated with death or survival in the other 15 cancer types (Figure 1D–E and Supplementary file 2A). For instance, in breast cancer and lung adenocarcinoma, 128 and 3996 genes were mutated in ≥2% of patients, respectively, but none of these mutations were significantly correlated with patient outcome at a 5% FDR. In total, these results indicate that most mutations in cancer genomes lack significant prognostic power.
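The Benjamini-Hochberg procedure can be sketched as follows. The function returns BH-adjusted P values (q values) in the input order, so a gene is called significant at a 5% FDR when its q value is at most 0.05. This is the standard step-up procedure, written here as an illustration rather than as the authors' code; P values would first be obtained from the Z scores, for example with `math.erfc(abs(z) / math.sqrt(2))`.

```python
from typing import List, Sequence


def benjamini_hochberg(pvals: Sequence[float]) -> List[float]:
    """BH-adjusted P values (q values), returned in the original order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices by ascending P
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down so that q values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Example: indices of features that pass a 5% FDR.
q = benjamini_hochberg([0.01, 0.04, 0.03, 0.2])
significant = [i for i, v in enumerate(q) if v <= 0.05]
```

With thousands of genes tested per cohort, as in lung adenocarcinoma, a nominal P < 0.05 is far from sufficient to survive this correction.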

Subtype-independent and subtype-dependent prognostic mutations in gliomas

In the analysis above, we noted that the five genes with the strongest survival associations were all observed in the GBMLGG (pan-glioma) cohort. As glioma appeared to be an exception to our overall finding that mutations are seldom prognostic, we investigated this cohort further. Among the top-scoring genes, we found that PTEN and EGFR mutations conferred a dismal prognosis, while mutations in IDH1, TP53, and ATRX were associated with a favorable prognosis (Figure 1—figure supplement 7A). Mutations in these genes have previously been linked to distinct glioma subtypes (Ceccarelli et al., 2016; Kannan et al., 2012; Suzuki et al., 2015), and we verified that mutations in IDH1, TP53, and ATRX were most frequently observed in low-grade gliomas, while mutations in PTEN and EGFR were most frequently observed in high-grade glioblastomas (Figure 1—figure supplement 7B). However, when we analyzed low-grade gliomas and glioblastomas separately, several of these alterations remained prognostic (Supplementary file 2D). For instance, while IDH1 mutations were more common in low-grade gliomas, they were occasionally observed in high-grade tumors as well, and they were independently associated with prolonged survival in both cohorts (Figure 1—figure supplement 7C). In contrast, when EGFR mutations were observed in low-grade gliomas, they were associated with poor outcomes, but EGFR mutations were non-prognostic in high-grade glioblastomas (Figure 1—figure supplement 7D). Thus, in gliomas, mutations contain both subtype-dependent and subtype-independent prognostic information. However, outside of this cancer type and the tumor suppressor TP53, mutations in most cancer driver genes are non-prognostic.

Driver gene CNAs are commonly associated with cancer patient mortality

As mutations were largely uninformative, we next set out to determine whether gene copy number conveyed prognostic information. We determined the copy number of each gene at its transcriptional start site and regressed this value against patient outcome in each tumor cohort. We then examined the clinical impact of CNAs affecting the same 30 cancer driver genes that we previously investigated. Surprisingly, we found that the copy number of these oncogenes and tumor suppressors was frequently linked with patient outcome (Figure 2 and Supplementary file 3A-B). Amplification of EGFR, PIK3CA, and BRAF, and deletion of CDKN2A, RB1 and EP300 were strongly associated with shorter patient survival times in four or more cancer types each. Copy number was prognostic even for genes in which mutations were not linked with outcome: for instance, while mutations in PIK3CA were never informative, the copy number of PIK3CA was associated with outcome in breast, colorectal, glioma, lung-squamous, pancreas, and prostate cancers (Figure 2B and D). Overall, among the 30 most frequently-mutated cancer driver genes, we detected 108 significant associations between gene copy number and outcome, compared to 23 associations between mutation and outcome. For 28 out of 30 driver genes, DNA copy number was prognostic in more cancer types than mutational status was. We conclude that determining the copy number of oncogenes and tumor suppressors in a primary tumor can better stratify patient risk than assessing single base-pair mutations.
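Assigning a copy number to each gene at its transcriptional start site can be sketched as a lookup into a tumor's segmented copy number profile. This sketch assumes sorted, non-overlapping segments on one chromosome, given as (start, end, value) with 1-based inclusive coordinates, and assumes the values are segment means on a log2 copy-ratio scale (the scale on which thresholds such as ±0.3 are typically applied). The data layout and function name are illustrative.

```python
import bisect
from typing import List, Optional, Tuple

# (start, end, segment mean); 1-based, inclusive coordinates on one chromosome.
Segment = Tuple[int, int, float]


def copy_number_at(segments: List[Segment], position: int) -> Optional[float]:
    """Copy number value of the segment covering `position` (e.g. a gene's TSS).

    `segments` must be sorted by start and non-overlapping. Returns None if the
    position falls in a gap between segments."""
    starts = [s[0] for s in segments]
    k = bisect.bisect_right(starts, position) - 1  # last segment starting at or before position
    if k >= 0 and segments[k][0] <= position <= segments[k][1]:
        return segments[k][2]
    return None
```

The resulting per-gene values can then be used as the continuous covariate in a univariate Cox model for each cohort.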

Figure 2 (with 4 supplements).


Oncogene and tumor suppressor CNAs drive cancer patient mortality.

(A) Examples of driver gene CNAs associated with patient outcome. The copy number of *CDKN2A*, *EGFR*, and *BRCA2* in the indicated patient cohorts is displayed, as well as Kaplan-Meier curves of patient survival according to gene copy number. Amplifications and deletions correspond to |CNA| > 0.3, while deep deletions and high-copy gains correspond to |CNA| > 1. (B) A heatmap of significant survival associations among the 30 most frequently mutated cancer driver genes in 16 tumor types from the TCGA. Z scores were calculated by regressing gene copy number against patient outcome within each tumor type. The complete list of Z scores is presented in Supplementary file 3A. (C) Z scores from 16 cancer types from the TCGA were combined using Stouffer’s method, and the resulting meta-Z scores were plotted against chromosomal location. For visualization, genes were grouped into bins of five and the average Z score of each bin was plotted. Gene names indicate candidate driver genes found within survival-associated peaks and valleys. (D) Kaplan-Meier curves are plotted for two oncogenes, *PIK3CA* (left) and *KRAS* (right), comparing the prognostic relevance of mutations versus copy number alterations in each gene. Amplifications correspond to CNA > 0.3, while high-copy gains correspond to CNA > 1.

https://doi.org/10.7554/eLife.39217.011

In our analysis thus far, we have treated mutations as a binary variable (‘mutant’ vs. ‘not mutant’), while copy number alterations are treated as continuous values. Thus, the greater prognostic significance of tumor CNAs could reflect the fact that individual CNA measurements inherently harbor more information. To test this possibility, we trichotomized CNA values into ‘deletions’ (<−0.3), ‘amplifications’ (>0.3), and ‘copy-neutral’ (≥−0.3 and ≤0.3). We then calculated Cox regressions at the same 30 loci using the discretized copy number values. This analysis resulted in 94 significant survival associations, more than four times as many significant features as when mutations were analyzed, and comparable to the number of significant features that resulted using continuous CNA values (Figure 2—figure supplement 1). This analysis suggests that the greater prognostic significance of CNAs is not simply a consequence of the continuous nature of copy number data.
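The trichotomization rule can be written as a small function using the ±0.3 thresholds stated above. How the three categories were encoded in the Cox models is not specified here; the ordinal −1/0/+1 coding below is an assumption made for illustration.

```python
from typing import List, Sequence


def trichotomize(cna: float, threshold: float = 0.3) -> int:
    """Map a continuous CNA value to -1 (deletion), 0 (copy-neutral) or +1 (amplification).

    Deletion: cna < -threshold; amplification: cna > threshold;
    copy-neutral otherwise (boundaries included)."""
    if cna < -threshold:
        return -1
    if cna > threshold:
        return 1
    return 0


def discretize(values: Sequence[float]) -> List[int]:
    """Apply the three-way split to a cohort's copy number values at one locus."""
    return [trichotomize(v) for v in values]
```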

We next investigated whether these oncogene and tumor suppressor CNAs were likely to drive patient mortality, or whether they were passenger genes that changed in copy number along with other, unknown drivers. To assess this question, we combined Z scores from different cancer types using Stouffer’s method (Stouffer, 1949), and then plotted the pan-cancer meta-Z scores along every chromosome (Figure 2C). This analysis revealed multiple sharp peaks and valleys in the data that overlapped with known driver genes. The most significant survival-associated copy number changes genome-wide were found on chromosome 9p in a valley that precisely included the tumor suppressor CDKN2A. Z score peaks were found at loci that include the oncogenes PIK3CA, EGFR, MYC, CCNE1, and others. This overlap suggests that, in many instances, the copy number of these oncogenes and tumor suppressors directly influences the risk of cancer patient death.
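Stouffer’s method combines k independent Z scores into a single meta-Z score, Z_meta = (Z_1 + … + Z_k) / √k. The sketch below shows the unweighted form (a weighted variant, for example by cohort size, also exists; which form was used is not stated here) together with a helper that averages consecutive values in bins of five, assuming genes are ordered by chromosomal position as in the Figure 2C plot. Function names are illustrative.

```python
import math
from typing import List, Sequence


def stouffer(z_scores: Sequence[float]) -> float:
    """Unweighted Stouffer combination: sum of Z scores divided by sqrt(k)."""
    return sum(z_scores) / math.sqrt(len(z_scores))


def bin_means(values: Sequence[float], size: int = 5) -> List[float]:
    """Average consecutive values (e.g. genes sorted by position) in bins of `size`.

    The final bin may hold fewer than `size` values."""
    out = []
    for i in range(0, len(values), size):
        chunk = values[i:i + size]
        out.append(sum(chunk) / len(chunk))
    return out
```

Because opposite-signed Z scores cancel, a locus only forms a strong peak or valley when its copy number has a consistent direction of effect across many cancer types.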

The prognostic significance of CNAs is independent of tumor sample purity and immune infiltration

Whole-chromosome aneuploidy has previously been linked to decreased infiltration of immune cells (Davoli et al., 2017; Taylor et al., 2018). We therefore considered the possibility that CNAs are prognostic via an indirect mechanism: namely, that they are found in tumors that lack robust immune infiltration, and that this deficient immune response itself drives patient mortality. However, multiple lines of evidence argue against this interpretation. First, we assessed the association between patient survival and three different measures of tumor sample purity: pathologist-assessed tumor cell fraction, sample purity as judged by ABSOLUTE (Carter et al., 2012; Taylor et al., 2018), and leukocyte infiltration, as judged by methylation analysis (Taylor et al., 2018). We found that sample purity was inconsistently associated with patient outcome (Figure 2—figure supplement 2). For instance, higher tumor purity determined by either pathological analysis or ABSOLUTE was associated with worse outcome in only one of the 16 cohorts (Figure 2—figure supplement 2A–B). The lack of a strong correlation between infiltrating cell populations and clinical prognosis suggests that analyte purity is insufficient to explain the relationship between CNAs and patient survival. Second, we generated bivariate Cox models that included gene copy number and each of these three measurements of tumor purity, and we found that driver gene CNAs remained broadly prognostic in these models (Figure 2—figure supplement 2C and Supplementary file 3C-E). For instance, we discovered that amplification of Cyclin E1 (CCNE1) is associated with poor prognosis in ovarian cancer, and this remained true even when our analysis was restricted to high-purity tumor samples and samples that lacked significant leukocyte presence (Figure 2—figure supplement 2D).
Thus, while the interrelationship between aneuploidy and immunological tolerance likely plays an important role in tumor development, this analysis suggests that it is not the primary driver of CNA-associated patient mortality.

Copy number analysis improves on patient stratification conferred by common clinical parameters

Pathological assessments of tumor stage and grade are important sources of prognostic information, though blinded assessments reveal significant inter-observer discordance (Allsbrook et al., 2001; Coons et al., 1997; Elmore et al., 2015; Gilks et al., 2013). We therefore tested whether the CNA biomarkers that we uncovered could add to the stratification conferred by these parameters. We found that Z scores from univariate models were highly correlated with those from multivariate models that included stage or grade (R = 0.91 and R = 0.96, respectively; Figure 2—figure supplement 3A and Supplementary file 4). Overall, 71% of prognostic CNAs in individual cancer types remained prognostic in these multivariate models (Figure 2—figure supplement 3B). Thus, including gene-level copy number assessment can significantly improve the stratification of patient risk beyond standard clinical parameters (Figure 2—figure supplement 3C–D). Certain mutations were similarly able to yield prognostic information in a stage- and grade-independent manner. However, due to the lower overall significance of most mutations that we identified, the improvements in patient stratification were generally more modest (Figure 2—figure supplement 3E and Supplementary file 4E).

Gene-level copy number values also remained prognostic when separating TCGA cohorts by cancer subtype (Figure 2—figure supplement 4 and Supplementary file 4F-G). For instance, CNA Z scores were highly correlated between the bulk GBMLGG cohort and the individual GBM and LGG subtypes (R = 0.68 and R = 0.86, respectively; Figure 2—figure supplement 4A). While analyzing the GBM cohort separately abolished the prognostic significance of EGFR mutations (Figure 2—figure supplement 4D), EGFR amplifications remained associated with outcome in both the LGG and GBM cohorts (Figure 2—figure supplement 4B–C). Amplifications in MYC and PIK3CA were similarly prognostic in multiple tumor subtypes (Figure 2—figure supplement 4D–E and Supplementary file 4G). At other loci, low patient numbers from certain subtypes may obscure the detection of specific biomarkers. For instance, within the KIPAN cohort, 67% of tumors are clear cell carcinomas, 23% are papillary cell carcinomas, and 10% are chromophobe carcinomas. CDKN2A deletion was a strong indicator of poor prognosis in the pan-kidney cohort, in clear cell carcinomas, and in papillary cell carcinomas, but did not reach statistical significance when kidney chromophobe carcinomas were analyzed independently (Figure 2—figure supplement 4F–G). In total, these results underscore the ability of driver gene CNAs to improve patient stratification when controlling for tumor identity, though larger cohorts may be needed to identify the strongest biomarkers in rare cancer subtypes.

Driver gene CNAs contain prognostic information not captured by TP53 mutation status or total aneuploidy

Highly-aneuploid tumors tend to harbor mutations in TP53, and both TP53 mutations and arm-length aneuploidy have previously been associated with poor clinical outcomes (Davoli et al., 2017; Petitjean et al., 2007). Using an ‘aneuploidy score’ for each tumor based on the total number of arm-length alterations (Taylor et al., 2018), we verified that TP53-mutant tumors exhibit more aneuploidy than TP53-wild-type tumors (Figure 3—figure supplement 1A), and that total aneuploidy is a poor prognostic factor in several cancer types (Figure 3—figure supplement 1C). To investigate the relationship between gene-level prognostic CNAs, TP53 status, and arm-length aneuploidy, we selected a set of 40 prognostic amplifications and deletions for additional analysis (Figure 3—figure supplement 2A). In multivariate models that included TP53 mutation status, 33 of 40 (83%) gene-level CNAs remained prognostic, demonstrating that the association of these CNAs with death is not simply an indirect consequence of TP53 status (Figure 3—figure supplement 2A–B). Similarly, in multivariate models that included total tumor aneuploidy, 80% of these CNAs were still associated with outcome (Figure 3—figure supplement 2C–D). Finally, as a proxy for the total structural alteration burden, we summed the number of breakpoints (as indicated by changes in discrete copy number values along a chromosome) in each tumor (Figure 3—figure supplement 1B). This metric was associated with outcome in multiple tumor types (Figure 3—figure supplement 1C), but 75% of driver gene CNAs remained prognostic in multivariate models that included this score (Figure 3—figure supplement 2E–F). These results indicate that assessing gene-level tumor CNAs can yield more prognostic information than simply screening for TP53 mutations or measuring bulk levels of tumor aneuploidy (Supplementary file 5).
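The breakpoint-count proxy can be sketched as counting the positions where the discrete copy number state changes between consecutive genes or segments on the same chromosome. This sketch assumes each tumor's profile is given as a mapping from chromosome name to an ordered list of discrete states (for example −1, 0, +1); changes are never counted across chromosome boundaries. Names and data layout are illustrative assumptions.

```python
from typing import Dict, Sequence


def count_breakpoints(discrete_cn: Sequence[int]) -> int:
    """Number of state changes between consecutive positions along one chromosome."""
    return sum(1 for a, b in zip(discrete_cn, discrete_cn[1:]) if a != b)


def tumor_breakpoints(profile: Dict[str, Sequence[int]]) -> int:
    """Total breakpoints in one tumor, summed chromosome by chromosome."""
    return sum(count_breakpoints(states) for states in profile.values())
```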

Focal CNAs typically portend worse prognosis than broad CNAs

We next asked whether focal and broad copy number alterations have distinct effects on patient outcome. To investigate this possibility, we compared the prognostic power of focal CNAs (defined as alterations ≤3 Mb in length; Krijgsman et al., 2014) and broad CNAs (defined as all alterations >3 Mb in length). Among loci at which both broad and focal alterations were observed, we frequently found that broad CNAs were associated with moderately worse outcomes, while focal CNAs were associated with sharp declines in survival (Figure 3A–B). At some loci, patients with broad CNAs had outcomes indistinguishable from those of patients with copy-neutral tumors, and only focal CNAs were associated with death (Figure 3C). We rarely detected instances in which broad CNAs indicated a worse prognosis than focal alterations (Figure 3A). We interpret these results as a reflection of aneuploidy-induced fitness penalties (Sheltzer et al., 2017; Sheltzer and Amon, 2011): large copy number alterations change the dosage of multiple genes at once and can impair tumor growth, while targeted alterations that specifically affect driver gene copy number maximize malignant potential.

Figure 3

Effects of amplicon size and gene mutation status on prognostic CNAs.

(A) 20 prognostic amplifications and 20 prognostic deletions were selected for further analysis (see also Figure 3—figure supplement 2). Of those 40, 14 had at least five patients with focal CNAs (≤3 Mb) and at least five patients with broad CNAs (>3 Mb). Univariate Cox proportional hazards models were constructed comparing the presence or absence of any CNA at the indicated locus, or the presence or absence of a CNA of a particular size. (B and C) Kaplan-Meier curves are plotted at four prognostic loci comparing tumors with focal CNAs (≤3 Mb), tumors with broad CNAs (>3 Mb), and tumors that lack CNAs at that locus. Amplifications and deletions correspond to |CNA| > 0.3. (D) Multivariate Cox proportional hazards models were constructed including both the copy number of the indicated gene and the mutational status of that gene. Z scores for either the univariate models (CNAs alone) or the multivariate models (CNAs + mutation status) are displayed. (E) Kaplan-Meier curves comparing gene mutation status and gene copy number for *EGFR* and *MLH1* alterations in colorectal cancer. *EGFR* amplification and *MLH1* deletion are associated with poor prognosis, regardless of whether the tumor harbors an *EGFR* or *MLH1* mutation. In the bottom graph, note that no tumors harbored both *MLH1* deletions and mutations.

https://doi.org/10.7554/eLife.39217.016

Focal CNAs affect patient outcome by changing the expression levels of wild-type genes

Gene copy number alterations typically result in a proportional change in the expression of the affected loci (Pollack et al., 2002; Stingele et al., 2012; Williams et al., 2008), though instances of dosage compensation have been reported (Gonçalves et al., 2017). To test the effects of prognostic CNAs on gene expression, we compared transcript levels and gene copy number at 40 prognostic loci and found a significant correlation between the two at 98% of the analyzed genes (Figure 3—figure supplement 3). Next, we asked whether these copy number alterations were deadly because they increased or decreased the expression of mutant gene products. If so, the amplification of a driver gene would be prognostic only in tumors in which that driver gene is also mutated. This was not the case: at 95% of our test loci, gene copy number remained prognostic in multivariate models that also included gene mutation status (Figure 3D). For instance, in colorectal cancer, amplification of EGFR was associated with death even in tumors that lacked EGFR mutations (Figure 3E). In total, these results indicate that even at recurrently mutated loci, changes in the expression of the wild-type gene can have a profound effect on cancer cell behavior. Together with our observation that focal changes tend to confer a worse prognosis than broad changes, these results support the recently proposed ‘cancer gene island’ model of tumor genome evolution (discussed in more detail below; Solimini et al., 2012).

Independent patient cohorts verify the prognostic significance of driver gene CNAs

To determine the generality of our findings, we collected independent patient cohorts with mutation or copy number data linked to survival outcome (Supplementary file 1). We then performed univariate Cox proportional hazards analysis on these ‘validation’ cohorts and compared the results to the Z scores obtained from our ‘discovery’ set of TCGA data. First, we identified prognostic mutations within a set of 16 patient cohorts from the International Cancer Genome Consortium (ICGC), comprising 3054 patients analyzed by whole-genome or whole-exome sequencing. Overall, the mutation frequencies and the Z scores of recurrent mutations were significantly correlated between the ICGC and TCGA cohorts (R = 0.67, p < 0.0001, and R = 0.56, p < 0.0001, respectively; Figure 4A–B). Consistent with our TCGA analysis, mutations in TP53 were associated with outcome in more patient cohorts than mutations in any other gene (Figure 4C and Figure 1—figure supplement 3C–D). Other mutations, including those in known cancer driver genes, were rarely associated with outcome in individual cancer types and had minimal pan-cancer significance (Figure 4D–F and Supplementary file 6). Mutations in KRAS, PIK3CA, BRAF, APC, PTEN, CDKN2A, and many other genes were frequently observed but were never correlated with outcome (Figure 4D). We next analyzed 2431 additional patients with CNA data curated by cBioportal and found numerous amplifications and deletions associated with patient mortality (Supplementary file 6C). In breast cancer, we found prognostic amplifications centered on oncogenes, including ERBB2, MYC, and MDM2, while prognostic deletions encompassed the tumor suppressors CDKN2A, PTEN, and TP53 (Figure 4G). Overall, we observed a highly significant correlation between the meta-Z scores obtained from the TCGA and cBioportal datasets (R = 0.42; Figure 4H).
Finally, in patient cohorts subjected to both mutation and copy number analysis, we verified that CNAs in driver genes commonly carried greater prognostic significance than mutations in those same genes (Figure 4I). For instance, in breast cancer, among 25 frequently mutated genes, mutations in only two genes (TP53 and GATA3) displayed prognostic significance, while CNAs in 12 of those same genes were associated with patient outcome (Figure 4J). In total, these analyses suggest that the survival patterns discovered in the TCGA dataset are conserved across independent cohorts of cancer patients. In particular, while mutations in most cancer driver genes are non-prognostic, copy number alterations in these same genes are tightly linked with patient outcome.

Figure 4

Driver gene copy number alterations, but not driver gene mutations, are associated with survival in independent patient cohorts.

(A) Genes mutated in ≥10% of patients in each tumor type in the TCGA were identified, and their mutation frequencies were compared to those in the corresponding ICGC cohort or cohorts. The complete list of Z scores is presented in Supplementary file 6A. (B) Z scores of the 10 most frequently mutated genes per cancer type in the ICGC were plotted against the Z scores of the same genes from the corresponding TCGA cohort or cohorts. (C) Significant Z scores (>1.96 or <−1.96) were counted per gene, and the number of significant cohorts from the TCGA and the ICGC is plotted. While the vast majority of frequently mutated genes are significant in zero or one cancer type, *TP53* mutation status is associated with prognosis in 12 of 32 total patient cohorts. (D) A heatmap of significant survival associations among the 30 most frequently mutated cancer driver genes in 16 patient cohorts from the ICGC. Z scores were calculated from Cox models comparing survival between patients harboring wild-type and mutant copies of a gene, for genes mutated in ≥2% of samples per tumor type. For visualization purposes, only significant Z scores are displayed. The complete list of Z scores is presented in Supplementary file 6A. (E) The number of genes mutated in ≥2% of samples per tumor type. (F) The number of genes significantly associated with patient outcome at a false-discovery threshold of 5% in each tumor type. (G) Z scores for the copy number of each gene from the TCGA BRCA cohort and the cBioportal METABRIC cohort are plotted against one another. The complete list of Z scores is presented in Supplementary file 6C. (H) Meta-Z scores from datasets curated by cBioportal are plotted against meta-Z scores from the corresponding four cancer types from TCGA (BLCA, BRCA, LIHC, and LUAD). The complete list of Z scores is presented in Supplementary file 6C.
(I) Kaplan-Meier curves comparing mutations and CNAs in *ERBB3* and *PTEN* in the cBioportal METABRIC cohort. (J) A bar graph of Z scores for mutations and CNAs in 25 driver genes in the cBioportal METABRIC cohort. While mutations in only two genes are associated with prognosis, CNAs in 12 of these same genes are associated with prognosis.

https://doi.org/10.7554/eLife.39217.020

Cross-cohort identification of high-confidence prognostic biomarkers

To discover the biomarkers with the greatest potential clinical relevance, we next identified the individual mutations and CNAs that were consistently associated with outcome across independent patient cohorts. To increase our ability to detect these genetic alterations, we performed survival analysis on an additional set of 2701 primary tumors subjected to targeted sequencing and copy number analysis (MSKCC_2017; Supplementary file 7) (Zehir et al., 2017), on 2431 patients from cBioportal cohorts whose tumors had been sequenced (Supplementary file 6D), and on 628 patients from ICGC cohorts subjected to copy number analysis (Supplementary file 6B). Our combined patient dataset therefore included two to six independent cohorts from each of 13 common cancer types, comprising 16,580 total patients. These cohorts were collected at different locations, in different patient populations, and using different study designs, and the samples were analyzed using different genomic technologies. We reasoned that alterations that were consistently associated with outcome despite these substantial differences would represent highly penetrant biomarkers of patient prognosis. To identify such alterations, we screened for biomarkers that were associated with outcome (|Z| > 1.96) in ≥2 independent cohorts, and that were highly significant (|meta-Z| > 3.3) across all available cohorts. This approach revealed multiple high-confidence genetic biomarkers of patient outcome that, to our knowledge, had not been previously reported, including MDM4 amplifications in prostate cancer, NOTCH2 amplifications in melanoma, and 2q32 deletions in ovarian cancer (Supplementary file 8). These robust biomarkers allowed a striking stratification of patient risk, and top-scoring CNAs remained prognostic in multivariate models that included commonly measured prognostic criteria (Gleason score in prostate cancer, hepatitis serology in liver cancer, etc.; Figure 5—figure supplement 1).
Consistent with our single-cohort analyses, cross-cohort prognostic CNAs were significantly more common than prognostic mutations, and TP53 was the only gene whose mutation status was associated with outcome in more than one cancer type (Figure 5—figure supplement 2A).

Certain prognostic biomarkers are also associated with unique therapeutic vulnerabilities

We hypothesized that some genetic alterations that were sufficient to affect overall patient survival could impact other facets of cancer behavior as well, including, potentially, drug sensitivity. That is, biomarkers harboring significant prognostic information could contain predictive information as well. We therefore asked whether genetic alterations that drove aggressive disease could also sensitize tumors to specific therapeutic regimens. By analyzing a cohort of 1000 patient-derived xenografts (PDXs) (Gao et al., 2015a), we identified several instances in which high-confidence biomarkers were associated with vulnerability to particular anti-cancer agents. For instance, we identified Chr9 deletions that encompassed CDKN2A as a robust biomarker for poor prognosis in breast cancer (Supplementary file 8). We found that PDXs harboring CDKN2A deletions were profoundly sensitive to combination therapy with a CDK4/6 inhibitor and an mTOR inhibitor (Figure 5—figure supplement 2B), consistent with the fact that a protein encoded by CDKN2A, p16, functions as a natural inhibitor of CDK4/6 (Serrano et al., 1993, p. 4). In contrast, other biomarkers associated with poor prognosis in breast cancer failed to predict sensitivity to this treatment combination, but instead correlated with sensitivity to other agents (Supplementary file 8). Because only a limited number of drugs were tested in PDXs, we expanded our search to include a recently described pharmacogenomic profile of cancer cell lines (Iorio et al., 2016) and discovered several additional biomarker vulnerabilities (Figure 5A–B). For instance, we identified mutations in STAG2 as a high-confidence biomarker of poor prognosis in glioma, and we found that STAG2-mutant glioma cell lines were exquisitely sensitive to treatment with the PARP inhibitor olaparib (Figure 5A).
In total, we identified highly significant therapeutic vulnerabilities for 49% of the prognostic biomarkers uncovered by our integrated analysis, providing potential strategies to treat a subset of patients who have the most aggressive cancers.

Figure 5

Robust prognostic biomarkers associated with drug sensitivity in cancer cell lines.

(A) Mutations and CNAs associated with patient outcome in multiple cohorts of glioma/glioblastoma are displayed. Mutations in *STAG2* are associated with sensitivity to the PARP inhibitor olaparib, while *CDKN2A* deletions are associated with sensitivity to the CDK4/6 inhibitor palbociclib in glioma cell lines (Iorio et al., 2016). (B) Mutations and CNAs associated with patient outcome in multiple cohorts of bladder cancer are displayed. Mutations in *RB1* are associated with sensitivity to the SYK inhibitor BAY-61-3606 in bladder cancer cell lines (Iorio et al., 2016). The complete list of high-confidence biomarkers and potential vulnerabilities is provided in Supplementary file 8.

https://doi.org/10.7554/eLife.39217.021

Discussion

Modern medicine has vastly prolonged the survival of individuals diagnosed with cancer (Johnson et al., 2017). However, increasing evidence suggests that large subsets of patients receive suboptimal care and are over-treated or under-treated relative to their level of risk (Bhatt and Klotz, 2016; Esserman et al., 2013; Swaminathan and Swaminathan, 2015). To date, many of the genetic alterations that distinguish fatal from benign tumors have remained obscure. Our analysis of prognostic biomarkers from 17,879 patients sheds light on these genetic differences, identifies a subset of patients who may benefit the most from aggressive intervention, and suggests therapeutic strategies for tumors harboring certain alterations associated with poor prognosis. A web portal to facilitate access to these results is available at http://survival.cshl.edu/.

As cancers arise due to the accumulation of mutations in growth-promoting oncogenes and growth-inhibitory tumor suppressors, the presence and diversity of these mutations might be expected to dictate a tumor’s clinical course. However, our data suggest that in many cases, they do not. Substantial disagreements exist in the literature on the value of mutation-based prognostic biomarkers, as the same driver oncogenes have been independently reported to be either adverse or non-significant prognostic features (Guan et al., 2013; Marabese et al., 2015; Scoccianti et al., 2012; Sun et al., 2013). In this manuscript, we performed an unbiased genome-wide analysis of public datasets with pre-established sample sizes. This approach may avoid certain problems, including post-hoc hypothesis testing, patient-selection bias, and the ‘file-drawer problem’, that can confound targeted biomarker studies (Aronson, 2005; Ensor, 2014; Goossens et al., 2015; Rosenthal, 1979; Scargle, 1999). We consider it possible that, with larger sample sizes or more specific tumor subtypes, additional prognostic mutations could be identified. Importantly, in most patient cohorts that we collected, tumors were analyzed on multiple genomic platforms, and CNAs were commonly prognostic in the same cohorts in which gene mutations were not. These results show that cohorts of these sizes are sufficient to detect prognostic biomarkers, and suggest that, in a head-to-head comparison, copy number alterations provide more useful prognostic information than single-gene mutations.

While we identified very few mutations associated with patient outcome, several lines of evidence underscore the potential benefits of continued clinical sequencing efforts. First, our analysis revealed a subset of mutations with tissue-specific prognostic power, including TP53 mutations in breast cancer, RB1 mutations in bladder cancer, and FBXW7 mutations in colorectal cancer. Second, most patients in the TCGA cohorts were treated with standard cytotoxic drugs. As targeted therapies and immunotherapies are increasingly adopted in the clinic, oncogenic mutations that were non-prognostic in the datasets analyzed here may be able to predict sensitivity to specific therapeutic agents (Gagan and Van Allen, 2015). Third, tumors themselves are composed of subclonal populations that harbor distinct sets of mutations, and recent evidence suggests that cancer heterogeneity can influence clinical course (Jamal-Hanjani et al., 2017). Thus, interrogating the mutational spectrum at the subclonal level may identify prognostic mutations not detected in bulk analyses.

Though large-scale changes in tumor ploidy have previously been recognized as an indicator of poor outcome (Friedlander et al., 1984; Kallioniemi et al., 1987; Kokal et al., 1986; Merkel and McGuire, 1990; Zimmerman et al., 1987), the contributions of copy number alterations in most single genes have remained unexplored. Despite the limited stratification value of mutations in cancer driver genes, we found that copy number alterations of many of these same genes are broadly prognostic. Focal CNAs tended to confer a worse prognosis than broad CNAs, consistent with a model in which large-scale gene dosage imbalances trigger proteotoxic stress and impose a fitness penalty on cancer cells (Santaguida and Amon, 2015; Sheltzer and Amon, 2011). Moreover, while prognostic CNAs commonly caused proportional changes in target gene expression, most CNAs remained prognostic whether or not they affected the expression of a mutated gene. These results support a ‘cancer gene island’ or ‘cumulative aneuploidy’ model of tumorigenesis, in which cancers accumulate a series of limited copy number changes affecting haplo-sensitive and triplo-sensitive regions (Davoli et al., 2013; Solimini et al., 2012). Identifying the functional consequences of these prognostic CNAs on tumor physiology is a key future goal.

Patients whose tumors harbor genetic alterations that drive mortality are in urgent need of improved treatment options. We discovered many instances in which high-confidence biomarkers of aggressive disease also sensitized tumors to specific anti-cancer therapies. By taking advantage of these vulnerabilities, a precision-medicine approach could be applied to both stratify patient risk and identify drug combinations most likely to provide a clinical benefit. Several predicted sensitivities from our work have clinical or mechanistic support, including the use of CDK4/6 inhibitors to treat CDKN2A-deleted tumors, the use of PARP inhibitors to treat STAG2-mutant tumors, and the use of SYK inhibitors to treat RB1-mutant tumors (Bailey et al., 2014; Gao et al., 2015b; Zhang et al., 2012). Treatment with targeted agents significantly alters the cellular epigenetic and genetic landscape, often culminating in the development of resistance to the applied therapies (Holohan et al., 2013). We speculate that secondary alterations that tumors evolve to tolerate these drugs could also alter or blunt the aggressive phenotype caused by the original driver alteration. In this way, targeting a biomarker that confers poor prognosis could both directly lead to improved patient outcomes by triggering a robust clinical response, and indirectly help patients by forcing tumor evolution away from dependence on a driver of aggressive disease.

Materials and methods

Data sources

Patient cohorts analyzed in this study are listed in Supplementary file 1. For the TCGA analysis, pre-processed files from the Broad Institute TCGA Firehose were used (https://gdac.broadinstitute.org/). For the TCGA genomic copy number analysis, we used the HG19 segmented SCNAs, corrected for germline SCNAs. Overall survival time was used as a clinical endpoint for all cancer types except PRAD. Overall survival was chosen because it reflects an objective and unambiguous event, it is the gold standard for oncology clinical trials, and it is widely available across different studies (Driscoll and Rixe, 2009). However, as fewer than 2% of the patients in the PRAD cohort died during the follow-up period, ‘days to biochemical recurrence’ was used as a surrogate endpoint. For all cancers, survival or follow-up time from diagnosis was corrected for the days to sample procurement. Primary tumors (indicated with a ‘01’ in the patient barcode) were used for every cancer type except SKCM; for this cancer, few primary samples were available, so metastatic samples (indicated with a ‘06’ in the barcode) were included for patients for whom no primary tumor was available. For additional discussion of the TCGA samples, see Supplemental Text 2. Pathology-assessed tumor cell fraction was obtained from the TCGA clinical files under ‘Percent_tumor_cells’. Tumor stage and grade were similarly obtained from the appropriate TCGA clinical files.

Mutation, copy number, and clinical data from Release 25 of the International Cancer Genome Consortium (ICGC) were downloaded from the ICGC Data Portal (Zhang et al., 2011). Overall survival was used as a clinical endpoint for all cohorts except EOPC-DE; because of the few deaths in this cohort, recurrence-free survival was used as the endpoint. Cohorts were chosen based on the availability of WGS or WES data and were included if they came from a cancer type comparable to those studied in our TCGA analysis.

Copy number, mutation, and clinical data from cBioportal were downloaded as pre-processed files from www.cbioportal.org (Gao et al., 2013). For the patients described in Zehir et al. (2017) (the cBioportal/MSKCC_2017 cohorts), only primary tumors were included for all cancer types except melanoma.

Overall analysis strategy

All processing and analyses were performed in Python. Cox proportional hazards analysis used the R survival package (https://cran.r-project.org/web/packages/survival/index.html) to compute Z scores and p values. Justification and further explanation for the use of Cox proportional hazards modeling can be found in Supplemental Text 1. The rpy2 project was used to control R from Python, allowing seamless integration of Z score calculations with data processing and pan-cancer analysis. Pandas DataFrames were used as the primary structure for storing and manipulating data. Additionally, native numpy methods and arrays were occasionally used for efficiently storing strictly numerical data, for example, as input to Cox proportional hazards models. The statsmodels package (www.statsmodels.org) was used for false discovery correction using the Benjamini-Hochberg procedure. Microsoft Excel was occasionally used for final data processing and examination, so a single apostrophe was added before gene names in intermediate data processing steps to protect them from Excel's automatic conversion into dates or numbers (Zeeberg et al., 2004).
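
The Benjamini-Hochberg step was delegated to statsmodels (`statsmodels.stats.multitest.multipletests` with `method="fdr_bh"`). For readers who want to see what that correction does, the following is a minimal numpy sketch of the same step-up procedure. It is an illustration, not the authors' code, and the p-values are made up.

```python
import numpy as np


def benjamini_hochberg(pvals):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)                        # indices of ascending p-values
    scaled = p[order] * m / np.arange(1, m + 1)  # p_(i) * m / i
    # Step-up: each adjusted value is the minimum over all equal or larger ranks.
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.minimum(scaled, 1.0)    # cap at 1
    return adjusted


# Hypothetical p-values from five gene-level Cox models.
pvals = [0.001, 0.008, 0.039, 0.041, 0.60]
qvals = benjamini_hochberg(pvals)
passes_fdr_5pct = qvals <= 0.05
```

In this example only the first two tests pass a 5% false-discovery threshold; 0.039 and 0.041 both fall below 0.05 before correction but not after it.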

Code was structured to allow easy internal reuse and reproducibility of results. Univariate Cox proportional hazards, multivariate Cox proportional hazards, Kaplan-Meier, and Stouffer's method analyses were factored into an analysis library that takes as input the data required to perform each computation as numpy arrays or pandas DataFrames.
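
The Z scores throughout this study come from Cox models fitted with R's survival package. As a rough illustration of what a univariate Z score is, the sketch below fits a one-covariate Cox model by Newton-Raphson in numpy and returns the Wald statistic (coefficient divided by its standard error), the quantity that `summary(coxph(...))` reports as `z`. It is not the authors' code: it uses Breslow's handling of tied event times, whereas `coxph` defaults to Efron's, so values can differ slightly when ties are present, and it does not guard against a constant covariate or perfect separation. The cohort below is hypothetical.

```python
import numpy as np


def _score_and_information(beta, event, x, at_risk):
    """Score and observed information of the Breslow partial log-likelihood."""
    w = np.exp(beta * x)
    s0 = at_risk @ w               # total risk weight in each patient's risk set
    s1 = at_risk @ (w * x)
    s2 = at_risk @ (w * x * x)
    mean_x = s1 / s0               # weighted covariate mean among those at risk
    score = np.sum(event * (x - mean_x))
    information = np.sum(event * (s2 / s0 - mean_x ** 2))
    return score, information


def cox_univariate_z(time, event, x, max_iter=50, tol=1e-10):
    """Wald Z (beta / SE) for a single covariate in a Cox model (Breslow ties)."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=float)  # 1 = death observed, 0 = censored
    x = np.asarray(x, dtype=float)
    # at_risk[i, j] = 1 when patient j is still at risk at patient i's time.
    at_risk = (time[None, :] >= time[:, None]).astype(float)
    beta = 0.0
    for _ in range(max_iter):
        score, information = _score_and_information(beta, event, x, at_risk)
        step = score / information  # Newton-Raphson update
        beta += step
        if abs(step) < tol:
            break
    _, information = _score_and_information(beta, event, x, at_risk)
    return beta * np.sqrt(information)  # SE = 1 / sqrt(information)


# Hypothetical cohort: 1 = gene altered; altered patients tend to die earlier.
time = [1, 2, 3, 4, 5, 6, 7, 8]
event = [1, 1, 1, 1, 1, 1, 1, 1]
altered = [1, 1, 1, 0, 1, 0, 0, 0]
z = cox_univariate_z(time, event, altered)
```

A positive Z indicates a higher hazard (worse survival) for altered tumors; as elsewhere in this study, |Z| > 1.96 corresponds to nominal significance at p < 0.05.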

TCGA analysis

In addition to the code for statistical analyses, code for processing TCGA clinical files was factored into a common library. This approach allowed the same TCGA clinical file processing code to be executed across a variety of platform analyses, ensuring identical behavior for each platform. The TCGA clinical processing code selected the relevant clinical endpoints and sample procurement data. The processing translated the available clinical data into the required format for Cox proportional hazard models: an endpoint/survival time value and a censor value for each patient. Code to select tumor samples based on cancer type was also included in this library.
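
The translation from a clinical record to the endpoint/survival time and censor value required by Cox models can be sketched as follows. The field names (`days_to_death`, `days_to_last_followup`, `days_to_sample_procurement`) are hypothetical placeholders rather than exact TCGA column names. Correcting for procurement by subtraction, treating a missing procurement date as day 0, and dropping records with negative times are our assumptions. The event field is a parameter so that a surrogate endpoint such as biochemical recurrence (used for PRAD) can be substituted.

```python
from typing import Optional, Tuple


def survival_endpoint(
    record: dict, event_field: str = "days_to_death"
) -> Optional[Tuple[float, int]]:
    """Return (time, event) for one patient, or None if the record is unusable.

    event = 1 means the endpoint was observed; 0 means the patient was censored
    at last follow-up. Times are shifted so they start at sample procurement.
    All field names are hypothetical placeholders.
    """
    procurement = record.get("days_to_sample_procurement") or 0  # assumption: missing -> 0
    event_day = record.get(event_field)
    followup_day = record.get("days_to_last_followup")
    if event_day is not None:
        time, event = event_day - procurement, 1
    elif followup_day is not None:
        time, event = followup_day - procurement, 0
    else:
        return None  # no usable survival information
    if time < 0:  # assumption: inconsistent dates are dropped
        return None
    return float(time), event
```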

Raw input data for the mutation analysis needed additional preprocessing before Cox proportional hazard models could be constructed. This preprocessing included removing per-patient headers throughout the data and some data transposition. For all analyses using TCGA mutation data, mutations annotated as silent were excluded. Genes were only included in downstream analyses if they were mutated in 2% or more of the patients in a cancer type cohort.
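
A minimal sketch of the mutation filtering described above: silent mutations are dropped, each patient counts once per gene, and genes mutated in fewer than 2% of the cohort are excluded. The `(patient, gene, classification)` input layout is an assumption; `"Silent"` is the MAF `Variant_Classification` label for synonymous changes.

```python
from collections import defaultdict


def mutation_indicators(mutations, patients, min_fraction=0.02):
    """Return {gene: [0/1 per patient]} for genes mutated in >= min_fraction of patients.

    mutations: iterable of (patient, gene, classification) tuples (hypothetical layout).
    patients: ordered list of patient IDs in the cohort.
    """
    cohort = set(patients)
    carriers = defaultdict(set)
    for patient, gene, classification in mutations:
        if classification == "Silent" or patient not in cohort:
            continue  # silent mutations and out-of-cohort samples are ignored
        carriers[gene].add(patient)  # a set counts each patient once per gene
    n = len(patients)
    return {
        gene: [int(p in mutated) for p in patients]
        for gene, mutated in carriers.items()
        if len(mutated) / n >= min_fraction
    }
```

The resulting 0/1 lists are in the form a univariate Cox model expects as its covariate.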

Raw input data for copy number analysis also required substantial preprocessing. Copy number input data consist of per-patient, per-chromosome location maps of copy number (hg19, downloaded from the UCSC Genome Browser; Tyner et al., 2017). These maps were converted to a single copy number value for each gene. We created an interval tree (using the intervaltree Python package, https://pypi.python.org/pypi/intervaltree) of the location maps for each chromosome and used the appropriate HGNC gene annotations to convert chromosome locations to genes for each patient. We used each gene’s transcriptional start site position to look up that gene’s copy number value in the interval tree. This analysis produced an intermediate file of a similar form to those from the other TCGA platforms, which allowed straightforward Cox analysis. Note that Cox proportional hazards models are a threshold-independent method of performing survival analysis, and so no minimum or maximum threshold for a copy number alteration was specified.
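
The transcriptional-start-site lookup can be sketched with a sorted list and binary search from Python's standard `bisect` module. This is a swapped-in stand-in for the intervaltree package the authors used, and it assumes each patient's segments on a chromosome are non-overlapping with inclusive end coordinates. Gene names and coordinates in the example are illustrative, not real annotations.

```python
import bisect


class SegmentLookup:
    """Copy number lookup for one patient's segments on one chromosome."""

    def __init__(self, segments):
        # segments: list of (start, end, copy_number), end inclusive, non-overlapping
        self.segments = sorted(segments)
        self.starts = [seg[0] for seg in self.segments]

    def value_at(self, position):
        # Rightmost segment whose start is <= position.
        i = bisect.bisect_right(self.starts, position) - 1
        if i >= 0:
            start, end, value = self.segments[i]
            if start <= position <= end:
                return value
        return None  # position falls outside every segment


def gene_copy_numbers(segments_by_chrom, genes):
    """Map gene -> copy number at its TSS; genes: {gene: (chromosome, tss)}."""
    lookups = {chrom: SegmentLookup(segs) for chrom, segs in segments_by_chrom.items()}
    result = {}
    for gene, (chrom, tss) in genes.items():
        lookup = lookups.get(chrom)
        result[gene] = lookup.value_at(tss) if lookup is not None else None
    return result


# Illustrative segments for one patient (not real data or annotations).
segments = {
    "9": [(1, 21_000_000, 0.05), (21_000_001, 22_500_000, -1.2),
          (22_500_001, 141_000_000, 0.0)],
    "7": [(1, 159_000_000, 0.8)],
}
genes = {"GENE_A": ("9", 21_900_000), "GENE_B": ("7", 55_000_000), "GENE_C": ("12", 100)}
copy_numbers = gene_copy_numbers(segments, genes)
```

Binary search gives the same answer as an interval-tree query when segments do not overlap, which holds for a single patient's segmentation.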

A tumor was defined as having a focal amplification or deletion if its copy number was greater than 0.3 or less than −0.3, and the chromosomal interval with a copy number greater than 80% of the copy number at the gene of interest was less than or equal to 3 Mb (Krijgsman et al., 2014).
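
One reading of this focal-versus-broad rule is sketched below: starting from the segment that contains the gene, extend outward while neighboring segments change copy number in the same direction by more than 80% of the gene's value, then compare the length of that run with 3 Mb. Interpreting "80%" by magnitude so that it also works for deletions (negative values), and treating adjacent segments as contiguous, are our assumptions rather than details stated in the text.

```python
FOCAL_MAX_BP = 3_000_000
CNA_THRESHOLD = 0.3


def classify_cna(segments, gene_index):
    """Return 'focal', 'broad', or None for the gene in segments[gene_index].

    segments: one patient's (start, end, copy_number) tuples on one chromosome,
    sorted by position, end inclusive.
    """
    value = segments[gene_index][2]
    if abs(value) <= CNA_THRESHOLD:
        return None  # not called as an amplification or deletion

    def similar(i):
        # Same direction and more than 80% of the gene's copy number change.
        return segments[i][2] / value > 0.8

    left = gene_index
    while left > 0 and similar(left - 1):
        left -= 1
    right = gene_index
    while right < len(segments) - 1 and similar(right + 1):
        right += 1
    span = segments[right][1] - segments[left][0] + 1
    return "focal" if span <= FOCAL_MAX_BP else "broad"
```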

To calculate the number of structural alterations per tumor, the number of distinct copy number values per chromosome in the DNA segmentation file was summed for each patient.
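
Read literally, this score counts the distinct copy number values on each chromosome and sums them per patient. The sketch below implements that literal reading, so an unaltered chromosome still contributes one; the row layout is an assumption, and counting segments rather than distinct values would be a closely related alternative.

```python
from collections import defaultdict


def structural_alteration_score(segments):
    """Sum over chromosomes of the number of distinct copy number values.

    segments: iterable of (chromosome, start, end, copy_number) rows for one patient.
    """
    values_by_chrom = defaultdict(set)
    for chrom, _start, _end, copy_number in segments:
        values_by_chrom[chrom].add(copy_number)
    return sum(len(values) for values in values_by_chrom.values())
```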

Pan-cancer TCGA analysis

For each platform and analysis type, we performed a pan-cancer analysis. This analysis created a single Z score for each gene by combining the per-gene Z scores from each cancer type using Stouffer’s method. To perform Stouffer’s method, we took the sum of the Z scores for a single gene and divided that sum by the square root of the number of cancer types with Z scores for that gene (Stouffer, 1949). This meta-Z score was then compared against meta-Z scores obtained similarly from other platform analyses.
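
Stouffer's method as described here is short enough to write out directly. In this sketch, cancer types without a Z score for the gene (represented as `None` or NaN, an assumption about how missing values are stored) are skipped, so the denominator counts only the cancer types that contributed.

```python
import math


def stouffer_meta_z(z_scores):
    """Combine per-cancer-type Z scores into one meta-Z: sum(Z) / sqrt(k)."""
    available = [z for z in z_scores if z is not None and not math.isnan(z)]
    if not available:
        return None  # no cancer type contributed a Z score
    return sum(available) / math.sqrt(len(available))


# Four cancer types, none individually significant at |Z| > 1.96:
meta = stouffer_meta_z([1.5, 1.5, 1.5, 1.5])  # 6.0 / sqrt(4) = 3.0
```

The example shows why the pan-cancer meta-Z can flag genes that no single cancer type does: four consistent but modest Z scores combine into a meta-Z of 3.0.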

Additional TCGA mutation analyses

We performed several additional analyses on mutation data, including Z scores for pairwise mutation combinations, Z scores for hotspot codons, and Z scores that take variant allele frequencies (VAFs) into account. For pairwise mutation Z scores, we took the 30 most common cancer driver genes and formed all pairwise combinations. We then calculated Cox proportional hazards models for each pair of genes, where a patient was considered to have a pairwise mutation if and only if both genes were non-silently mutated in that patient. Z scores were only calculated for a pair if (1) neither gene in the pair was statistically significant alone in the univariate analysis and (2) both genes were mutated together in at least 10 patients.
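
The two eligibility rules for gene pairs can be sketched with `itertools.combinations`. The input dictionaries are a hypothetical layout; for each eligible pair, a patient would be coded as 1 if they appear in the returned co-mutated set.

```python
from itertools import combinations


def eligible_pairs(mutated, univariate_significant, min_co_mutated=10):
    """Yield (gene_a, gene_b, co_mutated_patients) for pairs that may be tested.

    mutated: dict gene -> set of patients with a non-silent mutation
             (e.g. for the 30 most common driver genes).
    univariate_significant: set of genes already significant on their own.
    """
    for gene_a, gene_b in combinations(sorted(mutated), 2):
        if gene_a in univariate_significant or gene_b in univariate_significant:
            continue  # rule 1: neither gene may be significant alone
        both = mutated[gene_a] & mutated[gene_b]
        if len(both) >= min_co_mutated:  # rule 2: enough co-mutated patients
            yield gene_a, gene_b, both
```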

Per-codon Z scores were calculated for a selected set of hotspot codons. Mutations for most cancer types were available in HG37 coordinates, so HG37 mutation positions were used to locate codons. Mutations for OV and COADREAD were only available in HG36, so their positions were converted to HG37 before codon processing. Per-codon Z scores were calculated by first identifying patients with mutations in the relevant gene and then selecting from that set the patients whose mutations were in the codon of interest. If 2% or more of patients had mutations in the selected codon, a Z score was calculated.
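
A minimal sketch of the per-codon selection, assuming the mutations have already been filtered to the gene of interest and that mutation positions and codon coordinates are in the same genome build (HG37 here). The codon's genomic start and end are hypothetical inputs, and the 2% threshold is taken relative to the whole cohort.

```python
def codon_indicator(gene_mutations, codon_start, codon_end, patients, min_fraction=0.02):
    """Return a 0/1 list marking patients with a mutation inside one codon.

    gene_mutations: (patient, genomic_position) pairs for the gene of interest.
    Returns None when fewer than min_fraction of patients carry a codon mutation.
    """
    carriers = {patient for patient, position in gene_mutations
                if codon_start <= position <= codon_end}
    indicator = [int(p in carriers) for p in patients]
    if sum(indicator) < min_fraction * len(patients):
        return None  # too few carriers: no Z score is calculated
    return indicator
```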

VAFs were calculated for 10 of the TCGA cancer types. We analyzed VAF data in two ways. First, we calculated Z scores counting a gene as mutated only if its VAF was greater than or equal to 0.4. Second, we identified the median VAF per gene and calculated Z scores counting a gene as mutated only if its VAF was equal to or greater than the median VAF for that gene.
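
The two VAF-based mutation calls can be sketched for a single gene as follows. The input assumes one VAF per mutated patient (how multiple mutations in the same gene were collapsed is not stated, so that is left to the caller), and the median is taken over mutated patients only.

```python
import statistics


def vaf_mutation_calls(vafs, patients, fixed_threshold=0.4):
    """Two alternative 0/1 mutation calls for one gene, aligned to `patients`.

    vafs: dict patient -> VAF of that patient's mutation in the gene; patients
    without a mutation are absent.
    Returns (fixed_cutoff_calls, at_or_above_median_calls).
    """
    median_vaf = statistics.median(vafs.values()) if vafs else None
    fixed = [int(p in vafs and vafs[p] >= fixed_threshold) for p in patients]
    above_median = [int(p in vafs and vafs[p] >= median_vaf) for p in patients]
    return fixed, above_median
```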

CBioPortal analysis

The cBioPortal analysis was structured similarly to the TCGA analyses, though data processing was not factored into an independent library because each of these datasets was used in only one analysis. Copy number data from one cBioPortal cancer type, blca_mskcc, required initial preprocessing in the manner described above for TCGA copy numbers. Mutations were included if they were annotated as one of these types: In_Frame_Ins, Nonstop_Mutation, Translation_Start_Site, In_Frame_Del, Splice_Region, Frame_Shift_Ins, Frame_Shift_Del, Splice_Site, Nonsense_Mutation, or Missense_Mutation.

ICGC analysis

ICGC analysis was structured similarly to CBioPortal analysis. Mutations were only included in downstream analyses if they were annotated as one of these types: disruptive inframe deletion, disruptive inframe insertion, frameshift variant, inframe deletion, missense variant, splice acceptor variant, splice donor variant, stop gained, or stop lost. Z scores were calculated if a gene was mutated in 2% or more of the patients in a particular cohort.

Identification of high-confidence biomarkers associated with drug sensitivities

Across independent datasets, cohorts of patients from related cancer types were identified. Mutations or CNAs significantly associated with patient prognosis (Z > 1.96 or Z < −1.96) in two or more independent cohorts from each cancer type were determined. Then, the subset of these alterations that remained highly significant (meta-Z > 3.3 or meta-Z < −3.3) across all cohorts from the same cancer type was classified as high-confidence biomarkers. In some instances, amplifications that spanned contiguous chromosomal regions were found to correlate with patient prognosis; these segments were identified manually. For the determinations of therapeutic sensitivity described below, the gene with the minimum meta-Z score (for deletions) or maximum meta-Z score (for amplifications) within a segment was chosen to represent the segment as a whole.
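
The two-step screen can be sketched as a single filter. We assume the cross-cohort meta-Z is computed with Stouffer's method, as in the pan-cancer analysis; the text does not say so explicitly for this step. The sketch also does not require the per-cohort significant Z scores to share a sign, although conflicting directions pull the meta-Z toward zero. Alteration names and Z values are illustrative only.

```python
import math


def high_confidence_biomarkers(z_by_alteration, min_cohorts=2, cohort_z=1.96, meta_z=3.3):
    """Return {alteration: meta-Z} for alterations passing both criteria.

    z_by_alteration: dict mapping an alteration to its Z scores, one per
    independent cohort of the same cancer type.
    """
    selected = {}
    for alteration, zs in z_by_alteration.items():
        n_significant = sum(abs(z) > cohort_z for z in zs)
        combined = sum(zs) / math.sqrt(len(zs))  # assumption: Stouffer meta-Z
        if n_significant >= min_cohorts and abs(combined) > meta_z:
            selected[alteration] = combined
    return selected


# Illustrative Z scores only (not values from the study).
example = {
    "ALT_A": [2.5, 2.4, 1.2],     # significant in 2 cohorts, meta-Z ~ 3.52
    "ALT_B": [2.0, -2.1],         # significant in 2 cohorts, but directions conflict
    "ALT_C": [-5.0, -1.0, -0.5],  # strong meta-Z, but significant in only 1 cohort
}
biomarkers = high_confidence_biomarkers(example)
```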

Therapeutic sensitivity data for PDXs were obtained from Gao et al. (2015a). To identify mutations that correlated with therapy sensitivity, a comparison was performed for each drug or drug combination only if five or more PDXs had a mutation in the gene of interest and five or more PDXs were wild-type for that gene. For genes and therapies meeting these criteria, we next identified instances in which the therapy produced a clinical response in the mutant population, defined as a mean ‘Best Average Response’ below 15% tumor growth among PDXs with a mutation in the gene of interest. Finally, for genes and therapies meeting these criteria, we performed a t-test comparing the ‘Best Average Response’ of PDXs with mutant versus wild-type copies of the gene of interest. We reported therapies for which all of these criteria were met and tumors with the mutation were more sensitive to the therapy than tumors with wild-type copies of the gene (p < 0.01).

To identify CNAs that correlated with therapy sensitivity in the PDX cohort, amplifications and deletions (|CNA| > 0.3) were called and then considered separately. As above, CNAs were included if five or more PDXs exhibited an alteration and five or more PDXs did not exhibit that alteration. For genes and therapies meeting these criteria, we next identified instances in which the therapy produced a clinical response in the altered population, defined as a mean ‘Best Average Response’ below 15% tumor growth among PDXs with an amplification or deletion in the gene of interest. Finally, for genes and therapies meeting these criteria, we performed a t-test comparing the ‘Best Average Response’ of PDXs with and without the alteration in the gene of interest. We reported therapies for which all of these criteria were met and tumors with the alteration were more sensitive to the therapy than tumors without it (p < 0.01).
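The CNA calling and group-size check can be sketched as below. The function names are hypothetical, and copy number values are assumed to be continuous per-gene values thresholded at ±0.3 as in the text. The resulting altered and unaltered groups would then go through the same response cutoff and t-test criteria as the mutation screen.

```python
def call_cna(value, threshold=0.3):
    """Call a copy number value as an amplification, a deletion, or neither (|CNA| > 0.3)."""
    if value > threshold:
        return "amplification"
    if value < -threshold:
        return "deletion"
    return None


def split_by_alteration(cna_by_pdx, alteration, threshold=0.3, min_group=5):
    """Split PDXs into (altered, unaltered) sets for one gene and one alteration type.

    Amplifications and deletions are handled separately, so in an amplification
    analysis the 'unaltered' group also contains deleted PDXs (an assumption based
    on the wording 'did not exhibit that alteration').
    Returns None if either group has fewer than min_group PDXs.
    """
    altered = {pdx for pdx, v in cna_by_pdx.items() if call_cna(v, threshold) == alteration}
    unaltered = set(cna_by_pdx) - altered
    if len(altered) < min_group or len(unaltered) < min_group:
        return None
    return altered, unaltered
```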

Therapeutic sensitivity data from cancer cell lines were obtained from Iorio et al. (2016). For these data, two different comparisons were used. First, the calculations described below were performed on cell lines from the specific cancer type in which the high-confidence biomarker was identified. If this analysis yielded no significant vulnerabilities, the calculations were repeated across all cancer types (pan-cancer).

High-confidence mutations were assessed if five or more cell lines in the set of interest had a non-synonymous mutation in that gene and five or more cell lines had wild-type copies of that gene. CNAs were assessed if five or more cell lines had an alteration (deletion or amplification) of that gene and five or more cell lines lacked that alteration. For each comparison and each tested compound, a t-test was performed on the log(IC50) values of cell lines with and without the alteration. A threshold of p < 0.01 was used to identify significance in single-cancer-type analyses, and a threshold of p < 0.0001 in pan-cancer analyses.
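The two-tier search over cell lines can be sketched as follows, assuming that per-compound p values from a two-sample t-test on log(IC50) have already been computed for the matched cancer type and for the pan-cancer set. The function names and dict layout are hypothetical, and the sketch does not check the direction of the effect.

```python
def eligible(altered, unaltered, min_group=5):
    """True if >= 5 cell lines carry the alteration and >= 5 lack it."""
    return len(altered) >= min_group and len(unaltered) >= min_group


def select_vulnerabilities(p_cancer_type, p_pan_cancer):
    """Two-tier search over compounds.

    p_cancer_type / p_pan_cancer: dict compound -> p value from a t-test on
    log(IC50) (altered vs unaltered cell lines), assumed precomputed.
    The matched cancer type is searched first (p < 0.01); only if nothing is
    found does the search fall back to all cell lines (p < 0.0001).
    """
    hits = {drug for drug, p in p_cancer_type.items() if p < 0.01}
    if hits:
        return "cancer type", hits
    return "pan-cancer", {drug for drug, p in p_pan_cancer.items() if p < 0.0001}
```

The stricter pan-cancer cutoff reflects the larger number of cell lines, and therefore the greater power to reach small p values, when all cancer types are pooled.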

Code

Code is available on GitHub at https://github.com/joan-smith/genomic-features-survival (Smith, 2018; copy archived at https://github.com/elifesciences-publications/genomic-features-survival).

Kaplan-Meier analysis

Kaplan-Meier plots were generated using GraphPad Prism. Deletions and amplifications in Kaplan-Meier plots correspond to |CNA| > 0.3; deep deletions and high-copy gains correspond to |CNA| > 1. P values reported in Kaplan-Meier plots were calculated with the log-rank test in Prism. Note that Kaplan-Meier plots are displayed in this manuscript primarily to make patient outcomes easy to visualize. Z scores were always generated with Cox proportional hazards models, which do not require the selection of artificial cut-offs or thresholds for continuous data.
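The published curves were drawn in Prism; as an illustration only, the product-limit (Kaplan-Meier) estimator behind such curves can be written in a few lines of Python. The function name and input layout are assumptions, and the log-rank test used for the reported P values is not reproduced here.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of the survival function.

    times: follow-up times; events: 1 (or True) if death was observed, 0 if censored.
    Returns a list of (time, survival probability) pairs at each distinct death time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group all patients sharing this time (deaths and censorings).
        while i < len(data) and data[i][0] == t:
            deaths += int(data[i][1])
            removed += 1
            i += 1
        if deaths:
            survival *= 1.0 - deaths / n_at_risk
            curve.append((t, survival))
        n_at_risk -= removed  # patients censored at t still count as at risk at t
    return curve
```

For example, `kaplan_meier([1, 2, 3, 4, 5], [1, 1, 0, 1, 0])` steps down to 0.8, 0.6, and 0.3 at times 1, 2, and 4; the patient censored at time 3 shrinks the risk set without producing a step.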

Additional data sources and tools

The 30 frequently mutated cancer driver genes were obtained from Zehir et al. (2017). NCI-SEER statistics were downloaded from https://seer.cancer.gov. Total tumor aneuploidy scores, ABSOLUTE-determined purity values, and leukocyte infiltration estimates were obtained from Taylor et al. (2018). The list of hypermutated samples was obtained from Bailey et al. (2018). Lollipop plots were generated using the Lollipops software (Jay and Brouwer, 2016). Density plots were generated with Python scripts using matplotlib (https://matplotlib.org/). Single base-pair mutations were mapped to codons using PolyPhen-2 (Adzhubei et al., 2010).

Supplemental text 1. Cox proportional hazards modeling

Multiple statistical techniques have been developed to perform survival or ‘time-to-failure’ analysis (reviewed in Kleinbaum and Klein, 2012). These include Kaplan-Meier analysis, Cox proportional hazards regression, accelerated failure time modeling, and many others. In this paper, we chose to apply Cox proportional hazards regression to analyze cancer survival data. The Cox model is represented by the following function:

h(t, X) = h_{0}(t)\, e^{\sum_{i=1}^{n} \beta_{i} X_{i}}

where t is the survival time, h(t, X) is the hazard function, h₀(t) is the baseline hazard, X_i is a potential prognostic variable, and β_i indicates the strength of the association between a prognostic variable and survival. In this model, patients have a baseline, time-dependent risk of death [h₀(t)], modified by time-independent prognostic features that either increase (β_i > 0) or decrease (β_i < 0) the risk of death. In this paper, we report Z scores, which are calculated by dividing the regression coefficient (β_i) by its standard error.

Cox proportional hazards modeling was chosen for several reasons. First, unlike Kaplan-Meier analysis, Cox models do not require the selection of a threshold or cut-off, so continuous data like gene expression values do not need to be dichotomized. (Note that in this manuscript, Kaplan-Meier plots are provided for visualization purposes, but the reported Z scores are always from Cox models.) Secondly, Cox models accept both continuous and discrete input data, allowing this approach to be used to analyze both binary (e.g., mutant vs. non-mutant) and continuous (e.g., gene copy number) genomic features. Thirdly, Cox models are amenable to both univariate (n = 1) and multivariate (n > 1) analyses. Fourthly, Cox regression yields a Z score, and therefore a p value, for each association: under the null hypothesis of no association, the Z score approximately follows a standard normal distribution, so it expresses the number of standard deviations from the mean of that distribution. Fifthly, Z scores encode the directionality of an association: poor prognostic factors have β_i > 0 (and therefore Z > 0), while favorable prognostic factors have β_i < 0 (and therefore Z < 0). This allows ‘positive’ and ‘negative’ survival features to be directly compared. Sixthly, Z scores are useful for meta-analyses, as they can be combined using Stouffer’s Method (Stouffer, 1949):

Z = \frac{\sum_{i=1}^{k} Z_{i}}{\sqrt{k}}

where Z_i is the Z score from the i-th of the k independent analyses being combined.
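Stouffer's method as written above is a one-liner; the sketch below also converts a (meta-)Z score into a two-sided p value using the standard normal distribution. The function names are illustrative. This is the unweighted form of the method; weighted variants (for example, weighting by cohort size) exist but are not what the equation above describes.

```python
import math


def stouffer_z(z_scores):
    """Combine k independent Z scores: Z = (Z_1 + ... + Z_k) / sqrt(k)."""
    return sum(z_scores) / math.sqrt(len(z_scores))


def two_sided_p(z):
    """Two-sided p value for a standard normal Z score."""
    return math.erfc(abs(z) / math.sqrt(2.0))


print(stouffer_z([2.0, 2.0, 2.0, 2.0]))  # 4.0
```

Four cohorts that each show Z = 2.0 (individually p ≈ 0.046) combine to Z = 4.0, which illustrates how consistent, moderate signals across independent cohorts strengthen a meta-analysis.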

Seventhly, Cox proportional hazards modeling is commonly used both in previous genome-wide survival analyses and in numerous clinical biomarker studies (Dhanasekaran et al., 2001; Fukuoka et al., 2011; Gentles et al., 2015; Parker et al., 2009; Wang et al., 2005), facilitating comparison with other biomarker discovery efforts.

To verify the underlying normality of the Z scores, we generated Q-Q plots for gene copy number values (Figure 1—figure supplement 2C). The resulting distributions for CNAs were generally linear, as expected, with occasional shoulders at low and high Z scores. We similarly calculated Z scores for all genes harboring coding-sequence mutations; however, this produced plateaus around the origin in multiple cancer types. These aberrations were caused by rare, random mutations in many genes that lacked any prognostic power. To eliminate these plateaus, we tested different thresholds for mutational analysis. Considering only mutations that occurred in at least a certain percentage of cancer patients diminished the plateaus, but high thresholds also excluded mutations in a number of known cancer drivers. We selected a 2% threshold to balance maintaining the normality of the Z score distribution against retaining infrequent but significant mutations in driver genes.
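Coordinates for a Q-Q plot of Z scores against the standard normal distribution can be computed with the standard library alone. The plotting positions (i - 0.5)/n are one common convention and an assumption here, since the text does not state which convention was used; the resulting points could then be drawn with matplotlib.

```python
from statistics import NormalDist


def qq_points(z_scores):
    """Pair sorted observed Z scores with standard normal quantiles for a Q-Q plot.

    Uses plotting positions (i - 0.5) / n (an assumed convention).
    Returns a list of (theoretical quantile, observed Z) pairs.
    """
    observed = sorted(z_scores)
    n = len(observed)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    theoretical = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, observed))
```

Points lying on the diagonal indicate approximately normal Z scores; many near-identical observed values clustered around zero would show up as the kind of plateau described above.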

Note that many survival analysis papers include a ‘feature selection’ step to identify a minimal number of features that can accurately identify at-risk patients. In contrast, we performed an unbiased, genome-wide analysis without feature selection, generating a Z score for every gene and for every feature type in the genome.

Supplemental text 2. Survival analysis in TCGA cohorts

Patient cohorts assembled for the TCGA were collected to enable a molecular analysis of the major cancer subtypes found within the United States. Although clinical information was collected for nearly all patients, these cohorts were not specifically chosen to conduct survival studies. We posit that our survival analysis is nonetheless appropriate for several reasons. First, we verified that the overall survival times of patients within the TCGA are highly consistent with national epidemiological data collected by the NCI (Figure 1—figure supplement 2D–E). Secondly, we found that many well-established biomarkers hold prognostic significance in TCGA cohorts, including IDH1 mutations in glioma (Figure 1—figure supplement 7), TP53 mutations in breast cancer (Figure 1—figure supplement 3), and tumor stage and grade in multiple cancer types (Figure 2—figure supplement 3), among others. Thirdly, we validated the survival patterns that we describe in the TCGA in several independent patient cohorts, indicating that these are not TCGA-specific phenomena (Figure 4). Fourthly, in an independent analysis of the quality of clinical annotations in the TCGA (Liu et al., 2018), none of the cohort/endpoint combinations chosen for this study were classified as ‘not recommended for use.’ Fifthly, our efforts build upon a robust body of work that has also performed survival analyses on TCGA cohorts and, in some cases, similarly validated findings from the TCGA in independent patient populations (Andor et al., 2016; Davoli et al., 2017; Gentles et al., 2015; Guinney et al., 2015; Uhlen et al., 2017). Finally, we note that the TCGA has several benefits over standard investigator-initiated survival studies. Patient samples were collected and analyzed in an unbiased manner, precluding the ‘file-drawer problem’ (failing to publish negative results) and post-hoc sample size adjustment (ending patient enrollment when a significant result is found). Substantially more molecular data are available from TCGA tumors than from any other comparably sized dataset, which allows multivariate and correlational analyses of different facets of tumor genomes. All data from the TCGA and all code from this manuscript are publicly available, allowing easy replication and extension of this analysis.

Data availability

Data is publicly available from The Cancer Genome Atlas, the International Cancer Genome Consortium, and the other resources listed in the Materials and Methods section. Code is available on GitHub at https://github.com/joan-smith/genomic-features-survival (copy archived at https://github.com/elifesciences-publications/genomic-features-survival).

The following previously published data sets were used

Broad Institute TCGA Genome Data Analysis Center (2016) Broad GDAC Firehose. Broad Institute TCGA Firehose stddata__2016_01_28. https://doi.org/10.7908/C11G0KM9
International Cancer Genome Consortium (2017) ICGC Data Portal. International Cancer Genome Consortium. https://dcc.icgc.org/releases/release_25

References

(2010) A method and server for predicting damaging missense mutations. Nature Methods 7:248–249. https://doi.org/10.1038/nmeth0410-248
Allsbrook WC, Mangold KA, Johnson MH, Lane RB, Lane CG, Epstein JI (2001) Interobserver reproducibility of Gleason grading of prostatic carcinoma: general pathologist. Human Pathology 32:81–88. https://doi.org/10.1053/hupa.2001.21135
Anaya J, Reon B, Chen WM, Bekiranov S, Dutta A (2015) A pan-cancer analysis of prognostic genes. PeerJ 3:e1499. https://doi.org/10.7717/peerj.1499
Anaya J (2016) OncoRank: A pan-cancer method of combining survival correlations and its application to mRNAs, miRNAs, and lncRNAs. PeerJ Preprints 4:e2574v1. https://doi.org/10.7287/peerj.preprints.2574v1
Andor N, Graham TA, Jansen M, Xia LC, Aktipis CA, Petritsch C, Ji HP, Maley CC (2016) Pan-cancer analysis of the extent and consequences of intratumor heterogeneity. Nature Medicine 22:105–113. https://doi.org/10.1038/nm.3984
Aronson JK (2005) Biomarkers and surrogate endpoints. British Journal of Clinical Pharmacology 59:491–494. https://doi.org/10.1111/j.1365-2125.2005.02435.x
Bailey ML, O'Neil NJ, van Pel DM, Solomon DA, Waldman T, Hieter P (2014) Glioblastoma cells containing mutations in the cohesin component STAG2 are sensitive to PARP inhibition. Molecular Cancer Therapeutics 13:724–732. https://doi.org/10.1158/1535-7163.MCT-13-0749
Bailey MH, Tokheim C, Porta-Pardo E, Sengupta S, Bertrand D, Weerasinghe A, Colaprico A, Wendl MC, Kim J, Reardon B, Ng PK, Jeong KJ, Cao S, Wang Z, Gao J, Gao Q, Wang F, Liu EM, Mularoni L, Rubio-Perez C, Nagarajan N, Cortés-Ciriano I, Zhou DC, Liang WW, Hess JM, Yellapantula VD, Tamborero D, Gonzalez-Perez A, Suphavilai C, Ko JY, Khurana E, Park PJ, Van Allen EM, Liang H, Lawrence MS, Godzik A, Lopez-Bigas N, Stuart J, Wheeler D, Getz G, Chen K, Lazar AJ, Mills GB, Karchin R, Ding L, MC3 Working Group, Cancer Genome Atlas Research Network (2018) Comprehensive characterization of cancer driver genes and mutations. Cell 173:371–385. https://doi.org/10.1016/j.cell.2018.02.060
Bhatt JR, Klotz L (2016) Overtreatment in cancer - is it a problem? Expert Opinion on Pharmacotherapy 17:1–5. https://doi.org/10.1517/14656566.2016.1115481
(2013) Is DCIS breast cancer, and how do I treat it? Current Treatment Options in Oncology 14:75–87. https://doi.org/10.1007/s11864-012-0217-1
(2010) Alterations in p53, BRCA1, ATM, PIK3CA, and HER2 genes and their effect in modifying clinicopathological characteristics and overall survival of Bulgarian patients with breast cancer. Journal of Cancer Research and Clinical Oncology 136:1657–1669. https://doi.org/10.1007/s00432-010-0824-9
Carter SL, Cibulskis K, Helman E, McKenna A, Shen H, Zack T, Laird PW, Onofrio RC, Winckler W, Weir BA, Beroukhim R, Pellman D, Levine DA, Lander ES, Meyerson M, Getz G (2012) Absolute quantification of somatic DNA alterations in human cancer. Nature Biotechnology 30:413–421. https://doi.org/10.1038/nbt.2203
Ceccarelli M, Barthel FP, Malta TM, Sabedot TS, Salama SR, Murray BA, Morozova O, Newton Y, Radenbaugh A, Pagnotta SM, Anjum S, Wang J, Manyam G, Zoppoli P, Ling S, Rao AA, Grifford M, Cherniack AD, Zhang H, Poisson L, Carlotti CG, Tirapelli DP, Rao A, Mikkelsen T, Lau CC, Yung WK, Rabadan R, Huse J, Brat DJ, Lehman NL, Barnholtz-Sloan JS, Zheng S, Hess K, Rao G, Meyerson M, Beroukhim R, Cooper L, Akbani R, Wrensch M, Haussler D, Aldape KD, Laird PW, Gutmann DH, Noushmehr H, Iavarone A, Verhaak RG, TCGA Research Network (2016) Molecular profiling reveals biologically discrete subsets and pathways of progression in Diffuse Glioma. Cell 164:550–563. https://doi.org/10.1016/j.cell.2015.12.028
Connolly JL, Schnitt SJ, Wang HH, Longtine JA, Dvorak A, Dvorak HF (2003) Principles of Cancer Pathology. In: Holland-Frei Cancer Medicine, Sixth edition. Decker Inc.
(1997) Improving diagnostic accuracy and interobserver concordance in the classification and grading of primary gliomas. Cancer 79:1381–1393. https://doi.org/10.1002/(SICI)1097-0142(19970401)79:7<1381::AID-CNCR16>3.0.CO;2-W
Cuzick J, Swanson GP, Fisher G, Brothman AR, Berney DM, Reid JE, Mesher D, Speights VO, Stankiewicz E, Foster CS, Møller H, Scardino P, Warren JD, Park J, Younus A, Flake DD, Wagner S, Gutin A, Lanchbury JS, Stone S, Transatlantic Prostate Group (2011) Prognostic value of an RNA expression signature derived from cell cycle proliferation genes in patients with prostate cancer: a retrospective study. The Lancet Oncology 12:245–255. https://doi.org/10.1016/S1470-2045(10)70295-3
Dancik GM, Theodorescu D (2015) The prognostic value of cell cycle gene expression signatures in muscle invasive, high-grade bladder cancer. Bladder Cancer 1:45–63. https://doi.org/10.3233/BLC-150012
Davoli T, Xu AW, Mengwasser KE, Sack LM, Yoon JC, Park PJ, Elledge SJ (2013) Cumulative haploinsufficiency and triplosensitivity drive aneuploidy patterns and shape the cancer genome. Cell 155:948–962. https://doi.org/10.1016/j.cell.2013.10.011
Davoli T, Uno H, Wooten EC, Elledge SJ (2017) Tumor aneuploidy correlates with markers of immune evasion and with reduced response to immunotherapy. Science 355:eaaf8399. https://doi.org/10.1126/science.aaf8399
Deming SL, Nass SJ, Dickson RB, Trock BJ (2000) C-myc amplification in breast cancer: a meta-analysis of its occurrence and prognostic relevance. British Journal of Cancer 83:1688–1695. https://doi.org/10.1054/bjoc.2000.1522
Dhanasekaran SM, Barrette TR, Ghosh D, Shah R, Varambally S, Kurachi K, Pienta KJ, Rubin MA, Chinnaiyan AM (2001) Delineation of prognostic biomarkers in prostate cancer. Nature 412:822–826. https://doi.org/10.1038/35090585
Driscoll JJ, Rixe O (2009) Overall survival: still the gold standard: why overall survival remains the definitive end point in cancer clinical trials. Cancer Journal 15:401–405. https://doi.org/10.1097/PPO.0b013e3181bdc2e0
Elmore JG, Longton GM, Carney PA, Geller BM, Onega T, Tosteson AN, Nelson HD, Pepe MS, Allison KH, Schnitt SJ, O'Malley FP, Weaver DL (2015) Diagnostic concordance among pathologists interpreting breast biopsy specimens. JAMA 313:1122–1132. https://doi.org/10.1001/jama.2015.1405
Ensor JE (2014) Biomarker validation: common data analysis concerns. The Oncologist 19:886–891. https://doi.org/10.1634/theoncologist.2014-0061
(2013) Overdiagnosis and overtreatment in cancer: an opportunity for improvement. JAMA 310:797–798. https://doi.org/10.1001/jama.2013.108415
(1984) Influence of cellular DNA content on survival in advanced ovarian cancer. Cancer Research 44:397–400.
Fukuoka M, Wu YL, Thongprasert S, Sunpaweravong P, Leong SS, Sriuranpong V, Chao TY, Nakagawa K, Chu DT, Saijo N, Duffield EL, Rukazenkov Y, Speake G, Jiang H, Armour AA, To KF, Yang JC, Mok TS (2011) Biomarker analyses and final overall survival results from a phase III, randomized, open-label, first-line study of gefitinib versus carboplatin/paclitaxel in clinically selected patients with advanced non-small-cell lung cancer in Asia (IPASS). Journal of Clinical Oncology 29:2866–2874. https://doi.org/10.1200/JCO.2010.33.4235
Gagan J, Van Allen EM (2015) Next-generation sequencing to guide cancer therapy. Genome Medicine 7:80. https://doi.org/10.1186/s13073-015-0203-x
Gao J, Aksoy BA, Dogrusoz U, Dresdner G, Gross B, Sumer SO, Sun Y, Jacobsen A, Sinha R, Larsson E, Cerami E, Sander C, Schultz N (2013) Integrative analysis of complex cancer genomics and clinical profiles using the cBioPortal. Science Signaling 6:pl1. https://doi.org/10.1126/scisignal.2004088
Gao H, Korn JM, Ferretti S, Monahan JE, Wang Y, Singh M, Zhang C, Schnell C, Yang G, Zhang Y, Balbin OA, Barbe S, Cai H, Casey F, Chatterjee S, Chiang DY, Chuai S, Cogan SM, Collins SD, Dammassa E, Ebel N, Embry M, Green J, Kauffmann A, Kowal C, Leary RJ, Lehar J, Liang Y, Loo A, Lorenzana E, Robert McDonald E, McLaughlin ME, Merkin J, Meyer R, Naylor TL, Patawaran M, Reddy A, Röelli C, Ruddy DA, Salangsang F, Santacroce F, Singh AP, Tang Y, Tinetto W, Tobler S, Velazquez R, Venkatesan K, Von Arx F, Wang HQ, Wang Z, Wiesmann M, Wyss D, Xu F, Bitter H, Atadja P, Lees E, Hofmann F, Li E, Keen N, Cozens R, Jensen MR, Pryer NK, Williams JA, Sellers WR (2015a) High-throughput screening using patient-derived tumor xenografts to predict clinical trial drug response. Nature Medicine 21:1318–1325. https://doi.org/10.1038/nm.3954
Gao J, Adams RP, Swain SM (2015b) Does CDKN2A loss predict palbociclib benefit? Current Oncology 22:e498–e501. https://doi.org/10.3747/co.22.2700
Gentles AJ, Newman AM, Liu CL, Bratman SV, Feng W, Kim D, Nair VS, Xu Y, Khuong A, Hoang CD, Diehn M, West RB, Plevritis SK, Alizadeh AA (2015) The prognostic landscape of genes and infiltrating immune cells across human cancers. Nature Medicine 21:938–945. https://doi.org/10.1038/nm.3909
(2013) Poor interobserver reproducibility in the diagnosis of high-grade endometrial carcinoma. The American Journal of Surgical Pathology 37:874–881. https://doi.org/10.1097/PAS.0b013e31827f576a
(2017) Widespread post-transcriptional attenuation of genomic copy-number variation in cancer. Cell Systems 5:386–398. https://doi.org/10.1016/j.cels.2017.08.013
(2009) Androgen receptor levels and association with PIK3CA mutations and prognosis in breast cancer. Clinical Cancer Research 15:2472–2478. https://doi.org/10.1158/1078-0432.CCR-08-1763
Goossens N, Nakagawa S, Sun X, Hoshida Y (2015) Cancer biomarker discovery and validation. Translational Cancer Research 4:256–269. https://doi.org/10.3978/j.issn.2218-676X.2015.06.04
Guan JL, Zhong WZ, An SJ, Yang JJ, Su J, Chen ZH, Yan HH, Chen ZY, Huang ZM, Zhang XC, Nie Q, Wu YL (2013) KRAS mutation in patients with lung cancer: a predictor for poor prognosis but not for EGFR-TKIs or chemotherapy. Annals of Surgical Oncology 20:1381–1388. https://doi.org/10.1245/s10434-012-2754-z
Guinney J, Dienstmann R, Wang X, de Reyniès A, Schlicker A, Soneson C, Marisa L, Roepman P, Nyamundanda G, Angelino P, Bot BM, Morris JS, Simon IM, Gerster S, Fessler E, De Sousa E Melo F, Missiaglia E, Ramay H, Barras D, Homicsko K, Maru D, Manyam GC, Broom B, Boige V, Perez-Villamil B, Laderas T, Salazar R, Gray JW, Hanahan D, Tabernero J, Bernards R, Friend SH, Laurent-Puig P, Medema JP, Sadanandam A, Wessels L, Delorenzi M, Kopetz S, Vermeulen L, Tejpar S (2015) The consensus molecular subtypes of colorectal cancer. Nature Medicine 21:1350–1356. https://doi.org/10.1038/nm.3967
(2013) Cancer drug resistance: an evolving paradigm. Nature Reviews Cancer 13:714–726. https://doi.org/10.1038/nrc3599
Hutchins G, Southward K, Handley K, Magill L, Beaumont C, Stahlschmidt J, Richman S, Chambers P, Seymour M, Kerr D, Gray R, Quirke P (2011) Value of mismatch repair, KRAS, and BRAF mutations in predicting recurrence and benefits from chemotherapy in colorectal cancer. Journal of Clinical Oncology 29:1261–1270. https://doi.org/10.1200/JCO.2010.30.1366
Iorio F, Knijnenburg TA, Vis DJ, Bignell GR, Menden MP, Schubert M, Aben N, Gonçalves E, Barthorpe S, Lightfoot H, Cokelaer T, Greninger P, van Dyk E, Chang H, de Silva H, Heyn H, Deng X, Egan RK, Liu Q, Mironenko T, Mitropoulos X, Richardson L, Wang J, Zhang T, Moran S, Sayols S, Soleimani M, Tamborero D, Lopez-Bigas N, Ross-Macdonald P, Esteller M, Gray NS, Haber DA, Stratton MR, Benes CH, Wessels LFA, Saez-Rodriguez J, McDermott U, Garnett MJ (2016) A landscape of pharmacogenomic interactions in cancer. Cell 166:740–754. https://doi.org/10.1016/j.cell.2016.06.017
Jamal-Hanjani M, Wilson GA, McGranahan N, Birkbak NJ, Watkins TBK, Veeriah S, Shafi S, Johnson DH, Mitter R, Rosenthal R, Salm M, Horswell S, Escudero M, Matthews N, Rowan A, Chambers T, Moore DA, Turajlic S, Xu H, Lee SM, Forster MD, Ahmad T, Hiley CT, Abbosh C, Falzon M, Borg E, Marafioti T, Lawrence D, Hayward M, Kolvekar S, Panagiotopoulos N, Janes SM, Thakrar R, Ahmed A, Blackhall F, Summers Y, Shah R, Joseph L, Quinn AM, Crosbie PA, Naidu B, Middleton G, Langman G, Trotter S, Nicolson M, Remmen H, Kerr K, Chetty M, Gomersall L, Fennell DA, Nakas A, Rathinam S, Anand G, Khan S, Russell P, Ezhil V, Ismail B, Irvin-Sellers M, Prakash V, Lester JF, Kornaszewska M, Attanoos R, Adams H, Davies H, Dentro S, Taniere P, O'Sullivan B, Lowe HL, Hartley JA, Iles N, Bell H, Ngai Y, Shaw JA, Herrero J, Szallasi Z, Schwarz RF, Stewart A, Quezada SA, Le Quesne J, Van Loo P, Dive C, Hackshaw A, Swanton C, TRACERx Consortium (2017) Tracking the evolution of non-small-cell lung cancer. New England Journal of Medicine 376:2109–2121. https://doi.org/10.1056/NEJMoa1616288
Jay JJ, Brouwer C (2016) Lollipops in the clinic: information dense mutation plots for precision medicine. PLoS One 11:e0160519. https://doi.org/10.1371/journal.pone.0160519
Johnson SB, Gross CP, Park HS, Yu JB (2017) Use of alternative medicine for cancer and its impact on survival. Journal of Clinical Oncology 35:e18175. https://doi.org/10.1200/JCO.2017.35.15_suppl.e18175
(1987) Aneuploid DNA content and high S-phase fraction of tumour cells are related to poor prognosis in patients with primary breast cancer. European Journal of Cancer and Clinical Oncology 23:277–282. https://doi.org/10.1016/0277-5379(87)90071-X
Kannan K, Inagaki A, Silber J, Gorovets D, Zhang J, Kastenhuber ER, Heguy A, Petrini JH, Chan TA, Huse JT (2012) Whole-exome sequencing identifies ATRX mutation as a key molecular determinant in lower-grade glioma. Oncotarget 3:1194–1203. https://doi.org/10.18632/oncotarget.689
Kleinbaum DG, Klein M (2012) Survival Analysis: A Self-Learning Text, Third Edition. Statistics for Biology and Health. New York: Springer-Verlag.
Kokal W, Sheibani K, Terz J, Harada JR (1986) Tumor DNA content in the prognosis of colorectal carcinoma. JAMA: The Journal of the American Medical Association 255:3123–3127. https://doi.org/10.1001/jama.1986.03370220085032
(2014) Focal chromosomal copy number aberrations in cancer—Needles in a genome haystack. Biochimica et Biophysica Acta (BBA) - Molecular Cell Research 1843:2698–2704. https://doi.org/10.1016/j.bbamcr.2014.08.001
Li SY, Rong M, Grieu F, Iacopetta B (2006) PIK3CA mutations in breast cancer are associated with poor outcome. Breast Cancer Research and Treatment 96:91–95. https://doi.org/10.1007/s10549-005-9048-0
Liu J, Lichtenberg T, Hoadley KA, Poisson LM, Lazar AJ, Cherniack AD, Kovatich AJ, Benz CC, Levine DA, Lee AV, Omberg L, Wolf DM, Shriver CD, Thorsson V, Hu H, Jianfang L, Caesar-Johnson SJ, Demchok JA, Felau I, Kasapi M, Ferguson ML, Hutter CM, Sofia HJ, Tarnuzzer R, Wang Z, Yang L, Zenklusen JC, Zhang J, Cancer Genome Atlas Research Network (2018) An integrated TCGA pan-cancer clinical data resource to drive high-quality survival outcome analytics. Cell 173:400–416. https://doi.org/10.1016/j.cell.2018.02.052
(1995) Five-year follow-up after radical surgery for colorectal cancer. Results of a prospective randomized trial. Archives of Surgery 130:1062–1067.
Marabese M, Ganzinelli M, Garassino MC, Shepherd FA, Piva S, Caiola E, Macerelli M, Bettini A, Lauricella C, Floriani I, Farina G, Longo F, Bonomi L, Fabbri MA, Veronese S, Marsoni S, Broggini M, Rulli E (2015) KRAS mutations affect prognosis of non-small-cell lung cancer patients treated with first-line platinum containing chemotherapy. Oncotarget 6:34014–34022. https://doi.org/10.18632/oncotarget.5607
Merkel DE, McGuire WL (1990) Ploidy, proliferative activity and prognosis. DNA flow cytometry of solid tumors. Cancer 65:1194–1205. https://doi.org/10.1002/1097-0142(19900301)65:5<1194::AID-CNCR2820650528>3.0.CO;2-M
Mosley JD, Keri RA (2008) Cell cycle correlated genes dictate the prognostic power of breast cancer gene lists. BMC Medical Genomics 1:11. https://doi.org/10.1186/1755-8794-1-11
(2014) Prognostic and predictive biomarkers: tools in personalized oncology. Molecular Diagnosis & Therapy 18:273–284. https://doi.org/10.1007/s40291-013-0077-9
(2005) Prognostic and predictive molecular markers in DCIS: a review. Advances in Anatomic Pathology 12:256–264. https://doi.org/10.1097/01.pap.0000184177.65919.5e
Oshiro C, Kagara N, Naoi Y, Shimoda M, Shimomura A, Maruyama N, Shimazu K, Kim SJ, Noguchi S (2015) PIK3CA mutations in serum DNA are predictive of recurrence in primary breast cancer patients. Breast Cancer Research and Treatment 150:299–307. https://doi.org/10.1007/s10549-015-3322-6
Paez JG, Jänne PA, Lee JC, Tracy S, Greulich H, Gabriel S, Herman P, Kaye FJ, Lindeman N, Boggon TJ, Naoki K, Sasaki H, Fujii Y, Eck MJ, Sellers WR, Johnson BE, Meyerson M (2004) EGFR mutations in lung cancer: correlation with clinical response to gefitinib therapy. Science 304:1497–1500. https://doi.org/10.1126/science.1099314
Pang B, Cheng S, Sun SP, An C, Liu ZY, Feng X, Liu GJ (2014) Prognostic role of PIK3CA mutations and their association with hormone receptor expression in breast cancer: a meta-analysis. Scientific Reports 4:srep06255. https://doi.org/10.1038/srep06255
1. Parker JS
2. Mullins M
3. Cheang MC
4. Leung S
5. Voduc D
6. Vickery T
7. Davies S
8. Fauron C
9. He X
10. Hu Z
11. Quackenbush JF
12. Stijleman IJ
13. Palazzo J
14. Marron JS
15. Nobel AB
16. Mardis E
17. Nielsen TO
18. Ellis MJ
19. Perou CM
20. Bernard PS
(2009) Supervised risk predictor of breast cancer based on intrinsic subtypes
Journal of Clinical Oncology 27:1160–1167.
https://doi.org/10.1200/JCO.2008.18.1370
(2007) TP53 mutations in human cancers: functional selection and impact on cancer prognosis and outcomes
Oncogene 26:2157–2165.
https://doi.org/10.1038/sj.onc.1210302
1. Pollack JR
2. Sørlie T
3. Perou CM
4. Rees CA
5. Jeffrey SS
6. Lonning PE
7. Tibshirani R
8. Botstein D
9. Børresen-Dale AL
10. Brown PO
(2002) Microarray analysis reveals a major direct role of DNA copy number alteration in the transcriptional program of human breast tumors
PNAS 99:12963–12968.
https://doi.org/10.1073/pnas.162471999
1. Richman SD
2. Seymour MT
3. Chambers P
4. Elliott F
5. Daly CL
6. Meade AM
7. Taylor G
8. Barrett JH
9. Quirke P
(2009) KRAS and BRAF mutations in advanced colorectal cancer are associated with poor prognosis but do not preclude benefit from oxaliplatin or irinotecan: results from the MRC FOCUS trial
Journal of Clinical Oncology 27:5931–5937.
https://doi.org/10.1200/JCO.2009.22.4295
1. Rosenthal R
(1979) The file drawer problem and tolerance for null results
Psychological Bulletin 86:638–641.
https://doi.org/10.1037/0033-2909.86.3.638
1. Roth AD
2. Tejpar S
3. Delorenzi M
4. Yan P
5. Fiocca R
6. Klingbiel D
7. Dietrich D
8. Biesmans B
9. Bodoky G
10. Barone C
11. Aranda E
12. Nordlinger B
13. Cisar L
14. Labianca R
15. Cunningham D
16. Van Cutsem E
17. Bosman F
(2010) Prognostic role of KRAS and BRAF in stage II and III resected colon cancer: results of the translational study on the PETACC-3, EORTC 40993, SAKK 60-00 trial
Journal of Clinical Oncology 28:466–474.
https://doi.org/10.1200/JCO.2009.23.3452
1. Roy DM
2. Walsh LA
3. Desrichard A
4. Huse JT
5. Wu W
6. Gao J
7. Bose P
8. Lee W
9. Chan TA
(2016) Integrated genomics for pinpointing survival loci within arm-level somatic copy number alterations
Cancer Cell 29:737–750.
https://doi.org/10.1016/j.ccell.2016.03.025
1. Santaguida S
2. Amon A
(2015) Short- and long-term effects of chromosome mis-segregation and aneuploidy
Nature Reviews Molecular Cell Biology 16:473–485.
https://doi.org/10.1038/nrm4025
Preprint
1. Scargle JD
(1999) Publication Bias (The “File-Drawer Problem”) in Scientific Inference
arXiv.
https://arxiv.org/abs/physics/9909033
(2012) Prognostic value of TP53, KRAS and EGFR mutations in nonsmall cell lung cancer: the EUELC cohort
European Respiratory Journal 40:177–184.
https://doi.org/10.1183/09031936.00097311
(1993) A new regulatory motif in cell-cycle control causing specific inhibition of cyclin D/CDK4
Nature 366:704–707.
https://doi.org/10.1038/366704a0
1. Sheltzer JM
2. Amon A
(2011) The aneuploidy paradox: costs and benefits of an incorrect karyotype
Trends in Genetics 27:446–453.
https://doi.org/10.1016/j.tig.2011.07.003
1. Sheltzer JM
2. Ko JH
3. Replogle JM
4. Habibe Burgos NC
5. Chung ES
6. Meehl CM
7. Sayles NM
8. Passerini V
9. Storchova Z
10. Amon A
(2017) Single-chromosome gains commonly function as tumor suppressors
Cancer Cell 31:240–255.
https://doi.org/10.1016/j.ccell.2016.12.004
1. Shi J
2. Yao D
3. Liu W
4. Wang N
5. Lv H
6. Zhang G
7. Ji M
8. Xu L
9. He N
10. Shi B
11. Hou P
(2012) Highly frequent PIK3CA amplification is associated with poor prognosis in gastric cancer
BMC Cancer 12:50.
https://doi.org/10.1186/1471-2407-12-50
1. Sholl LM
2. Do K
3. Shivdasani P
4. Cerami E
5. Dubuc AM
6. Kuo FC
7. Garcia EP
8. Jia Y
9. Davineni P
10. Abo RP
11. Pugh TJ
12. van Hummelen P
13. Thorner AR
14. Ducar M
15. Berger AH
16. Nishino M
17. Janeway KA
18. Church A
19. Harris M
20. Ritterhouse LL
21. Campbell JD
22. Rojas-Rudilla V
23. Ligon AH
24. Ramkissoon S
25. Cleary JM
26. Matulonis U
27. Oxnard GR
28. Chao R
29. Tassell V
30. Christensen J
31. Hahn WC
32. Kantoff PW
33. Kwiatkowski DJ
34. Johnson BE
35. Meyerson M
36. Garraway LA
37. Shapiro GI
38. Rollins BJ
39. Lindeman NI
40. MacConaill LE
(2016) Institutional implementation of clinical tumor profiling on an unselected cancer population
JCI Insight 1:e87062.
https://doi.org/10.1172/jci.insight.87062
Software
1. Smith J
(2018) Scripts supporting identification of genomic features affecting survival time in cancer, version 8c7c626
GitHub.
https://github.com/joan-smith/genomic-features-survival
1. Solimini NL
2. Xu Q
3. Mermel CH
4. Liang AC
5. Schlabach MR
6. Luo J
7. Burrows AE
8. Anselmo AN
9. Bredemeyer AL
10. Li MZ
11. Beroukhim R
12. Meyerson M
13. Elledge SJ
(2012) Recurrent hemizygous deletions in cancers may optimize proliferative potential
Science 337:104–109.
https://doi.org/10.1126/science.1219580
(2011) Homozygous 10q23/PTEN deletion and its impact on outcome in glioblastoma: a prospective translational study on a uniformly treated cohort of adult patients
Neuropathology 31:376–383.
https://doi.org/10.1111/j.1440-1789.2010.01178.x
1. Stingele S
2. Stoehr G
3. Peplowska K
4. Cox J
5. Mann M
6. Storchova Z
(2012) Global analysis of genome, transcriptome and proteome reveals the response to aneuploidy in human cells
Molecular Systems Biology 8:608.
https://doi.org/10.1038/msb.2012.40
Book
1. Stouffer SA
(1949) The American Soldier
Princeton University Press.
1. Sun JM
2. Hwang DW
3. Ahn JS
4. Ahn MJ
5. Park K
(2013) Prognostic and predictive value of KRAS mutations in advanced non-small cell lung cancer
PLoS One 8:e64816.
https://doi.org/10.1371/journal.pone.0064816
1. Suzuki H
2. Aoki K
3. Chiba K
4. Sato Y
5. Shiozawa Y
6. Shiraishi Y
7. Shimamura T
8. Niida A
9. Motomura K
10. Ohka F
11. Yamamoto T
12. Tanahashi K
13. Ranjit M
14. Wakabayashi T
15. Yoshizato T
16. Kataoka K
17. Yoshida K
18. Nagata Y
19. Sato-Otsubo A
20. Tanaka H
21. Sanada M
22. Kondo Y
23. Nakamura H
24. Mizoguchi M
25. Abe T
26. Muragaki Y
27. Watanabe R
28. Ito I
29. Miyano S
30. Natsume A
31. Ogawa S
(2015) Mutational landscape and clonal architecture in grade II and III gliomas
Nature Genetics 47:458–468.
https://doi.org/10.1038/ng.3273
1. Swaminathan D
2. Swaminathan V
(2015) Geriatric oncology: problems with under-treatment within this population
Cancer Biology & Medicine 12:275–283.
https://doi.org/10.7497/j.issn.2095-3941.2015.0081
1. Taylor AM
2. Shih J
3. Ha G
4. Gao GF
5. Zhang X
6. Berger AC
7. Schumacher SE
8. Wang C
9. Hu H
10. Liu J
11. Lazar AJ
12. Cherniack AD
13. Beroukhim R
14. Meyerson M
(2018) Genomic and functional approaches to understanding cancer aneuploidy
Cancer Cell 33:676–689.
https://doi.org/10.1016/j.ccell.2018.03.007
(2009) BRAF mutation in metastatic colorectal cancer
New England Journal of Medicine 361:98–99.
https://doi.org/10.1056/NEJMc0904160
1. Tyner C
2. Barber GP
3. Casper J
4. Clawson H
5. Diekhans M
6. Eisenhart C
7. Fischer CM
8. Gibson D
9. Gonzalez JN
10. Guruvadoo L
11. Haeussler M
12. Heitner S
13. Hinrichs AS
14. Karolchik D
15. Lee BT
16. Lee CM
17. Nejad P
18. Raney BJ
19. Rosenbloom KR
20. Speir ML
21. Villarreal C
22. Vivian J
23. Zweig AS
24. Haussler D
25. Kuhn RM
26. Kent WJ
(2017) The UCSC genome browser database: 2017 update
Nucleic Acids Research 45:D626–D634.
https://doi.org/10.1093/nar/gkw1134
1. Uhlen M
2. Zhang C
3. Lee S
4. Sjöstedt E
5. Fagerberg L
6. Bidkhori G
7. Benfeitas R
8. Arif M
9. Liu Z
10. Edfors F
11. Sanli K
12. von Feilitzen K
13. Oksvold P
14. Lundberg E
15. Hober S
16. Nilsson P
17. Mattsson J
18. Schwenk JM
19. Brunnström H
20. Glimelius B
21. Sjöblom T
22. Edqvist PH
23. Djureinovic D
24. Micke P
25. Lindskog C
26. Mardinoglu A
27. Ponten F
(2017) A pathology atlas of the human cancer transcriptome
Science 357:eaan2507.
https://doi.org/10.1126/science.aan2507
1. van den Berge M
2. Sijen T
(2017) A male and female RNA marker to infer sex in forensic analysis
Forensic Science International: Genetics 26:70–76.
https://doi.org/10.1016/j.fsigen.2016.10.018
(2011) Most random gene expression signatures are significantly associated with breast cancer outcome
PLoS Computational Biology 7:e1002240.
https://doi.org/10.1371/journal.pcbi.1002240
1. Wang Y
2. Klijn JG
3. Zhang Y
4. Sieuwerts AM
5. Look MP
6. Yang F
7. Talantov D
8. Timmermans M
9. Meijer-van Gelder ME
10. Yu J
11. Jatkoe T
12. Berns EM
13. Atkins D
14. Foekens JA
(2005) Gene-expression profiles to predict distant metastasis of lymph-node-negative primary breast cancer
The Lancet 365:671–679.
https://doi.org/10.1016/S0140-6736(05)70933-8
1. Wang L
2. Hurley DG
3. Watkins W
4. Araki H
5. Tamada Y
6. Muthukaruppan A
7. Ranjard L
8. Derkac E
9. Imoto S
10. Miyano S
11. Crampin EJ
12. Print CG
(2012) Cell cycle gene networks are associated with melanoma prognosis
PLoS One 7:e34247.
https://doi.org/10.1371/journal.pone.0034247
1. Williams BR
2. Prabhu VR
3. Hunter KE
4. Glazier CM
5. Whittaker CA
6. Housman DE
7. Amon A
(2008) Aneuploidy affects proliferation and spontaneous immortalization in mammalian cells
Science 322:703–709.
https://doi.org/10.1126/science.1160058
1. Wistuba II
2. Behrens C
3. Lombardi F
4. Wagner S
5. Fujimoto J
6. Raso MG
7. Spaggiari L
8. Galetta D
9. Riley R
10. Hughes E
11. Reid J
12. Sangale Z
13. Swisher SG
14. Kalhor N
15. Moran CA
16. Gutin A
17. Lanchbury JS
18. Barberis M
19. Kim ES
(2013) Validation of a proliferation-based expression signature as prognostic marker in early stage lung adenocarcinoma
Clinical Cancer Research 19:6261–6271.
https://doi.org/10.1158/1078-0432.CCR-13-0596
1. Young RC
(2003) Early-stage ovarian cancer: to treat or not to treat
JNCI Journal of the National Cancer Institute 95:94–95.
https://doi.org/10.1093/jnci/95.2.94
(2004) Adjuvant therapy for stage II colon cancer: an elephant in the living room?
Annals of Oncology 15:1310–1318.
https://doi.org/10.1093/annonc/mdh342
1. Zeeberg BR
2. Riss J
3. Kane DW
4. Bussey KJ
5. Uchio E
6. Linehan WM
7. Barrett JC
8. Weinstein JN
(2004) Mistaken identifiers: gene name errors can be introduced inadvertently when using Excel in bioinformatics
BMC Bioinformatics 5:80.
https://doi.org/10.1186/1471-2105-5-80
1. Zehir A
2. Benayed R
3. Shah RH
4. Syed A
5. Middha S
6. Kim HR
7. Srinivasan P
8. Gao J
9. Chakravarty D
10. Devlin SM
11. Hellmann MD
12. Barron DA
13. Schram AM
14. Hameed M
15. Dogan S
16. Ross DS
17. Hechtman JF
18. DeLair DF
19. Yao J
20. Mandelker DL
21. Cheng DT
22. Chandramohan R
23. Mohanty AS
24. Ptashkin RN
25. Jayakumaran G
26. Prasad M
27. Syed MH
28. Rema AB
29. Liu ZY
30. Nafa K
31. Borsu L
32. Sadowska J
33. Casanova J
34. Bacares R
35. Kiecka IJ
36. Razumova A
37. Son JB
38. Stewart L
39. Baldi T
40. Mullaney KA
41. Al-Ahmadie H
42. Vakiani E
43. Abeshouse AA
44. Penson AV
45. Jonsson P
46. Camacho N
47. Chang MT
48. Won HH
49. Gross BE
50. Kundra R
51. Heins ZJ
52. Chen HW
53. Phillips S
54. Zhang H
55. Wang J
56. Ochoa A
57. Wills J
58. Eubank M
59. Thomas SB
60. Gardos SM
61. Reales DN
62. Galle J
63. Durany R
64. Cambria R
65. Abida W
66. Cercek A
67. Feldman DR
68. Gounder MM
69. Hakimi AA
70. Harding JJ
71. Iyer G
72. Janjigian YY
73. Jordan EJ
74. Kelly CM
75. Lowery MA
76. Morris LGT
77. Omuro AM
78. Raj N
79. Razavi P
80. Shoushtari AN
81. Shukla N
82. Soumerai TE
83. Varghese AM
84. Yaeger R
85. Coleman J
86. Bochner B
87. Riely GJ
88. Saltz LB
89. Scher HI
90. Sabbatini PJ
91. Robson ME
92. Klimstra DS
93. Taylor BS
94. Baselga J
95. Schultz N
96. Hyman DM
97. Arcila ME
98. Solit DB
99. Ladanyi M
100. Berger MF
(2017) Mutational landscape of metastatic cancer revealed from prospective clinical sequencing of 10,000 patients
Nature Medicine 23:703–713.
https://doi.org/10.1038/nm.4333
1. Zhang J
2. Baran J
3. Cros A
4. Guberman JM
5. Haider S
6. Hsu J
7. Liang Y
8. Rivkin E
9. Wang J
10. Whitty B
11. Wong-Erasmus M
12. Yao L
13. Kasprzyk A
(2011) International cancer genome consortium data portal--a one-stop shop for cancer genomics data
Database 2011:bar026.
https://doi.org/10.1093/database/bar026
1. Zhang J
2. Benavente CA
3. McEvoy J
4. Flores-Otero J
5. Ding L
6. Chen X
7. Ulyanov A
8. Wu G
9. Wilson M
10. Wang J
11. Brennan R
12. Rusch M
13. Manning AL
14. Ma J
15. Easton J
16. Shurtleff S
17. Mullighan C
18. Pounds S
19. Mukatira S
20. Gupta P
21. Neale G
22. Zhao D
23. Lu C
24. Fulton RS
25. Fulton LL
26. Hong X
27. Dooling DJ
28. Ochoa K
29. Naeve C
30. Dyson NJ
31. Mardis ER
32. Bahrami A
33. Ellison D
34. Wilson RK
35. Downing JR
36. Dyer MA
(2012) A novel retinoblastoma therapy from genomic and epigenetic analyses
Nature 481:329–334.
https://doi.org/10.1038/nature10733
(1987) Ploidy as a prognostic determinant in surgically treated lung cancer
The Lancet 2:530–533.
https://doi.org/10.1016/S0140-6736(87)92923-0

Article and author information

Author details

Joan C Smith

Google, Inc., New York, United States

Contribution
Conceptualization, Software, Formal analysis, Investigation, Methodology, Writing—review and editing

Competing interests
Affiliated with Google Inc. The author has no financial interests to declare.
Jason M Sheltzer

Cold Spring Harbor Laboratory, Cold Spring Harbor, United States

Contribution
Conceptualization, Resources, Formal analysis, Supervision, Funding acquisition, Investigation, Visualization, Methodology, Writing

For correspondence
sheltzer@cshl.edu

Competing interests
No competing interests declared

ORCID iD: 0000-0003-1381-1323

Funding

National Institutes of Health (1DP5OD021385)

Jason M Sheltzer

Breast Cancer Alliance (Young Investigator Award)

Jason M Sheltzer

Cold Spring Harbor Laboratory (CSHL-Northwell Translational Cancer Research Grant)

Jason M Sheltzer

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Acknowledgements

We thank members of the Sheltzer Lab for helpful comments on this work. Research in the Sheltzer Lab is supported by an NIH Early Independence Award (1DP5OD021385), a Breast Cancer Alliance Young Investigator Award, and a CSHL-Northwell Translational Cancer Research Grant.

Version history

Received: June 14, 2018
Accepted: November 12, 2018
Version of Record published: December 11, 2018 (version 1)

Copyright

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.