Abstract
Aberrant regulation of gene expression in cancer can promote survival and proliferation of cancer cells. Here we integrate whole-genome sequencing data from The Cancer Genome Atlas (TCGA) for 436 patients from 8 cancer subtypes with ENCODE and other regulatory annotations to identify point mutations in regulatory regions. We find evidence for positive selection of mutations in transcription factor binding sites, consistent with these sites regulating important cancer cell functions. Using a new method that adjusts for sample- and genomic locus–specific mutation rates, we identify recurrently mutated sites across individuals with cancer. Mutated regulatory sites include known sites in the TERT promoter and many new sites, including a subset in proximity to cancer-related genes. In reporter assays, two new sites display decreased enhancer activity upon mutation. These data demonstrate that many regulatory regions contain mutations under selective pressure and suggest a greater role for regulatory mutations in cancer than previously appreciated.
Acknowledgements
We would like to thank members of the Snyder laboratory for critical reading of the manuscript. We would also like to thank A. Boyle for helpful discussion regarding the use of RegulomeDB. Finally, we would like to thank members of the ENCODE Consortium for their helpful scientific feedback during the course of this work. C.M. was supported by the Stanford Biomedical Informatics Training program and funds from the US National Institutes of Health (US NIH; 1K99CA191093). D.V.S. was supported by US NIH/National Human Genome Research Institute grant T32HG000044 and the Genentech Graduate Fellowship. This work was supported by funds to C.M. (US NIH, 1K99CA191093-01) and M.S. (US NIH, 5U54HG006996-04).
Contributions
J.A.R. and D.V.S. contributed to experimental design, execution and analysis for the experiments in Figure 5 and Supplementary Figure 6. C.M. contributed to all other figures. C.M. and M.S. conceived the experiments, analyzed the data and wrote the manuscript.
Ethics declarations
Competing interests
The authors declare no competing financial interests.
Integrated supplementary information
Supplementary Figure 1 Processing of mutations and generation of simulated controls.
On the left, a diagram illustrates how covariate-matched simulated mutations were obtained, filtered to remove potential false positives from mapping errors and split into experimental and validation subsets. The panel on the upper right shows the fraction of mutations in each RegulomeDB category that were filtered out owing to a high mismap score. Also depicted is a Venn diagram showing the number of mutations filtered out as potential false positives from mapping errors as well as the overlap of these mutations with difficult-to-align regions of the genome. These mutations are enriched in category 3b as well as in regions with no regulatory annotations (6 and 7). The panel on the middle right shows the breakdown of transcript annotations for real and simulated mutations in each RegulomeDB category. The panel on the bottom right shows the distributions of replication timing and base-pair composition for simulated and real mutations for each cancer type. The panel on the bottom left shows the similarity in the distributions of the number of mutations per sample for the experimental and validation subsets in each cancer type.
Supplementary Figure 2 Mutation calling quality metrics.
(a) Violin plots show the distribution of the variant allele fraction for each cancer type. (b) A scatter plot shows the relationship between genome sequencing file size and the number of mutations called for each sample. (c) Box plots and overlaid points depict the median coverage of each sample grouped by cancer type. BRCA, breast invasive carcinoma; GBM, glioblastoma multiforme; HNSC, head and neck squamous cell carcinoma; KIRC, kidney renal clear cell carcinoma; LUAD, lung adenocarcinoma; LUSC, lung squamous cell carcinoma; OV, ovarian serous cystadenocarcinoma; UCEC, uterine corpus endometrial carcinoma.
Supplementary Figure 3 Similarity of sets of transcription factor–bound mutations.
For each pair of transcription factors shown in Figure 2g, the Jaccard similarity was computed on the basis of the overlap in the genomic positions mutated with RegulomeDB transcription factor annotations for the two factors. Factors were clustered on the basis of this similarity score, and the scores are plotted here as a heat map. The average enrichment score of real versus simulated mutations for all cancer types for each transcription factor is shown below the transcription factor labels on the x axis.
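The pairwise similarity described in this caption can be illustrated with a minimal sketch. It assumes each factor is represented by a set of mutated genomic positions; the factor names, coordinates and function names below are hypothetical and are not taken from the article's data or code.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity |A & B| / |A | B| of two collections; 0.0 if both are empty."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def pairwise_jaccard(sites_by_factor):
    """Return {(factor1, factor2): similarity} for every unordered pair of factors."""
    return {
        (f1, f2): jaccard(sites_by_factor[f1], sites_by_factor[f2])
        for f1, f2 in combinations(sorted(sites_by_factor), 2)
    }


# Hypothetical input: mutated positions (chromosome, coordinate) annotated per factor.
mutated_positions = {
    "CEBPB": {("chr1", 100), ("chr1", 250), ("chr2", 40)},
    "CEBPD": {("chr1", 100), ("chr2", 40), ("chr3", 7)},
    "GATA1": {("chr5", 900)},
}

scores = pairwise_jaccard(mutated_positions)
for pair, score in sorted(scores.items()):
    print(pair, round(score, 3))
```

A matrix of such scores could then be passed to any hierarchical clustering routine to order the factors for a heat map; the clustering method used for the figure is not stated here.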
Supplementary Figure 4 Mutational patterns in transcription factor binding sites.
(a) An analysis was performed to identify all transcription factor motifs with an increased match score in mutant sites compared to reference sites. Only mutations in sites for the CEBP factors were used for this analysis. (b) The sequences surrounding the mutations were aligned using TTG(T/C) as the seed. This seed motif, the aligned reference and the aligned mutant sequences are shown as well as a histogram of the number and type of mutations at each position. (c) The most common sequences of eight bases in length contributing to the motif in b are shown. (d) The counts of mutations from these sites by patient are shown. One patient with UCEC has a disproportionate number of these mutant sites. (e) Box plots of RNA-seq expression values for samples with and without CEBP mutations are shown for the factors matching CEBP motifs or motifs with a higher match score in a. (f) Seed, reference and variant alignments as well as mutation counts by position are shown for the factors from Figure 3f.
Supplementary Figure 5 Mutation probability fitting and model validation test.
(a) Logistic regression allows for the calculation of the probability of mutation conditioned on replication timing, base-pair composition, transcript type and patient ID. Box plots of predicted probabilities across all patients are shown for the various combinations of transcript region, base-pair type and replication timing bin. (b) The fraction of sites identified in the validation set that can be found in the experimental set, and vice versa, is plotted, showing the robustness of the method even with a small number of patient samples. (c) A box plot depicting the difference in log10 RNA-seq expression data for PLCXD1 in samples either with or without a mutation at chr. X: 197,480. The P value was determined by the bootstrap method, as the data were not normally distributed.
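Per-patient mutation probabilities such as those in panel a differ from sample to sample, so the number of patients mutated at a site is not binomially distributed. One way to score recurrence under that condition is a Poisson binomial tail probability. The sketch below is an illustration under that assumption, not the article's implementation; the function name and the example probabilities are hypothetical.

```python
def poisson_binomial_sf(probs, k):
    """Return P(X >= k), where X is the sum of independent Bernoulli(p_i) variables.

    Uses a simple dynamic program: dist[j] holds the probability of exactly j
    successes among the samples processed so far.
    """
    dist = [1.0]
    for p in probs:
        nxt = [0.0] * (len(dist) + 1)
        for j, mass in enumerate(dist):
            nxt[j] += mass * (1.0 - p)   # this sample is not mutated
            nxt[j + 1] += mass * p       # this sample is mutated
        dist = nxt
    return sum(dist[max(k, 0):])


# Hypothetical per-patient probabilities that one site is mutated by chance.
site_probs = [1e-6, 5e-6, 2e-6, 8e-6]
observed_mutated_patients = 2
p_value = poisson_binomial_sf(site_probs, observed_mutated_patients)
print(p_value)
```

The dynamic program is quadratic in the number of samples, which is workable for a few hundred patients per site; faster exact and approximate methods exist for larger cohorts.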
Supplementary Figure 6 Screening for functional mutated regulatory elements.
Wild-type and mutant versions of four control regions and ten repeatedly mutated regulatory regions, including one of the TERT promoter mutations, were assayed for their ability to enhance the transcriptional activity of a minimal promoter using a luciferase assay. Constructs were assayed in NCI-H1437 lung adenocarcinoma cells.
Cite this article
Melton, C., Reuter, J., Spacek, D. et al. Recurrent somatic mutations in regulatory regions of human cancer genomes. Nat Genet 47, 710–716 (2015). https://doi.org/10.1038/ng.3332
DOI: https://doi.org/10.1038/ng.3332