Recurrent somatic mutations in regulatory regions of human cancer genomes

Abstract

Aberrant regulation of gene expression in cancer can promote survival and proliferation of cancer cells. Here we integrate whole-genome sequencing data from The Cancer Genome Atlas (TCGA) for 436 patients from 8 cancer subtypes with ENCODE and other regulatory annotations to identify point mutations in regulatory regions. We find evidence for positive selection of mutations in transcription factor binding sites, consistent with these sites regulating important cancer cell functions. Using a new method that adjusts for sample- and genomic locus–specific mutation rates, we identify recurrently mutated sites across individuals with cancer. Mutated regulatory sites include known sites in the TERT promoter and many new sites, including a subset in proximity to cancer-related genes. In reporter assays, two new sites display decreased enhancer activity upon mutation. These data demonstrate that many regulatory regions contain mutations under selective pressure and suggest a greater role for regulatory mutations in cancer than previously appreciated.

Figure 1: Mutation calling from whole-genome sequencing.
Figure 2: Global analysis of mutations in coding and regulatory regions.
Figure 3: Effects of mutations on transcription factor binding sites.
Figure 4: Identification of repeatedly mutated regulatory regions.
Figure 5: Functional validation of identified mutated regions.

Acknowledgements

We would like to thank members of the Snyder laboratory for critical reading of the manuscript. We would also like to thank A. Boyle for helpful discussion regarding the use of RegulomeDB. Finally, we would like to thank members of the ENCODE Consortium for their helpful scientific feedback during the course of this work. C.M. was supported by the Stanford Biomedical Informatics Training program and funds from the US National Institutes of Health (US NIH; 1K99CA191093). D.V.S. was supported by US NIH/National Human Genome Research Institute grant T32HG000044 and the Genentech Graduate Fellowship. This work was supported by funds to C.M. (US NIH, 1K99CA191093-01) and M.S. (US NIH, 5U54HG006996-04).

Author information

Contributions

J.A.R. and D.V.S. contributed to experimental design, execution and analysis for the experiments in Figure 5 and Supplementary Figure 6. C.M. contributed to all other figures. C.M. and M.S. conceived the experiments, analyzed the data and wrote the manuscript.

Corresponding author

Correspondence to Michael Snyder.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Integrated supplementary information

Supplementary Figure 1 Processing of mutations and generation of simulated controls.

On the left, a diagram illustrates the manner in which covariate-matched simulated mutations were obtained, filtered to remove potential false positives from mapping errors and split into experimental and validation subsets. The panel on the upper right shows the fraction of mutations in each RegulomeDB category that were filtered out owing to a high mismap score. Also depicted is a Venn diagram showing the number of mutations filtered out as potential false positives from mapping errors as well as the overlap of these mutations with difficult-to-align regions of the genome. These mutations are enriched in category 3b as well as in regions with no regulatory annotations (6 and 7). The panel on the middle right shows the breakdown of transcript annotations for real and simulated mutations in each RegulomeDB category. The panel on the bottom right shows the distributions of replication timing and base-pair composition for simulated and real mutations for each cancer type. The panel on the bottom left shows the similarity in the distributions of the number of mutations per sample for the experimental and validation subsets in each cancer type.

Supplementary Figure 2 Mutation calling quality metrics.

(a) The distribution of the variant allele fraction for each cancer type is shown via violin plots. (b) A scatter plot showing the relationship between genome sequencing file size and number of mutations called for that sample. (c) Box plots and overlaid points depict the median coverage of each sample grouped by cancer type. BRCA, breast invasive carcinoma; GBM, glioblastoma multiforme; HNSC, head and neck squamous cell carcinoma; KIRC, kidney renal clear cell carcinoma; LUAD, lung adenocarcinoma; LUSC, lung squamous cell carcinoma; OV, ovarian serous cystadenocarcinoma; UCEC, uterine corpus endometrial carcinoma.

Supplementary Figure 3 Similarity of sets of transcription factor–bound mutations.

For each pair of transcription factors shown in Figure 2g, the Jaccard similarity was computed on the basis of the overlap in the genomic positions mutated with RegulomeDB transcription factor annotations for the two factors. Factors were clustered on the basis of this similarity score, and the scores are plotted here as a heat map. The average enrichment score of real versus simulated mutations for all cancer types for each transcription factor is shown below the transcription factor labels on the x axis.
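The pairwise score described in this caption can be illustrated with a minimal Python sketch. The factor names and genomic positions below are hypothetical toy data, and the function names are chosen here for illustration; the clustering step is not shown.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| of two collections; 0.0 if both are empty."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def pairwise_jaccard(positions_by_factor):
    """Return {(factor1, factor2): similarity} for every unordered pair of factors."""
    return {
        (f1, f2): jaccard(positions_by_factor[f1], positions_by_factor[f2])
        for f1, f2 in combinations(sorted(positions_by_factor), 2)
    }


# Hypothetical toy data: mutated positions (chromosome, coordinate) annotated to each factor.
example = {
    "CEBPB": {("chr1", 100), ("chr1", 250), ("chr2", 40)},
    "GATA1": {("chr1", 250), ("chr2", 40), ("chr3", 7)},
    "JUN": {("chr5", 12)},
}
scores = pairwise_jaccard(example)
```

In this toy input, CEBPB and GATA1 share two of four distinct positions, so their similarity is 0.5, while JUN shares none with either factor. A matrix of such scores is what a heat map like the one described would display.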

Supplementary Figure 4 Mutational patterns in transcription factor binding sites.

(a) An analysis was performed to identify all transcription factor motifs with an increased match score in mutant sites compared to reference sites. Only mutations in sites for the CEBP factors were used for this analysis. (b) The sequences surrounding the mutations were aligned using TTG(T/C) as the seed. This seed motif, the aligned reference and the aligned mutant sequences are shown as well as a histogram of the number and type of mutations at each position. (c) The most common sequences of eight bases in length contributing to the motif in b are shown. (d) The counts of mutations from these sites by patient are shown. One patient with UCEC has a disproportionate number of these mutant sites. (e) Box plots of RNA-seq expression values for samples with and without CEBP mutations are shown for the factors matching CEBP motifs or motifs with a higher match score in a. (f) Seed, reference and variant alignments as well as mutation counts by position are shown for the factors from Figure 3f.
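The idea in (a) of comparing a motif's match score between reference and mutant sequence can be sketched as follows. This uses a generic log-odds position weight matrix with a pseudocount, which is an assumption made for illustration and not necessarily the scoring scheme used in the study; the count matrix is a hypothetical four-base motif loosely modeled on the TTG(T/C) seed.

```python
import math

BASES = "ACGT"


def log_odds_matrix(counts, pseudocount=1.0, background=0.25):
    """Convert per-position base counts into log2 odds scores against a uniform background."""
    matrix = []
    for column in counts:
        total = sum(column[b] for b in BASES) + 4 * pseudocount
        matrix.append(
            {b: math.log2(((column[b] + pseudocount) / total) / background) for b in BASES}
        )
    return matrix


def match_score(matrix, seq):
    """Sum the log-odds score of each base in seq at its motif position."""
    if len(seq) != len(matrix):
        raise ValueError("sequence length must equal motif length")
    return sum(column[base] for column, base in zip(matrix, seq))


def score_change(matrix, ref_seq, offset, alt_base):
    """Mutant minus reference match score for a single-base substitution at offset."""
    mut_seq = ref_seq[:offset] + alt_base + ref_seq[offset + 1:]
    return match_score(matrix, mut_seq) - match_score(matrix, ref_seq)


# Hypothetical count matrix for a four-base motif resembling TTG(T/C).
toy_counts = [
    {"A": 0, "C": 0, "G": 0, "T": 10},
    {"A": 0, "C": 0, "G": 0, "T": 10},
    {"A": 0, "C": 0, "G": 10, "T": 0},
    {"A": 0, "C": 5, "G": 0, "T": 5},
]
toy_matrix = log_odds_matrix(toy_counts)
gain = score_change(toy_matrix, "TAGC", 1, "T")  # A>T creates TTGC
loss = score_change(toy_matrix, "TTGC", 1, "A")  # T>A disrupts TTGC
```

A positive change indicates that the substitution moves the sequence toward the motif, and a negative change indicates disruption. Repeating this over many motifs and mutated sites would give the kind of comparison the caption describes.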

Supplementary Figure 5 Mutation probability fitting and model validation test.

(a) Logistic regression allows for the calculation of the probability of mutation conditioned on replication timing, base-pair composition, transcript type and patient ID. Box plots of predicted probabilities across all patients are shown for the various combinations of transcript region, base-pair type and replication timing bin. (b) The fractions of sites identified in the validation set that are also found in the experimental set, and vice versa, are plotted, showing the robustness of the method even with a small number of patient samples. (c) A box plot depicts the difference in log10 RNA-seq expression data for PLCXD1 in samples either with or without a mutation at chr. X: 197,480. The P value was determined by the bootstrap method, as the data were not normally distributed.
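Once each patient has a predicted probability of carrying a mutation in a given region, one way to ask whether the region is mutated in more patients than expected is a Poisson binomial tail probability. The sketch below assumes independent patients and computes the distribution exactly by dynamic programming; the function names and input probabilities are hypothetical, and this is an illustration of the general technique rather than a reproduction of the authors' implementation.

```python
def poisson_binomial_pmf(probs):
    """PMF of the number of successes among independent Bernoulli trials.

    probs[i] is the success probability of trial i; the returned list has
    len(probs) + 1 entries, where entry k is P(exactly k successes).
    """
    pmf = [1.0]
    for p in probs:
        nxt = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            nxt[k] += mass * (1.0 - p)  # this trial fails
            nxt[k + 1] += mass * p      # this trial succeeds
        pmf = nxt
    return pmf


def recurrence_p_value(probs, observed):
    """Upper-tail probability P(X >= observed) under the Poisson binomial model."""
    pmf = poisson_binomial_pmf(probs)
    return sum(pmf[observed:])


# Hypothetical per-patient probabilities of a mutation in one region.
patient_probs = [0.001, 0.002, 0.0005, 0.01, 0.003]
p_value = recurrence_p_value(patient_probs, observed=3)
```

Because each patient keeps its own probability, a region mutated in several patients with low background rates yields a smaller tail probability than the same count in patients with high background rates, which is the point of adjusting for sample-specific mutation rates.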

Supplementary Figure 6 Screening for functional mutated regulatory elements.

Wild-type and mutant versions of four control regions and ten repeatedly mutated regulatory regions, including one of the TERT promoter mutations, were assayed for their ability to enhance the transcriptional activity of a minimal promoter using a luciferase assay. Constructs were assayed in NCI-H1437 lung adenocarcinoma cells.

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–6. (PDF 835 kb)

Cite this article

Melton, C., Reuter, J., Spacek, D. et al. Recurrent somatic mutations in regulatory regions of human cancer genomes. Nat Genet 47, 710–716 (2015). https://doi.org/10.1038/ng.3332

  • DOI: https://doi.org/10.1038/ng.3332
