  • Letter

Conserved noncoding sequences are selectively constrained and not mutation cold spots

Abstract

Noncoding genetic variants are likely to influence human biology and disease, but recognizing functional noncoding variants is difficult. Approximately 3% of noncoding sequence is conserved among distantly related mammals (refs. 1–4), suggesting that these evolutionarily conserved noncoding regions (CNCs) are selectively constrained and contain functional variation. However, CNCs could also merely represent regions with lower local mutation rates. Here we address this issue and show that CNCs are selectively constrained in humans by analyzing HapMap genotype data. Specifically, new (derived) alleles of SNPs within CNCs are rarer than new alleles in nonconserved regions (P = 3 × 10^−18), indicating that evolutionary pressure has suppressed CNC-derived allele frequencies. Intronic CNCs and CNCs near genes show greater allele frequency shifts, with magnitudes comparable to those for missense variants. Thus, conserved noncoding variants are more likely to be functional. Allele frequency distributions highlight selectively constrained genomic regions that should be intensively surveyed for functionally important variation.
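As a rough illustration of what a derived allele frequency (DAF) is, the sketch below computes it for a single biallelic SNP. It assumes that the chimpanzee allele is ancestral (the acknowledgements mention chimp sequence comparisons for determining derived alleles, but the exact procedure is not given in this preview). The function name `derived_allele_frequency` and its inputs are hypothetical, not taken from the paper.

```python
from typing import Optional


def derived_allele_frequency(
    allele_counts: dict[str, int], chimp_allele: str
) -> Optional[float]:
    """Return the frequency of the derived (new) allele at a biallelic SNP.

    Assumption: the chimpanzee allele is treated as ancestral, so the human
    allele that differs from it is the derived one.
    """
    # Under this simple rule the ancestral state can only be inferred when the
    # SNP has exactly two alleles and the chimp allele matches one of them.
    if len(allele_counts) != 2 or chimp_allele not in allele_counts:
        return None
    total = sum(allele_counts.values())
    if total == 0:
        return None
    derived = next(a for a in allele_counts if a != chimp_allele)
    return allele_counts[derived] / total
```

A SNP with counts `{"A": 90, "G": 30}` and chimp allele `"A"` would have `"G"` as the derived allele under this assumption. Returning `None` when the chimp allele matches neither human allele keeps ambiguous sites out of any downstream comparison.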

Figure 1: DAFs (derived allele frequencies) are lower for SNPs within CNCs.
Figure 2: Fraction of evenly ascertained HapMap SNPs with DAF ≤ 0.1 in the YRI HapMap samples within and outside of CNCs and for selected functional classes.
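The quantity named in the Figure 2 caption, the fraction of SNPs with DAF ≤ 0.1, can be sketched as follows. This is a minimal illustration only: the function name `fraction_low_daf` is hypothetical, and the 0.1 threshold is the only value taken from the caption.

```python
from typing import Sequence


def fraction_low_daf(dafs: Sequence[float], threshold: float = 0.1) -> float:
    """Fraction of SNPs whose derived allele frequency is at or below threshold.

    The default threshold of 0.1 follows the Figure 2 caption; the input is
    assumed to be one DAF value per SNP within a single class of sites.
    """
    if not dafs:
        raise ValueError("at least one DAF value is required")
    low = sum(1 for daf in dafs if daf <= threshold)
    return low / len(dafs)
```

Computing this fraction separately for SNPs inside and outside CNCs gives two numbers that can be compared directly; a higher fraction in one class means its derived alleles are skewed toward low frequencies.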

References

  1. Mouse Genome Sequencing Consortium. Initial sequencing and comparative analysis of the mouse genome. Nature 420, 520–562 (2002).

  2. Thomas, J.W. et al. Comparative analyses of multi-species sequences from targeted genomic regions. Nature 424, 788–793 (2003).

  3. Dermitzakis, E.T. et al. Evolutionary discrimination of mammalian conserved non-genic sequences (CNGs). Science 302, 1033–1035 (2003).

  4. Dermitzakis, E.T., Reymond, A. & Antonarakis, S.E. Conserved non-genic sequences – an unexpected feature of mammalian genomes. Nat. Rev. Genet. 6, 151–157 (2005).

  5. Loots, G.G. et al. Identification of a coordinate regulator of interleukins 4, 13, and 5 by cross-species sequence comparisons. Science 288, 136–140 (2000).

  6. Woolfe, A. et al. Highly conserved non-coding sequences are associated with vertebrate development. PLoS Biol. 3, e7 (2005).

  7. Frazer, K.A. et al. Noncoding sequences conserved in a limited number of mammals in the SIM2 interval are frequently functional. Genome Res. 14, 367–372 (2004).

  8. Nobrega, M.A., Zhu, Y., Plajzer-Frick, I., Afzal, V. & Rubin, E.M. Megabase deletions of gene deserts result in viable mice. Nature 431, 988–993 (2004).

  9. The International SNP Map Working Group. A map of human genome sequence variation containing 1.42 million single nucleotide polymorphisms. Nature 409, 928–933 (2001).

  10. Kryukov, G.V., Schmidt, S. & Sunyaev, S. Small fitness effect of mutations in highly conserved non-coding regions. Hum. Mol. Genet. 14, 2221–2229 (2005).

  11. Fay, J.C., Wyckoff, G.J. & Wu, C.I. Positive and negative selection on the human genome. Genetics 158, 1227–1234 (2001).

  12. International HapMap Consortium. A haplotype map of the human genome. Nature 437, 1299–1320 (2005).

  13. Maruyama, T. & Fuerst, P.A. Population bottlenecks and nonequilibrium models in population genetics. II. Number of alleles in a small population that was formed by a recent bottleneck. Genetics 111, 675–689 (1985).

  14. Marth, G.T., Czabarka, E., Murvai, J. & Sherry, S.T. The allele frequency spectrum in genome-wide human variation data reveals signals of differential demographic history in three large world populations. Genetics 166, 351–372 (2004).

  15. Keightley, P.D., Kryukov, G.V., Sunyaev, S., Halligan, D.L. & Gaffney, D.J. Evolutionary constraints in conserved nongenic sequences of mammals. Genome Res. 15, 1373–1378 (2005).

  16. Cargill, M. et al. Characterization of single-nucleotide polymorphisms in coding regions of human genes. Nat. Genet. 22, 231–238 (1999).

  17. Charlesworth, B., Morgan, M.T. & Charlesworth, D. The effect of deleterious mutations on neutral molecular variation. Genetics 134, 1289–1303 (1993).

  18. Dermitzakis, E.T. et al. Comparison of human chromosome 21 conserved nongenic sequences (CNGs) with the mouse and dog genomes shows that their selective constraint is independent of their genic environment. Genome Res. 14, 852–859 (2004).

  19. Chimpanzee Sequencing and Analysis Consortium. Initial sequence of the chimpanzee genome and comparison with the human genome. Nature 437, 69–87 (2005).

  20. Bejerano, G. et al. Ultraconserved elements in the human genome. Science 304, 1321–1325 (2004).

  21. Margulies, E.H., Blanchette, M., Haussler, D. & Green, E.D. Identification and characterization of multi-species conserved sequences. Genome Res. 13, 2507–2518 (2003).

  22. Keightley, P.D., Lercher, M.J. & Eyre-Walker, A. Evidence for widespread degradation of gene control regions in hominid genomes. PLoS Biol. 3, e42 (2005).

  23. Hirakawa, M. et al. JSNP: a database of common gene variations in the Japanese population. Nucleic Acids Res. 30, 158–162 (2002).

  24. Botstein, D. & Risch, N. Discovering genotypes underlying human phenotypes: past successes for mendelian disease, future approaches for complex disease. Nat. Genet. 33 (Suppl.), 228–237 (2003).

  25. Beysen, D. et al. Deletions involving long-range conserved nongenic sequences upstream and downstream of FOXL2 as a novel disease-causing mechanism in blepharophimosis syndrome. Am. J. Hum. Genet. 77, 205–218 (2005).

  26. Emison, E.S. et al. A common sex-dependent mutation in a RET enhancer underlies Hirschsprung disease risk. Nature 434, 857–863 (2005).

  27. Xie, X. et al. Systematic discovery of regulatory motifs in human promoters and 3′ UTRs by comparison of several mammals. Nature 434, 338–345 (2005).

  28. King, D.C. et al. Evaluation of regulatory potential and conservation scores for detecting cis-regulatory modules in aligned mammalian genome sequences. Genome Res. 15, 1051–1060 (2005).

  29. Siepel, A. et al. Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes. Genome Res. 15, 1034–1050 (2005).

  30. Zar, J.H. Biostatistical Analysis 4th edn. (Prentice Hall, Upper Saddle River, New Jersey, 1999).

Acknowledgements

The authors wish to thank the International HapMap investigators, T.S. Mikkelsen for assistance determining derived alleles from chimp sequence comparisons, T. Bersaglieri for assistance genotyping SNPs, G. Rockwell for assistance implementing several statistical methods and analysis algorithms, A. Langane for DNA samples and M. Gagnebin and C. Rossier for sequencing. J.N.H. is a recipient of a Burroughs Wellcome Career Award in Biomedical Sciences, which supported this work. E.T.D. is supported by the Wellcome Trust and NIH. D.J.T. was supported by grants from the National Human Genome Research Institute. S.E.A. is supported by the Swiss National Science Foundation, NIH and EU. C.N.C. is supported by the GlaxoSmithKline Competitive Grants Award Program for Young Investigators and by a National Heart, Lung, and Blood Institute Mentored Patient Oriented Research Career Development Award.

Author contributions

J.A.D., C.B., J.N. and D.J.T. contributed equally to this manuscript, and E.T.D. and J.N.H. co-directed this project.

Author information

Corresponding authors

Correspondence to Emmanouil T Dermitzakis or Joel N Hirschhorn.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Supplementary information

Supplementary Table 1

Comparisons of DAF distributions, with significance levels and percentages of SNPs in DAF frequency bins. (PDF 112 kb)

Supplementary Table 2

Positions and derived allele frequencies of SNPs from dbSNP located in ultraconserved elements. (PDF 45 kb)

About this article

Cite this article

Drake, J., Bird, C., Nemesh, J. et al. Conserved noncoding sequences are selectively constrained and not mutation cold spots. Nat Genet 38, 223–227 (2006). https://doi.org/10.1038/ng1710

  • DOI: https://doi.org/10.1038/ng1710
