Quantitative Estimates of Sequence Divergence for Comparative Analyses of Mammalian Genomes

  Gregory M. Cooper (1), Michael Brudno (2), NISC Comparative Sequencing Program (3), Eric D. Green (3), Serafim Batzoglou (2), and Arend Sidow (1,4,5)

  (1) Department of Genetics, Stanford University, Stanford, California 94305, USA
  (2) Department of Computer Science, Stanford University, Stanford, California 94305, USA
  (3) Genome Technology Branch and National Institutes of Health Intramural Sequencing Center, National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland 20892, USA
  (4) Department of Pathology, Stanford University, Stanford, California 94305, USA

Abstract

Comparative sequence analyses on a collection of carefully chosen mammalian genomes could facilitate identification of functional elements within the human genome and allow quantification of evolutionary constraint at the single nucleotide level. High-resolution quantification would be informative for determining the distribution of important positions within functional elements and for evaluating the relative importance of nucleotide sites that carry single nucleotide polymorphisms (SNPs). Because the level of resolution in comparative sequence analyses is a direct function of sequence diversity, we propose that the information content of a candidate mammalian genome be defined as the sequence divergence it would add relative to already-sequenced genomes. We show that reliable estimates of genomic sequence divergence can be obtained from small genomic regions. On the basis of a multiple sequence alignment of ∼1.4 megabases each from eight mammals, we generate such estimates for five unsequenced mammals. Estimates of the neutral divergence in these data suggest that a small number of diverse mammalian genomes in addition to human, mouse, and rat would allow single nucleotide resolution in comparative sequence analyses.
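One way to make the proposed definition concrete is sketched below, assuming that divergence is measured as branch length on a phylogenetic tree (the abstract does not specify the measure). Under that assumption, the divergence a candidate genome adds is the increase in the total length of the subtree connecting the sequenced genomes once the candidate is included. The tree topology, the branch lengths, the names `candidate_A` and `candidate_B`, and both function names are hypothetical placeholders for illustration, not values or methods taken from the study.

```python
from collections import Counter

# Hypothetical toy tree: child -> (parent, branch length in substitutions per site).
# Topology and lengths are illustrative placeholders, not estimates from the paper.
TOY_TREE = {
    "human": ("clade_1", 0.02),
    "candidate_A": ("clade_1", 0.03),
    "mouse": ("clade_2", 0.08),
    "rat": ("clade_2", 0.09),
    "clade_1": ("root", 0.10),
    "clade_2": ("root", 0.25),
    "candidate_B": ("root", 0.20),
}


def spanning_length(tree, leaves):
    """Total branch length of the minimal subtree connecting the given leaves."""
    leaves = set(leaves)
    below = Counter()  # number of selected leaves at or below each branch
    for leaf in leaves:
        node = leaf
        while node in tree:  # walk up until the root, which has no parent entry
            below[node] += 1
            node = tree[node][0]
    # A branch connects selected leaves only if some, but not all, of them lie below it.
    return sum(tree[node][1] for node, count in below.items() if count < len(leaves))


def added_divergence(tree, sequenced, candidate):
    """Branch length gained by adding one candidate to the sequenced set."""
    with_candidate = set(sequenced) | {candidate}
    return spanning_length(tree, with_candidate) - spanning_length(tree, sequenced)


if __name__ == "__main__":
    sequenced = ["human", "mouse", "rat"]
    for candidate in ("candidate_A", "candidate_B"):
        gain = added_divergence(TOY_TREE, sequenced, candidate)
        print(f"{candidate}: adds {gain:.2f} substitutions per site")
```

In this toy tree, a candidate that branches close to an already-sequenced genome adds little (about 0.03), whereas one on a long, otherwise unsampled branch adds much more (about 0.20), which is the sense in which a more divergent genome carries more information.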

[The multiple sequence alignment of the CFTR region and a spreadsheet with the calculations performed will be available as supplementary information online at www.genome.org.]

Footnotes

  • 5 Corresponding author.

  • E-MAIL arend{at}stanford.edu; FAX (650) 725-4905.

  • Article and publication are at http://www.genome.org/cgi/doi/10.1101/gr.1064503.

    • Received December 2, 2002.
    • Accepted March 3, 2003.