  • Technical Report

Discovery and genotyping of genome structural polymorphism by sequencing on a population scale

Abstract

Accurate and complete analysis of genome variation in large populations will be required to understand the role of genome variation in complex disease. We present an analytical framework for characterizing genome deletion polymorphism in populations using sequence data that are distributed across hundreds or thousands of genomes. Our approach uses population-level concepts to reinterpret the technical features of sequence data that often reflect structural variation. In the 1000 Genomes Project pilot, this approach identified deletion polymorphism across 168 genomes (sequenced at 4× average coverage) with sensitivity and specificity unmatched by other algorithms. We also describe a way to determine the allelic state or genotype of each deletion polymorphism in each genome; the 1000 Genomes Project used this approach to type 13,826 deletion polymorphisms (48–995,664 bp) with high accuracy in populations. These methods offer a way to relate genome structural polymorphism to complex disease in populations.
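The abstract describes determining the genotype of each deletion in each genome from low-coverage data. The following is a minimal sketch of one common way to do this, assuming a simple binomial model of read pairs that support the deletion allele versus the reference allele, a Hardy–Weinberg genotype prior, and an expectation-maximization (EM) estimate of the deletion-allele frequency shared across all genomes. The function names, the `error` rate and the read counts are hypothetical. This is not Genome STRiP's published likelihood model; for instance, the Acknowledgements note that its genotyping drew on a breakpoint library, which this sketch ignores.

```python
# Hypothetical illustration: a simplified read-pair genotype model,
# NOT Genome STRiP's actual likelihood model.
GENOTYPES = (0, 1, 2)  # copies of the deletion allele carried by one genome


def genotype_likelihoods(alt_pairs, ref_pairs, error=0.01):
    """Binomial likelihood of one genome's informative read pairs under each genotype.

    alt_pairs: pairs whose insert size implies the deletion allele.
    ref_pairs: pairs spanning the locus with a normal insert size.
    The binomial coefficient is omitted because it is identical for all genotypes.
    """
    expected_alt_fraction = {0: error, 1: 0.5, 2: 1.0 - error}
    return [
        expected_alt_fraction[g] ** alt_pairs * (1.0 - expected_alt_fraction[g]) ** ref_pairs
        for g in GENOTYPES
    ]


def hardy_weinberg_prior(freq):
    """Genotype prior for a deletion-allele frequency `freq`."""
    return [(1.0 - freq) ** 2, 2.0 * freq * (1.0 - freq), freq ** 2]


def posterior(likelihoods, freq):
    """Normalized genotype posterior for one genome."""
    weights = [lik * pri for lik, pri in zip(likelihoods, hardy_weinberg_prior(freq))]
    total = sum(weights)
    return [w / total for w in weights]


def estimate_allele_frequency(all_likelihoods, iterations=50, freq=0.5):
    """EM: alternate between genotype posteriors and the allele frequency they imply."""
    for _ in range(iterations):
        expected_alt_copies = sum(
            sum(g * p for g, p in zip(GENOTYPES, posterior(lik, freq)))
            for lik in all_likelihoods
        )
        freq = expected_alt_copies / (2 * len(all_likelihoods))
    return freq


# Hypothetical (alt_pairs, ref_pairs) counts for five low-coverage genomes.
counts = {"A": (0, 3), "B": (2, 2), "C": (3, 0), "D": (0, 1), "E": (1, 1)}
likelihoods = {s: genotype_likelihoods(a, r) for s, (a, r) in counts.items()}
freq = estimate_allele_frequency(list(likelihoods.values()))
calls = {
    s: max(GENOTYPES, key=lambda g: posterior(lik, freq)[g])
    for s, lik in likelihoods.items()
}
print(f"estimated deletion-allele frequency: {freq:.2f}")
print(calls)
```

Pooling is what makes this population-aware: the allele frequency, and hence the prior, is learned from all genomes at once, so a genome with only one or two informative pairs still receives a sensible call. With these counts, genome C (three deletion-supporting pairs, no reference pairs) is called homozygous for the deletion and genome B is called heterozygous.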

Figure 1: A population-aware analytical framework for analyzing Genome STRucture in Populations (Genome STRiP).
Figure 2: Identifying coherent sets of aberrantly mapping reads from a population of genomes.
Figure 3: Evaluating the population-heterogeneity and allele-substitution properties of population-scale sequence data.
Figure 4: Deletion polymorphisms identified by Genome STRiP in low-coverage sequence data from 168 genomes.
Figure 5: Determining the allelic state (genotype) of 13,826 deletions in 156 genomes.
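Figure 2 concerns identifying coherent sets of aberrantly mapping reads from a population of genomes. As a rough illustration of why pooling helps at 4× coverage, the sketch below groups discordant read pairs from all genomes by the positions of their left and right reads. It uses a simple greedy clustering of our own choosing, with a hypothetical `DiscordantPair` record and a hypothetical `window` parameter (in practice such a tolerance would be tied to the library's insert-size distribution). It is not Genome STRiP's clustering algorithm, which the text here does not specify.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DiscordantPair:
    """Hypothetical record: a read pair whose mapped insert size is too large."""
    sample: str
    left_end: int     # last aligned base of the left (forward-strand) read
    right_start: int  # first aligned base of the right (reverse-strand) read


@dataclass
class Cluster:
    pairs: list = field(default_factory=list)

    @property
    def samples(self):
        return {p.sample for p in self.pairs}

    def interval(self):
        # Every supporting pair must span the deletion, so the deleted segment lies
        # between the innermost left-read end and the innermost right-read start.
        return (max(p.left_end for p in self.pairs), min(p.right_start for p in self.pairs))


def cluster_pairs(pairs, window=300):
    """Greedily group pairs, pooled across all samples, whose left and right ends
    both fall within `window` bp of the first pair of an existing cluster."""
    clusters = []
    for pair in sorted(pairs, key=lambda p: (p.left_end, p.right_start)):
        for cluster in clusters:
            anchor = cluster.pairs[0]
            if (abs(pair.left_end - anchor.left_end) <= window
                    and abs(pair.right_start - anchor.right_start) <= window):
                cluster.pairs.append(pair)
                break
        else:  # no existing cluster matched, so start a new one
            clusters.append(Cluster([pair]))
    return clusters


# Hypothetical data: four genomes each contribute a single pair at the same locus.
pairs = [
    DiscordantPair("sample_A", 10_000, 12_050),
    DiscordantPair("sample_B", 10_040, 12_010),
    DiscordantPair("sample_C", 9_980, 12_100),
    DiscordantPair("sample_D", 10_020, 12_030),
    DiscordantPair("sample_B", 55_000, 58_000),  # isolated pair, likely noise
]
clusters = cluster_pairs(pairs)
supported = [c for c in clusters if len(c.pairs) >= 3]
for c in supported:
    print(c.interval(), sorted(c.samples))
```

No single genome in this example has more than one supporting pair at the locus, so a per-genome threshold of three pairs would miss it; pooled across genomes, the cluster has four pairs from four different samples. The greedy anchor-based grouping is quadratic in the worst case and chosen only for readability.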

Acknowledgements

The authors wish to thank the 1000 Genomes Structural Variation Analysis Group for helpful conversations throughout this work and for collaborative work to evaluate the sensitivity and specificity of structural variation discovery algorithms. We would particularly like to acknowledge K. Chen for creation of a high-quality breakpoint library for the 1000 Genomes Project, on which Genome STRiP's genotyping algorithm drew, and R. Mills for managing the 1000 Genomes Project deletion discovery sets and validation data. We also thank C. Stewart, K. Walter, M. Hurles and N. Patterson for helpful conversations during the course of this work; D. Altshuler, M. DePristo and M. Daly for helpful comments on the manuscript and figures; and the anonymous reviewers of this manuscript, whose feedback improved it. This work was supported by the National Human Genome Research Institute (U01HG005208-01S1) and by startup funds from the Department of Genetics at Harvard Medical School.

Author information

Contributions

R.E.H., J.M.K., J.N. and S.A.M. conceived the analytical approaches. R.E.H. implemented the algorithms and performed the data analysis. R.E.H. and S.A.M. wrote the manuscript.

Corresponding author

Correspondence to Steven A McCarroll.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–6, Supplementary Table 1 and Supplementary Note. (PDF 806 kb)

Supplementary Table 2

Evaluation of genotype likelihood calibration (XLSX 53 kb)

Supplementary Table 3

tagSNPs identified by Genome STRiP for deletions from the 1000 Genomes Project (XLSX 3631 kb)

Supplementary Table 4

Phenotype associated SNPs in linkage disequilibrium with 1000 Genomes pilot deletions (XLSX 92 kb)

About this article

Cite this article

Handsaker, R., Korn, J., Nemesh, J. et al. Discovery and genotyping of genome structural polymorphism by sequencing on a population scale. Nat Genet 43, 269–276 (2011). https://doi.org/10.1038/ng.768

  • DOI: https://doi.org/10.1038/ng.768