Haplotype-resolved genome sequencing of a Gujarati Indian individual

An Erratum to this article was published on 06 May 2011

This article has been updated

Abstract

Haplotype information is essential to the complete description and interpretation of genomes [1], genetic diversity [2] and genetic ancestry [3]. Although individual human genome sequencing is increasingly routine [4], nearly all such genomes are unresolved with respect to haplotype. Here we combine the throughput of massively parallel sequencing [5] with the contiguity information provided by large-insert cloning [6] to experimentally determine the haplotype-resolved genome of a South Asian individual. A single fosmid library was split into a modest number of pools, each providing 3% physical coverage of the diploid genome. Sequencing of each pool yielded reads overwhelmingly derived from only one homologous chromosome at any given location. These data were combined with whole-genome shotgun sequence to directly phase 94% of ascertained heterozygous single nucleotide polymorphisms (SNPs) into long haplotype blocks (N50 of 386 kilobases (kbp)). This method also facilitates the analysis of structural variation, for example, to anchor novel insertions [7,8] to specific locations and haplotypes.
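Two quantities in the abstract can be illustrated with a short sketch. The first function computes an N50 from a list of block lengths, using the usual definition (the length at which blocks of that size or larger account for at least half of the summed length). The second is a simple Poisson model of why a dilute pool yields reads from mostly one homolog at a given location. It is not the authors' analysis: the function names are hypothetical, and it assumes that "3% physical coverage of the diploid genome" can be read as a mean clone depth of 0.03 split evenly and independently between the two homologs.

```python
import math


def n50(block_lengths):
    """Return the N50 of a collection of block lengths.

    Blocks are sorted from longest to shortest; the N50 is the length of
    the block at which the running total first reaches half of the sum.
    """
    if not block_lengths:
        raise ValueError("need at least one block length")
    ordered = sorted(block_lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length


def mixed_haplotype_fraction(diploid_depth):
    """Fraction of covered positions in one pool that carry both homologs.

    Assumption (not from the article): clone depth at a position is
    Poisson-distributed, with the pool's mean depth shared equally and
    independently between the two homologous chromosomes.
    """
    if diploid_depth <= 0:
        raise ValueError("depth must be positive")
    per_homolog = diploid_depth / 2
    p_one_homolog_covered = 1 - math.exp(-per_homolog)
    p_any_covered = 1 - math.exp(-diploid_depth)
    # Both homologs covered, conditioned on the position being covered at all.
    return p_one_homolog_covered ** 2 / p_any_covered


if __name__ == "__main__":
    # Toy block lengths in kbp, purely illustrative.
    print(n50([10, 4, 3, 2, 1]))
    print(mixed_haplotype_fraction(0.03))
```

Under these assumptions, a pool depth of 0.03 gives a mixed fraction of roughly 0.0075, so fewer than 1 in 100 covered positions in a pool would be expected to contain clones from both homologs. The real figure depends on clone length distribution and pooling details that this sketch ignores.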

Figure 1: Haplotype-resolved genome sequencing.
Figure 2: Haplotype assembly results.
Figure 3: Enrichment of novel variants on 'GIH-like' haplotypes.
Figure 4: Insertion anchoring and structural variation detection.

Accession codes

Accessions

Sequence Read Archive

Change history

  • 12 April 2011

    In the version of this supplementary file originally posted online, Supplementary Figure 4a was not properly drawn. The error has been corrected in this file as of 12 April 2011.

References

  1. Levy, S. et al. The diploid genome sequence of an individual human. PLoS Biol. 5, e254 (2007).

  2. International HapMap Consortium. Integrating common and rare genetic variation in diverse human populations. Nature 467, 52–58 (2010).

  3. Green, R.E. et al. A draft sequence of the Neandertal genome. Science 328, 710–722 (2010).

  4. Anonymous. Human genome: Genomes by the thousand. Nature 467, 1026–1027 (2010).

  5. Shendure, J. & Ji, H. Next-generation DNA sequencing. Nat. Biotechnol. 26, 1135–1145 (2008).

  6. Kidd, J.M. et al. Mapping and sequencing of structural variation from eight human genomes. Nature 453, 56–64 (2008).

  7. Li, R. et al. Building the sequence map of the human pan-genome. Nat. Biotechnol. 28, 57–63 (2010).

  8. Kidd, J.M. et al. Characterization of missing human genome sequences and copy-number polymorphic insertions. Nat. Methods 7, 365–371 (2010).

  9. International Human Genome Sequencing Consortium. Initial sequencing and analysis of the human genome. Nature 409, 860–921 (2001).

  10. McKernan, K.J. et al. Sequence and structural variation in a human genome uncovered by short-read, massively parallel ligation sequencing using two-base encoding. Genome Res. 19, 1527–1541 (2009).

  11. Schatz, M.C., Delcher, A.L. & Salzberg, S.L. Assembly of large genomes using second-generation sequencing. Genome Res. 20, 1165–1173 (2010).

  12. Wang, J. et al. The diploid genome sequence of an Asian individual. Nature 456, 60–65 (2008).

  13. Roach, J.C. et al. Analysis of genetic inheritance in a family quartet by whole-genome sequencing. Science 328, 636–639 (2010).

  14. 1000 Genomes Project Consortium. A map of human genome variation from population-scale sequencing. Nature 467, 1061–1073 (2010).

  15. Reich, D., Thangaraj, K., Patterson, N., Price, A.L. & Singh, L. Reconstructing Indian population history. Nature 461, 489–494 (2009).

  16. Adey, A. et al. Rapid, low-input, low-bias construction of shotgun fragment libraries by high density in vitro transposition. Genome Biol. 11, R119 (2010).

  17. Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25, 1754–1760 (2009).

  18. McKenna, A. et al. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 20, 1297–1303 (2010).

  19. Bansal, V. & Bafna, V. HapCUT: an efficient and accurate algorithm for the haplotype assembly problem. Bioinformatics 24, i153–i159 (2008).

  20. Kim, J.H., Waterman, M.S. & Li, L.M. Diploid genome reconstruction of Ciona intestinalis and comparative analysis with Ciona savignyi. Genome Res. 17, 1101–1110 (2007).

  21. Bansal, V., Halpern, A.L., Axelrod, N. & Bafna, V. An MCMC algorithm for haplotype assembly from whole-genome sequence data. Genome Res. 18, 1336–1346 (2008).

  22. Alkan, C. et al. Personalized copy number and segmental duplication maps using next-generation sequencing. Nat. Genet. 41, 1061–1067 (2009).

  23. Hormozdiari, F. et al. Next-generation VariationHunter: combinatorial algorithms for transposon insertion discovery. Bioinformatics 26, i350–i357 (2010).

  24. Zody, M.C. et al. Evolutionary toggling of the MAPT 17q21.31 inversion region. Nat. Genet. 40, 1076–1083 (2008).

  25. Ng, S.B. et al. Targeted capture and massively parallel sequencing of 12 human exomes. Nature 461, 272–276 (2009).

  26. Ng, S.B. et al. Exome sequencing identifies the cause of a mendelian disorder. Nat. Genet. 42, 30–35 (2010).

  27. Drysdale, C.M. et al. Complex promoter and coding region beta 2-adrenergic receptor haplotypes alter receptor expression and predict in vivo responsiveness. Proc. Natl. Acad. Sci. USA 97, 10483–10488 (2000).

  28. Korbel, J.O. et al. Paired-end mapping reveals extensive structural variation in the human genome. Science 318, 420–426 (2007).

  29. Ma, L. et al. Direct determination of molecular haplotypes by chromosome microdissection. Nat. Methods 7, 299–301 (2010).

  30. Tycko, B. Allele-specific DNA methylation: beyond imprinting. Hum. Mol. Genet. 19, R210–R220 (2010).

  31. Raymond, C.K. et al. Targeted, haplotype-resolved resequencing of long segments of the human genome. Genomics 86, 759–766 (2005).

  32. Sudmant, P.H. et al. Diversity of human copy number variation and multicopy genes. Science 330, 641–646 (2010).

  33. Zerbino, D.R. & Birney, E. Velvet: algorithms for de novo short read assembly using de Bruijn graphs. Genome Res. 18, 821–829 (2008).

Acknowledgements

We thank C. Lee and M. Malig for technical assistance, J. Akey, T. O'Connor and P. Green for helpful discussions, D. Reich for ancestry information on NA20847, the U.W. Genome Sciences Genomics Resource Center (GS-GRC) for sequencing and the 1000 Genomes Project for early data release. This work was supported by National Institutes of Health grants AG039173 (J.B.H.) and HG002385 (E.E.E.), a National Science Foundation Graduate Research Fellowship (J.O.K.), a Natural Sciences and Engineering Research Council of Canada Fellowship (P.H.S.) and a fellowship from the Achievement Rewards for College Scientists Foundation (J.B.H.). E.E.E. is an investigator of the Howard Hughes Medical Institute.

Author information

Contributions

The project was conceived and experiments planned by J.O.K., E.E.E. and J.S. J.O.K., A.P.M. and R.Q. carried out all experiments. J.O.K., A.A., J.B.H., R.P.P., P.H.S., S.B.N. and C.A. performed data analysis. J.O.K., A.P.M., A.A., J.B.H., R.P.P. and J.S. wrote the manuscript, and all authors reviewed it. All aspects of the study were supervised by J.S.

Corresponding authors

Correspondence to Jacob O Kitzman or Jay Shendure.

Ethics declarations

Competing interests

J.S. is a member of the science advisory boards of Tandem Technologies, Stratos Genomics, Good Start Genetics and Adaptive TCR. E.E.E. is on the scientific advisory board for Pacific Biosciences.

Supplementary information

Supplementary Text and Figures

Supplementary Tables 1–3,5, Supplementary Methods and Supplementary Figs. 1–7 (PDF 1756 kb)

Supplementary Table 4

Pan-genome and novel sequence anchoring. (XLS 1322 kb)

About this article

Cite this article

Kitzman, J., MacKenzie, A., Adey, A. et al. Haplotype-resolved genome sequencing of a Gujarati Indian individual. Nat Biotechnol 29, 59–63 (2011). https://doi.org/10.1038/nbt.1740

  • DOI: https://doi.org/10.1038/nbt.1740
