Accurate estimation of SNP-heritability from biobank-scale data irrespective of genetic architecture

Abstract

SNP-heritability is a fundamental quantity in the study of complex traits. Recent studies have shown that existing methods to estimate genome-wide SNP-heritability can yield biases when their assumptions are violated. While various approaches have been proposed to account for frequency- and linkage disequilibrium (LD)-dependent genetic architectures, it remains unclear which estimates reported in the literature are reliable. Here we show that genome-wide SNP-heritability can be accurately estimated from biobank-scale data irrespective of genetic architecture, without specifying a heritability model or partitioning SNPs by allele frequency and/or LD. We show analytically and through extensive simulations starting from real genotypes (UK Biobank, N = 337K) that, unlike existing methods, our closed-form estimator is robust across a wide range of architectures. We provide estimates of SNP-heritability for 22 complex traits in the UK Biobank and show that, consistent with our results in simulations, existing biobank-scale methods yield estimates up to 30% different from our theoretically-justified approach.
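To make the idea of a closed-form, model-free estimator concrete, the following is a minimal sketch rather than the released implementation. The abstract does not state the estimator's formula, so the quadratic form used here, \((N\,\hat\beta^{\top} V^{\dagger} \hat\beta - q)/(N - q)\) with marginal effects \(\hat\beta\), in-sample LD matrix \(V\) and \(q = \mathrm{rank}(V)\), is an assumption on our part, as are the function names `gre_style_h2` and `simulate` and the toy simulation with independent SNPs. It is only practical when the number of SNPs is well below the sample size.

```python
import numpy as np


def gre_style_h2(X, y):
    """Closed-form heritability sketch (assumed form, not the released code).

    X: (n, m) genotype matrix; y: (n,) phenotype vector.
    Returns (n * b' V^+ b - q) / (n - q), where b holds marginal effects,
    V is the in-sample LD matrix and q is its rank.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = X.shape[0]
    # Standardize each SNP and the phenotype to mean 0, variance 1.
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    ys = (y - y.mean()) / y.std()
    beta_hat = Xs.T @ ys / n          # marginal (per-SNP) effect estimates
    V = Xs.T @ Xs / n                 # in-sample LD (correlation) matrix
    q = int(np.linalg.matrix_rank(V))
    quad = beta_hat @ np.linalg.pinv(V) @ beta_hat
    return float((n * quad - q) / (n - q))


def simulate(n=2000, m=200, h2=0.5, seed=0):
    """Toy data: independent SNPs, all causal, in-sample heritability h2."""
    rng = np.random.default_rng(seed)
    maf = rng.uniform(0.1, 0.5, size=m)
    X = rng.binomial(2, maf, size=(n, m)).astype(float)
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    g = Xs @ rng.normal(size=m)
    g *= np.sqrt(h2 / g.var())            # genetic component, variance h2
    e = rng.normal(size=n)
    e *= np.sqrt((1.0 - h2) / e.var())    # noise component, variance 1 - h2
    return X, g + e


if __name__ == "__main__":
    X, y = simulate()
    print(f"estimated h2: {gre_style_h2(X, y):.3f} (simulated with h2 = 0.5)")
```

Note that the function takes no heritability model and no MAF or LD bins as input; it uses only the marginal effects and the in-sample LD. The pseudoinverse and rank correction are what keep the quadratic form defined when \(V\) is not full rank.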

Fig. 1: Simulations under 64 distinct MAF/LD-dependent architectures (N = 337,205).
Fig. 2: Comparison of \(\hat{h}_{\mathrm{GRE}}^2\) with LDSC, S-LDSC (MAF) and SumHer in genome-wide simulations (N = 337,205, M = 593,300).
Fig. 3: Comparison of \(\hat{h}_{\mathrm{GRE}}^2\) with GREML, BOLT-REML, GREML-LDMS-I and LDAK in small-scale simulations (N = 8,430, M = 14,821 SNPs).
Fig. 4: Percentage difference of \(h_g^2\) estimates from LDSC (in-sample), S-LDSC (baseline-LD/in-sample) and SumHer (in-sample) with respect to \(\hat{h}_{\mathrm{GRE}}^2\) for 18 complex traits and diseases in the UK Biobank for which \(\hat{h}_{\mathrm{GRE}}^2 > 0.05\) (N = 290,641 unrelated British individuals, M = 459,792 typed SNPs; Methods).

Data availability

The baseline-LD annotations used in Fig. 4 are available at https://data.broadinstitute.org/alkesgroup/LDSCORE/. All individual-level genotypes and phenotypes were obtained from the UK Biobank (https://www.ukbiobank.ac.uk); we do not have permission to release this data. The 1000 Genomes Phase 3 reference panel can be downloaded at http://www.internationalgenome.org/data.

Code availability

Open-source code implementing the GRE estimator and our simulation framework is available on GitHub at https://github.com/bogdanlab/h2-GRE.


Acknowledgements

This research was conducted using the UK Biobank Resource under applications 33297 and 33127. We thank the participants of UK Biobank for making this work possible. We also thank R. Johnson, M. Freund, M. Major, S. Gazal, A. Price and D. Balding for helpful discussions. This work was funded by the National Institutes of Health (NIH) under awards R01HG009120, R01MH115676, R01HG006399, U01CA194393, R35GM125055, T32NS048004, T32MH073526 and T32HG002536 and the National Science Foundation (NSF) under award III-1705121.

Author information

Contributions

K.H., K.S.B., H.S. and B.P. conceived and designed the experiments. K.H. and K.S.B. performed the experiments and statistical analyses. A.M., H.S., N.M. and S.S. provided statistical support. K.H., K.S.B. and Y.W. collected and managed the data. K.S.B. and B.P. wrote the manuscript with the participation of all authors.

Corresponding authors

Correspondence to Kathryn S. Burch or Bogdan Pasaniuc.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher's note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Supplementary Notes and Supplementary Figs. 1–22

Reporting Summary

Supplementary Tables

Supplementary Tables 1–26

About this article

Cite this article

Hou, K., Burch, K.S., Majumdar, A. et al. Accurate estimation of SNP-heritability from biobank-scale data irrespective of genetic architecture. Nat Genet 51, 1244–1251 (2019). https://doi.org/10.1038/s41588-019-0465-0

  • DOI: https://doi.org/10.1038/s41588-019-0465-0
