
Supporting data for "Iterative Hard Thresholding in GWAS: Generalized Linear Models, Prior Weights, and Double Sparsity"

Dataset type: Genomic, Software
Data released on April 08, 2020

Chu BB; Keys KL; German CA; Zhou H; Zhou JJ; Sobel EM; Sinsheimer JS; Lange K (2020): Supporting data for "Iterative Hard Thresholding in GWAS: Generalized Linear Models, Prior Weights, and Double Sparsity". GigaScience Database. https://doi.org/10.5524/100722

DOI: 10.5524/100722

Consecutive testing of single nucleotide polymorphisms (SNPs) is usually employed to identify genetic variants associated with complex traits. Ideally one should model all covariates in unison, but most existing analysis methods for genome-wide association studies (GWAS) perform only univariate regression.
We extend and efficiently implement iterative hard thresholding (IHT) for multiple regression, treating all SNPs simultaneously. Our extensions accommodate generalized linear models (GLMs), prior information on genetic variants, and grouping of variants. In our simulations, IHT recovers up to 30% more true predictors than SNP-by-SNP association testing, and exhibits a 2 to 3 orders of magnitude decrease in false positive rates compared to lasso regression. We also test IHT on the UK Biobank hypertension phenotypes and the Northern Finland Birth Cohort of 1966 cardiovascular phenotypes. We find that IHT scales to the large datasets of contemporary human genetics and recovers the plausible genetic variants identified by previous studies.
Our real data analysis and simulation studies suggest that IHT can (a) recover highly correlated predictors, (b) avoid over-fitting, (c) deliver better true positive and false positive rates than either marginal testing or lasso regression, (d) recover unbiased regression coefficients, (e) exploit prior information and group sparsity, and (f) be used with biobank-sized data sets. Although these advances are studied for GWAS inference, our extensions are pertinent to other regression problems with large numbers of predictors.
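For readers unfamiliar with the core idea, the following is a minimal sketch of plain iterative hard thresholding for ordinary least squares: take a gradient step, then keep only the k largest-magnitude coefficients. It is not the MendelIHT.jl implementation and does not cover the GLM, prior-weight, or group-sparsity extensions described above. The function names, the fixed step size, and the toy design matrix are assumptions made for illustration only.

```python
def hard_threshold(beta, k):
    """Keep the k largest-magnitude entries of beta and set the rest to zero."""
    order = sorted(range(len(beta)), key=lambda j: abs(beta[j]), reverse=True)
    keep = set(order[:k])
    return [b if j in keep else 0.0 for j, b in enumerate(beta)]


def iht_least_squares(X, y, k, step, iters=200):
    """Projected gradient descent on 0.5 * ||y - X beta||^2 with at most k nonzeros.

    X is a list of rows, y a list of responses. The fixed `step` is an
    assumption of this sketch; it must be small enough for the iteration
    to converge.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(iters):
        # Residuals under the current sparse estimate.
        resid = [y[i] - sum(X[i][j] * beta[j] for j in range(p)) for i in range(n)]
        # X^T resid is the negative gradient of the least-squares loss.
        direction = [sum(X[i][j] * resid[i] for i in range(n)) for j in range(p)]
        # Gradient step followed by projection onto the k-sparse set.
        beta = hard_threshold([beta[j] + step * direction[j] for j in range(p)], k)
    return beta


# Toy data (hypothetical): three orthogonal predictors, only the second is active.
X_demo = [[1, 1, 1], [1, -1, 1], [1, 1, -1], [1, -1, -1]]
y_demo = [2, -2, 2, -2]
beta_hat = iht_least_squares(X_demo, y_demo, k=1, step=0.25)
print(beta_hat)
```

All predictors enter the model at once and the sparsity level k is enforced directly by the projection, which is what distinguishes this approach from SNP-by-SNP testing and from the shrinkage penalty used by lasso regression.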

Additional details

Read the peer-reviewed publication(s):

  • Chu, B. B., Keys, K. L., German, C. A., Zhou, H., Zhou, J. J., Sobel, E. M., Sinsheimer, J. S., & Lange, K. (2020). Iterative hard thresholding in genome-wide association studies: Generalized linear models, prior weights, and double sparsity. GigaScience, 9(6). https://doi.org/10.1093/gigascience/giaa044 (PubMed:32491161)

GitHub links:

https://github.com/OpenMendel/MendelIHT.jl


File Name Description Sample ID Data Type File Format Size Release Date File Attributes Download
Archival copy of the GitHub repository https://github.com/OpenMendel/MendelIHT.jl downloaded 5-Mar-2020. MendelIHT. This project is licensed under the MIT License. Please refer to the GitHub repo for most recent updates. GitHub archive archive 34.58 MB 2020-03-24 license: MIT; MD5 checksum: a43e1bb8de133548b5b7af9cc6903d92
Instructions to users on how to download data from UK Biobank Text TEXT 824 B 2020-03-24 MD5 checksum: a277933a2f2c68e044adfb4a374c73cf
Instructions to users on how to download STAMPEED (NFBC1966) data from dbGaP Text TEXT 917 B 2020-03-24 MD5 checksum: 5a81b79676e1cd5c59b61c3bad9a50f7
Readme TEXT 2.91 kB 2020-04-08 MD5 checksum: 33e6d4123b06f01e24e4db7c2717fff0
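The MD5 checksums listed for each file can be used to confirm that a download is intact. The sketch below uses Python's standard `hashlib` module. The helper names and the local file path are hypothetical; the expected digest in the usage comment is the one listed above for the Readme.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # Read fixed-size chunks so large archives do not need to fit in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected_md5):
    """Compare a file's MD5 digest with the published value, ignoring case and spaces."""
    return md5_of_file(path) == expected_md5.strip().lower()


# Example (hypothetical local path; digest taken from the Readme row):
# matches_checksum("path/to/readme.txt", "33e6d4123b06f01e24e4db7c2717fff0")
```

A mismatch usually indicates an incomplete or corrupted download, in which case the file should be fetched again.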
Funding body Awardee Award ID Comments
National Institutes of Health B Chu NIH T32-HG002536
Google B Chu 2018 Google Summer of Code
National Heart, Lung, and Blood Institute K Keys R01HL135156
Gordon and Betty Moore Foundation K Keys GBMF3834
Alfred P. Sloan Foundation K Keys 2013-10-27
National Human Genome Research Institute K Lange HG006139
National Human Genome Research Institute H Zhou HG006139
National Institute of General Medical Sciences K Lange GM053275
National Institute of General Medical Sciences H Zhou GM053275
National Institute of General Medical Sciences J Sinsheimer GM053275
National Human Genome Research Institute J Sinsheimer HG009120
National Science Foundation J Sinsheimer DMS1264153
Burroughs Wellcome Fund C German Inter-school Training Program in Chronic Diseases (BWF-CHIP)
Date Action
April 8, 2020 Dataset publish
April 24, 2020 Manuscript Link added : 10.1093/gigascience/giaa044
October 7, 2022 Manuscript Link updated : 10.1093/gigascience/giaa044