Background
In complex disease studies, genome-wide association studies (GWAS) have been successfully used to identify associated genetic risk loci. However, only a small fraction of the apparent heritability can be explained by common variants. Because research efforts have largely focused on common genetic variants, the missing heritability could be mostly due to rare genetic variants. Substantial research efforts have been devoted to developing software for genotype imputation and to designing variant binning strategies and statistical methods for rare variant association testing of GWAS chip datasets. However, few systematic pipelines have been proposed to identify disease-related genes through rare variants.
Results
We present EGRVA, an Effective Gene-based Rare Variant Association analysis pipeline that covers genotype imputation, quality control, gene-based functional annotation, statistical analysis, and bioinformatics analysis of the identified genes. As a complementary pipeline for rare variant analysis on GWAS chips, EGRVA is relatively straightforward and cost-efficient. We tested the EGRVA pipeline on the preterm birth (PTB) dataset from the GPN-PBR and focused on the six genes it identified: FLG, HRNR, PMS1, ATM, OR2AG1 and SLC22A25. We also explored the underlying biological interpretation of these potentially significant genes.
Conclusions
As a complementary pipeline for rare variant analysis on GWAS chips, EGRVA is relatively straightforward and cost-efficient. By effectively discovering disease-related genes, the pipeline can help clarify how much of the missing heritability is explained by rare variants.