Abstract
Expression quantitative trait loci (eQTLs) are genomic loci that regulate expression levels of mRNAs or proteins. Understanding these regulatory relationships provides important clues to the biological pathways that underlie diseases. In this paper, we propose a new statistical method, GroupRemMap, for identifying eQTLs. We model the relationship between gene expression and single nucleotide variants (SNVs) through multivariate linear regression models, in which gene expression levels are the responses and SNV genotypes are the predictors. To handle the high dimensionality and to incorporate the intrinsic group structure of SNVs, we introduce a new regularization scheme that (1) controls the overall sparsity of the model; (2) encourages the group selection of SNVs from the same gene; and (3) facilitates the detection of trans-hub-eQTLs. We apply the proposed method to the colorectal and breast cancer data sets from The Cancer Genome Atlas (TCGA) and identify several biologically interesting eQTLs. These findings may provide insight into biological processes associated with cancers and generate hypotheses for future studies.
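The abstract does not state the penalty itself, so the following is only an illustrative sketch of how a three-part regularizer on a multivariate regression could pursue the three stated goals; it is not the GroupRemMap estimator. It assumes a least-squares loss with an elementwise L1 term (overall sparsity), a row-wise L2 term on each SNV's coefficients across all expression traits (hub effects), and an L2 term on each gene's block of SNV rows (gene-level group selection), fitted by proximal gradient descent. Because these groups are nested, the proximal step is applied from the innermost level outward. The function names, the tuning values, and the simulated data are all hypothetical.

```python
import numpy as np

def soft_threshold(B, t):
    """Elementwise soft-thresholding: proximal step for t * sum |b_pq|."""
    return np.sign(B) * np.maximum(np.abs(B) - t, 0.0)

def shrink_block(block, t):
    """Block shrinkage: proximal step for t * (L2/Frobenius norm of block)."""
    norm = np.linalg.norm(block)
    if norm <= t:
        return np.zeros_like(block)
    return (1.0 - t / norm) * block

def fit_penalized_mvr(X, Y, gene_of_snv, lam1, lam2, lam3, n_iter=500):
    """Proximal gradient for an ASSUMED objective (not the paper's):
    0.5*||Y - XB||_F^2 + lam1*sum|b_pq| + lam2*sum_p ||B[p,:]||_2
    + lam3*sum_g ||B[rows of gene g, :]||_F.
    X: n x p genotypes, Y: n x q expression, gene_of_snv: length-p labels.
    """
    p, q = X.shape[1], Y.shape[1]
    B = np.zeros((p, q))
    # Step size 1/L, where L is the largest eigenvalue of X^T X.
    step = 1.0 / np.linalg.norm(X, 2) ** 2
    genes = np.unique(gene_of_snv)
    for _ in range(n_iter):
        # Gradient step on the least-squares loss.
        B = B - step * (X.T @ (X @ B - Y))
        # Nested proximal steps: entries, then SNV rows, then gene blocks.
        B = soft_threshold(B, step * lam1)
        for j in range(p):
            B[j] = shrink_block(B[j], step * lam2)
        for g in genes:
            rows = gene_of_snv == g
            B[rows] = shrink_block(B[rows], step * lam3)
    return B

# Simulated toy data (hypothetical): 12 SNVs in 4 genes, 5 expression traits,
# with SNV 0 acting as a hub that affects every trait.
rng = np.random.default_rng(0)
n, q = 200, 5
gene_of_snv = np.repeat(np.arange(4), 3)
X = rng.binomial(2, 0.3, size=(n, gene_of_snv.size)).astype(float)
X -= X.mean(axis=0)  # center genotypes so no intercept is needed
B_true = np.zeros((gene_of_snv.size, q))
B_true[0] = 2.0
Y = X @ B_true + 0.1 * rng.standard_normal((n, q))

B_hat = fit_penalized_mvr(X, Y, gene_of_snv, lam1=20.0, lam2=20.0, lam3=20.0)
row_norms = np.linalg.norm(B_hat, axis=1)  # per-SNV effect size across traits
```

In this sketch, a large entry of `row_norms` flags an SNV whose estimated effects span many traits, which is how a hub would show up. In practice the three tuning parameters would need to be chosen by a data-driven procedure such as cross-validation rather than fixed by hand.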
Notes
gdac.broadinstitute.org_COADREAD.Merge_transcriptome__agilentg4502a_07_3__unc_edu__Level_3__unc_lowess_normalization_gene_level__data.Level_3.2012091300.0.0.tar.gz.
gdac.broadinstitute.org_COADREAD.Merge_snp__genome_wide_snp_6__broad_mit_edu__Level_2__birdseed_genotype__birdseed.Level_2.2012051500.0.0.tar.gz.
gdac.broadinstitute.org_BRCA.Merge_rnaseq__illuminahiseq_rnaseq__unc_edu__Level_3__gene_expression__data.Level_3.2012091300.0.0.tar.gz.
gdac.broadinstitute.org_BRCA.Merge_snp__genome_wide_snp_6__broad_mit_edu__Level_2__birdseed_genotype__birdseed.Level_2.2012082500.0.0.tar.gz.
Acknowledgements
This work is supported by NIH grants R01CA138215 (XW, PW), SUB-CA160034 (XW, YZ, PW), R01GM082802 (PW), P01CA53996 (LH, PW), R01AG014358 (LH), P50CA138293 (LH, PW), and U24CA086368 (PW).
About this article
Cite this article
Wang, X., Qin, L., Zhang, H. et al. A Regularized Multivariate Regression Approach for eQTL Analysis. Stat Biosci 7, 129–146 (2015). https://doi.org/10.1007/s12561-013-9106-9