A Regularized Multivariate Regression Approach for eQTL Analysis

Statistics in Biosciences

Abstract

Expression quantitative trait loci (eQTLs) are genomic loci that regulate expression levels of mRNAs or proteins. Understanding these regulatory relationships provides important clues to the biological pathways that underlie diseases. In this paper, we propose a new statistical method, GroupRemMap, for identifying eQTLs. We model the relationship between gene expression and single nucleotide variants (SNVs) through multivariate linear regression models, in which gene expression levels are responses and SNV genotypes are predictors. To handle the high dimensionality and to incorporate the intrinsic group structure of SNVs, we introduce a new regularization scheme to (1) control the overall sparsity of the model; (2) encourage the group selection of SNVs from the same gene; and (3) facilitate the detection of trans-hub-eQTLs. We apply the proposed method to the colorectal and breast cancer data sets from The Cancer Genome Atlas (TCGA) and identify several biologically interesting eQTLs. These findings may provide insight into biological processes associated with cancers and generate hypotheses for future studies.
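
As a rough illustration of the kind of objective the abstract describes, the sketch below evaluates a squared-error loss for a multivariate linear regression plus three penalty terms, one for each stated goal. The specific penalty forms (an elementwise L1 term, an L2 norm per SNV row, and a Frobenius norm per gene-level SNV group), the function name `penalized_loss`, and the tuning parameters `lam1`, `lam2`, `lam3` are assumptions chosen for illustration only. The actual GroupRemMap penalty is defined in the full article and may differ.

```python
import numpy as np


def penalized_loss(Y, X, B, groups, lam1, lam2, lam3):
    """Evaluate an illustrative penalized multivariate regression objective.

    Y: (n, q) expression responses; X: (n, p) SNV genotypes;
    B: (p, q) coefficients; groups: list of row-index lists, one per gene.
    The penalty form is a hypothetical stand-in, not the paper's definition.
    """
    resid = Y - X @ B
    fit = 0.5 * np.sum(resid ** 2)

    # (1) Overall sparsity: elementwise L1 penalty on all coefficients.
    l1 = np.sum(np.abs(B))

    # (3) Hub detection: one L2 norm per SNV (row), so an SNV is either
    # dropped for all responses or allowed to affect many of them.
    hub = np.sum(np.linalg.norm(B, axis=1))

    # (2) Group selection: one Frobenius norm per block of SNVs that
    # belong to the same gene, so the block enters or leaves together.
    group = sum(np.linalg.norm(B[idx, :]) for idx in groups)

    return fit + lam1 * l1 + lam2 * hub + lam3 * group
```

Calling `penalized_loss` with different coefficient matrices shows how each term behaves: setting a whole row of `B` to zero lowers the row-wise term, and setting every row in one gene's group to zero removes that group's contribution.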

Notes

  1. gdac.broadinstitute.org_COADREAD.Merge_transcriptome__agilentg4502a_07_3__unc_edu__Level_3__unc_lowess_normalization_gene_level__data.Level_3.2012091300.0.0.tar.gz.

  2. gdac.broadinstitute.org_COADREAD.Merge_snp__genome_wide_snp_6__broad_mit_edu__Level_2__birdseed_genotype__birdseed.Level_2.2012051500.0.0.tar.gz.

  3. gdac.broadinstitute.org_BRCA.Merge_rnaseq__illuminahiseq_rnaseq__unc_edu__Level_3__gene_expression__data.Level_3.2012091300.0.0.tar.gz.

  4. gdac.broadinstitute.org_BRCA.Merge_snp__genome_wide_snp_6__broad_mit_edu__Level_2__birdseed_genotype__birdseed.Level_2.2012082500.0.0.tar.gz.

Acknowledgements

This work is supported by NIH grants R01CA138215 (XW, PW), SUB-CA160034 (XW, YZ, PW), R01GM082802 (PW), P01CA53996 (LH, PW), R01AG014358 (LH), P50CA138293 (LH, PW), and U24CA086368 (PW).

Author information

Corresponding author

Correspondence to Pei Wang.

Electronic Supplementary Material

Below is the link to the electronic supplementary material.

Supplementary Materials (PDF 164 kB)

About this article

Cite this article

Wang, X., Qin, L., Zhang, H. et al. A Regularized Multivariate Regression Approach for eQTL Analysis. Stat Biosci 7, 129–146 (2015). https://doi.org/10.1007/s12561-013-9106-9

  • DOI: https://doi.org/10.1007/s12561-013-9106-9
