Using public control genotype data to increase power and decrease cost of case–control genetic association studies

  • Original Investigation
  • Human Genetics

Abstract

Genome-wide association (GWA) studies are a powerful approach for identifying novel genetic risk factors associated with human disease. Because a heavy multiple-testing correction is needed to keep the statistical tests valid, a GWA study typically requires thousands of samples to have sufficient statistical power to detect single nucleotide polymorphisms that are associated with only modest increases in disease risk. Low statistical power and high financial cost remain prohibitive for many investigators who wish to perform a GWA study using their own samples. A number of remedies have been suggested to increase statistical power and decrease cost, including the use of free, publicly available genotype data and multi-stage genotyping designs. Herein, we compare the statistical power and relative costs of association study designs that use cases and screened controls with those of study designs that are based only on, or additionally include, free public control genotype data. We describe a novel replication-based two-stage study design that uses free public control genotype data in the first stage and follow-up genotype data on case-matched controls in the second stage, and that preserves many of the advantages of using only an epidemiologically matched set of controls. Specifically, we show that our proposed two-stage design can substantially increase statistical power and decrease the cost of performing a GWA study while controlling the type-I error rate, which can otherwise be inflated when public controls are used because of differences in ancestry and batch genotyping effects.
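
The trade-off described in the abstract, a stringent multiple-testing threshold against a limited number of samples, can be illustrated with a rough power calculation. The sketch below is not the authors' method: it uses a simple normal approximation to a two-sided test comparing allele frequencies in cases and controls, whereas the paper's references point to trend-test power calculations. The function names, the allelic odds ratio model, the assumption of independent alleles, and all numeric inputs (500,000 SNPs, allele frequency 0.3, odds ratio 1.3, 1,000 cases) are hypothetical choices made for illustration only.

```python
from math import sqrt
from statistics import NormalDist


def case_allele_freq(p_control, odds_ratio):
    """Risk-allele frequency in cases implied by an allelic odds ratio.

    Assumption: controls carry the risk allele at frequency p_control and
    the allelic odds in cases equal odds_ratio times the odds in controls.
    """
    return odds_ratio * p_control / (1 - p_control + odds_ratio * p_control)


def allelic_test_power(p_case, p_control, n_cases, n_controls, alpha):
    """Approximate power of a two-sided z-test on allele frequencies.

    Assumption: each individual contributes two independent alleles, and
    the standard error is evaluated at the true frequencies (a simple
    normal approximation, not an exact calculation).
    """
    norm = NormalDist()
    alleles_case = 2 * n_cases
    alleles_ctrl = 2 * n_controls
    se = sqrt(p_case * (1 - p_case) / alleles_case
              + p_control * (1 - p_control) / alleles_ctrl)
    z_crit = norm.inv_cdf(1 - alpha / 2)
    shift = abs(p_case - p_control) / se
    # Probability that the test statistic falls in either rejection tail.
    return norm.cdf(shift - z_crit) + norm.cdf(-shift - z_crit)


if __name__ == "__main__":
    n_snps = 500_000            # hypothetical number of SNPs tested
    alpha = 0.05 / n_snps       # Bonferroni-corrected per-SNP threshold
    p_ctrl = 0.30               # hypothetical control allele frequency
    p_case = case_allele_freq(p_ctrl, odds_ratio=1.3)
    for n_controls in (1_000, 2_000, 4_000, 8_000):
        power = allelic_test_power(p_case, p_ctrl, 1_000, n_controls, alpha)
        print(f"controls={n_controls:>5}  power={power:.3f}")
```

With the number of cases held fixed, the printed power rises as controls are added, which is the motivation for borrowing free public controls. The calculation deliberately ignores the ancestry and batch differences that the abstract identifies as sources of type-I error inflation, so it speaks only to power, not to validity.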

Acknowledgments

This work was supported by National Institutes of Health grants CA120082 and CA1363621. We would like to express our appreciation to Yunfei Wang for his assistance with programming and to three anonymous reviewers for their helpful suggestions.

Author information

Corresponding author

Correspondence to Ethan M. Lange.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (DOC 1,177 kb)

About this article

Cite this article

Ho, L.A., Lange, E.M. Using public control genotype data to increase power and decrease cost of case–control genetic association studies. Hum Genet 128, 597–608 (2010). https://doi.org/10.1007/s00439-010-0880-x

  • DOI: https://doi.org/10.1007/s00439-010-0880-x