An innovative procedure of genome-wide association analysis fits studies on germplasm population and plant breeding

  • Original Article
Theoretical and Applied Genetics

Abstract

Key message

The innovative RTM-GWAS procedure provides a relatively thorough detection of QTL and their multiple alleles for germplasm population characterization, gene network identification, and genomic selection strategy innovation in plant breeding.

Abstract

Previous genome-wide association studies (GWAS) have concentrated on finding a handful of major quantitative trait loci (QTL), but plant breeders are interested in revealing the whole-genome QTL-allele constitution of breeding materials and germplasm (in which tremendous historical allelic variation has accumulated) for genome-wide improvement. To meet this requirement, two innovations were suggested for GWAS: first, grouping tightly linked sequential SNPs into linkage disequilibrium blocks (SNPLDBs) to form markers with multi-allelic haplotypes; and second, using a two-stage association analysis for QTL identification, in which markers are preselected with a single-locus model and then subjected to stepwise regression under a multi-locus multi-allele model. The proposed procedure is termed restricted two-stage multi-locus multi-allele GWAS (RTM-GWAS, https://github.com/njau-sri/rtm-gwas). The Chinese soybean germplasm population (CSGP), composed of 1024 accessions with 36,952 SNPLDBs (generated from 145,558 SNPs, with reduced linkage disequilibrium decay distance), was used to demonstrate the power and efficiency of RTM-GWAS. Using the CSGP marker information, simulation studies demonstrated that RTM-GWAS achieved the highest QTL detection power and efficiency compared with previous procedures, especially under large sample size and high trait heritability conditions. For 100-seed weight in the CSGP, RTM-GWAS achieved a relatively thorough detection of QTL and their multiple alleles compared with the linear mixed model method. A QTL-allele matrix (402 alleles of 139 QTL × 1024 accessions) was established as a compact form of the population genetic constitution. The 100-seed weight QTL-allele matrix was then used for genetic characterization, candidate gene prediction, and genomic selection of optimal crosses in the germplasm population.
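
The first innovation, grouping tightly linked sequential SNPs into blocks whose haplotypes serve as multi-allelic markers, can be illustrated with a minimal Python sketch. It is not the RTM-GWAS implementation: the abstract does not specify the block-partitioning rule, so the sketch assumes a simple stand-in (adjacent SNPs join the same block when their pairwise r² reaches a threshold), assumes fully homozygous accessions coded 0/1, and uses hypothetical names (`r_squared`, `group_into_blocks`, `haplotype_alleles`) and toy data.

```python
def r_squared(a, b):
    """Squared correlation (LD r^2) between two biallelic SNPs coded 0/1."""
    n = len(a)
    pa, pb = sum(a) / n, sum(b) / n
    pab = sum(x * y for x, y in zip(a, b)) / n
    denom = pa * (1 - pa) * pb * (1 - pb)
    if denom == 0:  # a monomorphic SNP carries no LD information
        return 0.0
    d = pab - pa * pb
    return d * d / denom


def group_into_blocks(snps, threshold=0.4):
    """Group SNPs (given in map order) into blocks of consecutive indices.

    Assumption for illustration only: a SNP joins the current block when
    its r^2 with the preceding SNP is at least `threshold`.
    """
    blocks = [[0]]
    for j in range(1, len(snps)):
        if r_squared(snps[j - 1], snps[j]) >= threshold:
            blocks[-1].append(j)
        else:
            blocks.append([j])
    return blocks


def haplotype_alleles(snps, block):
    """Return one haplotype string per accession for the SNPs in `block`."""
    n_accessions = len(snps[0])
    return ["".join(str(snps[j][i]) for j in block) for i in range(n_accessions)]


# Toy data: 5 SNPs (rows, in map order) scored on 6 inbred accessions (columns).
snps = [
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1, 1],
    [0, 1, 0, 1, 0, 1],
    [0, 1, 0, 1, 0, 1],
]

blocks = group_into_blocks(snps)
markers = [haplotype_alleles(snps, block) for block in blocks]

for block, alleles in zip(blocks, markers):
    print(block, sorted(set(alleles)))
```

In this toy example the first three SNPs form one block with three distinct haplotypes, so a single marker carries three alleles where each constituent SNP carries only two. That is the property that lets a block-based marker represent multiple alleles at a locus.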

Acknowledgements

This work was supported by the China National Key R & D Program for Crop Breeding (2016YFD0100304), the China National Key Basic Research Program (2011CB1093), the China National Hightech R&D Program (2012AA101106), the Natural Science Foundation of China (31571695), the MOE 111 Project (B08025), Program for Changjiang Scholars and Innovative Research Team in University (PCSIRT13073), the MOA Public Profit Program (201203026-4), the MOA CARS-04 program, the Jiangsu Higher Education PAPD Program, and the Jiangsu JCIC-MCP Program. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Author information

Corresponding author

Correspondence to Junyi Gai.

Ethics declarations

Conflict of interest

The authors declare no conflict of interest.

Additional information

Communicated by Dr. Mikko J. Sillanpaa.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (XLSX 52 kb)

Supplementary material 2 (DOCX 1657 kb)

About this article

Cite this article

He, J., Meng, S., Zhao, T. et al. An innovative procedure of genome-wide association analysis fits studies on germplasm population and plant breeding. Theor Appl Genet 130, 2327–2343 (2017). https://doi.org/10.1007/s00122-017-2962-9

  • DOI: https://doi.org/10.1007/s00122-017-2962-9
