Preparation and Curation of Multiyear, Multilocation, Multitrait Datasets

  • Protocol
Genome-Wide Association Studies

Part of the book series: Methods in Molecular Biology ((MIMB,volume 2481))

Abstract

Genome-wide association studies (GWAS) are a powerful approach to dissect genotype-phenotype associations and identify causative regions. However, this power depends heavily on the accuracy of the phenotypic data. To obtain accurate phenotypic values, phenotyping should be carried out in multienvironment trials (METs). To avoid technical errors, sufficient time must be spent exploring, understanding, curating, and adjusting the phenotypic data from each trial before combining them with an appropriate linear mixed model (LMM). The LMM is chosen to minimize, as far as possible, any effect that could lead to misestimation of the phenotypic values. This chapter explains a series of important steps for exploring and analyzing data from METs used to characterize an association panel. Two datasets illustrate two different scenarios.
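
As a rough illustration of what combining trials with an LMM can look like, the following is a generic one-stage model for MET data. It is a sketch under assumptions and not necessarily the model used in the chapter: the symbols, the choice of random genotype effects, and the fixed environment effects are all assumptions made here for illustration.

```latex
% Generic one-stage MET model (illustrative; all symbols are assumptions)
% y_{ijk}: phenotype of genotype i in environment j, replicate k
\begin{equation}
  y_{ijk} = \mu + E_j + R_{k(j)} + G_i + (GE)_{ij} + \varepsilon_{ijk}
\end{equation}
% \mu        : overall mean
% E_j        : effect of environment j (year-location combination), fixed here
% R_{k(j)}   : effect of replicate k nested within environment j
% G_i        : effect of genotype i, G_i \sim N(0, \sigma^2_G)
% (GE)_{ij}  : genotype-by-environment interaction, (GE)_{ij} \sim N(0, \sigma^2_{GE})
% \varepsilon_{ijk} : residual, \varepsilon_{ijk} \sim N(0, \sigma^2_{\varepsilon})
```

Under this formulation, the predicted genotype effects (or adjusted genotype means, if genotypes are instead treated as fixed) would serve as the phenotypic values passed on to the association analysis.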

Author information

Corresponding authors

Correspondence to Amina Abed or Zakaria Kehel.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature

About this protocol

Cite this protocol

Abed, A., Kehel, Z. (2022). Preparation and Curation of Multiyear, Multilocation, Multitrait Datasets. In: Torkamaneh, D., Belzile, F. (eds) Genome-Wide Association Studies. Methods in Molecular Biology, vol 2481. Humana, New York, NY. https://doi.org/10.1007/978-1-0716-2237-7_6

  • DOI: https://doi.org/10.1007/978-1-0716-2237-7_6

  • Publisher Name: Humana, New York, NY

  • Print ISBN: 978-1-0716-2236-0

  • Online ISBN: 978-1-0716-2237-7

  • eBook Packages: Springer Protocols
