Genome-wide association study and genetic diversity analysis on nitrogen use efficiency in a Central European winter wheat (Triticum aestivum L.) collection

István Monostori; Fruzsina Szira; Alessandro Tondelli; Tamás Árendás; Krisztián Gierczik; Luigi Cattivelli; Gábor Galiba; Attila Vágújfalvi

doi:10.1371/journal.pone.0189265

Abstract

To satisfy future demands, the increase of wheat (Triticum aestivum L.) yield is inevitable. Simultaneously, maintaining high crop productivity and efficient use of nutrients, especially nitrogen use efficiency (NUE), are essential for sustainable agriculture. NUE and its components are inherently complex and highly influenced by environmental factors, nitrogen management practices and genotypic variation. Therefore, a better understanding of their genetic basis and regulation is fundamental. To investigate NUE-related traits and their genetic and environmental regulation, field trials were evaluated in a Central European wheat collection of 93 cultivars at two nitrogen input levels across three seasons. This elite germplasm collection was genotyped on DArTseq® genotypic platform to identify loci affecting N-related complex agronomic traits. To conduct robust genome-wide association mapping, the genetic diversity, population structure and linkage disequilibrium were examined. Population structure was investigated by various methods and two subpopulations were identified. Their separation is based on the breeding history of the cultivars, while analysis of linkage disequilibrium suggested that selective pressures had acted on genomic regions bearing loci with remarkable agronomic importance. Besides NUE, genetic basis for variation in agronomic traits indirectly affecting NUE and its components, moreover genetic loci underlying response to nitrogen fertilisation were also determined. Altogether, 183 marker-trait associations (MTA) were identified spreading over almost the entire genome. We found that most of the MTAs were environmental-dependent. The present study identified several associated markers in those genomic regions where previous reports had found genes or quantitative trait loci influencing the same traits, while most of the MTAs revealed new genomic regions. 
Our data provides an overview of the allele composition of bread wheat varieties anchored to DArTseq® markers, which will facilitate the understanding of the genetic basis of NUE and agronomically important traits.

Citation: Monostori I, Szira F, Tondelli A, Árendás T, Gierczik K, Cattivelli L, et al. (2017) Genome-wide association study and genetic diversity analysis on nitrogen use efficiency in a Central European winter wheat (Triticum aestivum L.) collection. PLoS ONE 12(12): e0189265. https://doi.org/10.1371/journal.pone.0189265

Editor: Aimin Zhang, Institute of Genetics and Developmental Biology Chinese Academy of Sciences, CHINA

Received: July 17, 2017; Accepted: November 23, 2017; Published: December 28, 2017

Copyright: © 2017 Monostori et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: Relevant data are available in the paper, its Supporting Information files, and from https://www.ebi.ac.uk/biostudies/studies/S-BSST36/?query=S-BSST36/ (accession number: S-BSST36/).

Funding: This work was supported by the National Research, Development and Innovation Office, K101794 and K111879 (http://nkfih.gov.hu/english) (IM; FSz; TÁ; KG; AV; GG). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests: The authors have declared that no competing interests exist.

Abbreviations: AMOVA, analysis of molecular variance; GBS, genotyping by sequencing; G × E, genotype by environment interaction; GLM, general linear model; GN, grain number per spike; GNACE, grain N accumulation efficiency; GPC, grain protein content; GS, growth stage; GY, grain yield; GWAS, genome-wide association study; HD, heading date; HI, harvest index; LD, linkage disequilibrium; MLM, mixed linear model; MTA ATK, Centre for Agricultural Research, Hungarian Academy of Sciences; MTA, marker-trait association; N, nitrogen; N0, no nitrogen fertilisation, extensive management; N120, 120 kg per hectare nitrogen fertiliser, intensive management; NHI, N harvest index; N_mineral, mineral N content; N_soil, amount of N available in the soil and applied as fertiliser; NR, nitrogen response, trait_NR stands for the nitrogen response for the given trait; NUE, N use efficiency; NUpE, N uptake efficiency; NUp_full, amount of N taken up by the whole aboveground plant; NUp_grain, total N content harvested in the grain; NUtE, N utilisation efficiency; PCA, principal component analysis; PCoA, principal coordinate analysis; PH, plant height; QTL, Quantitative Trait Locus; RN, response to N fertilization; SN, spike number per meter; Sp, subpopulation; SY, straw yield; TGW, thousand grain weight

Introduction

Wheat is grown on a greater land area than any other commercial crop; its production has exceeded 700 million tons in recent years, making wheat the most important food grain source for humans [1]. Considering the predicted population growth, the increase in per capita consumption and the changes in diets, the global production of agricultural products needs to be 60 percent higher by 2050. We therefore face the challenge of increasing the average wheat yield from the current 3.4 t/ha to 5.42 t/ha. In addition, cereal production already utilises more than half of the global fertiliser production [2]. As a consequence, fertiliser consumption is increasing: the amount of applied nitrogen (N) fertiliser is expected to double by 2050, from 112 Mt (2015) to 236 Mt [3–4]. Nevertheless, the utilisation of N fertilisers is rather inefficient. Approximately 50-70% of the applied N is lost from the plant-soil system, enriching the reactive N compounds in the atmosphere and polluting ground and surface waters. The environmental damage associated with the use of N-based fertilisers is becoming significant not just at local but also at regional and global scales [5–6]. These negative environmental and economic impacts could be reduced through better agronomic practices and the use of N-efficient cultivars with improved N use efficiency (NUE) [7]. The world market demands high-quality bread wheat with high protein content and strong gluten properties. However, improving NUE may have the adverse effect of decreasing grain protein content (GPC). The inverse relationship between grain yield (GY) and GPC makes it difficult to improve these two traits simultaneously. Accordingly, it is a challenging task to reduce the excessive input of N fertilisers while maintaining the desired yield and GPC.

NUE in agricultural systems is defined as GY per unit of available N (from the soil and/or fertiliser) [8], which can be influenced by numerous traits (reviewed by [9]). The most immediate goal is to reach better NUE by the improvement of N uptake efficiency (NUpE)–the ability of the plant to remove N from the soil–and to improve N harvest index (NHI)–the remobilisation of the stored N into the grain [10–11]. It is also known that different limiting factors are involved in plant metabolism for maximising NUE under high or low N availability [12].

Moreover, the relative importance of the two NUE components, NUpE and N utilisation efficiency (NUtE), is variable and depends mostly on the amount of available N. Previous studies revealed that NUpE contributes more to genetic variation in NUE, especially at low N levels, whereas at high N levels, variation in NUE is mainly due to differences in NUtE (reviewed by [11–13]). Consequently, when N availability is not a limiting factor, NUtE can be more dominant. Twenty winter wheat varieties were studied at two N levels [14], and it was found that NUpE accounted for more of the variation in NUE at low N supply (64%) than at high N supply (30%). More importantly, 63% of the genotype × N level interaction was explained by NUpE. In contrast, an investigation of 39 winter wheat cultivars at five N levels found that cultivar differences in NUpE occurred only at the three highest N supply levels, and it was concluded that NUtE explained more of the variation in GY than NUpE, regardless of the level of soil N [15]. These experiments show that NUE and its components are highly influenced by environmental factors: the interaction of climate, nutrient and water availability, N management and genotypic variation [12]. Therefore, it is important to understand the interactions between NUE-related traits as well as their genetic and environmental regulation.

Since NUE is determined by multiple genetic factors and influenced heavily by the environment, its genetic dissection is quite challenging [12]. In addition, the large genotype by environment interaction makes the investigation even more difficult. It was concluded that the discovery of a major gene (with a large effect size) controlling NUE itself is quite unlikely. Rather, the description of many genes or QTLs (Quantitative Trait Loci) with minor or moderate effects will bring us closer to understanding the complexity of NUE [13]. These minor QTLs are more likely to be specific QTLs, representing genetic variation specific to the species or variety and to the environmental conditions [13]. To discover many of these specific QTLs, it is necessary to investigate NUE in different environments using diverse genetic material.

Numerous studies have established that significant genetic variation exists for NUE-related traits in wheat. Components of NUE were investigated in bread wheat under five N levels, and depending on the N supply, a 24-42% difference was found in NUtE between the elite wheat varieties [15]. Another study revealed significant differences for N remobilisation efficiency (69–88%) and N translocation efficiency (90-93%) among five wheat genotypes [16]. These results indicate that genetic potential indeed exists to improve NUE in wheat.

Genetic diversity is a fundamental aspect of crop improvement; therefore, the effective utilisation of genetic resources in breeding programs is essential as long as this diversity integrates positive and profitable genes [17]. It has been concluded [13] that QTL data obtained from wide crosses, although they have generated a relatively robust QTL information dataset, may not be relevant in developing modern cereal crops. Diverse elite genetic materials, which are well-adapted and commercially relevant for specific interests, are more useful for identifying allelic variation for superior NUE. In this way, knowledge gained from scientific studies can provide valuable results in plant breeding.

Many genes or QTLs influencing NUE have been successfully mapped in wheat under various N supplies. Most of the studies utilised bi-parental populations to identify the genetic basis of NUE and its associated traits [7, 18–20].

Association mapping exploits ancestral recombination events that occurred throughout the evolutionary history of the populations and takes into account all the segregating alleles present in the population. In contrast, due to the restricted number of meiotic events, the genetic resolution of segregation-based conventional bi-parental maps often remains insufficient. Moreover, only the alleles from the parental genotypes are scrutinised in these ‘classical’ bi-parental analyses. Compared to bi-parental mapping, association analysis is therefore a more efficient approach to dissect complex quantitative traits [21]. Considering that this method is relatively new in cereal genetics, only a few genome-wide association studies (GWAS) have been published so far in wheat for NUE on diverse genetic materials under various conditions [17, 22–25].

In order to identify chromosomal regions involved in the determination of NUE and its related traits, a genome-wide association study was carried out in a European winter wheat collection under different conditions. The Marker-Trait-Associations (MTA) detected in commercially relevant genetic materials will facilitate the isolation of agronomically important genes, especially those related to NUE.

Materials and methods

Plant materials and experimental design

A set of 93 bread wheat (Triticum aestivum L.) varieties (S1 Table) were phenotyped in Martonvásár, Hungary, at the MTA ATK (Centre for Agricultural Research, Hungarian Academy of Sciences), in six environments (3 year × 2 N levels), during three successive cropping seasons (2012-2013, 2013-2014 and 2014-2015). The examined cultivars represent an elite germplasm collection grown mainly in Hungary and in Central Europe; however, some old (e.g. ‘Bezostaja-1’, ‘Bánkúti 1201’) or non-continental (e.g. ‘Nudakota’) varieties were also involved. Cultivars were selected according to their involvement and importance in the Hungarian wheat breeding.

In three consecutive years adjacent fields belonging to the Agricultural Institute of MTA ATK (47°18ʹ N, 18°48ʹ E, 105 MASL) were used. Each cultivar was sown between 2 and 21 October, in a split-plot design, in three replicates, at two N levels: (1) no nitrogen supply–considered as extensive management (referred to as N0) and (2) intensive management, whereby 120 kg N per hectare (referred to as N120) was applied. In the N0 treatment, only the naturally occurring nitrogen was available in the soil, while in the case of N120, N was top-dressed at growth stage (GS) 21-24 [26]. In 2014 and 2015 the fertiliser treatment was applied on 7 and 17 March, respectively. In 2013, the spring was cold and wet, so the N fertiliser could be applied to the field only on 17 April (also at tillering stage, GS 21–26). In 2013, ammonium nitrate (34% N) was applied, while in 2014 and 2015, calcium ammonium nitrate (27% N) was used. The size of each plot was 3×1.44 m, consisting of 12 rows and containing 500 viable seeds/m². Sowing date and seed rate were applied according to the Hungarian practice [27–28]. N treatment was considered as main plots and varieties as sub-plots. P₂O₅ and K₂O fertilisation and plant protection were applied according to standard agricultural practice and no growth regulators were used. Crops were combine-harvested at grain maturity in the period of 8-21 July. Each spring, soil samples were collected before fertilisation from two depths (0-0.3 m and 0.3-0.6 m), and mineral N (ammonium + nitrate) contents (N_mineral) and main soil properties were determined in an accredited laboratory according to NAT-1-1093/2001 (NÉBIH NTAI, Velence, Hungary). Soil type at each location was chernozem, but the fields differed in their naturally available N content. The average N_mineral content (mean value of six soil samples in each year) was 21, 494 and 78 kg ha⁻¹ in 2013, 2014 and 2015, respectively. Weather data (i.e.
daily precipitation and mean temperature values) were also recorded in the Martonvásár region. Further information about soil parameters and meteorological data can be obtained from [29].

Phenotypic evaluation

The entire collection was evaluated for 16 agronomically important or N input-related traits at two N input levels (N0 and N120). The investigated traits were heading date (HD), plant height (PH), GY, thousand grain weight (TGW), weight of straw–referred to as straw yield (SY), spike number per meter (SN), grain number per spike (GN), harvest index (HI), N harvest index (NHI), GPC, NUE, NUpE, NUtE, amount of N taken up by the whole plant (NUp_full), amount of the N harvested in grain (NUp_grain) and grain N accumulation efficiency (GNACE). GY, HD and PH were scored on a per plot basis, while GN, SN, TGW and SY were evaluated from above-ground biomass samples collected from a representative one-meter row section of each plot. HD was assigned when 50% of the spikes within a given plot had fully emerged (GS 59) [26] and expressed in days from sowing, while PH was measured in cm at maturity. The one-meter samples were dried for 48 h at 70°C, and SY was determined and converted to kg ha⁻¹. SN and GN were determined on a per meter basis. GN was determined using a Contador seed counter machine (Pfeuffer GmbH, Kitzingen, Germany) and TGW was also calculated. The number of fully developed spikes was counted in the one-meter sample from each plot and expressed in spikes per m. GY was determined when plots were combine-harvested at plant maturity and, like SY, expressed in kg ha⁻¹. HI was determined as the ratio of GY to above-ground biomass (both expressed in kg ha⁻¹), the latter being the sum of GY and SY. The straw and grain samples were milled and N content was determined by the Dumas method [30] using a Rapid NIII nitrogen analyser (Elementar Analysensysteme GmbH., Hanau, Germany). GPC in w/w% was calculated as grain N concentration multiplied by a coefficient of 5.8.
The amount of N harvested in grain and in straw was calculated by multiplying the N concentration obtained from Rapid NIII analysis by GY and SY, respectively, and expressed in kg ha⁻¹. NUE, NUtE and NUpE were defined according to [8] as follows: NUE is GY divided by N_soil, the sum of the naturally available N content in the soil and the amount of N applied as fertiliser (in kg ha⁻¹). NUtE is GY divided by NUp_full. NUp_full is the sum of NUp_grain and N harvested in straw, expressed in kg ha⁻¹. NUpE is NUp_full divided by N_soil. NHI is the ratio of NUp_grain to NUp_full, which is a measure of N translocation efficiency. GNACE is NUp_grain divided by N_soil, which serves as a measure of the overall efficiency.
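The trait definitions above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the authors' processing pipeline: the `PlotSample` class, its field names and the example values are hypothetical, and N concentrations are assumed to be given in w/w%.

```python
from dataclasses import dataclass

N_TO_PROTEIN = 5.8  # grain N-to-protein coefficient given in the text


@dataclass
class PlotSample:
    """One plot's measurements. Field names are illustrative assumptions."""
    gy: float            # grain yield, kg/ha
    sy: float            # straw yield, kg/ha
    grain_n_pct: float   # grain N concentration, % w/w
    straw_n_pct: float   # straw N concentration, % w/w
    n_mineral: float     # naturally available soil mineral N, kg/ha
    n_fertiliser: float  # N applied as fertiliser, kg/ha (0 for N0, 120 for N120)


def nue_traits(p: PlotSample) -> dict:
    """Derive the N-related traits from the definitions in the text."""
    n_soil = p.n_mineral + p.n_fertiliser      # N_soil
    nup_grain = p.gy * p.grain_n_pct / 100.0   # N harvested in grain
    nup_straw = p.sy * p.straw_n_pct / 100.0   # N harvested in straw
    nup_full = nup_grain + nup_straw           # N in the whole aboveground plant
    return {
        "HI": p.gy / (p.gy + p.sy),
        "GPC": p.grain_n_pct * N_TO_PROTEIN,
        "NUp_grain": nup_grain,
        "NUp_full": nup_full,
        "NUE": p.gy / n_soil,
        "NUpE": nup_full / n_soil,
        "NUtE": p.gy / nup_full,
        "NHI": nup_grain / nup_full,
        "GNACE": nup_grain / n_soil,
    }


# Hypothetical plot values, chosen only for illustration.
example = nue_traits(PlotSample(gy=5000, sy=6000, grain_n_pct=2.0,
                                straw_n_pct=0.5, n_mineral=78, n_fertiliser=120))
```

With these definitions NUE equals the product of NUpE and NUtE, because NUp_full cancels; this is why the two components can be discussed as a decomposition of NUE.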

Analysis of variance for all traits in each of the three consecutive years was performed using General Linear Model (GLM) procedure with SPSS 22.0 software for Windows. Correlations between phenotypic traits were also calculated through SPSS 22.0 in each cropping year and treatment.

Genotyping and marker selection

Genomic DNA was extracted by the Qiagen DNeasy Plant Mini Kit according to the manufacturer’s instructions and sent to the commercial service provider of DArT marker, Triticarte Pty. Ltd. (Canberra, Australia; http://diversityarray.com/). Genotyping was performed with the wheat DArTseq® platform and generated two types of marker data: (1) genotyping by sequencing (GBS) markers are SNP markers obtained by sequencing the fragments derived from genome complexity reduction and subsequent SNP calling, and (2) silicoDArT markers referring to the presence or absence of restriction fragment in the genomic representation. SilicoDArT is analogous to microarray DArT but extracted in silico from the sequences obtained from the genomic representation used for GBS genotyping. Genomic sequences of fragments from both types of markers were also available. A detailed description of the platform used to genotype the collection can be found in [31]. As a result of genotyping, the initial dataset contained 12,293 polymorphic codominant SNP and 13,160 dominant silicoDArT markers. This marker set was filtered on the basis of individual marker-related statistics, provided by the Triticarte Pty. Ltd. The minimum threshold value for call-rate and reproducibility was 95%. Since it was assumed that all genotypes are homozygous, DArTseq® markers showing heterozygous calls were indicated as missing and markers with >5% heterozygous alleles were excluded. Additionally, markers with minor allele frequency lower than 0.1 were also removed from the analysis. After filtering marker data, a total of 4,201 polymorphic DArTseq® markers (2,700 silicoDArTs and 1,501 SNPs) were obtained and used for population structure analysis. However, since only 3,290 of these polymorphic markers had been previously mapped (data obtained from Triticarte Pty. Ltd), the unmapped markers were not included in linkage disequilibrium (LD) and marker trait associations analysis. 
A reduced marker set consisting of 300 unlinked markers was also used to complement and verify the population structure analysis. It was obtained from the filtered set by excluding markers that were located closer than 5 cM to any other marker.
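The filtering rules described above can be expressed as a small predicate per marker. The sketch below is a hedged illustration, not the authors' script: the call encoding (`0`/`1` for the two homozygous states, `HET` for a heterozygous call, `None` for missing), the function names, and the choice of denominators are assumptions, and the call rate and reproducibility are taken as precomputed per-marker statistics, as supplied by the genotyping provider. The greedy thinning in `thin_by_distance` is one possible way to obtain markers at least 5 cM apart; the text does not specify the procedure used.

```python
from typing import List, Optional, Sequence

HET = 2  # illustrative code for a heterozygous call; 0 and 1 are homozygous states


def keep_marker(calls: Sequence[Optional[int]], call_rate: float,
                reproducibility: float, min_quality: float = 0.95,
                max_het: float = 0.05, min_maf: float = 0.10) -> bool:
    """Return True if a marker passes the thresholds given in the text."""
    if not calls or call_rate < min_quality or reproducibility < min_quality:
        return False
    # Markers with more than 5% heterozygous calls are excluded.
    het_fraction = sum(1 for c in calls if c == HET) / len(calls)
    if het_fraction > max_het:
        return False
    # Remaining heterozygous calls are treated as missing.
    homozygous = [c for c in calls if c in (0, 1)]
    if not homozygous:
        return False
    freq = sum(homozygous) / len(homozygous)
    return min(freq, 1.0 - freq) >= min_maf  # minor allele frequency filter


def thin_by_distance(positions_cm: Sequence[float],
                     min_gap_cm: float = 5.0) -> List[float]:
    """Greedy thinning along one chromosome (an assumed procedure)."""
    kept: List[float] = []
    for pos in sorted(positions_cm):
        if not kept or pos - kept[-1] >= min_gap_cm:
            kept.append(pos)
    return kept
```

For example, `thin_by_distance([0, 2, 5, 7, 12])` keeps the positions 0, 5 and 12 cM.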

Analysis of population structure

To determine the underlying population structure in the Hungarian wheat collection, different methodologies were used and compared. Firstly, the Bayesian algorithm implemented in the software package Structure v2.3.4 [32] was used to estimate the number of hypothetical subpopulations (K) and to assign cultivars to them. Structure runs were performed with two marker sets (4,201 markers and a reduced set with 300 markers) applying the admixture model with correlated allele frequencies. A burn-in of 100,000 iterations followed by 100,000 Markov Chain Monte Carlo iterations was set for accurate parameter estimates. The value of K was evaluated from 1 to 8, with 5 iterations for each K value. The most likely number of subpopulations was estimated by following the ΔK method described by [33]. Each cultivar was assigned to one subpopulation based on the membership probability. Principal Coordinate Analysis (PCoA) was also used as an alternative way of visualising the genetic stratification within the collection, by means of the software package PAST v.3.12 [34]. Additionally, the phylogenetic relationship among the cultivars was estimated in PAST using the neighbour joining method. Bootstrap values for 1,000 replicates are indicated. Analysis of molecular variance (AMOVA) and genome-wide estimation of population differentiation using Wright’s F-statistics (F_ST) among subpopulations were performed with the software package Arlequin 3.5.2.2 [35] to investigate the levels of genetic variation revealed by the Structure analysis.
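As an illustration of the ΔK criterion, the sketch below computes ΔK from the log-likelihoods of replicate Structure runs. It assumes the commonly used form of the statistic, in which the absolute second-order difference of the mean log-likelihood, |L(K+1) − 2L(K) + L(K−1)|, is divided by the standard deviation of L(K) across replicates; the function name and the log-likelihood values are hypothetical and do not come from this study.

```python
from statistics import mean, stdev
from typing import Dict, List


def delta_k(lnp_runs: Dict[int, List[float]]) -> Dict[int, float]:
    """Delta K for every K that has both neighbours K-1 and K+1."""
    means = {k: mean(v) for k, v in lnp_runs.items()}
    result: Dict[int, float] = {}
    for k in sorted(lnp_runs):
        if k - 1 not in means or k + 1 not in means:
            continue  # Delta K is undefined at the ends of the tested range
        sd = stdev(lnp_runs[k])
        if sd == 0:
            continue  # avoid division by zero when replicates are identical
        second_diff = abs(means[k + 1] - 2 * means[k] + means[k - 1])
        result[k] = second_diff / sd
    return result


# Hypothetical log-likelihoods (three replicates per K), for illustration only.
runs = {
    1: [-1000.0, -1001.0, -999.0],
    2: [-800.0, -802.0, -798.0],
    3: [-780.0, -790.0, -770.0],
    4: [-775.0, -765.0, -785.0],
}
dk = delta_k(runs)
best_k = max(dk, key=dk.get)
```

Note that ΔK cannot be evaluated for the smallest and largest K tested, so with K evaluated from 1 to 8 the criterion can only select values between 2 and 7.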

Linkage disequilibrium analysis

Genome-wide LD analysis was performed using the LD function in the software TASSEL 5.0 [36]. Intra-chromosomal LD was estimated for the entire population. The analysis comprised the pairwise estimated squared allele-frequency correlation (r²), the normalised coefficient of linkage disequilibrium (D-prime or D’) and the significance of each pair of loci. Loci were considered to be in significant LD when p<0.05. To estimate the LD decay, significant r²-values were plotted against the genetic distance between the marker pairs and a second-degree smoothed Loess curve was fitted using SPSS 22.0 [37]. The intersection of the Loess curve and the background LD (critical r²) was considered as an estimate of LD decay. The critical r²-value was determined by root transforming the unlinked r²-values and taking the 95th percentile of the distribution as the threshold beyond which LD is likely caused by genetic linkage [23,38]. According to [39], marker pairs with a distance above 50 cM were considered as unlinked. LD decay was estimated separately for all chromosomes, for the three genomes (A, B and D) and for the whole genome in the entire population.
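The two quantities used here can be sketched as follows. This is a minimal illustration, not a reimplementation of TASSEL: markers are assumed to be biallelic and homozygous, coded `0`/`1` with `None` for missing calls, and the critical value is taken as a nearest-rank percentile of the root-transformed unlinked r²-values and squared back to the r² scale. Both the percentile method and the back-transformation are assumptions.

```python
import math
from typing import List, Optional, Sequence


def r_squared(a: Sequence[Optional[int]], b: Sequence[Optional[int]]) -> float:
    """Squared allele-frequency correlation between two 0/1-coded markers."""
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    n = len(pairs)
    if n == 0:
        return float("nan")
    p_a = sum(x for x, _ in pairs) / n                       # freq. of allele 1 at locus A
    p_b = sum(y for _, y in pairs) / n                       # freq. of allele 1 at locus B
    p_ab = sum(1 for x, y in pairs if x == 1 and y == 1) / n  # haplotype 1-1 frequency
    d = p_ab - p_a * p_b                                     # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    return d * d / denom if denom > 0 else float("nan")


def critical_r2(unlinked_r2: Sequence[float], percentile: float = 0.95) -> float:
    """Background-LD threshold from r2 of unlinked pairs (here: > 50 cM apart)."""
    if not unlinked_r2:
        raise ValueError("need at least one unlinked r2 value")
    roots: List[float] = sorted(math.sqrt(v) for v in unlinked_r2)
    rank = max(1, math.ceil(percentile * len(roots)))  # nearest-rank percentile
    return roots[rank - 1] ** 2                        # back to the r2 scale
```

Two identical marker columns give r² = 1, while two columns whose alleles combine at random give r² = 0.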

Marker-trait association analysis

Genome-wide association analysis was performed using TASSEL 5.0 software [36] for each measured and calculated trait, as well as for the genotype response to N fertilisation. Response to N fertilisation (RN) was estimated as the ratio between N120 and N0 for GY, SN, GN, GPC, NHI, NUp_full, NUp_grain and NUtE. In the association analysis each cultivar was represented by the phenotypic mean of the three plots in each season, and the seasons were analysed separately. To avoid spurious associations, four different statistical models were adopted to calculate P-values for MTAs: (1) general linear model (GLM) with Q-matrix as correction for population structure; (2) GLM with principal component analysis (PCA) as correction for population structure; (3) mixed linear model (MLM) with Kinship-matrix (K-matrix) as correction for population structure and (4) MLM with Q-matrix and K-matrix as correction for population structure [32, 40]. PCA and K-matrix were calculated with TASSEL 5.0 considering the recommendations of [41]. MLM was run without compression and with the option “population parameters previously determined” selected. The critical threshold for assessing the significance of MTAs was calculated by false discovery rate (FDR) separately for each trait in each year [42]. An MTA was defined as significant if the calculated q-value passed the FDR threshold for a given trait in all four models. When a trait from the same environment (i.e. year and treatment) was associated with more than one marker within a 5 cM region, only the MTA with the highest effect size in that region was considered.
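The requirement that an MTA pass the FDR threshold in all four models can be sketched as an intersection of per-model significant sets. The example below assumes the Benjamini-Hochberg step-up procedure and an FDR level of 0.05; the text only cites [42] for the FDR calculation, so both the procedure and the level are assumptions, and the P-values are invented for illustration.

```python
from typing import Dict, List


def bh_adjust(pvals: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted P-values, in the original marker order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):           # walk from the largest P-value down
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def significant_in_all(models: Dict[str, List[float]], alpha: float = 0.05) -> List[int]:
    """Indices of markers whose adjusted P-value is <= alpha in every model."""
    per_model = [
        {i for i, q in enumerate(bh_adjust(p)) if q <= alpha}
        for p in models.values()
    ]
    return sorted(set.intersection(*per_model))


# Hypothetical P-values for five markers under the four models in the text.
example_pvals = {
    "GLM+Q":   [0.001, 0.02, 0.5, 0.8, 0.004],
    "GLM+PCA": [0.001, 0.03, 0.4, 0.7, 0.2],
    "MLM+K":   [0.003, 0.2, 0.5, 0.9, 0.008],
    "MLM+K+Q": [0.002, 0.3, 0.6, 0.9, 0.01],
}
robust_hits = significant_in_all(example_pvals)
```

In this toy example only the first marker is retained, since every other marker fails the threshold in at least one model. Requiring agreement across models is conservative: it trades some power for protection against associations that depend on one particular correction for structure or relatedness.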

Moreover, quantile–quantile plots of -log₁₀ P-values in S3 Fig illustrate the relationship between the observed and the expected P-value distributions based on the MLM+K+Q model. Population structure and relatedness appear to be appropriately controlled, since the empirical distributions do not deviate from expectations.

Results

Genotypic data

Genotyping the Hungarian wheat collection with the DArTseq® platform resulted in a final dataset comprising 4,201 quality-filtered, polymorphic DArTseq® markers, of which 3,290 were placed on the genetic map. Among the mapped markers, 1,631 were located on the B genome, 1,210 on the A genome and only 449 on the D genome. Their chromosomal distribution can be found in S1 Fig. The total map length covered 5,880 cM. The average distance between markers was 1.79 cM, 1.22 cM and 5.77 cM in the A, B and D genomes, respectively. Neither the A nor the B genome had gaps larger than 50 cM. However, the largest gaps on chromosomes 3D, 4D and 5D were 60.42 cM, 74.21 cM and 64.55 cM, respectively. In summary, the D genome presented the largest gaps and the highest number of gaps among the genomes.

Analysis of population structure

To analyse the genetic diversity within the collection, the relatedness of the cultivars was first investigated with the Bayesian approach implemented in the Structure software [32]. Evidence of significant population structure was provided by the analysis based on the method of [33]. The maximum ΔK value occurred at K = 2, so two subpopulations (referred to as Sp1 and Sp2) were identified. The Q matrix (membership probability estimates) was extracted from the Structure runs and each cultivar was assigned to one of the two subpopulations based on a membership probability >0.51. Sp1 contained the majority of the cultivars (79 cultivars), while Sp2 consisted of the remaining 14 cultivars. The separation of the two subpopulations reflects the breeding history of the cultivars. Since the algorithms in the Structure software assume independent loci, measured on randomly sampled unrelated individuals, another Bayesian clustering was carried out with a reduced marker set. The analysis of these 300 unlinked markers led to a very similar result: two subpopulations were identified. PCoA was used as an alternative way of analysing and visualising population structure. The first two principal coordinates explained 15.2% and 11.2% of the molecular variance, and the separation into two subpopulations was confirmed by this independent analysis (Fig 1). The DArTseq® marker-based phylogenetic structure of the wheat collection with the two subpopulations, Sp1 and Sp2, is presented in S2 Fig.

Fig 1. Principal coordinate analysis of 93 winter wheat genotypes based on Jaccard similarity index.

PCo 1 and PCo 2 are the first and second principal coordinates, respectively, and the numbers in parentheses refer to the proportion of variance explained by the principal coordinates.

https://doi.org/10.1371/journal.pone.0189265.g001

The distribution of molecular variation among and within the two subpopulations was estimated by AMOVA, which revealed 23% (p≤0.001) of total genetic variation partitioned among subpopulations, whereas 77% of the variation was maintained within subpopulations. Additionally, the subpopulations were compared on the basis of the phenotypic data collected and significant differences were found in five out of six (3 year × 2 N level) environments with respect to GPC, NUpE, NUp_grain and NUp_full, and in 4 cases with respect to GNACE, HI, GN, HD, NUE, GY (S2 Table).

Linkage disequilibrium

LD was calculated for the entire population by pairwise marker r² for each chromosome. The number of intra-chromosomal pairs, the number of significant marker pairs, critical r², mean r²-values and the distributions of LDs are detailed in Table 1. Furthermore, LD analysis revealed differences between chromosomes (S3 Table). In the entire population 118,398 (35.2%) intra-chromosomal pairs showed a significant level of LD (Table 1). Analysis of the LD in the different wheat genomes revealed that the highest numbers of significant pairs and of pairs in perfect LD (i.e., where r² = 1 and D’ = 1) were found in the B genome. Only 10,543 pairs showed significant LD in the D genome (compared to 29,405 in the A and 78,450 in the B genome) because of the lower marker density. On the other hand, the rates of significant and linked marker pairs were much higher on the D genome than on the A or B genome. The comparison of the mean r²-values showed that the D genome had a higher mean r²-value (0.221) than the B (0.102) and A (0.062) genomes. The highest mean r²-values occurred on chromosomes 2D and 1B, while the lowest ones occurred on chromosomes 3A, 7A, 3B and 7B (S3 Table). The critical r²-value was quite similar in the three genomes; across individual chromosomes it ranged from 0.2137 to 0.4371, with the maximum and minimum values on chromosomes 1B and 4D, respectively. The LD decay in the three genomes, including the Loess curve, is illustrated in Fig 2. The LD decay, estimated as the intersection of the Loess curve with the line of the critical r²-value, was 9 cM for the whole genome and 9 cM and 30.5 cM for the B and D genomes, respectively. In the A genome the Loess curve did not intersect the critical line. The marker pairs in total LD had a mean distance of 8.44 cM in the entire population.

Fig 2. Intra-chromosomal LD (r²) decay of marker pairs in the whole genome and in the three wheat genomes as a function of genetic distance (cM).

The horizontal line indicates the 95th percentile of the distribution of unlinked r², which gives the critical r²-value. The second-degree LOESS fitting curve illustrates the LD decay (grey line).

https://doi.org/10.1371/journal.pone.0189265.g002

Table 1. Overview of intra-chromosomal LD in the whole genome and in the three genomes in the entire population.

https://doi.org/10.1371/journal.pone.0189265.t001

Characterisation of the phenotypic traits

Phenotypic variation: The phenotypic performance of the entire population and the subpopulations across the six environments for the 15 investigated traits are shown in S2A Table. Large phenotypic differences were observed in the population for all traits. Grain yield ranged from 2.33 t ha^-1 (in 2012-2013 at N0) to 6.65 t ha^-1 (in 2013-2014 at N120). Strong environmental effect was observed, causing remarkable variation of phenotypic traits between seasons. For most of the traits greater differences have been found between seasons than between N treatments. However, N fertilisation had significant effect on most of the studied traits (S2A Table). The significant difference of TGW, NUtE, NHI and HI varied between seasons, while N fertilisation did not have significant effect on HD. The largest differences between N0 and N120 were detected for GY, NUp_grain, and NUp_full (S2A Table). Other traits (except HD) also showed significant but moderate changes in response to N level. For example, HI changed only by 10%, while NUp_grain changed by 36% from the 3 years’ average. The values of NUE, NUpE and GNACE greatly varied between seasons, probably because these traits strongly depend on the theoretically available N in the soil, which was 21, 494 and 78 kgha^-1 in three successive cropping seasons (2012-2013, 2013-2014 and 2014-2015). These results indicate that the environment (including the effect of different N_mineral) had greater influence on the phenotypic variability than the 120 kgha^-1 N fertiliser applied in each year. The diversity of NUE in each environment are shown in S4 Fig, which indicate that genetic potential indeed exists to improve NUE in wheat in all environment. Investigation of yield components revealed that N fertilisation increased SN and GN in all cropping seasons, while TGW decreased in the season of 2013-2014 due to N fertilisation. 
Additionally, the lowest GPC values were measured in 2013, and the population means were almost the same in 2014 and 2015. Significant differences between the two subpopulations were observed for all of the traits (except PH) in at least one environment (S2B Table), but none of the traits showed a significant difference in all environments. Consequently, the phenotypic difference between the two subpopulations was highly influenced by environmental effects. In the last season (2014-2015), when plant nutrient availability was moderate, Sp2 differed significantly from Sp1 for most of the investigated traits.

Analysis of variance components: The ANOVA results for the different traits are presented in S4 Table. The environmental effect explained the highest proportion of the variance for most of the traits; the exceptions were TGW and HI, for which the genotypic factor was more relevant. These results also indicated that the environment (including the effect of different N_mineral) was the most important factor in explaining the overall phenotypic variance, although the input level and the genotype effect also played a significant role. The genotype effect explained the second highest proportion of the phenotypic variance for most of the traits (9 traits). The genotype effect was highly significant for 14 traits, but not for NUpE and GNACE. Separate analysis of the years revealed that genotype had no significant effect on NUpE and GNACE in 2012-2013, when the lowest level of N_mineral was observed. The input level effect was significant for all traits except TGW and HD (S4 Table); moreover, it explained the second highest proportion of the phenotypic variance for NUE, GNACE, NUpE, NUp_full and NUp_grain. Among the two-way interactions, the genotype × environment interaction explained the highest proportion of the variance for most of the traits, and it was significant for 11 traits. Additionally, a very strong input level × environment interaction (at the p<0.001 level) was found for 10 traits and a moderate one for NUtE and SY (at p<0.05), while no significant input level × environment interaction was found for NUp_grain, PH, SN and NUp_full (S4 Table).
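To make the phrase "proportion of the variance explained" concrete, the following minimal sketch converts the sums of squares of an ANOVA table into percentage shares and ranks the sources. The function name `variance_shares` and all numbers are hypothetical illustrations; they are not taken from S4 Table, and the model and software actually used in the study may differ.

```python
def variance_shares(sums_of_squares):
    """Return each source's share (%) of the total sum of squares."""
    total = sum(sums_of_squares.values())
    if total <= 0:
        raise ValueError("total sum of squares must be positive")
    return {source: 100.0 * ss / total for source, ss in sums_of_squares.items()}


# Hypothetical sums of squares for a single trait (illustration only,
# NOT values from the study).
example_ss = {
    "environment": 620.0,
    "input level": 140.0,
    "genotype": 150.0,
    "genotype x environment": 60.0,
    "residual": 30.0,
}

shares = variance_shares(example_ss)
# Sources ordered from the largest to the smallest share of the variance.
ranked = sorted(shares, key=shares.get, reverse=True)

for source in ranked:
    print(f"{source:>24s}: {shares[source]:5.1f} %")
```

With these made-up inputs the environment ranks first and the genotype second, which mirrors the pattern described for most traits; the actual shares have to be read from S4 Table.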

Correlations between traits: Correlation analysis was performed separately for all environments (three cropping seasons, two N fertilisation levels) for the most relevant traits, and several significant correlations were found at the p<0.01 level (Table 2). NUE was calculated as GY divided by available N_soil; within a single environment this divisor is the same for every genotype, so NUE and GY give identical correlations with any other trait, and only NUE was visualised in the analysis. A higher correlation coefficient was observed between NUE (GY) and NUpE than between NUE (GY) and NUtE in all environments. Additionally, GNACE correlated very strongly with both NUE (GY) and NUpE. GPC showed a negative correlation with NUE (GY) and NUtE, except in the 2012-2013 season, when GPC was not significantly correlated with NUE (GY). NHI correlated positively with NUE (GY) and negatively with GPC in most environments. The correlation between NUE (GY) and GN was stronger than the correlation between NUE (GY) and SN, except in the 2013-2014 season. TGW was not correlated with NUE (GY), but it showed a slightly negative correlation with SN and GN in most of the environments.
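The equivalence of NUE and GY in the per-environment correlations follows from the fact that the Pearson coefficient is unchanged when one variable is divided by a positive constant. The sketch below illustrates this with a hand-written `pearson` helper. The genotype means, the value of `n_available` and all variable names are hypothetical and serve only as an illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical genotype means within ONE environment (not study data).
gy = [4.1, 5.3, 4.8, 6.0, 5.1]        # grain yield
gpc = [13.9, 12.6, 13.1, 12.0, 12.9]  # grain protein content

# Available soil N is assumed constant for all genotypes in an environment.
n_available = 141.0                   # hypothetical value
nue = [g / n_available for g in gy]   # NUE = GY / available N

r_gy = pearson(gy, gpc)
r_nue = pearson(nue, gpc)
print(math.isclose(r_gy, r_nue, rel_tol=1e-9))
```

The equivalence holds only within an environment. Once environments with different available N are pooled, the divisor is no longer constant and the two correlations can differ.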

Table 2. Significant (p < 0.01) phenotypic trait correlations based on mean values for each genotype for seasons 2012-2013 (A), 2013-2014 (B) and 2014-2015 (C).

https://doi.org/10.1371/journal.pone.0189265.t002

Genome-wide mapping of agronomic and N-related traits

Genome-wide association analysis was performed for 11 investigated traits (GY, NUE, NUpE, NUtE, NUp_full, NUp_grain, GPC, NHI, GNACE, GN, SN) in all environments (three cropping seasons, two N fertilisation levels) and, for 8 selected traits, for the response to N fertilisation, in order to identify genomic regions involved in that response. Altogether, 183 MTAs were identified for 130 DArTseq® markers. All chromosomes except 1D and 3D were involved in determining at least one trait. All the details of the significant MTAs defined above are given in Table 3, while their chromosomal locations are presented in Figs 3–7. SN was involved in the highest number of MTAs (17), followed by GPC (16), while only 2 MTAs were identified for each of GY_RN and NHI_RN. Altogether, the B genome carried the highest number of MTAs (93), followed by the A genome (75), while only 15 MTAs were identified on the D genome. Considering the homeologous groups, groups 2, 5 and 1 contained the highest numbers of MTAs (34, 33 and 31, respectively), while lower numbers were found on groups 3 and 7 (22 and 27, respectively). Chromosome groups 4 and 6 had the lowest number of significant MTAs (18 each) across all traits.
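As a simple consistency check on the counts reported above, the sketch below tallies the MTAs by genome and by homeologous group and compares both sums with the stated total of 183. The counts are copied from the text, reading 18 as the count for each of groups 4 and 6; the variable names are illustrative only.

```python
# MTA counts as reported in the text.
mta_by_genome = {"A": 75, "B": 93, "D": 15}
# Homeologous groups 1-7; 18 is taken as the count for EACH of groups 4 and 6.
mta_by_group = {1: 31, 2: 34, 3: 22, 4: 18, 5: 33, 6: 18, 7: 27}
TOTAL_MTAS = 183

genome_total = sum(mta_by_genome.values())
group_total = sum(mta_by_group.values())

# Percentage of all MTAs located on each genome.
genome_share = {
    genome: 100.0 * count / TOTAL_MTAS for genome, count in mta_by_genome.items()
}

print(genome_total == TOTAL_MTAS, group_total == TOTAL_MTAS)
for genome, share in genome_share.items():
    print(f"{genome} genome: {share:.1f} % of MTAs")
```

Both breakdowns add up to the reported total, which supports reading "18" as a per-group count.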