Introduction

Variety improvement programs are tasked with capturing heritable genomic response to selection across multiple growing environments and field seasons. While climatic uncertainty is outpacing variety development, the condition of global food, fuel, and fiber insecurity has become more vulnerable (Feynman and Ruzmainkin 2007). In the face of diverse abiotic stresses, when considering genomic selection for variety improvement (GS, Meuwissen et al. 2001), reliable prediction of genotype performance across environmental variabilities has become increasingly critical.

However, selection using single-environment (SE) models becomes unreliable in the presence of genotype-by-environment (G × E) interaction (Burgueño et al. 2012; Crossa et al. 2017), due to the heterogeneity of genetic variance across environments, or imperfect genetic correlation of the same traits across sites/seasons (Crossa et al. 2004). Recently, GS models capable of assessing single population performance across multiple environments (ME) have been proposed (López-Cruz et al. 2015; Crossa et al. 2016; Lado et al. 2016; Montesinos-López et al. 2016; Spindel and McCouch 2016; Cuevas et al. 2018). For example, Cuevas et al. (2018) examined the prediction accuracy of six different GS models with G × E interactions using two maize and two wheat datasets. Though some degree of advantage over the conventional SE counterparts can be identified, such gain can only be observed when the phenotypic correlation between environments was high (i.e., above 0.6), and when the traits of interest had moderate to high heritability (Burgueño et al. 2012; López-Cruz et al. 2015; Cuevas et al. 2016; Monteverde et al. 2018). The negative impact of the G × E to the performance of GS models is evident, even more so when inbred lines or homogeneous growing conditions are unavailable (Resende et al. 2012). For example, in tree genetic improvement programs that primarily use open-pollinated families, the half-sib pedigree structure has eventually prevented partitioning G × E interaction from genetic variance owing to the lack of genetically uniform replications, thus impeding the performance of GS models (Beaulieu et al. 2014; Chen et al. 2018; Gamel El-Dien et al. 2018; Alves et al. 2020; Thistlethwaite et al. 2020). Even to this date, studies dealing with genetically heterogeneous lines and half-sibs progeny without replications usually approach this problem by removing or accounting for the environmental effects, instead of directly estimating G × E in the model. For example, Albrecht et al. (2014) studied both Pedigree-BLUP and Genomic-BLUP for heterogeneous populations in ME prediction by employing cross-validation to assess cross-environment prediction performance. To address this challenge in GS, here we developed a statistical method capable of incorporating G × E effect while evaluating the predictability of genetically different populations across environments.

Genome-wide approaches have played an instrumental role in the discovery of new biological insights underpinning complex trait variation (Frazer et al. 2009; Huang et al. 2010; Wang et al. 2019). Despite the tens of thousands of variant-trait associations cataloged (Buniello et al. 2019), few studies have incorporated knowledge learned from genome-wide association studies (GWAS) in a full GS framework. In Lloyd-Jones et al. (2019), it was demonstrated that the inclusion of GWAS summary statistics could return favorable results for height and BMI for over 500,000 individuals from UK Biobank. Through different approaches, Bian and Holland (2017) reached a similar conclusion by directly employing GWAS-estimated effects from the maize NAM RILs (recombinant inbred lines). Furthermore, MacLeod et al. (2016) also proposed that including additional functional knowledge, such as non-synonymous coding change, promoter regions, and known causal variants, could add to predictability.

Predicted by population genetics models, studies attempting to understand the impact of rare or less common variants on complex traits have shown an inverse relationship between the variant’s effect size and its frequency in the population (Park et al. 2011; Bomba et al. 2017). Empirical results from recent GWAS studies are mostly in agreement- that is, common variants have small effects, and rare variants have large effects (Bloom et al. 2019; Fournier et al. 2019; Wainschtein et al. 2019). Low-frequency and rare variants with small to modest effects that are thought to contribute to the missing heritability of many complex traits (Manolio et al. 2009; Eichler et al. 2010) may often have been overlooked because of the process in array production (Ziegler et al. 2008. Bouwman et al. 2017; Zhang et al 2018). As a consequence of not able to capture these rare but favorable alleles, selection based on the genomic estimated breeding values (GEBVs) could lead to loss of genetic diversity which further reduces the long-term genetic gain and prediction accuracy (Jannink 2010; Eynard et al. 2015; Liu et al. 2015; Doublet et al. 2019; Meuwissen et al. 2020; Vanavermaete et al. 2020). In this study, we proposed a flexible GS framework that incorporates marker information beyond just genotypic values, while extending the capability of conventional ME models. Our study uses examples in winter wheat and Interior spruce populations to demonstrate the advantage of including trait- and population-specific genetic characteristics, such as single nucleotide polymorphism (SNP) allele frequency and strength of association with the target phenotypes. Compared to the existing Gaussian Kernel (GK) that assigns a uniform weight to every SNP, our proposed Weighted Kernel (WK) captured more realistic functional genetic relationship of individuals within and cross environments by differentiating the contribution of SNPs. This capacity to address the trait- and population-specific environmental effects is not limited to the use of clonal genetic material or inbred lines, making modeling G × E in GS feasible for trials that utilize highly heterogeneous genetic resource to examine genetic adaptability across the range of a species or environmental variabilities (Risk et al. 2021).

High-throughput technologies have revolutionized biological and medical research and will continue to explore other omics spaces responsible for trait variation and adaptive responses to environmental variability (Halstead et al. 2021; Hasin et al. 2017; Li et al. 2019; Kim et al. 2016; Westhues et al. 2017). Integrating various omics information has become increasingly crucial for complex trait prediction and disease diagnostics (Tieri et al. 2011; Gomez-Cabrero et al. 2014; Higdon et al. 2015; Huang et al. 2017). The ability of kernel-based approaches to leverage the complementing favorable properties of predictors (Schrag et al. 2018), and the association significance to trait variation across growing conditions, would have the potential to provide consistent predictability across traits and environments.

Materials and methods

Duster x Billings hard red winter wheat Doubled Haploid (DH) population

Developed cooperatively by the Oklahoma Agriculture Experiment Station (OAES) and the USDA-ARS, a total of 242 DH lines derived from the intercross of Duster and Billings winter wheat varieties were used in the study. Traits analyzed include grain yield (GY) in kilograms per hectare (kg/ha), sodium dodecyl sulfate sedimentation value (Lorenzo and Kronstad 1987) adjusted for flour protein content (SDS), kernel weight measured by the single kernel characterization system (Perten Instruments, Segeltorp, Sweden) (SKCSKW), and wheat protein on a 12% moisture basis (WHTPRO). Each of these traits was evaluated in three harvest years with varied rainfall (i.e., 19.8, 41.3, and 45.2 cm for 2014, 2015, and 2016, respectively) in Stillwater, OK, USA (36.12 N, 97.09 W), representing three different environments. The average of two field replicates of each DH line per year was used in the analysis.

Genotypes were derived using genotype-by-sequencing (GBS) technology, and 16,265 SNP markers were selected after filtering markers with >50% missing ratio. Missing genotypes of markers were imputed by the marker mean (Nazzicari et al. 2016). Although the genetic profile is the same across three years, the effects of environment on phenotype can vary in different years. Hence, we estimated the single- (\(h_{SE}^2\)) and multi-environment narrow-sense heritability (\(h_{ME}^2\)) from GBLUP using single-year phenotypes and the average of 3-year phenotypes for each trait, respectively. The estimation was implemented in the R package BGLR (Pérez and de los Campos 2014).

Interior spruce population

The Interior spruce breeding population includes a total of 1126 38-year-old trees growing over three sites in British Columbia Canada, i.e., Prince George Tree Improvement Station (PGTIS), Aleza Lake, and Quesnel. Each site has 25 families with various sample sizes (Gamal El-Dien et al. 2015). To reduce the impact of unbalanced sample size on modeling, we randomized each family with respect to its minimum sample size (range from 6 to 16), resulting in 340 trees per site. Phenotypes used for prediction are height in m (HT) and diameter at breast height in cm (DBH) as growth traits; and two wood quality attributes, resistance to drilling (WDres) and wood density in kg/m3 using X-ray densitometry (WDX-ray). The genotypic information regarding GBS SNP can also be found in Gamal El-Dien et al. (2015). The single- (\(h_{SE}^2\)) and multi-environment narrow-sense heritability (\(h_{ME}^2\)) of each trait were estimated from GBLUP by Gamal El-Dien et al. (2015) and were reported in Table 1.

Table 1 Observed phenotypic correlation, single and multi-environment heritability estimates from genomic best linear unbiased prediction (GBLUP) for Duster × Billings winter wheat and the Interior spruce populations.

Statistical models

Single-environment (SE) model

The kernel matrix in GS models is normally used to represent the genetic correlation between individuals that can be derived from either pedigree information or molecular marker data. To account for the relatedness in genetic background, the SE model implemented here was an extension of model 1 in Cuevas et al. (2017), by adding a random background genetic effect, bj. The SE model is used for comparison with multi-environment (ME) model we proposed in the next section and the SE model is expressed as follows:

$${{{\mathbf{y}}}}_j = {{{\mathbf{1}}}}_{n_j}\mu _j + {{{\mathbf{g}}}}_j + {{{\mathbf{b}}}}_j + {{{\mathbf{e}}}}_j$$
(1)

where yj is the response vector with length nj, nj is the total number of phenotypic observations in the jth environment, j = 1, …, m, m is the number of environments; 1nj is a vector of ones with length nj, μj is the overall phenotypic mean of individuals in the jth environment; gj is the random genetic effect of individuals in the jth environment, and we assume \({{{\mathbf{g}}}}_j\sim N( {{{{\mathbf{0}}}},\,\sigma _{g_j}^2{{{\mathbf{K}}}}_j})\) where \(\sigma _{g_j}^2\)is the genetic variance of individuals in the jth environment, Kj (size nj × nj) is the kernel matrix used to describe genetic similarity between individuals in the jth environment; bj is the random background genetic effect of the jth environment that is not explained by genetic markers in the gj, and we assume \({{{\mathbf{b}}}}_j\sim N( {{{{\mathbf{0}}}},\,\sigma _{b_j}^2{{{\mathbf{B}}}}_j})\) where \(\sigma _{b_j}^2\) is the background genetic variance of individuals in the jth environment, Bj (size nj × nj) is a matrix representing the background genetic relationship of two individuals in the jth environment; ej is the random error term of the jth environment, and we assume \({{{\mathbf{e}}}}_j\sim N( {{{{\mathbf{0}}}},\,\sigma _{e_j}^2{\rm I}_{n_j}})\) where \(\sigma _{e_j}^2\) is the residual variance of the jth environment and Inj is the identity matrix with size nj; gj, bj and ej are assumed to be independent.

Multi-environment (ME) model

To fully capture G × E interaction, we proposed a generalization of model 3 in Cuevas et al. (2017). The generalized ME model is capable of predicting different individuals across different environments and the model is expressed as follows:

$${{{\mathbf{y}}}} = \mu + {{{\mathbf{g}}}} + {{{\mathbf{b}}}} + {{{\mathbf{e}}}}$$
(2)

where y = (y1, y2, …, ym)T; \(\mu = \left( {{{{\mathbf{1}}}}_{n_1}\mu _1,\,{{{\mathbf{1}}}}_{n_2}\mu _2, \ldots ,\,{{{\mathbf{1}}}}_{n_m}\mu _m} \right)^T\); g = (g1, g2, …, gm)T and g ~ N(0, ∑g); b = (b1, b2, …, bm)T and b ~ N(0, ∑b); e = (e1, e2, …, em)T and e ~ N(0, ∑e); g, b, and e are assumed to be independent; ym, μm, gm, bm and em are defined the same as in SE model.

In general, the genetic covariance matrix is

$$\mathop {\sum}\nolimits_g { = \left[ {\begin{array}{*{20}{c}} {\sigma _{g_1}^2{{{\mathbf{K}}}}_1} & {\sigma _{g_{12}}{{{\mathbf{K}}}}_{12}} & \cdots & {\sigma _{g_{1m}}{{{\mathbf{K}}}}_{1m}} \\ {\sigma _{g_{21}}{{{\mathbf{K}}}}_{21}} & {\sigma _{g_2}^2{{{\mathbf{K}}}}_2} & \cdots & {\sigma _{g_{2m}}{{{\mathbf{K}}}}_{2m}} \\ \vdots & \vdots & \ddots & \vdots \\ {\sigma _{g_{m1}}{{{\mathbf{K}}}}_{m1}} & {\sigma _{g_{m2}}{{{\mathbf{K}}}}_{m2}} & \cdots & {\sigma _{g_m}^2{{{\mathbf{K}}}}_m} \end{array}} \right]}$$

where \(\sigma _{g_m}^2\) and Km are defined the same as in SE model; σg1m is the genetic covariance of individuals in the 1st environment and mth environment; K1m (size n1 × nm) is the kernel matrix representing genetic relationship between individuals explained by genetic markers from the 1st and mth environment.

The background genetic covariance matrix is

$$\mathop {\sum}\nolimits_b { = \left[ {\begin{array}{*{20}{c}} {\sigma _{b_1}^2{{{\mathbf{B}}}}_1} & {\sigma _{b_{12}}{{{\mathbf{B}}}}_{12}} & \cdots & {\sigma _{b_{1m}}{{{\mathbf{B}}}}_{1m}} \\ {\sigma _{b_{21}}{{{\mathbf{B}}}}_{21}} & {\sigma _{b_2}^2{{{\mathbf{B}}}}_2} & \cdots & {\sigma _{b_{2m}}{{{\mathbf{B}}}}_{2m}} \\ \vdots & \vdots & \ddots & \vdots \\ {\sigma _{b_{m1}}{{{\mathbf{B}}}}_{m1}} & {\sigma _{b_{m2}}{{{\mathbf{B}}}}_{m2}} & \cdots & {\sigma _{b_m}^2{{{\mathbf{B}}}}_m} \end{array}} \right]}$$

where \(\sigma _{b_m}^2\) and Bm are defined the same as in SE model; \(\sigma _{b_{1m}}\) is the background genetic covariance of individuals in the 1st and the mth environment; B1m (size n1 × nm) is a matrix constructed to present the background relationship between two individuals that is not explained by genetic markers in the 1st and mth environment. With this model, background genetic relationship can be appropriately incorporated into both SE and ME models. For instance, when two individuals from the same family in the Interior spruce population, a half-sib relatedness of 0.25 was assigned to indicate their shared genetic background. While when the informative background relationship is unavailable, an identity matrix can be used for both background genetic variance and covariance matrices (Crossa et al. 2017), which was the case for Oklahoma wheat DH population demonstrated in this study.

The covariance matrix of the residuals is

$$\mathop {\sum}\nolimits_{e} { = \left[ {\begin{array}{*{20}{c}} {\sigma _{e_1}^2I_{n_1}} & {{{\mathbf{0}}}} & \cdots & {{{\mathbf{0}}}} \\ {{{\mathbf{0}}}} & {\sigma _{e_2}^2I_{n_2}} & \cdots & {{{\mathbf{0}}}} \\ \vdots & \vdots & \ddots & \vdots \\ {{{\mathbf{0}}}} & {{{\mathbf{0}}}} & \cdots & {\sigma _{e_m}^2I_{n_m}} \end{array}} \right]}$$

where \(\sigma _{e_m}^2\) and Inm are defined the same as in SE model.

Gaussian kernel (GK) and Weighted kernel (WK)

Here in this study, we compared the model prediction performance with two different kernels, Gaussian kernel (GK) used in de Los Campos et al. (2010) and our proposed weighted kernel (WK). The GK in (3) transforms genetic distance into genetic correlation between individuals.

$$K_G\left( {x_i,\,x_k} \right) = \exp \left( { - h\,d_{ik}^2/s} \right)$$
(3)

where KG(.,.) is a positive definite function evaluated by marker genotypes; xi, xk are vectors of marker genotypes for the ith and kth individuals respectively, I, k = 1, …, nj, xi = (xi1, …, xil, …, xip)T and xk = (xk1, …, xkl, …, xkp)T, l = 1, …, p, p is the total number of markers; the allelic states of xil are coded as 0, 1, 2 for AA, Aa, and aa respectively; h is a positive bandwidth parameter that controls the rate of decay of the genetic correlation between two individuals. To determine the optimal value of the parameter, either a grid search method from cross-validation procedure or an empirical Bayesian approach (Pérez-Elizalde et al. 2015) can be applied. In this study, h = 1 was used for simplicity. \(d_{ik}^2\) is squared Euclidean distance between two individuals i and k explained by marker genotypes, and s is the largest value of all \(d_{ik}^2\).

Motivated by Wu et al. (2011) and Yan et al. (2014), we proposed to model additional information such as the frequency and the effects of the variants by a WK method in Eq. (4).

$$K_W\left( {x_i,\,x_k} \right) = \exp \left( { - h\,d_{ik}^{ \ast 2}/s^ \ast } \right)$$
(4)

where \(d_{ik}^{^\ast 2} = \mathop {\sum}\nolimits_{l = 1}^p {w_l\left( {x_{il} - x_{kl}} \right)^2}\), s* is the sample maximum of \(d_{ik}^{ \ast 2}\), and the weight wl assigned to the lth marker is based on its minor allele frequency (MAF) and p-values from GWAS model with G×E interaction. The detailed formula of wl is the following

$$w_l = \left( {c_1 \ast {{{\mathrm{Beta}}}}\left( {{{{\mathrm{MAF}}}}_l;\alpha ,\,\beta } \right) + \frac{1}{{0.1 + {{{\mathrm{pvalue}}}}\,1_l}} + \frac{1}{{0.1 + {{{\mathrm{pvalue}}}}\,2_l}}} \right)^2$$
(5)

where c1 is a constant; MAFl is the minor allele frequency of the lth marker; α and β are the parameters of Beta distribution density function; pvalue1l and pvalue2l are p-values of main genotypic effect and G × E interaction effect respectively from GWAS for the lth marker, these p-values are adjusted by false discovery rate at 0.05 to account for the multiple hypothesis testing problem (Benjamini and Hochberg 1995; Storey and Tibshirani 2003).

To account for the potential effects of low-frequency variants while incorporating the test statistics from GWAS, we proposed the following formula (6) to determine the value of c1.

$$c_1 = \frac{{\max \left( {\frac{1}{{0.1\, + \,{{{\mathbf{pvalue1}}}}}},\,\frac{1}{{0.1\, + \,{{{\mathbf{pvalue2}}}}}}} \right)}}{{{{{\mathrm{Beta}}}}\left( {\min \left( {{{{\mathbf{MAF}}}}} \right);\alpha ,\,\beta } \right)}}$$
(6)

where pvalue1 = (pvalue1l, … ,pvalue1l, … ,pvalue1p)T, pvalue2 = (pvalue2l, … ,pvalue2l, … ,pvalue2p)T, and MAF = (MAF1, … ,MAFl, … ,MAFP)T.

As for the setting of α and β in Eqs. (5) and (6), Wu et al. (2011) and Yan et al. (2014) suggested to set α = 1 and β = 25 as a general way to control the impact of rare genetic variants in their GWAS research. In this study, we proposed to fix α = 1 and explore the impact of β on the performance of prediction as such beta density decreases as MAF increases (see Fig. S1 for details). As expected, when both MAF and p value are very small, the value of c1was found to be determined approximately by β, i.e., \(c_1 \approx \frac{{10}}{\beta }\). As a result, β cannot go to infinity to shrink c1 toward zero. To further document the impact of β on prediction performance of model with WK, five values were inspected, i.e., β = 12, 25, 50, 100, and 200. Thus, \(c_1 \approx \frac{{10}}{\beta } \approx\) 0.83, 0.40, 0.20, 0.10, and 0.05.

In addition, we compared the prediction performance of the model using the proposed WK with the model that implements the WK by MAF or p-value alone, denoted as WKMAF and WKPvalue respectively. As comparison, we denoted the WK contributed by both MAF and P value as WKMAF_Pvalue, and its weight wl is from Eq. (5). The weight in WKMAF is calculated by Eq. (7).

$$w_l^{{{{\mathrm{MAF}}}}} = \left( {c_1 \ast {{{\mathrm{Beta}}}}\left( {{{{\mathrm{MAF}}}}_l;\alpha ,\,\beta } \right)} \right)^2$$
(7)

where c1, α, and β are defined the same as above.

Similarly, the weight in WKPvalue is formulated as following

$$w_l^{{{{\mathrm{P}}}}\,{{{\mathrm{value}}}}} = \left( {\frac{1}{{0.1\, + \,{{{\mathrm{pvalue}}}}1_l}} + \frac{1}{{0.1\, + \,{{{\mathrm{pvalue}}}}2_l}}} \right)^2$$
(8)

Model implementation

SE model

Analysis was conducted in R (R Core Team 2020). The SE model was implemented using R package BGLR with 12,000 iterations and the first 6000 as burn-in for both wheat and spruce data sets (Pérez and de los Campos 2014).

ME model

Duster × Billings hard red winter wheat DH population

In the cases where the same genetic line was evaluated in multiple environments, such as Oklahoma winter wheat DH breeding populations, the ME model was fitted using R package MTM (de los Campos and Grüneberg 2016) with 20,000 iterations and the first 10,000 samples as burn-in.

Interior spruce population

Since no clonal or inbred line material was available for Interior spruce, we expanded the ME model for such scenario by a Bayesian approach to estimate all parameters in the ME model. The detailed implementation of Bayesian approach for ME model can be seen in the Supplementary Material. The model was implemented in R with 100,000 iterations and the first 50,000 as burn-ins. The convergences of Markov Chains for all models were assessed by visualizing the trace plots and running convergence diagnosis using R package CODA (Plummer et al. 2006).

Prediction accuracy evaluation

Both Pearson’s correlation coefficient (PCOR) between observed and predicted phenotypes, and its mean squared error (MSE) were used to assess the model prediction accuracy for each environment. To assess the model prediction performance, we split the data into a training set (TRN) and a testing set (TST). We applied the estimation of model parameters from TRN to TST to get predicted phenotypes. For SE model, we randomly selected 70% of single-environment data as TRN (n70 × 1) and the remaining 30% as TST (n30 × 1). For ME models, we followed cross-validation 2 (CV2) procedure in López-Cruz et al. (2015) to assign individuals to TRN and TST. CV2 mimics the practical prediction scenario related to plant breeders where individual plants are only tested in some environments (Burgueño et al. 2012). We randomly selected 70% of multi-environment data as TRN (n70 × m) and the rest 30% as TST (n30 × m). n70 and n30 stand for 70% and 30% of data, respectively. The random partition was repeated 50 times for SE and ME models respectively to generate an average prediction performance of each model. For WK, we selected the value that produced the highest prediction accuracy in TRN. The calculations of MAF and p values were based on each TRN-TST partition.

Results

For each dataset, we present the following: 1) MAF distribution; 2) summary of phenotypes and the estimated heritability, and 3) prediction performance of the proposed models in the single- and multi-environment settings. PCORs were used to illustrate the prediction performance. Additionally, the results of MSE were found to be consistent with PCORs, i.e., a lower MSE tends to have a higher PCOR. The detailed evaluation of prediction performance can be found in Tables S1, S2.

Minor allele frequency distribution

Duster × Billings hard red winter wheat DH population

The distribution of common and rare SNP allele frequency for Duster × Billings DH population is shown in Fig. 1A, The wheat DH population has ~64% of the SNPs with MAF < 0.2, about 59% <0.1 and 19% between 0.4 and 0.5. Further demonstrated in Fig. S1, the density of Beta(1: β) will converge to zero with large value of Beta random variables. Additionally, regarding to the context of MAF as the value of Beta random variable, 0.2 was considered as the large value.

Fig. 1: The distributions of minor allele frequency.
figure 1

The distributions of minor allele frequency from genomic data of red hard winter wheat doubled haploid (DH) (A), and Interior spruce populations: Prince George Tree Improvement Station (PGTIS) (B), Aleza Lake (C), and Quesnel (D).

Interior spruce population

The distribution of MAF was found with a greater degree of rare alleles in all three sites of Interior spruce. There were about 86%, 88%, and 91% of SNPs with MAF < 0.2, for, PGTIS, Aleza Lake, and Quesnel, respectively (Fig. 1B–D).

Summary of phenotypes and heritability estimates

Duster × Billings hard red winter wheat DH population

Boxplots of the four traits (GY, SDS, SKCSKW, and WHTPRO) across 3 years (2014–2016) for Oklahoma winter wheat are shown in Fig. 2A. Both trait distributions and phenotypic variation are quite different among the three years for all four traits. For observed phenotypic correlation between years, SDS exhibited the highest average phenotypic correlation (0.52–0.72, average = 0.59), while WHTPRO was the lowest (0.13–0.37, average = 0.22) (Table 1). Heritability for each year (\(h_{SE}^2\)), as well as cross-year estimates (\(h_{ME}^2\)), are also listed in Table 1. GY showed the highest and the most stable single-year heritability among the studied four traits (\(h_{SE}^2\) = 0.60–0.63). Conversely, the other three traits had much lower and varying heritability estimates. For multi-year heritability estimates (2014–2016), only GY and SDS have higher heritability than their single-year estimates (i.e., \(h_{SE}^2\) = 0.60–0.63, \(h_{ME}^2\) = 0.66 for GY;\(h_{SE}^2\) = 0.33–0.46, \(h_{ME}^2\) = 0.46 for SDS).

Fig. 2: Boxplots of phenotypes.
figure 2

A Winter wheat, grain yield (GY), SDS sedimentation value (SDS), kernel weight (SKCSKW), and wheat protein (WHTPRO) in each environment (Environment 1/2/3 = Year 2014/2015/2016); B Interior spruce, height (HT), diameter at breast height (DBH), resistance to drilling (WDres), and wood density in kg/m3 using X-ray densitometry (WDX-ray) in each environment (Environment 1/2/3 = PGITS/Aleza Lake/Quesnel, PGTIS, Prince George Tree Improvement Station).

Interior spruce population

Growth phenotypes, HT and DBH, varied among the three Interior spruce sites, while traits related to wood density (e.g., WDres and WDX-ray), Aleza Lake and Quesnel showed similarity in distribution and ranges (Fig. 2B). However, unlike the winter wheat population, the observed pairwise phenotypic correlations were relatively low for all studied traits (Table 1). Overall, WDX-ray had the highest average phenotypic correlation at 13% over the three sites (5–19%), and the lowest was found in the correlation with DBH (1–8%, average at 4%). The single-site heritability ranged from moderate to high (i.e., \(h_{SE}^2\) ranged from 0.26 to 0.56). Generally, traits measured in Quesnel showed higher heritability than the other two sites. The overall heritability estimated across the three sites was reduced to 0.07–0.20 (\(h_{ME}^2\), Table 1), with the highest in HT and the lowest in DBH.

Single and multi-environment predictions

For the performance of WK, we present the results with β that produced the highest prediction accuracy of using WKMAF_Pvalue for both data, i.e., β = 12 for wheat (Fig. S2) and β = 200 for spruce (Fig. S3).

Duster × Billings hard red winter wheat DH population

The average prediction accuracies of SE and ME models are shown in Fig. 3 for the DH hard red winter wheat population. In general, significant improvement in prediction accuracy can be seen with modeling across multiple environments (Fig. 3). For example, the prediction accuracy of GY using single year Gaussian kernel (SE_GK, in Fig. 3) ranged from 0.38 (2016) to 0.55 (2014). With the same Gaussian kernel, ME_GK trained the model with data from all years and generated 6–10% improvement in GY prediction accuracy. The gain from ME models can be as significant as a four-time increase (0.1 in SE_GK and 0.38 for ME_GK, for the SKCSKW in Fig. 3); substantial increase in ME_GK prediction accuracy was also found for SDS with an average increase of 35% over the SE_GK (Fig. 3). SDS also showed the highest gain of the estimated genetic variance from ME_GK vs. SE_GK (Table S3).

Fig. 3: Prediction performance of genomic selection models for winter wheat.
figure 3

Average Pearson’s correlation coefficients were collected over 50 replications of CV2 scheme; SE_GK, single-environment model with Gaussian kernel; ME_GK, multi-environment model with Gaussian kernel; ME_WKMAF, multi-environment model with weighted kernel (WK) by minor allele frequency (MAF); ME_WKPvalue, multi-environment model with WK by genome-wide association study p value; ME_WKMAF_Pvalue (with β = 12), multi-environment model with WK by both MAF and p value; GY, grain yield; SDS, SDS sedimentation value;, SKCSKW, kernel weight; WHTPRO, wheat protein.

Weighting with MAF and the association signal further improved the prediction performance by WK for ME models, with the exceptions of the reduced accuracy found in the ME_WKMAF model for GY and WHTPRO (Fig. 3). Among all three methods of WK, WKMAF_Pvalue performed similarly to WKPvalue, and both significantly outperformed WKMAF for all studied traits. Compared with the ME_GK models, the increase of prediction accuracy from ME_WKMAF_Pvalue models ranged from 1 to 3%, 4 to 5%, and 6 to 7% for SDS, SKCSKW, and WHTPRO, respectively (Fig. 3). For GY, the performance of ME_WKMAF_Pvalue and ME_GK was found similar in 2014 and 2015, but ME_WKMAF_Pvalue produced significantly higher prediction accuracy for 2016 (i.e., 31% higher than ME_GK, Fig. 3).

Interior spruce population

Different from what was observed in the wheat dataset, the advantage of ME_GK over SE_GK was not as evident for Interior spruce, which might be a result of the large amount of variance that cannot be accounted for in the multi-environment models (Table S4). The highest prediction accuracy for SE_GK was found in Quesnel for HT (Fig. 4); similar performance in HT was found for ME_GK as well. Additionally, wood quality traits showed consistent prediction accuracy for all Gaussian models, ranging from 0.19 to 0.31 for WDres and 0.24 to 0.28 for WDX-ray. In Table 1, DBH had the lowest multi-environment heritability estimates, which is further reflected by the average of 5% reduction in ME_GK prediction performance (Fig. 4).

Fig. 4: Prediction performance of genomic selection models for the interior spruce population.
figure 4

Average Pearson’s correlation coefficients were collected over 50 replications of CV2 scheme; SE_GK, single-environment model with Gaussian kernel; ME_GK, multi-environment model with Gaussian kernel; ME_WKMAF, multi-environment model with weighted kernel (WK) by minor allele frequency (MAF); ME_WKPvalue, multi-environment model with WK by genome-wide association study p value; ME_WKMAF_Pvalue (with β = 200), multi-environment model with WK by both MAF and p value; HT, height; DBH, diameter at breast height; WDres, resistance to drilling; WDX-ray, wood density in kg/m3 using X-ray densitometry; PGTIS, Prince George Tree Improvement Station.

In general, modeling with MAF and specific-trait association improved predictability, even when predicting phenotypes for genetically heterogeneous material across environments. Among the WK implementations, WKMAF_Pvalue outperformed the other WK models almost in all traits, and the WKPvalue model showed slight advantage for WDres prediction in the PGTIS site (Fig. 4). HT was the most predictable phenotype, with a moderate prediction accuracy in Quesnel, increased from 0.48 of SE_GK and 0.47 of ME_GK to 0.56 ME_WKMAF_Pvalue. The greatest gain by using the WK models was, however, found in DBH in Aleza Lake; the benefit of using WK models for DBH was, however, diminished in PGTIS (Fig. 4). The benefit of including genomics signals was not significant for WDX-ray. Due to the relatively indifferent prediction performance for WDX-ray across all models, the benefit of incorporating MAF and association signal was not observed. We suspect that the SNP predictors generated for Interior spruce are in weak LD with the underlying genes and QTLs.

Discussion

GS performance can be influenced by many interrelated factors, including trait genetic architecture, heritability, and the relatedness among individuals between training and testing populations (Crossa et al. 2017). When ME prediction across sites or growing seasons was conducted with a more defined set of genetic diversity like populations derived from controlled crosses, the advantage of incorporating available genetic correlation between environments was evident. As shown in Fig. 3, our ME_GK models using conventional GK demonstrated a consistent improvement over the SE model, showing a 4–38% gain in predictability for Oklahoma winter wheat DH population. The greatest improvement for this population was observed in SDS for 2015, the trait that also showed the most consistent cross-year prediction in Hu et al. (2019). Our results demonstrated that, even in the presence of identifiable environmental variability (\(h_{ME}^2\) ranges from 0.33 to 0.66, Tables 1 and S3), the benefit of employing ME prediction can be anticipated in this case, because of the model capacity to leverage genotype’s environment-specific effect.

Shown in Fig. 4, the ME_GK model, on the other hand, exhibited a slightly unfavorable performance for Interior spruce, except for WDX-ray whose accuracies were found indifferent with the SE model. Compared to our results, the prediction analysis using the same half-sib families in Gamel El-Dien et al. (2015) presented a much-reduced GS accuracy with cross-site validations, even when the prediction accuracy was calculated by correlating the breeding values with the GEBVs. The non-additive effect of these traits was found significant in Gamel El-Dien et al. (2018), with WDX-ray being the only exception. In this study, the multi-site heritability estimates (\(h_{ME}^2\)) ranged from 0.07 to 0.20 (Table 1); this small amount of additive genetic variance would be one of the leading attributes that hinder the performance of the ME_GK model.

The conventional GK using genetic markers is only able to capture the overall genetic similarity between individuals. Although the bandwidth parameter in GK can adjust the distribution of genetic similarity (Pérez-Elizalde et al. 2015), such tuning is uniform to all genetic markers. For Interior spruce, the genetic marker data revealed a much lower relatedness of these trees within each site, suggesting the actual sibling relatedness within families rarely met the half-sib relatedness assumption. In the case where various degree of genetic relatedness between individuals exists within the same family across sites, the strength of incorporating genetic correlation in the ME models using only the Gaussian kernel might be confounded by the heterogeneous genetic background, resulting in an accuracy slightly lower to the SE models (Fig. 4).

The bandwidth parameter tuning in conventional kernel models could potentially create a better mapping between the overall genetic distance among individuals to the phenotypic variation (Pérez-Elizalde et al. 2015). However, it does not reflect the trait’s genomic functional space, leaving important biological insights, such as allele frequencies and the underlying genetic architecture, out of the genome-to-phenome mapping in the GK models. GWAS studies have been a powerful tool to assessing the association between genetic variants and trait variations. The genetic variants identified indicate their functional roles or a close linkage with important genetic determinants for the traits of interest (Wu et al. 2011; Yan et al. 2014; Lin et al. 2016). Several studies have suggested prioritizing GWAS variants when creating the genomic relationship matrix could improve SE predictability of unrelated individuals (de los Campos et al. 2013; Ober et al. 2015; Morgante et al. 2018).

Despite the increases in GWAS statistical power afforded in large international consortia (Willer et al. 2013; Wood et al. 2014; Liu et al. 2015; Astle et al. 2016; Bomda et al. 2017), GWAS still only accounts for a fraction of heritability for most complex traits, a well-known phenomenon called “missing heritability” (Manolio et al. 2009). Genetic variants outside of the reach of the GWAS statistical power are considered to also contribute to the missing heritability (Speed et al. 2012), including common variants with weak effects, low-frequency (MAF 1–5%), and rare variants (MAF < 1%) of small to modest effects, or their combination (Agarwala et al. 2013). When the true causative genetic variants remain unknown, GS has been proven more effective than classic marker-assisted selection. This is because GS employs all available markers as a “compete modeling” methodology for estimating trait performance (Jia 2017). Compared to phenotypic selection, GS could lead to the acceleration of annual inbreeding rate and the loss of genetic diversity as it encourages selecting individuals with high GEBV early in variety improvement programs and those closely related to the training populations (Bassi et al. 2016; Doekes et al. 2018; Forutan et al. 2018). In order to provide stable predictability across populations, GS might also contribute to rapid fixation of genomic regions where consistent marker effects across populations can be identified (Clark et al. 2011; Pszczola et al. 2012; Allier et al. 2019). When the breeding decision is made to optimize short-term genetic gain with conventional GS, rare but favorable alleles could be overlooked. That will essentially reduce selection accuracy and genetic gain in the long term. Demonstrated in simulation studies, up-weighting such alleles would provide 8–30.8% greater long-term gain than that of un-weighted prediction methods (Jannink 2010; Liu et al. 2015), further advocating the WK approaches proposed in this study for the long-term reliability of GS (Rutkoski et al. 2015; Zhang et al. 2018; Ramasubramanian and Beavis 2021).

Here, we presented a flexible GS framework capable of incorporating important genetic attributes to breeding populations and trait variability while addressing the shortcomings of conventional GS models. Shown in Figs. 3, 4, the advantage of incorporating trait- and population-specific genetic characteristics, like p-values of GWAS and MAF, was evident. The MAF component in our WK models aided in preserving rare but favorable variants, which are usually underpowered in GWAS, and in some cases, not even included in the analysis (Pongpanich et al. 2010; Marees et al. 2018). In addition, the WK considers the contribution of genetic markers to the trait-specific G × E. By further differentiating the effects of SNPs between growing environments, GS predictability can be improved for all traits studied for DH genotypes, as well as for the half-sib families of Interior spruce with considerable degree of environmental variability across sites. Finally, the Bayesian kernel methodology presented in the present study offers the flexibility required for predicting multiple populations across environments without using genetically clonal material. This kernel implementation can further encourage integration of other predictors, such as variables in environmental typing (Gianola 2021), to further improve GS performance of highly genetically heterogeneous populations across environments.