Introduction

Maize (Zea mays L.) yields in sub-Saharan Africa (SSA) are amongst the lowest in the world (FAO 2021). Average yields in this region range from 1 to 3 t ha−1, well below the global average of around 5 t ha−1 (Prasanna et al. 2020, 2021). Over a quarter of households in SSA are deemed persistently food insecure, with that figure climbing to 40% during the dry season (Fraval et al. 2019). Furthermore, demand for maize in this region is expected to triple by the year 2050 as a result of rapid population growth (Ekpa et al. 2018). While drought stress and increased climate variability are linked to low yields, low fertilizer use is also a key driver of the maize yield gap in this region (Tittonell and Giller 2013). While average N fertilizer use in SSA by smallholder farmers has increased over the past decade, however it remains very low at 17.9 kg N ha−1 (Jayne and Sanchez 2021). This difference is particularly pronounced on female managed plots which tend to receive less nitrogen inputs than male management plots within a farm (Cairns et al. 2021; Farnworth et al. 2017).

For vulnerable populations, increased dietary diversity and consumption of nutrient-rich food is essential for increased nutrition (Poole et al. 2021). However, in more than 20 countries in SSA, maize accounts for more than 30% of calories consumed (Goredema-Matongera et al. 2021). In Lesotho, Malawi, Zambia, and Zimbabwe, the average per capita consumption of maize is more than 100 kg per person per year (i.e., 300 g per day), roughly half of daily calorie intake (Cairns et al. 2021; Prasanna et al. 2021). High-quality protein sources include eggs, meat, and dairy products, but vulnerable populations in SSA have limited access to these foods and rely heavily on maize as a primary source of protein (Nuss and Tanumihardjo 2011). Average protein supply from maize is 10 g per capita per day in SSA, with up to 35 g per capita per day in southern Africa (FAO 2021). In Burkina Faso Eswatini, Ethiopia, Lesotho, Malawi, Mozambique, Nigeria, Tanzania, Togo, Zambia and Zimbabwe, maize provides a greater source of protein than protein derived from animal sources (FAO 2021).

Generally, maize grain has low oil (4.4%) and protein (9.1%) contents but a relatively high starch content (73.4%) basing on dry matter measurements (Dei 2017). In temperate maize, breeding has led to significant increase in grain yield. However, grain protein content is estimated to have decreased 0.3% per decade and grain starch content has increased at 0.3% per decade (Duvick 2005). Grain oil content has also reduced over time in temperate maize (Scott et al. 2006). Grain quality is linked closely linked to soil quality (Wood et al. 2018). Maize protein content increases as the level of N applied increases (Zhang et al. 2020). To date, studies looking at the effect of reduced N level on grain quality have used significantly higher levels of N than is relevant to smallholder farmers in SSA. Under low nitrogen stress in SSA environments, previous maize grain quality assessment studies, such as Abu et al. (2021), Ngaboyisonga and Njoroge (2014), and (Oikeh et al. 1998), used N rates ranging from 30 to 120 kg ha−1. Such application rates are higher than the average N application rates (which is between 12 and 16 kg ha−1 (Heffer and Prud’homme 2015; Sheahan et al. 2014) in smallholder agriculture in SSA. Incorporating grain quality traits into breeding can involve significant costs associated with grain nutrient analyses, morphological characterizations, and associated trait complexities stymie progress in improving maize grain quality. The high-cost requirement of measuring grain quality traits stems from the wet chemistry procedures required to create near-infrared spectroscopy (NIRS) calibration curves, which are relatively expensive. There is a significant opportunity to assess and employ the potential of genomic selection in the improvement of grain quality traits in maize. This reinforces the need for more rapid progress to improve maize grain quality components (particularly, protein, starch, and oil content) through accelerated and more efficient breeding enabled by molecular markers.

The integration of genomic tools such as genomic-wide association studies (GWAS), linkage mapping and genomic selection (GS) with traditional breeding approaches have increased the efficiency of grain quality and Low N tolerance selection. These techniques have been used to identify causal variants for Low N tolerance. Linkage mapping is a common method for locating quantitative trait loci (QTL) based on a segregating population derived from a cross of two parental lines with significantly divergent phenotypes (Xiao et al. 2017). Linkage mapping can be combined with GWAS to provide a more comprehensive strategy for identifying markers linked with a trait of interest, by the discovery of trait linked markers by GWAS and their validation through linkage mapping. The most robust markers can then be employed for marker-assisted selection (MAS). However, the accuracy with which genetic maps are constructed is dependent on the mapping population since doubled haploid (DH) lines, near-isogenic lines (NILs), recombinant inbred lines (RILs), and backcross lines are extremely effective yet to date labour-intensive and time-consuming to generate (Rao et al. 2021). GS has been highlighted as a powerful option for selecting polygenic traits that are difficult to select through MAS. GS can accomplish this by employing genome-wide dense markers for predictions, and therefore can support association analyses to determine the genetic basis of grain yield and quality related traits (Bentley et al. 2014). Galli et al. (2020) reported that GWAS could identify both additive and dominant genetic effects that influence Low N tolerance in maize.

To understand how low N stress affects grain yield and grain quality traits such as grain protein, starch, and oil content, this study was performed using a tropical maize population under low N and optimum conditions across multi-location field trials in Kenya and South Africa. The study had the objectives of (i) Assessing the genetic architecture of low soil-N tolerant maize test crosses using their responses to grain quality and yield traits under two management conditions (optimum and low soil N); (ii) Identifying the significant quantitative trait nucleotides (QTNs) and putative candidate genes and QTLs for quality traits in tropical maize germplasm tested in multiple locations; and (iii) Assessing the potential of utilizing GS in the improvement of grain quality traits.

Materials and methods

Germplasm, experiment design and management

An association mapping panel of 410 tropical maize lines developed under CIMMYT’s Improved Maize for African Soils (IMAS) project (Ertiro et al. 2020a) in collaboration with the Kenya Agricultural and Livestock Research Organisation (KALRO) and Agricultural Research Council (ARC, South Africa) was used. The study also evaluated two DH populations from CIMMYT’s Heterotic group B (221 lines of CML550/CML504, 115 lines of CML550/CML511), and two DH population from CIMMYT’s Heterotic group A (175 lines of CML505/LaPostaSeqC7-F64-2-6-2-2 and 131 lines of CML536/LaPostaSeqC7-F64-2-6-2-2) (Ertiro et al. 2020b). Test cross hybrids were generated by crossing all inbred lines with a broadly adapted CIMMYT maize inbred line tester from the opposite heterotic group.

All testcross progenies for both association panel and DH populations were evaluated in three optimal and six low N-stressed sites (Table 1). Experiments conducted in the same site over several years were classified as separate environments (Das et al. 2019; Ertiro et al. 2020a). Kiboko lies within longitudes 37.7235°E and latitudes 2.2172°S, at an elevation of 975 m above sea level. The station receives between 545 and 629 mm of rainfall split in two seasons and lies in a hot, semi-arid region with annual temperature ranging from 16.0 to 33.6 °C. The soils are well drained, very deep, dark reddish brown to dark red, friable sandy clay to clay (Acri-Rhodic Ferrassols) developed from undifferentiated basement system rocks, predominantly banded gneisses (Ertiro et al. 2022). Other location Embu lies at an elevation of 1350 m above sea level and receives an average of 893 mm of rainfall and lies in the foothills of Mount Kenya with annual temperature ranging from 15.0 to 27.9 °C. Cedara research station in South Africa lies 1037 m above sea level and receives an average of 990 mm of rainfall annually. All testcross progenies were evaluated in an alpha-lattice design with two replications. The sites for low N trials were depleted for soil N content by growing sorghum for several years without applying any external N fertilizer. Experiments were planted in one-row plots, with a planting density of 5.33 plants/m2 (Kenya and South Africa). In each location, two seeds per hill were sown, then thinned to one after emergence. At planting, triple phosphate (46% P2O5) was applied to the low N trials at the rate of 50 kg P2O5/ha. On optimum trials, diammonium phosphate (DAP) fertilizer was used at the rate of 54 kg N per hectare. Optimum trials were top-dressed with urea fertilizer at the rate of 138 kg N per hectare three weeks after planting. All trials under both optimum and low-N were irrigated as required to avoid any moisture stress. Trials under both conditions were kept weed-free and other standard agronomic practices were conducted.

Table 1 Description of the field trials used in the study

Measurements of grain yield and quality traits

Data were recorded for grain yield and quality traits (i.e., protein, oil, and starch contents). Shelled grain yield was measured in kilograms (kg) and converted to tons per hectare and reported at 12.5% moisture. Protein, starch, and oil content were measured using a FOSS Infratec TM 1241 from 500-g samples of grain taken from each plot and are reported as a percentage of whole grain. The FOSS Infratec is a non-destructive whole-grain analyzer that uses near-infrared reflectance (NIR) to estimate quality parameters. Five 100-g subsamples were assayed and the mean reading for each parameter was reported per plot.

Phenotypic data analysis

Analyses of variance for each biparental population and IMAS panel at each and across locations under optimum and Low-N conditions was performed in the R program embedded in META-R (Alvarado et al. 2020) and ASREML-R (Gilmour et al. 2009). The linear mixed model with the restricted maximum likelihood (REML) was used to calculate all variance components. The study treated replication as fixed effect and all other treatment effects as random. On an entry-mean basis, the broad-sense heritability (H2) was estimated using the genotypic to phenotypic variance ratio from the derived variance components. Furthermore, to determine the genotypic effects of the investigated lines for each and across environments, best linear unbiased estimation (BLUE) and best linear unbiased prediction (BLUP) were obtained. For GWAS and linkage mapping, BLUPs were used. On the other hand, BLUEs were used for GS analyses. The classification of the genotypic correlation coefficients followed the guidelines provided by Profillidis and Botzoris (2019). To determine the impact of low N stress on the aforementioned traits, we used a t-test to compare the mean values of the two management conditions. We also determined the percentage change (i.e., decrease or increase) in grain quality trait performance.

Genotyping-by-sequencing (GBS)

DNA was extracted according to the CIMMYT high-throughput mini-prep Cetyl Trimethyl Ammonium Bromide (CTAB) method (Semagn (2014). Following the protocol presented in Elshire et al. (2011), maize DNA samples were genotyped using a restriction enzyme (ApeKI) and 96-plex multiplexing at the Institute of Biotechnology at Cornell University, USA. The Institute of Genomic Diversity (IGD) at Cornell provided raw GBS data for a maximum of 955,120 SNP loci spread throughout the 10 maize chromosomes (Ertiro et al. 2020a). Raw data was filtered for linkage mapping according to the criteria used in Ertiro et al. (2020a) of > 10 percent minor allele frequency (MAF) and no missing data. Furthermore, the genotype data were filtered for GWAS using the Trait Analysis by Association, Evolution, and Linkage (TASSEL v.5.2.7.2, Bradbury et al., 2007) software, with a baseline count of SNPs on 90% and a MAF of > 5% of the sample size as presented in Ertiro et al. (2020a). Principal Component Analysis (PCA) was carried out in TASSEL (v.5.2.7.3), as were genetic distances and kinship.

Genome-wide association study analysis

In natural populations or association panels, the population structure and relative kinship cause high level of spurious positives during association studies. To assess the effect of population structure, PCA, and the relative kinship (K) on association results in IMAS panel, we used the following statistical models: (1) uncorrected genotypic data only (GLM with G only); (2) GLM with PCA + G; (3) a mixed linear model (MLM) with PCA + K + G; (4) and FarmCPU model. G = genotype (fixed), PCA = three principal components (fixed), K = kinship matrix (random). The R package ‘FarmCPU-Fixed and random model Circulating Probability Unification’ (Liu et al. 2016) was used for GWAS analysis for all traits. FarmCPU utilized the first three PCs derived by TASSEL as input for GWAS. The kinship was computed using FarmCPU’s default kinship algorithm as presented in Ertiro et al. (2020a). The analysis was performed with maxLoop of five, p threshold of 0.1, QTN threshold of 0.1 and MAF threshold of 0.05. The maxLoop refers to the total number of iterations used. The p threshold, QTN threshold and MAF threshold refers to p values selected into the model for the first iteration, the p value selected into the model from the second iteration and the minimum MAF of SNPs used in the analysis. False discovery rate threshold of 0.1 was used to set a significant level in Manhattan plots. The Manhattan and quantile–quantile (QQ) plots, GWAS findings, and a table of marker effects of user-provided variables were all produced by the FarmCPU using the “GAPIT” function. To annotate putative candidate genes for traits under study, the physical positions of the significant SNPs were compared with the Maize B73 reference genome version 2 (RefGen_v2), available at the MaizeGDB database (www.maizegdb.org) and functional gene annotations were retrieved from http://ensembl.gramene.org/Zea_mays. The presence of the protein-coding genes was searched within the range of 20 kb (10 kb upstream and downstream) in the vicinity of the detected significant SNPs.

QTL mapping and genomic prediction

The four DH populations were genotyped with GBS and data was further filtered to a manageable size using TASSEL software with > 0.10 MAF, < 5% heterozygosity, and 90% the minimum count of the total size (Bradbury et al. 2007; Sitonik et al. 2019). In all the populations, homozygous marker loci for both parents and uniformly distributed polymorphic markers between parents were retained. Linkage maps were constructed by using QTL IciMapping version 4.1 (Meng et al. 2015) in all four DH populations. BIN is an inbuilt tool implemented in QTL IciMapping was used to remove the highly correlated SNPs. This resulted into retain 2699, 1962, 1985 and 2086 high-quality SNPs in CML550/CML504, CML550/CML511, CML505/LaPostaSeqC7-F64-2-6-2-2 and CML536/LaPostaSeqC7-F64-2-6-2-2, respectively. These SNPs were used to construct linkage maps using the MAP function. IciMapping used the grouping, ordering, and rippling steps to construct a linkage map. The ICIM is an effective two-step statistical approach that allows separation of co-factor selection from interval mapping process, in order to control the background effects and improve mapping of QTL with additive and dominance effects. The Kosambi genetic distance mapping function which assumes that recombination events influence the occurrence of adjacent recombination was used.

For QTL mapping, we used IciMapping 4.1 based on biparental populations (BIPs) module with inclusive composite interval mapping (Meng et al. 2015). BLUP values across environments for each trait in each the DH populations were used in QTL detection analysis. The mapping populations were grouped by the SNPs and the significant difference between the means (P-value < 0.0001) was detected based on the markers that were linked to a QTL controlling the selected target trait (Collard et al. 2005). The highest peak of one LOD that supports the confidence interval was used to declare the significance of the QTL map position on both sides of the QTL (Hackett 2002). A LOD threshold of 3.0 with a scanning step of 1 cM were used to declare significant QTL. Stepwise regression was adopted to determine the percentages of phenotypic variance explained (R2) by individual QTL and additive effects at LOD peaks. The phenotypic variation explained (PVE) by each QTL and together for all QTLs for each trait was estimated. The origin of the favourable allele for each trait was identified based on the sign of the additive effects of each QTL.

BLUEs across environments for each trait in each population were used in the GS analysis. The Ridge-regression BLUP (RR-BLUP, Zhao et al. 2012) with fivefold cross-validation for each trait was used for the analysis. A sample of 4000 SNPs with all data values, equally distributed throughout the genome, and MAF > 0.05 was chosen from the GBS data for the IMAS panel and all four DH populations. Individual DH population and the IMAS set were sampled to form a training and prediction set. The prediction accuracy was calculated as the correlation between the observed phenotypes and genomic estimated breeding values (GEBVs) divided by the square root of heritability (Dekkers 2007). In each population, 100 iterations were done for the sampling of the training and validation sets.

Results

Effect of low N stress on grain yield and quality traits

There was significant variation in protein, starch and oil content, and grain yield within all four biparental DH populations and the IMAS panel under optimum and low N stress conditions (Fig. 1 and Table 2). In the IMAS panel and CML505/LaPostaSeqC7-F64-2-6-2-2 DH pop, yield under low N stress was reduced by 59% and 48%, respectively. In DH pop CML550/CML511, the mean yield under low N stress was 5.45 t ha−1; however, this was a reduction of 47% relative to optimal conditions. Low N stress significantly (p < 0.01) reduced protein and oil content (except in DH pop CML505/LaPostaSeqC7-F64-2-6-2-2) but had no significant effect on starch content. Although the level of N stress and therefore the reduction in grain yield was the lowest in DH pop CML550/CML511, both protein and oil content had the largest reduction in this population under low N stress.

Fig. 1
figure 1

Phenotypic distribution for grain yield and quality traits evaluated under optimum and Low N stress conditions. The sky blue and red colour plots represent the trials conducted under optimum and low N stress conditions, respectively. DH pop1 = CML550/CML504; DH pop2 = CML550/CML511; DH pop3 = CML505/LaPostaSeqC7-F64-2-6-2-2; and DH pop4 = CML536/LaPostaSeqC7-F64-2-6-2-2

Table 2 Genetic parameters for IMAS panel and four DH Populations evaluated under optimum and low N stress conditions in multiple environments

The genotypic, environmental, and genotype-environment interaction effects (G, E and G x E, respectively) were significant at p ≤ 0.05 for yield and quality traits (Table 2). For protein, starch, and oil content under optimal conditions, the magnitude of genotypic variance was greater compared to low N stress conditions. Under low N conditions, the effect of genotype, environment, and G x E interactions on oil content was significant across all genotypes tested. Interestingly, under the same conditions, the genotypic effects on protein content were only significant in DH pops CML505/LaPostaSeqC7-F64-2-6-2-2 and CML536/LaPostaSeqC7-F64-2-6-2-2. The G x E effects on protein content and starch content in DH pop CML550/CML511 were significant. The zero estimates of G x E interactions for starch and protein content (observed on DH pop CML550/CML511 under low N stress) indicate that genotypic performance for these traits was stable across the tested environments. H2 values of each trait under both optimal and low N stress are presented in Table 2. In general, H2 of all traits was lower under low N stress than optimal, with the exception of starch content which increased under low N stress across all populations.

Grain yield was negatively correlated with protein content across populations, regardless of N stress level (Table 3). Similarly, starch content showed a negative correlation with protein and oil content across genotypes and management options. A weak positive correlation was reported between protein and oil content across populations and N levels. The only exceptions were in DH pop CML550/CML504 under optimum conditions where oil content was significantly (at p < 0.01) and negatively correlated to protein content (r = − 0.92**).

Table 3 Genetic correlations between BLUPs of grain yield and grain quality traits evaluated under optimum and low N stress management

Protein content had a negative correlation with grain yield (r = − 0.41**) and starch content (r = − 0.54**) in the IMAS panel under optimum conditions. Similarly, a weak positive correlation was observed in the IMAS panel, DH pops CML550/CML504 and CML505/LaPostaSeqC7-F64-2-6-2-2 between protein content and oil content under low N stress. Starch and oil contents were shown to be significantly (p < 0.05) and negatively correlated under optimum (r = − 0.65) and low N stress (r = − 0.18) in the IMAS panel. The genotypic correlation coefficients for DH pops CML505/LaPostaSeqC7-F64-2-6-2-2 and CML536/LaPostaSeqC7-F64-2-6-2-2 showed that oil content under optimum conditions had no correlation with protein content (r = 0.00). As demonstrated in Table 3, further significant (p < 0.05 and p < 0.01) trait correlations were established among the phenotypic parameters measured across the management conditions for each set of genotypes. The observed correlations between grain quality and yield can be useful for selection decisions or trade-offs in genotypic selection.

GWAS analyses

From GBS, 337,113 SNPs were produced for the 410 genotypes, in which 77.1% (259,798) remained after filtering using the > 5% MAF and 10% missing per marker criteria (Supplementary Figure S1). The kinship relations among the IMAS panel were determined using the filtered 259,798 SNP markers and depicted as a genetic cluster, indicating that the panel of genotypes are split into four potential genetically differentiated subgroups. The heatmap of the panel’s kinships was used to predict the magnitude of the existing relationships in the genotypes: this established that the genotypes were not closely related and that there is no strong population structure (Supplementary Figure S2). Further partition of the population structure of the IMAS panel using STRUCTURE 2.3.4 is presented in an earlier study by Kibe et al. (2020a) and Gowda et al. (2015). PCA was carried out using 259,798 high-quality SNPs (Supplementary Figure S3). The first principal component (PC1) accounted for roughly 4.5% of the overall variation, whereas the second principal component (PC2) explained 2.5% (Supplementary Figure S3). Calculation of genome-wide LD using 259,798 SNPs showed a significant decline in LD as genetic distance rose, with different rates of attenuation for each of the ten chromosomes (Supplementary Figure S4). Association analyses for grain yield and quality traits evaluated under optimum management were performed to evaluate the effects of different models on the control of false associations (Supplementary Figure S5). For all four traits, the observed P values from the GLM(G) and GLM (G + PCA) models showed the higher deviation from the expected P values is possibly due to either no associations or more false positives were detected. The P values from the MLM (PCA + K) and FarmCPU models were similar and close to the expected P values and are more effective in controlling the false associations (Supplementary Figure S5). With MLM model, between Kinship and some of the markers, the confounding effect is more severe and may results into overfitting of the model. On the other hand, FarmCPU model which uses both the fixed effect model and the random effect model iteratively, able to completely remove the confounding from kinship by using a fixed-effect model without a kinship derived either from all markers, or associated markers. This process overcomes the model overfitting problems of stepwise regression (Liu et al. 2016). Therefore, further in this study we used the results only FarmCPU model for both optimum and low N management conditions. Figures 2 and 3 depict the GWAS findings for protein, starch, and oil content, and yield across the two N managements as Manhattan and Q-Q plots of p-values evaluating the anticipated and observed − log10 p-values. Sixty-one SNPs were significantly (P = 2 × 10–5, p = 0.1 False Discovery Rate (FDR)) associated with the protein, starch, and oil content, and yield under optimal conditions and were spread across 10 chromosomes (Table 4). Under low N conditions, 42 SNPs are linked to the aforementioned traits.

Fig. 2
figure 2

Manhattan and quantile–quantile plots generated using a mixed linear model for grain yield (A), Protein content (B), starch content (C) and Oil content (D) under optimum management. The significance level (P = 2 × 10–5 at 0.1 False Discovery Rate (FDR)) is represented by the dashed horizontal line. The X-axis shows the position of SNPs along the 10 maize chromosomes, with various colours indicating distinct chromosomes. The Y-axis shows the − log10(P observed) in each analysis

Fig. 3
figure 3

Manhattan and quantile–quantile plots generated using a mixed linear model for grain yield (A), protein content (B), starch content (C) and oil content (D) under low N stress management. The significance level (P = 2 × 10–5 at 0.1 False Discovery Rate (FDR)) is represented by the dashed horizontal line. The X-axis shows the position of SNPs along the 10 maize chromosomes, with various colours indicating distinct chromosomes. The Y-axis shows the − log10(P observed) in each analysis

Table 4 Chromosomal positions and SNPs significantly associated with grain yield, protein, starch, and oil content detected by SNP-based

Under optimal conditions, three SNPs linked with protein content were significant on chromosomes 1 (S1_17679954 and S1_214242607) and 10 (S10_114836465). Under low N stress, however, two different SNPs S3_198394847 and S4_120988951 were associated with protein content. Starch content (low N) was associated with five SNPs, the most significant of which was S5 10542862. Eight SNPs on chromosomes 2 (S2_174345463 and S2_174345465), 3 (S3_180044790), 5 (S5_10542862), 6 (S6_5158703 and S6_60978968), 7 (S8_3430590), and 8 (S7_14465153) were linked with the starch content under optimum conditions. Under optimal conditions, twelve SNPs were significantly associated with oil content, with one-third of these loci located on chromosome 6. Twelve SNPs, with loci on all chromosomes except 4 and 7, were significantly linked with oil content under low N. For starch and oil content at low N conditions, only the significant SNP on chromosome 6 (S6_60978968) was co-detected. The proportion of detected SNPs for the other traits are presented in Table 4. The Q-Q plot for grain yield and oil content under optimum conditions and oil content under low N stress revealed that some observed P-values were more significant as the marker points migrated from the dotted red line towards the y-axis.

To elucidate the molecular and physiological mechanisms controlling grain quality traits under optimum and low N conditions, candidate genes were identified (Harper et al. 2016). On all chromosomes, a total of 51 candidate genes were discovered (Table 4). The lowest number of candidates (2) and the highest number (12) were related to protein content under low N and oil content under both optimum and low N, respectively. From these candidates, 80.39% (41 genes) were functionally annotated, whereas 19.61% (10 genes) were classified as unknown proteins. The study revealed four candidate genes with protein serine/threonine kinase activity that play a role in soil N response. Under optimum conditions, GRMZM2G159307 and GRMZM2G104325 were encoded as ATP binding proteins for grain yield and starch content, respectively. GRMZM2G10816 (yield), GRMZM2G070523 and GRMZM2G080516 (oil content) were associated with DNA biosynthesis under low N stress conditions. Under both optimal and low N circumstances, GRMZM2G033694 was annotated in the Histone-lysine N-methyltransferase family. Genes coding for shoot apex development were discovered to be associated with grain yield, protein, starch, and oil content under low N stress.

QTLs associated with grain yield and quality traits

The four populations used in this study for linkage mapping were also used in our earlier study (Ertiro et al. 2020a) which includes detailed information about genetic maps. Table 5 shows the detected QTLs and their positions and genetic effects. In DH pop CML550/CML504, two QTLs each were detected for grain yield and starch content, three QTL for protein content and five QTL for oil content under low N stress. The PVE by these QTL was varied from 4.67 to 22.19% and together the total PVE was varied from 12.5% for grain yield to 47.9% for oil content. In DH pop CML550/CML511, one QTL each were detected for grain yield, starch content and oil content under low N stress. In DH pop CML505x LaPostaSeqC7-F64-2-6-2-2, five QTL were detected for grain yield with one QTL on chromosome 3 having a major effect with 12.17% of PVE. For protein content nine QTL were detected with all individually having minor effects except a QTL on chromosome 3 with 11.78% of variance explained. For starch content, three QTL each were detected under optimum and low N conditions with two major effects QTL on chromosome 8. For oil content three QTL were detected under optimum and six QTL were identified under low N conditions with one common QTL on chromosome 2 across management conditions. Four major effect QTL were identified for oil content on chromosomes 2, 4 and 5 which explained > 10% of the phenotypic variation (Table 5). In DH pop CML536xLapostaSeqiaF64, one QTL each were detected for protein and oil content and three QTL were detected for starch content, with one major effect QTL at chromosome 4 which contributes 20.3% of phenotypic variation for oil content.

Table 5 Number of QTL detected for grain yield, grain protein content, starch content and oil content under optimum (opt) and low-nitrogen (LN) stress across environments in four DH populations

Comparison of the QTLs across association mapping panel and DH populations revealed several QTLs overlapped for same traits for across optimum and low N conditions, as well as for multiple traits (Supplementary Figure S5, S6). In Chromosome 1, two regions, between 209 to 214 Mb and 268 to 280 Mb had QTL for more than one trait. In chromosome 2, bin 2.03 harboured QTL for both GY and oil content, whereas bin 2.06 has QTL for both Starch content and oil content under both optimum and low N conditions. At chromosome 3, bin 3.06 had QTL for grain yield, protein content, oil content and starch content (Supplementary Figure S5). Further QTL clustering for more than one trait was observed on chromosome 4, bin 3.06 and 3.07, on chromosome 5, at bin 5.02 and 5.05, on chromosome 6, at bin 6.02 and 6.06 and on chromosome 10 at bin 10.06. Significant SNP, S1_269023923 associated with oil content detected through GWAS was co-located with QTL (qOC_01_269) detected in DH pop1. Another SNP S5_11883140 detected for GY in GWAS panel was co-located with QTL qGY_05_15 detected on DH pop2 (Tables 4 and 5).

The RR-BLUP model (Endelman 2011) was used to estimate the performance of maize genotypes for grain quality traits for each population (Fig. 4 and Supplementary Table S2). Under low nitrogen conditions, average prediction accuracies across the studied genotypes were higher for oil content (0.78) and lower for grain yield (0.08). In IMAS panel we observed the prediction accuracy of 0.41, 0.38. 0.39 and 0.44 under optimum and 0.35, 0.35, 0.41 and 0.56 under low N conditions, respectively. Interestingly, in DH pop CML550/CML504 outperformed other DH populations in terms of genomic prediction accuracy. Under low N, CML550/CML504 had the best prediction accuracy for protein (0.66), oil (0.73), and starch (0.7) content. The prediction accuracy for protein content was highest in DH pop CML505/LaPostaSeqC7-F64-2-6-2-2 under optimum (r = 0.69) and for DH pop CML550/CML504 under low N stress (r = 0.66). For starch content under low N, prediction correlation was highest for CML550/CML504 (r = 0.70) followed by the CML536x LaPostaSeqC7-F64-2-6-2-2 (r = 0.56), IMAS panel (r = 0.41), CML550xCML511 (r = 0.26) and CML505x LaPostaSeqC7-F64-2-6-2-2 (r = 0.23). CML536x LaPostaSeqC7-F64-2-6-2-2 had the highest prediction correlation for oil content under low N (r = 0.78), followed by CML550/CML504 (r = 0.73) and CML505x LaPostaSeqC7-F64-2-6-2-2 (r = 0.71). CML550xCML511 had the lowest prediction for grain yield (r = 0.08), protein (r = 0.17) starch (r = 0.16), and oil content (r = 0.11).

Fig. 4
figure 4

Genome-wide prediction accuracies for grain yield and quality traits in IMAS panel (A), DH pop2 = CML550/CML511 (B), DH pop3 = CML505/LaPostaSeqC7-F64-2-6-2-2 (C), DH pop1 = CML550/CML504 (D) and DH pop4 = CML536/LaPostaSeqC7-F64-2-6-2-2 (E). Blue and red colour box plots indicate traits were evaluated under optimum and low N stress conditions, respectively (color figure online)

Discussion

Significant levels of malnutrition (Christian and Dake 2021) and food insecurity (Giller 2020) continue to be experienced by maize-dependent smallholder farming populations in SSA that cultivate in nitrogen-depleted soils. Unravelling the genetic architecture of grain yield and quality traits through GWAS and GS is critical for the development of superior genotypes conferring high expression of grain quality traits both under optimum and low N stress. This study aimed to understand the underlying genetics of low N stress on grain quality traits by combining GWAS with the IMAS panel, QTL detection, and GS in four bi-parental populations. Grain quality traits, notably protein, starch, and oil content, are critical for reducing the incidence of undernutrition in SSA. Understanding the performance of grain quality traits and associated genetic markers under low N stress can aid in the development of maize lines with high protein, starch, and oil content.

Phenotypic evaluation under optimum and low N stress

Phenotypic analyses showed that protein, starch, and oil content were significantly decreased under low N stress compared to optimal conditions across all tested genotypes. This is consistent with the findings of Liu et al. (2008), who found that in lower N conditions, protein and oil content are considerably reduced. However, the same research reported the opposite for starch content increased under lower levels of N stress. According to Jahangirlou et al. (2021) and Simić et al. (2020), high N conditions are associated with higher protein content and yield. On the other hand, low N substantially decreases protein concentration and all zein fractions apart from β-zeins, according to research conducted in Zimbabwe by Shawa et al. (2021). The impact of soil nutrient management on oil content was significant in the study of Ray et al. (2019). That study reported that using N as a component in NPK blends increased the quantity of saturated fatty acids while decreasing the percentage of unsaturated fatty acids in grain maize oil. Kaplan et al. (2017) suggested that N fertilizer application in combination with adequate irrigation has a favourable effect on the oil content. However, in SSA due to the impoverished economic situation of smallholder farmers, N fertilization is not currently an accessible solution to combat endemic undernutrition. Therefore, maize lines showing high protein, starch and oil content under low N stress should be considered for incorporation into maize breeding programs targeting the SSA region.

In low N stress environments, genetic variability is crucial for the effective selection of enhanced grain quality traits in maize. Ertiro et al. (2020a) asserted that, due to the intrinsic unpredictability of various traits of interest, phenotypic data for trials conducted under low N conditions typically show poor heritability. However, under both optimum and low N environments, our study estimated wide genetic variances and moderate to high broad sense heritabilities. Estimates of heritability ranging from moderate to high imply that the traits have the potential to be enhanced by recurrent selection (Gowda et al. 2021). The influence of G, E, and G x E interactions on oil content was significant under low N conditions across all genotypes examined. The genotypic effects on protein content were significant in some of the genotypes tested (CML505/LaPostaSeqC7-F64-2-6-2-2 and CML536/LaPostaSeqC7-F64-2-6-2-2) under low N conditions. The G × E effects of CML550/CML511 on protein and starch content were significant. The detected significant genotypic variation for the assessed traits in this study indicated the possibility of selecting for improved protein, starch, and oil content under low N stress. Among the three grain quality traits investigated, starch content had the lowest H2 estimate, whereas oil content had the highest H2 estimate under low N conditions. Oil content’s high broad-sense heritability suggests that its narrow-sense heritability may be even greater, implying that significant genetic gain for this trait is attainable.

Grain yield had a negative genotypic correlation with protein content in all populations, regardless of N level. This supports Arisede et al. (2020)’s findings that increased grain yield was associated with decreased grain protein content in both susceptible and tolerant maize hybrids when residual soil N was low, despite the fact that tolerant hybrids showed a substantially smaller loss in grain protein content. It is well known that choosing between yield and quality is hard in breeding. The observed relationship between grain yield and quality traits in this research under both optimal and low N conditions imply that selecting for grain yield alone will not increase protein, starch, or oil content. On the other hand, protein content had a significant negative correlation with starch content in all populations, which is consistent with previous results (Liu et al. 2008; Zheng et al. 2021). Thus, to increase grain quality, particularly under low N stress, there is a need to select for both grain yield and grain quality. Obviously, this would be very expensive, and a negative relationship makes breeders balance in the selection of these traits hence the need to investigate the potential of molecular breeding in the improvement of these traits.

Association mapping and candidate genes

Association studies targeting protein, oil, and starch content in maize have been conducted utilizing a range of genotypes and marker sets (Alves et al. 2019; Cook et al. 2012; Zheng et al. 2021). The use of GWAS in maize genetics has been highly effective in discovering causal genes for grain quality traits (Zheng et al. 2021). In particular, GWAS is an effective technique for mapping loci linked with complex plant traits in genetically heterogeneous populations (Deng et al. 2021). The power of detection of GWAS is dependent on the LD between the markers and QTL. In outcrossing plant species such as maize, LD declines at a short distance and rapidly (Dinesh et al. 2016). In this study, the LD declined rapidly across physical distance (Kibe et al. 2020a), showing that the IMAS panel has significant genetic diversity and was, therefore, suitable for GWAS.

Candidate genes and SNPs discovered by GWAS for maize grain nutrient content can provide critical information for maize breeding efforts focusing on developing high-quality varieties (Zheng et al. 2021). In this study, GWAS identified 42 SNPs linked to the grain quality traits studied under low N conditions. However, there were no overlapping SNPs for grain quality traits under low N. Under low N stress, two SNPs on chromosomes 3 (S3_198394847) and 4 (S4_120988951) were discovered to be linked to protein content. Moreover, 12 SNPs with loci on all chromosomes except 4 and 7 were also shown to be substantially associated with oil content under low N stress. The genetic regions identified in this work through GWAS will be increasingly relevant in future breeding approaches for accurate selection of high grain quality and to increase tolerance of maize lines to low N stress.

Comparison of SNPs identified in this study under low N and optimum conditions revealed no overlapping of SNPs for grain yield and protein content possibly we were not able to detect the common variants responsible for these traits in different management conditions. Nonetheless, for starch content, all the SNPs detected under low N were also detected under optimum conditions. Whereas for the oil content SNP S6_60978968 on chromosome 6 was consistently detected in both the conditions and also found common SNPs in bin 2.01, 5.03, 6.05 and 9.01 on chromosome 2, 5, 6 and 9, respectively (Table 4). Further comparison of the detected SNPs with the previous studies revealed some overlapping with earlier reported QTL (Wang et al. 2016; Zheng et al. 2021). For instance, SNP S1_214242607 associated with grain protein content under optimum was closely located with marker detected through GWAS (Zheng et al. 2021) and co-located with QTL detected in two populations (Wang et al. 2016). SNP S1_191845162 detected for oil content was co-located within the QTL (bnlg2086-umc1122 interval) reported by Zhang et al. (2008) and Wang et al. (2010, umc1395-umc2237 interval). Marker 1_190758142 detected through GWAS for oil content by Zheng et al. (2021) was also located within the same QTL region pointing to the importance of the region for improving oil content in maize. Another SNP S2_148879075 detected for oil content was located within the QTL region (bnlg108-phi092 interval) reported by Zhang et al. (2008). SNP S1_17679954 for protein content was co-located within the QTL detected for oil content on chromosome 1 (umc1685-umc1044 interval) in F3 population (Wang et al. 2010). Nevertheless, some SNPs did not coincide with earlier reports in terms of their physical location. This possibly due to several reasons like these SNPs might be specific to the population in this study, the variation for quality traits in these populations is different, and different methods used to estimate quality traits in different studies also contribute to variation. However, new specific SNPs detected in this study need further validation, nevertheless, these results can serve as a reference for future studies.

We identified 51 candidate genes potentially underlying the molecular and physiological processes governing grain quality traits under optimum and low N environments. The identification of candidate genes based on associated SNPs can aid with the identification of genes important in grain quality performance under optimal and low N environments. Under low N stress, genes coding for shoot apex growth were revealed to be linked with grain yield, protein, starch, and oil content. Peng et al. (2010) asserted that shoot growth, rather than root size, is a good indicator of N sufficiency in maize. The research also identified four candidate genes with protein serine/threonine kinase activity that play a role in soil N response. Protein kinases are well-known regulators of the response of plants to abiotic stresses (Diédhiou et al. 2008; Kulik et al. 2011; Mao et al. 2010). GRMZM2G159307 and GRMZM2G104325 encode ATP binding proteins for grain yield and starch content, respectively. ATP binding proteins are essential for cellular motility, membrane transport and the control of different metabolic activities (Chauhan et al. 2009). ATP-binding has also been reported in several studies to influence the maintenance of homeostasis in plants under both abiotic and biotic stresses (Dahuja et al. 2021; Franz et al. 2011; Jarzyniak and Jasiński 2014). GRMZM2G033694 was assigned to the Histone-lysine N-methyltransferase family at both optimum and low N conditions. It is important to note, however, that these candidate genes should be further validated before being used in breeding schemes. Further functional research on the candidate genes discovered in this study is necessary to validate their possible utility in high grain quality breeding under low N conditions.

Linkage mapping on grain yield and quality traits

Linkage mapping in four populations found multiple QTLs for the studied grain quality traits. Zheng et al. (2021) alluded that, numerous grain nutritional quality QTLs in maize have been identified by genetic dissection of nutrient quality over the last two decades using traditional QTL mapping. Despite the discovery of QTLs and genes that confer superior maize grain quality in some studies, further sources of genetic variation are likely to exist among currently unexplored populations. QTL analyses in four DH populations revealed 8, 13, 12 and 15 potential QTLs associated with grain yield, protein, starch, and oil content, respectively. One QTL on chromosome 3 (qGY3_187) for grain yield is overlapped with major effect QTL (qPC3_187) for protein content and located between 180 and 189 Mb, which might be an interesting region to improve both protein and grain yield by considering their negative relationship. Zhang et al. (2015) also identified a consistent QTL (umc1644-phi102228 interval) in the same genomic region for protein content. Another QTL qPC1_115 in CML550/CML504 which explained 11% of the phenotypic variance was consistent with earlier reported QTL (phi001-umc1988 interval) by Zhang et al. (2015) and qPC10_142 detected on chromosome 10 was consistent with QTL (SYN37373—PZE110095199 interval) reported by Wang et al. (2016) in recombinant inbred line population. There was one major effects QTL (> 10% phenotypic variance explained) for grain yield (qGY3_196), three QTL each for protein content (qPC1_115, qPC3_187, qPC5_67) and starch content (qSC1_180, qPC4_32, qPC8_124) and six QTLs for oil content (qOC2_186, qOC3_60, qOC4_70, qOC5_183, qOC6_133and qOC7_08) were detected in four biparental populations. A major QTL (qSC1_180) identified in DH pop CML550xCML511, explaining about 11.5% of total phenotypic variance and located between 175 and 188 Mb, was consistent with a QTL (SYN367-PZE101031077 interval) observed in a RIL population by Wang et al. (2016). Similarly, another QTL for starch content (qSC8_124) located between 123 and 124 Mb also coincided with earlier reported QTL on chromosome 8 (PZE108069534-SYN19928 interval; Wang et al. (2016)). Similarly, major QTL for oil content qOC2_186 was also overlapped with earlier detected QTL, indicating several consistent regions for quality traits across genetic back grounds which supports their stable nature and is amenable for MAS-based improvement. Overall, several QTLs were consistent with the previous studies indicating their reliability to be used in applied breeding.

Comparison of QTLs detected in both GWAS, and linkage mapping revealed clusters on several chromosomes either for same trait under both optimum and low N management as well as for both grain yield and quality traits (Supplementary Figure S5 and S6). Clustering of QTL for grain yield for both under optimum and low N conditions were detected on chromosome 3 between 186 to 200 Mb in DH pop3 and another QTL between 207 to 220 Mb in DH pop3 and GWAS panel. Several SNPs detected in GWAS were co-located within the QTLs detected in DH populations, for instance like for GY on chromosome 6 at 12 Mb and oil content at chromosome 6 at 60 Mb, these results help in further reducing the confidence interval of these QTLs. Focusing on increasing the favourable alleles associated with QTL in this region helps to improve the QTL for both low N and optimum management conditions. Similarly on chromosome 4, region between 140 to 180 Mb harbours QTL for grain yield, protein content and starch content and on chromosome 5 in region between 10 to 21 Mb harbours QTL for grain yield, starch content and oil content. These regions are of most important for simultaneous improvement of both grain yield and quality traits for both the management conditions. Nevertheless, further reducing the confidence interval of these regions helps to get more strongly associated markers for these QTLs which enhance the success rate to improve the multiple traits for both optimum and low N management.

Genomic predictions on grain yield and quality traits

GS in tropical maize for various traits of interest revealed moderate to high prediction accuracies in several studies (Azmach et al. 2018; Beyene et al. 2019; Crossa et al. 2014; Gowda et al. 2021). The relative merits of GS over phenotypic selection influence its widespread application in breeding programs ( Beyene et al. 2019, 2021; Kibe et al. 2020a). Moderate to high accuracies observed in this study for the bi-parental populations and IMAS panel offer promise in breeding for quality traits in tropical maize. Under N-starved soils, average prediction accuracies (Fig. 4) were higher for oil content (0.78) and lower for grain yield (0.08) which ascribed to their differences in their genetic architecture as oil content is relatively less complex in nature. DH population CML550/CML504 exhibited the highest prediction accuracy for protein (0.69), oil (0.73), and starch (0.70) content under low N stress. GS prediction accuracy has a direct influence on the degree of trait variation and heritability in each population (Kibe et al. 2020a). This is confirmed by this study, especially for oil content which had the highest genetic prediction accuracy is agrees with high magnitude of genotypic variation and H2 estimates. Prediction accuracy for quality traits in the IMAS panel was in agreement with various studies on moderately complex traits like resistance for grey leaf spot (Kibe et al. 2020a), common rust (Kibe et al. 2020b), Striga (Gowda et al. 2021), maize lethal necrosis and maize chlorotic mottle virus (Sitonik et al. 2019). In the IMAS panel, the observed moderate prediction accuracy can be attributed to its genetic structure and high LD between adjacent markers, which could also be credited to its moderate heritability. Overall, this study indicates that utilizing a common training population to predict grain quality trait performance under low N stress in many linked but separate populations can be beneficial. In addition, the results also suggest that for complex traits like GY, selective marker-based approaches are less effective to improve their performance, however MAS can help to improve for quality traits irrespective of the management conditions.

Compared to grain yield, quality traits are less complex, however, improving them under low N stress conditions through traditional breeding is laborious. Further improvement of these traits through MAS or MABC, there is need of more validation experiments for each trait to confirm the identified genomic regions for their consistency under diverse genetic background and environment, and fine map these regions to have stable markers which is resource intensive. On the other hand, GS is not needed any prior information on trait specific markers but is efficient in predicting lines performance on desired traits. In addition, the major QTL information available through QTL mapping, and GWAS can be incorporated into GS model as fixed effects which further enhance the prediction accuracy for these traits and helps to improve the efficiency in developing high yielding and high protein and oil content genotypes for optimum as well as low N stress conditions. Therefore, integration of GS in breeding program is beneficial to improve multiple traits.

Conclusions

To investigate the genetic basis of protein, starch, and oil content performance under low N stress, we employed a single panel consisting of 410 tropical maize lines for GWAS and genomic prediction. QTL mapping was also used to investigate the underlying genetic architecture in four bi-parental populations to better understand the grain quality traits. The genotypic correlations of the grain quality traits investigated indicated that these populations can be used to select better-performing lines under low N stress. GWAS identified 42 SNPs associated with grain quality traits. In addition, several QTLs for the examined grain quality traits were identified by linkage mapping across populations. The genomic regions identified can be used for selection efforts to enhance grain quality trait performance in low-nitrogen soils. Furthermore, the findings showed that including GS in maize breeding can successfully support phenotypic selection to improve grain quality trait performance under low N stress. Future work should, therefore, focus on validating the identified QTLs to enhance the efficacy of maize breeding in SSA.