Introduction

Soybean (Glycine max L.) is one of the most important crops in the world due to its high grain yield and protein and oil contents. The increased production over the last decades is due to technological progress and genetic breeding targeting the improvement of several traits, such as increased grain yields1 and obtaining genotypes adapted to the abiotic conditions of the growing region2. Thus, genetic variability is essential to breeding programs, since it underpins the techniques used to identify superior genotypes and use them to generate improved soybean cultivars3. Breeding programs undeniably make an effort to select genotypes with higher yields than those already grown today. However, some gaps also need to be addressed by breeders, especially regarding traits of interest to the industry (oil and protein contents)4 and the uptake of nutrients by genotypes in the Brazilian Cerrado5, in addition to increasing grain yields6.

Developing soybean varieties that combine high grain yield and satisfactory oil and protein contents is a complex process due to the crop cycle and the genetic grain characteristics, which are highly influenced by environmental factors7. However, throughout the soybean breeding program, it is possible to select genotypes for higher oil and protein contents up to a given threshold, after which selecting for a higher mean of one variable decreases the other8. Such an association between contents of oil, protein and other variables of industrial interest has been reported in the literature. Jiang et al.9 grain fiber content throughout the breeding processes. Conversely, Santana et al.10, in a study to classify soybean genotypes for industrial variables, found a strong positive correlation between oil and protein contents.

Given this scenario, further studies are clearly needed on the factors involved in the expression of these traits, such as inheritance, the gene effects controlling each trait, and the genetic variability among genotypes. Such information is crucial for obtaining improved genotypes for industrial variables. Among the approaches that can help to solve these shortcomings is the study of the combining ability between genotypes through diallel crossings. This approach provides parameter estimates that are useful for selecting parents for hybridization and understanding the action of genes involved in determining the traits evaluated11. When selecting parents, diversity and performance per se are relevant due to the possibility of planning crossings between parents or groups of parents that provide a high heterotic effect in their offspring, making it possible to obtain populations with broad genetic variability12.

Diallel studies carried out to establish segregating populations in soybean predominantly evaluate agronomic traits (plant height, cycle, grain yield, among others). Research seeking to understand the inheritance of agronomic traits together with nutritional (macro and micronutrient contents) and industrial traits (such as protein and oil contents) in soybean is scarce. The objectives of this study were: (i) to select parents and segregating populations for nutritional, agronomic, and industrial traits in soybean and (ii) to understand the relationship between these traits.

Results

Joint analysis of variance and grouping of means for nutritional traits

There was a significant effect of genotypes (p-value < 0.05) for the nutritional contents of P and Mg (Table S1). The general combining ability (GCA) effect was significant only for Mg content, while specific combining ability (SCA) effects were significant for P and Mg contents. Environment effects (E) were significant for all nutrient contents, except for K. The genotypes by environments interaction (G × E) and the SCA × E interaction were significant for all nutritional contents, except P. The GCA × E interaction was significant for K, Ca, Fe, Mn, and Zn.

Table 1 shows the grouping of means among the genotypes for the levels of macronutrients evaluated. Segregating population P1 stood out by presenting the highest mean levels of K, Ca, Mg, and S at both sites. The populations P4 and P5 stood out by obtaining the highest means of P, K, Ca, and S in both environments. Except for the Mg content in Chapadão do Sul, the population P20 and the parent G7 had the highest means for all macronutrients evaluated.

Table 1 Grouping of means for the nutritional contents of macronutrients (P, K, Ca, Mg and S) evaluated in parents and F2 segregating populations of soybean in Aquidauana (E1) and Chapadão do Sul (E2).

The grouping of means for the micronutrient contents is shown in Table 2. The segregating population P7 stood out by obtaining higher means for Cu and Zn in Aquidauana and Fe, Mn, and Zn in Chapadão do Sul. The population P21 stood out by having higher Cu and Mn means in Aquidauana and Fe in Chapadão do Sul.

Table 2 Grouping of means for the nutritional contents of micronutrients (Cu, Fe, Mn and Zn) evaluated in parents and F2 segregating populations of soybean in Aquidauana (E1) and Chapadão do Sul (E2).

Joint analysis of variance and grouping of means for agronomic traits

There were significant effects of genotypes and SCA (p-value < 0.05) for grain yield (GY) (Table S2). Environmental effects (E) were significant for days to maturity (DM) and GY. The G × E, GCA × E and SCA × E interactions were significant for DM.

Table 3 contains the grouping of means for the agronomic traits DM and GY. The G1 parent showed the highest mean for DM in Aquidauana. The segregant populations P1, P5, P16, P19 and parents G2 and G7 showed higher means for DM in both sites. Segregant populations P3, P10, P18, P21, P22, P24, P28 and parents G5, G6 and G8 obtained the highest DM means in Chapadão do Sul. Segregant populations P6, P7, P11, P17, P20, P26 and P27 obtained higher means for GY.

Table 3 Grouping of means for the agronomic traits days to maturity and grain yield (kg ha−1) evaluated in parents and F2 segregating populations of soybean in Aquidauana (E1) and Chapadão do Sul (E2).

Analysis of variance and grouping of means for industrial traits

There were significant effects of genotypes (G) and environments (E) (p-value < 0.05) for the industrial traits protein content (PC) and oil content (OC) (Table S3). There were significant GCA and SCA effects for OC. The G × E and SCA × E interactions were significant for all the evaluated traits. The GCA × E interaction was significant for all traits except ash content (AC).

Table 4 contains the grouping of means for the industrial traits. The segregant population P16 obtained the highest means for PC in Chapadão do Sul and for OC and fiber content (FC) at both sites. The parents G2 and G5 and segregant populations P10, P15 and P20 obtained higher means for PC at both sites. The segregant population P24 obtained higher means for FC in both environments. Populations P13 and P17 obtained higher AC in both environments evaluated.

Table 4 Grouping of means for the industrial traits protein (PC), oil (OC), fiber (FC) and ash (AC) contents assessed in leaf samples from parents and F2 segregating populations of soybean in Aquidauana (E1) and Chapadão do Sul (E2).

Combining ability for nutritional traits

The G3 parent stood out for contributing positive values to the contents of K, Mg, Ca, Fe, Mn, and Zn in both evaluated environments (Table 5), except for Fe levels in Aquidauana and Mn in Chapadão do Sul, whose values were negative. Similarly, G8 presented positive estimates for K, Mg, Ca, Fe, Mn, and Zn in both locations, except for K levels in Chapadão do Sul and Ca in Aquidauana, which showed negative values.

Table 5 Estimates of general combining ability (gi effects) for nutritional contents of K, Mg, Ca, Fe, Mn, and Zn, evaluated in soybean parents in Aquidauana (E1) and Chapadão do Sul (E2).

SCA estimates for macronutrient contents evaluated in the F2 segregating populations of soybean are shown in Table 6. Population P7 stood out for presenting positive estimates for the contents of P, K, Ca, Mg and S at both locations, except for the K levels in Chapadão do Sul. Similarly, P20 also presented positive estimates for these traits, except for S contents in Chapadão do Sul.

Table 6 Estimates of specific combining ability (sij effects) for macronutrient contents (P, K, Ca, Mg, and S), evaluated in F2 segregating populations of soybean in Aquidauana (E1) and Chapadão do Sul (E2).

Table 7 contains SCA estimates for micronutrient levels evaluated in F2 segregating populations. Population P7 stood out for presenting positive estimates for Cu, Mn, and Zn levels at both locations. Population P13 presented positive estimates for all micronutrients except for Fe in Aquidauana and Zn in Chapadão do Sul, which were negative. Population P15 presented positive estimates for all micronutrients at both locations except for Mn.

Table 7 Estimates of specific combining ability (sij effects) for micronutrient contents (Cu, Fe, Mn and Zn), evaluated in F2 segregating populations of soybean in Aquidauana (E1) and Chapadão do Sul (E2).

Combining ability for agronomic traits

For the GCA of the DM trait (Table 8), the parent G3 stood out for presenting negative values at both evaluated locations. Conversely, the parents G6 and G8 presented positive GCA estimates for this trait in Aquidauana and Chapadão do Sul.

Table 8 Estimates of general combining ability (gi effects) for the days to maturity (DM) trait in soybean parents in Aquidauana (E1) and Chapadão do Sul (E2).

The segregating populations P1 and P15 stood out by presenting positive SCA estimates for the traits days to maturity and grain yield (Table 9). Other populations that deserve to be highlighted are P11, P12 and P26 for presenting negative estimates for DM in both locations, as well as positive SCA estimates for grain yield.

Table 9 Specific combining ability estimates (sij effects) for agronomic traits days to maturity (DM) and grain yield (GY) evaluated in segregating F2 populations of soybean in Aquidauana (E1) and Chapadão do Sul (E2).

Combining ability for industrial traits

The parent G5 stood out by contributing positive values for all industrial traits in both locations (Table 10).

Table 10 General combining ability estimates (gi effects) for the industrial traits protein (PC), oil (OC) and fiber (FC) contents evaluated in soybean parents in Aquidauana (E1) and Chapadão do Sul (E2).

Segregant populations P1, P24, P25, P26 and P27 stood out by showing positive estimates of protein content for both environments (Table 11). The segregant population P6 showed positive estimates for protein and oil content in Aquidauana and Chapadão do Sul. The population P11 obtained positive estimates for oil and fiber in both locations and ash in Aquidauana. The segregant population P19 obtained positive estimates for oil and ash contents in both locations evaluated.

Table 11 Specific combining ability estimates (sij effects) for the industrial traits protein (PC), oil (OC), fiber (FC), and ash (AC) contents evaluated in F2 segregating populations of soybean in Aquidauana (E1) and Chapadão do Sul (E2).

Discussion

Diallel crosses allow estimation of the general combining ability (GCA), which is associated with predominantly additive gene effects, and the specific combining ability (SCA), which is related to non-additive gene effects13. GCA was defined by Sprague and Tatum (1942) as the mean behavior of a parent line across a series of hybrid combinations, and this behavior results from the additive effects of the alleles. These authors defined SCA as the vigor of a cross compared to that expected from the estimated GCA of the parents used in hybridization, which is determined by dominance genetic effects (complete or partial) and/or epistasis.

SCA is interpreted as an additional effect on hybrid expression beyond the parental GCA effects, and can be positive or negative. SCA results from the interaction of parental GCA effects and can improve or worsen hybrid expression relative to the expected effect based on GCA alone14. SCA effects, estimated as the deviation of the observed behavior from what would be expected based on GCA, are measures of non-additive gene effects. The desirable hybrid combinations are those with the most favorable SCA estimates that involve at least one parent with the most favorable GCA effect15.

The genotypes showed a significant genotype × environment interaction for most of the nutritional contents, which is one of the main challenges in the choice and recommendation of superior cultivars. This interaction allows the emergence of stable genotypes for specific environments or genotypes with general behavior adapted to a wide range of environments16. In this research, besides the distinct climatic factors (Fig. 2) of each location, the physical–chemical properties of the soil are important factors for the occurrence of significant genotype × environment interaction. In addition, obtaining more productive cultivars usually demands higher spending on fertilizers. Adopting nutrient-efficient genotypes is a strategy, especially in Cerrado soils, to save costs and prevent environmental impacts.

When evaluating the nutrient contents, we found that the P content was significant for SCA, with the parents G1, G5, and G7 and the segregating populations P4, P5, P11, P14, P18, P19, P24, and P27 standing out. Thus, the difference between the means of the nutrients evaluated does not reflect only the individual behavior of the parents11,17, but can also be attributed to environmental growing conditions. The soils of the Cerrado are highly weathered, with low levels of plant-available P, besides retaining nutrients in their colloids18. Under P-limiting conditions, several metabolic problems can occur in the plants, leading to delayed maturation and yield decrease19.

Obtaining information on the uptake and metabolization of P in plants allows the selection of lines that develop well in soils with low P contents and makes it possible to use less phosphate fertilizer, which is essential for the sustainability of agricultural production and for avoiding environmental problems caused by the incorrect use of fertilizers20. A study of P efficiency and responsiveness in soybean genotypes21 classified the cultivars into four groups: efficient and responsive, efficient and non-responsive, non-efficient and responsive, and non-efficient and non-responsive. The authors found that selecting P-use efficient cultivars in an environment with low availability of this nutrient favored the selection of cultivars responsive to the nutrient.

Using genotypes with a better capacity to accumulate potassium (K) results in improved carbohydrate and protein metabolism and starch translocation, which are used in grain formation22. Similarly, the selection of parents and segregating populations with higher Ca content provides enhanced structural metabolism, since the element acts in cell wall synthesis, pollen tube growth, and pollen grain germination23. Selecting genotypes more efficient in the uptake and metabolization of sulfur (S), which has structural and metabolic functions in plants24, may be a promising strategy to develop cultivars better adapted to degraded soils, especially in the Brazilian Cerrado.

Mg content was significant for GCA, for which the genotypes G1, G3 and G7 and the segregating populations P1, P2, P20, P21, P24 and P27 stood out. Thus, at least one of the parents used in the crossings differed from the others regarding the concentration of alleles favorable for higher Mg expression17. The advantage of obtaining lines with higher Mg contents in the plant is related to the role of this nutrient in activating enzymatic reactions and its presence in the structural part of plants (chlorophyll molecule)25.

In this context, the increasingly intensive use of soil, due to successive cropping, may result in degradation that leads to nutritional disorders in plants. Given this scenario, selecting genotypes containing higher levels of micronutrients is crucial for breeding programs. The parents G3 and G8 and the populations P7, P13, and P15 stood out regarding micronutrients. Genotypes that are efficient in micronutrient uptake and metabolization require a smaller amount of nutrients to perform as well as the others, and can therefore grow adequately in areas with nutrient limitations.

Besides improving plants' nutritional efficiency, soybean breeding programs have also aimed to develop cultivars combining higher earliness and grain yield26,27. According to Almeida et al., soybean cultivars can be classified as early (111 days), semi-early (112 to 124 days), and late (above 125 days). Soybean is a crop strongly influenced by weather conditions. By using early genotypes, farmers can minimize losses from end-of-cycle diseases26,28, besides allowing the growing of second-season maize29.

In the search for genotypes adapted to specific environments, the duration of the vegetative phase is an essential attribute to be considered. Plants that do not have juvenility genes will flower early, thus reducing the plant's size and leading to losses in grain yield30. Selecting early-flowering genotypes may lead to lower grain yield since these plants have a reduced height and number of nodes31.

Regarding DM, it is desirable that the genotypes present lower means (i.e., higher earliness), and that at least one of the parents has negative GCA estimates32. Parent G3 showed high negative GCA estimates for DM, suggesting a high concentration of alleles favorable to shortening the cycle of these soybean lines. The significant GCA effects indicate that some parents will contribute with a higher number of favorable alleles transmitted to the offspring33.

Considering the segregating populations evaluated in this experiment, P2, P9, P11, P12, P14, P23, P25, and P26 stood out for presenting negative specific combining ability in both locations evaluated. These findings reveal the possibility of obtaining earlier genotypes from these crossings after a few inbreeding generations34. However, among these populations, only P11, P12, P25, and P26 showed positive values for grain yield, with P11 and P26 showing the highest means.

Besides combining favorable traits such as nutritional efficiency, earliness and high yields, the current soybean cultivars must have improved levels for traits of industrial interest. With the increasing consumption of animal protein, the demand for soybean meal to feed poultry, cattle and confined swine has been rising. To supply this sector, the industry needs soybean grains with at least 40% protein and 20% oil, while the national average protein content is around 37%35. Early and indeterminate cycle cultivars tend to have higher protein contents, which may be a response to increased exposure to solar radiation and heat during the grain-filling phase36. A study evaluating four soybean cultivars developed and improved in Brazil37 found protein contents ranging between 33.4% and 35.1%, values below those found here.

However, the increase in ash, fiber, oil and protein contents in soybean grains is a complex task for breeders, due to the high environmental influence on the genes38 and the existence of a negative relationship between these traits and with grain yield. For example, it is known that fiber levels have decreased over the years due to the increase in oil content, variables that are negatively correlated. Likewise, the oil content tends to decrease by selecting genotypes for higher protein content. Thus, the genotypes that stood out for the industrial variables should be used further in the breeding process, seeking to improve such traits simultaneously with the other characteristics of interest in the breeding pipeline.

The parents and segregating populations showed distinct responses for selecting nutritional, earliness, yield, and industrial traits. These genotypes should be monitored in the breeding process because they guide the breeders toward what they want to improve and attempt to achieve genotypes containing one or more traits of interest in a soybean cultivar.

This study used a diallel analysis to identify genetic variability among soybean parents and segregating populations for the uptake and metabolism of nutrients, earliness, yield, and contents of ash, fiber, oil and protein. Our findings reveal that the parent G3 and the segregating populations P20 and P27 can be used for improved nutritional efficiency in new soybean cultivars. The segregating populations P11 and P26 show higher potential for selecting genotypes combining early maturity and high grain yield. The parent G5 and segregant population P6 are promising for selection seeking to improve industrial traits in soybean.

Materials and methods

Obtaining the progenies in the F1 generation

Hybrids were obtained in the greenhouse, using commercial cultivars as parents (Table 12). Divergence based on the relative maturity group (RMG) was considered as a selection criterion for the parents. Twenty-eight crosses were performed to obtain the F1 generation, as described in Table 13. All methods were carried out in accordance with relevant institutional, national, and international guidelines and legislation.

Table 12 Characteristics of soybean cultivars used as parents.
Table 13 List of the twenty-eight F1 soybean populations obtained.

Obtaining the progenies in the F2 generation

Cultivation of the F1 hybrids was carried out in the greenhouse. The hybrids were sown in 3 L plastic pots (0.4 m in height and 0.3 m in width) containing 2 L of soil and, after identification of the hybrid plants by the purple color of the hypocotyl, one plant per pot was kept. Pest and disease control was performed according to technical recommendations for the crop.

Conducting the F2 generation

The F2 populations were grown at two locations: Aquidauana and Chapadão do Sul (Fig. 1). At the first site, the trial was installed at the State University of Mato Grosso do Sul, University Unit of Aquidauana (20°27′S, 55°48′W and average altitude of 120 m). The region's climate is Aw (Tropical Savanna) with mean annual rainfall of 1200 mm and mean annual temperature of 24.2 °C.

Figure 1 Location of experiments in the State of Mato Grosso do Sul (MS), Brazil.

At the second site, the trial was carried out at the experimental field of the Federal University of Mato Grosso do Sul, Chapadão do Sul Campus (18°46′S, 52°37′W and average altitude of 810 m). The region's climate is classified as Aw, with mean annual rainfall of 1850 mm and mean annual temperature of 20.5 °C.

In Aquidauana, the soil of the experimental area was classified as a sandy-textured Red Dystrophic Argissolo, with the following chemical properties: pH (CaCl2) = 6.2; organic matter = 19.7 (g dm−3); P = 67.5 (mg dm−3); H + Al = 3.2; K = 32.0 (mg dm−3); Ca = 3.30 (cmolc dm−3); Mg = 2.10 (cmolc dm−3); cation exchange capacity (CEC) = 5.1 (cmolc dm−3); base saturation (V) = 45.0%.

The soil of the experimental area in Chapadão do Sul was classified as Red Dystrophic Latossolo, and has the following chemical properties: pH (CaCl2) = 4.8; organic matter = 17.6 (g dm−3); P = 5.0 (mg dm−3); H + Al = 5.3; K = 69.0 (mg dm−3); Ca = 1.6 (cmolc dm−3); Mg = 0.5 (cmolc dm−3); cation exchange capacity (CEC) = 7.6 (cmolc dm−3); base saturation (V) = 30.0%. Three months before sowing, liming was performed on the soil of both experimental areas to raise the base saturation to 60%.
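
The text does not state how the lime rate was calculated. As a minimal sketch, assuming the base saturation method commonly used in Brazil, the lime requirement (t ha−1) can be approximated as (V2 − V1) × CEC / PRNT, with CEC in cmolc dm−3. The function name and the PRNT value of 100% (relative neutralizing power of the lime) are illustrative assumptions, not values reported in this study.

```python
def lime_requirement_t_ha(v_current, v_target, cec_cmolc_dm3, prnt_pct=100.0):
    """Lime requirement (t/ha) by the base saturation method.

    v_current, v_target: base saturation (%) before and after liming.
    cec_cmolc_dm3: cation exchange capacity (cmolc/dm3).
    prnt_pct: relative neutralizing power of the lime (%), assumed here.
    """
    return (v_target - v_current) * cec_cmolc_dm3 / prnt_pct


# Soil values reported for each site, target base saturation of 60%
aquidauana = lime_requirement_t_ha(45.0, 60.0, 5.1)
chapadao = lime_requirement_t_ha(30.0, 60.0, 7.6)
print(f"Aquidauana: {aquidauana:.2f} t/ha")
print(f"Chapadão do Sul: {chapadao:.2f} t/ha")
```

Under these assumptions, the more acidic soil of Chapadão do Sul would require roughly three times as much lime as the soil of Aquidauana.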

In both locations, the experiments were implemented adopting a tillage system with one plowing and two harrowings (crusher and leveling harrows). Row opening and fertilization were mechanized with a five-row seeder spaced at 0.45 m between rows. The base fertilizer used was 300 kg ha−1 of the 04-14-08 NPK formulation. Seeding was performed manually by distributing 15 seeds per meter.
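
The fertilizer dose and seeding rate above can be converted to per-hectare quantities with simple arithmetic. The following sketch is illustrative only: the function names are hypothetical, and the formulation is read in the usual way as percentages of N, P2O5 and K2O by mass.

```python
def nutrient_rates(dose_kg_ha, formulation):
    """Return kg/ha of N, P2O5 and K2O supplied by an NPK formulation."""
    n, p2o5, k2o = formulation  # percentages by mass
    return {
        "N": dose_kg_ha * n / 100,
        "P2O5": dose_kg_ha * p2o5 / 100,
        "K2O": dose_kg_ha * k2o / 100,
    }


def seeds_per_hectare(seeds_per_meter, row_spacing_m):
    """Seeds sown per hectare for a given in-row density and row spacing."""
    row_length_per_ha = 10_000 / row_spacing_m  # metres of row in one hectare
    return seeds_per_meter * row_length_per_ha


rates = nutrient_rates(300, (4, 14, 8))   # 300 kg/ha of 04-14-08
density = seeds_per_hectare(15, 0.45)     # 15 seeds/m, rows 0.45 m apart
print(rates)
print(round(density))
```

With the values reported here, the base fertilization supplies 12 kg ha−1 of N, 42 kg ha−1 of P2O5 and 24 kg ha−1 of K2O, and the seeding rate corresponds to about 333 thousand seeds per hectare.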

Seeds were treated with fungicide (Pyraclostrobin + Methyl Thiophanate) and insecticide (Fipronil), at a rate of 200 mL of the commercial product for every 100 kg of seeds to protect against the attack of pests and soil fungi. For biological nitrogen fixation (BNF), the seeds were inoculated with Bradyrhizobium spp. bacteria using a rate of 200 mL of concentrated liquid inoculant for every 100 kg of seeds.

Crop management was performed according to the needs of the soybean crop. Figure 2 shows the weather conditions during the experiment.

Figure 2 Weather conditions during the 2019/2020 crop season in the municipalities of Aquidauana (left) and Chapadão do Sul (right), MS, Brazil.

Experimental design and treatments

A randomized block design was used with two replications, eight parents (Table 12) and 28 F2 populations (Table 13). The plots consisted of one three-meter row, with 0.45 m spacing between rows and a density of 15 plants m−1. This plot size was adopted because of the limited quantity of seeds obtained from the crosses, so that the genotypes could be evaluated in two locations.

Traits evaluated in the F2 generation

At 60 days after emergence (DAE), the nutritional contents of phosphorus (P), potassium (K), calcium (Ca), magnesium (Mg), sulfur (S), copper (Cu), iron (Fe), manganese (Mn), and zinc (Zn) of each treatment were evaluated, following the methodology described in39.

For the nutritional analysis, we used the third fully developed leaf from the plant's apex, considered diagnostic for soybean nutritional analysis, where most metabolic processes responsible for energy acquisition occur. Twenty-five leaves with petioles were collected from each experimental unit. The nutritional contents of macronutrients were expressed in g kg−1, while micronutrients were expressed in mg kg−1.

Agronomic traits evaluated were: days to maturity (DM) and grain yield (GY, kg ha−1). DM corresponded to the days between emergence and maturation of more than 50% of plants in each experimental unit. GY was evaluated by harvesting the central 2 m of each plot and correcting for 13% moisture.
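
The conversion from harvested plot weight to GY at 13% moisture can be sketched as follows. This is an illustration under stated assumptions, not the authors' script: the function name is hypothetical, and the harvested area is taken as the central 2 m of the single row multiplied by the 0.45 m row spacing (0.9 m²).

```python
def grain_yield_kg_ha(plot_grain_g, moisture_pct, harvested_length_m=2.0,
                      row_spacing_m=0.45, target_moisture_pct=13.0):
    """Grain yield (kg/ha) corrected to a target moisture content.

    plot_grain_g: grain mass harvested from the plot (g) at field moisture.
    moisture_pct: grain moisture at weighing (%, wet basis).
    """
    area_m2 = harvested_length_m * row_spacing_m
    # Keep dry matter constant and re-express the mass at the target moisture
    corrected_g = plot_grain_g * (100 - moisture_pct) / (100 - target_moisture_pct)
    # g per m2 -> kg per ha: multiply by 10,000 m2/ha and divide by 1,000 g/kg
    return corrected_g / area_m2 * 10


print(round(grain_yield_kg_ha(270, 13.0), 1))  # already at 13% moisture
print(round(grain_yield_kg_ha(270, 20.0), 1))  # wetter sample, lower corrected yield
```

The 270 g plot weight is a made-up input used only to show the arithmetic.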

The measurement of protein (PC, %), oil (OC, %), fiber (FC, %) and ash (AC, %) contents in F2 populations was performed by near-infrared spectroscopy (NIRS) (Metrohm, DS2500 spectrometer, Herisau, Switzerland) with high optical precision. Grain samples were homogenized and placed in a sampling dish. The analysis was based on illuminating a sample with a specific radiation wavelength in the near-infrared region and then measuring the difference between the amount of energy emitted by the spectroscope and reflected by the sample to the detector (AOAC, 2000). This difference was measured in several bands, creating a spectrum for each sample. The output was compared with a calibration set and expressed as a percentage.

Statistical analyses

Initially, a joint analysis of variance was performed in Genes software according to the statistical model described below:

$$ Y_{ijk} = \mu + B/A_{jk} + G_{i} + A_{j} + G \times A_{ij} + e_{ijk} $$
(1)

wherein: Yijk is the observation in the k-th block, evaluated in the i-th genotype and j-th environment; µ is the overall mean of the experiments; B/Ajk is the effect of the block k within the environment j; Gi is the effect of the i-th genotype considered as fixed; Aj is the effect of the j-th environment taken as random; G × Aij is the random effect of the interaction between genotype i and environment j; eijk is the random error associated with Yijk.
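
To make the partition implied by Eq. (1) concrete, the sketch below computes sums of squares, degrees of freedom and mean squares for a balanced data set. It is a minimal illustration written with NumPy, not the routine implemented in the Genes software; the function name and the simulated data are assumptions.

```python
import numpy as np


def joint_anova(y):
    """Partition of Eq. (1) for balanced data.

    y: array with shape (genotypes, environments, blocks).
    Returns SS, df and MS for A, B/A, G, GxA and the error.
    """
    g, a, b = y.shape
    cf = y.sum() ** 2 / (g * a * b)       # correction factor
    env_tot = y.sum(axis=(0, 2))          # totals per environment
    blk_tot = y.sum(axis=0)               # totals per block within environment
    gen_tot = y.sum(axis=(1, 2))          # totals per genotype
    cell_tot = y.sum(axis=2)              # totals per genotype x environment

    ss = {}
    ss["A"] = (env_tot ** 2).sum() / (g * b) - cf
    ss["B/A"] = (blk_tot ** 2).sum() / g - (env_tot ** 2).sum() / (g * b)
    ss["G"] = (gen_tot ** 2).sum() / (a * b) - cf
    ss["GxA"] = (cell_tot ** 2).sum() / b - cf - ss["G"] - ss["A"]
    ss["error"] = (y ** 2).sum() - cf - sum(ss.values())

    df = {"A": a - 1, "B/A": a * (b - 1), "G": g - 1,
          "GxA": (g - 1) * (a - 1), "error": a * (g - 1) * (b - 1)}
    return {k: {"SS": ss[k], "df": df[k], "MS": ss[k] / df[k]} for k in ss}


# Simulated, purely additive data: 36 genotypes, 2 environments, 2 blocks
rng = np.random.default_rng(0)
gen = rng.normal(0.0, 1.0, 36)
env = rng.normal(0.0, 1.0, 2)
blk = rng.normal(0.0, 0.5, (2, 2))
y_additive = 50.0 + gen[:, None, None] + env[None, :, None] + blk[None, :, :]

table = joint_anova(y_additive)
for source, row in table.items():
    print(f"{source:6s} df={row['df']:3d} SS={row['SS']:.4f}")
```

Because the simulated data contain no interaction and no residual noise, the G × A and error sums of squares are zero up to rounding, which is a quick check that the partition is coded correctly.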

Afterward, the genotype and G × A interaction effects were partitioned at each location according to the partial diallel structure, based on the F2 progeny, to obtain additive (gi) and dominance (sij) effects. Diallel analysis followed the model 4 proposed by Griffing40 to estimate general and specific combining abilities, as described below:

$$ Y_{ij} = \mu + g_{i} + g_{j} + s_{ij} + e_{ij} $$
(2)

wherein: Yij is the mean of the crossbreeding between the i-th line from group 1 and the j-th line from group 2; µ is the overall mean of the diallel; gi is the general combining ability of the i-th line from group 1; gj is the general combining ability of the j-th line from group 2; sij is the specific combining ability between the lines from groups 1 and 2; eij is the mean experimental error.
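
The estimation of gi and sij in Eq. (2) can be sketched with the least-squares estimators usually given for Griffing's method 4. The sketch assumes a half diallel among p parents with crosses only (no parents or reciprocals), which matches the 28 combinations of eight parents but is not necessarily the exact routine run in the Genes software. The function name and the simulated cross means are hypothetical.

```python
import numpy as np


def griffing_method4(cross_means):
    """GCA and SCA estimates from a symmetric p x p table of cross means.

    The diagonal (parents per se) is ignored. Assumes p > 2.
    """
    x = np.array(cross_means, dtype=float)   # copy, input left untouched
    p = x.shape[0]
    np.fill_diagonal(x, 0.0)
    row_tot = x.sum(axis=1)                  # total of all crosses of parent i
    grand = x.sum() / 2.0                    # each cross counted once
    mu = 2.0 * grand / (p * (p - 1))
    gca = (p * row_tot - 2.0 * grand) / (p * (p - 2))
    sca = (x - (row_tot[:, None] + row_tot[None, :]) / (p - 2)
           + 2.0 * grand / ((p - 1) * (p - 2)))
    np.fill_diagonal(sca, np.nan)            # SCA is undefined for i == j
    return mu, gca, sca


# Simulated purely additive crosses among eight parents (28 combinations)
g_true = np.array([-3.0, -2.0, -1.0, 0.0, 0.0, 1.0, 2.0, 3.0])
mu_true = 50.0
means = mu_true + g_true[:, None] + g_true[None, :]

mu_hat, gca_hat, sca_hat = griffing_method4(means)
print("mean:", round(mu_hat, 3))
print("GCA:", np.round(gca_hat, 3))
```

With purely additive simulated means, the estimated GCA values recover the simulated gi and every SCA estimate is zero up to rounding. Any non-zero sij in real data therefore measures the departure of a cross from the sum of its parental GCA effects.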

Subsequently, grouping of means (overall across environments) of the 36 genotypes (eight parents and 28 F2 populations) was performed by the Scott and Knott test at the 5% probability level. All analyses were performed using the software Genes41, following the procedures recommended by Cruz et al.