Background & Summary

Nitrogen (N) is one of the most limiting factors for agricultural productivity, and mineral N fertilizers represent a key input for many cropping systems worldwide1. However, the over-application of N fertilizers to field crops has a significant impact on the environment through release of greenhouse-gas emissions and air pollutants2,3, groundwater pollution, and eutrophication of freshwater4 and marine ecosystems5. Increasing N use efficiency could reduce the footprint of N fertilization in agricultural systems while ensuring a high level of production. This goal could be achieved through more precise N rate recommendations and a better application schedule adapted to the crop N requirements.

Defining the optimal N fertilization is a daunting task, due to the large uncertainties in predicting soil N supply6, plant growth and N demand7. Nitrogen fertilization is often either insufficient (under high fertilizer prices) or excessive when growers adopt conservative overfertilization strategies8. Therefore, for improving N management and recommendation guidelines, both soil and plant processes should be considered for defining optimal N fertilization rates and application schedules.

Improving our understanding of the co-regulation of N uptake by the availability of N from the soil and plant N demand is relevant to define the overall critical N plant concentration (%Nc), herein defined as the minimum plant N concentration (%N) required to achieve maximum crop mass (W) for a given range of crop species in different environments and N-fertilized conditions. This critical value %Nc is known to decline over time as W increases, and the relationship between %Nc and W defines a function named “critical N dilution curve”9. This function is very useful to conduct crop N diagnosis because it allows the computation of the N nutrition index (NNI), defined as the ratio of the actual %N (%Nact) to the value of %Nc corresponding to the observed value of W (Wact). A NNI value close to 1 indicated N sufficiency, while a NNI below or above 1 reveal N deficiency and excessive N status, respectively.

Critical N dilution curves were established for several major field crops, including wheat (Triticum aestivum L.)10,winter oilseed rape (Brassica napus L.)11, maize (Zea mays L.)12, potato (Solanum tuberosum L.)13, rice (Oryza sativa L.)14, and tall fescue (Festuca arundinacea Schreb.)15, among other crops. These curves have been parametrized using either data collected in single experiments or data set pooled from multi-year and multi-sites experiment networks. These curves are now used by many agricultural scientists and engineers for improving fertilization guidelines based on NNI. Many investigations were recently conducted to develop site-specific critical N dilution curves in different parts of the world for many crop species10,11,12,13,14,15. However, the scientific value of such local critical N dilution curves is debatable because these curves were parametrized using a small number of local data, leading to high uncertainty and to a risk of false discoveries16. Although sensitivity analyses were performed in some of these studies, they generally suffer from inadequate protocols and methods17. Thus, there is a need for generic and robust critical N dilution curves valid across a large number of crop species in different environments and N-fertilized conditions18. To date, any attempt to develop generic critical N dilution curves has been hampered by the lack of a reliable large-scale dataset. Yet, a global dataset including a large number of pairs of W and %N values is required to drive research efforts on N plant status diagnosis.

Here, we present a new dataset resulting from a systematic search of all data made publicly available on critical N dilution curves. Our dataset contains 4454 observed pairs of plant W and %N for 19 major crop species collected between 1982 to 2021 in 16 countries worldwide. These data were extracted from a total of 36 peer-reviewed scientific manuscripts. It offers a unique source of information for developing generic critical N dilution curves and identifying knowledge gaps to guide future research programs on N plant status diagnosis.

Methods

Data collection

A literature search was executed between June and August 2021 (2 months) using the following keywords: ‘Nitrogen dilution curve’ & ‘Nitrogen nutrition index’ & ‘Critical nitrogen concentration’ & the name of each crop. The search was performed in the scientific databases ‘ScienceDirect’, ‘Scopus’, and ‘Science Citation Index (Web of Science)’. A last update on this revision to conclude the dataset and include any potential new studies was executed during January 2022. The overall selection process was conducted using the R package revtools19. A total of 1612 potentially critical papers were identified by screening the title of the paper, and the following steps on the selection process are highlighted in Fig. 1. The first step filtered papers by excluding studies that did not have relevant keywords and abstract information, resulting in the exclusion of 1124 papers. As the second step, manuscripts not reporting data of W and %N for different sampling times (corresponding to different vegetative stages of fields crops) were excluded; a total of 320 papers were removed at this step. At the third step, relevant studies were selected using the following criteria: i) at least 3 or more sampling dates during the vegetative period are recommended to explore a large variation on both plant W and %Nact to achieve a reasonable level of uncertainty for the critical N dilution curve (Fig. 2), ii) reporting of plant W and %Nact, iii) at least 3 or more fertilizer N rate levels, iv) reporting of crop, location, and year of experiment. In each study, three or even more N fertilizer rates are required to discriminate the non-limiting from the limiting N data, and then to determine the maximum plant W (Wmax) achieved at each sampling date. Data were visually inspected to verify that at least one biomass could be considered obtained under non-limiting nitrogen conditions on each observation date. This step helps to assure that from the studies included within a crop, a plateau of plant W has been achieved reducing the uncertainty for the estimation of the critical plant %N in the dilution model. In addition, the determination of critical %N-W data point require the fitting of a linear and plateau %N -W model (Fig. 2). To avoid issue related to lack of identifiability (lack of sufficient data to allow the estimation of the linear-plus-plateau model and the determination of the critical %N-W model), it is necessary to have at least three fertilizer N rate to fit this type of model (Fig. 3). Based on these criteria, 132 papers were discarded (Fig. 1). The total number of papers retained from this search were 36, each paper provided data for one crop except for two papers providing data for two crops each and another one for four crops, totalizing 41 entries (further details presented in Table 1).

Fig. 1
figure 1

Sankey diagram describing paper search, collection, filtering, and selection.

Fig. 2
figure 2

Standard framework considered to determine critical N dilution curve for plant N concentration (%) and plant biomass (W). The white points correspond to observations (bars indicate the standard error of the mean) and the black points correspond to critical %N (minimum %N leading to maximum biomass Wmax) determined by fitting a linear-plus-plateau response model at each sampling time. The critical N dilution curve passes through the black points. Here between 3 and 6 different fertilizer N rates are available at each sampling dates. Data and figure redrawn from Plénet and Lemaire12.

Fig. 3
figure 3

Theoretical representation of the linear-plus-plateau model for the plant N concentration (%N) and plant biomass (W) for three different scenarios (A) with an identifiable linear-plus-plateau model with four fertilizer N rates, (B) non-identifiable linear-plus-plateau model with three fertilizer N rates (“lack of identifiability”), and lastly (C) an identifiable linear-plus-plateau model with three fertilizer N rates.

Table 1 Study identification (ID), author/year, species, country, experimental design, years of study, fertilizer N levels and rates, genotypes, number of observations (and sampling times), and main topics for 19 crop species around the globe.

The data from all those papers were manually extracted (with the main plant traits reported or derived), together with relevant details from each study (e.g., author, year of experimentation, fertilizer N rates, description of treatments). If the data were not available in table format, information was retrieved from figures. Data extraction was assisted using the R package juicr20. The main crop species, papers, number of observations and geographical distribution of the data are presented in Fig. 3. Among the 36 papers selected (between 1982 and 2021), 33 report data for one crop species, 2 papers report data on 2 crops, and one study provides data on 4 crops. The dataset includes 4454 pairs of observed W and %N for different field crops and N fertilizer rates (Table 1). For each of the 19 crop species reported on this study, the total number of pairs of observed W and %N per crop was: annual ryegrass (Lolium multiflorum Lam.) 174, broomcorn millet (Panicum Miliaceum L.) 144, cotton (Gossypium hirsutum L.) 80, fodder beet (Beta vulgaris L. ssp. vulgaris) 72, hybrid ryegrass 294, maize 1530, oat (Avena sativa L.) 70, perennial ryegrass (Lolium perenne L.) 19, potato 89, rescue grass (Bromus catharticus H.B.K.) 30, rice 709, sorghum (Sorghum bicolor L.) 95, sugarcane (Saccharum officinarum) 112, sunflower (Helianthus annuus L.) 67, sweet potato (Ipomoea batatas) 120, tall fescue 346, timothy grass (Phleum pratense L.) 256, wheat 234 and white cabbage (Brassica oleracea L.) 13 observations in 16 countries (Fig. 4).

Fig. 4
figure 4

Geographical distribution of the observations included in the dataset. The distinct sizes of the circles indicate the amount of data (one observation corresponds to a pair of crop biomass and plant N concentration) for a given species at a given location, while point colors indicate crop species.

Data Records

The data are accessible on the figshare repository21, available at https://doi.org/10.6084/m9.figshare.19105049.v1, and which includes the following files:

  1. 1.

    “NNI_Database.csv” includes the data.

  2. 2.

    “Summary of the database.docx”, includes a summary of the dataset (meta-data), defining each column, trait collected in the data and the units for each variable.

  3. 3.

    “List of references.docx”, presents all the references of the publications included in the dataset.

  4. 4.

    “Figures_NNI.zip”, includes all the codes to build the figures of this study.

The “NNI_Database” contains all the information collected on this systematic analysis. The “Summary of the database” presents a description of the “NNI_Database” file with the information separated into three categories:

Category I, general details about the dataset, comprising information for author and publication year, and DOI or other identification for each study included in the dataset.

Category II, relevant to the study, defining species, country, experimental design, years of study, fertilizer N rates levels (and rates, kg ha−1), crop material (varieties/hybrids).

Category III, key for the dataset related to the pairs of crop biomass (W) and plant N concentration (%N), number of observations per identification number (ID). All the information of W and %N are reported in dry matter basis, as expressed in the data collected from those respective studies.

Table 1 describes the main characteristics of the 41 selected studies, including a specific identification number for each study by crop (ID, from 1 to 41 total), species, country for the study location, author, experimental design of the field trial, years where the study was carried out, number and N fertilizer rates, information on crop material, number of total observations, and relevant keyword for the study.

Data of plant %N and W are presented in Fig. 4 for all 19 species across sampling times starting at early vegetative growth stages. The W-%N relationships presented for different crops reveal contrasted plant N status (Fig. 5).

Fig. 5
figure 5

Relationship between plant N concentration (%N) and crop biomass (W) for 19 different crop species (annual ryegrass, broomcorn millet, cotton, fodder beet, hybrid ryegrass, maize, oat, perennial ryegrass, potato, rescue grass, rice, sorghum, sugarcane, sunflower, sweet potato, tall fescue, timothy grass, wheat, and white cabbage). Colors represent different crop species, and n represents the number of studies for each species.

Observed values of W and %N are typically used to fit critical N dilution curves. Fitted curves can help to delineate situations of luxury (excess of N), sufficiency, and deficient (lack of N) plant N status. A recent review of critical N curves obtained for maize crop18 reported negligible differences across studies. This result revealed that it is more relevant to fit generic critical N dilution curves from a large set of studies covering different environments rather than fitting individual curves to local data. Below, we show that our dataset can be used to establish more universal critical N dilution curve than those generally parametrized from local data.

Technical Validation

To demonstrate the practical value of the dataset, data collected can be used to parametrize a generic critical N dilution curve for the given field crop. After the step on checking for outliers, the field crop W and %N data is used to fit a critical N dilution curve using the Bayesian modeling approach proposed by Makowski et al.22. The fitted curve for the field crop can be then compared with a reference curve available in the scientific literature. Lastly as the final part of the second step, an independent dataset can be utilized to assess the plausibility of the NNI values derived from the fitted curve compared to the NNI values derived from the reference critical N dilution model established in the literature. Details of all steps are provided below.

Step 1 - Technical validation of the range of biomass and N%

This first step is relevant for inspecting the data and detecting potential outliers. Errors of data extraction were eliminated by comparing the extracted data with tables and figures of the original manuscripts. For each crop and paper, the data were inspected for outliers (by study and sampling time) based on the interquartile range (IQR) rule detection method23 and with boxplots, as summarized in the Supplementary material. Any observation with W or %N beyond the threshold of 1.5 difference for the IQR to third quartile was pinpointed as a potential outlier for each trait documented for those studies (Fig. 5).

Lastly, an overall N responsiveness (i.e., the difference between minimum and maximum fertilizer N rate reported by each study within a crop) was calculated to portray the variation on both W and %N obtained from this dataset due of differential N availability (Fig. 6). The N responsiveness was calculated as the difference between the medians of the maximum minus minimum fertilizer N rates across studies within a crop species [((median for maximum rate – median for minimum rate)/(median for minimum rate)) × 100]. Species were then ranked according to their responsiveness to N fertilizer for W and %N, respectively. For W, the species order from high to low responsiveness was: rescue grass (431%), hybrid ryegrass (217%), annual ryegrass (183%), oat (163%), perennial ryegrass (152%), tall grass (152%), timothy grass (83%), wheat (68%), maize (49%), rice (47%), broomcorn millet (45%), sunflower (40%), fodder beet (39%), white cabbage (32%), sweet potato (42%), potato (36%), cotton (21%), sorghum (9%) and sugarcane (6%). The ranking was similar for %N, with grasses portraying the largest differences near 100% between N rates. In contrast, negligible differences between N rates were observed for sorghum.

Fig. 6
figure 6

Boxplots for plant biomass (W) and N concentration (%N) for each species for the minimum (Min) and the maximum (Max) fertilizer N rates (median values included in each boxplot for all crop species) utilized in each study. The difference between Max and Min fertilizer N rates defines the N responsiveness of each plant trait (W and %N) for each species. The dots presented in each boxplot refer to the detected outliers based on the interquartile range (IQR) rule detection method21.

Step 2- Fitting a critical N dilution curve for maize

For this step, a case study was established for maize field crop utilizing a subset of the plant W-%N data from the 11 studies on this crop analyzed using a Bayesian model for estimating coefficients of the critical N dilution curve (Fig. 7). A standard equation was used to relate %Nc to W, specifically %Nc = A1 × WA2, where A1 and A2 are two parameters9,10,11,12,18,22. The modeling and validation of the critical N dilution curve were performed into four phases:

  1. (i)

    A pre-processing of the data was conducted following the approach used by Plénet and Lemaire12 for maize crop. First, the data were filtered to include plant W above 1 Mg ha−1. In addition, dates of measures were selected to include maize vegetative and reproductive growth until silking plus 25 days (or approximately until milk stage, R3 growth stage).

  2. (ii)

    A Bayesian hierarchical model was fitted to the data following the procedure defined by Makowski et al.22. A Markov chain Monte Carlo algorithm (MCMC) was implemented using the R package rjags24. The algorithm was first run with three chains of 50,000 iterations each. Convergence was achieved approximately near 50,000 iterations according to visual inspection of the trace plots and Gelman-Rubin diagnosis. These first 50,000 samples were discarded as “burn-in”, and the algorithm was run again during 100,000 additional iterations to determine the posterior medians and 95% credible intervals of A1 and A2 for maize crop.

  3. (iii)

    The results obtained were used to compute the posterior distribution for the critical N dilution curve. The critical N dilution for maize crop was estimated from 1 Mg ha−1 to the maximum value of W in the dataset. Median and 95% credible intervals were determined and plotted against the reference curve established for maize crop by Plénet and Lemaire12 (Fig. 7A).

  4. (iv)

    The critical N% determined using the Bayesian procedure on this data was compared against the corresponding values estimated by the reference curve for maize12. This reference N dilution curve was derived from five studies located in France. To avoid any bias due to the use of the same data for fitting and testing, an independent dataset of four maize studies25,26,27 was used to compute NNI using the two fitted critical N dilution curves (Bayesian N curve fitted to our global dataset vs. the traditional model for maize12). Error metrics describing the agreement between NNI values were computed using the metrica package28. The root mean square error (RMSE) and mean absolute error (MAE) are standard measures of prediction accuracy, quantifying the average magnitude of the errors in the predictions. The concordance correlation coefficient (CCC) is a measure of both accuracy and precision of the model, quantifying the agreement between an estimated and a reference value. From these metrics, the CCC reflected a strong agreement between both models (CCC = 0.96) and both the RMSE (RMSE = 0.053) and MAE (MAE = 0.046) confirmed that the critical N dilution curve developed with this dataset is very similar to the well-established reference curve12 available for maize (Fig. 7B). Lastly, the lack of departure of the two critical N dilution curves from the 1:1 line also confirm that these two models derived in similar NNI values. Thus, these critical curves are not statistically different, confirming the past findings for maize crop from Ciampitti et al.18. Likewise, Fernandez et al.15 reported a universal critical N dilution curve for 14 environments and N-fertilized conditions for tall fescue. A similar method could be used to fit critical N dilution curves for other species from our data.

Fig. 7
figure 7

Validation of a critical N dilution curve for maize field crop estimated using the current dataset. (A) Blue line represents the reference N dilution curve for maize defined by Plénet and Lemaire:12 %Nc = 3.4 W−037. Red and dashed lines represent the critical N curves (median) and their 95% credible intervals (CI), respectively: %Nc [95%CI] = 3.86 [3.68,4.06] W−0.44 [0.42,0.47]. (B) Comparison between the NNI computed using the reference curve by Plénet and Lemaire12 and the critical N dilution curve based on this dataset. Symbols represent the four independent studies (i.e. not included in the main dataset) used for comparison of the NNI estimates. The metrics determined using the metrica package27 were concordance correlation coefficient (C), relative root mean square error (RMSE), and mean absolute error (MAE).

Usage Notes

The current data set can be used to develop critical N dilution curves for the diagnostic of N nutritional status of wide range of crop species in different environments and N-fertilized conditions. Specifically, our dataset can be used to fit critical N dilution curves, assess their uncertainties, and determine the statistical significance of the difference between two critical N dilution curves.

In addition, the current data set could be used for testing the universality of critical N dilution curves among crop species. For example, when comparing the N dilution curves of multiple grasses (annual, hybrid and perennial ryegrass, oat, rescue grass, timothy grass, and wheat crop), we found that the parameters of the model (A1 and A2) did not differ significantly (Fig. 8). For the different species, the N dilution models were benchmarked with the critical N dilution curves established in previous studies. Results show that the curves were relatively similar for all species considered. Results also reveal the uncertainty (reflected as the length of the 95% credibility interval) is higher for the A2 parameter than for the A1 parameter, except wheat crop. The level of uncertainty depends on the number of observations within a study and on the total number of studies for a crop. When the number of data is small, the determination of the critical N curve can produce estimates with large uncertainty (wide credibility intervals).

Fig. 8
figure 8

Plant N concentration and biomass for grass species (annual, hybrid and perennial ryegrass, oat, rescue grass, timothy grass, and wheat crop), portraying the critical N dilution curves for grass forages and wheat (Marino et al.29; Agnusdei et al.30; Gislum and Boelt31; Jégo et al.32) (panel A), and parameters of the N dilution model (A1, A2), estimates (black squares) and their 95% credibility intervals for each grass species (panel B).

In the future, our dataset could be easily updated with data generated by new studies and, also, with previous studies where data were not available. Studies to be incorporated in this global dataset may be sent to the corresponding author (IAC, ciampitti@ksu.edu). The goal of our global initiative is expand the dataset by including more data and involve more collaborators. The ambition is the make the largest possible amount of W and %N data available and stimulate the development of reliable critical N dilution curves. In addition to %N, this database could be expanded in near future to cover other nutrients such as phosphorus, sulfur, and potassium, and their interaction with other environmental factors such water stress. Such a collaborative approach may set a milestone in plant physiology and nutrition from which more universal and reliable models could be developed to improve fertilizer management practices and reduce their environmental footprint.