Introduction

With the world’s population set to reach nine billion by the middle of this century, pressures on the food supply system are increasing (Godfray 2010). At the same time, inefficient use of nutrients in agriculture has led to perturbations of the global nitrogen (N) and phosphorus (P) cycles (Steffen 2015). A reduction of agriculture’s impact on the environment is urgently required. The major challenge for agriculture therefore is to maximize crop yields with fewer inputs, and hence, to increase the recovery of nutrients applied vis-a-vis mineral or organic fertilisers.

Nutrient recovery can be described in several ways, one of which is the apparent recovery, which describes the ability of a crop to take up a nutrient from added fertiliser. Studies indicate that the recovery of N, which is often the most limiting nutrient, can be increased when P is applied simultaneously (Colomb et al. 2007; Duan et al. 2014; Wang et al. 2009), emphasising the importance of ensuring adequate P availability. In Denmark, P availability is measured using the Olsen-P method, and one common practice to increase Olsen-P has been to increase the P budget (application minus P harvested). Apart from N and P, potassium (K) is the third major plant nutrient. Similar to P, low or unbalanced fertilisation can result in significant reductions of K availability (Zörb et al. 2014). Balanced nutrient applications thus play an important role for maintaining soil fertility, and subsequently for attaining a high use efficiency of agricultural inputs.

Long-term experiments are essential to evaluate the effects of different fertiliser strategies, and provide important information to evaluate the sustainability of agricultural systems (Johnston 1997). However, many long-term experiments focus on yield maximisation. Trials with low inputs where yields are likely to be limited by one or several factors throughout the crop cycle are sparse. Likewise, long-term studies examining yield stability have typically focussed on different cropping systems, and rarely include treatments with limiting or reduced nutrient applications.

Under low input conditions, early nutrient acquisition is essential for securing crop yields, but it can often be limited by low early season temperatures (Grant et al. 2001; Oelofse et al. 2013). At lower temperatures, solubility of nutrients in the soil solution, diffusion rates, and mineralization rates of organically bound nutrients are lower (Bassirirad 2000; Gardner and Jones 1973; Pendall 2004), while at the same time nutrient uptake may also be limited via reduced root growth (Gavito et al. 2001; Porter and Gawith 1999). Fertilisation strategies may therefore play an important role in improving early nutrient supply, i.e. to stabilize yields under adverse conditions.

Fertilisation strategies can also have effects on soil physicochemical properties such as soil carbon (C), which is closely linked to soil quality (Christensen and Johnston 1997). Changes of soil C may only become visible after repeated inputs over an extended period of time (Edmeades 2003). Long-term experiments are therefore vital for the evaluation of different management. Soil C content can be increased directly by the addition of C from organic inputs, or indirectly via improved crop growth and subsequent returns of plant residues (including roots and rhizodeposition). Several long-term studies have shown that crop production in response to common applications of mineral fertiliser or animal manure only slightly differed, but that applications of animal manure had an additional beneficial effect on soil C stocks (Blair et al. 2006; Christensen and Johnston 1997; Kätterer et al. 2014; Ludwig et al. 2011). In addition, in a meta analysis of 49 sites, cumulative manure-C inputs explained more than half the variability in soil C stock differences compared to mineral fertilised or unfertilised soils (Maillard and Angers 2014).

The Long-Term Nutrient Depletion Trial (LTNDT) in Taastrup, Denmark, was established over 50 years ago. The field did not receive any P or K containing fertilisers for 30 years, before commencement of the long-term input experiment in 1996. With its low soil fertility and low inputs treatments, the LTNDT can provide valuable information for future humid-temperate cropping systems in relation to the challenges as described above. In this study, we examine the effect of long-term mineral and organic nutrient applications on 1) soil nutrient availability indices, soil C concentrations and crop N-uptake efficiency; and 2) yields of winter and spring sown crops in relation to early season temperature variability. We hypothesised that 1) soil fertility changes, as expressed by the Olsen-P and exchangeable K soil indices, are closely related to the nutrient budget; 2) apparent N recovery (ANR) is higher in treatments with long-term P applications; 3) the amount of applied C in animal slurry, rather than the fertiliser value, is the dominant driver for soil C accumulation; and 4) relative yield responses to long-term fertilisation of winter and spring sown crops are associated with early season temperatures.

Materials and methods

Experimental site and management

The LTNDT, which was established in 1964, is situated on the experimental farms of the University of Copenhagen, 20 km west of Copenhagen, Denmark (55°40’ N, 12°17′ E). The soil type is a sandy loam (clay 164 g kg−1, silt 173 g kg−1, fine sand 333 g kg−1, coarse sand 312 g kg−1, and organic matter 17 g kg-1), and is classified as a Luvisol (FAO 2015). The experimental history is illustrated in Fig. 1. Briefly, the field was continuously cropped with cereals between 1964 and 1995, while only receiving N containing mineral fertiliser (60 kg N ha−1 a−1). A long-term fertilisation experiment was implemented in late 1995 with seven mineral fertilisation and animal slurry treatments, and two more or less equal crop rotations (but different positions in the rotation each year for the two) in a complete block design. The average soil C and N concentrations in 1995, when the long-term treatments commenced, were 10.0 g C kg−1 and 1.1 g N kg−1, with a C/N ratio of 9.5. Olsen-P and exchangeable K were 10.6 mg kg-1and 54.8 mg kg−1 respectively. After 13 years, the seven application treatments were omitted for a single year in 2009 in order to evaluate the long-term residual effects. A detailed documentation of the initial soil properties, and for the period 1995–2009 are given by van der Bom et al. (2017).

Fig. 1
figure 1

Development of the Long-Term Nutrient Depletion Trial over its 50-year history, with a 30-year depletion phase (1964–1995) prior to commencement of the input trial. The current design (since 2010) includes 14 mineral and organic (slurry) fertiliser applications. Indicated nutrient application levels are in kg ha−1 a−1 of each specific element (N = nitrogen, P = phosphorus, K = potassium, U = unfertilised). Slurry was applied on an NH4-N basis, which was aimed to corresponding with the mineral N levels (i.e. the coding corresponds to N level: M½ = N½, M1 = N1). See supplementary materials for exact annual slurry applications during the current experimental period, and see van der Bom et al. (2017) for more information about the period before 2010. Figure from van der Bom et al. (2018)

The seven fertilisation strategies in the experimental design of 1995 generated an incomplete and relatively narrow range of P and K budgets, therefore only relatively small changes in soil nutrient availability indices were observed. The experimental design was therefore expanded in 2010, with a broader range and more complete combinations of input treatments (now 14 in total; Fig. 1), including a significant P surplus (to gradually increase soil P stocks and availability from the original rather low fertility), and only a single crop rotation. Furthermore, along the duration of the long-term experiment it was observed that the slurries that were imported from nearby farms showed a trend of declining P concentrations (related to farmers’ improved feeding practices). Therefore two new slurry treatments were included in the new design, to which additional mineral P was applied. The diversification of treatments was achieved in such a way that the original seven treatments remained intact in half of each plot, while the new treatments were formed in the other half by adjusting the N, P or K application rates. All treatments were replicated in four blocks and had a gross plot size of 50 m × 20 m. No permanent borders between gross plots were established, but only central net plots of 40 m × 10 m were sampled, providing an effective 10 m border zone between all plots.

Every year all the plots were ploughed and sowed based on the specific crop requirements, and on the prevailing weather and soil conditions of that year (Table 1). The soil was left bare each year between harvest and sowing. Ploughing was done in one direction one year and the opposite direction the next, in order to minimize continuous soil movement between plots. The soil amendments were applied as a basal fertiliser application before sowing in the case of spring crops, or as a springtime top dressing for the winter crops. Mineral N was applied as calcium-ammonium-nitrate-sulphate (27% N and 4% S), P as triple-superphosphate (21% P), and K as potassium chloride (KCl, 50% K). For the slurry treatments (coded ‘M’), pig slurry was applied at an NH4-N level that was aimed to be on par with the respective mineral N treatments (M½ = N½, M1 = N1). When applied on bare soil, the slurry was incorporated into the soil within 6–12 h to minimise ammonia losses. In several of the earlier years in the long-term experiment (1996–2008), instead of pig slurry, cattle slurry was applied. Herbicides, fungicides or insecticides were used as required according to standard agronomic management practices. Soil pH (in 0.01 M CaCl2) was brought to 5.5 (2016, corresponding to a pH(H2O) of 6.0), by intermittent liming in the autumns of 2004 and 2015, at a rate of 3 t ha−1 and 2.5 t ha−1 respectively. Nevertheless, some variations existed between the individual treatments (i.e. pH (CaCl2) varied between 5.1 and 5.9 depending on the treatment; Table S1). Due to scattered crop Mg deficiency and low soil Mg test values, magnesium sulphate (Kiserit, 15% Mg; 75 kg Mg ha−1) was applied to the entire experiment in the spring of 2016. Finally, each year all aboveground biomass (grain and straw) was harvested from the field and, hence, only stubble residues were incorporated at the time of ploughing.

Table 1 The Long-Term Nutrient Depletion Trial crop rotation from 2010 to 2016, and the seasonal weather details. For spring crops, the 40 days temperature sum indicates the first 40 days after sowing, as calculated according to Eq. 5. For the winter crops, temperature sums and precipitation were calculated from the ‘spring growth inititation’ date, which indicates the first day of the season after which daily temperature sums were constantly above 0 dD. Each year the soil was left bare between successive crops

Sampling and chemical analyses

Annual yield determinations were made with an experimental combiner in 15 m2 (10 m × 1.5 m) subplots of each treatment. Samples of grain and straw were oven dried at 60 °C until constant weight, and then weighed. The samples were finely ground using zirconium oxide grinding balls. Total C and total N concentrations were measured on 120 mg subsamples, by dry combustion at 1200 °C, using an elemental analyser (Elementar, Macro EA, Germany). Total P and K were analysed using inductively coupled plasma-optical emission spectroscopy (ICP-OES) (Optima 5300 DV, PerkinElmer Inc., USA) after digestion of 100 mg subsamples with 2.5 ml 70% HNO3 and 1.0 ml 15% H2O2 in a pressurised single-chamber microwave oven (UltraWave, Milestone Inc., USA) and subsequent dilution to 50 ml using milli-Q water.

Soil samples were taken from the plough layer (0–20 cm) of each plot before the start of the experiment in December 1995, after harvest in October 2008, and in September 2016, following spring barley (1995, 2008) and spring wheat (2016) respectively. Between 25 and 30 cores were taken from each plot down to 20 cm depth using a soil corer (18 mm), following a ‘W’-shaped sampling pattern in the central 40 m × 10 m of each plot. The cores were bulked for each plot and the samples were directly analysed at a routine analytical lab (OK-Lab, Viborg) according to standard soil analysis methods in Denmark (Landbrugsministeriet Plantedirektoratet 1994), i.e. P according to the Olsen method; by extraction of 5 g soil (dry weight) with 100 ml of 0.5 M NaHCO3 adjusted to pH 8.5 (Olsen et al. 1954), and exchangeable K by extraction of 10 g soil (dry weight) with 100 ml of 0.5 M CH3COONH4 and 3 mM LiCl. Soil C concentrations were determined with an elemental analyser as described above.

Each year a sample of the slurry applications was taken and analysed directly, or frozen until further analysis. Concentrations of total N were determined using the Kjeldahl method, NH4-N by titration, total P by digestion by HNO3 and colorimetric analysis, and total K by ICP-OES or AAS (Table S2). Total C concentrations were estimated by measuring the dry matter content, and assuming a C concentration of 0.40 g C g−1 in dry matter (Johnston et al. 2017).

Calculations

Relative crop yields (%) were calculated for each crop and each year by:

$$ Relative\ yield=\frac{plot\ yield}{reference\ yield}\times 100\% $$
(1)

with ‘plot yield’ representing the DM grain yield (kg ha−1) of a specific plot, and ‘reference yield’ as the DM yield (kg ha−1) of the plot of the reference treatment. In order to compare with the predicted yields in the weather model as described below, treatment N½P½K½ was used as a reference. The maximum yield (Ym) under mineral fertilisation (either treatment N1P1K1 or N1P2K2) of each year was used for comparisons of treatment responses.

For each plot, the P and K budgets were calculated as the difference between nutrients imported via applications and nutrients harvested or removed from aboveground biomass (kg ha−1; both grain and straw):

$$ Nutrient\ budget= nutrient\ input- nutrient\ output $$
(2)

As only inputs from fertiliser and slurry applications and outputs via harvested biomass were considered, these are sometimes also referred to as partial budgets. The P and K budgets are an attempt to quantify the underlying changes expected to control the availability of P and K in the soil. As only inputs and offtake were considered for the calculation of the nutrient budgets, there will inevitably be a degree of uncertainty since other fluxes are not considered to be different, even if the budgets could be expected to affect these (e.g. leaching or runoff may depend on the magnitude of the budget). However, this ‘soil surface budget’ approach can be considered appropriate for the purpose of evaluating the net loading of the system (Oenema et al. 2003). In addition, the same approach was previously applied for the earlier experimental phase (1996–2008) in the LTNDT (van der Bom et al. 2017), therefore its use allows for a direct comparison between the two periods. The P and K applications (kg ha−1) were based on the fertiliser manufacturer’s specifications and on the annual analysis of nutrient concentrations in the slurry applications. Offtake (kg ha−1) was calculated from the harvested components and multiplied by the nutrient concentrations as measured in their respective samples. In a few cases, concentration values of specific plots that were missing were filled in based on the analysis of the same crop in the same plot in a previous year, which could be a source of uncertainty, however yield differences (kg ha−1) rather than nutrient concentrations were the major driver for treatment differences.

To evaluate the efficiency of the fertilisation treatments, P-balance efficiency (PBE) and K-balance efficiency (KBE) were calculated as described by Syers et al. (2008):

$$ Balance\ efficiency=\frac{nutrient\ output}{nutrient\ input}\ast 100\% $$
(3)

where ‘Nutrient output’ and ‘Nutrient input’ denote the average crop offtake and fertiliser application, respectively, of P or K (kg ha−1). A value exceeding 100% is an indication that soil reserves are being mined.

For each plot, C stocks to 20 cm depth (kg ha−1) were calculated from the measured soil C concentrations, using a bulk density that was estimated following Peltre et al. (2015), who described a linear relationship between soil C concentrations and bulk density for a long-term trial with mineral and organic amendments (since 2002) adjacent to the LTNDT; of the same soil type; and within the same experimental farm. For each slurry level (M½ and M1), slurry C retention (%) in the topsoil (0–20 cm) was then calculated as:

$$ C\kern0.2em retention=\frac{\Delta C\kern0.2em slurry-\Delta C\kern0.2em reference}{total\kern0.2em C\kern0.2em applied}\ast 100\% $$
(4)

Where ‘ΔC slurry’ indicates the average change of total soil C (kg ha−1) in the slurry treatment, between the initial level in 1995 and at the end of the experimental period in 2016; and ‘ΔC reference’ is the average change of total soil C (kg ha−1) as observed under the mineral applications. This correction was made to compensate for the C accumulation from increased crop biomass productivity, and hence to compensate for the sum of root and rhizodeposition C inputs. Slurry C retention was calculated for two scenarios: I) yield-related belowground C inputs, II) constant belowground inputs. The belowground C inputs into the topsoil (0–20 cm) for both scenarios were calculated from crop yields following the approach by Kätterer et al. (2011), who used the allometric relationships between crop productivity and C inputs described by Bolinder et al. (2007), and a Michaelis-Menten-type function to describe root distribution along the soil profile. A detailed overview of all the parameters is provided in the supplementary materials.

Nitrogen response subplots in 2016

In the 2016 growing season, three six-metre-wide subplots were added to all replicates of treatments N1K1, N1P1K1 and M1P1, with the goal of determining the fertilisation history effect on marginal crop N-uptake efficiency in the spring wheat crop grown that year. Three weeks after sowing, additional mineral fertiliser N was applied at application rates of 30, 60 and 90 kg N ha−1 (in effect creating three 120 m2 subplots with N levels elevated above the long-term N1 application rate of 120, i.e. 150, 180 and 210 kg N ha−1). Grain and straw dry matter yields and total N concentrations were determined as described above. Apparent N recovery from the N-response subplots was calculated as the slope of a linear regression on the steep part of the N response curve (i.e. up to the 180 kg N ha−1 level, excluding the highest level of 210 kg N ha−1).

Weather conditions and relative yield response model

Daily weather data were collected from an on-site weather station. In the case data was absent due to malfunctioning of the weather station, data was collected from the Danish Meteorological Institute (DMI). The DMI data are an interpolation from a nationwide grid and represent the area-weighted climate values ​​for a 10 × 10 km (precipitation) and 20 × 20 km (global radiation, air temperature and potential evapotranspiration) cell, corresponding to the location of the field trial (DMI 2012). The DMI data were calibrated with daily weather data from 1989 to 2008 to correspond to the on-site weather station. The mean annual temperature at the site was 8.1 °C, and the average annual precipitation was 639 mm (1962–2015). Seasonal temperature and precipitation for the experimental period are given in Table 1.

In a previous study of LTNDT spring barley yields and weather data (1996–2009; van der Bom et al. 2017) a statistical model was developed for the original seven treatments of the long-term trial (multiple R2 = 0.64). This model described that the relative yield response to fertilisation (expressed as the percentage of yield of N½P½K½) correlated with the interaction between the nutrient applications and early-season temperatures (during the first 40 growing days). The study also described that a similar model utilising moisture as explanatory variable was a poorer predictor, probably because winter precipitation in this location is generally sufficient to build up substantial soil water reserves. Moisture was therefore unlikely to be the limiting factor early in the season (van der Bom et al. 2017). To evaluate the model, 40-day temperature sums were calculated for the current seasons and crops, using the same approach:

$$ AT={\sum}_{i=d1}^{40}{dgC}_t $$
(5)

where AT represents the accumulated temperature and dgCt the temperature sum on day t. A base temperature of 0 °C was considered as the threshold (Strand 1987). As opposed to seeding date, the first day after the last frost was chosen as d1 for the winter crops (Table 1). Relative yields were then predicted on the basis of the relationships that were previously established in the original model, i.e. by entering the sums into the individual regression equations. Hence, the previous period (1996–2009) can be interpreted as a calibration set, and the current period (2010–2016) as a sample set for model validation. Because of the incomplete design of the original long-term experiment, commenced in 1995, the model is limited to the range of the original treatments: N½K½, N½P½, N1P1K1, M½ and M1, with the treatment N½P½K½ serving as the reference.

The performance of the model was determined by the coefficient of determination (R2):

$$ {R}^2=\frac{\sum_i{\left({y}_i-{f}_i\right)}^2}{\sum_i{\left({y}_i-\overline{y}\right)}^2} $$
(6)

where yi represents the observed values and fi the predicted values. The closer the R2 is to 1, the better the fit of the reference values (yi) to the regression line.

The accuracy of the model was determined by the root-mean-square error (RMSE):

$$ RMSE=\sqrt{\sum_{i=0}^n\frac{{\left({f}_i-{y}_i\right)}^2}{n}} $$
(7)

Statistical analyses

Soil data, P and K budgets, and P and K recovery were analysed by ANOVA, with treatment as the main factor and an error term for the field replication in blocks. When significant treatment effects (P ≤ 0.05) were indicated, post-hoc comparisons were performed using Tukey's HSD method.

Crop yields of each plot were used to construct Linear Mixed-effects Models (LMM) including crop, and fertilisation treatment as fixed effects. Because plots were repeatedly harvested over many years, a random error term with the nesting structure was included to account for the replicated block design in the field, and for the repeated measures. The contributions of the fixed effects were evaluated using Maximum Likelihood (ML), after which the final model was refitted using Restricted Maximum Likelihood (REML). Pairwise comparisons for each season were performed using least square means method with Tukey-adjusted comparisons (P ≤ 0.05).

A paired t-test was used for each treatment individually to compare Olsen-P, exch. K and total C concentrations between the beginning and the end of the experimental period. The relationship between fertility changes and P and K budgets for the period between the soil samplings (October 2008 – September 2016) were analysed by simple linear regression.

All statistical models were checked for normality of residuals and homogeneity of variance using diagnostic plots. The analyses were performed with R version 3.3.2 (R Core Team 2015), using the RStudio development environment (RStudio Team 2015), making substantial use of the agricolae (de Mendiburu 2016), nlme (Pinheiro et al. 2015) and ggplot2 (Wickham 2009) packages.

Results

Nutrient offtake, budgets and balance efficiency

Overall, P and K offtake during the 2010–2016 period increased with larger total nutrient applications (P < 0.001) (Table 2). The P budgets were negative under no-P fertilisation (−78 to −101 kg P ha−1) and improved when the half-P dose was applied (−19 to −37 kg P ha−1) (P < 0.05). As P offtake at the P1 and P2 levels was not significantly different to no P (N1) or P½ (N1P½K½), the larger applications resulted in an increase in P budgets (up to +163 under N1P2K2) (P < 0.05). Conversely, more P was harvested from N1P1K1 than from M½ (P < 0.05), but since the total inputs under N1P1K1 were higher, both budgets were similar (24–35 kg P ha−1). The additional P supplied to M1P1, N1P2K2 and M½P½ (compared with M1, N1P1K1 and M½) did not result in additional offtake, and hence the largest P surpluses were found in the order of total applications, M½P½ < M1 < N1P2K2 < M1P1 (P < 0.05).

Table 2 The phosphorus and potassium budgets of the fertilisation treatments from the Long-Term Nutrient Depletion Trial during the experimental period of 2010–2016 (averages ± standard error, n = 4). Different letters indicate statistically significant differences (P < 0.05) according to Tukey’s HSD test. Annual application levels of N, P and K for each treatment are given in Fig. 1. PBE = P balance efficiency; KBE = K balance efficiency

The largest K budget (+1204 kg ha−1) was naturally found under N1P2K2, which had the highest K application rate. The surplus decreased with lower application rates (P < 0.001). The K budget was between −183 (N½P½) and − 328 (N1) kg K ha−1 for the no-K treatments. The two M1 treatments accumulated substantially more K than both M½ treatments (K budget of 315–329 kg ha−1 for M1 vs. 67–89 kg ha−1 for M½, P < 0.05). Increases of K offtake following mineral P additions (i.e. M½ vs. M½P½ and M1 vs. M1P1) were not significant.

The PBE decreased with higher P application rates (P < 0.001) (Table 2, Fig. 2). The alleviation of N limitation tended to increase P offtake and hence crop recovery (i.e. PBE was on average 154% for N1P½K½ vs. 129% for N½P½K½, P < 0.05). The relationship between K applications and KBE was less clear, but as with P, KBE decreased with higher applications (P < 0.001, Fig. 2). Furthermore, KBE from soils receiving half-dose K applications (K½) was relatively high when K was applied together with P (on average 78% from N½P½K½,) and when applied with both P and additional N (N1P½K½, 91%), but this was not the case when K was applied with N alone (N½K½, 53%).

Fig. 2
figure 2

The relationship between annually repeated nutrient applications of P (top) and K (bottom) and the average nutrient balance efficiency (%, PBE = P balance efficiency; KBE = K balance efficiency) during the experimental period of 2010–2016. Different symbols indicate the half-dose and full-dose N applications, including equivalent slurry applications (applied at a similar NH4-N level, N½ = M½ and N1 = M1). Error bars indicate the standard error of the mean

Changes in soil-available phosphorus and potassium and their relationship to the budgets

Between Oct 2008 and Sept 20106, soil Olsen P declined for all no-P and half-P treatments except N1 and N1K1, but these showed a relatively large variation (Fig. 3). Interestingly, the greatest decline was found for N1P½K½ (−4 mg kg−1) (P < 0.05), even though it did not have the most negative P budget. Conversely, the greatest increase was not found for M1P1 (+3 mg kg−1), but for N1P2K2 (+5 mg kg−1) (P < 0.05).

Fig. 3
figure 3

Changes of soil Olsen P (left) and exchangeable K (right) in the period of Oct 2008 and Sept 2016 as analysed in soil samples taken after harvest both years. Exact P and K input levels for each treatment are given in Fig. 1. Error bars indicate standard error of the mean (n = 4). Stars indicate whether the treatment change between Oct 2008 and Sept 2016 was significant at P < 0.1 (+),P < 0.05 (*) or P < 0.01 (**) according to a paired t-test for each individual treatment. Mean Olsen P and exchangeable K concentrations in 2008 and 2016 can be found in Supplementary Table S3

Soil exchangeable K decreased in the period for all treatments where K was not applied, but the decline was only significant for N1P1 (P <0.05). Significant increases were found for N1P2K2, which almost doubled from 113 to 220 mg kg−1, N1K1 (+50 mg kg−1) and N1P1K1 (+39 mg kg−1) (P < 0.01). No increases were observed for M1 and M1P1, even though the budgets were similar to those of N1K1 and N1P1K1. However, changes for these two treatments were relatively variable.

Linear relationships between the P and K budgets and the soil tests explained 34% of the change in Olsen P (r2 = 0.34, P < 0.001), and 59% of the change in exchangeable K (r2 = 0.59, P < 0.001). However, there was a better fit between the P and K budgets and the actual soil test values at the end of the experimental period in 2016 (Fig. 4). Here, the nutrient budgets explained 68% of the variation in Olsen P (r2 = 0.68, P < 0.001) and 86% of the variation in exchangeable K (r2 = 0.86, P < 0.001).

Fig. 4
figure 4

Changes in Olsen P (top left) and exchangeable K (top right) in the period between harvest in 2008 and 2016; and actual 2016 Olsen P (left-bottom) and exchangeable K (right-bottom) vs. the cumulative balances of P and K in the same period. P < 0.001 for all shown regressions

Nitrogen response subplots

To examine the effects of the long-term fertilisation treatments on N-uptake efficiency, additional mineral fertiliser N was applied in 2016 to subplots under treatments N1K1, N1P1K1 and M1P1. Grain yields were between 3530 and 4203 kg ha−1 under N1K1 (Fig. 5). The grain yields increased under N1P1K1 and M1P1 (on average + 17% and + 11% respectively, P < 0.001). Additional N applications improved yields compared with the 120 kg ha−1 baseline (P < 0.001), but the yields did not increase further when the N applications were larger than 150 kg N ha−1. Likewise, total N offtake (grain and straw) was highest under N1P1K1 (135–163 kg ha−1), which was significantly more than under N1K1 (120–145 kg ha−1P < 0.01), and additional N also led to increased N offtake (P < 0.001), but it tended to level off at a total N rate above 180 kg N ha−1. ANR, calculated as the slope of the linear increase between 120 and 180 kg N ha−1, varied between 42 to 50% but the differences were not significant (Fig. 5).

Fig. 5
figure 5

Average spring wheat grain yields (top) and total N uptake in grain and straw (bottom) under three different long-term fertilisation treatments (kg ha−1): mineral N1K1, 120 N-120 K; mineral N1P1K1, 120 N-20P-120 K and animal slurry M1P1, 120 N(NH4)-17P-86 K, plus four levels of additional mineral N inputs: 0, +30, +60, +90 kg N ha−1 (elevating the N1 applicatition level to between 120 and 210 kg ha-1). Equations indicate the linear regression between N application rates and total N uptake for the linear part of the curve (120–180 kg N ha−1). Error bars indicate standard error of the mean

Variation in total soil carbon concentrations, inputs and slurry carbon retention

After 20 years of inputs (2016), total soil C concentrations ranged from 11.1 to 14.6 g C kg−1, which was significantly higher than in 1995 for all soils except for N½, N½K½ and N1P1 (Fig. 6). The greatest increases were observed under the applications of animal slurry, with a total soil C concentration of on average 14.2 g C kg−1 (an increase of +3.8 g C kg−1), leading to significant differences with N½, N½K½ and N½P½ (11.1 to 11.5 g C kg−1, P < 0.05). The mean annual C inputs from slurry were 0.55 for M½ and 1.1 Mg ha−1 for M1 (Fig. 7). Despite this large difference in C inputs, changes in total soil C concentrations were similar (Fig. 6). Estimated root-derived C inputs were between 0.82 and 1.20 Mg ha−1 a−1 (Fig. 7). Inputs of slurry-C amounted to 33% (M½) and 46% (M1) of the total C inputs. In both scenarios the calculated retention of slurry-derived C was much higher for the half-dose M½ level (44 to 47% for M½, and 17 to 19% for M1; Table 3).

Fig. 6
figure 6

Changes of Total C concentrations (g kg−1) as analysed in soil samples taken from the Long-Term Nutrient Depletion Trial before the start of the long-term input experiment in 1995, and after harvest in Sept 2016. Mean total C concentrations in 1995 were 10.0 g kg−1. Error bars indicate the standard error of the mean (n = 4). Stars indicate whether the treatment change between 1995 and 2016 was significant at P < 0.05 (*) or P < 0.01 (**) according to a paired t-test for each individual treatment. Letters indicate significant differences between treatments in 2016 according to Tukey’s HSD test (P < 0.05). Mean total C concentrations for each treatment in 1995 and 2016 can be found in Supplementary Table S3

Fig. 7
figure 7

Estimated mean annual carbon inputs into the topsoil (0–20 cm) from stubble, roots (incl. rhizodeposition) and annual slurry-derived carbon inputs (1995–2016). Details of the treatment are presented in Fig. 1

Table 3 Total slurry-derived C retention (0–20 cm depth) for the two slurry application levels (M½ and M1) in the period 1996–2016. In scenario I belowground C inputs were calculated based on yield levels. In scenario II belowground C inputs were assumed the same for all treatments

Yield responses of autumn and spring-sown crops

With the exception of spring wheat in 2013, maximum yields (Ym, Table 4) were always observed under treatments N1P1K1 and N1P2K2, with no significant differences between them. Overall, winter wheat (2013) showed the highest productivity of all crops in the experimental period, with grain yields ranging from 6415 to 9441 kg ha−1 (Fig. 8). Nitrogen applications significantly increased the winter wheat grain yields (P < 0.05), but within the same N level there were no significant responses to P and K (Table 4). Although all slurry treatments tended to yield less than their mineral equivalents, only the difference between M1P1 (8232 kg ha−1) and its mineral N1P2K2 counterpart (9441 kg ha−1, P < 0.05) was significant.

Table 4 Seasonal maximum yields (Ym) and relative yields under no-P and no-K fertilisation (treatments N1K1: 120 kg N ha−1 + 120 kg K ha−1, and N1P1: 120 kg N ha−1 + 20 kg P ha−1 in mineral fertiliser). Relative yields were calculated according to Eq. 1, using Ym as the reference yield. Asterisks (*) indicate a significance yield reduction according to Tukey’s HSD test (P < 0.05)
Fig. 8
figure 8

Mean grain (seed) yields of crops grown under different mineral and organic input combinations from 2010 to 2016. Differently coloured bars indicate different (mineral or organic) N application levels (i.e. N½, N1, M½ and M1). A complete overview of the treatments and application levels can be found in Fig. 1. Grain yields are given in DM and are to be multiplied by 1.13 for comparison with farmers’ yields (at standard 13% moisture for storage stability). Error bars indicate the standard error of the mean (n = 4). Letters indicating statistical treatment differences (Tukey HSD) for each year are provided in Table S5 of the supplementary materials

Of the other winter crops, winter oilseed rape (2012) yields increased with higher N applications (3287 kg ha−1 for N½ vs. 4270 kg ha−1 for N1, P < 0.05). The responses to the different slurry levels or the different P and K levels were not significant. Winter barley (2011) tended to show very strong responses to the different P and K inputs (Table 4), with the most prominent increase resulting from the addition of P, more than doubling yields between N1K1 vs. N1P1K1 (P < 0.05). However, owing to considerable variability that year, most other pairwise comparisons were not significant.

Spring wheat yields differed considerably between the two years; Ym was 5827 kg ha−1 in 2013 and 4466 kg ha−1 in 2016 (Table 4). Within each year the differences between treatments were small (Fig. 8), particularly in 2013, when differences were limited to the extremes only (N½, N½K½ vs. M1, M1P1,P < 0.05). In 2016, N applications significantly increased yields (from 3195 kg ha−1 for N½P½K½ to 3966 kg ha−1 for N1P½K½, P < 0.05). Applications of P or K did not increase yields when they were applied exclusively. However, when P and K were applied together in high dose, yields increased with 27% (N1P2K2, P < 0.05). Noteworthy, in 2016 yields under the M½ treatments were lowest (2666 to 2884 kg ha−1) but they increased to close to Ym under M1 (3876 to 3990 kg ha−1, P < 0.05). Yields under the M½ treatments (4823 to 5017 kg ha−1) and the M1 treatments (5827 to 5520 kg ha−1) were not significantly different in 2013.

Lastly, for the three spring barley years, Ym varied between 4740 and 6022 kg ha−1 (Table 4). A consistent treatment ordering was observed, but the relative magnitude of the responses varied between the years (Fig. 8). Yields under slurry applications were predictably in the order of M½ ≤ M½P½ ≤ M1 ≤ M1P1. Compared with N only, mineral P applications tended to increase yields (+12 to +19% for N1P1 vs. N1), but this was not the case for K. Likewise, in all years yields declined compared with Ym when P fertilisation was withheld (−45% for 2010, −27% for 2014 and − 24% for 2015, Table 4), which was significant in 2010 and 2014 (P < 0.05). Yields under N1P1, i.e. when K was withheld, varied between 70 and 85% of Ym, depending on the year. This reduction of yield was significant in 2010 (P < 0.05).

Early season temperature variations and influence on relative yields

A maximum AT40d of 494 dD was observed in 2014 when spring barley was grown. The other spring barley years were cooler (388 dD for 2010 and 390 dD for 2015) (Table 1). For the spring wheat seasons, 2016 was mild (486 dD), but 2013 was cool (330 dD). For the winter crops, AT40d was calculated from the first day after which daily temperature sums were consistently above 0 dD, which led to values below the range of the original model (AT40d between 206 and 283 dD). The accuracy of the prediction of the relative yield responses varied between the crops (Fig. 9), and the R2 (coefficient of determination) values of the predictions were 0.78 for spring barley, 0.49 for spring wheat, 0.97 for winter barley, 0.18 for winter rape, and 0.79 for winter wheat. The root-mean-square error (RMSE) for the same predictions were 9.64, 13.88, 24.54, 27.34, and 17.04, respectively.

Fig. 9
figure 9

Observed vs. predicted relative yields (2010–2016) of five crops growing under different fertilisation treatments in the Long-Term Nutrient Depletion Trial. Error bars represent standard error of the mean. Diagonal black line indicates the 1:1 line between actual and predicted relative yields. Regression line indicates the correlation between actual (measured) and predicted relative yields (in % DM). r2 = coefficient of determination, RMSE = root-mean-square error. See Fig. 1 for treatment details. The relative yield predictions were made based on the seasonal temperatures during the first 40 growing days with yields of treatment N½P½K½ as a reference (100%)

Discussion

Balanced fertilisation

In most years, a clear N response was observed between the half-N (N½) and full-N (N1) levels, indicating that available N was a major limitation for crop growth (Fig. 8). Compared to N only, the addition of P often improved yields. Responses to K applications were mostly only observed for the complete NPK combinations (e.g. N1P1K1 vs. N1P1), hence after the limitation of N and P was alleviated. This can be interpreted as an illustrative example of Liebig’s Law of the Minimum and points towards a serial limitation of N, P and K (Olde Venterink 2016). Considering the field history of 30-year nutrient depletion, multiple limitations would indeed be expected. The soil type may explain K being the least limiting of the three macronutrients. The soil is rich in illite clay minerals, and can thus gradually release interlayer K from the non-exchangeable pool through mineral weathering.

The increased PBE at the N1P½K½ level compared with the N½P½K½ level (Fig. 2) can be explained by increased crop nutritional demand following enhanced growth, which led to additional mining of soil P. As full N1P1K1 and N1P2K2 fertilisation led to a large surplus of P (Table 2), it can be deduced from their crop P uptake that replacement P applications would be approximately 18 kg P ha−1 a−1 to fulfil the objectives of attaining high yields and minimising environmental impact (provided that N is also applied at the N1 level). Conversely, in all cases the KBE was lower than 100%, which indicates that all the K applications fulfilled crop K demand. Therefore it is not possible to make an exact estimate for K. However, the relatively high recoveries under N½P½K½ and N1P½K½ indicate that replacement K applications of slightly lower than the 60 kg ha−1 a−1 may be recommended. Conversely, the high output of K under treatments N1K1, N1P1K1, and N1P2K2 suggests a strong luxury uptake, which can often observed when K is applied in excess.

The slurry (Mx) treatments varied greatly between the years, most probably due to variations in plant-available N. For instance, in 2016 the slurry had a relatively low proportion of slurry NH4-N to total N content (Table S2), which led to markedly lower yields. The low PBE and KBE for the slurry treatments can thus be related to available N being a primary limitation for crop growth, while simultaneously applications of P and K via slurry were relatively high. Hence, the unbalanced ratio between N, P and K applications resulted in positive budgets for P and K.

Nutrient budgets and their relationship with soil indicators

The P and K budgets were closely related to the final Olsen-P and exch. K status, but the relationship with the observed change over the experimental period was less strong (Fig. 4). This was surprising since the long-term changes could previously be well predicted from the budgets (van der Bom et al. 2017). However, the current experimental period (2010–2016) is shorter than the previous 1996–2008 period. Furthermore, the comparison partially relies on historical data, therefore analytical bias or sampling error cannot be completely ruled out (e.g. in the case of P the poor fit was partially related to two relatively high observations of Olsen P in N1K1 and N1 in 2016 (Fig. 3)). Nevertheless, Olsen P is known to decline in a strongly nonlinear fashion, until it reaches a very low, essentially constant level (Johnston et al. 2014; McCollum 1991). When the Olsen P level is higher the rate of decline will initially be large, and will then gradually reduce over time (Dodd et al. 2012; Schulte et al. 2010). Some of the treatments with the lowest Olsen P budget (Fig. 3) had not received any P for over 50 years, and the decline in Olsen P of these soils may thus be levelling off. This is also reflected in the slope of the change of Olsen P (0.017, Fig. 4), which was smaller than the 0.022 as reported for the previous experimental period of 1996–2008 (van der Bom et al. 2017), though it would be larger (0.025) if the no-P applications are not taken into account.

From the slope of the relationship (0.017) it can be calculated that Olsen P increased by 1.7 mg P kg−1 for every 100 kg P ha−1 surplus. Although values of up to 6 mg kg−1 have previously been reported (Shepherd and Withers 1999), these were observed on sandy soils with a much higher P status (35 mg kg−1 and higher) and hence with a likely saturation of the sorption capacity. Conversely, on soils with a low Olsen-P status (up to 15 mg kg−1), changes as low as 1.4 mg kg−1 per 100 kg P ha−1 surplus have been reported (Cao et al. 2012), which is close to what was observed in the present experiment.

Up to 49 kg K ha−1 a−1 (N1) was taken up from treatments that did not receive any K inputs (Table 2), which was close to the 65 kg K ha−1 a−1 that was estimated as potentially being released from a similar soil (Simonsson et al. 2007). As discussed above, this loamy soil is rich in illite clay minerals, which means that it has a basal K supply capacity from mineral weathering.

Apparent recovery of applied nitrogen fertiliser

Both yields and total N yield were affected by the long-term addition of P (Fig. 5), which is in agreement with the observation that sufficient P fertilisation can increase N recovery (Duan et al. 2014). However, slurry treatment M1P1 tended to produce a slightly lower yield than the N1P1K1 treatment. As also mentioned above, the low slurry content of available N in 2016 probably led to limited yield responses. This was also reflected in the total N yield pattern which, in contrast to N1K1 and N1P1K1, did not level off after 180 kg N ha−1 for the M1P1 treatment (Fig. 5). Alternatively, speciation of P in the slurry treatments could have affected the availability of P in the M1P1, but although P forms were not analysed in the current experiment, pig slurry has previously been reported to consist of 91% inorganic P (Sharpley et al. 2013). Furthermore, in a review of 110 experiments, Möller et al. (2018) indicated that P in animal manures had quite similar fertiliser efficiency as soluble fertiliser P. Therefore N limitation seems a more likely explanation. The high ANR (on average 46%) was very close to the maximum ANR of 45% observed in the same field in 2009 (van der Bom et al. 2017), and to the maximum range as described by (Duan et al. 2014). Although ANR tended to increase following larger P applications, it did not change significantly, therefore the second hypothesis that ANR would be higher in treatments with long-term P applications could not be confirmed.

Influence of fertiliser and slurry applications on soil carbon accumulation

The majority of aboveground biomass (grain and straw) was removed each year. Therefore the overall increased C concentrations under mineral applications (Fig. 6) suggest that rhizodeposition and crop returns via roots increased with a larger total biomass production. This is supported by several other long-term studies that observed increased of crop production and soil C concentrations in response to fertilisation (e.g. Kätterer et al. 2011; Kätterer et al. 2014; Ludwig et al. 2011), and shows that root-derived C can make an important contribution to soil C. Such an overall positive effect of fertilisation is not always observed though. In a long-term experiment on a sandy loam at Askov, Denmark (established in 1894), soil C concentrations tended to slowly decline over the years (from initial concentrations of 16 to 20 mg C kg−1 to approximately 12 to 17 mg C kg−1 in 1997) (Christensen and Johnston 1997). However, at this site, mineral fertilisation or applications of animal manure did improve soil C concentrations compared with the unfertilised soils. Likewise, in Woburn, England (trials established in 1938), slow reductions in soil C concentrations were observed under arable crops grown on a sandy loam soil (Johnston et al. 2017), but here too a positive effect of animal manure was observed. Prior to commencement of the long-term inputs in 1996, the LTNDT had been cropped with spring cereals for over 30 years. All crop residues had been removed, and very moderate N (60 kg N ha−1 a−1, equal to N½) and no P or K inputs had been applied. Therefore C inputs from crop residues and rhizodeposits will have been limited. The cropping history prior to 1965 is not known, but the area has been used for arable production for at least the last two centuries. The rather low soil C concentrations in 1995 (around 10 mg C kg−1) can therefore be assumed to be a quasi-equilibrium with relatively low plant primary productivity and long-term C inputs. The increase in soil C concentrations has then been driven by greater crop productivity since the experiment commenced in 1995, and hence greater C inputs to the soil from 1996 onward. An adjacent site having a long-term arable history, and having received modest NPK fertilisation (100 kg N ha−1) had soil C concentrations around 15 mg C kg−1 (Peltre et al. 2015), indicating that the LTNDT history had depleted not only P, but also a considerable amount of soil C. This, in turn, could increase the capacity of the depleted soils to strongly adsorb e.g. bacteria (van Veen and Kuikman 1990), proteins, and other molecular components (Talbot et al. 2000; Quiquampoix and Burns 2007). These  would be expected to be abundant in manure relative to plant roots, which are comparably large and particulate, and thus not similarly amenable to quasi-irreversible adsorption (Staunton et al. 2015). Thus, the increases of soil C concentrations were largest under the application of animal slurry (Fig. 6) even though maximum crop productivity was typically observed under the mineral NPK treatments, indicating that animal slurry has the potential to further raise total C levels compared with mineral fertilisers. This is also supported by the calculations of annual C inputs (Fig. 7) that suggest that direct C inputs from the slurry are the major source of C, which agrees with the third hypothesis that the slurry-derived C, rather than the fertiliser value is the dominant driver for soil C accumulation. Although the relative magnitude of the changes differ, other studies with both mineral and manure treatments have also observed larger C concentrations under manure applications (Blair et al. 2006; Kätterer et al. 2011; Kätterer et al. 2014; Ludwig et al. 2011), including the aforementioned Askov trial, in which animal manure inputs of 15 t FYM + 6 t urine ha−1 a−1 (wet weight; 38 t slurry ha−1 a−1 instead from 1972) improved soil C concentrations compared with mineral or low-dose manure treatments (Christensen and Johnston 1997).

Slurry C retention was estimated by correcting the observed soil C changes in the slurry treatments based on estimated belowground C inputs (scenario I) (Table 3). These assumptions inevitably introduce uncertainty, e.g. whether the used relationships between crop productivity and C inputs (from Bolinder et al. 2007) are valid in the current system; or whether the shoot:root ratios, and the root biomass and activity are the same under mineral vs. organic inputs, or under different crop productivity (Böhm 1979; Paustian 1990).The latter was addressed in scenario II, where belowground inputs assumed equal. This scenario led to only slightly different numbers in terms of C retention (Table 3). Previous estimates of manure C retention reported in literature range from 27 to 46% (Gerzabek et al. 1997; Johnston et al. 2017; Kätterer et al. 2011; Kätterer et al. 2014), and in a meta analysis of manure application and soil C stocks, Maillard and Angers (2014) estimated a global manure-C retention coefficient of 12%, indicating that the retention of manure-C may be greater in the short term. Generally, manure organic matter is more recalcitrant than crop residues, leading to this higher C retention (Johnston et al. 2017).

Effects of early season temperatures on relative crop yields

In the analysis of spring barley yields in the first period of the LTNDT (1996–2009), variations in relative responses to the original seven fertilisation treatments of the LTNDT were correlated with early season temperature constrains (van der Bom et al. 2017). Subsequently, AT40d values were calculated for the current experimental period, and they were used to make yield predictions based on the previous statistical model. For reference, AT40d ranged from 360 to 592 dD in the previous 1996–2008 analysis. The relatively large RMSE for the winter crops indicates that the predictions did not correspond well to the actual observations (Fig. 9). For these crops AT40d was calculated from the first day after which daily temperature sums were consistently above 0 dD, at which they can typically start growing (Strand 1987). This led to AT40d values that were outside the range of the original model, which could explain the large RMSE, and indicates that the model is strongly limited to the original temperature range. The validity of the relative yield prediction model indeed appears to be limited to spring barley and spring wheat, which are planted at similar times, though the latter had a relatively small r2 value. A further explanation could be that some of the crop species inherently respond differently to fertilisation (e.g. see Delogu et al. 1998 on winter barley and winter wheat responses to N). Spring cereals tend to be more susceptible to P limitation than winter cereals (Zicker et al. 2018), which can be due to differences of root traits (Gahoonia and Nielsen 1996; Gahoonia and Nielsen 2004) or larger and deeper rooting of winter cereals (Thorup-Kristensen et al. 2009). It may be hypothesised that spring cereals may therefore also be less tolerant to the combination of P limitation and environmental stress. Although the original model did not sufficiently predict yield responses for the other crops, relative yield differences of spring barley indeed correspond with such observations for the previous experimental period, corroborating that relative yield differences between a fully fertilised and a nutrient-limited spring barley crop were larger when early season temperatures were low. Hence, sufficient nutrient supply may help alleviate adverse effects of early season cold stress.

Conclusions

Based on the examined period (2010–2016) of this long-term field experiment, crop yields, PBE and KBE can be significantly improved by balanced fertilisation strategies for both mineral fertiliser and animal slurry inputs. Olsen P and exchangeable K were closely related to the long-term nutrient budgets, regardless of the type or form of application (mineral vs. organic). However, as opposed to the previous period (1996–2009) there was a poorer fit in the relationship between the changes in soil-available P and K and the budgets.

When supply of N was sufficient, P limited crop productivity more than K. However, the relative response varied between the seasons. Relative yield responses of spring barley were associated with early season cold stress, i.e. differences between the fully fertilised crop and a nutrient-limited crop were greater in years with a cool spring. Nevertheless, the model developed in the previous experimental period (1996–2009) to predict relative yield responses performed poorly when applied to the other cereals and oilseed rape in the current period. The poor performance indicates that the model was strongly limited to the original crop and temperature ranges.

Yields and total N uptake in response to additional mineral N fertilisation were higher in treatments with P, but the already relatively high ANR was not significantly improved between the treatments with and without long-term P inputs.

Since the start of the long-term experiment in 1995, total soil C concentrations increased in response to the long-term repeated nutrient applications, except when very low application rates similar to the period before the experiment were applied. Animal slurry applications resulted in a greater C accumulation than mineral fertiliser applications, even with the latter generally displaying a higher crop productivity. Depending on the treatment and calculation, retention of slurry-C varied between 17 and 47% of the total accumulated slurry-C input. The observations indicate that in C-depleted soils, proteins, microbial compounds and other substances abundant in animal slurry have greater potential for stabilization than carbon derived from roots either by rhizosphere excretion or during root system decomposition. This implies that inputs of C from the slurry applications, which is presumably dominantly microbially derived, may raise total C levels compared to biomass inputs from increased crop growth.