1 Introduction

Soil nutrients provide a scientific accordance in fertilizer applications, especially in paddy rice planting farms. However, soil properties not only have spatial variability, but also oscillate with the time changing. It’s meaningful to analyze the extent of temporal and spatial variation of soil nutrient contents for more reasonable fertilizer applications.

Lots of works have been actualized on soils by measuring and analyzing the spatial dependencies on soil fertility [14]. For example, Weijun Fu et al. studied the spatial variation of soil nutrients in a dairy farm in southeastern Ireland [5], Kelin Hu et al. studied patterns of spatial and temporal variation of SOM in Beijing’s urban–rural transition zone [6], and Zhang Xing-Yi studied the spatial variability of nutrient contents in northeast China where has black soils [7]. However, most of the researches learned temporal variation in period of time with long intervals, neglecting the continuous changing year after year [8]. In this paper, methods of statistics and geostatistics were applied to study the spatio-temporal variation for data of soil pH, available nitrogen (AN), soil organic matter (SOM), available potassium (AK), and available phosphorus (AP) collected from 2009 to 2013 once a year in a paddy rice planting farm in northeast China.

2 Materials and Methods

2.1 Study Area

The farm of Erdaohe located in northeast boundary of China, closed to Russia across the Ussuri River in the east, and the Heilongjiang in the north (Fig. 1). This area belongs to the Sanjiang Plain, and has a humid or semi-humid continental monsoon climate of the North Temperate Zone, which is suitable for agriculture production, especial for paddy rice planting. The total area of this farm is 534.2 km, with an area of 362 km for cultivated land, including 360.7 km for paddy rice planting.

Fig. 1.
figure 1

Location of the study area

2.2 Soil Sampling

The soil samples were planed to collect at the depths of 0-20 cm from 2009 to 2013 once year (Fig. 2). The sample time were mostly between autumn harvests and fertilizers. Generally, this work was mostly done by experienced technicians, who has a quantity knowledge of agriculture production in the sampling region. The count of sampling points are 651 in 2009, 1488 in 2010, 954 in 2011, 483 in 2012, and 471 in 2013. On the other hand, location of these points were selected according to space distribution of land parcels, soil types, land use types and experience of technicians. After sampling, soil samples were naturally dried at ventilation place and then sieved to pass a 2-mm mesh after crushed. In this article, soil test results of pH, soil organic matter (SOM), available potassium (AK), available nitrogen (AN), and available phosphorus (AP) are used for spatio-temporal variation analysis, the soil test methods are revealed in Table 1.

Fig. 2.
figure 2

The distribution of soil sampling points from 2009 to 2013

Table 1. Soil test methods used in study area

3 Analytical Methods

3.1 Exploratory Statistical Analysis

In this paper, methods of exploratory statistical and Geostatistical analysis are both chosen to study the temporal and spatial variation of soil nutrient contents in this paper’s study area. First of all, descriptive analysis indexes such as maximum (max), minimum (min), median, mean, skewness and kurtosis, coefficient of variation (C.V.) and standard deviation (S.D.) were chosen to achieve the summary information of soil nutrients distribution. These indexes can be divided into three different types: location, spread, and shape, which provide diverse descriptions of soil nutrients. Index of mean is the arithmetic average of data measured, which shows the center of the distribution of the original data.

Besides indexes described above, Normal Q–Q plots (quantile–quantile plots) were created to identifying the probability and some distinct outliers (same as extreme values). Usually on the x-axis marked the observed values, and for a normal distribution values expected were marked on the y-axis. In general, samples which have a normal distribution cluster would follow a diagonal straight line [9]. Meanwhile, on the normal Q–Q plots, it’s easy to observe the low or high value outliers, because these points will be away from the calculated normal Q–Q line.

3.2 Geostatistical Analysis

In this study, the spatial variation of each soil nutrient content was measured by geostatistics. Experimental variogram evaluator is approximately uninfluenced for any inherent random function, whatever it’s really sensitive to external values since it is on account of squared distinctions among calculated data. The semivariogram model was established for each microbiological parameter for the sake of characterizing the level of spatial variability between neighboring samples. Meanwhile, the proper model function was suitable to the semivariogram model. The value of semivariogram \( {\mathbf{Y}} \)(h) was calculated using equation below

$$ \gamma (h) = \frac{1}{2N(h)}\sum\limits_{i = 1}^{N(h)} {(Z(x_{i} ) - Z(x_{i} + h))} $$
(1)

In the equation, h represents the demarcation distance from the locations of xi to locations of xi + h. Z(xi) and Z(xi + h) represents the values which are calculated for the regionalized variables at locations xi or xi + h. The last one N(h) represents the quantity of two sets at any demarcation distance of h [10, 11].

Including spherical, Gaussian, exponential, linear and power models, there are several models available to adjust the experimental semivariogram [1214]. On the other hand, a semivariogram includes three primary parameters which define the spatial structure of original data as: \( {\mathbf{Y}} \)(h) = C0 + C. Where C0 delegates the nugget effect which means the local variation coming at scales smaller than the samples’ interval, like sampling error, measurement error and fine-scale spatial variability. The sum of C0 and C is the sill which represents total variance in the equation. The distance is called the range at which the semivariogram levels away from the sill. Furthermore, sampling points are not spatially connected whenever the numerical value of separation distances is larger than the range [10].

The equation of different models are described below. Model of spherical anisotropic was adjusted to the empirical semivariance, which is defined as:

$$ \left\{ \begin{aligned} & \gamma (h) = C_{0} + C_{1} \left[ {\frac{3h}{2a} - (h/a)^{3} /2} \right]\begin{array}{*{20}c} {} & {0 < h < a} \\ \end{array} \\ & \gamma (h) = C_{0} + C_{1} \begin{array}{*{20}c} {} & {} & {} & {} \\ \end{array} \begin{array}{*{20}c} {} & {} & {} & {h > = a} \\ \end{array} \\ \\ & \gamma (0) = 0\begin{array}{*{20}c} {} & {} & {} & {} \\ \end{array} \begin{array}{*{20}c} {} & {\begin{array}{*{20}c} {\begin{array}{*{20}c} {} & {} \\ \end{array} } & {} \\ \end{array} } & {} & {h = 0} \\ \end{array} \\ \end{aligned} \right. $$
(2)

In the equation, C0 represents the nugget value which means the spatial variability produced by the random components such as micro-scale processes and measured error. C1 is the structural variance which means the spatial heterogeneity produced by spatial autocorrelation. C1 + C0 represents the sill, while A represents the range (or spatial correlation distance).

Other stationary models such as Gaussian (Eq. (3)), exponential (Eq. (4)) and linear (Eq. (5)) equations are defined as:

$$ \gamma (h) = C_{0} + C_{1} \left[ {1 - \exp ( - h/a)^{2} } \right] $$
(3)
$$ \gamma (h) = C_{0} + C_{1} \left[ {1 - \exp ( - h/a)} \right] $$
(4)
$$ \gamma (h) = C_{0} + bh $$
(5)

Where C0, C1, h and a represent the same meanings as spherical anisotropic model, while b is slope of the semivariance line in Eq. (5).

4 Results and Discussion

4.1 Variation of Soil Properties in Past Five Years

According to exploratory statistical analysis, Table 2 shows the soil nutrients determined values of minimum, maximum, median, mean, coefficient of variation (C.V.), standard deviation (S.D.), kurtosis and skewness from 2009 to 2013. First of all, considering the values of different soil nutrients described by min, max, median and mean, in the past five years, test results of pH shows a relatively inflexible constant, while the other four properties appears more variable. Among these nutrient types, except for SOM, the variation degree tested by C.V. (%) all decreased with small fluctuations in the past five year. Furthermore, for pH data the C.V. value was relatively small while that for AK data was relatively large.

Table 2. The statistical values of soil properties.

On the other hand, the degree of dispersion tested by S.D. shows that the sequence from high to low is AK, AN, AP, SOM, AP and pH, besides, the degree of dispersion of SOM increased significantly in recent two years. At last, kurtosis and skewness indicated the shape of distribution of soil raw data compared with normal distribution. The results showed that except for AK and several years’ data of SOM and AN, the others were closest to normal distribution.

4.2 Normal QQ-Plots Analysis and Data Transformation

Since parts of soil values did not fit the normal distribution, data Fig. 3 shows the normal QQ-plots of five different soil nutrients in 2009. Besides AK values displayed a concave shape, the pH, SOM and AP values obeyed a shape of nearly straight line, which means a near normal distribution. While the AN data obeyed a straight diagonal line with data points nearby except for several points deviated at both ends. Normal QQ-plots of the other four years were not shown for limitation of paper length. While the analysis results found that in 2010 and 2011, soil nutrients data except AK, followed a nearly straight diagonal line just like that in 2009. And in 2012 and 2013, only pH values remained the same distribution, the others became not very accordant with that more or less.

Fig. 3.
figure 3

Normal QQ-plots of soil nutrients in 2009

Transformation was needed for the followed analysis. With the combination of exploratory statistical analysis and normal QQ-plot analysis, different transformation method were chosen, seen from Table 3. Figure 4 shows the Normal QQ-plots of data before and after the transformation for AK values in 2011, it is obvious that after transformation, the points were more fitted with the straight line.

Table 3. Data transformation method selected
Fig. 4.
figure 4

Normal QQ-plots of AK raw data and transformation result Box-Cox (λ = -0.4) in 2011

4.3 Spatio-temporal Variation of Different Soil Properties

According to the theory of geostatistical, semivariance analysis was applied to soil properties from 2009 to 2013, the results indicated that spatial autocorrelation existed for the soil properties in study area, which means spatial interpolation method of kriging could be used to predict the soil nutrients in missing data area. While the step was to choose the appropriate semivariogram model for each property and each year, Fig. 5 shows the semivariogram for soil pH in 2009 (anisotropic) with different models (Spherical, Gaussian and Exponential).

Fig. 5.
figure 5

Semivariograms of different models for soil pH in 2009

In order to select the best model for following analysis, comparison of precision for different models is needed. Table 4 offered the precision analysis results of spatial data of soil nutrients in 2009. For in condition that mean standardized closer to zero, the root-mean-square was smaller, average standard error closer to root-mean-square, and root-mean-square standardized closer to one, semivariogram model may be the most appropriate one. Based on the precision errors, the best semivariogram model for soil data collected in 2009 were: Spherical for pH and SOM, Gaussian for AN, Exponential for AP and AK.

Table 4. Comparison of precision analysis among different models for soil test data in 2009

Table 5 displays the selected best models for each soil properties from 2009 to 2013 and their parameters. First of all, directional features were observed for the majority soil data except for SOM values collected in 2009 and 2010, which also became the special cases of range values above 25 km. The value of Nugget/Still shows the relative size of the nugget effect among varied soil properties [15]. This value was used to define different classes of spatial dependence for the soil variables as such rules:

Table 5. Semivariogram models selected for soil nutrients and parameters of each model (2009-2013)
  1. (a)

    If the ratio was smaller than 25 %, the variable was considered extremely spatially dependent;

  2. (b)

    If the ratio was between 25 % and 75 %, the variable was considered to be moderately spatially dependent;

  3. (c)

    If the ratio was larger than 75 %, the variable was considered to be weakly spatially dependent, which indicated that random factors were the majority factors affect the spatial variation of soil properties [16].

From this table, it is clear that all of the soil properties measured in this study were not strongly spatially dependent. From 2009 to 2013, the value of Nugget/Still for each soil property decreased in overall trend, relatively, soil pH was the considered as the most strongly spatial dependent, while soil AN was the weakest one in study area.

According to the results below, interpolation method of ordinary kriging was used to predict the spatial distribution maps for soil nutrients in study area. Figure 6 shows the prediction maps for soil pH from 2009 to 2013. For the limitation of paper length, the maps of other soil nutrients were not shown in this paper.

Fig. 6.
figure 6

Spatial distribution maps for soil pH (2009-2013)

The spatial distribution maps of soil pH shows that the soil pH in this study area were mostly acidic, or strongly acidic in some regions. From 2009 to 2011, the spatial distribution is similar in the whole area, while there was obvious changes in 2012, especially in the south area, soil pH became more acidic. Until the next year, distribution changed similar as that in 2009. For other soil nutrients, the spatial distribution remained not very stable in the five years, which probably because of the fertilization changed every year.

5 Conclusions

This study analyzed the spatio-temporal variation for several soil properties in a paddy rice farm from 2009 to 2013. Among these soil properties, data of soil pH collected all the five years followed a normal distribution, and had a relatively small C.V. values which decreased along the study time range. On the other hand, the spatial variation of soil pH increased, became more strongly spatial dependent. The others, such as soil AK, followed a log-normal distribution in 2009, 2010 and 2012, while in 2011 and 2013 followed neither normal nor log-normal distribution. Thus, data transformations were acquired for better analysis. Except for soil pH, raw data of other soil properties were transformed by log or box-cox method more or less. According to the analysis results, the spatial variation of soil pH, AN, SOM, AP and AK all increased from 2009 to 2013, while except soil pH, most of that were not strongly spatial dependent, which means the spatial variation was mostly based on random factors in this study area.