Holistic environmental soil-landscape modeling of soil organic carbon
Introduction
Environmental soil-landscape modeling (ESLM) is a useful tool for predicting soil properties and classes and for understanding the relationships between soils and environmental factors (Grunwald, 2005). ESLM builds on the work of Jenny (1941) and V.V. Dokuchaev (Glinka, 1927), who conceptualized soil formation as a function of five factors: CLimate, Organism, Relief, Parent material, and Time (the CLORPT model). The framework has continued to develop over the past decades. McBratney et al. (2003) first encapsulated the conceptual model in a quantitative framework, the SCORPAN model, which describes the relationships between soil and environmental factors for the purpose of spatial prediction of soils. Over the past century, human activities have increasingly influenced the environment, with critical impacts on the pedosphere in terms of soil formation, change, and degradation (Richter et al., 2011). In response, the STEP-AWBH model (S: Soil, T: Topography, E: Ecology, P: Parent material, A: Atmosphere, W: Water, B: Biota, H: Human) was devised to explicitly model the effects of human activity on the soil system (Grunwald et al., 2011, Thompson et al., 2012).
It is common practice in ESLM to select environmental factors based on the researchers' domain knowledge of the soil–environment processes in the study area (Florinsky et al., 2002, Grunwald, 2009). This variable selection strategy relies heavily on the soundness of that knowledge. Where process knowledge is incomplete, a limited set of predictor variables can lead to biased and suboptimal model performance (Grunwald, 2009). It is therefore necessary to adopt a less biased strategy that gives models access to a broad set of environmental variables representing the spectrum of possible soil-forming processes operating in a given landscape. The more exhaustive such a set of predictor variables is, the higher the potential to unravel complex soil–environment interactions and to identify an unbiased, optimal model for predicting a soil property of interest. As Jakeman et al. (2006) argued, good practice in the development of environmental models should embrace alternative model families and structures, so that model comparison can guard against biased or even false conclusions drawn from a model favored by its developer(s). Poggio et al. (2013) used stepwise methods to select the most predictive variables from a large pool of satellite-derived variables in a regional application of mapping soil properties; however, their covariate scope was confined to a Digital Elevation Model (DEM) and vegetation indices derived from Moderate-Resolution Imaging Spectroradiometer (MODIS) products, while other important factors (e.g., soil, atmosphere, water) were not considered.
With advances in Geographic Information Systems (GIS), the Global Positioning System (GPS), and remote and proximal sensing technologies, it is now feasible to build a comprehensive pool of spatially exhaustive environmental variables that characterize a full spectrum of environmental properties. These spatially explicit environmental datasets are far more abundant, and available at finer spatial resolutions, than sparsely sampled soil pedon data. In that sense, digital environmental covariates serve as critical predictors for inferring soil properties, although it is usually not known which combination of environmental predictors has the highest predictive power in a given geographic region because of their scale-dependent behavior (Vasques et al., 2012). Collecting a large set of predictor variables can itself be problematic. Key issues are redundancy and collinearity among the variables and the deleterious effects of noisy or non-informative variables. Strategic variable selection is therefore required to identify the major ecosystem processes and to obtain parsimonious predictive models (Guyon and Elisseeff, 2003). In addition, variable selection can reduce model development and application time, increase model interpretability, and reduce overfitting (Belanche-Muñoz and Blanch, 2008, May et al., 2008). Variable selection has been an important research topic in machine learning and involves two distinct problems: minimal-optimal and all-relevant. The former searches for the minimum set of predictor variables that yields the best prediction accuracy (Guyon and Elisseeff, 2003, Nilsson et al., 2007), while the latter seeks all variables that are relevant to the target property. The minimal-optimal set is thus of special interest for developing predictive models, whereas the all-relevant set has great value for understanding the mechanisms underlying the soil–environment relationship.
Nilsson et al. (2007) discussed the relationship between the two problems in depth and showed that the minimal-optimal set is a subset of the all-relevant set when the data follow a strictly positive distribution, which is the case for most data encountered in practical applications (Fig. 1).
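The distinction between the two sets can be illustrated with a small toy sketch. It is not the method used in this study: relevance is approximated here by the absolute Pearson correlation with the target, judged against randomly permuted "shadow" copies of the predictors, and the minimal-optimal set is approximated crudely by dropping predictors that are nearly collinear with a stronger one. The function names, the synthetic variables `x1`, `x2`, `x3`, and the thresholds are all assumptions made for illustration.

```python
import random


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


def all_relevant(predictors, target, n_shadows=100, seed=0):
    """Keep predictors whose |r| with the target beats every shuffled copy."""
    rng = random.Random(seed)
    shadow_max = 0.0
    for values in predictors.values():
        for _ in range(n_shadows):
            shuffled = values[:]
            rng.shuffle(shuffled)  # shuffling destroys any real association
            shadow_max = max(shadow_max, abs(pearson(shuffled, target)))
    return [name for name, values in predictors.items()
            if abs(pearson(values, target)) > shadow_max]


def prune_redundant(predictors, target, names, max_corr=0.9):
    """Crude stand-in for a minimal-optimal search: rank the given predictors
    by |r| with the target and drop any that is nearly collinear with one
    already kept."""
    ranked = sorted(names,
                    key=lambda n: abs(pearson(predictors[n], target)),
                    reverse=True)
    kept = []
    for name in ranked:
        if all(abs(pearson(predictors[name], predictors[k])) < max_corr
               for k in kept):
            kept.append(name)
    return kept


# Synthetic example (hypothetical data): x2 is a noisy duplicate of x1,
# x3 is unrelated noise, and the target depends on x1 only.
rng = random.Random(42)
n = 200
x1 = [rng.gauss(0, 1) for _ in range(n)]
x2 = [v + rng.gauss(0, 0.1) for v in x1]
x3 = [rng.gauss(0, 1) for _ in range(n)]
y = [2 * v + rng.gauss(0, 0.5) for v in x1]
predictors = {"x1": x1, "x2": x2, "x3": x3}

relevant = all_relevant(predictors, y)
minimal = prune_redundant(predictors, y, relevant)
print("all-relevant:", relevant)
print("minimal (after redundancy pruning):", minimal)
```

In this toy setting both `x1` and `x2` carry information about the target, so both belong to the all-relevant set, but only one of them is needed for prediction. A univariate correlation cannot detect interactions or nonlinear effects, so a practical analysis would use a model-based importance measure instead.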
Environmental variables that represent soil-landscape processes come in different data types, generally continuous and categorical (including ordinal and nominal). Categorical variables (e.g., land use and geology type) partition observations (samples) into unbalanced groups and may pose problems for model validation and prediction. A common approach in model validation is data division, in which the observations are split into calibration and validation sets (Bennett et al., 2013). The split may leave some classes of a categorical predictor underrepresented or not represented at all in the calibration set, which can lead to poor or failed predictions for those classes. The same issues related to modeling with categorical predictors can occur in validation mode. The occurrence of this problem increases exponentially as the number of categorical variables included in a model increases. It is therefore worthwhile to pay special attention to categorical variables in ESLM and to build models that strike a balance between model performance and the number of categorical variables used.
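The data-division problem described above can be checked directly before a model is calibrated. The following is a minimal sketch, assuming a simple random split and a single categorical predictor; the helper names, the 70/30 split fraction, and the land-use labels are hypothetical and not taken from the study.

```python
import random
from collections import Counter


def split_indices(n, calibration_fraction=0.7, seed=0):
    """Randomly split range(n) into calibration and validation index lists."""
    rng = random.Random(seed)
    indices = list(range(n))
    rng.shuffle(indices)
    cut = int(round(n * calibration_fraction))
    return indices[:cut], indices[cut:]


def class_counts(labels, idx):
    """Number of observations per class within the given index list."""
    return Counter(labels[i] for i in idx)


def unseen_classes(labels, calibration_idx, validation_idx):
    """Classes that occur in the validation set but never in calibration."""
    cal = {labels[i] for i in calibration_idx}
    val = {labels[i] for i in validation_idx}
    return sorted(val - cal)


# Hypothetical, strongly unbalanced land-use predictor for 100 samples.
labels = ["forest"] * 60 + ["wetland"] * 35 + ["urban"] * 4 + ["mining"]

cal_idx, val_idx = split_indices(len(labels))
print("calibration counts:", dict(class_counts(labels, cal_idx)))
print("validation counts:", dict(class_counts(labels, val_idx)))
print("classes unseen in calibration:", unseen_classes(labels, cal_idx, val_idx))
```

A class with a single observation, such as `mining` here, can fall entirely on one side of the split. Repeating the check for each categorical predictor shows how quickly such gaps accumulate as more categorical variables are added.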
Soil organic carbon (SOC) is a key property that not only indicates soil quality but also has profound significance for the global climate system (Trumbore et al., 1996). This study therefore focused on modeling SOC in Florida, USA. The aim was to demonstrate a new holistic ESLM strategy, based on a comprehensive environmental variable pool and variable selection techniques, that serves two purposes: revealing the underlying processes and making predictions of SOC. The strategy involves five steps: model conceptualization, data compilation, process identification, predictive model calibration, and model validation (Fig. 2). The specific objectives were threefold: 1) from a comprehensive set of environmental variables, identify an all-relevant set of variables for topsoil SOC in order to reveal the underlying SOC processes; 2) from the all-relevant set, identify the minimal-optimal sets that simplify models and optimize model performance for prediction; 3) explore the possibility of reducing the use of categorical variables in predictive models.
Study area
The study area is the state of Florida, located in the southeastern region of the United States, with latitudes from 24°27′ N to 31°00′ N and longitudes from 80°02′ W to 87°38′ W. Florida covers approximately 150,000 km² (United States Census Bureau, 2000). The climate is humid and subtropical in northern and central Florida and is humid and tropical in southern Florida. The mean annual precipitation of Florida is 1373 mm and the mean annual temperature is 22.3 °C (National Climatic Data
Characteristics of soil organic carbon measurements
The SOC in the top 20 cm of soil showed considerable variation, with a range of 33.7 kg m−2, a mean of 5.0 kg m−2, and a median of 3.4 kg m−2 (Table 2). The data were strongly positively skewed, with most values lying at the low end. A high kurtosis value indicated that infrequent, extremely high SOC values, rather than frequent modest deviations, accounted for a large share of the SOC variation. Both skewness and kurtosis indicate that the distribution of SOC values was highly
Conclusions
In this study, a new holistic ESLM strategy based on the STEP-AWBH conceptual model was demonstrated to effectively reveal soil–environmental processes and relationships while also making efficient predictions of SOC. An exhaustive set of 210 potential environmental variables was compiled to characterize Florida's soil-landscape based on state-of-the-art pedological knowledge and technical and computational capabilities. Models were developed to predict SOC (0–20 cm) using the comprehensive
Acknowledgments
This study was funded by USDA-CSREES-NRI grant award 2007-35107-18368 ‘Rapid Assessment and Trajectory Modeling of Changes in Soil Carbon across a Southeastern Landscape’ (National Institute of Food and Agriculture (NIFA) – Agriculture and Food Research Initiative (AFRI)). This project is a Core Project of the North American Carbon Program. The authors would like to thank Aja Stoppe, Christopher Wade Ross, Samiah Moustafa, Lisa Stanley, Adriana Comerford, and Anne Quidez for their hard work in
References (57)
- et al. Machine learning methods for microbial source tracking. Environ. Modell. Softw. (2008)
- et al. Characterising performance of environmental models. Environ. Modell. Softw. (2013)
- et al. Prediction of soil properties by digital terrain modelling. Environ. Modell. Softw. (2002)
- Stochastic gradient boosting. Comput. Stat. Data An. (2002)
- Multi-criteria characterization of recent digital soil mapping and modeling approaches. Geoderma (2009)
- et al. Genetic algorithm for optimization of water distribution systems. Environ. Modell. Softw. (1999)
- et al. Ten iterative steps in development and evaluation of environmental models. Environ. Modell. Softw. (2006)
- et al. Non-linear variable selection for artificial neural networks using partial mutual information. Environ. Modell. Softw. (2008)
- et al. On digital soil mapping. Geoderma (2003)
- et al. Regional scale mapping of soil properties and their uncertainty with a large number of satellite-derived covariates. Geoderma (2013)
- Digital soil mapping: Interactions with and applications for hydropedology
- Soil carbon storage response to temperature: an hypothesis. Ann. Bot.
- Regional modelling of soil carbon at multiple depths within a subtropical watershed. Geoderma
- Design of multi-paradigm integrating modelling tools for ecological research. Environ. Modell. Softw.
- Carbon losses from all soils across England and Wales 1978-2003. Nature
- Bagging predictors. Mach. Learn.
- Random forests. Mach. Learn.
- Introduction to Algorithms
- Acceleration of global warming due to carbon-cycle feedbacks in a coupled climate model. Nature
- Temperature sensitivity of soil carbon decomposition and feedbacks to climate change. Nature
- Relationships among vegetation, surficial geology and soil water content at the Pocono mesic till barrens. J. Torrey Bot. Soc.
- Litter quality and the temperature sensitivity to decomposition. Ecology
- Florida Vegetation and Land Cover Data Derived from Landsat ETM+ Imagery
- Dokuchaiev's ideas in the development of pedology and cognate sciences
- Environmental soil-landscape modeling: geographic information technologies and pedometrics
- Digital soil mapping and modeling at continental scales: finding solutions for global issues. Soil Sci. Soc. Am. J.
- Soil carbon stocks and land use change: a meta analysis. Glob. Change Biol.
- Analysis of factors controlling soil carbon in the conterminous United States. Soil Sci. Soc. Am. J.