Canadian Forest Service Publications

Beguin, J.; Fuglstad, G.A.; Mansuy, N.; Paré, D. 2017. Predicting soil properties in the Canadian boreal forest with limited data: comparison of spatial and non-spatial statistical approaches. Geoderma 306: 195-205.

Year: 2017

Issued by: Canadian Wood Fibre Centre

Catalog ID: 38844

Language: English

Availability: PDF (request by e-mail)

Available from the Journal's Web site (this site may require a fee).
DOI: 10.1016/j.geoderma.2017.06.016



Digital soil mapping (DSM) involves the use of georeferenced information and statistical models to map predictions and uncertainties related to soil properties. Many remote regions of the globe, such as boreal forest ecosystems, are characterized by low sampling effort and limited availability of field soil data. Although DSM is an expanding topic in soil science, little guidance currently exists on selecting the appropriate combination of statistical method and model formulation when data are limited. Using the Canadian managed forest as a case study, the main objective of this study was to investigate to what extent the choice of statistical method and model specification could improve the spatial prediction of soil properties with limited data. More specifically, we compared the performance of every combination of eight statistical approaches (linear, additive and geostatistical models, and four machine-learning techniques) and three model formulations (“covariates only”: a suite of environmental covariates only; “spatial only”: a function of geographic coordinates only; and “covariates + spatial”: a combination of both covariates and spatial functions) to predict five key forest soil properties in the organic layer (thickness and C:N ratio) and in the top 15 cm of the mineral horizon (carbon concentration, percentage of sand, and bulk density). Our results show that 1) although predictive performance differed strongly across statistical approaches and model formulations, spatially explicit models consistently had higher R² and lower RMSE values than non-spatial models for all soil properties except the C:N ratio; 2) Bayesian geostatistical models were among the best methods, followed by ordinary kriging and machine-learning methods; and 3) comparative analyses made it possible to identify the best-performing models and statistical methods for predicting specific soil properties.
We make modeling tools and code available (e.g., Bayesian geostatistical models) that increase DSM capabilities and support existing efforts toward the production of improved digital soil products with limited data.
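The abstract ranks ordinary kriging just behind the Bayesian geostatistical models. As a rough illustration of what a “spatial only” kriging predictor computes, the following Python sketch implements textbook ordinary kriging using only the standard library. It is not the authors' code (their Bayesian models are provided in the article's appendix). The exponential covariance, its sill and range values, and the absence of a nugget are assumptions chosen for illustration; in practice these parameters would be estimated from the data, for example by fitting a variogram. The sample coordinates and values are hypothetical.

```python
import math


def exp_cov(h, sill=1.0, range_=1.0):
    """Exponential covariance C(h) = sill * exp(-h / range_); assumed model, no nugget."""
    return sill * math.exp(-h / range_)


def solve(a, b):
    """Solve the dense system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def ordinary_kriging(coords, values, target, sill=1.0, range_=1.0):
    """Return (prediction, kriging variance, weights) at `target`."""
    n = len(coords)
    # Sample-to-sample covariances, bordered by the constraint sum(weights) == 1.
    a = [[0.0] * (n + 1) for _ in range(n + 1)]
    for i in range(n):
        for j in range(n):
            a[i][j] = exp_cov(math.dist(coords[i], coords[j]), sill, range_)
        a[i][n] = 1.0
        a[n][i] = 1.0
    # Right-hand side: sample-to-target covariances plus the constraint value.
    c0 = [exp_cov(math.dist(c, target), sill, range_) for c in coords]
    sol = solve(a, c0 + [1.0])
    weights, mu = sol[:n], sol[n]  # mu is the Lagrange multiplier
    pred = sum(w * z for w, z in zip(weights, values))
    var = sill - sum(w * c for w, c in zip(weights, c0)) - mu
    return pred, var, weights


# Hypothetical example: four sample plots at the corners of a unit square.
coords = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
values = [12.0, 15.0, 9.0, 14.0]
pred, var, weights = ordinary_kriging(coords, values, (0.5, 0.5), sill=4.0, range_=0.8)
print(round(pred, 6))  # 12.5: by symmetry every weight is 0.25
```

Because this sketch has no nugget, the predictor reproduces the observed value exactly at a sampled location and the kriging variance drops to zero there; away from the samples, the variance grows with distance, which is the kind of uncertainty map DSM products report alongside predictions.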

Plain Language Summary

The purpose of this study was to assess the performance of several statistical approaches in order to improve the prediction accuracy of digital soil mapping products. Many statistical approaches are currently used by producers of digital soil mapping products. The results show that not all approaches are equal: depending on the property to be predicted, some models performed better than others. In almost every case, adding a spatial component to the models improved their performance. The main conclusion is that producers of digital soil mapping products should test several statistical models. The article includes an appendix providing the model code to facilitate its use.
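One way to “test several statistical models”, as the summary recommends, is to score every candidate on held-out data with the same metrics the study reports (R² and RMSE). The Python sketch below uses leave-one-out cross-validation; the validation scheme, the R² definition (1 − SS_res/SS_tot) and both candidate models are assumptions for illustration. The global-mean model stands in for a non-spatial predictor and inverse distance weighting stands in for a “spatial only” model; neither is claimed to be among the eight approaches compared in the article, and the transect data are invented.

```python
import math


def rmse(obs, pred):
    """Root-mean-square error between observed and predicted values."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def r_squared(obs, pred):
    """R² as 1 - SS_res / SS_tot (one common definition; assumed here)."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


def mean_model(train_xy, train_z, target):
    """Non-spatial baseline: ignores location and predicts the training mean."""
    return sum(train_z) / len(train_z)


def idw_model(train_xy, train_z, target, power=2.0):
    """Spatial-only stand-in: inverse-distance weighting of the training samples."""
    num = den = 0.0
    for xy, z in zip(train_xy, train_z):
        d = math.dist(xy, target)
        if d == 0.0:
            return z  # target coincides with a sample
        w = 1.0 / d ** power
        num += w * z
        den += w
    return num / den


def leave_one_out(model, xy, z):
    """Predict each sample from all the others (leave-one-out cross-validation)."""
    preds = []
    for i in range(len(z)):
        preds.append(model(xy[:i] + xy[i + 1:], z[:i] + z[i + 1:], xy[i]))
    return preds


def compare(models, xy, z):
    """Return {name: (RMSE, R²)} for each candidate model."""
    results = {}
    for name, model in models.items():
        preds = leave_one_out(model, xy, z)
        results[name] = (rmse(z, preds), r_squared(z, preds))
    return results


# Hypothetical transect of ten plots with a linear spatial trend.
xy = [(float(i), 0.0) for i in range(10)]
z = [2.0 * i for i in range(10)]
results = compare({"mean": mean_model, "idw": idw_model}, xy, z)
for name, (err, r2) in results.items():
    print(f"{name}: RMSE={err:.2f}, R2={r2:.2f}")
```

Any function with the signature `model(train_xy, train_z, target)` can be added to the dictionary, so covariate-based or geostatistical predictors can be scored on exactly the same folds. On this trended transect the spatial stand-in has the lower RMSE, mirroring (but not reproducing) the study's finding that a spatial component usually helps.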

The field of soil mapping is undergoing tremendous change. Maps of soil types are being replaced with maps of soil properties (carbon, nitrogen, pH, heavy metals). These maps have numerous applications (climate change, land management, carbon budget).