Evaluation of Machine Learning Approaches to Predict Soil Organic Matter and pH Using vis-NIR Spectra
Figure 1. Location of sampling sites.
Figure 2. Structure of ELM (w_j, weights; b_j, biases).
Figure 3. Mean absorbance spectra, their continuum-removed spectra, and the GA-selected bands for SOM and pH in the calibration dataset.
Figure 4. Fitness (RMSECV) versus the number of variables used by each individual over 10 runs of the genetic algorithm (GA).
Figure 5. Predicted versus measured values of the validation set for SOM using the PLSR, LS-SVM, ELM and Cubist methods with full bands.
Figure 6. Predicted versus measured values of the validation set for SOM using the PLSR, LS-SVM, ELM and Cubist methods with bands selected by GA.
Figure 7. Predicted versus measured values of the validation set for pH using the PLSR, LS-SVM, ELM and Cubist methods with full bands.
Figure 8. Predicted versus measured values of the validation set for pH using the PLSR, LS-SVM, ELM and Cubist methods with bands selected by GA.
Abstract
1. Introduction
2. Materials and Methods
2.1. Study Area and Soil Sampling
2.2. Spectroscopic Measurement and Pre-Processing of Spectra
2.3. Multivariate Regression Models
2.3.1. Partial Least Squares Regression (PLSR)
2.3.2. Least Squares-Support Vector Machines (LS-SVM)
2.3.3. Cubist Regression Model
2.3.4. Extreme Learning Machine
2.4. Model Evaluation
3. Results
3.1. Descriptive Statistics of the Soil Properties
3.2. Soil Spectral Characterization
3.3. Predictive Accuracy of the Machine Learning Models
3.4. Comparison of Model Performance
4. Discussion
4.1. GA Selection
4.2. Performance of Models
5. Conclusions
Author Contributions
Acknowledgments
Conflicts of Interest
References
Texture | Sample Number | Crop | Parent Material |
---|---|---|---|
Clay | 83 | Idle field, Silkworm | Acidic crystalline |
Clay loam | 220 | Rice | Alluvial deposit |
Loam | 120 | Rice | Alluvial deposit |
Sandy loam | 100 | Grass, Idle field | Red sandstone |
Properties | Dataset | N | Range | Mean | Median | SD | CV | Skewness | Kurtosis |
---|---|---|---|---|---|---|---|---|---|
SOM | All | 523 | 2.44–60.50 | 25.97 | 24.90 | 10.92 | 42.05% | 0.44 | 0.14 |
SOM | Calibration | 350 | 2.44–60.50 | 25.43 | 24.67 | 10.56 | 41.53% | 0.36 | −0.01 |
SOM | Validation | 173 | 4.96–60.50 | 27.07 | 25.5 | 11.57 | 42.74% | 0.53 | 0.24 |
pH | All | 523 | 3.92–8.60 | 5.52 | 5.25 | 0.86 | 15.40% | 1.54 | 2.20 |
pH | Calibration | 350 | 4.32–8.60 | 5.53 | 5.23 | 0.84 | 15.22% | 1.64 | 2.50 |
pH | Validation | 173 | 3.92–8.32 | 5.52 | 5.32 | 0.87 | 15.81% | 1.37 | 1.75 |
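The CV column in the table above can be checked against the SD and mean columns. The following is a minimal sketch, assuming the standard definition CV = SD / mean × 100; the function name `coefficient_of_variation` is hypothetical and the input values are copied from the SOM rows of the table.

```python
def coefficient_of_variation(sd, mean):
    """Return the coefficient of variation as a percentage (SD / mean * 100)."""
    if mean == 0:
        raise ValueError("mean must be non-zero")
    return sd / mean * 100.0


# (SD, mean) pairs copied from the SOM rows of the descriptive-statistics table.
som_rows = {
    "All": (10.92, 25.97),
    "Calibration": (10.56, 25.43),
    "Validation": (11.57, 27.07),
}

for dataset, (sd, mean) in som_rows.items():
    # Prints the recomputed CV for each dataset, rounded to two decimals.
    print(f"{dataset}: CV = {coefficient_of_variation(sd, mean):.2f}%")
```

For the SOM rows this reproduces the printed values (42.05%, 41.53%, 42.74%). Recomputing the pH rows from the rounded SD and mean gives slightly different percentages than those printed (about 15.58% against 15.40% for all samples); the table alone does not indicate why.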
Properties | Methods | Bands | Calibration R² | Calibration RMSECV | Validation R² | Validation RMSE | Validation ME | Validation RPIQ |
---|---|---|---|---|---|---|---|---|
SOM | PLSR | full | 0.79 | 4.81 | 0.72 | 6.27 | −0.40 | 2.33 |
SOM | PLSR | GA | 0.76 | 5.20 | 0.71 | 6.52 | −0.63 | 2.24 |
SOM | LS-SVM | full | 0.83 | 4.43 | 0.76 | 6.14 | −0.66 | 2.39 |
SOM | LS-SVM | GA | 0.86 | 3.91 | 0.77 | 5.71 | −0.59 | 2.56 |
SOM | ELM | full | 0.87 | 3.78 | 0.81 | 5.18 | −0.36 | 2.83 |
SOM | ELM | GA | 0.89 | 3.71 | 0.81 | 5.17 | −0.28 | 2.87 |
SOM | Cubist | full | 0.76 | 5.07 | 0.74 | 6.29 | −0.39 | 2.32 |
SOM | Cubist | GA | 0.78 | 5.08 | 0.76 | 6.25 | −0.33 | 2.33 |
pH | PLSR | full | 0.67 | 0.48 | 0.57 | 0.58 | 0.04 | 1.61 |
pH | PLSR | GA | 0.46 | 0.62 | 0.48 | 0.64 | 0.09 | 1.46 |
pH | LS-SVM | full | 0.82 | 0.36 | 0.74 | 0.45 | 0.03 | 2.07 |
pH | LS-SVM | GA | 0.74 | 0.43 | 0.75 | 0.44 | 0.02 | 2.08 |
pH | ELM | full | 0.85 | 0.33 | 0.74 | 0.45 | 0.03 | 2.07 |
pH | ELM | GA | 0.72 | 0.45 | 0.76 | 0.43 | 0.01 | 2.15 |
pH | Cubist | full | 0.70 | 0.49 | 0.72 | 0.47 | 0.03 | 1.98 |
pH | Cubist | GA | 0.64 | 0.57 | 0.62 | 0.54 | 0.05 | 1.71 |
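The validation columns in the table above (R², RMSE, ME, RPIQ) can be computed from paired measured and predicted values. The sketch below is illustrative only and rests on assumptions not stated in this page: R² is taken as 1 − SSE/SST, ME as the mean of (predicted − measured), and RPIQ as the interquartile range of the measured values divided by RMSE, with quartiles from Python's `statistics.quantiles` using the inclusive method. The function name `validation_metrics` and the toy data are hypothetical, and the authors' exact conventions may differ.

```python
import math
import statistics


def validation_metrics(observed, predicted):
    """Return R2, RMSE, ME and RPIQ for paired observed/predicted values."""
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    n = len(observed)
    # Residuals use the assumed sign convention: predicted minus observed.
    residuals = [p - o for o, p in zip(observed, predicted)]
    sse = sum(r * r for r in residuals)
    me = sum(residuals) / n                      # mean error (bias)
    rmse = math.sqrt(sse / n)                    # root mean squared error
    mean_obs = sum(observed) / n
    sst = sum((o - mean_obs) ** 2 for o in observed)
    r2 = 1.0 - sse / sst                         # assumed definition: 1 - SSE/SST
    # Interquartile range of the observed values (assumed quartile method).
    q1, _, q3 = statistics.quantiles(observed, n=4, method="inclusive")
    rpiq = (q3 - q1) / rmse                      # ratio of performance to IQR
    return {"R2": r2, "RMSE": rmse, "ME": me, "RPIQ": rpiq}


# Toy data (not from the study): every prediction is 0.5 too high.
measured = [1.0, 2.0, 3.0, 4.0, 5.0]
estimated = [1.5, 2.5, 3.5, 4.5, 5.5]
print(validation_metrics(measured, estimated))
```

Under these definitions a higher RPIQ and a lower RMSE both indicate better predictions, while ME close to zero indicates little systematic bias. The sketch does not guard against a perfect fit (RMSE of zero) or constant measured values, which would cause a division by zero.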
© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Yang, M.; Xu, D.; Chen, S.; Li, H.; Shi, Z. Evaluation of Machine Learning Approaches to Predict Soil Organic Matter and pH Using vis-NIR Spectra. Sensors 2019, 19, 263. https://doi.org/10.3390/s19020263