Article

Mapping Soil Properties in Tropical Rainforest Regions Using Integrated UAV-Based Hyperspectral Images and LiDAR Points

1 Hainan Academy of Forestry, Haikou 571100, China
2 Key Laboratory of Tropical Forestry Resources Monitoring and Application of Hainan Province, Haikou 571100, China
3 State Key Laboratory of Subtropical Building and Urban Science, Shenzhen University, Shenzhen 518060, China
4 Key Laboratory for Geo-Environmental Monitoring of Coastal Zone of the National Administration of Surveying, Shenzhen University, Shenzhen 518060, China
5 School of Architecture & Urban Planning, Shenzhen University, Shenzhen 518060, China
6 Department of Strategic and Advanced Interdisciplinary Research, Peng Cheng Laboratory, Shenzhen 518060, China
* Author to whom correspondence should be addressed.
Forests 2024, 15(12), 2222; https://doi.org/10.3390/f15122222
Submission received: 28 October 2024 / Revised: 5 December 2024 / Accepted: 11 December 2024 / Published: 17 December 2024
(This article belongs to the Section Forest Inventory, Modeling and Remote Sensing)
Figure 1. Workflow of the soil property mapping method in tropical rainforest regions.
Figure 2. (a) Geographic location of Hainan Province, China; spatial distribution of soil samples in (b) Diaoluo and (c) Limu mountains.
Figure 3. Top and side 3D views of the LiDAR point clouds of (a) Diaoluo and (b) Limu mountains.
Figure 4. Comparison of soil properties between samples from Diaoluo and Limu mountains: (a) pH; (b) soil organic carbon (SOC); (c) total nitrogen (TN); and (d) total phosphorus (TP). Dashed lines represent the mean value.
Figure 5. Importance ranking of the 15 selected features for predicting the (a) pH, (b) soil organic carbon (SOC), (c) total nitrogen (TN), and (d) total phosphorus (TP).
Figure 6. Scatter plots of the measured values against soil property levels predicted by the optimal models: (a) pH predicted by the GBDT model; (b) soil organic carbon (SOC) predicted by the XGBoost model; (c) total nitrogen (TN) predicted by the GBDT model; (d) total phosphorus (TP) predicted by the XGBoost model.
Figure 7. Spatial distributions of the soil properties, including pH, soil organic carbon (SOC), total nitrogen (TN), and total phosphorus (TP), in (a) Diaoluo and (b) Limu mountains.

Abstract

For tropical rainforest regions with dense vegetation cover, developing effective large-scale soil mapping methods is crucial for improving soil management and replacing time-consuming, laborious conventional surveys. While machine learning (ML) algorithms demonstrate superior predictability of soil properties over linear models, their practical and automated application for predicting soil properties using remote sensing data requires further assessment. Therefore, this study integrates Unmanned Aerial Vehicle (UAV)-based hyperspectral images and Light Detection and Ranging (LiDAR) points to predict soil properties indirectly in two tropical rainforest mountains (Diaoluo and Limu) in Hainan Province, China. A total of 175 features, including texture features, vegetation indices, and forest parameters, were extracted from the two study sites. Six ML models, Partial Least Squares Regression (PLSR), Random Forest (RF), Adaptive Boosting (AdaBoost), Gradient Boosting Decision Trees (GBDT), Extreme Gradient Boosting (XGBoost), and Multilayer Perceptron (MLP), were constructed to predict soil properties, including soil acidity (pH), total nitrogen (TN), soil organic carbon (SOC), and total phosphorus (TP). To enhance model performance, a Bayesian optimization algorithm (BOA) was introduced to obtain optimal model hyperparameters. The results showed that, compared with default parameters, BOA improved model performance in most cases, achieving average R² improvements of 202.93%, 121.48%, 8.90%, and 38.41% for soil pH, SOC, TN, and TP, respectively. In general, BOA effectively captured the complex interactions between hyperparameters and prediction features, leading to improved performance of the ML models compared to those using default parameters. The GBDT model generally outperformed other ML methods in predicting soil pH and TN, while the XGBoost model achieved the highest prediction accuracy for SOC and TP. Models that integrated features derived from hyperspectral images and LiDAR data outperformed those relying on a single data source. In summary, this study highlights the promising combination of UAV-based hyperspectral images with LiDAR point data to advance digital soil property mapping in forested areas, supporting large-scale soil management and monitoring.

1. Introduction

Soil properties, including total nitrogen (TN), total phosphorus (TP), and soil organic carbon (SOC), are vital in vegetation growth, while soil pH plays an essential role in influencing the migration and transformation of soil properties [1]. Considering this, monitoring the spatial distribution and temporal changes of these soil properties is critical for early soil quality detection [2]. Early assessments enable policymakers to develop and implement protective measures and sustain the ecological and agricultural function of soils [3]. However, conventional soil surveys based on field sampling are time-consuming and laborious [4]. This warrants the development of non-contact, large-scale soil mapping methods to improve soil management practices.
Over the past decade, in line with advanced computer vision and sensor technologies, Unmanned Aerial Vehicles (UAVs) have provided a new remote sensing platform that enables the capture of high spatial resolution and hyperspectral data [5]. Numerous studies have applied hyperspectral techniques to predict soil properties [6,7]. As such, UAV-based remote sensing technology has been applied to acquire reflectance spectra of soil for mapping soil property distributions in cultivated lands [8].
Nevertheless, directly acquiring surface soil spectra in regions with dense canopy closure, such as tropical rainforests, poses a challenge that spectral predictions of soil nutrients in these environments must overcome [9]. To address this, researchers have explored the use of vegetation characteristics, derived from remote sensing data, to indirectly predict soil properties. Remote sensing techniques also allow accurate assessment of forest biochemical compositions across broad spatial scales. Numerous studies have developed models linking reflected spectra to the observed vegetation characteristics, such as leaf area index (LAI), biomass, and chlorophyll [10,11]. These models utilize remote sensing data for rapid acquisition of vegetation characteristics across large areas. These vegetation characteristics are linked to soil fertility, with sufficient nutrients supporting vegetation growth and development [12]. Meanwhile, soil nutrients significantly impact vegetation growth, for example, tropical tree species at specific locations [13].
Therefore, the SOC, TN, TP, and pH levels are key indicators that reflect soil fertility levels and are closely related to plant growth status and species composition. The association between soil properties and plant growth suggests the potential application of remote sensing data, which captures vegetation characteristics, to estimate soil property levels. Although various studies have focused on the direct measurement of soil properties through remote sensing, only a few reports have explored the indirect prediction of soil properties via vegetation characteristics derived from remote sensing data, particularly in tropical rainforest regions.
Currently, remote sensing technology primarily provides access to hyperspectral images and Light Detection and Ranging (LiDAR) data. The integration of hyperspectral images and LiDAR point clouds has been widely applied to monitor plant growth status and species composition accurately [14]. Moreover, hyperspectral data have been demonstrated to be practical for classifying forest species and predicting forest biochemical properties [15]. Despite providing detailed spectral information, hyperspectral imagery lacks structural information, and LiDAR is a valuable complement in this respect. As an active remote sensing technology, LiDAR excels at delivering high-resolution three-dimensional (3D) information on forest canopy structure, such as canopy height and vertical structure [16,17]. Previous studies have employed point cloud density and intensity characteristics acquired through LiDAR to estimate forest above-ground biomass [18,19]. In addition, LiDAR point cloud intensity and elevation data effectively describe terrain features and canopy properties in forested areas [20,21], which makes it possible to establish correlations between point cloud features and underlying soil conditions. Moreover, canopy structure directly corresponds with plant growth status and species composition [22]. Meanwhile, terrain changes affect the accumulation and movement of soil moisture and of organic and inorganic materials, thereby influencing the levels of soil properties [23,24]. This highlights the significant role of terrain in shaping the spatial distribution of soil components. Accordingly, LiDAR point clouds, which provide in-depth terrain information, hold considerable potential for describing spatial changes in soil properties. Because the two data sources together reflect plant growth, species composition, and terrain, the integration of hyperspectral and LiDAR data holds significant potential for the indirect prediction of soil properties. Therefore, we explore and validate this integration, contributing to a deeper understanding of its applicability.
Currently, multivariate linear and non-linear models have been established to model the spatial distribution of soil properties through remote or proximal sensing techniques [25,26]. Nevertheless, linear regression methods often fall short of capturing the complex, non-linear relationships between a target soil property and its decisive spectral features, leading to poor prediction accuracy [4]. In contrast, machine learning (ML) algorithms, such as Random Forest (RF) [27], Support Vector Machine (SVM) [28], and Extreme Gradient Boosting (XGBoost) [29], have demonstrated superior predictability of soil properties over linear models because they handle non-linear problems effectively [30]. Furthermore, these ML techniques have been compared to determine the best-performing model for capturing the distribution patterns of soil parameters in specific domains or tasks [26,31]. Nevertheless, the most suitable ML methods for indirectly predicting soil properties from forest spectral and LiDAR point cloud data remain to be identified.
Selecting suitable hyperparameters of ML algorithms is also vital for achieving optimal models with excellent prediction accuracy. In current studies, model parameters are often manually adjusted through trial and error, leading to a certain level of randomness in the selection process. Among the frequently employed methods for optimizing hyperparameters are random search, manual search, and grid search [32]. However, random search does not guarantee the effectiveness of parameter selection, manual searching is time-consuming and laborious, and grid search requires a significant time to adjust the values. On the contrary, the Bayesian optimization algorithm (BOA) is an intelligent optimization algorithm that outperforms other optimization algorithms as it employs prior knowledge to expedite the search to achieve optimal solutions [33]. For instance, Baili et al. [23] proposed a combined Bayesian Tree-structured Parzen Estimator (TPE) optimization algorithm to automatically select prediction features and adjust hyperparameters to enhance soil salinity prediction of the XGBoost model. More recently, Wang et al. [34] adopted BOA to optimize the hyperparameters of the LightGBM model for predicting soil salinity. However, the practical and automated application of BOA optimization in ML models should be further assessed for predicting soil properties using remote sensing data.
In light of the need for large-scale soil management and monitoring through advanced digital mapping of soil properties in forested areas, this study aims to integrate UAV-based hyperspectral images and LiDAR points for estimating soil properties in tropical rainforests indirectly. The study was structured into three main objectives: (1) investigating the feasibility of fusing UAV hyperspectral and LiDAR data to map the spatial patterns of soil properties in tropical rainforests; (2) comparing various ML algorithms for predicting soil properties using optimally selected features; and (3) evaluating the effectiveness of BOA in automatically optimizing model parameters to enhance soil property inversion. Two tropical rainforest mountains (Diaoluo and Limu) in Hainan Province, China, were selected as the study sites. These sampling locations were used to compare multivariate ML methods for predicting and mapping SOC, TN, TP, and pH levels using UAV hyperspectral images and LiDAR point clouds.

2. Materials and Methods

2.1. Research Framework

Figure 1 shows the framework of the predicting and mapping methods for soil properties in tropical rainforest regions. The methods encompassed three main steps:
(1) Soil property extraction. Soil samples were collected from the study area, including Diaoluo and Limu mountains. The properties of these soil samples were obtained through chemical analysis. The soil properties from both mountains were integrated to form a whole sample set.
(2) Feature extraction and selection. Hyperspectral imagery and LiDAR data were acquired using a UAV. Vegetation indices and texture features were extracted from the preprocessed hyperspectral data, while terrain-related variables were derived from denoised LiDAR data. We used the XGBoost algorithm to calculate feature importance, selecting the most relevant predictors for soil properties to ensure efficient and accurate model training. In addition, the performance of models using different numbers of features was compared.
(3) Model construction and soil property mapping. BOA was applied to optimize model parameters, aiming to better capture the potential relationships between soil properties and features. Six ML models were built for soil property estimation. We also compared the performance of models using both hyperspectral and LiDAR data with that of models using a single type of data. Ultimately, the optimal model was selected to generate spatial distribution maps of soil properties, providing insights into their spatial variability across the study area.
Further supporting materials, including data files and code, are available in the Supplementary Materials.

2.2. Study Area and Sampling Sites

The Diaoluo and Limu mountains are located in southeastern and central Hainan Province, respectively (Figure 2). Diaoluo mountain is in Lingshui County, while Limu mountain is within Qiongzhong County, adjacent to Danzhou and Baisha. Separated by approximately 49.48 km, both mountains represent extremely rare pristine tropical rainforest areas in China, featuring a tropical marine monsoon climate with an average annual temperature of 24 °C and precipitation of 2.25 m. They are among the most biologically diverse regions in China, home to over 3500 plant species. The study area in Diaoluo mountain spans roughly 33.54 hm², while that in Limu mountain covers about 37.16 hm². Diaoluo mountain has a terrain that slopes from west to east, with an elevation of 1499 m, whereas the main peak of Limu mountain rises to 1411.7 m and runs from southwest to northeast.

2.3. Sampling Procedure and Laboratory Chemical Analysis

A total of 14 and 23 sampling sites (Figure 2) were selected from various spatial scenes throughout the Limu and Diaoluo mountains, respectively, to ensure adequate representation of soil properties. At each site, three to four sample quadrats (10 m × 10 m) were set up, with five sampling points—one at each corner and one at the center. Soil profiles were divided into three depth intervals: 0–20 cm, 20–40 cm, and 40–60 cm. Within each quadrat, five soil samples were collected at random depths and combined into one sample. Eventually, 60 samples from Diaoluo mountain and 88 samples from Limu mountain were obtained (Figure 2).
After removing surface debris, such as stones, leaves, and roots, the soils were placed in a plastic bag and transported to the laboratory for chemical analysis. Soil samples were first air-dried at 25 °C without light exposure until achieving constant weight. After grinding, sieving, and acidification, the TN, SOC, and TP of the soil samples were measured using the wet oxidation method [35]. The soil pH was then analyzed using a pH meter, followed by spectrometric analysis using a universal dye indicator [36]. Finally, the properties of the soil samples from both mountains were combined to develop the prediction model.

2.4. UAV Data Acquisition

Hyperspectral images and LiDAR point cloud data were acquired using the UAV (Matrice 600 Pro, DJI Sciences and Technologies Ltd., Shenzhen, China) [37] equipped with a hyperspectral imaging camera (GaiaSky-mini, Dualix Spectral Imaging Ltd., Wuxi, China) and an ARS-100 sensor. Data collection took place from 21 to 24 February 2023, under clear skies, windless weather, and stable lighting conditions, with temperatures ranging from 21 to 25 °C to ensure high data quality. The hyperspectral sensor recorded data in the visible-to-near-infrared optical range (380–1000 nm) with a 2.8 nm spectral resolution across 270 bands. The UAV flight altitude was maintained at 100 m, producing a spatial resolution of 0.25 m for canopy reflectance imagery. Meanwhile, the ARS-100 sensor collected LiDAR point cloud data with a spatial density of 80 points/m².

2.5. Hyperspectral Image Preprocessing and Features Extraction

The reflectance bands at the edge (380–420 nm and 900–1000 nm) with a low signal-to-noise ratio were removed due to noise interference. The images were processed for geometric corrections using ENVI software (version 5.6), with multiple ground control points employed to eliminate geometric distortions caused by instability in the sensor platform’s posture and terrain variations. The computational model used for geometric correction was an affine transformation model called the Rotation-Scaling-Translation (RST), and the resampling method applied was the nearest-neighbor technique. Subsequently, the atmospheric correction was applied to the hyperspectral images using the Fast Line-of-sight Atmospheric Analysis of Spectral Hypercubes (FLAASH) module in ENVI, with parameters set to the local atmospheric conditions and sensor-specific settings, such as the sensor type, altitude, and atmospheric model. This correction step removes the atmospheric scattering effects, ensuring more accurate reflectance data. Finally, a Savitzky–Golay (SG) filter with a second-order polynomial and a window size of 10 was employed to preserve important spectral features and reduce spectral curve noise [38].
Visible and near-infrared spectral bands were mathematically combined to derive vegetation indices. A total of 48 vegetation indices were computed (refer to Table 1), which enhance specific spectral information, thereby effectively reflecting vegetation’s physical and chemical indicators, such as above-ground biomass, LAI, and chlorophyll content [39,40]. Soil properties in forested areas can then be indirectly assessed using the interaction between soil and plants.
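As a minimal sketch of how a band-ratio vegetation index can be derived from one pixel spectrum, the snippet below computes a generic normalized-difference index. The evenly spaced band centers, the 800 nm and 670 nm band choices, and the synthetic spectrum are illustrative assumptions, not the sensor's wavelength table or one of the 48 indices in Table 1.

```python
def band_centers(start_nm=380.0, end_nm=1000.0, n_bands=270):
    """Evenly spaced band centres (assumption; a real sensor ships its own table)."""
    step = (end_nm - start_nm) / (n_bands - 1)
    return [start_nm + i * step for i in range(n_bands)]


def nearest_band(centers, wavelength_nm):
    """Index of the band whose centre is closest to the requested wavelength."""
    return min(range(len(centers)), key=lambda i: abs(centers[i] - wavelength_nm))


def normalized_difference(spectrum, centers, nm_a, nm_b):
    """(R_a - R_b) / (R_a + R_b) for one pixel; returns 0.0 when the sum is zero."""
    r_a = spectrum[nearest_band(centers, nm_a)]
    r_b = spectrum[nearest_band(centers, nm_b)]
    total = r_a + r_b
    return (r_a - r_b) / total if total else 0.0


centers = band_centers()
# Synthetic vegetation-like pixel: low reflectance below 700 nm, high above it.
pixel = [0.05 if c < 700.0 else 0.45 for c in centers]
ndvi_like = normalized_difference(pixel, centers, 800.0, 670.0)
print(round(ndvi_like, 3))
```

Applying the same function to every pixel of the corrected image would yield one index raster per band pair.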
Texture features represent the typical arrangement of shapes in an image that can be described in terms of roughness, contrast, directionality, and regularity. Texture analysis aids in tracking vegetation growth stages, allowing for determining essential forest biophysical parameters, such as canopy coverage, vertical height, and above-ground biomass, using high-resolution spectral data [41]. In this study, the Gray Level Co-occurrence Matrix (GLCM) was first computed, followed by a statistical analysis of 28 texture features (Table 1).
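The GLCM step can be illustrated with a small pure-Python sketch. It builds a symmetric, normalized co-occurrence matrix for a single pixel offset and derives two common statistics (contrast and homogeneity). The 4 × 4 image, the four gray levels, and the choice of statistics are assumptions for illustration; the study's 28 texture features are listed in Table 1.

```python
def glcm(image, levels, dx=1, dy=0):
    """Symmetric, normalised grey-level co-occurrence matrix for one offset."""
    counts = [[0.0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = image[r][c], image[r2][c2]
                counts[i][j] += 1.0
                counts[j][i] += 1.0  # count each pair in both directions
    total = sum(sum(row) for row in counts)
    return [[v / total for v in row] for row in counts]


def contrast(p):
    """Weighted sum of squared grey-level differences."""
    return sum(p[i][j] * (i - j) ** 2 for i in range(len(p)) for j in range(len(p)))


def homogeneity(p):
    """Weights pairs with similar grey levels more heavily."""
    return sum(p[i][j] / (1.0 + (i - j) ** 2) for i in range(len(p)) for j in range(len(p)))


# Toy image quantised to 4 grey levels (illustrative only).
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]
p = glcm(image, levels=4)
print(round(contrast(p), 4), round(homogeneity(p), 4))
```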

2.6. LiDAR-Based Feature Extraction

After noise removal, the point cloud data were classified into ground and non-ground points. The ground points were used to construct a Digital Elevation Model (DEM), while the non-ground points were used to develop a Digital Surface Model (DSM). By subtracting the DEM from the DSM, a Canopy Height Model (CHM) was calculated and resampled to a 0.25 m spatial resolution using the nearest-neighbor method. Figure 3 illustrates the 3D LiDAR point cloud structure of both mountains.
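A canopy height model is the surface elevation minus the ground elevation at each cell (CHM = DSM − DEM). The sketch below shows that arithmetic on two tiny co-registered grids. The grid values, the optional nodata handling, and the clamping of negative heights to zero are assumptions added for illustration.

```python
def canopy_height_model(dsm, dem, nodata=None):
    """Cell-wise DSM - DEM; negative differences are clamped to 0.0."""
    chm = []
    for dsm_row, dem_row in zip(dsm, dem):
        row = []
        for top, ground in zip(dsm_row, dem_row):
            if nodata is not None and (top == nodata or ground == nodata):
                row.append(nodata)  # propagate missing cells unchanged
            else:
                row.append(max(top - ground, 0.0))
        chm.append(row)
    return chm


# Hypothetical 2 x 2 elevation grids in metres.
dsm = [[105.0, 120.5], [98.0, 101.0]]
dem = [[100.0, 100.5], [99.0, 101.0]]
chm = canopy_height_model(dsm, dem)
print(chm)
```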
After normalizing the LiDAR point cloud data, statistical parameters related to the elevation values were calculated to obtain height variables in a specific unit, while those associated with point cloud intensity were computed to derive intensity variables. In total, 99 LiDAR-based features with a spatial resolution of 2 m were calculated, including 42 intensity variables, 56 height variables, and one gap ratio variable (GapFraction) (Table 2).
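To make the height and intensity variables concrete, the following sketch computes a handful of per-cell statistics from normalized point heights and intensities. The feature names mirror those reported in the feature-importance results, but their exact definitions in the processing software are assumptions here, as are the linear-interpolation percentile and the sample data.

```python
import statistics


def percentile(values, q):
    """q-th percentile (0-100) by linear interpolation between order statistics."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def cell_features(heights, intensities):
    """A few height and intensity statistics for the points in one grid cell."""
    mean_int = statistics.fmean(intensities)
    return {
        "elev_max": max(heights),
        "elev_percentile_10th": percentile(heights, 10),
        "int_min": min(intensities),
        "int_stddev": statistics.stdev(intensities),      # sample standard deviation
        "int_variance": statistics.variance(intensities),  # sample variance
        # average absolute deviation from the mean intensity
        "int_aad": sum(abs(v - mean_int) for v in intensities) / len(intensities),
    }


# Hypothetical points falling inside one 2 m cell.
heights = [0.0, 5.0, 10.0, 15.0, 20.0]
intensities = [10.0, 20.0, 30.0, 40.0, 50.0]
features = cell_features(heights, intensities)
print(features)
```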

2.7. Informative Features Selection

The excessive use of variables in modeling processes contributes to overfitting, reduced model generalization, and decreased performance and interpretability. In contrast, a compressed feature space shortens the training time, enhances predictive accuracy, and helps prevent overfitting [42]. The XGBoost model can select the most informative feature subsets by iteratively training multiple decision trees and calculating feature importances [43]. The gain of each feature is measured by evaluating its contribution to each tree in the model, indicating the relative improvement in accuracy attributable to that feature. The features with higher importance are chosen, which also serve as key indicators for improving model interpretability. Therefore, this study employed XGBoost to select the top-ranked features for building the optimal prediction models for soil properties, including the SOC, TN, TP, and pH values.
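Once importance scores are available, selecting the top-ranked features is a simple ranking step. The sketch below assumes the scores are already expressed as percentage shares. The six values are the pH importances reported in Section 3.2, and the helper names are hypothetical rather than part of any library.

```python
def top_k_features(importance, k):
    """Names of the k features with the largest importance, highest first."""
    ranked = sorted(importance.items(), key=lambda item: item[1], reverse=True)
    return [name for name, _ in ranked[:k]]


def cumulative_share(importance, names):
    """Summed importance (in percent) of the given features."""
    return sum(importance[name] for name in names)


# pH importance shares (%) as reported in Section 3.2, in arbitrary order.
ph_importance = {
    "elev_AIH_50th": 8.25,
    "GSAVI": 1.05,
    "int_min": 32.89,
    "int_AII_60th": 8.64,
    "elev_max": 13.58,
    "int_AII_80th": 11.16,
}
top3 = top_k_features(ph_importance, 3)
print(top3, round(cumulative_share(ph_importance, top3), 2))
```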

2.8. Regression Modeling of Tropical Rainforest Soil Properties

2.8.1. Bayesian Optimization Algorithm (BOA)

This study selected six ML algorithms, including Partial Least Square Regression (PLSR), Adaptive Boosting (AdaBoost), Gradient Boosting Decision Tree (GBDT), Random Forest (RF), Extreme Gradient Boosting (XGBoost), and Multilayer Perceptron (MLP), owing to their ability to fit the complex non-linear relationships between soil properties and selected informative features. The hyperparameter set is crucial for achieving optimal regression models with high prediction accuracy. During model construction, the hyperparameters cannot be directly preset based on the input training samples and must be adjusted during the training process [23]. Although the default parameters, which were chosen empirically from the scikit-learn library for some ML algorithms, provide reasonable performance in most cases, they may exhibit suboptimal behavior in specific scenarios.
The model parameters were optimized using BOA, a black-box optimization algorithm for locating the extrema of continuous functions [44]. The algorithm fits a Gaussian process to the objective function over the given parameter space using the sampled data, which yields a Gaussian posterior distribution for the objective value at any candidate point. Based on this distribution, BOA sampled new potential parameters and calculated their expected improvement, selecting the most promising set as the input for the next iteration. In each iteration, the search direction was adjusted, and better candidate points were chosen by integrating previously observed results, progressively approaching the optimal solution of the objective function. In contrast to traditional random and grid search algorithms, this approach considers the performance of previously searched parameters, reducing the number of optimization iterations and enhancing efficiency [34]. Table 3 summarizes the hyperparameters and their optimization limits for the six regression algorithms used to predict the SOC, TN, TP, and pH levels.
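The expected-improvement step can be sketched in a few lines. The snippet below assumes the Gaussian process has already produced a posterior mean and standard deviation of the validation RMSE for each candidate, and that the objective is minimized. The candidate labels and numbers are hypothetical, and the function is a generic textbook form of expected improvement rather than the study's implementation.

```python
import math


def expected_improvement(mu, sigma, best, xi=0.0):
    """Expected improvement over `best` for a minimisation objective.

    mu, sigma: posterior mean and standard deviation at the candidate point.
    best: lowest objective value observed so far.
    xi: optional exploration margin (assumption; 0.0 disables it).
    """
    gap = best - mu - xi
    if sigma <= 0.0:
        return max(gap, 0.0)  # no uncertainty: improvement is deterministic
    z = gap / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return gap * cdf + sigma * pdf


best_rmse = 0.51
# Hypothetical posterior (mean RMSE, standard deviation) for three candidates.
candidates = {
    "max_depth=3": (0.52, 0.02),
    "max_depth=6": (0.50, 0.10),
    "max_depth=9": (0.55, 0.01),
}
scores = {name: expected_improvement(mu, sd, best_rmse) for name, (mu, sd) in candidates.items()}
next_trial = max(scores, key=scores.get)
print(next_trial)
```

The candidate with a slightly better mean and a large uncertainty wins here, which reflects how the criterion balances exploiting known good regions against exploring uncertain ones.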

2.8.2. Regression Methods

PLSR is a multivariate statistical method that extracts orthogonal latent variables from the predictors and builds a linear regression model in the resulting orthogonal space, which is especially useful when the variables exhibit high multicollinearity [45]. RF, a widely used ensemble learning algorithm, performs regression by growing multiple decision trees, each trained on randomly selected feature and sample subsets [46]. AdaBoost is a classic ensemble learning algorithm that iteratively trains multiple weak learners, improving the model by increasing the weights of poorly predicted data points [47]. Similarly, GBDT is an iterative decision tree algorithm that improves the model by fitting each new tree to the negative gradient of the loss function [48]. Unlike AdaBoost, GBDT is more versatile as it can use a wider range of loss functions. XGBoost is a modern ensemble learning algorithm that generates a new tree at each iteration by fitting the residuals from the predictions made in the previous iteration [43]. This algorithm is based on gradient boosting and retains more information about the objective function. It also accelerates training and controls model complexity by incorporating regularization terms into the objective function, reducing the risk of overfitting [49]. MLP is a type of artificial neural network composed of multiple layers of interconnected nodes, designed for supervised learning tasks such as classification and regression [50].

2.8.3. Model Evaluation

In this study, the soil sample dataset (n = 148) was split using the Kennard–Stone algorithm [51] into a calibration dataset (70%, n = 103) and an independent validation dataset (30%, n = 45). The resulting calibration and validation datasets exhibited statistical distributions similar to that of the entire dataset, preventing inconsistent data distributions between the training and testing phases of the regression models. The regression models were initially trained using the calibration dataset and validated using the validation dataset. The performance of the regression models was assessed using the determination coefficient (R²), root mean square error (RMSE), and relative percent deviation (RPD). Higher validated R² and RPD and lower RMSE indicate better regression models for predicting soil properties. The formulas for these calculations are as follows:
$$R^2 = \frac{\sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}$$
$$\mathrm{RMSE} = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{n - 1}} \quad (n > 1)$$
$$\mathrm{RPD} = \frac{\mathrm{std.}}{\mathrm{RMSE}}$$
where $y_i$ and $\hat{y}_i$ represent the observed and predicted values of the $i$th sample, respectively. In addition, $\bar{y}$ and $\bar{\hat{y}}$ denote the mean values of the observed and predicted values, while std. refers to the standard deviation of the predicted values.
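The split-then-evaluate procedure of this section can be sketched in pure Python. The Kennard–Stone function below is a generic textbook version (start from the two most distant samples, then repeatedly add the sample farthest from those already chosen). The metric functions assume that R² is the ratio of explained to total sum of squares, that RMSE uses the residual between observed and predicted values with an n − 1 denominator, and that RPD divides the standard deviation of the predicted values by the RMSE, as the text states. Many studies instead compute R² as 1 − SS_res/SS_tot and use the standard deviation of the observed values, so these choices are assumptions. All data are toy values.

```python
import math
import statistics


def kennard_stone(features, n_select):
    """Indices of n_select calibration samples chosen by Kennard-Stone."""
    n = len(features)
    if not 2 <= n_select <= n:
        raise ValueError("n_select must be between 2 and the number of samples")
    dist = [[math.dist(a, b) for b in features] for a in features]
    # Seed with the two most distant samples.
    first, second = max(
        ((i, j) for i in range(n) for j in range(i + 1, n)),
        key=lambda pair: dist[pair[0]][pair[1]],
    )
    selected = [first, second]
    remaining = set(range(n)) - {first, second}
    nearest = {k: min(dist[k][first], dist[k][second]) for k in remaining}
    while len(selected) < n_select:
        # Add the sample whose nearest selected neighbour is farthest away.
        nxt = max(sorted(remaining), key=lambda k: nearest[k])
        selected.append(nxt)
        remaining.remove(nxt)
        for k in remaining:
            nearest[k] = min(nearest[k], dist[k][nxt])
    return selected


def r_squared(observed, predicted):
    mean_obs = statistics.fmean(observed)
    explained = sum((p - mean_obs) ** 2 for p in predicted)
    total = sum((o - mean_obs) ** 2 for o in observed)
    return explained / total


def rmse(observed, predicted):
    squared = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(squared / (len(observed) - 1))


def rpd(observed, predicted):
    return statistics.stdev(predicted) / rmse(observed, predicted)


# Toy one-dimensional feature space with five samples.
X = [[0.0], [1.0], [2.0], [5.0], [10.0]]
calibration = kennard_stone(X, 3)
validation = [i for i in range(len(X)) if i not in calibration]

# Toy observed and predicted values.
observed = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [1.5, 2.0, 3.0, 4.0, 4.5]
print(calibration, validation)
print(round(r_squared(observed, predicted), 3), round(rmse(observed, predicted), 3))
```

For the dataset described here, `n_select` would be 103 of the 148 samples, with the remaining 45 forming the validation set.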

3. Results

3.1. Statistical Description of Soil Properties from Both Study Sites

The 148 soil samples from the two tropical forest areas recorded a pH range of 3.90–7.41 (average of 4.69), SOC of 1.74–8.13% (3.84%), TN of 0.06–0.54% (0.17%), and TP content of 20.00–380.00 ppm (182.13 ppm) (Table 4). The pH recorded the smallest coefficient of variation (CV), 11.08%, indicating relatively low dispersion. Comparatively, the CVs for the SOC, TN, and TP were 28.40%, 38.13%, and 41.75%, respectively, suggesting moderate data variation. As shown in Figure 4, soil pH values at Diaoluo and Limu mountains are approximately similar. However, SOC and TN levels at Limu mountain were lower than those at Diaoluo mountain, while the average soil TP content in Limu mountain samples was higher than that of Diaoluo mountain.

3.2. Selected Informative Features

The XGBoost algorithm selected the top 20 most informative features from 175 features derived from hyperspectral imagery and LiDAR data. The optimal number of features for model construction was determined by sequentially building models using the top 1–20 most informative features. As detailed in Table 5, the results indicate generally poor model performance with fewer features. Interestingly, the model performance improved with an increasing number of features, and the accuracy stabilized using around 15 features. However, additional features beyond this peak showed a negligible impact on the model’s performance and may lead to the curse of dimensionality, resulting in a decline in accuracy [52,53]. Therefore, this study utilized the top 15 most informative features as modeling factors. Figure 5 presents the importance ranking of the 15 selected features for pH, SOC, TN, and TP.
In terms of pH, the top five most important features were int_min (32.89%), elev_max (13.58%), int_AII_80th (11.16%), int_AII_60th (8.64%), and elev_AIH_50th (8.25%). GSAVI was the only hyperspectral-derived feature (1.05%) among the selected informative features. For SOC, the int_stddev feature contributed over half of the importance in the inversion model (55.68%). As observed in Figure 5c, int_variance was the most crucial feature (57.62%) for predicting the TN content in both tropical forest areas, followed by elev_percentile_10th (9.84%), int_aad (7.87%), elev_AIH_60th (5.30%), and elev_AIH_50th (4.60%). Additionally, NDMI, SIPI, CRI1, and GEMI were more important than other hyperspectral vegetation indices. The intensity percentiles of LiDAR (int_percentile_75th, int_percentile_80th, and int_percentile_99th) were also essential in predicting the TP.

3.3. Optimal Hyperparameter and Model Validation

The selected features were pre-normalized and combined with the measured soil properties to construct the inversion models. In this process, BOA was employed to optimize the model hyperparameters by considering their complex dependencies with the input informative features [23]. Initial hyperparameters were randomly generated from the hyperparameter space (Table 3), and six ML models were subsequently trained, each evaluated using specific metrics. Therefore, the optimal hyperparameters were determined based on the model with the lowest RMSE value of the validation dataset.
Table 5 shows the prediction accuracies of the soil properties based on the proposed ML models using only features derived from hyperspectral images (HF), LiDAR-derived features only, and a combination of both. When using LiDAR-derived features, the prediction model performances for various soil properties were significantly higher than those constructed using HF. Moreover, the models built using the combined features achieved higher accuracy than those using LiDAR-derived features alone. For pH prediction, the proposed ML models showed different performances. For instance, the GBDT model (R² = 0.49) exhibited higher inversion accuracy than the other models, with an RPD exceeding 1.40, indicating a fair model for correlation assessment [6]. Notably, the PLSR model exhibited significantly lower accuracy compared to other models (Table 6). In terms of SOC prediction, the XGBoost model demonstrated the best performance, with an R² of 0.46, followed by GBDT (R² = 0.44), AdaBoost (R² = 0.43), MLP (R² = 0.41), RF (R² = 0.29), and PLSR (R² = 0.22). The applied BOA enhanced the prediction accuracy of the AdaBoost, RF, GBDT, XGBoost, and MLP models, with R² values increasing by 53.57%, 3.57%, 109.52%, 206.67%, and 355.56%, respectively (Table 6).
With regard to TN, the RF and XGBoost models performed equally (R² = 0.58), while the PLSR, AdaBoost, and MLP models recorded R² values of 0.34, 0.51, and 0.53, respectively. The GBDT model attained the highest TN estimation accuracy, with an R² of 0.60 and an RPD of 1.60, allowing for the discrimination of sample component content [6]. In contrast, TP prediction was relatively poor for all ML models, possibly due to issues such as a small sample size and high data variation. The XGBoost model exhibited the highest R² value of 0.39, with the RF and GBDT models recording a close R² value of 0.37. The PLSR model remained the poorest-performing model (R² = 0.29).
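For reference, the accuracy metrics quoted above can be computed as in the following minimal sketch. It assumes that R² is the coefficient of determination (1 minus the ratio of residual to total sum of squares) and that RPD is the sample standard deviation of the measured values divided by the RMSE; the text does not state which variants were used, so both choices are assumptions, and the sample values are hypothetical.

```python
import math


def r2_score(measured, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(measured) / len(measured)
    ss_res = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    ss_tot = sum((m - mean) ** 2 for m in measured)
    return 1.0 - ss_res / ss_tot


def rmse(measured, predicted):
    """Root mean square error."""
    squared = [(m - p) ** 2 for m, p in zip(measured, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def rpd(measured, predicted):
    """Ratio of performance to deviation: SD of measured values / RMSE."""
    n = len(measured)
    mean = sum(measured) / n
    # Assumption: sample standard deviation (n - 1 in the denominator).
    sd = math.sqrt(sum((m - mean) ** 2 for m in measured) / (n - 1))
    return sd / rmse(measured, predicted)


# Hypothetical measured and predicted values for a validation set.
measured = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
metrics = (r2_score(measured, predicted), rmse(measured, predicted), rpd(measured, predicted))
```

Because RPD scales the error by the spread of the measured values, a property with a narrow value range can have a small RMSE and still a low RPD.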
Models optimized by BOA generally achieved higher accuracies than those using default parameters, indicating that BOA-based hyperparameter tuning is effective in improving model accuracy. For example, the R² of the AdaBoost and GBDT models showed the largest improvements, at 172.89% and 172.63%, followed by the MLP (110.66%), XGBoost (79.13%), and PLSR (20.46%) models (Table 6), whereas the RF model showed the smallest improvement (1.82%). However, the R² of the XGBoost model for TN prediction declined by 5.17% after BOA optimization, likely because the large maximum depth selected by BOA produced overly complex decision trees and subsequent overfitting [54]. Similarly, the n_estimators and min_samples_split values of the RF model for pH prediction were higher after BOA processing, and accuracy decreased by 27.03%. These results suggest that BOA converged to local optima in these cases, whereas the default model parameters, which are more universally applicable, yielded better performance.
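The percentage changes reported in this section appear to be relative changes in R² with respect to the default-parameter model. A minimal sketch of that arithmetic is given below, using the SOC results as a worked example. The tuned R² values are those reported in the text; the default R² values are not quoted in the text and are back-calculated assumptions that happen to reproduce the reported percentages, so they should be checked against Table 6.

```python
def relative_change_percent(r2_default, r2_tuned):
    """Relative change of the tuned R2 with respect to the default R2, in percent."""
    if r2_default <= 0:
        raise ValueError("the default R2 must be positive for a meaningful percentage")
    return (r2_tuned - r2_default) / r2_default * 100.0


# Tuned R2 values for SOC as reported in the text.
soc_r2_tuned = {"AdaBoost": 0.43, "RF": 0.29, "GBDT": 0.44, "XGBoost": 0.46, "MLP": 0.41}
# Assumption: default R2 values back-calculated from the reported percentages.
soc_r2_default_assumed = {"AdaBoost": 0.28, "RF": 0.28, "GBDT": 0.21, "XGBoost": 0.15, "MLP": 0.09}

changes = {
    name: round(relative_change_percent(soc_r2_default_assumed[name], tuned), 2)
    for name, tuned in soc_r2_tuned.items()
}
```

A relative measure inflates improvements for models that start from a very low baseline, which is why a model with a modest final R² can still show an increase of several hundred percent.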
Figure 6 shows scatter plots of the measured values versus the values predicted by the best model for each soil property. The R² values on the validation dataset for pH, SOC, TN, and TP were 0.49, 0.46, 0.60, and 0.39, respectively. The predictions broadly follow the 1:1 line, indicating a general trend, but some discrepancies are observed. The models appear to perform reasonably well for SOC and TN, with more variation for pH and TP. Notably, the pH values are concentrated within a narrow range (approximately 4 to 5), which may contribute to the higher R² value for pH compared with SOC and TP. These results suggest that while the models show moderate accuracy for pH, SOC, and TN, their ability to capture the full range of variation for TP may be limited.

3.4. Mapping the Spatial Distribution of Soil Properties

The optimal ML model for each soil property was used to construct spatial distribution maps for Diaoluo mountain (Figure 7a) and Limu mountain (Figure 7b). The predicted soil pH values in most regions of both mountains were below 7, indicating predominantly weakly acidic soils. Higher pH values were predicted in the central, northeastern, and northwestern parts of Diaoluo mountain, while Limu mountain exhibited less drastic variation in soil pH, with higher values in the south than in the north.
Figure 7b shows that SOC and TN followed the same spatial trend in Limu mountain, with generally higher values in the northeast than in the southwest. Similarly, SOC and TN in the west of Diaoluo mountain were higher than in the east. The TP content in Limu mountain was generally higher than in Diaoluo mountain.

4. Discussion

4.1. Feature Selection and Importance Analysis

The results of this study showed that hyperspectral images and LiDAR data enabled indirect quantitative prediction of soil properties in tropical rainforests. Feature importance ranking serves as a crucial indicator for selecting features and interpreting models [42]. The feature selection results revealed that the number of features also plays an essential role: too few features cannot establish the complex relationships between the features and soil properties, while too many may lead to overfitting. The results further demonstrate that point cloud features substantially outperformed vegetation indices in estimating soil properties at the tropical rainforest study sites. Vegetation indices contributed minimally to the inversion, while features derived from point clouds consistently ranked among the five most important features for the various soil properties.
Spectral imaging struggles to directly detect surface soil information in regions with dense vegetation coverage, likely due to vegetation indices reaching saturation levels [55]. In contrast, LiDAR point clouds can capture structural features of vegetation canopies and terrain changes, offering an edge in describing the physical and chemical parameters of vegetation over vegetation indices derived from spectral images. While models built using only hyperspectral features registered lower accuracy, the integrated hyperspectral and LiDAR data achieved modest improvement in prediction performance compared to LiDAR alone. This finding underscores both the limitations of hyperspectral imaging in closed-canopy environments and the potential benefits of integrating hyperspectral and LiDAR data for more accurate soil property predictions.

4.2. Performance Analysis of ML Models and BOA Optimization

The present findings revealed that the ML models without BOA processing generally performed poorly, suggesting that default parameters are often unsuitable across different data types. In contrast, ML models optimized through BOA-based hyperparameter tuning exhibited higher prediction accuracy. The parameters optimized by BOA were more effective in capturing the complex relationships within the data for predicting specific soil properties, providing a valuable reference for improving the prediction accuracy of soil properties and reducing manual tuning costs [56].
In the experiment, BOA yielded the largest improvement for the pH inversion model, followed by SOC. During the optimization process, BOA may encounter local optima, depending on the parameter space and the choice of acquisition function [44]. In most cases, however, the parameters optimized by BOA enhanced the accuracy of the model. The PLSR model required the least optimization time but consistently underperformed compared to the other models, while the GBDT model took the longest to optimize but yielded better performance. Moreover, the non-linear ML models outperformed the linear PLSR model, suggesting a non-linear relationship between soil properties and feature variables. More complex ML models are better suited to identifying the intrinsic non-linear relationship between predictive features and soil properties [4,34].
In this study, the AdaBoost, GBDT, and XGBoost models clearly outperformed the RF and PLSR models, providing satisfactory estimation of soil properties. These three decision tree-based ensemble learning models capture complex relationships within the data by iteratively training decision trees and integrating the results of multiple weak learners, thereby progressively enhancing predictive performance [57]. This ensemble learning approach also allows the models to correct errors from previous iterations in each round, forming more robust and accurate predictions [58]. Hence, the effectiveness of AdaBoost, GBDT, and XGBoost highlights ensemble learning as a valuable method for estimating soil properties and improving prediction accuracy. In addition, the performance of the MLP model did not meet expectations, which may be due to the limited sample size of the dataset. In contrast, tree-based algorithms perform well even on small datasets and have relatively low data requirements.
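The error-correcting mechanism described above can be illustrated with a minimal gradient boosting sketch for squared-error loss, in which each round fits a one-split regression tree (a stump) to the residuals of the current ensemble. This is a didactic simplification under stated assumptions, not the configuration used in the study: it handles a single input feature, the data and function names are hypothetical, and AdaBoost and XGBoost differ in how they weight samples and regularize trees.

```python
def fit_stump(xs, residuals):
    """Fit the least-squares one-split regression stump to the residuals."""
    best = None
    # Candidate thresholds exclude the largest x so the right side is never empty.
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = sum((r - left_mean) ** 2 for r in left)
        sse += sum((r - right_mean) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    return best[1:]


def predict_stump(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean


def gradient_boost(xs, ys, n_rounds=20, learning_rate=0.3):
    """Train an additive ensemble of stumps; return (base, learning_rate, stumps)."""
    base = sum(ys) / len(ys)
    preds = [base] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        # Each round targets what the current ensemble still gets wrong.
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * predict_stump(stump, x) for p, x in zip(preds, xs)]
    return base, learning_rate, stumps


def predict(model, x):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(predict_stump(s, x) for s in stumps)


def mse(ys, preds):
    return sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)


# Hypothetical non-linear data: y = x^2 on eight samples.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [x * x for x in xs]
model = gradient_boost(xs, ys)
baseline_mse = mse(ys, [sum(ys) / len(ys)] * len(ys))
boosted_mse = mse(ys, [predict(model, x) for x in xs])
```

Each stump alone is a weak learner, yet the training error of the summed ensemble falls with every round, which is the behaviour the paragraph attributes to the boosting models.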

4.3. The Prospect and Limitations of the Proposed Method

Data quality is central to the development of ML models, given its significant impact on their predictive accuracy, robustness, and generalization [59]. A well-structured dataset that covers substantial data variation and effectively describes the variability of new datasets frequently enhances the performance of ML models. This study found that the ML models exhibited comparable predictive performance across different properties, with the highest accuracy achieved in predicting TN content, followed by pH and SOC, while TP prediction was the least accurate. This may be attributed to TP having the highest CV, leading to higher sensitivity to noise and outliers. Conversely, the good predictive performance for TN may be linked to its extremely low SD, indicating relatively stable data that facilitate the model’s ability to capture the expected patterns.
Furthermore, the spatial maps of soil properties indicate that high SOC and TN values were primarily distributed at higher altitudes. This aligns with previous studies that demonstrated positive correlations between elevation and SOC and TN levels [42]. Most soils in the Diaoluo and Limu mountain regions were weakly acidic, possibly because the high temperature and humidity in tropical rainforest areas promote organic matter degradation, generating more acidic inorganic substances [60]. In addition, regions with high TN and TP content were mainly acidic, which may be attributed to higher nitrate and phosphate levels, further supporting the reliability of the spatial distribution observed in the inversion results.
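The two dispersion statistics referred to above can be computed as in the following minimal sketch. It assumes that CV is the sample standard deviation divided by the mean, expressed as a percentage; the sample values are hypothetical and are not the measured soil data.

```python
import statistics


def coefficient_of_variation(values):
    """CV in percent: sample standard deviation divided by the mean."""
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("CV is undefined for a zero mean")
    return statistics.stdev(values) / mean * 100.0


# Hypothetical samples: one tightly clustered property, one widely scattered.
stable_property = [1.00, 1.10, 0.90, 1.00, 1.05]
variable_property = [0.20, 0.90, 0.10, 1.50, 0.40]

cv_stable = coefficient_of_variation(stable_property)
cv_variable = coefficient_of_variation(variable_property)
```

Because CV is unitless, it allows dispersion to be compared across properties measured on different scales, whereas SD alone depends on the unit and magnitude of each property.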
Uncertainties in the predicted results of the constructed model may occur due to data quality issues resulting from noise, instrument limitations, collection errors, or an insufficient number of features selected during model training. In particular, the use of features selected by XGBoost could affect the accuracy of other models, as XGBoost-based feature selection may not be optimal for all ML algorithms. Additionally, the uneven spatial distribution of soil samples, particularly in Limu mountain, may have introduced further uncertainty in the predictions. It is worth noting that although Limu mountain and Diaoluo mountain are spatially separated, they share similar ecological conditions as part of the tropical rainforest biome on Hainan Island. Hence, samples from both forests were combined during model construction to capture the overall trends and variations in soil properties across this region.
However, the measured data distribution between the two mountains is not entirely similar (Figure 4). For instance, soil TN and TP values exhibited significant spatial variability, likely due to substantial spatial variability in the soil-forming environment. Overall, the soil TP content in Limu mountain is significantly higher compared to Diaoluo mountain, while its soil TN content is generally lower. In future studies, combining predictions from multiple models, assigning weights based on model accuracy, and comparing training data across different models using various model-agnostic feature selection processes could mitigate these uncertainties.
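One possible form of the accuracy-weighted model combination suggested above is sketched below. The weighting scheme (weights proportional to validation R²) is an assumption chosen for illustration, not a method evaluated in this study; the R² values are those reported for TN, while the per-model predictions are hypothetical.

```python
def accuracy_weights(r2_by_model):
    """Weights proportional to each model's validation R2, summing to 1."""
    if any(r2 <= 0 for r2 in r2_by_model.values()):
        raise ValueError("this simple scheme requires positive R2 values")
    total = sum(r2_by_model.values())
    return {name: r2 / total for name, r2 in r2_by_model.items()}


def weighted_prediction(preds_by_model, weights):
    """Blend per-model predictions for one sample using the given weights."""
    return sum(weights[name] * pred for name, pred in preds_by_model.items())


# Validation R2 values reported for TN in this study.
tn_r2 = {"GBDT": 0.60, "RF": 0.58, "XGBoost": 0.58}
weights = accuracy_weights(tn_r2)

# Hypothetical TN predictions from each model at a single map location.
hypothetical_preds = {"GBDT": 1.20, "RF": 1.10, "XGBoost": 1.30}
blended = weighted_prediction(hypothetical_preds, weights)
```

With R² values this close, the weights are nearly equal and the blend approaches a simple average; the scheme only matters when model accuracies differ substantially.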

5. Conclusions

This study compared the modeling performances of six ML algorithms (PLSR, RF, AdaBoost, GBDT, XGBoost, and MLP) in predicting soil properties (pH, SOC, TN, and TP) from two tropical rainforest regions of the Diaoluo and Limu mountains using integrated UAV hyperspectral data and LiDAR point clouds. The XGBoost algorithm selected 15 features from the combined 175 vegetation indices, texture features, and LiDAR-based features for the inversion model of different soil properties. The model parameters were also optimized using BOA, and the performance was compared before and after optimization. In conclusion, the integration of UAV-based hyperspectral images with LiDAR point clouds offered a promising approach for indirectly estimating soil properties in tropical rainforest areas. In addition, the feature importance ranking revealed the varying roles of remote sensing features in predicting multivariate soil properties, highlighting the ability of combined hyperspectral data and LiDAR point clouds to capture canopy structure and terrain changes. Furthermore, LiDAR-derived point cloud features outperformed vegetation indices, enhancing inversion accuracy by effectively characterizing vegetation growth conditions and terrain changes. The BOA effectively tuned the model hyperparameters by considering the complex interaction between predictive features and soil properties, enhancing the accuracy of the prediction models compared to models with default parameters. Overall, these findings offer valuable insights for future research on soil property mapping in tropical rainforests. Future research is expected to extend this proposed method to other tropical rainforest regions by expanding the initial feature set.

Supplementary Materials

The following supporting information can be downloaded at https://www.mdpi.com/article/10.3390/f15122222/s1, including code files and data.

Author Contributions

Conceptualization, Y.C. and T.S.; Methodology, Y.C. and Q.L.; Software, T.S.; Validation, C.Y.; Formal analysis, C.Y. and X.P.; Investigation, X.P.; Resources, Z.W.; Data curation, Z.C.; Writing—original draft, Q.L.; Writing—review & editing, Z.C. All authors have read and agreed to the published version of the manuscript.

Funding

This work was financially supported by the Hainan Provincial Science and Technology Program (ZDKJ202008) and the Peng Cheng Laboratory Research Project (PCL2023AS6-1).

Data Availability Statement

The data generated and analyzed during this study are available from the corresponding author on request.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Roelofsen, H.D.; van Bodegom, P.M.; Kooistra, L.; van Amerongen, J.J.; Witte, J.-P.M. An Evaluation of Remote Sensing Derived Soil pH and Average Spring Groundwater Table for Ecological Assessments. Int. J. Appl. Earth Obs. Geoinf. 2015, 43, 149–159.
  2. Chen, S.; Lin, B.; Li, Y.; Zhou, S. Spatial and Temporal Changes of Soil Properties and Soil Fertility Evaluation in a Large Grain-Production Area of Subtropical Plain, China. Geoderma 2020, 357, 113937.
  3. Morvan, X.; Saby, N.P.A.; Arrouays, D.; Le Bas, C.; Jones, R.J.A.; Verheijen, F.G.A.; Bellamy, P.H.; Stephens, M.; Kibblewhite, M.G. Soil Monitoring in Europe: A Review of Existing Systems and Requirements for Harmonisation. Sci. Total Environ. 2008, 391, 1–12.
  4. Forkuor, G.; Hounkpatin, O.K.L.; Welp, G.; Thiel, M. High Resolution Mapping of Soil Properties Using Remote Sensing Variables in South-Western Burkina Faso: A Comparison of Machine Learning and Multiple Linear Regression Models. PLoS ONE 2017, 12, e0170478.
  5. Aasen, H.; Honkavaara, E.; Lucieer, A.; Zarco-Tejada, P.J. Quantitative Remote Sensing at Ultra-High Resolution with UAV Spectroscopy: A Review of Sensor Technology, Measurement Procedures, and Data Correction Workflows. Remote Sens. 2018, 10, 1091.
  6. Shi, T.; Cui, L.; Wang, J.; Fei, T.; Chen, Y.; Wu, G. Comparison of Multivariate Methods for Estimating Soil Total Nitrogen with Visible/near-Infrared Spectroscopy. Plant Soil 2013, 366, 363–375.
  7. Morellos, A.; Pantazi, X.-E.; Moshou, D.; Alexandridis, T.; Whetton, R.; Tziotzios, G.; Wiebensohn, J.; Bill, R.; Mouazen, A.M. Machine Learning Based Prediction of Soil Total Nitrogen, Organic Carbon and Moisture Content by Using VIS-NIR Spectroscopy. Biosyst. Eng. 2016, 152, 104–116.
  8. Gulhane, V.; Rode, S.; Pande, C. Wavelet for Predicting Soil Nutrients Using Remotely Sensed Satellite Images. Int. J. Comput. Appl. 2017, 174, 35–38.
  9. Chen, Z.; Chen, Y.; Shi, T.; Chen, X.; Pan, X.; Lei, J.; Wu, T.; Li, Y.; Liu, Q.; Liu, X. Estimation of Soil Organic Carbon in Tropical Rainforest Regions by Combining Uav Hyperspectral and Lidar Data. SSRN 2023.
  10. Chlus, A.; Townsend, P.A. Characterizing Seasonal Variation in Foliar Biochemistry with Airborne Imaging Spectroscopy. Remote Sens. Environ. 2022, 275, 113023.
  11. Shen, X.; Cao, L.; Coops, N.C.; Fan, H.; Wu, X.; Liu, H.; Wang, G.; Cao, F. Quantifying Vertical Profiles of Biochemical Traits for Forest Plantation Species Using Advanced Remote Sensing Approaches. Remote Sens. Environ. 2020, 250, 112041.
  12. Santillano Cázares, J.; Roque Díaz, L.G.; Núñez Ramírez, F.; Grijalva Contreras, R.L.; Robles Contreras, F.; Macías Duarte, R.; Escobosa García, I.; Cárdenas Salazar, V. Soil Fertility Affects the Growth, Nutrition and Yield of Cotton Cultivated in Two Irrigation Systems and Different Nitrogen Rates. Terra Latinoam. 2019, 37, 7–14.
  13. John, R.; Dalling, J.W.; Harms, K.E.; Yavitt, J.B.; Stallard, R.F.; Mirabello, M.; Hubbell, S.P.; Valencia, R.; Navarrete, H.; Vallejo, M.; et al. Soil Nutrients Influence Spatial Distributions of Tropical Tree Species. Proc. Natl. Acad. Sci. USA 2007, 104, 864–869.
  14. Khaleghi, B.; Khamis, A.; Karray, F.O.; Razavi, S.N. Multisensor Data Fusion: A Review of the State-of-the-Art. Inf. Fusion 2013, 14, 28–44.
  15. Vaglio Laurin, G.; Chen, Q.; Lindsell, J.A.; Coomes, D.A.; Frate, F.D.; Guerriero, L.; Pirotti, F.; Valentini, R. Above Ground Biomass Estimation in an African Tropical Forest with Lidar and Hyperspectral Data. ISPRS J. Photogramm. Remote Sens. 2014, 89, 49–58.
  16. Shen, Z.; Miao, J.; Wang, J.; Zhao, D.; Tang, A.; Zhen, J. Evaluating Feature Selection Methods and Machine Learning Algorithms for Mapping Mangrove Forests Using Optical and Synthetic Aperture Radar Data. Remote Sens. 2023, 15, 5621.
  17. Xi, Z.; Xu, H.; Xing, Y.; Gong, W.; Chen, G.; Yang, S. Forest Canopy Height Mapping by Synergizing ICESat-2, Sentinel-1, Sentinel-2 and Topographic Information Based on Machine Learning Methods. Remote Sens. 2022, 14, 364.
  18. Campbell, M.J.; Dennison, P.E.; Kerr, K.L.; Brewer, S.C.; Anderegg, W.R.L. Scaled Biomass Estimation in Woodland Ecosystems: Testing the Individual and Combined Capacities of Satellite Multispectral and Lidar Data. Remote Sens. Environ. 2021, 262, 112511.
  19. Qin, S.; Nie, S.; Guan, Y.; Zhang, D.; Wang, C.; Zhang, X. Forest Emissions Reduction Assessment Using Airborne LiDAR for Biomass Estimation. Resour. Conserv. Recycl. 2022, 181, 106224.
  20. Wu, X.; Shen, X.; Zhang, Z.; Cao, F.; She, G.; Cao, L. An Advanced Framework for Multi-Scale Forest Structural Parameter Estimations Based on UAS-LiDAR and Sentinel-2 Satellite Imagery in Forest Plantations of Northern China. Remote Sens. 2022, 14, 3023.
  21. Gao, L.; Chai, G.; Zhang, X. Above-Ground Biomass Estimation of Plantation with Different Tree Species Using Airborne LiDAR and Hyperspectral Data. Remote Sens. 2022, 14, 2568.
  22. Majasalmi, T.; Rautiainen, M. The Impact of Tree Canopy Structure on Understory Variation in a Boreal Forest. For. Ecol. Manag. 2020, 466, 118100.
  23. Chen, B.; Zheng, H.; Luo, G.; Chen, C.; Bao, A.; Liu, T.; Chen, X. Adaptive Estimation of Multi-Regional Soil Salinization Using Extreme Gradient Boosting with Bayesian TPE Optimization. Int. J. Remote Sens. 2022, 43, 778–811.
  24. Peng, J.; Biswas, A.; Jiang, Q.; Zhao, R.; Hu, J.; Hu, B.; Shi, Z. Estimating Soil Salinity from Remote Sensing and Terrain Data in Southern Xinjiang Province, China. Geoderma 2019, 337, 1309–1319.
  25. Shi, T.; Guo, L.; Chen, Y.; Wang, W.; Shi, Z.; Li, Q.; Wu, G. Proximal and Remote Sensing Techniques for Mapping of Soil Contamination with Heavy Metals. Appl. Spectrosc. Rev. 2018, 53, 783–805.
  26. Zhou, T.; Geng, Y.; Chen, J.; Pan, J.; Haase, D.; Lausch, A. High-Resolution Digital Mapping of Soil Organic Carbon and Soil Total Nitrogen Using DEM Derivatives, Sentinel-1 and Sentinel-2 Data Based on Machine Learning Algorithms. Sci. Total Environ. 2020, 729, 138244.
  27. Zhang, Y.; Sui, B.; Shen, H.; Ouyang, L. Mapping Stocks of Soil Total Nitrogen Using Remote Sensing Data: A Comparison of Random Forest Models with Different Predictors. Comput. Electron. Agric. 2019, 160, 23–30.
  28. Jiang, H.; Rusuli, Y.; Amuti, T.; He, Q. Quantitative Assessment of Soil Salinity Using Multi-Source Remote Sensing Data Based on the Support Vector Machine and Artificial Neural Network. Int. J. Remote Sens. 2019, 40, 284–306.
  29. Zhang, Y.; Liang, S.; Zhu, Z.; Ma, H.; He, T. Soil Moisture Content Retrieval from Landsat 8 Data Using Ensemble Learning. ISPRS J. Photogramm. Remote Sens. 2022, 185, 32–47.
  30. Swapna, B.; Manivannan, S.; Kamalahasan, M. Prognostic of Soil Nutrients and Soil Fertility Index Using Machine Learning Classifier Techniques. Int. J. E-Collab. 2022, 18, 14.
  31. Jeong, G.; Oeverdieck, H.; Park, S.J.; Huwe, B.; Ließ, M. Spatial Soil Nutrients Prediction Using Three Supervised Learning Methods for Assessment of Land Potentials in Complex Terrain. CATENA 2017, 154, 73–84.
  32. Putatunda, S.; Rama, K. A Comparative Analysis of Hyperopt as Against Other Approaches for Hyper-Parameter Optimization of XGBoost. In Proceedings of the 2018 International Conference on Signal Processing and Machine Learning, Shanghai, China, 28 November 2018; pp. 6–10.
  33. Shahriari, B.; Swersky, K.; Wang, Z.; Adams, R.P.; de Freitas, N. Taking the Human Out of the Loop: A Review of Bayesian Optimization. Proc. IEEE 2016, 104, 148–175.
  34. Wang, L.; Hu, P.; Zheng, H.; Liu, Y.; Cao, X.; Hellwich, O.; Liu, T.; Luo, G.; Bao, A.; Chen, X. Integrative Modeling of Heterogeneous Soil Salinity Using Sparse Ground Samples and Remote Sensing Images. Geoderma 2023, 430, 116321.
  35. Xu, S.; Zhao, Y.; Wang, M.; Shi, X. Comparison of Multivariate Methods for Estimating Selected Soil Properties from Intact Soil Cores of Paddy Fields by Vis–NIR Spectroscopy. Geoderma 2018, 310, 29–43.
  36. Das, P.; Paul, S.; Bhattacharya, S.S.; Nath, P. Smartphone-Based Spectrometric Analyzer for Accurate Estimation of pH Value in Soil. IEEE Sens. J. 2021, 21, 2839–2845.
  37. Support for Matrice 600 Pro. Available online: https://www.dji.com/support/product/matrice600-pro (accessed on 5 December 2024).
  38. Vasques, G.M.; Grunwald, S.; Sickman, J.O. Comparison of Multivariate Methods for Inferential Modeling of Soil Carbon Using Visible/near-Infrared Spectra. Geoderma 2008, 146, 14–25.
  39. Tillack, A.; Clasen, A.; Kleinschmit, B.; Förster, M. Estimation of the Seasonal Leaf Area Index in an Alluvial Forest Using High-Resolution Satellite-Based Vegetation Indices. Remote Sens. Environ. 2014, 141, 52–63.
  40. Gitelson, A.A.; Gritz, Y.; Merzlyak, M.N. Relationships between Leaf Chlorophyll Content and Spectral Reflectance and Algorithms for Non-Destructive Chlorophyll Assessment in Higher Plant Leaves. J. Plant Physiol. 2003, 160, 271–282.
  41. Yang, K.; Gong, Y.; Fang, S.; Duan, B.; Yuan, N.; Peng, Y.; Wu, X.; Zhu, R. Combining Spectral and Texture Features of UAV Images for the Remote Estimation of Rice LAI throughout the Entire Growing Season. Remote Sens. 2021, 13, 3001.
  42. Emadi, M.; Taghizadeh-Mehrjardi, R.; Cherati, A.; Danesh, M.; Mosavi, A.; Scholten, T. Predicting and Mapping of Soil Organic Carbon Using Machine Learning Algorithms in Northern Iran. Remote Sens. 2020, 12, 2234.
  43. Chen, T.; Guestrin, C. XGBoost: A Scalable Tree Boosting System. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 785–794.
  44. Snoek, J.; Larochelle, H.; Adams, R.P. Practical Bayesian Optimization of Machine Learning Algorithms. In Proceedings of the Advances in Neural Information Processing Systems; Curran Associates, Inc.: San Francisco, CA, USA, 2012; Volume 25.
  45. Wold, S.; Sjöström, M.; Eriksson, L. PLS-Regression: A Basic Tool of Chemometrics. Chemom. Intell. Lab. Syst. 2001, 58, 109–130.
  46. Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5–32.
  47. Freund, Y.; Schapire, R.E. A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting. J. Comput. Syst. Sci. 1997, 55, 119–139.
  48. Friedman, J.H. Greedy Function Approximation: A Gradient Boosting Machine. Ann. Stat. 2001, 29, 1189–1232.
  49. Lin, M.; Zhu, X.; Hua, T.; Tang, X.; Tu, G.; Chen, X. Detection of Ionospheric Scintillation Based on XGBoost Model Improved by SMOTE-ENN Technique. Remote Sens. 2021, 13, 2577.
  50. Dam Nguyen, D.; Roussis, P.C.; Thai Pham, B.; Ferentinou, M.; Mamou, A.; Quang Vu, D.; Thi Bui, Q.-A.; Kien Trong, D.; Asteris, P.G. Bagging and Multilayer Perceptron Hybrid Intelligence Models Predicting the Swelling Potential of Soil. Transp. Geotech. 2022, 36, 100797.
  51. Chen, S.; Xu, H.; Xu, D.; Ji, W.; Li, S.; Yang, M.; Hu, B.; Zhou, Y.; Wang, N.; Arrouays, D.; et al. Evaluating Validation Strategies on the Performance of Soil Property Prediction from Regional to Continental Spectral Data. Geoderma 2021, 400, 115159.
  52. Jain, A.; Zongker, D. Feature Selection: Evaluation, Application, and Small Sample Performance. IEEE Trans. Pattern Anal. Mach. Intell. 1997, 19, 153–158.
  53. Zhang, W.; Wu, C.; Zhong, H.; Li, Y.; Wang, L. Prediction of Undrained Shear Strength Using Extreme Gradient Boosting and Random Forest Based on Bayesian Optimization. Geosci. Front. 2021, 12, 469–477.
  54. Huang, X.; Liu, W.; Guo, Q.; Tan, J. Prediction Method for the Dynamic Response of Expressway Lateritic Soil Subgrades on the Basis of Bayesian Optimization CatBoost. Soil Dyn. Earthq. Eng. 2024, 186, 108943.
  55. Gao, S.; Zhong, R.; Yan, K.; Ma, X.; Chen, X.; Pu, J.; Gao, S.; Qi, J.; Yin, G.; Myneni, R.B. Evaluating the Saturation Effect of Vegetation Indices in Forests Using 3D Radiative Transfer Simulations and Satellite Observations. Remote Sens. Environ. 2023, 295, 113665.
  56. Lee, S.; Bae, J.H.; Hong, J.; Yang, D.; Panagos, P.; Borrelli, P.; Yang, J.E.; Kim, J.; Lim, K.J. Estimation of Rainfall Erosivity Factor in Italy and Switzerland Using Bayesian Optimization Based Machine Learning Models. CATENA 2022, 211, 105957.
  57. Tajik, S.; Ayoubi, S.; Zeraatpisheh, M. Digital Mapping of Soil Organic Carbon Using Ensemble Learning Model in Mollisols of Hyrcanian Forests, Northern Iran. Geoderma Reg. 2020, 20, e00256.
  58. R, S.; Ayachit, S.S.; Patil, V.; Singh, A. Competitive Analysis of the Top Gradient Boosting Machine Learning Algorithms. In Proceedings of the 2020 2nd International Conference on Advances in Computing, Communication Control and Networking (ICACCCN), Greater Noida, India, 18–19 December 2020; pp. 191–196.
  59. Jain, A.; Patel, H.; Nagalapatti, L.; Gupta, N.; Mehta, S.; Guttula, S.; Mujumdar, S.; Afzal, S.; Sharma Mittal, R.; Munigala, V. Overview and Importance of Data Quality for Machine Learning Tasks. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, New York, NY, USA, 6–10 July 2020; pp. 3561–3562.
  60. Sierra, C.A.; Trumbore, S.E.; Davidson, E.A.; Vicca, S.; Janssens, I. Sensitivity of Decomposition Rates of Soil Organic Matter with Respect to Simultaneous Changes in Temperature and Moisture. J. Adv. Model. Earth Syst. 2015, 7, 335–356.
Figure 1. Workflow of the soil property mapping method in tropical rainforest regions.
Figure 2. (a) Geographic location of Hainan Province, China; spatial distribution of soil samples in (b) Diaoluo, and (c) Limu mountain.
Figure 3. Top and side 3D view of LiDAR point cloud of (a) Diaoluo and (b) Limu mountains.
Figure 4. Comparison of soil properties between samples from Diaoluo and Limu mountains: (a) pH; (b) soil organic carbon (SOC); (c) total nitrogen (TN); and (d) total phosphorus (TP). Dashed lines represent the mean value.
Figure 5. Importance ranking of the 15 selected features for predicting the (a) pH, (b) soil organic carbon (SOC), (c) total nitrogen (TN), and (d) total phosphorus (TP).
Figure 6. Scatter plots of the measured values against soil property levels predicted by the optimal models: (a) pH predicted by the GBDT model; (b) soil organic carbon (SOC) predicted by the XGBoost model; (c) total nitrogen (TN) predicted by the GBDT model; (d) total phosphorus (TP) predicted by the XGBoost model.
Figure 7. Spatial distributions of the soil properties, including pH, soil organic carbon (SOC), total nitrogen (TN), and total phosphorus (TP), in (a) Diaoluo and (b) Limu mountains.
Table 1. The calculation formulas of vegetation indices and texture features derived from UAV hyperspectral images.
| Variable name | Formula | Variable name | Formula |
| --- | --- | --- | --- |
| anthocyanin reflectance index 1 | $ARI1 = 1/\rho_{550} - 1/\rho_{700}$ | structure insensitive pigment index | $SIPI = (\rho_{800} - \rho_{445})/(\rho_{800} - \rho_{680})$ |
| anthocyanin reflectance index 2 | $ARI2 = \rho_{800} \times (1/\rho_{550} - 1/\rho_{700})$ | sum greenness index | SGI is the average reflectance in the 500–600 nm portion of the spectrum. |
| carotenoid reflectance index 1 | $CRI1 = 1/\rho_{510} - 1/\rho_{550}$ | transformed chlorophyll absorption reflectance index | $TCARI = 3[(\rho_{700} - \rho_{670}) - 0.2(\rho_{700} - \rho_{550})(\rho_{700}/\rho_{670})]$ |
| carotenoid reflectance index 2 | $CRI2 = 1/\rho_{510} - 1/\rho_{700}$ | transformed difference vegetation index | $TDVI = \sqrt{0.5 + (NIR - Red)/(NIR + Red)}$ |
| difference vegetation index | $DVI = NIR - Red$ | triangular greenness index | $TGI = [(\lambda_{Red} - \lambda_{Blue})(Red - Green) - (\lambda_{Red} - \lambda_{Green})(Red - Blue)]/2$ |
| enhanced vegetation index | $EVI = 2.5 \times \frac{NIR - Red}{NIR + 6 \times Red - 7.5 \times Blue + 1}$ | triangular vegetation index | $TVI = [120(\rho_{750} - \rho_{550}) - 200(\rho_{670} - \rho_{550})]/2$ |
| global environment monitoring index | $GEMI = eta(1 - 0.25 \times eta) - (Red - 0.125)/(1 - Red)$, $eta = \frac{2(NIR^{2} - Red^{2}) + 1.5 \times NIR + 0.5 \times Red}{NIR + Red + 0.5}$ | visible atmospherically resistant index | $VARI = (Green - Red)/(Green + Red - Blue)$ |
| green atmospherically resistant index | $GARI = \frac{NIR - [Green - \gamma(Blue - Red)]}{NIR + [Green - \gamma(Blue - Red)]}$, $\gamma = 1.7$ | Vogelmann red edge index 1 | $VREI1 = \rho_{740}/\rho_{720}$ |
| green chlorophyll index | $GCI = NIR/Green - 1$ | Vogelmann red edge index 2 | $VREI2 = (\rho_{734} - \rho_{747})/(\rho_{715} + \rho_{726})$ |
| green difference vegetation index | $GDVI = NIR - Green$ | water band index | $WBI = \rho_{970}/\rho_{900}$ |
| green leaf index | $GLI = \frac{(Green - Red) + (Green - Blue)}{2 \times Green + Red + Blue}$ | energy | $E = \sum_{i=1}^{N}\sum_{j=1}^{N} P_{ij}^{2}$ |
| green normalized difference vegetation index | $GNDVI = (NIR - Green)/(NIR + Green)$ | entropy | $H = -\sum_{i=1}^{N}\sum_{j=1}^{N} P_{ij} \lg P_{ij}$ |
| green optimized soil-adjusted vegetation index | $GOSAVI = (NIR - Green)/(NIR + Green + 0.16)$ | correlation | $C = \left(\sum_{i,j} (i - \bar{i})(j - \bar{j}) P_{ij}\right)/(\sigma_{i}\sigma_{j})$ |
| green ratio vegetation index | $GRVI = NIR/Green$ | inverse difference moment | $IDM = \sum_{i,j} P_{ij}/[1 + (i - j)^{2}]$ |
| green soil-adjusted vegetation index | $GSAVI = 1.5 \times (NIR - Green)/(NIR + Green + 0.5)$ | inertia | $I = \sum_{i=1}^{N}\sum_{j=1}^{N} (i - j)^{2} P_{ij}$ |
| infrared percentage vegetation index | $IPVI = NIR/(NIR + Red)$ | cluster shade | $CS = \sum_{i=1}^{N}\sum_{j=1}^{N} (i + j - \bar{i} - \bar{j})^{3} P_{ij}$ |
| leaf area index | $LAI = 3.618 \times EVI - 0.118$ | cluster prominence | $CP = \sum_{i=1}^{N}\sum_{j=1}^{N} (i + j - \bar{i} - \bar{j})^{4} P_{ij}$ |
| modified chlorophyll absorption ratio index | $MCARI = [(\rho_{700} - \rho_{670}) - 0.2 \times (\rho_{700} - \rho_{550})] \times (\rho_{700}/\rho_{670})$ | Haralick correlation | $HC = \sum_{i,j} (i - \bar{i})(j - \bar{j}) P_{ij}/(\sigma_{i}\sigma_{j})$ |
| modified chlorophyll absorption ratio index (modified) | $MCARI2 = \frac{1.5[2.5(\rho_{800} - \rho_{670}) - 1.3(\rho_{800} - \rho_{550})]}{\sqrt{(2 \times \rho_{800} + 1)^{2} - (6 \times \rho_{800} - 5 \times \sqrt{\rho_{670}}) - 0.5}}$ | mean | $M = \sum_{i,j=1}^{N} P_{ij}/N$ |
| modified non-linear index | $MNLI = \frac{(NIR^{2} - Red) \times (1 + L)}{NIR^{2} + Red + L}$, $L = 0.5$ | variance | $D = \sum_{i=1,\,j=0}^{N-1} (P_{ij} - \mu)$ |
| modified red edge normalized difference vegetation index | $MRENDVI = (\rho_{750} - \rho_{705})/(\rho_{750} + \rho_{705} - 2 \times \rho_{445})$ | dissimilarity | $H = \sum_{i=1,\,j=0}^{N-1} P_{ij} \lvert i - j \rvert$ |
| modified red edge ratio index | $MRESR = (\rho_{750} - \rho_{445})/(\rho_{750} + \rho_{445})$ | sum average | $SA = \sum_{k=2}^{2N} k \times P_{sum}(k)$ |
| modified simple ratio | $MSR = (NIR/Red - 1)/(\sqrt{NIR/Red} + 1)$ | sum variance | $SV = \sum_{k=2}^{2N} (k - SA)^{2} \times P_{sum}(k)$ |
| modified soil-adjusted vegetation index 2 | $MSAVI2 = \frac{2 \times NIR + 1 - \sqrt{(2 \times NIR + 1)^{2} - 8(NIR - Red)}}{2}$ | sum entropy | $SE = -\sum_{k=2}^{2N} P_{sum}(k) \times \log(P_{sum}(k))$ |
| modified triangular vegetation index | $MTVI = 1.2 \times [1.2 \times (\rho_{800} - \rho_{550}) - 2.5 \times (\rho_{670} - \rho_{550})]$ | difference of entropies | $DOF = \sum_{i=1}^{N}\sum_{j=1}^{N} P_{ij} \times \log(P_{ij}) - SE$ |
| modified triangular vegetation index 2 | $MTVI2 = \frac{1.5[1.2(\rho_{800} - \rho_{550}) - 1.3(\rho_{670} - \rho_{550})]}{\sqrt{(2 \times \rho_{800} + 1)^{2} - (6 \times \rho_{800} - 5 \times \sqrt{\rho_{670}}) - 0.5}}$ | difference of variances | $DOV = \sum_{i=1}^{N}\sum_{j=1}^{N} (i - j)^{2} P_{ij} - SV$ |
| non-linear index | $NLI = (NIR^{2} - Red)/(NIR^{2} + Red)$ | information measures of correlation IC1 | $IMC_{IC1} = \sum_{i=1}^{N}\sum_{j=1}^{N} [P_{ij} \times \log P_{ij}]/[1 + (i - j)^{2}]$ |
| normalized difference mud index | $NDMI = (\rho_{795} - \rho_{990})/(\rho_{795} + \rho_{990})$ | information measures of correlation IC2 | $IMC_{IC2} = \sum_{i=1}^{N}\sum_{j=1}^{N} [P_{ij} \times \log(1 + (i - \mu)^{2})]/[1 + (i - j)^{2}]$ |
| normalized difference vegetation index | $NDVI = (NIR - Red)/(NIR + Red)$ | short run emphasis | $SRE = \sum_{i=1}^{N}\sum_{j=1}^{N} P_{ij}/[1 + (i - j)^{2}]$ |
| optimized soil-adjusted vegetation index | $OSAVI = (NIR - Red)/(NIR + Red + 0.16)$ | long run emphasis | $LRE = \sum_{i=1}^{N}\sum_{j=1}^{N} P_{ij} \times (i - j)^{2}/[1 + (i - j)^{2}]$ |
| photochemical reflectance index | $PRI = (\rho_{531} - \rho_{570})/(\rho_{531} + \rho_{570})$ | gray-level nonuniformity | $GLN = \sum_{i=1}^{N}\left(\sum_{j=1}^{N} P_{ij}\right)^{2}$ |
| plant senescence reflectance index | $PSRI = (\rho_{680} - \rho_{500})/\rho_{750}$ | run length nonuniformity | $RLN = \sum_{Run}\left(\sum_{j=1}^{N} P_{ij}\right)^{2}$ |
| red edge normalized difference vegetation index | $RENDVI = (\rho_{750} - \rho_{705})/(\rho_{750} + \rho_{705})$ | run percentage | $RP = \sum_{Run}\left(\sum_{j=1}^{N} P_{ij}\right)^{2}/N^{2}$ |
| red edge position index | The wavelength of the maximum reflectance derivative in the vegetation red edge region of the spectrum (690 to 740 nm). | low gray-level run emphasis | $LGLRE = \sum_{i=1}^{N_{g}}\sum_{j=1}^{N_{s}} \frac{P_{ij}}{i^{2}}/N^{2}$ |
| red-green ratio index | $RGRI = \sum_{i=600}^{699} R_{i}/\sum_{j=500}^{599} R_{j}$ | high gray-level run emphasis | $HGLRE = \sum_{i=1}^{N_{g}}\left(\sum_{j=1}^{N_{s}} P_{ij} \times i^{2}\right)/N^{2}$ |
| renormalized difference vegetation index | $RDVI = (NIR - Red)/\sqrt{NIR + Red}$ | short run low gray-level emphasis | $SRLGLE = \sum_{i=1}^{N_{g}}\sum_{j=1}^{N_{s}} P_{ij}/(i^{2} j^{2})$ |
| simple ratio index | $SRI = NIR/Red$ | short run high gray-level emphasis | $SRHGLE = \sum_{i=1}^{N_{g}}\sum_{j=1}^{N_{s}} P_{ij} \times i^{2}$ |
| soil-adjusted vegetation index | $SAVI = 1.5(NIR - Red)/(NIR + Red + 0.5)$ | long run low gray-level emphasis | $LRLGLE = \sum_{i=1}^{N_{g}}\sum_{j=1}^{N_{l}} P_{ij}/i^{2} j^{2}$ |

Note: $\rho_{x}$, reflectance at wavelength x nm; $\lambda$, center wavelength of the named band.
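As an illustration of how a few of the Table 1 variables can be computed, the following minimal Python sketch evaluates NDVI, SAVI, and EVI from band reflectances and derives the energy and entropy texture features from a gray-level co-occurrence matrix. The function names, the horizontal-neighbor offset, the base-10 logarithm, and the sample values are assumptions made for this example and are not taken from the study's processing chain.

```python
import math


def ndvi(nir, red):
    """Normalized difference vegetation index: (NIR - Red) / (NIR + Red)."""
    return (nir - red) / (nir + red)


def savi(nir, red, soil_factor=0.5):
    """Soil-adjusted vegetation index; soil_factor = 0.5 gives the 1.5 multiplier."""
    return (1 + soil_factor) * (nir - red) / (nir + red + soil_factor)


def evi(nir, red, blue):
    """Enhanced vegetation index with the coefficients listed in Table 1."""
    return 2.5 * (nir - red) / (nir + 6 * red - 7.5 * blue + 1)


def glcm_horizontal(image, levels):
    """Normalized co-occurrence matrix of (pixel, right-hand neighbor) pairs.

    `image` is a list of rows holding integer gray levels in [0, levels).
    The horizontal offset is an assumption of this sketch.
    """
    counts = [[0] * levels for _ in range(levels)]
    total = 0
    for row in image:
        for left, right in zip(row, row[1:]):
            counts[left][right] += 1
            total += 1
    return [[c / total for c in r] for r in counts]


def energy(p):
    """Sum of squared co-occurrence probabilities."""
    return sum(v * v for row in p for v in row)


def entropy(p):
    """Negative sum of P * log10(P); zero entries are skipped."""
    return -sum(v * math.log10(v) for row in p for v in row if v > 0)


# Hypothetical reflectances and a tiny two-level image, for illustration only.
print(round(ndvi(0.5, 0.1), 3))
p = glcm_horizontal([[0, 0], [1, 1]], levels=2)
print(energy(p), round(entropy(p), 5))
```

In practice these quantities are evaluated per pixel or per moving window over the hyperspectral image; the scalar version above only shows the arithmetic.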
Table 2. The calculation formulas of features derived from the UAV-based LiDAR data.
| Variable name | Formula |
| --- | --- |
| mean absolute deviation of intensity | $int\_aad = \frac{\sum_{i=1}^{n} \lvert I_{i} - \bar{I} \rvert}{n}$ |
| coefficient of intensity variation | $int\_cv = \frac{I_{std}}{I_{mean}} \times 100\%$ |
| kurtosis of intensity | $int\_kurtosis = \frac{\frac{1}{n-1}\sum_{i=1}^{n} (I_{i} - \bar{I})^{4}}{\sigma^{4}}$ |
| median absolute deviation median | $int\_madmedian = median(\lvert I_{i} - median(I) \rvert)$ |
| maximum intensity value | $int\_max = max(I)$ |
| minimum intensity value | $int\_min = min(I)$ |
| mean intensity value | $int\_mean = \frac{\sum_{i=1}^{n} I_{i}}{n}$ |
| median intensity value | $int\_median\_z = median(I)$ |
| skewness of intensity | $int\_skewness = \frac{\frac{1}{n-1}\sum_{i=1}^{n} (I_{i} - \bar{I})^{3}}{\sigma^{3}}$ |
| intensity standard deviation | $int\_stddev = \sqrt{\frac{\sum_{i=1}^{n} (I_{i} - \mu)^{2}}{n}}$ |
| intensity variance | $int\_variance = \frac{\sum_{i=1}^{n} (I_{i} - \bar{I})^{2}}{n}$ |
| intensity percentile quartile spacing | $int\_con\_iq = int\_percentile\_75 - int\_percentile\_25$ |
| cumulative intensity percentiles (1st, 5th, 10th, 25th, 30th, 40th, 50th, 60th, 70th, 75th, 80th, 90th, 95th, 99th) | $int\_AII\_x = \frac{1}{n}\sum_{i=1}^{x\%\,n} I_{i}$. In a specific statistical unit, the internally normalized LiDAR point cloud is sorted by intensity, and the cumulative intensity of all points is calculated. The cumulative intensity at which x% of points within each statistical unit are located represents the cumulative intensity percentile of that unit. |
| intensity percentiles (1st, 5th, 10th, 25th, 30th, 40th, 50th, 60th, 70th, 75th, 80th, 90th, 95th, 99th) | $int\_percentile\_x = percentile(I, x)$. In a specific statistical unit, the internally normalized LiDAR point cloud is sorted by intensity, and then the intensity of x% of points within each statistical unit is calculated. This represents the intensity percentile of that unit. |
| height mean absolute deviation | $elev\_aad\_z = \frac{\sum_{i=1}^{n} \lvert Z_{i} - \bar{Z} \rvert}{n}$ |
| height canopy relief ratio | $elev\_canopy\_relief\_ratio = \frac{elev\_mean - elev\_min}{elev\_max - elev\_min}$ |
| cumulative height percentile interquartile range | $elev\_AIH\_IQ = elev\_percentile\_75 - elev\_percentile\_25$ |
| height coefficient of variation | $elev\_cv\_z = \frac{Z_{std}}{Z_{mean}} \times 100\%$ |
| height kurtosis | $elev\_kurtosis = \frac{\frac{1}{n-1}\sum_{i=1}^{n} (Z_{i} - \bar{Z})^{4}}{\sigma^{4}}$ |
| median of the height median absolute deviation | $elev\_madmedian = median(\lvert Z_{i} - median(Z) \rvert)$ |
| height maximum | $elev\_max = max(Z)$ |
| height minimum | $elev\_min = min(Z)$ |
| height mean | $elev\_mean = \frac{\sum_{i=1}^{n} Z_{i}}{n}$ |
| height quadratic mean | $elev\_sqrt\_mean\_sq = \sqrt{\frac{\sum_{i=1}^{n} Z_{i}^{2}}{n}}$ |
| height cubic mean | $elev\_curt\_mean\_cube = \sqrt[3]{\frac{\sum_{i=1}^{n} Z_{i}^{3}}{n}}$ |
| height percentile interquartile range | $elev\_IQ = elev\_percentile\_75 - elev\_percentile\_25$ |
| height skewness | $elev\_skewness = \frac{\frac{1}{n-1}\sum_{i=1}^{n} (Z_{i} - \bar{Z})^{3}}{\sigma^{3}}$ |
| height standard deviation | $elev\_stddev = \sqrt{\frac{\sum_{i=1}^{n} (Z_{i} - \mu)^{2}}{n}}$ |
| height variance | $elev\_variance = \frac{\sum_{i=1}^{n} (Z_{i} - \bar{Z})^{2}}{n}$ |
| cumulative height percentiles (1st, 5th, 10th, 20th, 25th, 30th, 40th, 50th, 60th, 70th, 75th, 80th, 90th, 95th, 99th) | $elev\_AIH\_x = \frac{1}{n}\sum_{i=1}^{x\%\,n} Z_{i}$. In a given statistical unit, the normalized LiDAR point cloud is sorted by height, and the cumulative height of all points is calculated. The cumulative height at which x% of points within each statistical unit are located represents the cumulative height percentile of that unit. |
| height density variables of the 0th to ninth slices | $density\_metrics[x]$. The point cloud data are sliced into ten equal-height layers from low to high; the proportion of returns in each layer represents the corresponding density variable. |
| height percentiles (5th, 10th, 20th, 25th, 30th, 40th, 50th, 60th, 70th, 75th, 80th, 90th, 95th, 99th) | $elev\_percentile\_x = percentile(Z, x)$. In a specific statistical unit, the internally normalized LiDAR point cloud is sorted by height, and then the height of x% of points within each statistical unit is calculated. This represents the height percentile of that unit. |
| gap ratio variable | $GapFraction = \frac{n_{ground}}{n}$ |
Note: I: point cloud intensity; Z: point cloud height; n: the number of points in a statistical unit.
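To make a few of the Table 2 definitions concrete, the sketch below computes the canopy relief ratio, the mean absolute deviation, the ten height density slices, and the gap fraction for one statistical unit. It is a minimal illustration: the function names, the handling of the topmost point in the last slice, and the sample heights are assumptions of this example, not the implementation used in the study.

```python
def canopy_relief_ratio(heights):
    """(mean - min) / (max - min) of the normalized point heights."""
    z_min, z_max = min(heights), max(heights)
    z_mean = sum(heights) / len(heights)
    return (z_mean - z_min) / (z_max - z_min)


def mean_absolute_deviation(values):
    """Average absolute distance of each value from the mean."""
    mean = sum(values) / len(values)
    return sum(abs(v - mean) for v in values) / len(values)


def height_density_slices(heights, n_slices=10):
    """Proportion of returns in each of n_slices equal-height layers.

    Points at the maximum height are assigned to the top slice
    (an assumption of this sketch).
    """
    z_min, z_max = min(heights), max(heights)
    width = (z_max - z_min) / n_slices
    counts = [0] * n_slices
    for z in heights:
        index = min(int((z - z_min) / width), n_slices - 1)
        counts[index] += 1
    return [c / len(heights) for c in counts]


def gap_fraction(n_ground, n_total):
    """Share of returns classified as ground among all returns."""
    return n_ground / n_total


# Hypothetical normalized heights (m) for one statistical unit.
heights = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 10.0]
print(canopy_relief_ratio(heights))
print(height_density_slices(heights))
print(gap_fraction(n_ground=3, n_total=10))
```

The same pattern extends to the intensity variables by passing the intensity values I instead of the heights Z.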
Table 3. Hyperparameters and their value ranges for the multiple ML algorithms employed in this study for predicting soil properties.
| Model | Hyperparameter | Definition | Value range |
| --- | --- | --- | --- |
| PLSR | n_components | the number of components | 2–20 |
| RF | n_estimators | the number of trees | 1–100 |
| | min_samples_split | the minimum number of samples required to split an internal node | 1–10 |
| | criterion | the function to measure the quality of a split | mae, mse |
| AdaBoost | lr | learning rate | 0.001–1 |
| | n_estimators | the number of estimators | 30–250 |
| | loss | the loss function | linear, square, and exponential |
| GBDT | lr | learning rate | 0.001–1 |
| | n_estimators | the number of estimators | 30–300 |
| | max_depth | the maximum depth of the tree | 3–20 |
| | loss | the loss function | ls, lad, huber, and quantile |
| XGBoost | lr | learning rate | 0.001–1 |
| | n_estimators | the number of estimators | 30–300 |
| | max_depth | the maximum depth of the tree | 3–20 |
| | min_child_weight | the minimum sum of instance weights needed in a child | 2–10 |
| MLP | alpha | strength of the regularization term | 0.0001–1 |
| | hidden_layer_sizes | the number of neurons in the hidden layer | 50–300 |
| | activation | activation function for the hidden layer | identity, logistic, tanh, and relu |
| | solver | the solver for weight optimization | lbfgs, sgd, and adam |
Note: PLSR, Partial Least Square Regression; RF, Random Forest; AdaBoost, Adaptive Boosting; GBDT, Gradient Boosting Decision Tree; XGBoost, Extreme Gradient Boosting; MLP, Multilayer Perceptron.
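The ranges in Table 3 define a search space from which candidate configurations are drawn during tuning. The sketch below encodes the GBDT rows as a Python dictionary and draws candidates by plain random sampling; this is a deliberately simpler stand-in and is not the Bayesian optimization algorithm used in the study. The dictionary layout, the function name, and the log-uniform sampling of the learning rate are assumptions of this example.

```python
import math
import random

# Ranges copied from the GBDT rows of Table 3; the layout is illustrative.
GBDT_SPACE = {
    "learning_rate": (0.001, 1.0),
    "n_estimators": (30, 300),
    "max_depth": (3, 20),
    "loss": ["ls", "lad", "huber", "quantile"],
}


def sample_config(space, rng):
    """Draw one candidate configuration at random (not Bayesian optimization)."""
    low, high = space["learning_rate"]
    # Log-uniform draw so that small learning rates are sampled as often as large ones.
    lr = math.exp(rng.uniform(math.log(low), math.log(high)))
    lr = min(max(lr, low), high)  # guard against floating-point drift at the bounds
    return {
        "learning_rate": lr,
        "n_estimators": rng.randint(*space["n_estimators"]),
        "max_depth": rng.randint(*space["max_depth"]),
        "loss": rng.choice(space["loss"]),
    }


rng = random.Random(0)  # fixed seed so the draw is repeatable
for _ in range(3):
    print(sample_config(GBDT_SPACE, rng))
```

A Bayesian optimizer would replace the independent draws with proposals informed by the scores of earlier candidates, while the search space itself stays the same.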
Table 4. Statistical descriptions of the pH, soil organic carbon (SOC), total nitrogen (TN), and total phosphorus (TP) levels.
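| Soil property | Sample number | Min | Max | Mean | SD | CV (%) |
| --- | --- | --- | --- | --- | --- | --- |
| pH | 148 | 3.90 | 7.41 | 4.69 | 0.52 | 11.08 |
| SOC (%) | 148 | 1.74 | 8.13 | 3.84 | 1.09 | 28.40 |
| TN (%) | 148 | 0.06 | 0.54 | 0.17 | 0.06 | 38.13 |
| TP (ppm) | 148 | 20.00 | 380.00 | 182.13 | 76.04 | 41.75 |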
Note: CV = coefficient of variation; SD = standard deviation.
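The CV column in Table 4 follows from the SD and mean columns as CV = SD / mean × 100. The short sketch below recomputes it from the rounded table values; because the inputs are rounded to two decimals, the results match the published CV only approximately (most visibly for TN). The function and variable names are chosen for this example.

```python
def coefficient_of_variation(sd, mean):
    """Coefficient of variation in percent: SD / mean * 100."""
    return sd / mean * 100.0


# (mean, SD) pairs as printed in Table 4.
table4 = {
    "pH": (4.69, 0.52),
    "SOC": (3.84, 1.09),
    "TN": (0.17, 0.06),
    "TP": (182.13, 76.04),
}

for name, (mean, sd) in table4.items():
    print(name, round(coefficient_of_variation(sd, mean), 2))
```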
Table 5. Prediction performance (R²) for soil properties on the validation dataset using different numbers of LiDAR, HF, and LiDAR + HF features.
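| Soil property | Model | Features / NoF | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| pH | PLSR | HF | 0.03 | 0.03 | 0.03 | 0.04 | 0.04 | 0.04 | 0.03 | 0.04 | 0.04 | 0.04 | 0.05 | 0.05 | 0.05 | 0.05 | 0.06 | 0.08 | 0.08 | 0.08 | 0.09 | 0.09 |
| | | LiDAR | 0.11 | 0.13 | 0.14 | 0.14 | 0.15 | 0.15 | 0.16 | 0.16 | 0.15 | 0.17 | 0.17 | 0.16 | 0.17 | 0.18 | 0.19 | 0.18 | 0.21 | 0.21 | 0.21 | 0.22 |
| | | LiDAR + HF | 0.11 | 0.13 | 0.14 | 0.14 | 0.15 | 0.16 | 0.17 | 0.17 | 0.18 | 0.17 | 0.19 | 0.20 | 0.17 | 0.19 | 0.20 | 0.20 | 0.20 | 0.20 | 0.21 | 0.21 |
| | RF | HF | 0.05 | 0.05 | 0.05 | 0.05 | 0.06 | 0.10 | 0.10 | 0.11 | 0.12 | 0.13 | 0.13 | 0.14 | 0.15 | 0.16 | 0.16 | 0.17 | 0.18 | 0.18 | 0.21 | 0.19 |
| | | LiDAR | 0.11 | 0.14 | 0.14 | 0.15 | 0.14 | 0.15 | 0.15 | 0.13 | 0.16 | 0.17 | 0.18 | 0.19 | 0.21 | 0.24 | 0.24 | 0.25 | 0.24 | 0.24 | 0.24 | 0.25 |
| | | LiDAR + HF | 0.10 | 0.17 | 0.23 | 0.21 | 0.18 | 0.18 | 0.22 | 0.25 | 0.20 | 0.19 | 0.21 | 0.24 | 0.25 | 0.28 | 0.27 | 0.32 | 0.29 | 0.27 | 0.26 | 0.29 |
| | AdaBoost | HF | 0.12 | 0.14 | 0.16 | 0.15 | 0.13 | 0.14 | 0.17 | 0.16 | 0.16 | 0.17 | 0.17 | 0.22 | 0.25 | 0.24 | 0.26 | 0.26 | 0.25 | 0.27 | 0.27 | 0.27 |
| | | LiDAR | 0.18 | 0.19 | 0.21 | 0.23 | 0.21 | 0.21 | 0.19 | 0.25 | 0.29 | 0.25 | 0.24 | 0.31 | 0.27 | 0.33 | 0.37 | 0.41 | 0.38 | 0.37 | 0.40 | 0.43 |
| | | LiDAR + HF | 0.18 | 0.19 | 0.23 | 0.29 | 0.28 | 0.31 | 0.34 | 0.19 | 0.34 | 0.34 | 0.28 | 0.29 | 0.33 | 0.39 | 0.42 | 0.43 | 0.36 | 0.43 | 0.48 | 0.39 |
| | GBDT | HF | 0.07 | 0.13 | 0.13 | 0.13 | 0.13 | 0.14 | 0.18 | 0.15 | 0.17 | 0.17 | 0.21 | 0.22 | 0.24 | 0.21 | 0.23 | 0.23 | 0.24 | 0.24 | 0.25 | 0.25 |
| | | LiDAR | 0.14 | 0.15 | 0.18 | 0.19 | 0.22 | 0.21 | 0.32 | 0.35 | 0.36 | 0.32 | 0.35 | 0.42 | 0.37 | 0.39 | 0.42 | 0.41 | 0.42 | 0.43 | 0.42 | 0.43 |
| | | LiDAR + HF | 0.15 | 0.15 | 0.17 | 0.21 | 0.22 | 0.27 | 0.31 | 0.37 | 0.35 | 0.36 | 0.44 | 0.47 | 0.47 | 0.48 | 0.49 | 0.49 | 0.44 | 0.49 | 0.49 | 0.50 |
| | XGBoost | HF | 0.04 | 0.04 | 0.04 | 0.05 | 0.06 | 0.06 | 0.09 | 0.07 | 0.09 | 0.10 | 0.13 | 0.14 | 0.14 | 0.14 | 0.15 | 0.15 | 0.14 | 0.14 | 0.15 | 0.16 |
| | | LiDAR | 0.15 | 0.17 | 0.22 | 0.25 | 0.27 | 0.27 | 0.28 | 0.28 | 0.29 | 0.28 | 0.29 | 0.27 | 0.26 | 0.27 | 0.31 | 0.28 | 0.29 | 0.31 | 0.32 | 0.32 |
| | | LiDAR + HF | 0.17 | 0.17 | 0.22 | 0.24 | 0.28 | 0.27 | 0.28 | 0.28 | 0.29 | 0.32 | 0.33 | 0.37 | 0.37 | 0.36 | 0.36 | 0.36 | 0.37 | 0.37 | 0.37 | 0.38 |
| | MLP | HF | 0.03 | 0.06 | 0.07 | 0.07 | 0.09 | 0.11 | 0.12 | 0.13 | 0.14 | 0.16 | 0.17 | 0.19 | 0.20 | 0.19 | 0.19 | 0.19 | 0.18 | 0.19 | 0.19 | 0.20 |
| | | LiDAR | 0.14 | 0.21 | 0.25 | 0.26 | 0.24 | 0.26 | 0.30 | 0.27 | 0.29 | 0.33 | 0.34 | 0.34 | 0.35 | 0.35 | 0.36 | 0.36 | 0.35 | 0.35 | 0.36 | 0.36 |
| | | LiDAR + HF | 0.14 | 0.21 | 0.27 | 0.33 | 0.35 | 0.35 | 0.36 | 0.37 | 0.37 | 0.38 | 0.39 | 0.41 | 0.44 | 0.43 | 0.43 | 0.44 | 0.45 | 0.44 | 0.45 | 0.45 |
| SOC | PLSR | HF | 0.04 | 0.07 | 0.08 | 0.09 | 0.10 | 0.10 | 0.11 | 0.11 | 0.11 | 0.11 | 0.12 | 0.12 | 0.12 | 0.13 | 0.13 | 0.13 | 0.14 | 0.14 | 0.14 | 0.14 |
| | | LiDAR | 0.12 | 0.14 | 0.14 | 0.16 | 0.16 | 0.17 | 0.17 | 0.15 | 0.18 | 0.18 | 0.20 | 0.20 | 0.21 | 0.21 | 0.22 | 0.22 | 0.22 | 0.22 | 0.22 | 0.22 |
| | | LiDAR + HF | 0.12 | 0.14 | 0.15 | 0.16 | 0.16 | 0.17 | 0.16 | 0.17 | 0.19 | 0.20 | 0.20 | 0.21 | 0.22 | 0.22 | 0.22 | 0.22 | 0.22 | 0.22 | 0.23 | 0.23 |
| | RF | HF | 0.05 | 0.06 | 0.06 | 0.07 | 0.10 | 0.09 | 0.12 | 0.07 | 0.08 | 0.15 | 0.16 | 0.13 | 0.15 | 0.15 | 0.15 | 0.15 | 0.16 | 0.16 | 0.16 | 0.17 |
| | | LiDAR | 0.16 | 0.20 | 0.23 | 0.23 | 0.24 | 0.26 | 0.26 | 0.26 | 0.26 | 0.26 | 0.26 | 0.27 | 0.26 | 0.26 | 0.26 | 0.26 | 0.26 | 0.26 | 0.26 | 0.26 |
| | | LiDAR + HF | 0.16 | 0.20 | 0.23 | 0.23 | 0.25 | 0.25 | 0.26 | 0.26 | 0.27 | 0.26 | 0.27 | 0.28 | 0.28 | 0.29 | 0.29 | 0.30 | 0.30 | 0.31 | 0.31 | 0.31 |
| | AdaBoost | HF | 0.07 | 0.11 | 0.08 | 0.09 | 0.09 | 0.10 | 0.09 | 0.10 | 0.10 | 0.11 | 0.11 | 0.12 | 0.12 | 0.12 | 0.14 | 0.14 | 0.14 | 0.13 | 0.14 | 0.16 |
| | | LiDAR | 0.21 | 0.28 | 0.29 | 0.30 | 0.30 | 0.31 | 0.32 | 0.30 | 0.33 | 0.36 | 0.37 | 0.38 | 0.38 | 0.39 | 0.39 | 0.41 | 0.41 | 0.40 | 0.38 | 0.42 |
| | | LiDAR + HF | 0.23 | 0.29 | 0.32 | 0.31 | 0.32 | 0.34 | 0.33 | 0.39 | 0.38 | 0.39 | 0.41 | 0.40 | 0.40 | 0.43 | 0.43 | 0.41 | 0.42 | 0.44 | 0.43 | 0.44 |
| | GBDT | HF | 0.11 | 0.13 | 0.13 | 0.14 | 0.14 | 0.12 | 0.13 | 0.14 | 0.15 | 0.17 | 0.15 | 0.18 | 0.17 | 0.17 | 0.17 | 0.17 | 0.17 | 0.16 | 0.17 | 0.17 |
| | | LiDAR | 0.31 | 0.31 | 0.34 | 0.34 | 0.34 | 0.35 | 0.37 | 0.37 | 0.37 | 0.37 | 0.37 | 0.37 | 0.40 | 0.47 | 0.44 | 0.46 | 0.45 | 0.48 | 0.48 | 0.46 |
| | | LiDAR + HF | 0.24 | 0.25 | 0.28 | 0.32 | 0.33 | 0.37 | 0.37 | 0.37 | 0.40 | 0.44 | 0.45 | 0.45 | 0.45 | 0.44 | 0.44 | 0.44 | 0.45 | 0.45 | 0.45 | 0.45 |
| | XGBoost | HF | 0.15 | 0.18 | 0.19 | 0.20 | 0.22 | 0.22 | 0.23 | 0.23 | 0.23 | 0.22 | 0.23 | 0.23 | 0.24 | 0.24 | 0.25 | 0.25 | 0.26 | 0.25 | 0.25 | 0.25 |
| | | LiDAR | 0.26 | 0.29 | 0.31 | 0.35 | 0.37 | 0.36 | 0.37 | 0.37 | 0.39 | 0.39 | 0.38 | 0.39 | 0.41 | 0.37 | 0.39 | 0.39 | 0.40 | 0.42 | 0.41 | 0.41 |
| | | LiDAR + HF | 0.23 | 0.31 | 0.34 | 0.36 | 0.37 | 0.38 | 0.39 | 0.37 | 0.42 | 0.46 | 0.46 | 0.42 | 0.44 | 0.43 | 0.46 | 0.45 | 0.47 | 0.47 | 0.46 | 0.47 |
| | MLP | HF | 0.09 | 0.11 | 0.11 | 0.13 | 0.12 | 0.13 | 0.11 | 0.14 | 0.13 | 0.14 | 0.15 | 0.14 | 0.15 | 0.15 | 0.15 | 0.16 | 0.15 | 0.16 | 0.16 | 0.16 |
| | | LiDAR | 0.21 | 0.29 | 0.33 | 0.30 | 0.31 | 0.35 | 0.36 | 0.37 | 0.37 | 0.38 | 0.39 | 0.39 | 0.39 | 0.39 | 0.39 | 0.39 | 0.39 | 0.40 | 0.39 | 0.40 |
| | | LiDAR + HF | 0.27 | 0.29 | 0.33 | 0.34 | 0.39 | 0.37 | 0.38 | 0.37 | 0.38 | 0.39 | 0.39 | 0.39 | 0.40 | 0.42 | 0.41 | 0.41 | 0.43 | 0.41 | 0.41 | 0.42 |
| TN | PLSR | HF | 0.13 | 0.15 | 0.18 | 0.18 | 0.20 | 0.21 | 0.22 | 0.22 | 0.22 | 0.23 | 0.23 | 0.24 | 0.24 | 0.24 | 0.24 | 0.24 | 0.25 | 0.25 | 0.25 | 0.25 |
| | | LiDAR | 0.21 | 0.25 | 0.26 | 0.26 | 0.27 | 0.28 | 0.28 | 0.29 | 0.29 | 0.29 | 0.31 | 0.32 | 0.31 | 0.31 | 0.31 | 0.31 | 0.32 | 0.32 | 0.32 | 0.32 |
| | | LiDAR + HF | 0.19 | 0.24 | 0.27 | 0.28 | 0.29 | 0.28 | 0.29 | 0.29 | 0.30 | 0.29 | 0.33 | 0.34 | 0.34 | 0.33 | 0.34 | 0.32 | 0.32 | 0.33 | 0.33 | 0.34 |
| | RF | HF | 0.21 | 0.22 | 0.22 | 0.22 | 0.22 | 0.23 | 0.23 | 0.22 | 0.23 | 0.24 | 0.26 | 0.27 | 0.27 | 0.27 | 0.29 | 0.29 | 0.28 | 0.26 | 0.29 | 0.29 |
| | | LiDAR | 0.40 | 0.45 | 0.48 | 0.49 | 0.51 | 0.49 | 0.50 | 0.51 | 0.51 | 0.53 | 0.53 | 0.53 | 0.54 | 0.54 | 0.54 | 0.56 | 0.57 | 0.57 | 0.59 | 0.57 |
| | | LiDAR + HF | 0.51 | 0.55 | 0.56 | 0.56 | 0.56 | 0.54 | 0.57 | 0.57 | 0.56 | 0.56 | 0.56 | 0.56 | 0.56 | 0.57 | 0.58 | 0.58 | 0.58 | 0.58 | 0.58 | 0.58 |
| | AdaBoost | HF | 0.07 | 0.07 | 0.07 | 0.06 | 0.06 | 0.10 | 0.09 | 0.09 | 0.09 | 0.09 | 0.10 | 0.10 | 0.11 | 0.11 | 0.11 | 0.11 | 0.11 | 0.12 | 0.12 | 0.12 |
| | | LiDAR | 0.26 | 0.26 | 0.29 | 0.29 | 0.28 | 0.32 | 0.34 | 0.33 | 0.32 | 0.30 | 0.31 | 0.33 | 0.33 | 0.33 | 0.33 | 0.35 | 0.33 | 0.33 | 0.35 | 0.32 |
| | | LiDAR + HF | 0.38 | 0.45 | 0.45 | 0.45 | 0.45 | 0.45 | 0.45 | 0.47 | 0.49 | 0.50 | 0.48 | 0.51 | 0.50 | 0.51 | 0.51 | 0.49 | 0.52 | 0.50 | 0.53 | 0.52 |
| | GBDT | HF | 0.14 | 0.14 | 0.16 | 0.13 | 0.14 | 0.16 | 0.17 | 0.18 | 0.18 | 0.18 | 0.19 | 0.21 | 0.21 | 0.20 | 0.21 | 0.20 | 0.25 | 0.20 | 0.22 | 0.22 |
| | | LiDAR | 0.40 | 0.37 | 0.37 | 0.37 | 0.37 | 0.38 | 0.39 | 0.42 | 0.41 | 0.45 | 0.44 | 0.41 | 0.43 | 0.43 | 0.43 | 0.47 | 0.44 | 0.46 | 0.44 | 0.45 |
| | | LiDAR + HF | 0.40 | 0.48 | 0.51 | 0.48 | 0.51 | 0.52 | 0.55 | 0.56 | 0.56 | 0.57 | 0.57 | 0.56 | 0.59 | 0.59 | 0.60 | 0.60 | 0.59 | 0.60 | 0.60 | 0.60 |
| | XGBoost | HF | 0.09 | 0.06 | 0.06 | 0.12 | 0.11 | 0.12 | 0.11 | 0.11 | 0.11 | 0.15 | 0.08 | 0.07 | 0.06 | 0.08 | 0.11 | 0.10 | 0.11 | 0.13 | 0.13 | 0.14 |
| | | LiDAR | 0.25 | 0.35 | 0.33 | 0.33 | 0.33 | 0.39 | 0.37 | 0.43 | 0.44 | 0.47 | 0.46 | 0.46 | 0.50 | 0.50 | 0.50 | 0.49 | 0.50 | 0.49 | 0.49 | 0.49 |
| | | LiDAR + HF | 0.25 | 0.38 | 0.36 | 0.43 | 0.43 | 0.48 | 0.48 | 0.45 | 0.46 | 0.50 | 0.51 | 0.48 | 0.52 | 0.56 | 0.55 | 0.55 | 0.55 | 0.52 | 0.56 | 0.56 |
| | MLP | HF | 0.16 | 0.18 | 0.18 | 0.18 | 0.18 | 0.19 | 0.19 | 0.21 | 0.21 | 0.20 | 0.21 | 0.21 | 0.22 | 0.22 | 0.22 | 0.22 | 0.21 | 0.22 | 0.23 | 0.22 |
| | | LiDAR | 0.28 | 0.37 | 0.43 | 0.44 | 0.46 | 0.47 | 0.46 | 0.47 | 0.48 | 0.48 | 0.48 | 0.48 | 0.49 | 0.49 | 0.49 | 0.49 | 0.50 | 0.50 | 0.50 | 0.50 |
| | | LiDAR + HF | 0.28 | 0.37 | 0.43 | 0.40 | 0.45 | 0.47 | 0.45 | 0.49 | 0.47 | 0.50 | 0.52 | 0.46 | 0.48 | 0.52 | 0.53 | 0.53 | 0.53 | 0.53 | 0.54 | 0.53 |
| TP | PLSR | HF | 0.10 | 0.15 | 0.15 | 0.16 | 0.16 | 0.16 | 0.17 | 0.17 | 0.17 | 0.17 | 0.16 | 0.16 | 0.16 | 0.15 | 0.15 | 0.16 | 0.17 | 0.17 | 0.17 | 0.18 |
| | | LiDAR | 0.16 | 0.26 | 0.24 | 0.25 | 0.25 | 0.24 | 0.26 | 0.26 | 0.27 | 0.26 | 0.25 | 0.25 | 0.26 | 0.27 | 0.27 | 0.27 | 0.27 | 0.27 | 0.27 | 0.27 |
| | | LiDAR + HF | 0.16 | 0.26 | 0.24 | 0.26 | 0.25 | 0.22 | 0.25 | 0.26 | 0.22 | 0.27 | 0.27 | 0.29 | 0.29 | 0.28 | 0.29 | 0.29 | 0.29 | 0.30 | 0.30 | 0.30 |
| | RF | HF | 0.11 | 0.15 | 0.15 | 0.16 | 0.16 | 0.16 | 0.16 | 0.15 | 0.15 | 0.16 | 0.16 | 0.16 | 0.17 | 0.17 | 0.17 | 0.17 | 0.18 | 0.16 | 0.18 | 0.18 |
| | | LiDAR | 0.17 | 0.22 | 0.24 | 0.21 | 0.28 | 0.28 | 0.27 | 0.24 | 0.29 | 0.24 | 0.27 | 0.29 | 0.28 | 0.28 | 0.29 | 0.32 | 0.32 | 0.32 | 0.32 | 0.33 |
| | | LiDAR + HF | 0.18 | 0.20 | 0.21 | 0.26 | 0.26 | 0.23 | 0.26 | 0.29 | 0.31 | 0.33 | 0.33 | 0.35 | 0.33 | 0.37 | 0.37 | 0.37 | 0.38 | 0.38 | 0.39 | 0.39 |
| | AdaBoost | HF | 0.11 | 0.11 | 0.13 | 0.13 | 0.13 | 0.13 | 0.12 | 0.14 | 0.14 | 0.14 | 0.14 | 0.15 | 0.15 | 0.16 | 0.17 | 0.17 | 0.17 | 0.18 | 0.18 | 0.18 |
| | | LiDAR | 0.24 | 0.25 | 0.24 | 0.26 | 0.26 | 0.26 | 0.26 | 0.26 | 0.25 | 0.27 | 0.28 | 0.29 | 0.32 | 0.32 | 0.32 | 0.32 | 0.32 | 0.32 | 0.34 | 0.33 |
| | | LiDAR + HF | 0.20 | 0.22 | 0.23 | 0.26 | 0.31 | 0.34 | 0.32 | 0.30 | 0.32 | 0.36 | 0.34 | 0.32 | 0.35 | 0.34 | 0.34 | 0.34 | 0.35 | 0.35 | 0.35 | 0.35 |
| | GBDT | HF | 0.11 | 0.13 | 0.15 | 0.15 | 0.14 | 0.16 | 0.17 | 0.16 | 0.18 | 0.18 | 0.18 | 0.18 | 0.18 | 0.19 | 0.20 | 0.20 | 0.19 | 0.21 | 0.21 | 0.20 |
| | | LiDAR | 0.19 | 0.21 | 0.23 | 0.25 | 0.26 | 0.26 | 0.28 | 0.28 | 0.28 | 0.28 | 0.28 | 0.29 | 0.30 | 0.32 | 0.33 | 0.32 | 0.33 | 0.33 | 0.33 | 0.33 |
| | | LiDAR + HF | 0.23 | 0.23 | 0.24 | 0.25 | 0.24 | 0.26 | 0.29 | 0.30 | 0.34 | 0.34 | 0.35 | 0.36 | 0.38 | 0.36 | 0.37 | 0.37 | 0.37 | 0.38 | 0.38 | 0.38 |
| | XGBoost | HF | 0.11 | 0.12 | 0.15 | 0.15 | 0.16 | 0.16 | 0.17 | 0.18 | 0.21 | 0.18 | 0.21 | 0.22 | 0.22 | 0.21 | 0.23 | 0.23 | 0.23 | 0.23 | 0.24 | 0.23 |
| | | LiDAR | 0.27 | 0.32 | 0.32 | 0.33 | 0.33 | 0.33 | 0.33 | 0.33 | 0.34 | 0.32 | 0.34 | 0.33 | 0.34 | 0.34 | 0.34 | 0.34 | 0.34 | 0.35 | 0.34 | 0.35 |
| | | LiDAR + HF | 0.28 | 0.30 | 0.29 | 0.35 | 0.36 | 0.36 | 0.36 | 0.37 | 0.39 | 0.39 | 0.39 | 0.38 | 0.38 | 0.38 | 0.39 | 0.39 | 0.39 | 0.39 | 0.37 | 0.40 |
| | MLP | HF | 0.11 | 0.12 | 0.14 | 0.15 | 0.15 | 0.16 | 0.16 | 0.17 | 0.18 | 0.18 | 0.19 | 0.19 | 0.19 | 0.19 | 0.19 | 0.19 | 0.19 | 0.18 | 0.18 | 0.19 |
| | | LiDAR | 0.16 | 0.21 | 0.24 | 0.27 | 0.28 | 0.29 | 0.29 | 0.30 | 0.31 | 0.31 | 0.32 | 0.33 | 0.33 | 0.33 | 0.33 | 0.33 | 0.33 | 0.32 | 0.33 | 0.33 |
| | | LiDAR + HF | 0.16 | 0.23 | 0.27 | 0.27 | 0.25 | 0.29 | 0.29 | 0.30 | 0.31 | 0.31 | 0.29 | 0.33 | 0.33 | 0.36 | 0.36 | 0.36 | 0.36 | 0.35 | 0.36 | 0.36 |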
Note: HF, vegetation indices and texture features derived from hyperspectral images; NoF, number of features. Values are R².
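One way to read a row of Table 5 is to ask how few features are needed to come close to the best R² in that row. The sketch below applies a simple rule, the smallest NoF whose R² lies within a tolerance of the row maximum, to the pH, GBDT, LiDAR + HF row. The rule, the function name, and the 0.01 tolerance are assumptions of this example and are not stated in the source as the selection procedure.

```python
def smallest_nof_near_best(r2_by_nof, tolerance=0.01):
    """Return the smallest feature count whose R2 is within `tolerance` of the best.

    r2_by_nof[0] is the R2 for one feature, r2_by_nof[1] for two, and so on.
    A tiny epsilon absorbs floating-point error in the subtraction.
    """
    best = max(r2_by_nof)
    for nof, r2 in enumerate(r2_by_nof, start=1):
        if r2 >= best - tolerance - 1e-9:
            return nof
    return len(r2_by_nof)


# R2 values of the pH, GBDT, LiDAR + HF row of Table 5 (NoF = 1..20).
gbdt_ph_combined = [
    0.15, 0.15, 0.17, 0.21, 0.22, 0.27, 0.31, 0.37, 0.35, 0.36,
    0.44, 0.47, 0.47, 0.48, 0.49, 0.49, 0.44, 0.49, 0.49, 0.50,
]

print(smallest_nof_near_best(gbdt_ph_combined))       # -> 15
print(smallest_nof_near_best(gbdt_ph_combined, 0.0))  # -> 20
```

With a tolerance of zero the rule simply returns the position of the first maximum, so the tolerance controls the trade-off between accuracy and model size.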
Table 6. Optimal hyperparameter combinations and accuracy assessment for each model with default parameters and with Bayesian optimization algorithm (BOA) tuning.
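| Soil property | Model | Setting | Hyperparameters | R² | R² increase | RMSE | RPD | Time cost |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| pH | PLSR | default | n_components = 2 | 0.11 | | 0.47 | 1.07 | |
| | | BOA | n_components = 5 | 0.20 | 81.82% | 0.45 | 1.13 | 123 s |
| | RF | default | n_estimators = 100; min_samples_split = 2; criterion = mse | 0.37 | | 0.42 | 1.20 | |
| | | BOA | n_estimators/min_samples_split = 1525; criterion = mae | 0.27 | −27.03% | 0.41 | 1.33 | 316 s |
| | AdaBoost | default | lr = 1; n_estimators = 50; loss = linear | 0.06 | | 0.52 | 0.98 | |
| | | BOA | lr = 0.001; n_estimators = 30; loss = exponential | 0.42 | 600.00% | 0.38 | 1.33 | 222 s |
| | GBDT | default | lr = 0.1; n_estimators = 100; max_depth = 3; loss = ls | 0.08 | | 1.50 | 0.59 | |
| | | BOA | lr = 0.035; n_estimators = 90; max_depth = 18; loss = huber | 0.49 | 512.50% | 0.36 | 1.41 | 314 s |
| | XGBoost | default | lr = 0.1; n_estimators = 100; max_depth = 3; min_child_weight = 1 | 0.30 | | 0.42 | 1.21 | |
| | | BOA | lr = 0.005; n_estimators = 244; max_depth = 3; min_child_weight = 4 | 0.36 | 20.00% | 0.40 | 1.26 | 190 s |
| | MLP | default | alpha = 0.0001; hidden_layer_sizes = 100; activation = relu; solver = adam | 0.33 | | 0.41 | 1.23 | |
| | | BOA | alpha = 0.002; hidden_layer_sizes = 246; activation = logistic; solver = sgd | 0.43 | 30.30% | 0.38 | 1.35 | 233 s |
| SOC | PLSR | default | n_components = 2 | 0.22 | | 0.77 | 1.15 | |
| | | BOA | n_components = 2 | 0.22 | 0.00% | 0.77 | 1.15 | 119 s |
| | RF | default | n_estimators = 100; min_samples_split = 2; criterion = mse | 0.28 | | 0.74 | 1.19 | |
| | | BOA | n_estimators/min_samples_split = 1331; criterion = mae | 0.29 | 3.57% | 0.74 | 1.20 | 196 s |
| | AdaBoost | default | lr = 1; n_estimators = 50; loss = linear | 0.28 | | 0.74 | 1.19 | |
| | | BOA | lr = 0.002; n_estimators = 81; loss = exponential | 0.43 | 53.57% | 0.66 | 1.34 | 218 s |
| | GBDT | default | lr = 0.1; n_estimators = 100; max_depth = 3; loss = ls | 0.21 | | 0.78 | 1.14 | |
| | | BOA | lr = 0.291; n_estimators = 125; max_depth = 7; loss = lad | 0.44 | 109.52% | 0.66 | 1.37 | 314 s |
| | XGBoost | default | lr = 0.1; n_estimators = 100; max_depth = 3; min_child_weight = 1 | 0.15 | | 0.80 | 1.10 | |
| | | BOA | lr = 0.021; n_estimators = 184; max_depth = 3; min_child_weight = 6 | 0.46 | 206.67% | 0.63 | 1.43 | 189 s |
| | MLP | default | alpha = 0.0001; hidden_layer_sizes = 100; activation = relu; solver = adam | 0.09 | | 0.92 | 1.07 | |
| | | BOA | alpha/hidden_layer_sizes = 0.1178; activation = logistic; solver = lbfgs | 0.41 | 355.56% | 0.74 | 1.33 | 233 s |
| TN | PLSR | default | n_components = 2 | 0.34 | | 0.04 | 1.24 | |
| | | BOA | n_components = 7 | 0.34 | 0.00% | 0.04 | 1.24 | 109 s |
| | RF | default | n_estimators = 100; min_samples_split = 2; criterion = mse | 0.54 | | 0.04 | 1.49 | |
| | | BOA | n_estimators = 70; min_samples_split = 1; criterion = mae | 0.58 | 7.41% | 0.04 | 1.55 | 197 s |
| | AdaBoost | default | lr = 1; n_estimators = 50; loss = linear | 0.50 | | 0.04 | 1.44 | |
| | | BOA | lr = 0.013; n_estimators = 104; loss = exponential | 0.51 | 2.00% | 0.04 | 1.43 | 218 s |
| | GBDT | default | lr = 0.1; n_estimators = 100; max_depth = 3; loss = ls | 0.44 | | 0.04 | 1.35 | |
| | | BOA | lr = 0.033; n_estimators = 147; max_depth = 7; loss = lad | 0.60 | 36.36% | 0.04 | 1.60 | 212 s |
| | XGBoost | default | lr = 0.1; n_estimators = 100; max_depth = 3; min_child_weight = 1 | 0.58 | | 0.04 | 1.56 | |
| | | BOA | lr = 0.029; n_estimators = 115; max_depth = 20; min_child_weight = 10 | 0.55 | −5.17% | 0.04 | 1.50 | 251 s |
| | MLP | default | alpha = 0.0001; hidden_layer_sizes = 100; activation = relu; solver = adam | 0.47 | | 0.04 | 1.39 | |
| | | BOA | alpha = 0.0001; hidden_layer_sizes = 149; activation = logistic; solver = lbfgs | 0.53 | 12.77% | 0.04 | 1.46 | 233 s |
| TP | PLSR | default | n_components = 2 | 0.29 | | 71.07 | 1.20 | |
| | | BOA | n_components = 2 | 0.29 | 0.00% | 71.07 | 1.20 | 104 s |
| | RF | default | n_estimators = 100; min_samples_split = 2; criterion = mse | 0.30 | | 59.58 | 1.20 | |
| | | BOA | n_estimators/min_samples_split = 1726; criterion = mse | 0.37 | 23.33% | 56.73 | 1.27 | 281 s |
| | AdaBoost | default | lr = 1; n_estimators = 50; loss = linear | 0.25 | | 74.51 | 1.17 | |
| | | BOA | lr = 0.171; n_estimators = 113; loss = exponential | 0.34 | 36.00% | 59.41 | 1.25 | 180 s |
| | GBDT | default | lr = 0.1; n_estimators = 100; max_depth = 3; loss = ls | 0.28 | | 73.12 | 1.18 | |
| | | BOA | lr = 0.147; n_estimators = 84; max_depth = 3; loss = ls | 0.37 | 32.14% | 56.73 | 1.27 | 321 s |
| | XGBoost | default | lr = 0.1; n_estimators = 100; max_depth = 3; min_child_weight = 1 | 0.20 | | 76.11 | 1.13 | |
| | | BOA | lr = 0.028; n_estimators = 158; max_depth = 3; min_child_weight = 5 | 0.39 | 95.00% | 51.42 | 1.28 | 181 s |
| | MLP | default | alpha = 0.0001; hidden_layer_sizes = 100; activation = relu; solver = adam | 0.25 | | 74.43 | 1.17 | |
| | | BOA | alpha = 0.014; hidden_layer_sizes = 85; activation = logistic; solver = sgd | 0.36 | 44.00% | 57.96 | 1.26 | 233 s |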
Note: Lr, learning rate; n_estimators, estimator number; Loss, loss function; min_samples_split, the minimum number of samples required to split an internal node; max_depth, maximum depth of the individual regression estimators; min_child_weight, minimum sum of instance weight needed in a child.
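The accuracy columns of Table 6 can be related to each other with a few lines of arithmetic. The sketch below computes the relative change in R² from the default to the tuned model, which reproduces the percentages in the table (for example, 0.08 to 0.49 gives 512.50%), together with RMSE and RPD. RPD is taken here as the sample standard deviation of the measured values divided by the RMSE, a common definition that is assumed rather than quoted from the source; the function names and the toy data are likewise illustrative.

```python
import math
import statistics


def relative_increase(default_r2, tuned_r2):
    """Relative change in R2 from the default to the tuned model, in percent."""
    return (tuned_r2 - default_r2) / default_r2 * 100.0


def rmse(measured, predicted):
    """Root mean square error between measured and predicted values."""
    squared = [(m - p) ** 2 for m, p in zip(measured, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def rpd(measured, predicted):
    """Ratio of performance to deviation: SD of measured values / RMSE (assumed definition)."""
    return statistics.stdev(measured) / rmse(measured, predicted)


# R2 values of the pH GBDT rows in Table 6 (default vs. BOA).
print(round(relative_increase(0.08, 0.49), 2))

# Toy measured and predicted values, for illustration only.
measured = [4.2, 4.6, 5.1, 4.9, 4.4]
predicted = [4.3, 4.5, 4.8, 5.0, 4.6]
print(round(rmse(measured, predicted), 3), round(rpd(measured, predicted), 2))
```

A negative result, as for the pH RF rows (0.37 to 0.27), indicates that the tuned model scored lower on the validation set than the default one.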
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Chen, Y.; Shi, T.; Li, Q.; Yang, C.; Wang, Z.; Chen, Z.; Pan, X. Mapping Soil Properties in Tropical Rainforest Regions Using Integrated UAV-Based Hyperspectral Images and LiDAR Points. Forests 2024, 15, 2222. https://doi.org/10.3390/f15122222

