1. Introduction
Forest play an important role in the global carbon cycle [
1]. Deforestation and forest degradation impact the estimation of anthropogenic greenhouse gas emissions and carbon stock in forests [
2,
3]. Forest carbon stocks that change dynamically over time can be monitored through regular mapping of the total forest aboveground biomass (AGB, in Mg·ha
−1) [
3,
4,
5]. Therefore, mapping the magnitude and spatial distribution of forest AGB is necessary for improving estimates of terrestrial carbon sources and sinks. The traditional methods to estimate forest AGB is based on field measurements or long-term forest inventories; these methods can obtain good AGB estimation. National inventories (e.g., National Forest Resource Inventory Program of the State Forestry Administration, China, or the U.S. Forest Service Inventory and Analysis Program) provided the most detailed field estimates of forest AGB at the national scale [
6,
7]. However, the lack of field measurements in remote areas and the inconsistency of data requirements among different administrative units are the major constraints to obtain sufficient regional AGB estimation using field-based methods [
5,
8]. In addition, obtaining comprehensive, spatially-complete, temporally uniform, and accurate forest inventory data is usually time-consuming and labor-intensive over very large areas. Reducing the uncertainty in the AGB estimates requires spatially-continuous observations that are fine enough to capture the variability over a landscape that may undergo natural disturbances or landuse changes [
5,
9].
Remote sensing has the capability to map forest AGB over wide geographical areas. However, as no remote sensor has been developed that is capable of providing direct measurement of AGB, additional field-sampled AGB is required to correlate with canopy reflectance measured by passive optical sensors or backscatter intensity from Synthetic Aperture Radar (SAR) sensors from moderate to fine spatial resolution [
10]. Optical sensor data are appropriate for the retrieval of forest horizontal structures, such as forest types and canopy cover, due to its spectral sensitivity to different species [
11]. Furthermore, satellite-based optical sensors are perceptive to other forest structural variables, such as species abundance [
12], basal area [
13], stem density [
13], and crown size [
14], which are correlated to some degree with AGB because forest spectral reflectance contains information on the vegetation chlorophyll absorption bands in the visible region and sustained high reflectance in the near-infrared region [
15]. Previous studies have demonstrated the sensitivity of visible and shortwave infrared wavelengths to AGB [
4,
16,
17,
18]. In addition to the usage of single-band spectral signatures, various vegetation indices (VIs) derived from TM (Thematic Mapper), ETM+ (Enhanced Thematic Mapper Plus), MODIS (Moderate Resolution Imaging Spectroradiometer), etc., as primary data sources have been used to estimate forest AGB in different regions, such as tropical America and Asia, Central Europe, East Africa, the U.S., and China [
16,
18,
19,
20,
21]. Moreover, some optical sensor data, such as IKONOS, ALOS/PRISM (Advanced Land Observation Satellite/Panchromatic Remote-sensing Instrument for Stereo Mapping), and ZY-3 (Zi Yuan 3), provide a stereo-imaging capability that can be used to predict forest canopy structure [
22,
23,
24]. Although passive optical remote sensing is widely used for AGB estimation, frequent cloud coverage in mountains and moist regions, and the low saturation level of spectral signatures on AGB estimation in medium to high AGB forest (i.e., the spectral reflectance or vegetation index of optical remote-sensing sensors are of limited value in medium to high biomass forests) are the major disadvantages of this technology [
11,
25,
26].
Recently, much attention has been paid to the use of LiDAR (Light Detection And Ranging) remote sensing in AGB estimation due to its potential to capture the vertical structure of vegetation in great detail and it is the only sensor type available at present whose signal does not saturate in high AGB forests (e.g., 1200 Mg·ha
−1 and 1300 Mg·ha
−1) [
27,
28,
29,
30]. LiDAR footprints may be used as a manner of sampling similar to field plots, wherein the derived AGB is used to integrate optical remotely-sensed data in order to facilitate stratification, thereby extending plot-level AGB to large areas [
31]. Airborne LiDAR provides highly accurate AGB estimates [
32,
33,
34], but the associated large data volume, as well as the sophisticated technical equipment and high acquisition costs to observe remote areas, usually limit its application at regional and global scales [
16]. The Geoscience Laser Altimeter System (GLAS) sensor onboard the Ice, Cloud, and land Elevation satellite(ICESAT) satellite provided data freely and has proven itself suitable for AGB and canopy height estimation over continental-to-global areas [
3,
4,
35,
36,
37]. One major limitation of GLAS was the lack of imaging capability and the fact that it provided relatively sparse sampling information on forest structures [
38]. Therefore, GLAS data are often integrated with imaging optical systems to estimate forest structural variables at the regional scale [
39,
40].
Due to the concerns of global climate change, as well as large swaths and daily availability of moderate spatial resolution sensors data (e.g., MODIS), AGB estimation that integrated LiDAR with MODIS over continental-to-regional scales has gained increasing attention in the last few years [
3,
4,
8,
41]. However, moderate spatial resolution data loses spatial detail of AGB variability, and it is often difficult to associate field data with satellite observations because the inconsistencies in spatial resolutions result in a mismatch between filed plots and remotely-sensed imagery [
11,
42]. While medium resolution data, such as TM/ETM+, provides spatial detail compatible with the size of vegetation units and AGB field measurements [
16], there are increasing demands and attempts for estimating AGB at finer resolution from medium-resolution sensors [
43]. The integration of Landsat and GLAS data for large-area applications has become more practical with the free availability of their archives and the capability of near-global coverage [
44]. Duncanson et al. [
44] combined Landsat/TM and spaceborne LiDAR to estimate AGB in South-central Canada, and found that the data integration is most useful for forests with an AGB less than 120 Mg·ha
−1, an age less than 60 years, and a canopy cover less than 60%. However, a generic GLAS-based AGB model was developed from field plot data and GLAS waveform metrics, regardless of forest types. Moreover, an AGB map generated from the model that was developed from field measurements and a serial of airborne LiDAR metrics was designed to provide a reference for validation of AGB estimates. It was inevitable to introduce potential errors when considering additional processing procedures of airborne LiDAR and the mismatch between airborne LiDAR-derived AGB and GLAS-derived AGB in spatial resolution. Zhang et al. [
5] presented a simple parametric model that integrated leaf area index (LAI) estimates from Landsat and canopy height from GLAS for conifer-dominant forests of California. The relationships between the GLAS-derived maximum canopy height and Landsat-derived LAI were modeled using a linear model, which was based on the assumption that the power law between LAI and the maximum tree height is a first order approximation. In fact, the relationship between the two parameters was not linear, which would introduce uncertainty into the estimation of the maximum tree height, which was used for AGB estimation. Li et al. [
45] described a Landsat-lidar fusion approach for modeling canopy heights of young forests by integrating historical Landsat imagery with GLAS data. They used two methods to explore the relationships between forest height and predictor, including stepwise linear regression (SLR) and regression tree (RT). The RT models yielded substantially lower RMSD (root mean square of the difference) than the SLR models when use three different variables groups. The best RT model was developed when forest age metrics, NDVI metric, NBRI (normalized burn ratio index) metrics and IFZ (integrated forest z-score) metrics (that is an index as a measure of forest likelihood) were used together. Although RT can use multiple linear equations to approximate nonlinear relationships, the substance of the final models in this study had only two linear regression equations. Obviously, the linear regression models will show more overfitting with the increase of independent variables. Dolan et al. [
46] combined forest age information derived from Landsat data with canopy height data from GLAS to quantify rates of forest growth in the eastern United States. They regressed the GLAS-derived heights against the age of last disturbance in linear models to yield vertical growth rates. The growth rates that combined with height-biomass allometric relations can be converted to estimates of AGB. Due to lack of field measurements of AGB co-located with GLAS footprints, they used age (or growth rates) as an intermediate variable to convert GLAS-derived height to AGB at landscape-scale (the first step: GLAS-derived height to AGB at footprint-scale; the second step: linking relationships between GLAS-derived AGB and Landsat-derived age; and, the third step: extrapolating GLAS-derived AGB to landscape-scale using the relationships in the second step). The used allometric relations between AGB and GLAS-derive height was derived from mixed forest region. Helmer et al. [
47] used a method similar with Dolan et al. [
46] to estimate AGB of old-growth in Brazil. They also used age as an intermediate variable to convert GLAS-derived heights to AGB at landscape-scale. Be different from the other research, the used allometric relations between AGB and GLAS-derived height was derived from evergreen broadleaf forest region. In summary, to the best of our knowledge, few published studies have explored forest-type-specific AGB estimation integrated spaceborne LiDAR and Landsat data at the regional level. In these studies, both the forest-type-specific prediction models of AGB and the validation of AGB estimates need to be improved. Furthermore, the performance (importance) of remotely-sensed variables used to estimate forest AGB was not evaluated.
Changbai Mountain, in Northeast China, is well known for its contiguous biodiversity-rich, intact terrestrial ecosystems, including Korean pine-broadleaf mixed forest, coniferous forest (evergreen and deciduous), Erman birch forest, and tundra [
48]. Almost all vegetation types from temperate to Arctic areas can be found in the region. However, the AGB of the forest and the carbon storage is still unclear in the Changbai Mountain region. The main aims of the study were to:(1) clarify the characteristics of altitudinal forest zones in the Changbai Mountain region; (2) estimate a continuous AGB map from nonparametric models; and (3) determine which remotely-sensed variables have the largest influence on AGB estimation of four forest types. For these purposes, firstly, models to predict AGB at the GLAS footprint level were developed from GLAS waveform metrics and the AGB derived from field observation. Then, a nonparametric model integrated GLAS-derived AGB with Landsat-derived variables, as well as DEM data, was implemented to estimate AGB in the Changbai Mountain region. Finally, the contribution of remotely-sensed variables in modeling AGB was evaluated in different AGB ranges of four forest types.
2. Materials and Methods
2.1. Study Area
Changbai Mountain (Changbaishan or Changbai Shan in Chinese, Mount Baekdu in North Korean, or Paektu-san in South Korean) lies in the border region of China and North Korea and is the highest mountain in the Eastern Eurasian continent [
48]. The study site covers the Changbai Mountain Natural Reserve of China and its surrounding area. This region is well known for its contiguous biodiversity-rich, intact, terrestrial ecosystems and altitudinal vegetation zones [
48,
49]. The area is characterized by a temperate continental mountain climate with four distinct seasons affected by monsoons. The mean annual temperature is ~3.3°C, and the mean annual precipitation is ~672 mm [
50]. The dominant species are Korean pine (
Pinus koraiensis), Dahurian larch (
Larixg melinii), Manchurian fir(
Abies nephrolepis), Amur linden(
Tilia amurensis), Manchurian ash (
Fraxinus mandshurica), Mongolian oak (
Quercus mongolica), Maple (
Acer mono), and Erman’s birch (
Betula ermanii).
2.2. Field Data
Field survey data spanning four forest types were obtained from two sources: national forest inventory data and field measurements, as listed in
Table 1. A database of forest inventory data was assembled from the National Forest Resource Inventory Program of the State Forestry Administration, China (NFRI). The investigation of NFRI is based on the forest management unit (state-owned forest farms, nature reserves, forest parks, etc.) or county-level administrative areas. The purpose of the investigation is to meet the needs of the forest management program, overall design, forestry division, and planning. The field survey is developed from forest stands. A stand is a contiguous area that contains a community of trees that are relatively homogeneous or have a common set of characteristics (e.g., age and size class distribution, composition, structure, or spatial arrangement) [
51]. The field investigation of a stand includes records of stand area, forest types, dominant species, stand density, canopy cover, average stand height, and DBH, etc. Biomass of a forest stand is the summation of individual tree biomass per unit area. Forest stands were varied from ~0.003 ha to ~90 ha in size. The survey results of NFRI data are shown in the form of forest inventory maps (the maps used in the study are illustrated in
Figure 1, ~117 thousand hectares). Forest inventory maps are usually developed using a “classify and assign” method which provides an average AGB per stand usually based on a classification of optical data and topographic maps. The AGB of a forest stand is the summation of the individual tree AGB per unit area, i.e., stand density multiplied by single-tree AGB. Species-specific allometric equations (see in the next paragraph) were employed to estimate the single-tree AGB. GLAS footprints that were dropped in to forest stands maps were overlaid onto forest stands to derive the AGB of these footprints.
The geographic coordinates of 40 field plots were co-located with GLAS footprints. The other 18 field plots that were not co-located with footprints were intended to obtain as wide a range of AGB as possible. The total of 58 field observations were implemented in September 2006 and June 2007. The field measurement scheme employed a 7.5 m radius central plot located on the center of a GLAS footprint, plus three replicate plots located 20 m away from the center plot at azimuths of 120°, 240°, and 360°. The species, canopy height, and DBH of all trees ≥5 cm were recorded in each plot. Allometric equations of dominant species were used to estimate the single-tree AGB in all plots [
52,
53]. AGB estimation for most species were based on DBH-only allometric equations, except for spruce (
Picea koraiensis), which used DBH-H combined equations. All of the allometric equations were developed in northeastern China. Then, the AGB of all plots could be estimated using the method mentioned above. In addition, LAI was measured on fine days with few clouds during this period. The sample plots were identified as 40 m × 40 m square areas (larger than a TM pixel) to allow for the potential error of the global positioning system. A center point together with four other points, which were located 10 m away from the center point in the north, south, west and south, were measured in each sample plot. Each sample plot was measured 15 times (three times for each point) using an LAI-2000 (LI-Cor Inc., Lincoln, NE, USA), and the average LAI was derived and used LAI for validation.
2.3. GLAS Data and Its Processing
GLAS was an active LiDAR sensor with 1064-nm laser pulses operating at 40 Hz and recorded the echo of those pulses from an elliptical footprint of ~65 m diameter, spaced 172 m apart [
54]. GLAS carried three laser altimeters, named as L1, L2, and L3. L3 worked from 3 October2004 to 19 October 2008 (L3A–L3K, [
55]).GLAS had a total of 18 operational periods during its six-year mission. The data acquired during the L3C and L3F periods were usedin this research.The L3C and L3F datasets were acquired using the L3 laser from 20 May to 23 June 2005, and from 24 May to 26 June 2006, respectively. In Changbai Mountain region, forests were in the leaf-on growing season during the acquiring dates of L3C and L3F.
The ICESat program provides 15 data products (GLA01 to GLA15). Two of them, GLA01 (waveform data) and GLA14 (global land surface altimetry data) of version 33, were employed in the study. GLA01 products provide the raw waveform for each laser. The waveforms were recorded in 544 bins with a bin size of 1 ns or 15 cm for the land surface. GLA14 data provide surface elevation, latitude and longitude of the footprints, acquisition time, laser range offsets for the signal beginning and end, location, amplitude, and width of up to six fitted Gaussian peaks. Using the record index and shot number, the location of each GLAS footprint from GLA14 data was related to the individual waveform extracted from GLA01 data.
Since waveforms are greatly affected by atmospheric scattering and signal saturation, the cloud-contaminated and abnormal waveforms where identified and discarded before extracting waveform variables. Cloud-free waveforms were chosen using the cloud detection flag (FRir_qaFlag = 15 in the GLA14 product). Saturated signals were identified using the GLAS flag (SatNdx> 0). A maximum tree height of 40 m was set for filtering GLAS waveform data because trees higher than 35 m are very rare in the Changbai Mountain region. Since each 65-m diameter footprint is likely to intersect with as many as nine TM pixels, to reduce the impact of potential errors that are caused by geographical position mismatch between the TM image and GLAS data, as well as heterogeneous land cover within nine TM pixels, GLAS footprints are retained when all nine of the TM pixels are delineated as a 100% homogeneous forest cover (i.e., only GLAS footprints fell within a 100% homogeneous forest cover) in the land cover map derived from Landsat data. After screening, there were 16,114 GLAS shots left for extracting waveform variables.
Identifying the position of the signal beginning and the signal end was crucial for extracting waveform variables from waveforms. We, first, estimated the noise levels before the signal beginning and after the signal end from the raw waveform separately using a method based on the histogram of the waveform bins [
38]. Then, waveforms were smoothed using a Gaussian filter with a width similar to the transmitted laser pulse, and the signal beginning and end were identified using a noise threshold [
38]. The signal beginning was identified by searching downwards from the highest bin of the waveform until the location where the return signal was larger than three standard deviations of the estimated noise level. Searching backward from the signal ending, if the first peak was more significant than the others, i.e., the distance from signal end to the peak was notably larger than the half width of the transmitted laser pulse, this peak was considered as the ground peak. The total waveform energy was calculated by summing all of the return energy from the signal beginning to end. Starting from the signal end, the positions of 25% (h25), 50% (h50), and 75% (h75) of the accumulated return energy were located [
38]. In addition, the heights at which the waveform energy reached 10% (h10), 20% (h20), …, 90% (h90) of the total energy were also calculated. Finally, the GLAS data (referenced to TOPEX/Poseidon) were converted to WGS84. We used method described by Sun et al. [
38] to develop AGB estimates because the method was appropriate for AGB estimation at GLAS footprint-level. The GLAS waveform variables, along with other GLAS metrics from waveforms that captured the characteristics of the canopy structure used in the study, are given in detail in
Table 2. Furthermore, as the terrain is complex in the Changbai Mountain region, the waveform variables usually used in flat areas were not adequate to obtain AGB estimates with high accuracy. Several variables (e.g., leading edge extent, trailing edge extent [
56], and h
slope terrain index [
8]) that were sensitive to topography were used for estimating the AGB in GLAS footprints.
2.4. Landsat Data and Its Processing
Due to frequent cloud cover, we compiled a mosaic of two cloudless Landsat-5 TM images (path 116/row 30 and path 116/row 31) for two different years (23 September 2004 and 2 October 2007) from United States Geological Survey [
58] during the peak growing season in Changbai Mountain. Landsat ETM+ data for the time of acquisition lacked high quality because the scan line corrector (SLC) failed on 31 May 2003. All four of the images were calibrated to top-of-atmosphere reflectance using calibration coefficients (e.g., gains and offsets) provided by USGS. Then, a MOTRAN4-based algorithm, which was embedded in the ENVI/FLAASH module (ITT Visual Information Solutions, 2009), was applied to the atmospheric correction for Landsat images in order to obtain the surface reflectance. Some input parameters were set, such as the scene center location, sensor type and altitude, date and time of the acquired image, ground weather conditions observed on the day, etc. All of four Landsat images were co-registered using ground control points to attain improved geometric accuracy, while topographic correction was implemented for all images using a C-correction method, which was demonstrated to be appropriate for Landsat images [
59].
We used two temporal Landsat images and selected the image acquired in 2007 as the base image because: (1) most deciduous forests would be in leaf-off season during the end of September in the Changbai Mountain region [
60], so deciduous forests and evergreen forests could be delineated using only two-temporal images; (2) when the TM image acquired in 2007 was referred to as base image, inter-annual variation in acquisition time among field data, GLAS data, and Landsat images could be neglected; and (3) cloud coverage of the TM image acquired on 2 October 2007 was less than 5%, while cloud coverage of another TM image was about 15%.
2.5. Land Cover Classification
Eight land cover classes were defined based on the characteristics of the regional vegetation. The land cover sub-classes that were not related to forest AGB were merged. Since training data was required for supervised classification, field data combined with local expert knowledge were used to generate a calibration/validation dataset. About 70% of these data served as a calibration dataset, while the remaining 30% were used as a validation dataset. The maximum likelihood classification (MLC) for supervised classification was employed in land cover classification because sufficient reference data to be used as training samples were available and the spectral signatures of land cover classes were significant. The identification of at least 10 polygons totaling 1000–2000 pixels in each land cover class were used for training. The spectral signatures generated from the training samples were then used to train the classifier to classify the spectral data into a land cover map. Approximately 30% of field data were used as test samples to assess the classification accuracy. As the Changbai Mountain region is characterized by its contiguous biodiversity-rich altitudinal vegetation zones, in order to identify the characteristics and evaluate the classification result, we set two profiles in the direction of north–south and west–east, from one side of Changbai Mountain Tianchi Lake at an elevation of ~700 m, to the opposite side of Tianchi Lake at a similar elevation.
2.6. Estimating LAI from Landsat Data
LAI is an important vegetation structural parameter for quantitative analysis of many physical and biological processes related to vegetation dynamic change [
61]. We estimated LAI from the canopy spectral reflectance for the forested pixels using a radiative transfer model based on the inversion approach. Canopy spectral reflectance was simulated using the PROSAIL model [
62] in conditions of various LAI, and zenith and azimuth angles. The spectral reflectance simulated accorded with six Landsat-5 TM bands’ spectral response functions. The next step was inversing the LAI from the lookup tables (
Table 3). Based on land cover maps, the corresponding lookup tables were selected according to the class of forested pixels, and the reflectance of each band was compared using the minimum distance method to obtain one LAI value for each angle image.
2.7. Estimating Canopy Closure in Forested Areas from Landsat Data
Canopy closure is an ecological indicator that is commonly used in characterizing the canopy structure because of its dependence of canopy height [
63]. It has often been measured using hemispherical canopy images or optical instruments with an angular field of view (FOV) in the field [
64]. In order to obtain optimal measured results, field measurements must be taken under uniformly overcast skies to ensure evenly diffuse light conditions [
65]. These restrictions on field data collection make it difficult to collect large numbers of observations, particularly in remote areas. Therefore, it is necessary to link remotely-sensed data with field observations. Airborne LiDAR sensors are increasingly used for estimation of canopy closure [
57,
64,
66]. However, the associated large data volume and high acquisition costs to observe remote areas limit their application at regional scales. In this study, an alternative method using Landsat data was used to extract canopy closure. At first, forest canopy cover of forested areas was estimated from the NDVI (normalized difference vegetation index) with the help of commonly used spectral mixture analysis (SMA). Then, the ratios between canopy cover and canopy closure (RCC) of the main species [
67] were employed to covert canopy cover to canopy closure.
The NDVI-SMA method for canopy cover (f
c) estimation is assumed that each pixel signal of the satellite data results from a mixture of two endmembers: bare soil and vegetation. NDVI of a given pixel is then taken as a linear combination of NDVI values of bare soil and vegetation according to their proportion. Vegetation in the above assumption was referred to forest in this study, i.e., NDVI of a given TM pixel was estimated from a linear mixture of NDVI values of pure pixels with bare soil (NDVI
soil) and forest (NDVI
forest):
which can be rewritten as:
The NDVI value of the pixel has no subscript. Since NDVI depends on soil and vegetation types as well as soil moisture and chlorophyll content [
68,
69,
70], the NDVI value of bare soil and forest is site-specific. Therefore, we randomly selected 100 bare soil pixels from the study site and took the averaged NDVI values of these pixels as NDVI
soil. The NDVI
forest values represented the maximum value of fully-forested pixels. Due to the effect of tree species on NDVI, NDVI
forest of four main forest types were calculated, respectively. For each forest type, thirty 100% forest covered pixels were identified and the corresponding NDVI values were calculated and averaged as the NDVI
forest. Then, f
c of each forested pixel was estimated from Equation (2) in terms of forest types. Specifically, RCC of fir/spruce (16 plots), larch (eight plots) and beech (eight plots) in their study were averaged and used for estimating the canopy closure of evergreen coniferous forest, deciduous coniferous forest, and deciduous broadleaved forest in our research. Due to a lack of a tree species stand for mixed forest, we averaged the RCC of larch and beech to estimate the canopy closure of mixed forest because mixed forest in the Changbai Mountain region were mainly composed of Korean pine and broadleaved forest [
48].
2.8. Relating Ground-Based AGB to GLAS Waveform Variables
All of the GLAS footprints in the Changbai Mountain region were filtered, as stated above (see
Section 2.3), and masked to remove records that were not in the forest areas according to the TM land cover map. A total of about 2600 GLAS footprints were overlaid onto NFRI AGB to extract observed AGB within the footprints so as to link AGB estimates from field data to GLAS waveform metrics. About 70% of LiDAR footprints were randomly selected for training, and the remaining 30% were reserved for verification. Due to the effect of forest type on GLAS waveforms, models of AGB estimation were developed separately for different forest types. These AGB prediction models were used for all GLAS footprints that were not co-located with field data.
Stepwise regression was performed to predict AGB in GLAS footprints using S-PLUS (Insightful Corp. Seattle, 2007). Compared with other machine learning methods (such as decision tree or neural networks), the regression method was not limited to the range of data, and showed the ability to yield reasonable results, similar to complex methods [
4,
57]. Correlation analyses were conducted between every two explanatory variables. When any two explanatory variables show high correlation, the two variables were not imported into the regression models at the same time. To infer GLAS metrics that had high correlation with AGB, and to discard variables that had less effect on AGB estimation,
t-tests and
F-tests were conducted to identify the predictive waveform variables and the overall correlation of regression models. Neter and Wasserman [
71] described how to use the two methods to make comparisons of regression parameters and test whether any two regression lines were identical. The coefficient of determination (R
2) is widely used to evaluate the goodness of fit for regression models. As the number of explanatory variables increases, the R
2 values also increase. However, the additional explanatory variables may not be significant, and do not substantially improve R
2. Therefore, R
2 is not appropriate for a comparison between models having different numbers of explanatory variables [
72]. Adjusted R-square (R
2-a) was adopted for these comparisons. The root mean square error (RMSE) was used to assess the predicted AGB versus the AGB estimated from field observations at the footprint level.
In addition to some waveform variables that remain sensitive to topography being used, a stratified regression strategy was adopted to improve AGB estimation in GLAS footprints. Previous studies have shown a promising prospect for the strategy. Nelson et al. [
40] used GLAS footprints falling on slopes ≤10° and all GLAS footprints regardless of slope to predict timber volume separately in Siberia. Chi et al. [
8] used GLAS footprints on slopes less than 20° and slopes ≥20° to develop the models of AGB estimates, respectively. They compared the estimated AGB with field measurements and concluded that the stratified regression models were better than the models based on GLAS footprints, regardless of slope. In the Changbai Mountain region, slope distribution of GLAS forest footprints shows that 82.5% of the footprints occur on slopes less than 10°. Therefore, GLAS footprints on GDEM (Version 2 ASTER Global Digital Elevation Model) slopes <10° and ≥10° were considered separately in this study.
GLAS footprints co-located with forest stand data that were not used to train GLAS-derived biomass models were used to validate. As, perhaps one or more footprints were dropped into one forest stand, footprints were sorted according to their biomass values. If several footprints shared the same biomass value, they were randomly chosen. We kept at least one sample from every 10 Mg·ha−1 in the range from 20 Mg·ha−1 to 220 Mg·ha−1 for two slope groups with four forest types.
2.9. Extrapolating GLAS-Derived AGB Estimates to A Continuous AGB Map and Accuracy Assessment
Since AGB is closely related to vegetation density and greenness [
11], vegetation indices that maintain sensitivity in middle to high AGB were derived from Landsat data. These vegetation indices were developed from both empirical methods, such as NDVI, DVI (Difference Vegetation Index), SR (Simple Ratio), PVI (Perpendicular Vegetation Index), and mathematical models, e.g., SAVI (Soil Adjusted Vegetation Index), EVI (Enhanced Vegetation Index), and TVI (Triangular Vegetation Index). All of the TM-derived variables used to develop models for generating the AGB map are listed in
Table 4 [
73,
74,
75,
76,
77,
78]. Due to difference of pixel size between a GLAS footprint and a TM pixel, all TM pixels whose center points were within 65 m of a GLAS footprint were used to derive the mean values of LAI, canopy closure, ASTER GDEM2 data, and vegetation indices for each footprint. Models linking AGB estimates from GLAS footprints with these variables were developed to estimate the AGB of the Changbai Mountain area using Random Forest (RF) regressions.
The predicted AGB was compared with forest stand data in accuracy assessment. The forest stands that were not co-located in GLAS footprints were randomly selected for validation. Since forest stands had irregular boundaries, the forest stands that covered at least two map pixels were selected in order to reduce the contingency in the comparison. Predicted AGB values of all pixels within one forest stand were averaged to compare with the observed AGB value of the forest stand.
In recent studies, the regression tree method Random Forest (RF), a non-parametric machine-learning algorithm, has been successfully used in AGB estimation and proved to better describe the spatial variability in AGB from the regional to the national scale [
4,
8,
79,
80]. Random Forest is an algorithm based on ensemble techniques for classification and regression analysis [
81]. The main principle in RF is the combination of various decision trees (the default value is 500 trees) using several bootstrap samples and selecting a subset of explanatory variables at every node. About two thirds of the original observations (referred to as in-bag samples) within a bootstrap sample are used to train the trees, with the remaining one third observations (referred to as out-of-the bag samples, OOB) are used in an internal cross-validation technique for assessing how well the resulting RF model performs [
81]. Each decision tree is independently produced without any pruning and a random subset of the predictor variables (
mtry) is used to find the best split [
80]. The best split is defined by identifying the predictor variable and the split point that results in the largest reduction in the residual sum of squares between the sample observations and the node mean. When all trees are grown to the maximum extent, the algorithm creates trees that have high variance and low bias, where the final predictions are derived by averaging the predictions of the individual trees [
81]. RF generates an internal unbiased estimate of the generalization error as the forest building progresses and overcome the overfitting habit of decision tree algorithms [
81]. One of the main advantages of RF is that it does not require an assumption to be made regarding the normality of covariables, which is often violated when complex ecological systems and environmental variables are introduced [
4].
We used the R software [
82] with the “randomForest” [
83] package for our analysis. In this study, 1000 “RF trees” (
ntree) were included and four variables (
mtry) were tried at each split. The parameter
nodesize was set to the default value of 1. In order to evaluate the contribution of remote sensing variables in modeling the AGB, we explored the variable importance of all prediction metrics using OOB samples and assessed the optimal variables for forest AGB mapping. The variable importance measured in this study took into account the difference in mean squared error between a dataset obtained through random permutations of the values of the different variables and the original OOB predicted dataset [
81]. A larger OOB error rate indicates greater importance of the variable. All of these remote sensing variables were grouped and reported in four AGB ranges (<50 Mg·ha
−1, 50–100 Mg·ha
−1, 100–150 Mg·ha
−1, and ≥150 Mg·ha
−1).We grouped 10 variables in four types of observations or features that related to vegetation greenness (seven vegetation index metrics), leaf area index (LAI), canopy construction (canopy closure), and surface elevation (DEM).
4. Discussion
A method to predict AGB in the Changbai Mountain area was developed by combining large footprint LiDAR data, field measurements, and Landsat TM data. GLAS footprint data, as a method of sampling, was important for linking field plot data to TM data because they could extrapolate the AGB of limited field plots to a large area. The LiDAR waveform records the interaction of a laser beam with trees in a footprint, starting from the crown. The crown shape of a coniferous tree is similar to a cone. Therefore, to estimate the AGB of the coniferous forest within a GLAS footprint, some waveform variables reflecting the characteristics of the canopy vertical structure were used in the regression model, such as the mean canopy height or h50. The structure of a broadleaved tree or a mixed forest stand is more complex, and cannot be described by one type of tree crown. More variables reflecting the overall structure of canopies were used in the models. We found quartile and decile heights for waveform energy reaching 75% (h75) and 100% (h100) were positively correlated with AGB estimates for deciduous coniferous forest and deciduous broadleaved forest. Other researchers have also found that both h50 and h75 were good indicators of forest vertical structure [
38,
84]. For example, Lefsky et al. [
84] found that the mean canopy height was highly related to the AGB of Douglas-fir and western hemlock in Oregon, OR, USA. We also discovered that h50 (similar to mean canopy height) were useful for AGB estimation of coniferous in the Changbai Mountain areas, which is located at a latitude similar with Oregon. Based on our results and those of researchers listed above, we suggest that GLAS variables closely related to mean canopy height or total waveform length, quartile and decile heights (h75 and h100) be included in the list of candidate predictors in GLAS variable selection performance aimed at estimating AGB. Since the GLAS-based models tend to overestimate GLAS footprint AGB with increasing slope [
38], some researchers used stratified regression models to analyze the impact of slopes on AGB estimates [
8,
40]. The strategy was adopted in this study by combining several terrain index variables that were sensitive to topography. We also suggest consideration be given to the leading edge extent, trailing edge extent, and h
slope. The results have proved the stratified strategy with R
2-a values of all regression models were greater than 0.76 and R
2 values in AGB model validation were higher than 0.79.
We used forest inventory stands data for calibration and assessment AGB models based on the assumption that this dataset were reliable. However, some issues should be paid more attention. First, due to lack of sufficient field measurements, forest inventory stands maps were adopted. We should be very careful when using the AGB from forest inventory stands maps because they were estimated from optical and topographic extrapolation models, and if used for calibration of RF models the results might end up being artificially good. Although forest stands maps were spatially limited to a small area, they could provide sufficient AGB samples that were co-located with GLAS footprints. Second, to use more representative samples for training models, all GLAS-derived AGB were used for calibration models. The forest inventory stands data that were not used for calibration were used for validation of the AGB map. This was also based on the assumption that forest inventory stands data were reliable, however, the validation just evaluated at the aggregated units (the forest stands used in validation were varied from ~0.3 ha to ~3 ha in size) which were spatially limited to small areas. In future research, we plan to use two methods to improve the processing of validation process: (1) the use of GLAS-derived AGB for validation, which have similar spatial resolution with TM data and represent a relative even sampling spatial distribution; and (2) the use field measurements matched with pixel size of AGB map as much as possible.
When AGB was estimated from the GLAS footprints, the GLAS-derived AGB can serve as a calibrated inventory dataset for the whole region, overcoming the dependency on a large amount of forest inventory data. Extrapolating a large number of GLAS-derived AGB to a continuous map required widespread surface coverage. MODIS images had been used for extending the forest AGB from plots to sub-continental scales. However, its coarser spatial resolution and mismatch with field observations in spatial resolutions limited its application in local regions. Landsat TM data with 30-m spatial resolution matched well with most field observations, and its rich spectral information was sensitive to forest cover and forest stand structure. Similar to MODIS images, TM data was prone to saturation in high-AGB forest. Fortunately, good results in GLAS-derived AGB models could offer reliable AGB samples with the greatest ranges in spatial distribution for developing AGB models.
Forest canopy closure, also called angular canopy cover (ACC), is the proportion of covered sky at some angular range around the zenith, and can be measured with a field of view instrument [
64]. As it is dependent of canopy height, it will become a useful indicator that is used in estimating AGB (see the discussion in the next paragraph). Due to the requirement of proper weather and a strong tendency in the sampling error, in situ methods become time-consuming and labor-intensive for large areas. As canopy cover (CC) was prone to estimations from optical remote sensing data and the relationship between canopy cover and canopy closure had been assessed in [
67], canopy closure could also be estimated from optical remote sensing data. The relationship between CC and ACC was assessed by four techniques of field observation. We chose the relationship derived from the hemispherical photographs technique, which is one of the most popular methods still in use [
64]. The relationship between CC and ACC is a ratio (RCC) less than 1. Moreover, CC and ACC are both less than 100% according to their definitions. As a result, CC divided by RCC could obtain ACC, and its value may be more than 100%. The potential primary reason was the selection of the FOV used in the field estimation of ACC. Paletto and Tosi [
67] estimated ACC using a spherical densitometer and fisheye images with a FOV of 60°. Frazer et al. [
85] extracted the canopy structure using fisheye photographs with a FOV of 150°. Parent and Volin [
66] obtained digital hemispherical photographs using a maximum FOV of 120° for closed forest sites and a maximum angle of 150° for open sites. Jennings thought it was best to use an angle of view of 180° to estimate ACC by its definition. Obviously, different options of FOV caused biased ACC observations. In addition, tree species in the research of Paletto and Tosi was not completely consistent with species in the Changbai Mountain region. It was inevitable to bring errors in ACC estimation from the use of the RCC derived from their study. In a future study, we will focus on the development of accurate method for the field observation of canopy closure in the local area.
The importance of all prediction metrics (e.g., TM-derived variables and DEM) was evaluated. These importance cannot be explained in terms of physical relationships between the remote sensing variables and the AGB; they simply represent the relative significance of the remote sensing variables and their spatial characteristics to stratify forest types in the estimation of AGB. Despite the lack of explanation from the perspective of the physical mechanism, we attempted to illustrate the relationship between prediction variables and AGB from the perspective of vegetation and ecological remote sensing. Although canopy closure was estimated from canopy cover in our study, it is actually a variable dependent on canopy height, which is closely correlated to AGB. As we can see in
Figure 8, A is the canopy gap fraction (CGF) that is used to refer to the proportion of canopy gaps in an angular field of view [
64]. Canopy closure equals 1 − CGF. For different canopy heights, canopy cover (B in
Figure 8) can be a constant, but CGF is not. Therefore, canopy closure contains information on the canopy height that can be used in AGB estimation. As the crown shape of a coniferous tree is regular and the interval between adjacent trees is relatively constant, the vertical structure of coniferous forest stands is more significant than broadleaved forest stands. This can explain why canopy closure is more important in AGB models of coniferous forests than the importance in AGB models of broadleaved forests. Most forests in the AGB range ≥150 Mg·ha
−1 were mature forest or over-mature forest, and spectral reflectance was becoming saturated in middle-to-high AGB forest, so the importance of canopy closure in AGB prediction was highlighted. LAI is a key biological and physical parameter in most ecosystem because: (1) it defines the area that interacts with solar radiation and provides much of the remote sensing signal; and (2) it is the surface that is responsible for carbon absorption and exchange within the atmosphere [
86]. As an important indicator of foliage AGB, LAI showed relatively low importance in models of AGB estimation in the study. It was consistent with the knowledge that foliage AGB accounts for a small proportion of AGB in a forest stand. Since vegetation spectral reflectance contains information on the vegetation chlorophyll absorption bands in the visible region (especially in the blue band and the red band), and sustained high reflectance in the near-infrared and shortwave infrared regions, a few vegetation indices were created using the inverse relationship between red (or blue, green) and near-infrared reflectance associated with healthy green vegetation. Through normalizing or modeling external effects (e.g., the sun angle, viewing angle, and atmosphere condition) and internal effects (e.g., topography variations and soil variations), these vegetation indices (VIs) maximize sensitivity to vegetation biophysical parameters, such as tree abundance, basal area, stem density, and crown size, which are correlated to some degree with AGB. As can be seen in
Figure 7, for four forest types, VIs showed overwhelming importance in an AGB range less than 100 Mg·ha
−1. In addition to the reason mentioned above, TM data were closer to vegetation density than to its vertical structure, which leads to their saturation, starting from 100 to 150 Mg·ha
−1 [
16]. DEM data contributed less than other variables in modeling AGB, with relatively higher contribution in AGB estimation of deciduous coniferous forest due to its specific distribution ranges and extensive topographic features in the ranges.
Profile analysis was used to indicate continuous forest-richness in altitudinal vegetation zones. Good agreement between our findings and a previous study [
48] implied that the classification accuracy of the land cover map was acceptable. As the Changbai Mountain region lies across two countries (China and North Korea) and there is a lack of forest inventory data in North Korea, AGB estimates were only validated in the Chinese region. As much as possible, forest stands with the greatest ranges in spatial and quantity distribution of AGB were selected. AGB with large R
2 and relatively lower RMSE showed good agreement between AGB estimates and forest inventory data. The forest AGB map would be useful for depicting and quantifying the distribution of forest AGB over the Changbai Mountain region.
5. Conclusions
We used large footprint LiDAR data, Landsat TM data, and field measurements to generate an accurate forest AGB map of the Changbai Mountain region. Four main forest types were exacted from TM data and identified in altitudinal vegetation zones by profile analysis. The characteristics of altitudinal vegetation zones were significant on the north slope, south slope, and west slope of Changbai Mountain. On the north slope, mixed forest, coniferous forest, and deciduous broadleaved forest were dominant at three altitudinal zones (from ~700 m a.s.l. to ~1150 m a.s.l., from ~1150 m a.s.l. to ~1800 m a.s.l., and above ~1800 m a.s.l.), respectively. Mixed forest is dominant on the west slope at an elevation ranging from ~800 m to ~1650 m. On the east slope, mixed forest and deciduous coniferous forest are dominant at an elevation ranging from ~1000 m to ~1900 m. A stratified regression strategy was adopted to develop regression models linking field AGB estimates and GLAS metrics according to forest types. The R2-a values of AGB models estimated from GLAS waveform data are greater than 0.76. TM-derived vegetation indices, LAI, canopy closure and DEM data were modeled to the extrapolated GLAS-derived AGB of the Changbai Mountain region by use of the RF method. An R2 of 0.72 showed the overall high consistency between the estimated AGB and the forest inventory data. The contribution of remote sensing variables in modeling AGB was evaluated in four AGB ranges (<50 Mg·ha−1, 50-100 Mg·ha−1, 100-150 Mg·ha−1, and ≥150 Mg·ha−1). The canopy closure was the most important variable when AGB ≥150 Mg·ha−1, while vegetation index metrics accounted for large amount of contribution when AGB <150 Mg·ha−1. Canopy closure acquired from field observations should include all sizes of gaps in the FOV. DEM data were the least important of all variables in modeling AGB.
In general, we combined GLAS and TM data to estimate forest AGB in the Changbai Mountain region. GLAS footprints, as a method of sampling, can be extrapolated to generate a continuous AGB map of Changbai Mountain at a spatial resolution of 30 m. Although we attempted to explain the importance of remotely-sensed variables from the viewpoints of vegetation and ecological remote sensing, there is need to further quantify the relationship between these variables and AGB under different conditions, from the perspective of the physical mechanism. The relatively fine-scaled forest AGB map provides spatially-explicit AGB information for forest carbon cycle studies and forest resource management in the East of Eurasia.