Article

Exploring Transfer Learning for Anthropogenic Geomorphic Feature Extraction from Land Surface Parameters Using UNet

Department of Geology and Geography, West Virginia University, 98 Beechurst Avenue Brooks Hall, Morgantown, WV 26506, USA
* Author to whom correspondence should be addressed.
Remote Sens. 2024, 16(24), 4670; https://doi.org/10.3390/rs16244670
Submission received: 30 September 2024 / Revised: 7 December 2024 / Accepted: 12 December 2024 / Published: 14 December 2024
(This article belongs to the Section Remote Sensing in Geology, Geomorphology and Hydrology)
Figure 1. (a) Location of 10 × 10 km USGS 3DEP tiles used in this study for which geomorphons were calculated; (b) training, validation, and testing extents for the agricultural terrace (terraceDL) dataset [21] in Iowa, USA; (c) training, validation, and testing extents for the surface coal mining valley fill face (vfillDL) dataset [22] in West Virginia, Kentucky, and Virginia, USA.

Figure 2. Example agricultural terraces in Iowa, USA (a) and valley fill faces (c) in the Appalachian southern coalfields of the eastern United States. Red areas in (a,c) show the extents of terraces and valley fill faces, respectively, over a multidirectional hillshade. The LSPs used in this study are visualized in (b,d) (red = TPI calculated with a 50 m circular window; green = square root of slope; blue = TPI calculated with a 2 m inner and 5 m outer annulus window). Coordinates are relative to the NAD83 UTM Zone 15N projection for (a,b) and the NAD83 UTM Zone 17N projection for (c,d).

Figure 3. Example land surface parameter (LSP) composite image chips (a) and associated geomorphon classifications (b). Chips were selected from random locations within the extent of the downloaded 3DEP DTM data. Each chip consists of 512 × 512 cells with a spatial resolution of 2 m.

Figure 4. Conceptualization of the UNet architecture [19] with the ResNet-34 [20] encoder backbone used in this study. E = encoder; D = decoder; CH = classification head; LSPs = land surface parameters; Conv = convolutional layer; BN = batch normalization; ReLU = rectified linear unit.

Figure 5. Example classification results using a random parameter initialization and 1000 training chips. (a) Multidirectional hillshade for example agricultural terrace classification; (b) reference agricultural terrace data; (c) agricultural terrace classification result; (d) multidirectional hillshade for example valley fill face classification; (e) reference valley fill face data; (f) valley fill face classification result; (g) multidirectional hillshade for example geomorphon classification; (h) reference geomorphon data; (i) geomorphon classification result.

Figure 6. Training loss for the terraceDL (a) and vfillDL (b) datasets using 1000 training samples, different weight initializations, and with the encoder frozen or unfrozen across all 50 training epochs. The magnified area shows results for epochs 40 through 50.

Figure 7. Validation F1-score for the terraceDL (a) and vfillDL (b) datasets using 1000 training samples, different weight initializations, and with the encoder frozen or unfrozen across all 50 training epochs. The magnified area shows results for epochs 40 through 50.

Figure 8. Training loss for the terraceDL (a) and vfillDL (b) datasets using varying training sample sizes, different weight initializations, and with the encoder frozen or unfrozen. The magnified area shows results for epochs 40 through 50.

Figure 9. Validation F1-score for the terraceDL (a) and vfillDL (b) datasets using varying training sample sizes, different weight initializations, and with the encoder frozen or unfrozen. The magnified area shows results for epochs 40 through 50.

Figure 10. Assessment metrics calculated from the withheld test data for the terraceDL (top) and vfillDL (bottom) datasets using different weight initializations and with the encoder frozen and unfrozen. Results reflect the experiment using 1000 training chips and the model parameters associated with the training epoch that provided the highest F1-score for the validation data.

Figure 11. Assessment metrics for the withheld test data for the terraceDL dataset using different training sample sizes, weight initializations, and with the encoder frozen and unfrozen.

Figure 12. Assessment metrics for the withheld test data for the vfillDL dataset using different training sample sizes, weight initializations, and with the encoder frozen and unfrozen.

Figure 13. CKA analysis results for each convolutional layer in the architecture. Each graph represents a comparison of a pair of models. Each compared model was trained from a random initialization using the largest training set available for the specific task. Since ImageNet weights are not available for the decoder, the decoder blocks were not compared when ImageNet was included in the pair.

Abstract

Semantic segmentation algorithms, such as UNet, that rely on convolutional neural network (CNN)-based architectures have shown promise for anthropogenic geomorphic feature extraction when using land surface parameters (LSPs) derived from digital terrain models (DTMs) as input predictor variables, due to their ability to capture local textures and spatial context. However, the operationalization of these supervised classification methods is limited by a lack of large volumes of quality training data. This study explores the use of transfer learning, in which information learned from a different, and often much larger, dataset is used to potentially reduce the need for a large, problem-specific training dataset. Two anthropogenic geomorphic feature extraction problems are explored: the extraction of agricultural terraces and the mapping of surface coal mine reclamation-related valley fill faces. Light detection and ranging (lidar)-derived DTMs were used to generate LSPs. We developed custom transfer parameters by attempting to predict geomorphon-based landforms using a large dataset of digital terrain data provided by the United States Geological Survey’s 3D Elevation Program (3DEP). We also explored the use of pre-trained ImageNet parameters and initializing models using parameters learned from the other mapping task investigated. The geomorphon-based transfer learning resulted in the poorest performance, while the ImageNet-based parameters generally improved performance in comparison to a random parameter initialization, even when the encoder was frozen or not trained. Transfer learning between the two geomorphic datasets offered minimal benefits. We suggest that pre-trained models developed using large, image-based datasets may be of value for anthropogenic geomorphic feature extraction from LSPs even given the data and task disparities. More specifically, ImageNet-based parameters should be considered as an initialization state for the encoder component of semantic segmentation architectures applied to anthropogenic geomorphic feature extraction, even when using non-RGB image-based predictor variables, such as LSPs. The value of transfer learning between the different geomorphic mapping tasks may have been limited by the smaller sample sizes, which highlights the need for continued research into unsupervised and semi-supervised learning methods, especially given the large volume of digital terrain data available despite the lack of associated labels.

1. Introduction

Convolutional neural network (CNN)-based deep learning (DL) has shown great promise for performing pixel-level classification (i.e., semantic segmentation, or assigning each pixel in a spatial extent to a thematic category) and feature extraction (i.e., identifying pixels occurring within features of interest and differentiating them from the landscape background). The performance gains offered by CNN-based methods in comparison to pixel-based classification using traditional machine learning (ML) are generally attributed to the ability of CNN-based architectures to capture spatial context information and patterns at multiple scales via the learning of parameters associated with moving windows or kernels. Via an iterative supervised training process, using backpropagation of errors with respect to model parameters and guided by a loss metric, the initial weights and biases associated with these kernels are incrementally adjusted for the task at hand [1,2,3].
A limitation of applying DL methods to pixel-level classification is the need for pixel-level labels, since model parameters are updated via a supervised learning process. Given the large number of trainable parameters within a CNN-based semantic segmentation architecture, which includes weights and biases associated with moving windows and, optionally, the scale and shift parameters associated with batch normalization, these models can be prone to overfitting to the training data or failing to generalize to new data or mapping extents, especially when only a small training set is available [4,5,6,7]. Such issues are exacerbated when models are applied to real landscapes since classes of interest often have varying landscape proportions, resulting in a class imbalance problem and/or an inability to collect a large number of training samples for rare classes [8,9,10,11,12,13,14]. For feature extraction tasks, the feature of interest is often not abundant on the landscape, which can further complicate the collection of a large number of training samples [12,13].
Relating to the issue of limited sample size, the goal of this study is to explore the use of transfer learning as a means to potentially improve anthropogenic geomorphic feature extraction using land surface parameters (LSPs), also referred to as digital terrain variables [15,16], as predictor variables. Some prior studies have noted the value of transfer learning for mapping geomorphic features, such as Wilhelm et al. [17] and Zhang et al. [18]. Wilhelm et al. [17] explored the mapping of general landform features on Mars using black-and-white images from the Mars Reconnaissance Orbiter (MRO) Context Camera (CTX), while Zhang et al. [18] used multispectral data from the WorldView-2 satellite and drone-based imagery. Thus, there has been limited work relating to using transfer learning in the context of mapping anthropogenic geomorphic features and when LSPs are used as input predictor variables as opposed to imagery or multispectral data.
We explore and compare model performance for a UNet architecture [19] with a ResNet-34 encoder [20] backbone for two anthropogenic geomorphic feature extraction problems: the mapping of agricultural terraces using the terraceDL dataset [21] in the state of Iowa, USA, and the mapping of valley fill faces resulting from surface coal mining reclamation in the states of West Virginia, Kentucky, and Virginia, USA, using the vfillDL dataset [22]. Comparisons are made between models using random parameter initializations, pre-trained ImageNet [23] parameters (i.e., transfer learning based on true color imagery), parameters learned from attempting to replicate a geomorphon-based [24,25] landform classification (i.e., transfer learning using the same input feature space but a different mapping problem), and parameters learned from the other feature extraction task (i.e., using the vfillDL parameters to initialize the terraceDL model encoder and the terraceDL parameters to initialize the vfillDL model encoder). For all models other than the random initialization, results are compared with the encoder “frozen” (i.e., not trainable) and “unfrozen” (i.e., trainable). We also perform comparisons across varying training set sizes (50, 100, 250, 500, 750, and 1000 512 × 512 pixel image chips).

2. Background

2.1. Light Detection and Ranging (Lidar) and Land Surface Parameters (LSPs)

Light detection and ranging (lidar) data have revolutionized landform and surficial geologic mapping [26,27], geohazard evaluation and modeling [28,29], and archeological research [30,31,32,33]. These data have also become more readily available; for example, the 3D Elevation Program (3DEP) in the United States is making lidar-derived point clouds and elevation products freely available for the majority of the country [34]. Combining these data with DL-based classification methods provides a unique opportunity to develop new mapping and modeling techniques, and we argue that such data and techniques are especially applicable to mapping anthropogenic geomorphic features.
From lidar-derived DTMs, a variety of land surface parameters (LSPs) can be derived that characterize varying aspects of the terrain surface (e.g., steepness, orientation, roughness, relative topographic position, and curvature) and correlate with important factors for mapping and modeling (e.g., moisture content, incoming solar radiation, and erosion potential) [15,16]. This study is part of a larger and continuing research project focused on the use of CNN-based DL and LSPs for geomorphic mapping and feature extraction. In our first study in the series [35], we explored the use of DL-based instance segmentation using the mask region-based convolutional neural network (mask R-CNN) architecture [36] for mapping valley fill faces resulting from surface coal mine reclamation. The same dataset, vfillDL [22], is used in this study. In the second study in the series [37], we explored the impact of using different terrain representations, or LSPs, as an input feature space or set of predictor variables for four different feature extraction or geomorphic mapping tasks. This study resulted in a suggestion for a feature space to represent terrain surfaces as an input to DL models. A similar terrain representation is used in this study. Furthering our prior work, here we begin to explore the impact of training sample size and means to potentially improve model performance when training sample size is limited, a common problem when undertaking both research and applied mapping and modeling tasks.

2.2. CNNs for Geomorphic Mapping

Viewed within the larger context and development of pixel-level classification methods applied to geospatial and Earth observation data, CNN-based pixel-level classification methods, first proposed in 2014–2015 via fully convolutional neural network architectures [38], offer key advancements over prior techniques. Since the training process allows for learning the parameters associated with moving windows, the user does not need to manually define textural features, such as those after Haralick [39]. This is of value, as it is not generally known which subset of the large number of possible hand-crafted features will be most useful for the current mapping problem. Further, the processes of capturing spatial context information and performing pixel-level classification are conducted simultaneously, in contrast to geographic object-based image analysis (GEOBIA) methods [40,41,42,43].
We argue that CNN-based methods are especially applicable to anthropogenic geomorphic feature extraction from LSPs derived from high spatial resolution DTMs since spatial patterns or textures are the key information content that differentiates the feature(s) of interest from the landscape background or different features from each other. This is in contrast to the classification of land cover from true color or multispectral imagery, where features or classes of interest are differentiated using both textural patterns and spectral signatures [35,37,44,45,46]. Prior to the development and wide adoption of CNN-based semantic segmentation methods, GEOBIA was explored as a means to capture spatial context information and perform geomorphic mapping tasks (e.g., [45,47,48,49,50]); for example, Drăguţ and Blaschke (2006) proposed a GEOBIA-based method for the differentiation of general landform features (peak, shoulder, flat, sideslope, etc.) from LSPs derived from DTMs. This highlights the value of spatial context information and textural patterns for geomorphic mapping tasks.
Since the development of CNN-based semantic segmentation methods, they have been applied to geomorphic mapping from spectral data (e.g., [18,51,52]), LSPs (e.g., [26,35,37,44,46,53,54,55,56,57]), or a combination of imagery and LSPs (e.g., [48,58,59,60,61,62]). CNN-based semantic segmentation has also been applied to varying geomorphic mapping and modeling problems, including general landform classification or surficial unit mapping (e.g., [46,51,55,58,59,61,63,64,65]), extraction of anthropogenic landforms or archeological features (e.g., [26,35,37,53,54,56]), digital soil mapping (e.g., [44,66]), and landslide mapping or susceptibility modeling (e.g., [67,68]). For general surficial geologic mapping using DL and only LSP predictor variables, Odom and Doctor [69] investigated the mapping of exposed bedrock and alluvium in the Delaware River Basin. For multiclass surficial geologic mapping, van der Meij et al. [46] compared DL-based predictions with those created manually by multiple trained analysts and highlighted the complexity of obtaining consistent geomorphic maps and the shortcomings of DL with regard to obtaining accurate class boundary delineations.
For the extraction of anthropogenic geomorphic features using DL and LSPs more specifically, studies often focus on archeological or historic land use features. For example, Suh et al. [70] investigated the extraction of charcoal hearths in Connecticut, USA, and compared different LSP input combinations consisting of slope, hillshade, and/or Visualization for Archeological Topography (VAT) raster grids. They were able to quantify the impact of different input feature combinations and offer recommendations. Guyot et al. [53] documented the utility of lidar-derived LSPs and DL for detecting topographic anomalies associated with archeological structures in Brittany, France. Banasiak et al. [71] explored DL-based semantic segmentation for differentiating archeological and modern anthropogenic features and natural features occurring under a tree canopy in the Białowieża Forest, Poland. Given this focus on archeological or historic land use features, we argue that there is a need to specifically explore the mapping and extraction of human-induced landscape alterations resulting from more recent agriculture, urbanization, infrastructure development, and resource extraction to better map and document physical landscape change. This study expands upon our prior studies, Maxwell et al. [35] and Maxwell et al. [37], which focused on features resulting from agriculture and surface coal mining.

2.3. UNet Architecture

A variety of CNN-based semantic segmentation architectures have been proposed and subsequently refined or modified for use in geospatial mapping and modeling tasks, including fully convolutional neural networks [38], UNet [19], and DeepLabv3+ [72,73]. For our experiments, we use UNet, as this base architecture and modifications of it have been documented to be widely applicable for tasks, including general land cover mapping [74], cloud detection [75], building footprint extraction [76], delineating features from topographic maps [77], tree species delineation [78], and agricultural or crop mapping [79]. Since it uses an encoder-decoder architecture, the default encoder can be replaced with common CNN-based scene labeling architectures, such as ResNet [20], which allows for the use of pre-trained encoder parameters, as implemented in this study.
UNet was originally proposed by Ronneberger et al. in 2015 [19]. Convolutional layers within each encoder block capture spatial patterns to build feature maps from the input data or from the feature maps generated by the prior block using small kernels (e.g., 3 × 3 cells); activation functions incorporate non-linearity [7,80,81]; and batch normalization stabilizes the learning process [82,83,84]. The 2 × 2 cell 2D max pooling operations decrease the size of the array between each encoder block to allow for learning spatial patterns at multiple scales and varying receptive field sizes. The block at which the array has the smallest size in the spatial dimensions is termed the bottleneck, and the representation of the data at this scale captures a high degree of semantic information but at a reduced spatial resolution [19].
The purpose of the decoder is to use the spatial context information generated by the encoder and captured within the generated feature maps to rebuild the original spatial resolution of the data and allow for pixel-level, as opposed to scene-level, predictions. In order to undo the reduction in the array sizes in the spatial dimensions resulting from max pooling operations applied in the encoder, upsampling is performed between the decoder blocks. This can be accomplished using a resampling method, such as bilinear interpolation, or 2 × 2 transpose convolution, which implements trainable kernels. Each block in the decoder is linked to the encoder block with the same spatial resolution using skip connections, which allow for sharing semantic information between the encoder and decoder. The final classification at the pixel level is performed using 1 × 1 2D convolution, or 3 × 3 2D convolution if it is desired to use adjacent cells to inform the prediction [19].
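To make the decoder mechanics concrete, the following is a minimal PyTorch sketch of a decoder block of the kind described above: upsampling to restore spatial resolution, concatenation of the skip connection from the matching encoder block, and 3 × 3 convolutions with batch normalization and ReLU activations. It illustrates the general pattern only and is not the exact block used in the library implementation applied later in this study.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DecoderBlock(nn.Module):
    """Illustrative UNet decoder block: upsample, fuse skip connection, convolve."""

    def __init__(self, in_channels, skip_channels, out_channels):
        super().__init__()
        self.conv1 = nn.Conv2d(in_channels + skip_channels, out_channels,
                               kernel_size=3, padding=1)
        self.bn1 = nn.BatchNorm2d(out_channels)
        self.conv2 = nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1)
        self.bn2 = nn.BatchNorm2d(out_channels)

    def forward(self, x, skip=None):
        # Undo one 2x reduction in the spatial dimensions (bilinear resampling;
        # 2 x 2 transpose convolution is the trainable alternative).
        x = F.interpolate(x, scale_factor=2, mode="bilinear", align_corners=False)
        if skip is not None:
            # Skip connection: concatenate encoder feature maps at the same resolution.
            x = torch.cat([x, skip], dim=1)
        x = F.relu(self.bn1(self.conv1(x)))
        x = F.relu(self.bn2(self.conv2(x)))
        return x
```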

2.4. Transfer Learning

As noted above, reduced generalization ability to new data or spatial extents is of concern when generating DL models due to the large number of parameters that must be learned. This issue is generally exacerbated by small training sample sizes, which can limit the practical utility of DL algorithms [7]. The goal of transfer learning is to allow for parameters learned from a different task and larger dataset to be applied to a new dataset and/or classification problem. The underlying idea is that learned general patterns in data can be applicable to new problems since the model captures data abstractions, such as edges and textures, that are of value for a wide variety of tasks [4,6,85,86]. Transfer learning methods have been found to be useful for geospatial mapping and modeling tasks, including land cover classification [4,87,88] and feature extraction [73,89,90].
Ma et al. [6] provide a review and categorization of transfer learning methods applicable to environmental remote sensing, including fine-tuning-based methods, multi-task learning, few-shot learning, and unsupervised domain adaptation. This study specifically focuses on fine-tuning-based methods, which entail initializing a model for a new task with parameters learned for another task. For example, parameters learned for the scene labeling task presented by the ImageNet dataset [23] can be used to initialize a land cover classification model.
When using fine-tuning-based methods for semantic segmentation, it is common to implement one of two general training scenarios [6], both of which are explored in this study. In the first scenario, the encoder component of the architecture is initialized using pre-trained parameters. The encoder parameters are untrainable, or frozen, during the new training process. Meanwhile, the decoder component is initialized using random parameters and updateable, or unfrozen, during the current training process using the new dataset and task. The underlying idea is that the multiscale feature maps learned for other tasks provide a suitable feature space or data representation as input to the decoder for the new classification task, and updating a smaller set of the parameters can reduce issues of overfitting to the new, smaller dataset. This can also reduce training time since only a portion of the model is updated [6].
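Using an encoder-decoder implementation such as the Segmentation Models PyTorch library applied later in this study, the first scenario reduces to a few lines; the attribute names below follow that library’s conventions, and the learning rate is the one reported in Section 3.5.

```python
import torch
import segmentation_models_pytorch as smp

# Scenario 1: pre-trained encoder, frozen; randomly initialized decoder, trainable.
model = smp.Unet(encoder_name="resnet34", encoder_weights="imagenet",
                 in_channels=3, classes=1)

for param in model.encoder.parameters():
    param.requires_grad = False  # encoder parameters are untrainable ("frozen")

# Only the decoder and classification head receive gradient updates.
optimizer = torch.optim.AdamW(
    (p for p in model.parameters() if p.requires_grad), lr=1e-4
)
```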
In the second scenario, the encoder is also initialized using pre-trained parameters while the decoder is initialized using random parameters; however, all parameters are trainable, or unfrozen, during the training process for the new task. Although all parameters are trainable, it is common to use different learning rates for different components of the architecture. Specifically, the learning rates in the encoder are smaller than those in the decoder since it is assumed that the pre-trained components of the architecture will require a smaller degree of refinement in comparison to the randomly initialized decoder [6].

3. Methods

3.1. Study Areas and Datasets

The lidar-derived DTMs used in this study were obtained from the United States Geological Survey (USGS) 3DEP [34,91]. The point features in Figure 1a represent the center coordinates of the randomly selected 10 × 10 km 3DEP 1 m DTM tiles for which terrain derivatives and geomorphons were calculated for use in this study. All DTMs were resampled to a 2 m spatial resolution using pixel aggregation and the mean elevation value from the four cells occurring within the new, larger cell. These tiles were selected by first obtaining a list of all available tiles. We then randomly selected 1500 tiles from those available. Data preparation was conducted using the R language and data science environment [92]. First, all tiles were downloaded. Any selected tile that was not complete or had empty cell values was not processed. This reduced the total number of available tiles to 1173. Of these tiles, 800 were randomly selected for inclusion in the training dataset, and the remaining tiles were randomly partitioned into a validation set of 173 tiles and a test set of 200 tiles. In alignment with common practices and terminology, in this study, validation data were used to assess the model at the end of each training epoch while testing data were withheld to assess the final model.
The terraceDL dataset [21] was created by the lead author’s lab group for a prior study [37]. The vector agricultural terrace data representing the features of interest were provided by the Iowa Best Management Practices (BMP) Mapping Project [93] and were created by manual digitization and interpretation of lidar-derived topographic data and aerial imagery. Specifically, analysts delineated terraces as a line feature along the top of the terrace ridge. These features represent an anthropogenic landscape alteration designed to reduce agriculture-induced soil loss by hindering sheet and rill erosion and gully development. The vector line features generated by the Iowa BMP Mapping Project were buffered by 4 m to generate polygon features and subsequently rasterized at a 2 m spatial resolution to align with the DTM data. Figure 1b shows the geographic regions within Iowa, USA, used for training, validation, and testing in this study, which were defined using watershed boundaries. Figure 2a,b shows example terrace features, which are characterized as thin, linear, and vegetated with grasses [93]. These features are different from the wider agricultural terraces found in other parts of the world and from natural fluvial terraces, as they are thin and designed primarily for erosion control and to prevent soil loss. Our training, validation, and test data were derived from the buffered line features representing the top of the terrace ridge.
The vfillDL dataset [22] was also generated by the lead author’s lab group for use in a prior study [37]. Valley fills are an anthropogenic landform feature resulting from mountaintop removal coal mining reclamation. Mountaintop removal consists of harvesting coal seams using explosives, which requires the removal of large volumes of overburden rock. Due to the naturally steep slopes in the mountaintop mining region of southern West Virginia, eastern Kentucky, and southwestern Virginia, it is not possible to reclaim mine sites to “approximate original contour”, as is required under the Surface Mine Control and Reclamation Act (SMCRA). As a result, mine operators are allowed to alter the landscape by flattening mountaintops and placing overburden material in adjacent valleys, resulting in valley fills [94,95,96,97]. In this study, we specifically focus on the mapping of the faces of valley fills, which are characterized as having terraced slopes to maintain stability and reduce erosion and drainage ditches to transport water from the mine complex to the valley bottom. Valley fill faces were manually digitized by the lead author’s lab group using manual interpretation of digital terrain representations and ancillary data. As shown in Figure 1c, a large region within West Virginia was used to train the model while adjacent regions within the state were used for model validation at the end of each training epoch. Areas within eastern Kentucky and southwestern Virginia were reserved as a testing set to assess the final model. The extents were defined based on the availability of lidar data. Example valley fill faces are shown in Figure 2c,d.
Table 1 provides counts of digitized features for both problems explored in the study. Since terraces are abundant within the watershed extents used in this study, a large number of features are available to train and assess models. In contrast, since valley fill faces are rarer features, fewer features are available to train and assess models. We argue that exploring two anthropogenic feature extraction tasks enhances the generalizability of this study, which is further strengthened by exploring features with different textural characteristics, sizes, and abundances within the landscapes in which they occur.

3.2. Land Surface Parameters (LSPs)

LSPs derived from DTMs were used as input predictor variables in this study. Specifically, we used a modified three-layer combination originally proposed by Dr. William Odom of the USGS. Maxwell et al. [37] documented that this feature space supports more accurate extractions of geomorphic features, including agricultural terraces and valley fill faces, using semantic segmentation DL methods in comparison to more common terrain visualization surfaces: hillshades, multidirectional hillshades, and slopeshades. This representation is visualized in Figure 2b,d, which also highlights the two features explored in this study. Figure 2a,c shows examples of the features of interest displayed over a multidirectional hillshade. The first layer is a topographic position index (TPI) calculated using a moving window with a 50 m circular radius and designed to characterize general hillslope position. The second layer is the square root of slope calculated in degrees, which provides a measure of steepness. The third layer is another TPI; however, it is calculated using an annulus moving window with an inner radius of 2 m and an outer radius of 5 m. This surface captures more local relief and surface roughness patterns in comparison to the other TPI. Values for both TPIs and the square root of slope were clamped to a range of −10 to 10, then linearly rescaled to a range of 0 to 1. These surfaces were calculated in R using the terra package [98], and moving windows were configured using the MultiscaleDTM package [99].
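Although these surfaces were generated in R with the terra and MultiscaleDTM packages, the underlying logic is straightforward: a TPI is the difference between a cell’s elevation and the mean elevation within a circular or annulus neighborhood. The following is a minimal NumPy/SciPy sketch of the three-layer stack under the processing described above (2 m cells, clamping to −10 to 10, and rescaling to 0 to 1); it approximates, rather than reproduces, the exact package implementations.

```python
import numpy as np
from scipy import ndimage

def window(outer_m, cellsize, inner_m=0.0):
    """Normalized circular (or annulus) mean kernel with radii given in meters."""
    r = int(round(outer_m / cellsize))
    y, x = np.ogrid[-r:r + 1, -r:r + 1]
    d = np.sqrt(x**2 + y**2) * cellsize
    k = ((d <= outer_m) & (d >= inner_m)).astype(float)
    return k / k.sum()

def tpi(dem, outer_m, cellsize, inner_m=0.0):
    """Topographic position index: elevation minus neighborhood mean elevation."""
    return dem - ndimage.convolve(dem, window(outer_m, cellsize, inner_m), mode="nearest")

def rescale(x, lo=-10.0, hi=10.0):
    """Clamp to [lo, hi], then linearly rescale to [0, 1]."""
    return (np.clip(x, lo, hi) - lo) / (hi - lo)

def lsp_stack(dem, cellsize=2.0):
    """Three-band LSP composite from a float DTM array."""
    dy, dx = np.gradient(dem, cellsize)
    slope_deg = np.degrees(np.arctan(np.hypot(dx, dy)))
    return np.stack([
        rescale(tpi(dem, 50.0, cellsize)),             # band 1: 50 m circular TPI
        rescale(np.sqrt(slope_deg)),                   # band 2: square root of slope
        rescale(tpi(dem, 5.0, cellsize, inner_m=2.0))  # band 3: 2-5 m annulus TPI
    ], axis=0)
```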

3.3. Geomorphons

Geomorphons [24,25] offer a means to differentiate the landscape into meaningful landforms by categorizing each pixel into one of the following ten features: flat, summit, ridge, shoulder, spur, slope, hollow, footslope, valley, or depression. To accomplish this, the landscape patterns in all eight directions from the cell being classified are characterized. To allow for a broader view of the landscape and to characterize the landscape at variable scales, a line-of-sight method is used as opposed to only comparing the center cell to its direct neighbors using a 3 × 3 cell window. Along each line of sight, it is determined whether the landscape is trending downward (‘−’), upward (‘+’), or flat (‘0’). This results in patterns defined by a tuple with a length of eight. The large number of possible eight-tuple combinations are subsequently reclassified into the meaningful landforms listed above [24,25].
We chose to use geomorphons in this transfer learning study for several reasons. First, since the calculations make use of a line-of-sight method as opposed to local moving windows, the resulting classification cannot simply be replicated by learning and applying a small set of kernels to the input LSPs. Second, this classification method offers a means to generate pseudo-labels that can be consistently rendered over large spatial extents, allowing for the generation of potentially large datasets. We hypothesize that attempting to predict the geomorphon classification at each cell location will allow the CNN architecture to capture general landscape patterns, represented as multiscale feature maps that are generally useful for a wide range of geomorphic mapping tasks.
Geomorphons were calculated using the Whitebox Geospatial Tools [100] implementation and within the R environment using the whitebox package [101], which interfaces with this external tool kit. Calculations were made from the lidar-derived DTMs at a 2 m spatial resolution using a search radius of 50 cells, which defines the maximum distance for each cell’s line-of-sight calculations. A skip parameter of three cells was used, which sets a minimum distance at which to start line-of-sight calculations. The flatness threshold was set to 1°, and the flatness distance was set to zero cells. Default arguments were maintained for all other parameters. Figure 3 shows a random subset of terrain extents represented using the multi-layer terrain composite (a) and the associated geomorphon classification (b).
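The same tool is exposed through the WhiteboxTools Python frontend; the sketch below applies the parameter values reported above, under the assumption that the Python method mirrors the command-line arguments (the file paths are hypothetical).

```python
import whitebox

wbt = whitebox.WhiteboxTools()

# Geomorphon landforms from a 2 m DTM: 50-cell search radius, 3-cell skip,
# 1 degree flatness threshold, and 0-cell flatness distance; other arguments default.
wbt.geomorphons(
    dem="dtm_2m.tif",          # hypothetical input path
    output="geomorphons.tif",  # hypothetical output path
    search=50,
    skip=3,
    threshold=1.0,
    fdist=0,
    forms=True,                # reclassify eight-tuples into the ten landform classes
)
```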

3.4. Chips and Data Partitions

In order for data to be used to train a CNN-based semantic segmentation algorithm, larger spatial extents need to be converted into smaller extents of a pre-defined size. These partitions are generally referred to as chips. In this study, the multi-layer LSP raster grids and associated pixel-level labels (i.e., geomorphon landform indices or valley fill face or terrace binary masks) were partitioned into non-overlapping 512 × 512 cell chips. Table 2 below summarizes the number of training, validation, and testing chips used in the study for each dataset. In order to explore the impact of training sample size, random subsets of the terrace and valley fill face training datasets were generated with sample sizes of 50, 100, 250, 500, 750, and 1000 chips. All available geomorphon training chips were used, which collectively contain over seven billion pixels. For the other two datasets, only chips that contained at least one pixel mapped to the positive class were used. Chips were generated using a custom R function.
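A minimal sketch of this chipping step, assuming the LSP stack and the label raster are already aligned NumPy arrays; for the binary feature extraction datasets, chips without any positive-class pixel are discarded, as described above.

```python
import numpy as np

def make_chips(lsp, labels, size=512, require_positive=True):
    """Partition aligned (C, H, W) predictors and (H, W) labels into
    non-overlapping size x size chips."""
    chips = []
    _, h, w = lsp.shape
    for r in range(0, h - size + 1, size):
        for c in range(0, w - size + 1, size):
            y = labels[r:r + size, c:c + size]
            if require_positive and not (y == 1).any():
                continue  # keep only chips containing the feature of interest
            chips.append((lsp[:, r:r + size, c:c + size], y))
    return chips
```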

3.5. Training Process

All DL models were trained using the PyTorch library [102] in the Python language [103] on a Linux-based workstation with an AMD Ryzen Threadripper Pro 3955WX 16-core CPU, 128 GB of RAM, and three NVIDIA RTX A5000 GPUs with a combined 72 GB of VRAM. GPU-based computation was implemented with CUDA [104,105]. The Segmentation Models library for PyTorch [106] was used to implement a UNet model [19] with a ResNet-34 [20] encoder.
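The sketch below shows how the initialization states compared in this study map onto that library’s API; the checkpoint file name used to load a task-specific encoder (e.g., the geomorphon model) is hypothetical.

```python
import torch
import segmentation_models_pytorch as smp

def build_unet(init="random"):
    """Build the UNet with a ResNet-34 encoder under one initialization state."""
    weights = "imagenet" if init == "imagenet" else None  # None -> random init
    model = smp.Unet(encoder_name="resnet34", encoder_weights=weights,
                     in_channels=3, classes=1)
    if init in ("geomorphon", "other_task"):
        # Hypothetical checkpoint holding encoder parameters saved from a model
        # trained on the geomorphon (or other feature extraction) task.
        state = torch.load(f"{init}_encoder.pt", map_location="cpu")
        model.encoder.load_state_dict(state)
    return model
```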
The general model architecture, configured for a binary classification, is conceptualized in Figure 4, while Table 3 lists each layer’s input layer(s), input and output array sizes, and number of trainable parameters. The model consists of five encoder blocks, where the last encoder block serves as the bottleneck layer, and five decoder blocks. It has 24,436,369 trainable parameters, consisting of weights and biases for the convolutional layers and scale and shift parameters associated with batch normalization layers. The ResNet-34 encoder includes skip connections, a key innovation of the ResNet family of architectures, which allow for generating deeper architectures by minimizing the vanishing gradient problem. These connections aid in gradient flow and are implemented by adding the input of the block to the output of the block. In contrast to the original UNet encoder, ResNet-34 only uses max pooling once and early in the architecture (see Figure 4 and Table 3). In later stages of the architecture, the size of the array in the spatial dimensions is decreased using 3 × 3 2D convolution with a stride of two as opposed to one. The first convolutional layer uses a kernel size of 7 × 7 with a stride of two; as a result, the input data are downscaled in the spatial dimension by the first convolutional layer, no feature maps are generated at the original spatial resolution, and no skip connection is added between E1 and D5. Between other encoder and decoder blocks that process data at the same spatial resolution, skip connections are included by breaking the ResNet-34 architecture into components in which the output feature map sizes in the spatial dimensions match those of the decoder block to which the feature maps are provided. Skip connections are included between E1 and D4, E2 and D3, E3 and D2, and E4 and D1. Although exploring more architectures would allow for generalizing our results, this was not possible in this study due to the computational time required to train a large set of models. Instead, we focused on exploring the use of transfer learning and varying the training set sizes within the context of a single architecture.
The geomorphon model was trained with a mini-batch size of 32 chips for a total of 200 epochs, or passes, over the entire dataset. Model parameter updates were made after processing each mini-batch using backpropagation and the AdamW optimization algorithm [107]. A base learning rate of 2.85 × 10−4 was used, which was selected using a learning rate finder process after Smith [108,109]. We augmented the learning rate throughout the training process using a one-cycle learning rate policy with a maximum learning rate of 0.001. A weighted cross entropy (CE) loss [8,110] was used in which the geomorphon classes were weighted based on the inverse of their abundance in the training dataset in order to increase the relative weight of rarer classes. For the training set, random vertical and/or horizontal flips were applied with a 0.3 probability in an attempt to reduce overfitting. The epoch that provided the highest F1-score for predicting the validation data was maintained as the final model.
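A sketch of this training configuration, assuming ten geomorphon classes and using the mini-batch size, training chip count (28,800; see Section 4.1), base learning rate, and one-cycle maximum reported here; the class counts used for weighting are placeholders.

```python
import torch
import segmentation_models_pytorch as smp

model = smp.Unet(encoder_name="resnet34", encoder_weights=None,  # random initialization
                 in_channels=3, classes=10)                      # ten geomorphon landforms

# Cross entropy weighted by the inverse of class abundance in the training set.
class_counts = torch.ones(10)   # placeholder; substitute the true per-class pixel counts
class_weights = 1.0 / class_counts
criterion = torch.nn.CrossEntropyLoss(weight=class_weights)

optimizer = torch.optim.AdamW(model.parameters(), lr=2.85e-4)
steps_per_epoch = 28_800 // 32  # 28,800 training chips, mini-batch size of 32
scheduler = torch.optim.lr_scheduler.OneCycleLR(
    optimizer, max_lr=1e-3, epochs=200, steps_per_epoch=steps_per_epoch
)
# In the training loop, optimizer.step() and scheduler.step() are called per mini-batch.
```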
Table 4 summarizes the different experiments performed for the agricultural terrace and valley fill face datasets. Models were trained with (1) a random parameter initialization, (2) encoder parameters obtained from the classification of the geomorphon data, (3) encoder parameters obtained from the classification of the other problem (i.e., the terrace model for the valley fill faces and the valley fill face model for the terraces), and (4) ImageNet-based parameters. For all models except the random initialization, training was conducted with a frozen and an unfrozen encoder, in which the encoder was not trainable and trainable, respectively. When only the decoder was trained, the number of trainable parameters decreased to 3,151,697.
All models were trained for 50 epochs using a mini-batch size of 25 and the AdamW optimizer [107,111]. The classification problem was configured to only return the logit for the positive case. Similar to the geomorphon models and in order to combat overfitting, random horizontal and/or vertical flips were applied with a probability of 0.3. When the ImageNet weights were used, data were normalized using the published band means and standard deviations of this dataset (means = 0.485, 0.456, 0.406; standard deviations = 0.229, 0.224, 0.225). Due to the large data imbalance problem in which a small proportion of the cells in the dataset were classified to the positive case, a focal Tversky loss [112] was used with alpha = 0.7, beta = 0.3, and gamma = 0.75. This resulted in a higher relative weight applied to false negative (FN), or omission, as opposed to false positive (FP), or commission, errors relative to the positive case. A gamma value smaller than one places increased weight on difficult-to-predict samples, or those predicted with smaller logits relative to their correct classification [8,112,113,114]. For all experiments, the final model was selected as the epoch that provided the highest F1-score for the validation data.
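For reference, the focal Tversky loss can be implemented directly from its definition [112] using the values reported above; this sketch is for a single positive-class logit, and the library implementation actually used may differ in details such as smoothing.

```python
import torch

def focal_tversky_loss(logits, targets, alpha=0.7, beta=0.3, gamma=0.75, eps=1e-7):
    """Focal Tversky loss for a single-logit (positive class) output.

    alpha > beta weights false negatives (omission) more heavily than
    false positives (commission); gamma < 1 up-weights difficult samples.
    """
    probs = torch.sigmoid(logits).reshape(-1)
    targets = targets.reshape(-1).float()
    tp = (probs * targets).sum()
    fn = ((1.0 - probs) * targets).sum()
    fp = (probs * (1.0 - targets)).sum()
    tversky = (tp + eps) / (tp + alpha * fn + beta * fp + eps)
    return (1.0 - tversky) ** gamma
```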
When initializing the encoder with pre-trained weights but allowing it to be updated or trained (i.e., unfrozen), initial experimentation suggested unstable learning when a constant learning rate was used for all components of the model, as evidenced by widely fluctuating assessment metrics for the validation dataset. Such issues have been previously reported, as noted by Ma et al. [6]. As a result, when using an unfrozen, pre-trained encoder, we used a learning rate of 1 × 10−6 for the first two encoder blocks, 1 × 10−5 for the last two encoder blocks, and 1 × 10−4 for all decoder blocks and the classification head. When the encoder was frozen or when the model was randomly initialized, we used a learning rate of 1 × 10−4 for all trainable components of the model.
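Expressed as optimizer parameter groups for the Segmentation Models ResNet-34 encoder, this differential learning rate scheme might look as follows; the exact grouping of encoder blocks, particularly the middle block, is an assumption made for illustration.

```python
import itertools
import torch
import segmentation_models_pytorch as smp

model = smp.Unet(encoder_name="resnet34", encoder_weights="imagenet",
                 in_channels=3, classes=1)
enc = model.encoder  # torchvision-style ResNet attributes: conv1, bn1, layer1..layer4

optimizer = torch.optim.AdamW([
    # Earliest encoder blocks (stem + layer1): smallest refinements.
    {"params": itertools.chain(enc.conv1.parameters(), enc.bn1.parameters(),
                               enc.layer1.parameters()), "lr": 1e-6},
    # Later encoder blocks (here layer2-layer4; the middle block's group is assumed).
    {"params": itertools.chain(enc.layer2.parameters(), enc.layer3.parameters(),
                               enc.layer4.parameters()), "lr": 1e-5},
    # Randomly initialized decoder and classification head: largest learning rate.
    {"params": itertools.chain(model.decoder.parameters(),
                               model.segmentation_head.parameters()), "lr": 1e-4},
])
```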

3.6. Model Assessment

Models were assessed and compared using the withheld testing samples for the terrace and valley fill face problems. From the predictions and associated pixel-level labels, we calculated the assessment metrics described in Table 5. Overall accuracy (OA) simply represents the proportion of pixels that were correctly classified. FNs represent errors of omission relative to the positive class, while FPs represent errors of commission relative to the positive class. As a result, recall quantifies 1 − omission error relative to the positive class, while precision quantifies 1 − commission error. The F1-score aggregates precision and recall into a single metric by calculating the harmonic mean of the two measures [8,12,13,115].
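These metrics reduce to simple functions of the pixel-level confusion counts; a sketch for the binary positive/background case:

```python
def binary_metrics(tp, fp, fn, tn, eps=1e-9):
    """Assessment metrics from pixel-level confusion counts for the positive class."""
    oa = (tp + tn) / (tp + fp + fn + tn + eps)                # overall accuracy
    recall = tp / (tp + fn + eps)                             # 1 - omission error
    precision = tp / (tp + fp + eps)                          # 1 - commission error
    f1 = 2 * precision * recall / (precision + recall + eps)  # harmonic mean
    return {"OA": oa, "Recall": recall, "Precision": precision, "F1": f1}
```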
In order to further compare the data representations and complement the analysis of the training process and the associated accuracy assessment, we also performed a centered kernel alignment (CKA) analysis after Kornblith et al. [116] and Nguyen et al. [117], which was implemented in PyTorch by Kim [118]. CKA is a method for comparing the similarity or alignment between two sets of feature maps within the same architecture or between two architectures on a layer-by-layer basis (i.e., all learned feature maps from one layer are collectively compared to learned feature maps from another layer in the same architecture or those from a different architecture). This method uses a kernel function to project the features into a higher dimensional feature space, and alignment between the features is compared in this higher dimensional space using the Hilbert–Schmidt independence criterion (HSIC), a measure of dependence between two sets. In order to make HSIC scale invariant, CKA implements a normalized version of this metric [116,117]. For the entire model architecture, we used this method to compare the models trained from a random initialization for the geomorphon, agricultural terrace, and valley fill face problems on a pairwise basis. For the encoder component, we also compared the ImageNet-based parameters to those learned for the three problems investigated in this study.
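For intuition, the linear-kernel form of CKA from Kornblith et al. [116] compares two activation matrices extracted for the same set of examples; a minimal sketch follows (the study used an existing PyTorch implementation [118], which may differ in kernel choice and batching).

```python
import torch

def linear_cka(x, y):
    """Linear CKA between activation matrices x (n, p1) and y (n, p2);
    rows are the same n examples, columns are flattened feature-map activations."""
    x = x - x.mean(dim=0, keepdim=True)  # center the features
    y = y - y.mean(dim=0, keepdim=True)
    # ||y^T x||_F^2 is an (unnormalized) HSIC estimate for linear kernels.
    hsic = torch.norm(y.t() @ x, p="fro") ** 2
    norm_x = torch.norm(x.t() @ x, p="fro")  # normalization makes the
    norm_y = torch.norm(y.t() @ y, p="fro")  # measure scale invariant
    return hsic / (norm_x * norm_y)
```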

4. Results

4.1. Performance Using 1000 Training Samples

In this section, we focus on the results for the models trained using the largest training set of 1000 512 × 512 chips, or 262,144,000 pixels, using different parameter initializations and with the encoder frozen and unfrozen. Figure 5 visualizes the results obtained when training the architecture with a random initialization for the agricultural terrace (Figure 5a–c), valley fill face (Figure 5d–f), and geomorphon (Figure 5g–i) problems. The maps in the left column show the terrain surface using a multidirectional hillshade. The center column shows the reference data, while the right column shows the classification result. Figure 6 provides the training loss curves for the terraceDL (a) and vfillDL (b) experiments, while Figure 7 provides the F1-score for the validation data calculated at the end of each training epoch. Generally, the vfillDL predictions stabilized at a lower training loss and higher validation F1-score than the terraceDL predictions.
In comparing the different initialization states, the model initialized with the ImageNet weights and trained with the encoder unfrozen stabilized at the lowest training loss for both the agricultural terrace and valley fill face problems. This model also generally stabilized at a higher validation F1-score for the valley fill face problem. For the terrace problem, the models initialized with the ImageNet parameters, with either a frozen or unfrozen encoder, and the model initialized with random parameters and trained with an unfrozen encoder generally stabilized at the highest validation F1-scores. Generally, there was more variability between the valley fill face models than between the agricultural terrace models. We attribute this to the valley fill face case being a more difficult feature extraction problem, although this is not reflected in the loss and assessment metrics, as discussed below.
For initialization states that were trained using both a frozen and an unfrozen encoder, or all initialization states other than random parameters, training with an unfrozen backbone generally provided a lower training loss and a higher validation F1-score in comparison to the same initialization state with a frozen encoder. Initializing with the geomorphon-based parameters generally yielded a higher training loss and lower validation F1-scores in comparison to the other initialization states tested. Initializing with the vfillDL-based weights for the terraceDL case or the terraceDL weights for the vfillDL case outperformed the geomorphon-based results, as measured with the validation F1-score, despite the much smaller number of training samples: 1000 chips vs. 28,800 chips. Notably, based on the validation F1-score, the ImageNet-based parameter initialization with a frozen encoder outperformed the other initializations, especially for the vfillDL dataset, even when the encoder was trained for those initializations.

4.2. Impact of Sample Size

Figure 8 shows the training loss and Figure 9 the validation F1-score for the terraceDL (Figure 8a and Figure 9a) and vfillDL (Figure 8b and Figure 9b) datasets across the 50 training epochs using all training set sample sizes (50, 100, 250, 500, 750, and 1000 512 × 512 cell chips) and all tested initialization states and frozen and unfrozen encoder configurations. As expected and regardless of the initialization state, increasing the sample size generally improved model performance. However, there were generally fewer differences between models with sample sizes equal to or larger than 500 chips. In alignment with the results discussed above using 1000 training chips, the ImageNet model generally provided the best performance. For models using the same initialization state, training the encoder generally improved model performance in comparison to using a frozen encoder. The terraceDL or vfillDL initializations, when applied to the other investigated problem, generally outperformed the geomorphon-based initialization, despite the smaller sample size.

4.3. Comparison of Test Set Predictions

As stated in the Methods section, the model parameters associated with the epoch that provided the largest F1-score for the validation data for each initialization configuration were selected as the final model and subsequently applied to the withheld testing data to calculate a test set loss and assessment metrics. Figure 10 summarizes the results when using 1000 512 × 512 cell training chips. Recall was generally higher than precision, suggesting a greater occurrence of commission as opposed to omission errors relative to the positive class. For the valley fill face case and when using the same weight initializations, training the encoder generally improved model performance. Results were more variable for the terrace problem, as the unfrozen-encoder model did not always outperform the associated frozen-encoder model, or the performances were similar. With regard to the testing F1-score, the best performance was generally provided by using the ImageNet parameters with the encoder unfrozen or a random parameter initialization. The vfillDL-based parameter initialization for the terraceDL problem and the terraceDL-based parameter initialization for the vfillDL problem generally outperformed the geomorphon-based models. Notably, the ImageNet parameters with the encoder frozen and the random initialization outperformed the geomorphon-based model even when the encoder was trained for both problems.
Table 6 provides the assessment metrics for the terraceDL predictions using all weight initialization configurations, frozen and unfrozen encoders, and sample sizes, while Figure 11 provides a graphic representation of these data. Table 7 and Figure 12 provide the same summarization and visualization for the vfillDL problem. Notably, models tended to stabilize when 500 or more training samples were provided, as improvements in F1-score, recall, and precision became more gradual. Note that overall accuracies were generally high for all experiments. This can be attributed to class imbalance: since a large proportion of the landscape is mapped to the background class, or is not the feature of interest, and because the background samples are generally well predicted, a large proportion of the pixels across the landscape are correctly mapped. Thus, it is important to assess the model using the F1-score, recall, and precision for the positive case.

4.4. CKA Analysis Results

The results of the CKA analysis are summarized in Figure 13, where each sample point represents a convolutional layer within the model. Blocks are separated by red dashed lines, and the encoder and decoder are separated by a blue dashed line. All convolutional layers within the same block are displayed using the same point color. The y-axis maps the CKA measure, with larger values suggesting stronger correlation or alignment for the same convolutional layer between a pair of models. All compared models are those created using the largest training sample set and initialized using random parameters. Since the ImageNet weights are only available for the encoder, the decoder is not compared between ImageNet and the other datasets.
Generally, alignment or correlation between the learned representations decreased as the data progressed through the encoder. The feature maps in the first two encoder blocks, E1 and E2, were generally similar between all models, while the final encoder block, E5 or the bottleneck, showed the lowest alignment. This generally suggests that the earlier encoder blocks learn more general features, while later blocks learn representations more specific to the mapping task. For all model comparisons that did not include ImageNet, the decoder blocks showed more variability. For example, between the agricultural terrace and valley fill face models, the first decoder block showed the lowest correlation while the later blocks showed higher alignment that stabilized around 0.6. When comparing the geomorphon model to the agricultural terrace and valley fill face models, the alignment in the decoder generally peaked for block D4. Since D4 is connected to E1 using a skip connection (see Figure 4), the higher alignment between the models in D4 may be attributed to its dependency on E1, which was generally highly correlated between all model pairs. The drop-off in alignment for D5 relative to D4 when the geomorphon model was compared to the agricultural terrace or valley fill face models may also be attributed to the model architecture, since there is no skip connection between any encoder block and the final decoder block.

5. Discussion

5.1. Key Findings

Based on a visual assessment of the LSPs used in this study, and also visualizations of hillshades derived from the same input DTM data, the agricultural terrace features appear more distinct from the landscape background in comparison to the valley fill face features, which also have more gradational boundaries, especially at the top of the faces, where they contact the reclaimed mine complex. The better model performance for the valley fill face problem in comparison to the terrace problem was also noted in our prior study, Maxwell et al. [37], where we attributed the lower assessment metric values for the terrace problem to their shape. Terraces are commonly long, thin features (see Figure 2 above), resulting in a larger proportion of cells mapped to the terrace class being near the edge of the feature. In contrast, the valley fill faces, since they are generally larger and less elongated (see Figure 2 above), have a lower proportion of edge cells due to their larger interior. In summary, we attribute the lower performance for the terrace problem relative to the valley fill face problem to the shape of the terrace features as opposed to the inherent difficulty in differentiating these features from the landscape background. This highlights the importance of considering factors that can influence assessment metrics other than the difficulty of the mapping problem.
Models trained with the same parameter initiation but with the encoder unfrozen generally outperformed the associated model with the encoder frozen or not trainable during the learning process. However, differences were commonly small. This suggests that there is value in refining the encoder’s pre-trained parameters and associated feature maps for the new task. It should be noted that this will increase training time since the backpropagation and optimization processes must be applied to the entire model, not just the decoder. Training the full model could result in overfitting, especially if the training set is small [5,7]. If this is an issue, our validation F1-score results suggest that the gains in model performance achieved by refining the encoder outweighed issues of overfitting for both investigated problems.
Generally, our results suggest that at least 500 training chips should be used for both of these problems; however, this is likely not the case for all anthropogenic geomorphic feature extraction problems due to varying difficulties in differentiating features from the landscape background, variability in feature presentation, and issues arising from fuzzy or gradational boundaries. For both problems, model performance generally stabilized when at least 500 training chips were provided, even when the models were randomly initialized and no transfer learning was applied, which suggests that collecting a very large dataset and generating thousands of image chips may not be necessary or may not merit the associated investment of time and cost. Since the same mini-batch size was used for all experiments, smaller sample sizes equate to fewer parameter updates per training epoch, since backpropagation and parameter updates are performed after each training mini-batch is processed as opposed to at the end of each training epoch. Thus, small sample sizes of 50 or 100 chips may merit using a larger number of training epochs; however, this could result in overfitting issues [7]. There was also generally more variability between models with different initialization states when sample sizes were smaller, which suggests that decisions on how to apply transfer learning or initialize a model may be of greater concern when using a smaller sample size.
Our training and validation results generally suggest that the feature maps learned by attempting to differentiate the geomorphon-based landforms did not transfer well to the new feature extraction problems, even given the large sample size and the variety of landscapes represented in the dataset (i.e., 3DEP 10 × 10 km tiles randomly sampled from across the contiguous United States). In contrast, the terraceDL or vfillDL datasets, even though they were smaller, generally provided a better initialization state for the other problem. Although the reason for the poorer performance of the geomorphon-based initializations is not known and does not align with our hypothesis as stated above, we argue that it could be associated with the nature of the problems investigated. Attempting to differentiate the entire landscape into general landform categories, primarily defined based on topographic position and steepness and with gradational boundaries between adjacent units, may not transfer well to the extraction of specific landscape features, which are defined based on texture and specific topographic presentations, such as the elongated nature of terraces and their associated slope breaks and the terraced, steep slopes of valley fill faces. It should also be noted that the landscape classification generated by the geomorphon method will vary based on the user-defined settings provided to the algorithm [24,25]. The settings used in this study are described in Section 3.3 above and were selected based on visualization of the resulting landform maps generated over different geographic extents. We opted to use a single, consistent set of parameters for all generated geomorphon classifications. It is possible that using different settings would decrease or increase the performance of the transfer learning process; this was not assessed in this study.
The strong performance of transfer learning using the ImageNet dataset, in comparison to the geomorphon-based initiations and those derived from the other investigated problem, was unexpected, since the ImageNet dataset consists of RGB images developed for a scene-level labeling task involving common objects unrelated to geomorphic features. We anticipated transfer learning from the other geomorphic tasks to be more effective since they used the same input feature space and were specific to geomorphic mapping tasks. In our prior study, Maxwell et al. [35], although transfer learning was not the purpose of that work, we documented that initializing a Mask R-CNN model using parameters learned from the Common Objects in Context (COCO) dataset [119] provided better performance than a random initialization, despite being applied to LSP input variables. Similarly, Zhang et al. [18] noted the value of a COCO-based initialization for the extraction of ice-wedge polygons from high spatial resolution aerial imagery [18,52]. Wilhelm et al. [17] also noted the value of initializing models using ImageNet-based weights for general landform mapping from black-and-white imagery of the Martian terrain. More generally, multiple prior studies have noted that data abstractions learned from RGB imagery can be useful for initializing models for disparate problems and input feature spaces, especially when a limited number of training samples are available [4,6,87,88,120,121,122]. This strong performance may be attributed to several factors, including the large size of the ImageNet dataset, the large number of classes differentiated, and the sophisticated training processes applied. Low-level abstractions that capture textures, edges, and other general characteristics of image-like data may be of general value and applicable to a wide range of problems. Further, if a very large dataset is necessary to obtain benefits from transfer learning, this limits the practical utility of developing problem- and/or feature-space-specific transfer learning datasets, as developing such data may be intractable. In short, we suggest experimenting with pre-trained parameters developed using large datasets, even if the input feature space and problem domain are very different.
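Loading such pre-trained parameters is typically a one-line change; the sketch below requests the ImageNet-derived ResNet-34 encoder weights exposed by the Segmentation Models PyTorch package [106], while the decoder and classification head still receive a random initiation.

```python
import segmentation_models_pytorch as smp

# ImageNet-derived parameters for the encoder; the three-band LSP
# composite matches the three-channel RGB input the weights expect.
model = smp.Unet(
    encoder_name="resnet34",
    encoder_weights="imagenet",
    in_channels=3,
    classes=1,
)
```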
The CKA analysis generally supports the applicability of ImageNet-based parameters to geomorphic mapping using LSP input variables, since there was generally correlation, or alignment, between the encoder feature maps learned from ImageNet and those learned for the geomorphic mapping tasks explored here. However, we argue that correlation between feature maps may only partially indicate the suitability of transfer learning between two problems: the encoder feature maps learned for the geomorphon problem showed correlation patterns with the two feature extraction tasks similar to those of the ImageNet encoder feature maps, despite the varying success of transfer learning from these two sources.
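For reference, linear CKA [116] can be computed from two activation matrices as in the sketch below; this is a generic implementation of the published formulation, not the code used in our analysis, which relied on an existing PyTorch implementation [118].

```python
import torch

def linear_cka(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    """Linear centered kernel alignment (CKA) between activation
    matrices of shape [n_samples, n_features] [116]."""
    # Center each feature column.
    x = x - x.mean(dim=0, keepdim=True)
    y = y - y.mean(dim=0, keepdim=True)
    # CKA = ||Y^T X||_F^2 / (||X^T X||_F * ||Y^T Y||_F)
    numerator = torch.linalg.matrix_norm(y.T @ x) ** 2
    denominator = (torch.linalg.matrix_norm(x.T @ x)
                   * torch.linalg.matrix_norm(y.T @ y))
    return numerator / denominator

# Per-layer feature maps are flattened to [n_samples, n_features] before
# comparison; values near 1 indicate highly aligned representations.
```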

5.2. Limitations and Future Research

There are several notable limitations of this study. First, to better characterize model performance, it would be preferable to run multiple experiments for each initiation state and sample size, perhaps using different random subsets of the available training chips, in order to characterize variability in performance. This was not possible here due to limited computational resources. To conduct our experiments, we first had to train a model on the geomorphon-based landform classification before applying its parameters to the transfer learning problems. For the terraceDL and vfillDL datasets, we ran a total of 84 models, as required to train all combinations of initiation state, frozen or unfrozen encoder, and sample size (seven initiation and encoder configurations × six sample sizes × two datasets). This took nearly three weeks of processing time, even when using multiple GPUs. The computational cost of training DL models limits the ability to train a large number of models. Further, it was not possible to assess a wide range of hyperparameters or training process augmentations (e.g., learning rates, loss functions, optimizers, data augmentations, or learning rate schedulers). In contrast to traditional machine learning, where lower computational demands allow for running multiple model replicates and tuning hyperparameters, similar experimentation using DL is limited [123]. As noted above, our prior study, Maxwell et al. [37], documented improved performance when using the LSP features implemented in this study for DL-based anthropogenic feature extraction, as opposed to more traditional terrain visualization methods (e.g., hillshades, multidirectional hillshades, and slopeshades). However, there would be value in exploring a wider range of LSPs when applying transfer learning methods. This was not possible in this study given the computational and time requirements needed to preprocess a large number of predictor variables and to train and evaluate models using these different feature spaces. Despite these limitations, we argue that our experiments offer valuable insights into applying transfer learning to anthropogenic geomorphic feature extraction problems using LSP feature spaces.
Generally, our experiments suggested only moderate improvements in model performance when implementing some transfer learning methods; in other cases, transfer learning was not effective. We argue that one of the key limitations of the methods used here is that they all rely on supervised learning from labeled datasets. Given the difficulty of generating large sets of labeled data, the practicality of such transfer learning methods is limited. Although labeled data are lacking, large volumes of digital terrain data are available, such as those from 3DEP [34] in the United States or the data curated for different regions globally by the OpenTopography project [124]. Thus, in the following phases of this larger project, we plan to explore methods that allow for learning from unlabeled data, such as unsupervised domain adaptation methods [6] and those being applied or developed to generate foundation models [125,126,127]. We are also currently exploring transformer-based [128,129] semantic segmentation architectures, with a focus on SegFormer [130] and on using a SegFormer-based encoder alongside CNN-based decoders.
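As a pointer for interested readers, the sketch below initializes a SegFormer model from pre-trained Mix Transformer (MiT-B0) encoder weights using the Hugging Face transformers package; this is an illustrative starting point that assumes the publicly released NVIDIA checkpoint, not the configuration adopted in our ongoing work.

```python
from transformers import SegformerForSemanticSegmentation

# SegFormer with a pre-trained MiT-B0 encoder; the lightweight MLP decode
# head is randomly initialized for the new task.
model = SegformerForSemanticSegmentation.from_pretrained(
    "nvidia/mit-b0",
    num_labels=2,  # e.g., background and target geomorphic feature
)
```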

6. Conclusions

This study explored the use of transfer learning for the semantic segmentation task of extracting anthropogenic geomorphic features from LSPs derived from high spatial resolution, lidar-derived digital terrain data. We developed custom transfer parameters by attempting to predict geomorphon-based landforms using a large dataset of digital terrain data provided by 3DEP in the United States, and we also explored the use of pre-trained ImageNet parameters. Generally, we found that transfer learning between the different geomorphic datasets offered minimal benefits. However, the ImageNet-based parameters generally improved performance in comparison to a random parameter initiation, even when the encoder was frozen or not trainable. Thus, we suggest that pre-trained models developed using large, image-based datasets may be of value for geomorphic feature extraction from LSPs, even given the data and task disparities. The value of transfer learning between the different geomorphic mapping tasks may have been limited by the smaller sample sizes of those datasets, which highlights the value of developing pre-trained models using unsupervised learning methods, especially given the large volume of digital terrain data available despite the lack of associated labels.

Author Contributions

Conceptualization, A.E.M.; methodology, A.E.M.; validation, A.E.M., S.F., and M.A.; formal analysis, A.E.M., S.F., and M.A.; writing—original draft preparation, A.E.M.; writing—review and editing, A.E.M., S.F., and M.A.; data curation, A.E.M.; supervision, A.E.M.; project administration, A.E.M.; funding acquisition, A.E.M. All authors have read and agreed to the published version of the manuscript.

Funding

Funding for this research was provided by the National Science Foundation (NSF) (Federal Award ID No. 2046059: “CAREER: Mapping Anthropocene Geomorphology with Deep Learning, Big Data Spatial Analytics, and LiDAR”). Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation. Funding was also provided by AmericaView, which is supported by the U.S. Geological Survey under Grant/Cooperative Agreement No. G18AP00077. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the opinions or policies of the U.S. Geological Survey. Mention of trade names or commercial products does not constitute their endorsement by the U.S. Geological Survey.

Data Availability Statement

The terraceDL (https://doi.org/10.6084/m9.figshare.22320373.v2) and vfillDL (https://doi.org/10.6084/m9.figshare.22318522.v2) datasets are available via FigShare (https://figshare.com/) (accessed on 13 December 2024).

Acknowledgments

The authors would like to thank four anonymous reviewers whose comments strengthened the work. We would also like to thank the Iowa Best Management Practices (BMP) Mapping Project for providing the agricultural terrace data.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Hoeser, T.; Bachofer, F.; Kuenzer, C. Object Detection and Image Segmentation with Deep Learning on Earth Observation Data: A Review—Part II: Applications. Remote Sens. 2020, 12, 3053. [Google Scholar] [CrossRef]
  2. Hoeser, T.; Kuenzer, C. Object Detection and Image Segmentation with Deep Learning on Earth Observation Data: A Review-Part I: Evolution and Recent Trends. Remote Sens. 2020, 12, 1667. [Google Scholar] [CrossRef]
  3. LeCun, Y.; Bengio, Y.; Hinton, G. Deep Learning. Nature 2015, 521, 436–444. [Google Scholar] [CrossRef]
  4. Alem, A.; Kumar, S. Transfer Learning Models for Land Cover and Land Use Classification in Remote Sensing Image. Appl. Artif. Intell. 2022, 36, 2014192. [Google Scholar] [CrossRef]
  5. Ayyadevara, V.K.; Reddy, Y. Modern Computer Vision with PyTorch: Explore Deep Learning Concepts and Implement over 50 Real-World Image Applications; Packt Publishing Ltd.: Birmingham, UK, 2020. [Google Scholar]
  6. Ma, Y.; Chen, S.; Ermon, S.; Lobell, D.B. Transfer Learning in Environmental Remote Sensing. Remote Sens. Environ. 2024, 301, 113924. [Google Scholar] [CrossRef]
  7. Zhang, A.; Lipton, Z.C.; Li, M.; Smola, A.J. Dive into Deep Learning; Cambridge University Press: Cambridge, UK, 2023. [Google Scholar]
  8. Farhadpour, S.; Warner, T.A.; Maxwell, A.E. Selecting and Interpreting Multiclass Loss and Accuracy Assessment Metrics for Classifications with Class Imbalance: Guidance and Best Practices. Remote Sens. 2024, 16, 533. [Google Scholar] [CrossRef]
  9. Jadon, S. A Survey of Loss Functions for Semantic Segmentation. In Proceedings of the 2020 IEEE Conference on Computational Intelligence in Bioinformatics and Computational Biology (CIBCB), Viña del Mar, Chile, 27–29 October 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 1–7. [Google Scholar]
  10. Johnson, J.M.; Khoshgoftaar, T.M. Survey on Deep Learning with Class Imbalance. J. Big Data 2019, 6, 1–54. [Google Scholar] [CrossRef]
  11. Li, X.; Sun, X.; Meng, Y.; Liang, J.; Wu, F.; Li, J. Dice Loss for Data-Imbalanced NLP Tasks. arXiv 2020, arXiv:1911.02855. [Google Scholar]
  12. Maxwell, A.E.; Warner, T.A.; Guillén, L.A. Accuracy Assessment in Convolutional Neural Network-Based Deep Learning Remote Sensing Studies—Part 1: Literature Review. Remote Sens. 2021, 13, 2450. [Google Scholar] [CrossRef]
  13. Maxwell, A.E.; Warner, T.A.; Guillen, L.A. Accuracy Assessment in Convolutional Neural Network-Based Deep Learning Remote Sensing Studies—Part 2: Recommendations and Best Practices. Remote Sens. 2021, 13, 2591. [Google Scholar] [CrossRef]
  14. Sudre, C.H.; Li, W.; Vercauteren, T.; Ourselin, S.; Cardoso, M.J. Generalised Dice Overlap as a Deep Learning Loss Function for Highly Unbalanced Segmentations. In Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support: Third International Workshop, DLMIA 2017, and 7th International Workshop, ML-CDS 2017, Held in Conjunction with MICCAI 2017, Québec City, QC, Canada, September 14, Proceedings 3; Springer International Publishing: Cham, Switzerland, 2017; Volume 10553, pp. 240–248. [Google Scholar]
  15. Maxwell, A.E.; Shobe, C.M. Land-Surface Parameters for Spatial Predictive Mapping and Modeling. Earth-Sci. Rev. 2022, 226, 103944. [Google Scholar] [CrossRef]
  16. Franklin, S.E. Interpretation and Use of Geomorphometry in Remote Sensing: A Guide and Review of Integrated Applications. Int. J. Remote Sens. 2020, 41, 7700–7733. [Google Scholar] [CrossRef]
  17. Wilhelm, T.; Geis, M.; Püttschneider, J.; Sievernich, T.; Weber, T.; Wohlfarth, K.; Wöhler, C. Domars16k: A Diverse Dataset for Weakly Supervised Geomorphologic Analysis on Mars. Remote Sens. 2020, 12, 3981. [Google Scholar] [CrossRef]
  18. Zhang, W.; Liljedahl, A.K.; Kanevskiy, M.; Epstein, H.E.; Jones, B.M.; Jorgenson, M.T.; Kent, K. Transferability of the Deep Learning Mask R-CNN Model for Automated Mapping of Ice-Wedge Polygons in High-Resolution Satellite and UAV Images. Remote Sens. 2020, 12, 1085. [Google Scholar] [CrossRef]
  19. Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional Networks for Biomedical Image Segmentation. In Proceedings of the Medical Image Computing and Computer-Assisted Intervention–MICCAI 2015: 18th International Conference, Munich, Germany, 5–9 October 2015, Proceedings, Part III 18; Springer: Berlin/Heidelberg, Germany, 2015; pp. 234–241. [Google Scholar]
  20. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. arXiv 2015, arXiv:1512.03385. [Google Scholar] [CrossRef]
  21. Maxwell, A. terraceDL: A Geomorphology Deep Learning Dataset of Agricultural Terraces in Iowa, USA, 2023. [CrossRef]
  22. Maxwell, A. vfillDL: A Geomorphology Deep Learning Dataset of Valley Fill Faces Resulting from Mountaintop Removal Coal Mining (Southern West Virginia, Eastern Kentucky, and Southwestern Virginia, USA), 2023. [CrossRef]
  23. Deng, J.; Dong, W.; Socher, R.; Li, L.-J.; Li, K.; Fei-Fei, L. ImageNet: A Large-Scale Hierarchical Image Database. In Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, 20–25 June 2009; IEEE: Piscataway, NJ, USA, 2009; pp. 248–255. [Google Scholar]
  24. Jasiewicz, J.; Stepinski, T.F. Geomorphons—A Pattern Recognition Approach to Classification and Mapping of Landforms. Geomorphology 2013, 182, 147–156. [Google Scholar] [CrossRef]
  25. Stepinski, T.F.; Jasiewicz, J. Geomorphons—A New Approach to Classification of Landforms. Proc. Geomorphometry 2011, 2011, 109–112. [Google Scholar]
  26. Sofia, G.; Fontana, G.D.; Tarolli, P. High-Resolution Topography and Anthropogenic Feature Extraction: Testing Geomorphometric Parameters in Floodplains. Hydrol. Process. 2014, 28, 2046–2061. [Google Scholar] [CrossRef]
  27. Tarolli, P. High-Resolution Topography for Understanding Earth Surface Processes: Opportunities and Challenges. Geomorphology 2014, 216, 295–312. [Google Scholar] [CrossRef]
  28. Jaboyedoff, M.; Oppikofer, T.; Abellán, A.; Derron, M.-H.; Loye, A.; Metzger, R.; Pedrazzini, A. Use of LIDAR in Landslide Investigations: A Review. Nat. Hazards 2012, 61, 5–28. [Google Scholar] [CrossRef]
  29. van Westen, C.J.; Castellanos, E.; Kuriakose, S.L. Spatial Data for Landslide Susceptibility, Hazard, and Vulnerability Assessment: An Overview. Eng. Geol. 2008, 102, 112–131. [Google Scholar] [CrossRef]
  30. Chase, A.F.; Chase, D.Z.; Fisher, C.T.; Leisz, S.J.; Weishampel, J.F. Geospatial Revolution and Remote Sensing LiDAR in Mesoamerican Archaeology. Proc. Natl. Acad. Sci. USA 2012, 109, 12916–12921. [Google Scholar] [CrossRef] [PubMed]
  31. Fernandez-Diaz, J.C.; Carter, W.E.; Shrestha, R.L.; Leisz, S.J.; Fisher, C.T.; González, A.M.; Thompson, D.; Elkins, S. Archaeological Prospection of North Eastern Honduras with Airborne Mapping LiDAR. In Proceedings of the 2014 IEEE Geoscience and Remote Sensing Symposium, Quebec City, QC, Canada, 13–18 July 2014; pp. 902–905. [Google Scholar]
  32. Hesse, R. LiDAR-derived Local Relief Models—A new tool for archaeological prospection. Archaeol. Prospect. 2010, 17, 67–72. [Google Scholar] [CrossRef]
  33. Schindling, J.; Gibbes, C. LiDAR as a Tool for Archaeological Research: A Case Study. Archaeol. Anthropol. Sci. 2014, 6, 411–423. [Google Scholar] [CrossRef]
  34. Arundel, S.T.; Phillips, L.A.; Lowe, A.J.; Bobinmyer, J.; Mantey, K.S.; Dunn, C.A.; Constance, E.W.; Usery, E.L. Preparing The National Map for the 3D Elevation Program—Products, Process and Research. Cartogr. Geogr. Inf. Sci. 2015, 42, 40–53. [Google Scholar] [CrossRef]
  35. Maxwell, A.E.; Pourmohammadi, P.; Poyner, J.D. Mapping the Topographic Features of Mining-Related Valley Fills Using Mask R-CNN Deep Learning and Digital Elevation Data. Remote Sens. 2020, 12, 547. [Google Scholar] [CrossRef]
  36. He, K.; Gkioxari, G.; Dollár, P.; Girshick, R. Mask R-Cnn. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2961–2969. [Google Scholar]
  37. Maxwell, A.E.; Odom, W.E.; Shobe, C.M.; Doctor, D.H.; Bester, M.S.; Ore, T. Exploring the Influence of Input Feature Space on CNN-Based Geomorphic Feature Extraction from Digital Terrain Data. Earth Space Sci. 2023, 10, e2023EA002845. [Google Scholar] [CrossRef]
  38. Long, J.; Shelhamer, E.; Darrell, T. Fully Convolutional Networks for Semantic Segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 3431–3440. [Google Scholar]
  39. Haralick, R.M. On a Texture-Context Feature Extraction Algorithm for Remotely Sensed Imagery. In Proceedings of the 1971 IEEE Conference on Decision and Control, Miami Beach, FL, USA, 15–17 December 1971; pp. 650–657. [Google Scholar]
  40. Blaschke, T. Object Based Image Analysis for Remote Sensing. ISPRS J. Photogramm. Remote Sens. 2010, 65, 2–16. [Google Scholar] [CrossRef]
  41. Blaschke, T.; Hay, G.J.; Kelly, M.; Lang, S.; Hofmann, P.; Addink, E.; Queiroz Feitosa, R.; van der Meer, F.; van der Werff, H.; van Coillie, F.; et al. Geographic Object-Based Image Analysis—Towards a New Paradigm. ISPRS J. Photogramm. Remote Sens. 2014, 87, 180–191. [Google Scholar] [CrossRef]
  42. Chen, G.; Weng, Q.; Hay, G.J.; He, Y. Geographic Object-Based Image Analysis (GEOBIA): Emerging Trends and Future Opportunities. GISci. Remote Sens. 2018, 55, 159–182. [Google Scholar] [CrossRef]
  43. Ma, L.; Li, M.; Ma, X.; Cheng, L.; Du, P.; Liu, Y. A Review of Supervised Object-Based Land-Cover Image Classification. ISPRS J. Photogramm. Remote Sens. 2017, 130, 277–293. [Google Scholar] [CrossRef]
  44. Behrens, T.; Schmidt, K.; MacMillan, R.A.; Viscarra Rossel, R.A. Multi-Scale Digital Soil Mapping with Deep Learning. Sci. Rep. 2018, 8, 15244. [Google Scholar] [CrossRef] [PubMed]
  45. Drăguţ, L.; Blaschke, T. Automated Classification of Landform Elements Using Object-Based Image Analysis. Geomorphology 2006, 81, 330–344. [Google Scholar] [CrossRef]
  46. van der Meij, W.M.; Meijles, E.W.; Marcos, D.; Harkema, T.T.L.; Candel, J.H.J.; Maas, G.J. Comparing Geomorphological Maps Made Manually and by Deep Learning. Earth Surf. Process. Landf. 2022, 47, 1089–1107. [Google Scholar] [CrossRef]
  47. Dornik, A.; Drăguţ, L.; Urdea, P. Classification of Soil Types Using Geographic Object-Based Image Analysis and Random Forests. Pedosphere 2018, 28, 913–925. [Google Scholar] [CrossRef]
  48. Feizizadeh, B.; Kazemi Garajeh, M.; Blaschke, T.; Lakes, T. An Object Based Image Analysis Applied for Volcanic and Glacial Landforms Mapping in Sahand Mountain, Iran. CATENA 2021, 198, 105073. [Google Scholar] [CrossRef]
  49. Saha, K.; Wells, N.A.; Munro-Stasiuk, M. An Object-Oriented Approach to Automated Landform Mapping: A Case Study of Drumlins. Comput. Geosci. 2011, 37, 1324–1336. [Google Scholar] [CrossRef]
  50. Verhagen, P.; Drăguţ, L. Object-Based Landform Delineation and Classification from DEMs for Archaeological Predictive Mapping. J. Archaeol. Sci. 2012, 39, 698–703. [Google Scholar] [CrossRef]
  51. Huang, L.; Liu, L.; Jiang, L.; Zhang, T. Automatic Mapping of Thermokarst Landforms from Remote Sensing Images Using Deep Learning: A Case Study in the Northeastern Tibetan Plateau. Remote Sens. 2018, 10, 2067. [Google Scholar] [CrossRef]
  52. Zhang, W.; Witharana, C.; Liljedahl, A.K.; Kanevskiy, M. Deep Convolutional Neural Networks for Automated Characterization of Arctic Ice-Wedge Polygons in Very High Spatial Resolution Aerial Imagery. Remote Sens. 2018, 10, 1487. [Google Scholar] [CrossRef]
  53. Guyot, A.; Lennon, M.; Lorho, T.; Hubert-Moy, L. Combined Detection and Segmentation of Archeological Structures from LiDAR Data Using a Deep Learning Approach. J. Comput. Appl. Archaeol. 2021, 4, 1. [Google Scholar] [CrossRef]
  54. Guyot, A.; Hubert-Moy, L.; Lorho, T. Detecting Neolithic Burial Mounds from LiDAR-Derived Elevation Data Using a Multi-Scale Approach and Machine Learning Techniques. Remote Sens. 2018, 10, 225. [Google Scholar] [CrossRef]
  55. Shumack, S.; Hesse, P.; Farebrother, W. Deep Learning for Dune Pattern Mapping with the AW3D30 Global Surface Model. Earth Surf. Process. Landf. 2020, 45, 2417–2431. [Google Scholar] [CrossRef]
  56. Trier, Ø.D.; Cowley, D.C.; Waldeland, A.U. Using Deep Neural Networks on Airborne Laser Scanning Data: Results from a Case Study of Semi-Automatic Mapping of Archaeological Topography on Arran, Scotland. Archaeol. Prospect. 2019, 26, 165–175. [Google Scholar] [CrossRef]
  57. Xu, Y.; Zhu, H.; Hu, C.; Liu, H.; Cheng, Y. Deep Learning of DEM Image Texture for Landform Classification in the Shandong Area, China. Front. Earth Sci. 2021, 16, 352–367. [Google Scholar] [CrossRef]
  58. Li, S.; Xiong, L.; Tang, G.; Strobl, J. Deep Learning-Based Approach for Landform Classification from Integrated Data Sources of Digital Elevation Model and Imagery. Geomorphology 2020, 354, 107045. [Google Scholar] [CrossRef]
  59. Robson, B.A.; Bolch, T.; MacDonell, S.; Hölbling, D.; Rastner, P.; Schaffer, N. Automated Detection of Rock Glaciers Using Deep Learning and Object-Based Image Analysis. Remote Sens. Environ. 2020, 250, 112033. [Google Scholar] [CrossRef]
  60. Schönfeldt, E.; Winocur, D.; Pánek, T.; Korup, O. Deep Learning Reveals One of Earth’s Largest Landslide Terrain in Patagonia. Earth Planet. Sci. Lett. 2022, 593, 117642. [Google Scholar] [CrossRef]
  61. Xie, Z.; Haritashya, U.K.; Asari, V.K.; Young, B.W.; Bishop, M.P.; Kargel, J.S. GlacierNet: A Deep-Learning Approach for Debris-Covered Glacier Mapping. IEEE Access 2020, 8, 83495–83510. [Google Scholar] [CrossRef]
  62. Zhong, J.; Sun, J.; Lai, Z. ICESat-2 and Multi-Spectral Images Based Coral Reefs Geomorphic Zone Mapping Using a Deep Learning Approach. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2024, 17, 6085–6098. [Google Scholar] [CrossRef]
  63. Du, L.; You, X.; Li, K.; Meng, L.; Cheng, G.; Xiong, L.; Wang, G. Multi-Modal Deep Learning for Landform Recognition. ISPRS J. Photogramm. Remote Sens. 2019, 158, 63–75. [Google Scholar] [CrossRef]
  64. Janowski, L.; Tylmann, K.; Trzcinska, K.; Rudowski, S.; Tegowski, J. Exploration of Glacial Landforms by Object-Based Image Analysis and Spectral Parameters of Digital Elevation Model. IEEE Trans. Geosci. Remote Sens. 2022, 60, 1–17. [Google Scholar] [CrossRef]
  65. Salas, E.; Argialas, D. Automatic Identification of Marine Geomorphologic Features Using Convolutional Neural Networks in Seafloor Digital Elevation Models: Segmentation of DEM for Marine Geomorphologic Feature Mapping with Deep Learning Algorithms. In Proceedings of the 12th Hellenic Conference on Artificial Intelligence, Corfu, Greece, 7–9 September 2022; pp. 1–8. [Google Scholar]
  66. Padarian, J.; Minasny, B.; McBratney, A.B. Using Deep Learning for Digital Soil Mapping. Soil 2019, 5, 79–89. [Google Scholar] [CrossRef]
  67. Habumugisha, J.M.; Chen, N.; Rahman, M.; Islam, M.M.; Ahmad, H.; Elbeltagi, A.; Sharma, G.; Liza, S.N.; Dewan, A. Landslide Susceptibility Mapping with Deep Learning Algorithms. Sustainability 2022, 14, 1734. [Google Scholar] [CrossRef]
  68. Huang, F.; Zhang, J.; Zhou, C.; Wang, Y.; Huang, J.; Zhu, L. A Deep Learning Algorithm Using a Fully Connected Sparse Autoencoder Neural Network for Landslide Susceptibility Prediction. Landslides 2020, 17, 217–229. [Google Scholar] [CrossRef]
  69. Odom, W.; Doctor, D. Rapid Estimation of Minimum Depth-to-Bedrock from Lidar Leveraging Deep-Learning-Derived Surficial Material Maps. Appl. Comput. Geosci. 2023, 18, 100116. [Google Scholar] [CrossRef]
  70. Suh, J.W.; Anderson, E.; Ouimet, W.; Johnson, K.M.; Witharana, C. Mapping Relict Charcoal Hearths in New England Using Deep Convolutional Neural Networks and LiDAR Data. Remote Sens. 2021, 13, 4630. [Google Scholar] [CrossRef]
  71. Banasiak, P.Z.; Berezowski, P.L.; Zapłata, R.; Mielcarek, M.; Duraj, K.; Stereńczak, K. Semantic Segmentation (U-Net) of Archaeological Features in Airborne Laser Scanning—Example of the Białowieża Forest. Remote Sens. 2022, 14, 995. [Google Scholar] [CrossRef]
  72. Chen, L.-C.; Zhu, Y.; Papandreou, G.; Schroff, F.; Adam, H. Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 801–818. [Google Scholar]
  73. Chen, Z.; Zhang, T.; Ouyang, C. End-to-End Airplane Detection Using Transfer Learning in Remote Sensing Images. Remote Sens. 2018, 10, 139. [Google Scholar] [CrossRef]
  74. Brown, C.F.; Brumby, S.P.; Guzder-Williams, B.; Birch, T.; Hyde, S.B.; Mazzariello, J.; Czerwinski, W.; Pasquarella, V.J.; Haertel, R.; Ilyushchenko, S.; et al. Dynamic World, Near Real-Time Global 10 m Land Use Land Cover Mapping. Sci. Data 2022, 9, 251. [Google Scholar] [CrossRef]
  75. Hu, K.; Zhang, D.; Xia, M. CDUNet: Cloud Detection UNet for Remote Sensing Imagery. Remote Sens. 2021, 13, 4533. [Google Scholar] [CrossRef]
  76. Rastogi, K.; Bodani, P.; Sharma, S.A. Automatic Building Footprint Extraction from Very High-Resolution Imagery Using Deep Learning Techniques. Geocarto Int. 2022, 37, 1501–1513. [Google Scholar] [CrossRef]
  77. Maxwell, A.E.; Bester, M.S.; Guillen, L.A.; Ramezan, C.A.; Carpinello, D.J.; Fan, Y.; Hartley, F.M.; Maynard, S.M.; Pyron, J.L. Semantic Segmentation Deep Learning for Extracting Surface Mine Extents from Historic Topographic Maps. Remote Sens. 2020, 12, 4145. [Google Scholar] [CrossRef]
  78. Cao, K.; Zhang, X. An Improved Res-Unet Model for Tree Species Classification Using Airborne High-Resolution Images. Remote Sens. 2020, 12, 1128. [Google Scholar] [CrossRef]
  79. Wei, S.; Zhang, H.; Wang, C.; Wang, Y.; Xu, L. Multi-Temporal SAR Data Large-Scale Crop Mapping Based on U-Net Model. Remote Sens. 2019, 11, 68. [Google Scholar] [CrossRef]
  80. Agarap, A.F. Deep Learning Using Rectified Linear Units (Relu). arXiv 2018, arXiv:1803.08375. [Google Scholar]
  81. Sharma, S.; Sharma, S.; Athaiya, A. Activation Functions in Neural Networks. Towards Data Sci. 2017, 6, 310–316. [Google Scholar] [CrossRef]
  82. Bjorck, J.; Gomes, C.; Selman, B.; Weinberger, K.Q. Understanding Batch Normalization. arXiv 2018, arXiv:1806.02375. [Google Scholar]
  83. Ioffe, S.; Szegedy, C. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. In Proceedings of the International Conference on Machine Learning, PMLR, Lille, France, 6–11 July 2015; pp. 448–456. [Google Scholar]
  84. Luo, P.; Wang, X.; Shao, W.; Peng, Z. Towards Understanding Regularization in Batch Normalization. arXiv 2018, arXiv:1809.00846. [Google Scholar]
  85. Bengio, Y. Deep Learning of Representations for Unsupervised and Transfer Learning. In Proceedings of the ICML Workshop on Unsupervised and Transfer Learning, Bellevue, WA, USA, 2 July 2012; JMLR Workshop and Conference Proceedings. pp. 17–36. [Google Scholar]
  86. Krishna, S.T.; Kalluri, H.K. Deep Learning and Transfer Learning Approaches for Image Classification. Int. J. Recent Technol. Eng. (IJRTE) 2019, 7, 427–432. [Google Scholar]
  87. Li, W.; Wang, Z.; Wang, Y.; Wu, J.; Wang, J.; Jia, Y.; Gui, G. Classification of High-Spatial-Resolution Remote Sensing Scenes Method Using Transfer Learning and Deep Convolutional Neural Network. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2020, 13, 1986–1995. [Google Scholar] [CrossRef]
  88. Zhao, B.; Huang, B.; Zhong, Y. Transfer Learning With Fully Pretrained Deep Convolution Networks for Land-Use Classification. IEEE Geosci. Remote Sens. Lett. 2017, 14, 1436–1440. [Google Scholar] [CrossRef]
  89. Davari Majd, R.; Momeni, M.; Moallem, P. Transferable Object-Based Framework Based on Deep Convolutional Neural Networks for Building Extraction. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2019, 12, 2627–2635. [Google Scholar] [CrossRef]
  90. Prakash, P.S.; Soni, J.; Bharath, H.A. Building Extraction from Remote Sensing Images Using Deep Learning and Transfer Learning. In Proceedings of the IGARSS 2022—2022 IEEE International Geoscience and Remote Sensing Symposium, Kuala Lumpur, Malaysia, 17–22 July 2022; pp. 3079–3082. [Google Scholar]
  91. 3D Elevation Program|U.S. Geological Survey. Available online: https://www.usgs.gov/3d-elevation-program (accessed on 21 May 2024).
  92. R Core Team. R: A Language and Environment for Statistical Computing; R Foundation for Statistical Computing: Vienna, Austria, 2022. [Google Scholar]
  93. Mcneely, R.; Logan, A.A.; Obrecht, J.; Giglierano, J.; Wolter, C. Iowa Best Management Practices (BMP) Mapping Project Handbook; Iowa State University (ISU): Ames, IA, USA, 2017; 37p. [Google Scholar]
  94. Miller, A.J.; Zégre, N.P. Mountaintop Removal Mining and Catchment Hydrology. Water 2014, 6, 472–499. [Google Scholar] [CrossRef]
  95. Palmer, M.A.; Bernhardt, E.S.; Schlesinger, W.H.; Eshleman, K.N.; Foufoula-Georgiou, E.; Hendryx, M.S.; Lemly, A.D.; Likens, G.E.; Loucks, O.L.; Power, M.E.; et al. Mountaintop Mining Consequences. Science 2010, 327, 148–149. [Google Scholar] [CrossRef]
  96. Ross, M.R.V.; McGlynn, B.L.; Bernhardt, E.S. Deep Impact: Effects of Mountaintop Mining on Surface Topography, Bedrock Structure, and Downstream Waters. Environ. Sci. Technol. 2016, 50, 2064–2074. [Google Scholar] [CrossRef]
  97. Wickham, J.; Wood, P.B.; Nicholson, M.C.; Jenkins, W.; Druckenbrod, D.; Suter, G.W.; Strager, M.P.; Mazzarella, C.; Galloway, W.; Amos, J. The Overlooked Terrestrial Impacts of Mountaintop Mining. BioScience 2013, 63, 335–348. [Google Scholar] [CrossRef]
  98. Hijmans, R.J. Terra: Spatial Data Analysis; 2024. Available online: https://cran.r-project.org/web/packages/terra/index.html (accessed on 18 May 2024).
  99. Ilich, A.R.; Misiuk, B.; Lecours, V.; Murawski, S.A. MultiscaleDTM 2021. Available online: https://cran.r-project.org/web/packages/MultiscaleDTM/index.html (accessed on 18 May 2024).
  100. Whitebox Geospatial. Available online: https://www.whiteboxgeo.com/ (accessed on 18 May 2024).
  101. Lindsay, J.B. Whitebox GAT: A Case Study in Geomorphometric Analysis. Comput. Geosci. 2016, 95, 75–84. [Google Scholar] [CrossRef]
  102. PyTorch. Available online: https://pytorch.org/ (accessed on 30 October 2023).
  103. Welcome to Python.Org. Available online: https://www.python.org/ (accessed on 17 October 2022).
  104. CUDA Deep Neural Network (cuDNN)|NVIDIA Developer. Available online: https://developer.nvidia.com/cudnn (accessed on 30 October 2023).
  105. CUDA Toolkit—Free Tools and Training. Available online: https://developer.nvidia.com/cuda-toolkit (accessed on 30 October 2023).
  106. Iakubovskii, P. Segmentation Models Pytorch. GitHub Repos. 2019. [Google Scholar]
  107. Loshchilov, I.; Hutter, F. Decoupled Weight Decay Regularization. arXiv 2017, arXiv:1711.05101. [Google Scholar]
  108. Smith, L.N. Cyclical Learning Rates for Training Neural Networks. In Proceedings of the 2017 IEEE Winter Conference on Applications of Computer Vision (WACV), Santa Rosa, CA, USA, 24–31 March 2017; pp. 464–472. [Google Scholar]
  109. Smith, L.N.; Topin, N. Super-Convergence: Very Fast Training of Neural Networks Using Large Learning Rates. In Proceedings of the Artificial Intelligence and Machine Learning for Multi-Domain Operations Applications, Baltimore, MD, USA, 15–17 April 2019; International Society for Optics and Photonics: Bellingham, WA, USA, 2019; Volume 11006, p. 1100612. [Google Scholar]
  110. Phan, T.H.; Yamamoto, K. Resolving Class Imbalance in Object Detection with Weighted Cross Entropy Losses. arXiv 2020, arXiv:2006.01413. [Google Scholar]
  111. Kingma, D.P.; Ba, J. Adam: A Method for Stochastic Optimization. arXiv 2014, arXiv:1412.6980. [Google Scholar]
  112. Abraham, N.; Khan, N.M. A Novel Focal Tversky Loss Function with Improved Attention U-Net for Lesion Segmentation. arXiv 2018, arXiv:1810.07842. [Google Scholar]
  113. Hashemi, S.R.; Salehi, S.S.M.; Erdogmus, D.; Prabhu, S.P.; Warfield, S.K.; Gholipour, A. Tversky as a Loss Function for Highly Unbalanced Image Segmentation Using 3d Fully Convolutional Deep Networks. arXiv 2018, arXiv:1803.11078. [Google Scholar]
  114. Ma, J.; Chen, J.; Ng, M.; Huang, R.; Li, Y.; Li, C.; Yang, X.; Martel, A.L. Loss Odyssey in Medical Image Segmentation. Med. Image Anal. 2021, 71, 102035. [Google Scholar] [CrossRef]
  115. Tharwat, A. Classification Assessment Methods. Appl. Comput. Inform. 2020, 17, 168–192. [Google Scholar] [CrossRef]
  116. Kornblith, S.; Norouzi, M.; Lee, H.; Hinton, G. Similarity of Neural Network Representations Revisited. In Proceedings of the International Conference on Machine Learning, PMLR, Long Beach, CA, USA, 9–15 June 2019; pp. 3519–3529. [Google Scholar]
  117. Nguyen, T.; Raghu, M.; Kornblith, S. Do Wide and Deep Networks Learn the Same Things? Uncovering How Neural Network Representations Vary with Width and Depth. arXiv 2020, arXiv:2010.15327. [Google Scholar]
  118. Kim, D. Numpee/CKA.Pytorch. 2024. Available online: https://github.com/numpee/CKA.pytorch (accessed on 21 May 2024).
  119. Lin, T.; Maire, M.; Belongie, S.; Hays, J.; Perona, P.; Ramanan, D.; Dollár, P.; Zitnick, C. Microsoft Coco: Common Objects in Context. In Computer Vision–ECCV 2014: 13th European Conference, Zurich, Switzerland, September 6–12, 2014, Proceedings, Part V 13; Springer: Cham, Switzerland, 2014; pp. 740–755. [Google Scholar]
  120. Kawaguchi, K.; Kaelbling, L.P.; Bengio, Y. Generalization in Deep Learning. arXiv 2017, arXiv:1710.05468. [Google Scholar]
  121. Penatti, O.A.; Nogueira, K.; Dos Santos, J.A. Do Deep Features Generalize from Everyday Objects to Remote Sensing and Aerial Scenes Domains? In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Boston, MA, USA, 7–12 June 2015; pp. 44–51. [Google Scholar]
  122. Schwindt, S.; Meisinger, L.; Negreiros, B.; Schneider, T.; Nowak, W. Transfer Learning Achieves High Recall for Object Classification in Fluvial Environments with Limited Data. Geomorphology 2024, 455, 109185. [Google Scholar] [CrossRef]
  123. Maxwell, A.E.; Bester, M.S.; Ramezan, C.A. Enhancing Reproducibility and Replicability in Remote Sensing Deep Learning Research and Practice. Remote Sens. 2022, 14, 5760. [Google Scholar] [CrossRef]
  124. Krishnan, S.; Crosby, C.; Nandigam, V.; Phan, M.; Cowart, C.; Baru, C.; Arrowsmith, R. OpenTopography: A Services Oriented Architecture for Community Access to LIDAR Topography. In Proceedings of the 2nd International Conference on Computing for Geospatial Research & Applications, Washington, DC, USA, 23–25 May 2011; pp. 1–8. [Google Scholar]
  125. Mai, G.; Huang, W.; Sun, J.; Song, S.; Mishra, D.; Liu, N.; Gao, S.; Liu, T.; Cong, G.; Hu, Y.; et al. On the Opportunities and Challenges of Foundation Models for Geospatial Artificial Intelligence. arXiv 2023, arXiv:2304.06798. [Google Scholar]
  126. Mai, G.; Cundy, C.; Choi, K.; Hu, Y.; Lao, N.; Ermon, S. Towards a Foundation Model for Geospatial Artificial Intelligence (Vision Paper). In Proceedings of the 30th International Conference on Advances in Geographic Information Systems, Seattle, WA, USA, 1–4 November 2022; Association for Computing Machinery: New York, NY, USA, 2022; pp. 1–4. [Google Scholar]
  127. Mendieta, M.; Han, B.; Shi, X.; Zhu, Y.; Chen, C. Towards Geospatial Foundation Models via Continual Pretraining. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Paris, France, 2–6 October 2023; pp. 16806–16816. [Google Scholar]
  128. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention Is All You Need. In Proceedings of the 31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA, 4–9 December 2017. [Google Scholar]
  129. Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An Image Is Worth 16x16 Words: Transformers for Image Recognition at Scale. arXiv 2020, arXiv:2010.11929. [Google Scholar]
  130. Xie, E.; Wang, W.; Yu, Z.; Anandkumar, A.; Alvarez, J.M.; Luo, P. SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers. Adv. Neural Inf. Process. Syst. 2021, 34, 12077–12090. [Google Scholar]
Figure 1. (a) Location of 10 × 10 km USGS 3DEP tiles used in this study for which geomorphons were calculated; (b) training, validation, and testing extents for agricultural terrace (terraceDL) dataset [21] in Iowa, USA; (c) training, validation, and testing extents for surface coal mining valley fill face (vfillDL) dataset [22] in West Virginia, Kentucky, and Virginia, USA.
Figure 2. Example agricultural terraces in Iowa, USA (a) and valley fill faces (c) in the Appalachian southern coalfields of the eastern United States. Red areas in (a,c) show the extents of terraces and valley fill faces, respectively, over a multidirectional hillshade. The LSPs used in this study are visualized in (b,d) (red = TPI calculated with a 50 m circular window; green = square root of slope; blue = TPI calculated with a 2 m inner and 5 m outer annulus window). Coordinates are relative to the NAD83 UTM Zone 15N projection for (a,b) and the NAD83 UTM Zone 17N projection for (c,d).
Figure 3. Example land surface parameter (LSP) composite image chips (a) and associated geomorphon classifications (b). Chips were selected from random locations within the extent of the downloaded 3DEP DTM data. Each chip consists of 512 × 512 cells with a spatial resolution of 2 m.
Figure 4. Conceptualization of UNet architecture [19] with the ResNet-34 [20] encoder backbone used in this study. E = encoder; D = decoder, CH = classification head, LSPs = land surface parameters; Conv = convolutional layer, BN = batch normalization, and ReLU = rectified linear unit.
Figure 5. Example classification results using a random parameter initiation and 1000 training chips. (a) Multidirectional hillshade for example agricultural terrace classification; (b) reference agricultural terrace data; (c) agricultural terrace classification result; (d) multidirectional hillshade for example valley fill face classification; (e) reference valley fill face data; (f) valley fill face classification result; (g) multidirectional hillshade for example geomorphon classification; (h) reference geomorphon data; (i) geomorphon classification result.
Figure 6. Training loss for terraceDL (a) and vfillDL (b) datasets using 1000 training samples, different weight initiations, and with the encoder frozen or unfrozen across all 50 training epochs. Magnified area shows results for epochs 40 through 50.
Figure 7. Validation F1-score for terraceDL (a) and vfillDL (b) datasets using 1000 training samples, different weight initiations, and with the encoder frozen or unfrozen across all 50 training epochs. Magnified area shows results for epochs 40 through 50.
Figure 8. Training loss for terraceDL (a) and vfillDL (b) datasets using varying training sample sizes, different weight initiations, and with the encoder frozen or unfrozen. Magnified area shows results for epochs 40 through 50.
Figure 9. Validation F1-score for terraceDL (a) and vfillDL (b) datasets using varying training sample sizes, different weight initiations, and with the encoder frozen or unfrozen. Magnified area shows results for epochs 40 through 50.
Figure 10. Assessment metrics calculated from the withheld test data for terraceDL (top) and vfillDL (bottom) datasets using different weight initiations and with the encoder frozen and unfrozen. Results reflect the experiment using 1000 training chips and the model parameters associated with the training epoch that provided the highest F1-score for the validation data.
Figure 11. Assessment metrics for withheld test data for terraceDL dataset using different training sample sizes, weight initiations, and with the encoder frozen and unfrozen.
Figure 12. Assessment metrics for withheld test data for vfillDL dataset using different training sample sizes, weight initiations, and with the encoder frozen and unfrozen.
Figure 13. CKA analysis results for each convolutional layer in the architecture. Each graph represents a comparison of a pair of models. Each compared model was trained from a random initialization and using the largest training set available for the specific task. Since the ImageNet weights are not available for the decoder, the decoder blocks were not compared when ImageNet was included in the pair.
Table 1. Count of features in the training, validation, and testing sets within the terraceDL and vfillDL datasets.
Dataset | Training | Validation | Testing
Agricultural terraces | 66,000 | 15,505 | 26,453
Valley fill faces | 1105 | 304 | 874
Table 2. Summary of number of training, validation, and testing chips available for each dataset used.
Dataset | Training | Validation | Testing
Geomorphons | 28,800 | 6228 | 7200
Agricultural terraces | 50, 100, 250, 500, 750, 1000 | 1488 | 4183
Valley fill faces | 50, 100, 250, 500, 750, 1000 | 226 | 620
Table 3. Layers in UNet architecture used in this study with associated abbreviations used in Figure 4, input layers, array size of input to and output of the layer, and number of trainable parameters. MB = mini-batch or sample dimension, E = encoder, D = decoder, CH = classification head, LSPs = land surface parameters; Conv = convolutional layer, BN = batch normalization, and ReLU = rectified linear unit. Array shapes are defined as [mini-batch size, number of feature maps, width, height].
Name | Input(s) | Layer | Input Shape | Output Shape | Parameters
Input | – | LSPs | – | [MB, 3, 512, 512] | –
E1 | Input | 7 × 7 2D Conv (stride = 2) + BN + ReLU | [MB, 3, 512, 512] | [MB, 64, 256, 256] | 9536
E2 | E1 | 2 × 2 2D max pool (stride = 2) | [MB, 64, 256, 256] | [MB, 64, 128, 128] | –
E2 | | ResNet block | [MB, 64, 128, 128] | [MB, 64, 128, 128] | 73,984
E2 | | ResNet block | [MB, 64, 128, 128] | [MB, 64, 128, 128] | 73,984
E2 | | ResNet block | [MB, 64, 128, 128] | [MB, 64, 128, 128] | 73,984
E3 | E2 | ResNet block + downsample | [MB, 64, 128, 128] | [MB, 128, 64, 64] | 230,144
E3 | | ResNet block | [MB, 128, 64, 64] | [MB, 128, 64, 64] | 295,424
E3 | | ResNet block | [MB, 128, 64, 64] | [MB, 128, 64, 64] | 295,424
E3 | | ResNet block | [MB, 128, 64, 64] | [MB, 128, 64, 64] | 295,424
E4 | E3 | ResNet block + downsample | [MB, 128, 64, 64] | [MB, 256, 32, 32] | 919,040
E4 | | ResNet block | [MB, 256, 32, 32] | [MB, 256, 32, 32] | 1,180,672
E4 | | ResNet block | [MB, 256, 32, 32] | [MB, 256, 32, 32] | 1,180,672
E4 | | ResNet block | [MB, 256, 32, 32] | [MB, 256, 32, 32] | 1,180,672
E4 | | ResNet block | [MB, 256, 32, 32] | [MB, 256, 32, 32] | 1,180,672
E4 | | ResNet block | [MB, 256, 32, 32] | [MB, 256, 32, 32] | 1,180,672
E5 | E4 | ResNet block + downsample | [MB, 256, 32, 32] | [MB, 512, 16, 16] | 3,673,088
E5 | | ResNet block | [MB, 512, 16, 16] | [MB, 512, 16, 16] | 4,720,640
E5 | | ResNet block | [MB, 512, 16, 16] | [MB, 512, 16, 16] | 4,720,640
D1 | E5 + E4 | Decoder block 1 | [MB, 256 + 512, 32, 32] | [MB, 256, 32, 32] | 2,360,320
D2 | D1 + E3 | Decoder block 2 | [MB, 128 + 256, 64, 64] | [MB, 128, 64, 64] | 590,336
D3 | D2 + E2 | Decoder block 3 | [MB, 64 + 128, 128, 128] | [MB, 64, 128, 128] | 147,712
D4 | D3 + E1 | Decoder block 4 | [MB, 64 + 64, 256, 256] | [MB, 32, 256, 256] | 46,208
D5 | D4 | Decoder block 5 | [MB, 32, 512, 512] | [MB, 16, 512, 512] | 6976
CH | D5 | Classification head | [MB, 16, 512, 512] | [MB, 1, 512, 512] | 145
Total | | | | | 24,436,369
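As a sanity check on the totals above, the parameter count can be reproduced with a few lines of PyTorch; this is a minimal sketch assuming the Segmentation Models PyTorch package [106], and the exact count may vary slightly across package versions and decoder configurations.

```python
import segmentation_models_pytorch as smp

model = smp.Unet(encoder_name="resnet34", encoder_weights=None,
                 in_channels=3, classes=1)

# Sum the number of elements across all parameter tensors.
n_params = sum(p.numel() for p in model.parameters())
print(f"{n_params:,}")  # expected to be on the order of 24.4 million
```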
Table 4. Summary of experiments performed. “X” indicates combinations that were used and were conducted using all available training sample sizes.
Dataset | Initiation | Frozen | Unfrozen
terraceDL | Random | | X
terraceDL | Geomorphons | X | X
terraceDL | vfillDL | X | X
terraceDL | ImageNet | X | X
vfillDL | Random | | X
vfillDL | Geomorphons | X | X
vfillDL | terraceDL | X | X
vfillDL | ImageNet | X | X
Table 5. Assessment metrics used in this study.
Metric | Equation
Overall Accuracy (OA) | $\frac{\text{Total Correct}}{\text{Total Samples}}$
Recall | $\frac{1}{N}\sum_{j=1}^{C}\frac{TP_j}{TP_j + FN_j}$
Precision | $\frac{1}{N}\sum_{j=1}^{C}\frac{TP_j}{TP_j + FP_j}$
F1-Score | $\frac{2 \times \text{Recall} \times \text{Precision}}{\text{Recall} + \text{Precision}}$
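The aggregated metrics in Table 5 can be computed from per-class true positive (TP), false positive (FP), and false negative (FN) counts as in the short sketch below; this is a generic implementation of the macro-averaged equations above, not the assessment code used in this study.

```python
def macro_metrics(tp, fp, fn):
    """Macro-averaged recall, precision, and F1-score from per-class
    TP/FP/FN counts (sequences indexed by class)."""
    n = len(tp)
    recall = sum(t / (t + f) for t, f in zip(tp, fn)) / n
    precision = sum(t / (t + f) for t, f in zip(tp, fp)) / n
    f1 = 2 * recall * precision / (recall + precision)
    return recall, precision, f1
```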
Table 6. Testing set assessment metrics for prediction of agricultural terraces using different parameter initiations, a frozen or unfrozen encoder, and varying sample sizes.
Initiation | Frozen/Unfrozen | Training Sample Size (Number of Chips) | OA | F1-Score | Recall | Precision
Random | Unfrozen | 50 | 0.982 | 0.367 | 0.456 | 0.308
Random | Unfrozen | 100 | 0.989 | 0.457 | 0.395 | 0.541
Random | Unfrozen | 250 | 0.989 | 0.497 | 0.447 | 0.560
Random | Unfrozen | 500 | 0.990 | 0.525 | 0.489 | 0.566
Random | Unfrozen | 750 | 0.990 | 0.531 | 0.490 | 0.579
Random | Unfrozen | 1000 | 0.990 | 0.548 | 0.536 | 0.560
ImageNet | Frozen | 50 | 0.985 | 0.432 | 0.473 | 0.398
ImageNet | Frozen | 100 | 0.988 | 0.486 | 0.470 | 0.505
ImageNet | Frozen | 250 | 0.990 | 0.507 | 0.455 | 0.571
ImageNet | Frozen | 500 | 0.990 | 0.527 | 0.489 | 0.571
ImageNet | Frozen | 750 | 0.990 | 0.532 | 0.489 | 0.582
ImageNet | Frozen | 1000 | 0.990 | 0.548 | 0.522 | 0.577
ImageNet | Unfrozen | 50 | 0.986 | 0.454 | 0.490 | 0.423
ImageNet | Unfrozen | 100 | 0.987 | 0.485 | 0.523 | 0.451
ImageNet | Unfrozen | 250 | 0.990 | 0.484 | 0.404 | 0.602
ImageNet | Unfrozen | 500 | 0.990 | 0.528 | 0.475 | 0.595
ImageNet | Unfrozen | 750 | 0.990 | 0.534 | 0.486 | 0.591
ImageNet | Unfrozen | 1000 | 0.990 | 0.544 | 0.504 | 0.592
Geomorphons | Frozen | 50 | 0.985 | 0.308 | 0.289 | 0.331
Geomorphons | Frozen | 100 | 0.984 | 0.371 | 0.392 | 0.352
Geomorphons | Frozen | 250 | 0.987 | 0.412 | 0.399 | 0.425
Geomorphons | Frozen | 500 | 0.988 | 0.451 | 0.427 | 0.478
Geomorphons | Frozen | 750 | 0.988 | 0.460 | 0.435 | 0.489
Geomorphons | Frozen | 1000 | 0.988 | 0.469 | 0.437 | 0.508
Geomorphons | Unfrozen | 50 | 0.981 | 0.290 | 0.336 | 0.255
Geomorphons | Unfrozen | 100 | 0.987 | 0.357 | 0.315 | 0.413
Geomorphons | Unfrozen | 250 | 0.987 | 0.392 | 0.352 | 0.442
Geomorphons | Unfrozen | 500 | 0.988 | 0.439 | 0.392 | 0.499
Geomorphons | Unfrozen | 750 | 0.989 | 0.454 | 0.393 | 0.538
Geomorphons | Unfrozen | 1000 | 0.989 | 0.456 | 0.397 | 0.534
vfillDL | Frozen | 50 | 0.980 | 0.348 | 0.445 | 0.285
vfillDL | Frozen | 100 | 0.987 | 0.425 | 0.415 | 0.435
vfillDL | Frozen | 250 | 0.989 | 0.456 | 0.409 | 0.515
vfillDL | Frozen | 500 | 0.988 | 0.476 | 0.477 | 0.475
vfillDL | Frozen | 750 | 0.989 | 0.491 | 0.462 | 0.522
vfillDL | Frozen | 1000 | 0.989 | 0.506 | 0.489 | 0.524
vfillDL | Unfrozen | 50 | 0.986 | 0.372 | 0.366 | 0.379
vfillDL | Unfrozen | 100 | 0.987 | 0.427 | 0.400 | 0.458
vfillDL | Unfrozen | 250 | 0.988 | 0.463 | 0.427 | 0.505
vfillDL | Unfrozen | 500 | 0.989 | 0.485 | 0.433 | 0.551
vfillDL | Unfrozen | 750 | 0.989 | 0.495 | 0.441 | 0.562
vfillDL | Unfrozen | 1000 | 0.990 | 0.501 | 0.447 | 0.570
Table 7. Testing set assessment metrics for prediction of surface coal mine valley fill faces using different parameter initiations, a frozen or unfrozen encoder, and varying sample sizes.
Initiation | Frozen/Unfrozen | Training Sample Size (Number of Chips) | OA | F1-Score | Recall | Precision
Random | Unfrozen | 50 | 0.966 | 0.574 | 0.519 | 0.642
Random | Unfrozen | 100 | 0.972 | 0.631 | 0.540 | 0.758
Random | Unfrozen | 250 | 0.974 | 0.683 | 0.631 | 0.745
Random | Unfrozen | 500 | 0.975 | 0.686 | 0.619 | 0.770
Random | Unfrozen | 750 | 0.976 | 0.696 | 0.620 | 0.792
Random | Unfrozen | 1000 | 0.976 | 0.709 | 0.652 | 0.776
ImageNet | Frozen | 50 | 0.966 | 0.540 | 0.452 | 0.670
ImageNet | Frozen | 100 | 0.970 | 0.601 | 0.505 | 0.744
ImageNet | Frozen | 250 | 0.972 | 0.625 | 0.514 | 0.797
ImageNet | Frozen | 500 | 0.973 | 0.637 | 0.524 | 0.812
ImageNet | Frozen | 750 | 0.975 | 0.676 | 0.591 | 0.788
ImageNet | Frozen | 1000 | 0.975 | 0.672 | 0.575 | 0.809
ImageNet | Unfrozen | 50 | 0.969 | 0.548 | 0.417 | 0.798
ImageNet | Unfrozen | 100 | 0.972 | 0.624 | 0.513 | 0.795
ImageNet | Unfrozen | 250 | 0.975 | 0.662 | 0.555 | 0.821
ImageNet | Unfrozen | 500 | 0.976 | 0.688 | 0.586 | 0.835
ImageNet | Unfrozen | 750 | 0.977 | 0.698 | 0.596 | 0.844
ImageNet | Unfrozen | 1000 | 0.977 | 0.705 | 0.620 | 0.819
Geomorphons | Frozen | 50 | 0.916 | 0.374 | 0.563 | 0.280
Geomorphons | Frozen | 100 | 0.954 | 0.469 | 0.453 | 0.487
Geomorphons | Frozen | 250 | 0.963 | 0.504 | 0.416 | 0.639
Geomorphons | Frozen | 500 | 0.967 | 0.531 | 0.412 | 0.746
Geomorphons | Frozen | 750 | 0.968 | 0.577 | 0.483 | 0.718
Geomorphons | Frozen | 1000 | 0.970 | 0.592 | 0.482 | 0.768
Geomorphons | Unfrozen | 50 | 0.956 | 0.372 | 0.295 | 0.505
Geomorphons | Unfrozen | 100 | 0.961 | 0.418 | 0.311 | 0.638
Geomorphons | Unfrozen | 250 | 0.965 | 0.480 | 0.367 | 0.695
Geomorphons | Unfrozen | 500 | 0.965 | 0.572 | 0.526 | 0.628
Geomorphons | Unfrozen | 750 | 0.969 | 0.560 | 0.444 | 0.758
Geomorphons | Unfrozen | 1000 | 0.970 | 0.610 | 0.523 | 0.734
terraceDL | Frozen | 50 | 0.969 | 0.559 | 0.444 | 0.753
terraceDL | Frozen | 100 | 0.970 | 0.598 | 0.495 | 0.756
terraceDL | Frozen | 250 | 0.971 | 0.604 | 0.495 | 0.774
terraceDL | Frozen | 500 | 0.973 | 0.654 | 0.573 | 0.763
terraceDL | Frozen | 750 | 0.973 | 0.638 | 0.542 | 0.776
terraceDL | Frozen | 1000 | 0.973 | 0.647 | 0.547 | 0.791
terraceDL | Unfrozen | 50 | 0.970 | 0.599 | 0.505 | 0.736
terraceDL | Unfrozen | 100 | 0.971 | 0.615 | 0.511 | 0.771
terraceDL | Unfrozen | 250 | 0.973 | 0.644 | 0.546 | 0.785
terraceDL | Unfrozen | 500 | 0.974 | 0.657 | 0.558 | 0.798
terraceDL | Unfrozen | 750 | 0.974 | 0.673 | 0.603 | 0.762
terraceDL | Unfrozen | 1000 | 0.974 | 0.675 | 0.597 | 0.776