Abstract
Hyperspectral image processing techniques involve time-consuming calculations due to the large volume and complexity of the data. Indeed, hyperspectral scenes contain a wealth of spatial and spectral information thanks to the hundreds of narrow and continuous bands collected across the electromagnetic spectrum. Predictive models, particularly supervised machine learning classifiers, take advantage of this information to predict the pixel categories of images through a training set of real observations. Most notably, the Support Vector Machine (SVM) has demonstrated impressive accuracy results for image classification. Notwithstanding the performance offered by SVMs, dealing with such a large volume of data is computationally challenging. In this paper, a scalable and high-performance cloud-based approach for the distributed training of SVMs is proposed. The proposal addresses the overwhelming amount of remote sensing (RS) data through a parallel training allocation. The implementation is performed over a memory-efficient Apache Spark distributed environment. Experiments are performed on a benchmark of real hyperspectral scenes to show the robustness of the proposal. The obtained results demonstrate efficient classification whilst optimising data processing in terms of training times.
Introduction
Modern contributions in hyperspectral image (HSI) processing, including recent Remote Sensing (RS) image acquisition expeditions, have promoted the exploitation of the information contained in such data [1]. HSI data offers promising insights to characterize the Earth's surface through spectrometer sampling. Hence, the composition of ground materials is determined over hundreds of spectral bands by the representation of distinct features such as spectrum, shape or textures [2]. Thus, HSI scenes exhibit, pixel by pixel, the properties of the materials over the data cube \(\textbf{X} \in \mathbb {R}^{H\times W\times N}\), where \(H,\,W\), and N are the height, width and channel dimensions, respectively. Also, the data can be represented in matrix form as \(\textbf{X} \in \mathbb {R}^{M\times N}\), where the M rows are the pixel vectors. Accordingly, each pixel has a spectral signature represented as an N-dimensional array, which depends on the number of bands. Generally, a larger number of bands represents the pixel materials more accurately. In this context, remotely sensed images provide rich spectral information that can be applied to diverse applications such as agriculture [3] or urban planning [4], using different classification approaches [5], e.g. land cover classification [6].
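As a purely illustrative sketch (not part of the proposal), the cube-to-matrix view can be expressed in a few lines of Scala, assuming the cube is held as a nested array:

```scala
// Illustrative sketch: flatten an H x W x N hyperspectral cube into an
// (H*W) x N matrix whose rows are the pixel spectral signatures.
def cubeToMatrix(cube: Array[Array[Array[Double]]]): Array[Array[Double]] =
  for {
    row   <- cube   // H rows
    pixel <- row    // W pixels per row, each holding N band values
  } yield pixel     // one N-dimensional spectral signature per pixel
```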
The overwhelming detail within the spectral dimension entails a heavy computational demand for its processing. In this regard, Deep Neural Network (DNN)-based classifiers take advantage of this information to determine pixel classes. Nevertheless, a large spectral dimension complicates the class prediction, increasing classifiers’ complexity. Classifiers must address a remarkable amount of features to select and extract the best representation of each class. This feature selection step is determined through a training procedure by matching network predictions with real labeled data. Hence, a noteworthy memory space is required to handle storage requirements. In order to address these needs, many efforts have been invested in the development of efficient training techniques. For instance, Wu et al. [7] proposed parallel techniques to optimize the classification on multi-core devices. These techniques have been brought to distributed High Performance Computing (HPC) environments to efficiently tackle the high computational demands of RS image analysis [8]. Also, dedicated grid computing [9] platforms have explored similar satellite image processing applications. Alternatively, Cloud Computing (CC) environments provide a powerful centralized infrastructure with high performance capabilities. In this context, CC brings a service-oriented environment over a cluster with multiple machines and massive storage. Machines are launched through virtualization protocols, offering ease of use and effective distributed solutions. An additional advantage of CC is the possibility of increasing the computing resources offered by providers, such as Amazon Web Services (AWS) [10] or Microsoft Azure [11]. This facilitates the adjustment of the resources needed to meet the computing requirements.
Workload parallelization approaches aim to reduce the training time of classification algorithms. In this regard, determining the classifier that is most suited for research based on its behavior and performance is highly advantageous. For instance, the Multinomial Logistic Regression (MLR) classifier predicts the variable probability of category membership according to independent observations. Indeed, MLR obtains a noteworthy performance in terms of accuracy with a low amount of training samples. However, processing large HSI data cubes degrades its performance. Instead, the Support Vector Machine (SVM) has been proven to outperform other classifiers in the literature in several applications. It relies on pattern recognition techniques to exploit optimal decision boundaries [12], showing robust behaviour and impressive accuracy results when handling large HSI data [13]. Nevertheless, SVM clearly poses high computational requirements, and therefore faces significant runtimes for large-scale data processing. Paper [14] proposed multiple distributed solutions to speed up computations. Likewise, diverse techniques have been proposed in the literature to reduce both the data complexity and processing time. In this sense, paper [15] studies decomposition algorithms to split the problem into iterative sub-problems. Similarly, paper [16] analyzes the well-known Quadratic Programming (QP) problem by splitting the algorithm into computationally light-weight QP sub-problems. Furthermore, workload balancing techniques effectively manage large computational demands by distributing the calculations across resources. In this context, work [17] used a static workload balance to perform data-parallel training of DNNs by considering resource speeds, whilst the model-parallelism approach in [18] proposed a heterogeneous partitioning over the filters of the network layers. As a consequence of the heterogeneous partitioning, gradient calculations are studied and resolved in [19, 20].
Given the parallelizable nature of workload distribution approaches, the CC infrastructure is suitable for this purpose. Cloud environments are composed of fast computational devices, such as Graphics Processing Units (GPUs) and latest-generation Central Processing Units (CPUs), to handle high workloads. The open-source distributed framework Apache Hadoop [21] provides reliability and scalability for high computational cost calculations, such as aggregations or queries over SQL databases. Additionally, Apache Spark [22] is a multi-pass engine designed to process large-scale data in a fault-tolerant environment. Spark implements Resilient Distributed Datasets (RDDs) for parallel operations whilst providing data collection sharing between resources. An essential feature of Spark is its design for data science and data abstraction. In this regard, Spark provides fast MapReduce [7] operations since it performs the processing in the main memory of the worker nodes, hence preventing unnecessary disk operations. Meanwhile, Hadoop MapReduce writes to disk after each mapping or reduction operation. In addition, the Spark engine uses a fast Directed Acyclic Graph (DAG) that defines the scheme of operations to be performed. Both Spark and Hadoop are deployed on top of an OpenStack environment, i.e., a modular cloud system that manages collections of computing, storage and network resources. These resources are managed through user-friendly APIs to provide integration and security. The OpenStack infrastructure provides IaaS functionality through three main shared services: (i) a networking service to avoid network bottlenecks and provide self-service network configurations with more control and flexibility for the users; (ii) an object and block storage service (Note 1) for cost-effective and scalable distributed storage, and (iii) a computing service to provide on-demand computing resources by deploying virtual machines. The compute architecture is accessible via web interfaces and designed to support scaling. In summary, the capabilities of OpenStack leverage the management of the physical hardware and create a solid foundation for the deployment of virtual machines. The proposed workflow design is shown in Fig. 1.
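As a simple illustration of this difference, the following Scala sketch (hypothetical path and job, not taken from the paper) caches an RDD so that the repeated passes of an iterative algorithm run from executor memory rather than re-reading HDFS on every iteration:

```scala
import org.apache.spark.sql.SparkSession

// Illustrative sketch: an iterative job caches its RDD so that every pass reads
// from executor memory instead of re-reading HDFS, which is the cost Hadoop
// MapReduce pays after each map/reduce stage.
object CachingSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("rdd-caching-sketch").getOrCreate()
    val sc = spark.sparkContext

    val pixels = sc.textFile("hdfs:///data/hsi/pixels.csv")   // hypothetical path
      .map(_.split(",").map(_.toDouble))
      .cache()                                                // keep partitions in memory

    for (_ <- 1 to 10) {
      // each pass is an in-memory scan once the RDD has been materialized
      val firstBandSum = pixels.map(_.head).reduce(_ + _)
      println(s"sum of first band: $firstBandSum")
    }
    spark.stop()
  }
}
```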
Furthermore, the progression of the CC paradigm is intricately intertwined with the advancements in machine learning techniques and the increasing complexity of data. This connection highlights the seamless synergy between the evolving capacities of CC and the intricate processing demands posed by hyperspectral imaging data. Consequently, the parallelization of computation within cloud environments emerges as an ideal platform for leveraging the enhanced capabilities of advanced classification techniques.
Recent Advances in Remote Sensing Classification
Convolutional neural networks (CNN) have emerged as a cornerstone for visual data processing. Such advances have led to the creation of novel techniques and architectures that enhance feature extraction capabilities [23, 24].
The work [25] delves into the intricacies of 3D-CNNs, introducing the innovative concept of a Hybrid Spectral CNN (HybridSN). This model addresses the inherent complexity of 3D-CNNs by streamlining the operation through spatial-spectral feature representation. It offers a unique solution for handling volumetric HSI data without compromising the discrimination of spectral features. Additionally, recent breakthroughs in HSI classification [26] have leveraged the power of vision transformer (ViT) models. The work [27] introduces a morphological ViT approach that combines spectral and spatial information through feature fusion. ViT models introduce the challenge of heightened network complexity while showcasing remarkable classification capabilities. This prompts the pursuit of a balance between optimizing performance and maintaining model simplicity. In this context, the research [28] aims to diminish the complexity of traditional HSI methodologies without increasing the number of network parameters. The method excels in detecting both geometric and spectral changes across the data. This integration of spectral, spatial, and morphological information within CNN and ViT frameworks has opened exciting avenues for efficient and accurate methodologies.
Additionally, synthetic aperture radar (SAR) data has led to noteworthy contributions in recent years. The work [29] proposes a scattering vector model with a roll-invariant method to effectively capture both coherent and partially coherent target scattering, leveraging unique target characteristics to enhance wetland classification. The study [30] introduces a three-component decomposition approach using SAR data acquired in Oberpfaffenhofen, Germany. This approach effectively distinguishes between adjacent urban and forested areas by examining differences in the scattering mechanisms. Additionally, a novel research work [31] explores scattering decomposition for polarimetric SAR data in agricultural landscapes within a region in Canada. The core idea revolves around the utilization of temporal changes in scattering mechanisms to facilitate urban change detection, adding a dynamic element to SAR-based classification techniques. Alternatively, this data can be input into multimodal methods to integrate various data types [32].
Motivation and Challenges
Leveraging hyperspectral imaging within a cloud-based framework provides a valuable asset to disaster response initiatives. As previously introduced, the increasing volume of HSI data, collected through innumerable missions using satellite or airborne platforms, presents a significant convergence point for rapid and efficient processing within cloud-based environments. As a consequence, the computational capabilities of CC play a pivotal role in this context, enabling fast decision-making during critical real-world situations. This facilitates the timely extraction of valuable insights, expediting the assessment of disaster impact and the initiation of responsive measures. As an example, the Chamoli disaster in the Indian Himalaya [33] vividly illustrates the far-reaching implications of a major natural catastrophe, particularly for the rapid expansion of hydropower infrastructure into increasingly precarious areas. In this scenario, the integration of hyperspectral data and machine learning models is of utmost importance. Also, in the post-disaster phase, the integration of HSI and CC serves multiple purposes, such as damage assessment, resource allocation, and recovery efforts. Another example is the time-effective detection of submerged kelp in the subtidal zones of Helgoland, Germany [34]. From these outcomes, the conclusion is straightforward: the synergy between such concepts is instrumental in risk assessment and early warning systems, as it empowers the detection of surface anomalies or change detection [35].
However, the surge in data volume is accompanied by the rich spectral information contained within the datasets and continuous improvements in data acquisition devices. Consequently, the implementation of distributed algorithms on scalable platforms becomes imperative to address such increasing complexity. In addition to these findings, evident benefits arise from the utilization of CC. There are three key advantages to highlight: firstly, the inherent scalability of cloud environments; secondly, the accessibility, which allows for rapid system deployment; and lastly, the cost-effectiveness of these platforms, which operate on a pay-as-you-go model. This approach eliminates the need for organizations to manage and upgrade their own data centers, thereby reducing economic expenses.
This work is motivated by the aforementioned challenges and the absence of scalable cloud solutions capable of handling large datasets on a time-constrained basis. Also, the rich information included in HSI data underscores the promising potential of employing SVMs for data processing. SVM properties, such as dimensionality reduction while preserving critical information, robustness to noise, and the ability to identify non-linear relationships resulting from the intricate interplay between spectral bands, establish it as a highly suitable option. Additionally, the SVM's track record of high performance in environmental monitoring and agricultural applications emphasizes its relevance for disaster monitoring [36].
Contributions of this Work
This study explores the potential of harnessing CC architectures to establish a distributed framework for the processing of extensive hyperspectral imagery. For this purpose, a novel and adaptable SVM implementation using Apache Spark is proposed. The proposal seeks to accelerate data processing while maintaining performance levels comparable to standard implementations. Also, the proposed framework efficiently handles an expanding number of cloud workers, ensuring scalability.
To substantiate the effectiveness of the proposal, an in-depth analysis is conducted using well-known HSI datasets from existing literature. This analysis encompasses multiple node distributions to showcase the versatility of the methodology in real-world scenarios. The experimentation with these datasets, characterized by diverse properties and sizes, yields invaluable insights. The adaptability and scalability of the approach across different datasets emphasize its potential to address the distinct requirements and challenges posed by a range of disaster scenarios.
The remainder of the paper follows this structure. The “Related Work” section discusses related work on parallelization methodologies. The “Background of the SVM Approach” section delves into the functionality of the SVM algorithm. The “Experimentation Analysis” section presents the research findings in terms of scalability and classification performance. Lastly, the “Conclusions” section formally outlines the conclusions drawn from this study.
Related Work
In the literature, various methods have been investigated to handle and parallelize the vast amount of data. For instance, distributed image processing has been explored to identify the challenges inherent in Hadoop and assess the potential of this approach in future remote-sensing cloud computing systems [37]. Other studies have concentrated on spectral clustering for mining large datasets [38] and parallelization techniques for complex neural networks [39]. As such, the current research emphasizes the processing of hyperspectral imagery in cloud environments using machine learning techniques. The high computational demands of processing high-dimensional HSI data have been addressed in recent works using parallel cloud designs. Next, an overview of noteworthy distributed algorithm implementations from the literature is provided. Three main algorithms are studied: (i) DNNs based on Auto-Encoders (AE), (ii) Multinomial Logistic Regression (MLR), and (iii) Principal Component Analysis (PCA).
Distributed Auto-Encoder In [40], the distribution of DNNs is proposed by exploiting the Apache Spark computation engine. The proposal implements a stacked Auto-Encoder (AE) to conduct HSI dimensionality reduction. Specifically, the work performs an optimization for a fully connected architecture based on the Multilayer Perceptron (MLP) network. The computation process for the i-th worker (Note 2) is defined layer by layer as \(\left[ \textbf{x}^{(l)}_k = \delta \left( \textbf{x}^{(l-1)}_k \textbf{W}^{(l)} + b^{(l)} \right) \right] ^{(i)},\) where \(\textbf{x}^{(l)}_k\) is the output data representation of the k-th vector sample \(\textbf{x}_k\in \mathbb {R}^N\), in the space defined by the l-th layer, i.e., the transformation function applied to the input data \(\textbf{x}^{(l-1)}_k\) through the current set of weights \(\textbf{W}^{(l)}\) that interconnects neurons from the previous \(l-1\) and the current l layer, \(\forall l \in [1, L]\). Finally, the bias \(b^{(l)}\) is added and a non-linear activation function \(\delta\) is applied. In this context, the weights \({\textbf {W}}^{(l)}\) determine the pixel output responses based on the input samples. Then, the training error is obtained in each worker using the MSE loss function by comparing the prediction \(\textbf{x}^{(L)}_k\) with the real pixel observation \(\textbf{x}_k\). Hence, the global error \(\varepsilon\) is obtained for a specific training iteration t as the average of each disjoint error: \(\varepsilon _t = \frac{1}{I} \sum _{i=1}^{I} \left[ \frac{1}{M} \sum _{k=1}^{M} \mid \mid \textbf{x}_k^{(L)} - \textbf{x}_k\mid \mid ^2 \right] ^{(i)}\). Then, local gradients \({\textbf {g}}^{(i)}_t\) are calculated in each worker and reduced to obtain the global gradients \({\textbf {G}}_t = \frac{1}{I} \sum _{i=1}^I {\textbf {g}}_t^{(i)}\). As a result, the computation performance of the DNN is boosted for the processing of larger data sets.
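A minimal Scala sketch of this reduction step, assuming the per-worker gradients have already been computed as Breeze vectors, could look as follows (illustrative only, not the code of [40]):

```scala
import breeze.linalg.DenseVector
import org.apache.spark.rdd.RDD

// Simplified sketch of the reduction step: every worker contributes a local
// gradient g_t^(i) computed on its shard, and the driver averages them into
// the global gradient G_t = (1/I) * sum_i g_t^(i).
def globalGradient(localGradients: RDD[DenseVector[Double]]): DenseVector[Double] = {
  val workers = localGradients.count().toDouble   // I, number of local gradients
  val summed  = localGradients.reduce(_ + _)      // sum of g_t^(i), reduced in parallel
  summed / workers
}
```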
Distributed Multinomial Logistic Regression A distributed approach to Multinomial Logistic Regression (MLR) is implemented in [41] using Spark. The method calculates the fitting probability of a sample \(\textbf{x}_k\) for a specific class d using a linear prediction function \(\left[ f(\textbf{x}_{k},y_k=d)\right] ^{(i)}\). Therefore, a probability score is calculated using a vector of logistic regressors for each class \(d \in [1,D]\), denoted as \(\mathbf {\omega }(d) = [\omega _0, \omega _1, \dots , \omega _M]\), where M is the number of linear or non-linear functions defining the features of the input sample. Hence, \(\mathbf {\Omega } = [\omega (1), \omega (2), \dots , \omega (D)]\) composes the regressors for all classes. The objective is to estimate the regressors from the input training set \(\mathcal {D}=\{\textbf{X}, \textbf{Y}\}\), where \({\textbf {X}}\) and \({\textbf {Y}}\) are the training samples and their respective class labels. Therefore, the MLR probability score in a distributed environment is considered as a set of independent binary regressions for each worker. These individual calculations are performed by using a pivot label and regressing the rest against the pivot. Hence, the probability of obtaining the pivot class in the i-th worker is calculated as: \(\left[ p(\textbf{x}_k,d) = \left( 1+\sum _{d=1}^{D}\exp \left( \mathbf {\omega }(d) \cdot \textbf{x}_k \right) \right) ^{-1} \right] ^{(i)}\). Lastly, errors and gradients are determined individually for each worker. In this regard, the master node calculates the overall values of the losses and gradients to obtain the optimal \(\mathbf {\Omega }^{*}\).
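The pivot-class probability above can be sketched as follows (an illustrative Scala snippet, with regressor and sample shapes assumed):

```scala
import breeze.linalg.DenseVector

// Sketch of the pivot-class probability used by the distributed MLR:
// p(x_k, pivot) = 1 / (1 + sum_d exp(omega(d) . x_k)), with one regressor per
// non-pivot class. Shapes and names are illustrative.
def pivotProbability(x: DenseVector[Double],
                     regressors: Seq[DenseVector[Double]]): Double = {
  val scores = regressors.map(w => math.exp(w dot x))   // exp(omega(d) . x_k)
  1.0 / (1.0 + scores.sum)
}
```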
Distributed Principal Component Analysis A distributed PCA implementation based on CC platforms is presented in [7]. The algorithm provides a data transformation into linearly uncorrelated variables, i.e., principal components. The number of components is equal to or less than the number of original image features. In this regard, the component values are defined by their importance in the processed data. In the context of HSI analysis, this feature representation removes spectral redundancy. However, the computation requirements are substantially increased due to the expensive calculations. The distributed proposal divides the required matrix multiplications by rows between the workers, ensuring no correlation among the obtained pixel vectors. As an approximation of the algorithm behaviour, these calculations for an example data matrix \(\textbf{U}^{(i)}\) are determined as \(\textbf{U}^{(i)}= \left[ \sum _{k=1}^{M} (\textbf{x}_k^{\top } \times \textbf{x}_k)\right] ^{(i)}\). Subsequently, the master node performs the pixel sum of the rows assigned to each worker and the eigen-decomposition extracts the eigenvalues and eigenvectors \(\textbf{V}^{(i)}\). After that, the data is broadcast to the workers. In this sense, the algorithm conducts a spatial-domain partitioning. Hence, the pixel information (i.e., spectral signature) is stored in the same node, where each node deploys two different worker instances (\(R=2\)).
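A simplified Scala sketch of this accumulation, assuming Breeze vectors per pixel and omitting the row-partitioning details, might be:

```scala
import breeze.linalg.{DenseMatrix, DenseVector, eigSym}
import org.apache.spark.rdd.RDD

// Sketch of the distributed accumulation: each worker sums the outer products
// x_k^T x_k over its rows, the partial sums are reduced on the driver, and an
// eigen-decomposition yields the principal directions.
def principalDirections(pixels: RDD[DenseVector[Double]]): DenseMatrix[Double] = {
  val accumulated = pixels
    .map(x => x * x.t)   // N x N outer product per pixel
    .reduce(_ + _)       // summed within each worker first, then across workers
  eigSym(accumulated).eigenvectors
}
```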
In addition, several data distribution techniques [42] and data complexity reduction techniques have been proposed for HSI data processing. For instance, Kang et al. [43] maintained rich spectral information whilst reducing its dimensionality. Also, the feature extraction method in [44] focuses on obtaining discriminative features that minimize both intraclass variation and interclass likelihood. Computation requirements have been studied in [45, 46].
Background of the SVM Approach
Sequential Algorithm Description
The SVM is a supervised classifier. Let the input data \(\textbf{X} \in \mathbb {R}^{M\times N}\) contain the pixel row information, i.e., \(\textbf{x}_k = [x_{k,1}, \dots , x_{k,N}], \forall k \in [1,M]\). Pixels are defined by the corresponding class labels \(y_k\in \{0,1\}\) for binary classification, or \(y_k\in \{1,\dots ,D\}\) for multiclass classification, considering one-against-one or one-against-all approaches, where the former has robust behavior when classifying unbalanced data sets, but the latter is computationally less expensive.
In the N-dimensional space, the SVM defines an optimal hyperplane as an affine subspace of \(N-1\) dimensions to separate the samples into positive and negative by maximizing the margin between classes. Thus, the classifier is the discriminant function \(f(\textbf{x}) = \textbf{w} \cdot \textbf{x} + b = \pm 1\), where \(\textbf{w}\in \mathbb {R}^N=[w_1, \dots , w_N]\) is the normal vector to the hyperplane and the bias b adds an offset that shifts the hyperplane away from the origin. Hence, the distance to the hyperplane is used to find the optimal hyperplane \(f(\textbf{x}) = \textbf{w} \cdot \textbf{x} + b = 0\) with the aim of maximizing the margin \(\mathcal {M} = \frac{2}{||{\textbf {w}}||}\). The complete functionality is shown in Fig. 2.
Following the convex quadratic optimization problem, and considering the soft-margin approach since non-linearly separable data will be classified, the SVM optimization problem is described by the following soft-margin formulation: \(\min _{\textbf{w},b,\zeta }\, f(\textbf{w},\zeta ) = \frac{1}{2}\mid \mid \textbf{w}\mid \mid ^2 + C\sum _{k=1}^{M}\zeta _k, \,\,\text{ s.t. }\,\, g_k(\textbf{w},b,\zeta ) = y_k\left( \textbf{w}\cdot \textbf{x}_k + b\right) -1+\zeta _k \ge 0, \,\,\zeta _k\ge 0, \,\,k=1,\dots ,M \quad (1)\)
where the slack variables \(\zeta _k\) provide some flexibility to the model, since the relaxed constraint can be satisfied even when the sample does not meet the original margin constraint. At the same time, the sum of all slack variables penalizes the selection of too large error margins, while C controls the impact of the soft margin. Particularly, this work sets the slack variables as the hinge loss \(\zeta _k=\max (0,1-y_kf(\textbf{x}_k))\) (L1-SVM), while C is set by grid search with cross-validation.
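For clarity, the hinge loss and its subgradient can be sketched as follows (illustrative Scala, with labels assumed to be +1/−1):

```scala
import breeze.linalg.DenseVector

// Sketch of the L1-SVM hinge loss for one sample, zeta_k = max(0, 1 - y_k f(x_k)),
// together with its subgradient with respect to w. Labels are assumed to be +1/-1.
def hingeLossAndGradient(w: DenseVector[Double], b: Double,
                         x: DenseVector[Double], y: Double): (Double, DenseVector[Double]) = {
  val margin = y * ((w dot x) + b)
  if (margin < 1.0) (1.0 - margin, x * (-y))        // margin violated: nonzero loss and subgradient
  else (0.0, DenseVector.zeros[Double](w.length))   // margin satisfied: zero contribution
}
```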
Lagrange multipliers \(\alpha\) are used to optimize the \(f(\textbf{w},\zeta )\) function subject to the M inequality constraints \(g_k(\textbf{w},b,\zeta )\) of Eq. (1), introducing the Lagrangian function \(\mathcal {L}(\textbf{w}, b, \alpha ,\zeta )=f(\textbf{w},\zeta )-\sum _{k=1}^{M}\alpha _k g_k(\textbf{w},b,\zeta )\). The minimum of f is found when its gradient points in the same direction as the gradient of g. In this regard, and using the duality principle, the Lagrangian problem \(\min _{\textbf{w},b,\zeta } \max _{\alpha } \mathcal {L}(\textbf{w},b,\alpha ,\zeta ), \,\,\text{ s.t. } \,\,\zeta _k\ge 0\) should be optimized. This problem can be rewritten into the dual Wolfe problem as: \(W(\alpha ) = \sum _{k=1}^{M}\alpha _k - \frac{1}{2}\sum _{k=1}^{M}\sum _{j=1}^{M}\alpha _k\alpha _j y_k y_j \left( \textbf{x}_k\cdot \textbf{x}_j\right)\),
which is optimized as \(\max _{\alpha } W(\alpha ), \,\,\text{ s.t. }\,0\le \alpha _k\le C, \,\,\sum _{k=1}^{M}\alpha _k y_k=0, \,\,k=1,\dots M\). Indeed, the dual Wolfe problem is a standard quadratic programming problem, and can be solved with a QP solver.
At this point, the normal vector is obtained as \(\textbf{w}=\sum _{k=1}^{M}\alpha _ky_k\textbf{x}_k\), where the data points \(\textbf{x}_k\) corresponding to nonzero \(\alpha _k\) are the support vectors, whilst the bias is obtained as the average over the S support vectors, \(b=\frac{1}{S}\sum _{s=1}^S\left( y_s-\textbf{w}\cdot \textbf{x}_s\right)\), providing a more stable solution.
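A small sketch of this primal recovery from the dual variables (illustrative Scala, sequential and in-memory) is given below:

```scala
import breeze.linalg.DenseVector

// Sketch of recovering the primal solution from the dual variables:
// w = sum_k alpha_k y_k x_k, and b averaged over the S support vectors.
def recoverPrimal(alphas: Seq[Double], labels: Seq[Double],
                  samples: Seq[DenseVector[Double]]): (DenseVector[Double], Double) = {
  val n = samples.head.length
  val triples = alphas.indices.map(k => (alphas(k), labels(k), samples(k)))
  val w = triples.foldLeft(DenseVector.zeros[Double](n)) {
    case (acc, (a, y, x)) => acc + (x * (a * y))    // only alpha_k > 0 actually contribute
  }
  val support = triples.filter(_._1 > 0.0)          // the support vectors
  val b = support.map { case (_, y, x) => y - (w dot x) }.sum / support.size
  (w, b)
}
```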
Proposed Cloud-Based SVM
The proposed cloud linear-SVM was implemented using the Scala programming language. Moreover, the proposal implements a soft-margin “one-vs-all” multi-class strategy for D classes. In this regard, the SVM is implemented in a distributed fashion to fully exploit the potential of a scalable architecture. The Apache Hadoop open source software framework enables distributed processing of large data sets on clusters of computers, and provides a fault-tolerant base to process concurrent computing tasks. In particular, the HSI data is stored in the HDFS (Hadoop Distributed File System), a Java-based system that is optimized to store massive data sets, scaling horizontally, and to maintain multiple copies to ensure high availability and fault tolerance. Moreover, each HSI scene is divided into blocks of the same size (128 MB) and distributed across the Hadoop data-nodes to prepare the environment, with the aim of performing the training procedure on several data subsets independently. However, the traditional Hadoop MapReduce scheme is inefficient when performing iterative computations, as it requires disk I/O operations to reload data at each SVM iteration. In this regard, Apache Spark is used to avoid I/O operations, as it is an in-memory cluster computing platform that prioritizes storing data in the workers' cache memory (when there is enough space) instead of repeatedly reloading it from disk. A driver/controller node executes the global operations of the model and controls the execution of the distributed SVM, while several executors/workers perform the local operations on the distributed data. The steps executed by the proposed approach are illustrated in Fig. 3 and can be summarized as follows:
1. The Spark Driver is launched to manage the execution and the communication between workers, which are in charge of performing the computations. The driver also creates the context process, which is responsible for initializing the variables. Next, the driver converts the user program into tasks and schedules them to the executors.

2. The data is assigned to the workers as \(\textbf{X}^{(i)}\), where \(i \in I\) determines the i-th worker identifier. In this regard, the data is managed by the HDFS.

3. The workers are launched and the driver coordinates tasks among the available workers as determined by the context execution logic. Each worker must complete its tasks over the respective RDDs containing the concatenated spectral pixels (M) to be computed at the same time. The conversion of the RDD lineage into tasks is performed by the scheduler, and tasks are assigned on demand based on the application needs.

4. The running configuration (such as the weights \(\textbf{W}\)) is broadcast to the available workers. In this sense, the parallel executors running on the different nodes do not require any network routing between them.

5. After completion of the tasks, a reduction step coordinates the output data from the workers and performs the optimization step. Hence, data from the workers passes through the driver node in a centralized manner. The scheduler updates the job stage and returns the final output.
The cloud-based SVM operates by initially loading the training data from the HDFS onto an RDD. This provides a fault-tolerant distributed data collection that facilitates distributed and parallel computing over the resulting data partitions. It is structured as a table of M training samples, each one with the corresponding label \(y_k\) and features \(\textbf{x}_k\) of the k-th sample. The mean \(\mu\) and the standard deviation \(\nu\) are obtained in a distributed manner from the features to conduct the standardization of the data \(\tilde{\textbf{x}}_k={\left( \textbf{x}_k-\mu \right) }/{\nu }\), improving the rate of convergence. Moreover, following the “one-vs-all” approach, D SVMs should be trained, with D being the number of different land-cover classes in the \(\textbf{X}\) HSI datacube. Therefore, a binary SVM is trained for each class. In this regard, the normal vectors \(\textbf{w}\) related to the different classes are rearranged into the matrix \(\textbf{W}\in \mathbb {R}^{D\times N}\), whilst the biases b are collected into the vector \(\textbf{b}\in \mathbb {R}^{D}\). Indeed, the cloud manager is responsible for storing and managing the reduced data from all workers.
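An illustrative sketch of this first stage is shown below; it is not the production code of the proposal, and the HDFS path and CSV layout ("label,band_1,…,band_N") are assumptions:

```scala
import org.apache.spark.SparkContext
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.mllib.stat.Statistics

// Illustrative sketch of the loading and standardization stage: rows of the form
// "label,band_1,...,band_N" are read from HDFS into an RDD of labeled samples and
// standardized with distributed column statistics.
def loadAndStandardize(sc: SparkContext, path: String) = {
  val raw = sc.textFile(path).map { line =>
    val fields = line.split(",").map(_.toDouble)
    LabeledPoint(fields.head, Vectors.dense(fields.tail))
  }.cache()

  val stats = Statistics.colStats(raw.map(_.features))   // distributed mean and variance
  val mu    = stats.mean.toArray
  val sigma = stats.variance.toArray.map(math.sqrt)

  raw.map { p =>
    val z = p.features.toArray.zipWithIndex.map { case (v, j) => (v - mu(j)) / sigma(j) }
    LabeledPoint(p.label, Vectors.dense(z))
  }
}
```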
Regarding the optimization problem, instead of considering the L1-normalization with \(\mid \mid \textbf{w}\mid \mid _1\), which is not differentiable, the L2-normalization loss formulation provided by Eq. (1) is used, as it is easier to optimize and more stable. Furthermore, the optimization procedure is conducted over the RDD containing the training samples through the Orthant-Wise Limited-memory Quasi-Newton (OWL-QN) optimizer, which is provided by the Breeze library for numerical processing in Spark. This algorithm generalizes the limited-memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS) method and is executed by the driver node. In this sense, the cost function is defined for the optimizer as a DiffFunction element. This calculates both the loss value and the gradient at a point, aggregates the information on each worker using the tree aggregation method, and summarizes it to the driver, where the ML model is located. To update the loss function value and its corresponding gradient, the DiffFunction maps a DifferentiableLossAggregator element over the RDD containing the training samples. Indeed, the DiffFunction sends a DifferentiableLossAggregator instance to the workers, particularly to each partition, which collects the local gradient updates of the hinge loss function and combines the information to obtain the accumulated gradient for a given iteration. Then, the driver updates the trainable parameters by taking into account the obtained accumulated gradient until the optimizer converges or the maximum number of iterations is reached.
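The following Scala sketch illustrates the overall pattern: a Breeze DiffFunction whose calculate method broadcasts the current weights, accumulates the hinge loss and gradient per partition with treeAggregate, and is minimized by an OWL-QN instance. It is a simplified approximation of the described design (binary case, +1/−1 labels, no bias term), not the authors' implementation:

```scala
import breeze.linalg.DenseVector
import breeze.optimize.{DiffFunction, OWLQN}
import org.apache.spark.rdd.RDD

// Simplified sketch of the driver-side cost function for one binary SVM.
def trainBinarySvm(data: RDD[(Double, DenseVector[Double])],
                   numFeatures: Int, numSamples: Long,
                   reg: Double, maxIter: Int): DenseVector[Double] = {
  val sc = data.sparkContext

  val cost = new DiffFunction[DenseVector[Double]] {
    def calculate(w: DenseVector[Double]): (Double, DenseVector[Double]) = {
      val bw = sc.broadcast(w)                                   // ship current weights to workers
      val zero = (0.0, DenseVector.zeros[Double](numFeatures))
      val (hinge, grad) = data.treeAggregate(zero)(
        seqOp = { case ((l, g), (y, x)) =>
          val margin = y * (bw.value dot x)
          if (margin < 1.0) (l + (1.0 - margin), g + (x * (-y))) else (l, g)
        },
        combOp = { case ((l1, g1), (l2, g2)) => (l1 + l2, g1 + g2) }
      )
      // averaged hinge loss and gradient plus an L2 term (a scaled variant of Eq. (1))
      (hinge / numSamples + 0.5 * reg * (w dot w),
       grad / numSamples.toDouble + (w * reg))
    }
  }

  // OWL-QN generalizes L-BFGS; with a zero L1 weight it behaves like plain L-BFGS here.
  val owlqn = new OWLQN[Int, DenseVector[Double]](maxIter, 10, (_: Int) => 0.0, 1e-6)
  owlqn.minimize(cost, DenseVector.zeros[Double](numFeatures))
}
```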
Experimentation Analysis
Environment Configuration
The experimental evaluation has been performed on a cloud computing platform with OpenStack as its backbone. Spark v3.3.0 is configured on top of a Hadoop v3.2.3 framework. The master node has 16 GB of RAM and 80 GB of HDD storage with 6 virtual cores. Eight worker nodes are deployed with the same features as the master node. The HDFS and the YARN package manager are used through a web GUI API. The physical nodes are 8x Dell PowerEdge M630 nodes with CentOS 7.9 as the Operating System (OS). Each node comprises 2x Intel(R) Xeon(R) CPU E5-2650 v3 processors running at 2.30 GHz. All nodes mount a shared storage volume from a dual NetApp FAS3140 Network-Attached Storage (NAS) appliance, which is connected via NFS using a 4\(\times\)1 Gb Ethernet network.
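As an illustration of how such a deployment might be requested from the application side, the following SparkSession configuration mirrors the described cluster; the exact values used in the experiments may differ:

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical session configuration mirroring the described cluster (YARN with
// eight executors, 6 virtual cores each, and executor memory kept below the
// 16 GB node total to leave headroom for the OS and overhead).
val spark = SparkSession.builder()
  .appName("cloud-svm")
  .master("yarn")
  .config("spark.executor.instances", "8")
  .config("spark.executor.cores", "6")
  .config("spark.executor.memory", "12g")
  .getOrCreate()
```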
Experimental Settings
Experiments were conducted using popular high-dimensional HSI data. In particular, the validation of the proposed CC-based method has been conducted on the popular Indian Pines (IP) and Big Indian Pines (BIP) scenes [47], which are described in Figs. 4 and 5, respectively. It is noteworthy that, although both scenes were collected by the Airborne Visible Infrared Imaging Spectrometer (AVIRIS) during a flight campaign over agricultural fields in northwest Indiana, the former scene, IP, contains 16 classes in a \(145\times 145\) datacube with 224 channels (of which 24 belong to water absorption bands that can be removed), whilst the latter scene, BIP, is a larger version of the IP scene, comprising 58 land classes in a \(2678\times 614\) datacube with 220 bands. The wavelength range of both scenes is bounded between 0.4 and 2.5 \(\mu\)m.
To evaluate the performance of the proposed CC-based SVM, different widely-used classification models are included for comparative analysis, considering both supervised and unsupervised approaches. These models encompass (i) Gaussian Naive Bayes (GNB); (ii) PERCEPTRON; (iii) Decision Tree (DT); (iv) K-Nearest Neighbors (KNN); (v) RIDGE; (vi) Multinomial Logistic Regression (MLR), and (vii) Random Forest (RF) (Note 3). To assess the classification performance, the overall accuracy (OA), average accuracy (AA) and kappa index (\(\kappa (x100)\)) have been used.
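For reference, the three metrics can be computed from a confusion matrix as in the following illustrative Scala snippet:

```scala
// Illustrative computation of the three metrics from a D x D confusion matrix
// (rows are reference classes, columns are predictions).
def metrics(cm: Array[Array[Long]]): (Double, Double, Double) = {
  val total    = cm.map(_.sum).sum.toDouble
  val diagonal = cm.indices.map(i => cm(i)(i)).sum.toDouble
  val oa = diagonal / total                                                      // overall accuracy
  val aa = cm.indices.map(i => cm(i)(i).toDouble / cm(i).sum).sum / cm.length    // average accuracy
  val expected = cm.indices.map(i => cm(i).sum.toDouble * cm.map(_(i)).sum).sum / (total * total)
  val kappa = (oa - expected) / (1.0 - expected)                                 // Cohen's kappa
  (oa, aa, kappa)
}
```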
The SVM, MLR, and PERCEPTRON algorithms are configured with a maximum of 200 iterations. Focusing on the MLR, newton-cg is set as the solver, whilst the penalty is set to none. The KNN classifier is employed with 3 neighbors. Both the DT and RF algorithms use a maximum depth of 40, and the RF algorithm uses 40 estimators with a maximum-features value of 1. Additionally, the SVM, MLR, RIDGE, and PERCEPTRON algorithms have a tolerance set to 1e−6. The rest of the configuration parameters are the default ones.
Experimental Results
The experimentation is conducted to determine both the scalability and accuracy of the proposed cloud-SVM. For this purpose, four different configurations of the cloud environment were used, varying the number of workers. Specifically, 1, 2, 4 and 8 workers were used during the experimentation.
The first experiment focuses on analyzing the scalability of the proposal with different worker configurations. This is done using the BIP dataset and considering different dataset sizes. This scene presents a high complexity, thus demanding a greater commitment from the computational resources within the cloud environment. This increased resource demand enables a more comprehensive analysis of the response of the proposed methodology to intricate situations. The obtained results are provided in Fig. 6, demonstrating a significant reduction in processing time when employing the proposed distributed SVM algorithm compared to a single-worker execution. Furthermore, utilizing a larger number of workers with the same data size markedly decreases the runtime. This effect is particularly pronounced when dealing with larger data sizes, as it capitalizes on the processing capabilities of the computational resources. Moreover, the decrease in runtime exhibits a proportional relationship with the increase in the number of workers for each dataset size. This observation contributes to the conclusion that the computational resources are achieving near-optimal speedup in each configuration, thereby indicating promising scalability as the dataset size continues to grow. Finally, the evidence from the results demonstrates that the proposal provides a better response, in terms of training times, to the increase of data.
The same behaviour is observed in Fig. 7 for sizes significantly larger than those included in the previous experiment. In particular, Fig. 7a–c provide the individual behavior of the proposal for environments with 2, 4 and 8 workers, respectively. The scalability of each configuration can be visually appreciated considering different data sizes. Finally, Fig. 7d provides a comparison of all the observed behaviors for the above configurations as a whole. The overall improvement in training times can be observed, where the proposed distributed implementation with 8 workers is the most efficient configuration due to the workload distribution. The scalability with respect to the number of workers and the data size is unequivocally demonstrated.
This initial experiment capitalizes on the strengths of the proposed methodology in terms of scalability and acceleration. The increase in data size and number of workers leverages the advantages afforded by cloud computing environments, particularly with respect to enhanced memory and computational processing capabilities.
Finally, the last experiment is conducted to validate the accuracy obtained by the proposed method considering the IP dataset. The choice of this dataset is based on its extensive usage in prior literature. This enables our method to have a direct performance comparison with stable Scikit-learn implementations.
Firstly, Table 1 presents per-class classification results for all models. It is evident that the SVM shows superior performance in all the metrics considered. Indeed, the selected classifiers have been arranged based on their accuracy outcomes, where less effective models appear on the left side of Table 1 whilst proficient models are on the right side. The SVM demonstrates a clear differentiation for seven classes, comprising \(43.75\%\) of the total 16 classes (ranging from 0 to 15). Notably, minor classes such as 1-Alfalfa, 6-Grass/pasture-mowed, and 8-Oats show significant accuracy improvements for the SVM compared to other models. Additionally, major classes also benefit from enhanced classification, contributing to a noteworthy overall performance. This is particularly evident in the AA value, signifying consistent performance across all classes.
Secondly, Fig. 8 displays OA, AA, and \(\kappa (x100)\) for increasing data amounts. This aligns with the preceding scalability experiments and serves as an indicator of the classification performance in a scalable environment. Figure 8 also highlights the SVM classification performance with low amounts of training data. Moreover, high-resolution classification maps are provided in Fig. 9 to offer detailed insights. It is noteworthy that the SVM model struggles with the fine-grained classification of specific pixels, as is recurrent in traditional models based exclusively on pixel spectral information treated in complete isolation. Nevertheless, its classification map is satisfactory compared to the other algorithms.
Finally, the evaluation of classification performance underscores the robustness inherent in the proposed distributed SVM methodology. Notably, the presence of minor classes, characterized by pixels weakly represented across certain partitions, does not significantly impact the overall performance. Conversely, the reduction step tackles this challenge efficiently, yielding remarkable and consistent accuracy across all classes.
Conclusions
This work exploits a distributed cloud computing environment for the processing of high-dimensional hyperspectral data. The proposed methodology focuses on analyzing scalability whilst ensuring a notable classification performance. In this context, sequential algorithms suffer an important degradation of their performance due to the computationally expensive processing of such complex data. This is exacerbated by the increasing complexity of machine learning models, where performance improves at the cost of more complex and sophisticated algorithms. Therefore, the classification entails a noteworthy amount of calculations that exceeds the capacities of an individual machine; indeed, sequential algorithms are only useful when the data can be stored and processed on a single machine. This work presents a well-optimized parallel implementation of the SVM machine learning algorithm over a cloud environment. The SVM approach demonstrates notable outcomes in discriminating features for classification purposes. The driving rationale behind this research is the high performance of the SVM and the idea of reducing the computational time required for fast decision-making. In the provided study, the experimental results demonstrate computational performance gains in the processing of complex HSI scenarios. Moreover, the proposal exhibits effectiveness for classification purposes, achieving notable performance in comparison to the standard Scikit-learn models. As a case study, the proposal has been evaluated against the baseline one-node implementation for scalability purposes. Encouraged by the outstanding results obtained in this work, future work is set to develop new distributed implementations for the processing of large HSI data in cloud computing environments.
Data Availability
Datasets could be downloaded from https://github.com/mhaut/HSI-datasets.
Notes
1. The former enables integrated storage that can be connected directly to applications. Block storage, on the other hand, allows data blocks to be connected and shared with compute instances in high-performance environments.
2. For simplicity, the worker index i has been omitted in order to clarify the indexes of samples and layers.
3. To ease the replication of the obtained results, the considered classifiers have been tested following the implementation available in Scikit-learn: https://scikit-learn.org/stable/.
References
Goetz AFH, Vane G, Solomon JE, Rock BN. Imaging spectrometry for earth remote sensing. Science. 1985;228(4704):1147–53. https://doi.org/10.1126/science.228.4704.1147.
Srivastava S, Vargas-Muñoz JE, Tuia D. Understanding urban landuse from the above and ground perspectives: A deep learning, multimodal solution. Remote Sens Environ. 2019;228:129–43. https://doi.org/10.1016/j.rse.2019.04.014.
Lu B, Dao PD, Liu J, He Y, Shang J. Recent advances of hyperspectral imaging technology and applications in agriculture. Remote Sens. 2020. https://doi.org/10.3390/rs12162659.
Weber C, Aguejdad R, Briottet X, Avala J, Fabre S, Demuynck J, Zenou E, Deville Y, Karoui MS, Benhalouche FZ, Gadal S, Ouerghemmi W, Mallet C, Bris AL, Chehata N. Hyperspectral imagery for environmental urban planning. In: IGARSS 2018 - 2018 IEEE International Geoscience and Remote Sensing Symposium, 2018; p. 1628–31.
Paoletti ME, Moreno-Álvarez S, Haut JM. Multiple attention-guided capsule networks for hyperspectral image classification. IEEE Trans Geosci Remote Sens. 2022;60:1–20. https://doi.org/10.1109/TGRS.2021.3135506.
Talukdar S, Singha P, Mahato S, Shahfahad Pal S, Liou Y-A, Rahman A. Land-use land-cover classification by machine learning classifiers for satellite observations-a review. Remote Sens. 2020. https://doi.org/10.3390/rs12071135.
Wu Z, Li Y, Plaza A, Li J, Xiao F, Wei Z. Parallel and distributed dimensionality reduction of hyperspectral data on cloud computing architectures. IEEE J Select Top Appl Earth Observ Remote Sens. 2016;9(6):2270–8. https://doi.org/10.1109/JSTARS.2016.2542193.
Plaza A, Du Q, Chang Y-L. High performance computing for hyperspectral image analysis: Perspective and state-of-the-art. 2009 IEEE Int Geosci Remote Sens Symp. 2009;5:72–5. https://doi.org/10.1109/IGARSS.2009.5417729.
Gorgan D, Bacu V, Stefanut T, Rodila D, Mihon D. Grid based satellite image processing platform for earth observation application development. In: 2009 IEEE International Workshop on Intelligent Data Acquisition and Advanced Computing Systems: Technology and Applications, 2009; p. 247–52.
Bankar S. Cloud computing using amazon web services (aws). Int J Trend Sci Res Dev. 2018;2(4):2156–7.
Desai V, Oza K, Shinde P, Naik P. Microsoft azure: Cloud platform for application service deployment. 2021;7:20–3.
Cristianini N, Shawe-Taylor J. An introduction to support vector machines: and other kernel-based learning methods. USA: Cambridge University Press; 1999.
Paoletti ME, Haut JM, Tao X, Miguel JP, Plaza A. A new gpu implementation of support vector machines for fast hyperspectral image classification. Remote Sens. 2020. https://doi.org/10.3390/rs12081257.
Haut JM, Paoletti ME, Moreno-Álvarez S, Plaza J, Rico-Gallego J-A, Plaza A. Distributed deep learning for remote sensing data interpretation. Proc IEEE. 2021;109(8):1320–49. https://doi.org/10.1109/JPROC.2021.3063258.
Osuna E, Freund R, Girosi F. An improved training algorithm for support vector machines. In: Principe J, Gile L, Morgan N, Wilson E editors. Neural Networks for Signal Processing VII—Proceedings of the 1997 IEEE Workshop, IEEE, New York; 1997. p. 276–85
Platt JC. Fast training of support vector machines using sequential minimal optimization. MIT Press: Cambridge; 1999. p. 185–208.
Moreno-Álvarez S, Haut JM, Paoletti ME, Rico-Gallego JA, Díaz-Martín JC, Plaza J. Training deep neural networks: a static load balancing approach. J Supercomput. 2020;76(12):9739–54.
Moreno-Álvarez S, Haut JM, Paoletti ME, Rico-Gallego JA. Heterogeneous model parallelism for deep neural networks. Neurocomputing. 2021;441:1–12.
Moreno-Álvarez S, Paoletti ME, Rico-Gallego JA, Haut JM. Heterogeneous gradient computing optimization for scalable deep neural networks. J Supercomput. 2022;78:1–15.
Moreno-Álvarez S, Paoletti ME, Cavallaro G, Rico JA, Haut JM. Remote sensing image classification using cnns with balanced gradient for distributed heterogeneous computing. IEEE Geosci Remote Sens Lett. 2022;19:1–5. https://doi.org/10.1109/LGRS.2022.3173052.
Wadkar S, Siddalingaiah M, Venner J. Pro Apache Hadoop. 2nd ed. USA: Apress; 2014.
Zaharia M, Xin RS, Wendell P, Das T, Armbrust M, Dave A, Meng X, Rosen J, Venkataraman S, Franklin MJ, Ghodsi A, Gonzalez J, Shenker S, Stoica I. Apache spark: A unified engine for big data processing. Commun ACM. 2016;59(11):56–65. https://doi.org/10.1145/2934664.
Roy SK, Manna S, Song T, Bruzzone L. Attention-based adaptive spectral-spatial kernel resnet for hyperspectral image classification. IEEE Trans Geosci Remote Sens. 2021;59(9):7831–43. https://doi.org/10.1109/TGRS.2020.3043267.
Chen Z, Hong D, Gao H. Grid network: Feature extraction in anisotropic perspective for hyperspectral image classification. IEEE Geosci Remote Sens Lett. 2023;20:1–5. https://doi.org/10.1109/LGRS.2023.3297612.
Roy SK, Krishna G, Dubey SR, Chaudhuri BB. Hybridsn: Exploring 3-d-2-d cnn feature hierarchy for hyperspectral image classification. IEEE Geosci Remote Sens Lett. 2020;17(2):277–81. https://doi.org/10.1109/LGRS.2019.2918719.
Chen Z, Wu G, Gao H, Ding Y, Hong D, Zhang B. Local aggregation and global attention network for hyperspectral image classification with spectral-induced aligned superpixel segmentation. Expert Syst Appl. 2023;232: 120828. https://doi.org/10.1016/j.eswa.2023.120828.
Roy SK, Deria A, Shah C, Haut JM, Du Q, Plaza A. Spectral-spatial morphological attention transformer for hyperspectral image classification. IEEE Trans Geosci Remote Sens. 2023;61:1–15.
Paoletti ME, Tao X, Han L, Wu Z, Moreno-Álvarez S, Roy SK, Plaza A, Haut JM. Parameter-free attention network for spectral-spatial hyperspectral image classification. IEEE Trans Geosci Remote Sens. 2023;61:1–17. https://doi.org/10.1109/TGRS.2023.3295097.
Touzi R. Target scattering decomposition in terms of roll-invariant target parameters. IEEE Trans Geosci Remote Sens. 2007;45(1):73–84. https://doi.org/10.1109/TGRS.2006.886176.
An W, Cui Y, Yang J. Three-component model-based decomposition for polarimetric sar data. IEEE Trans Geosci Remote Sens. 2010;48(6):2732–9.
Muhuri A, Goïta K, Magagi R, Wang H. Geodesic distance based scattering power decomposition for compact polarimetric sar data. IEEE Trans Geosci Remote Sens. 2023;61:1–12. https://doi.org/10.1109/TGRS.2023.3304710.
Roy SK, Deria A, Hong D, Rasti B, Plaza A, Chanussot J. Multimodal fusion transformer for remote sensing image classification. IEEE Trans Geosci Remote Sens. 2023;61:1–20. https://doi.org/10.1109/TGRS.2023.3286826.
Shugar DH, Jacquemart M, Shean D, Bhushan S, Upadhyay K, Sattar A, Schwanghart W, McBride S, de Vries MVW, Mergili M, Emmer A, Deschamps-Berger C, McDonnell M, Bhambri R, Allen S, Berthier E, Carrivick JL, Clague JJ, Dokukin M, Dunning SA, Frey H, Gascoin S, Haritashya UK, Huggel C, Kääb A, Kargel JS, Kavanaugh JL, Lacroix P, Petley D, Rupper S, Azam MF, Cook SJ, Dimri AP, Eriksson M, Farinotti D, Fiddes J, Gnyawali KR, Harrison S, Jha M, Koppes M, Kumar A, Leinss S, Majeed U, Mal S, Muhuri A, Noetzli J, Paul F, Rashid I, Sain K, Steiner J, Ugalde F, Watson CS, Westoby MJ. A massive rock and ice avalanche caused the 2021 disaster at Chamoli, Indian Himalaya. Science. 2021;373(6552):300–6. https://doi.org/10.1126/science.abh4455.
Uhl F, Bartsch I, Oppelt N. Submerged kelp detection with hyperspectral data. Remote Sens. 2016;8(6):487.
Chen Z, Wang Y, Gao H, Ding Y, Zhong Q, Hong D, Zhang B. Temporal difference-guided network for hyperspectral image change detection. Int J Remote Sens. 2023;44(19):6033–59. https://doi.org/10.1080/01431161.2023.2258563.
Rahman MA, Hasan ST, Kader MA. Computer vision based industrial and forest fire detection using support vector machine (svm). In: 2022 International Conference on Innovations in Science, Engineering and Technology (ICISET), IEEE. 2022. p. 233–238.
Liu J, Wu J, Sun L, Zhu H. Image data model optimization method based on cloud computing. J Cloud Comput. 2020;9:1–10.
Jin R, Kou C, Liu R, Li Y. Efficient parallel spectral clustering algorithm design for large data sets under cloud computing environment. J Cloud Comput Adv Syst Appl. 2013;2(1):1–10.
Cui L, Qu Z, Zhang G, Tang B, Ye B. A bidirectional dnn partition mechanism for efficient pipeline parallel training in cloud. J Cloud Comput. 2023;12(1):22. https://doi.org/10.1186/s13677-022-00382-7.
Haut JM, Gallardo JA, Paoletti ME, Cavallaro G, Plaza J, Plaza A, Riedel M. Cloud deep networks for hyperspectral image analysis. IEEE Trans Geosci Remote Sens. 2019;57(12):9832–48. https://doi.org/10.1109/TGRS.2019.2929731.
Haut JM, Paoletti ME. Cloud implementation of multinomial logistic regression for uav hyperspectral images. IEEE J Miniaturiz Air Space Syst. 2020;1:163–71.
Chen Z, Chen N, Yang C, Di L. Cloud computing enabled web processing service for earth observation data processing. IEEE J Select Top Appl Earth Observ Remote Sens. 2012;5(6):1637–49. https://doi.org/10.1109/JSTARS.2012.2205372.
Kang X, Duan P, Li S, Benediktsson JA. Decolorization-based hyperspectral image visualization. IEEE Trans Geosci Remote Sens. 2018;56(8):4346–60.
Jia X, Kuo B-C, Crawford MM. Feature mining for hyperspectral image classification. Proc IEEE. 2013;101(3):676–97.
Plaza AJ, Plaza J, Valencia D. Impact of platform heterogeneity on the design of parallel algorithms for morphological processing of high-dimensional image data. J Supercomput. 2006;40:81–107.
Jaramago JAG, Paoletti ME, Haut JM, Fernandez-Beltran R, Plaza A, Plaza J. Gpu parallel implementation of dual-depth sparse probabilistic latent semantic analysis for hyperspectral unmixing. IEEE J Select Top Appl Earth Observ Remote Sens. 2019;12(9):3156–67. https://doi.org/10.1109/JSTARS.2019.2934011.
Green RO, Eastwood ML, Sarture CM, Chrien TG, Aronsson M, Chippendale BJ, Faust JA, Pavri BE, Chovit CJ, Solis M, et al. Imaging spectroscopy and the airborne visible/infrared imaging spectrometer (aviris). Remote Sens Environ. 1998;65(3):227–48.
Funding
Open Access funding provided thanks to the CRUE-CSIC agreement with Springer Nature. This publication has been supported by the Consejería de Economía, Ciencia y Agenda Digital of the Junta de Extremadura and the European Regional Development Fund (ERDF) of the European Union through grant references GR21040 and 0206_RAT_EOS_PC_6_E. This work was also supported in part by the Spanish Ministerio de Ciencia e Innovación under Project PID2019-110315RB-I00 (APRISA). This work made use of the computing infrastructure of CETA-CIEMAT. This infrastructure was partially funded by the European Regional Development Fund (ERDF) of the European Union. CETA-CIEMAT belongs to CIEMAT and the Government of Spain. Finally, this article is partially supported by the Junta de Extremadura (Ref. IB20040).
Author information
Authors and Affiliations
Contributions
JMH, JMFV and APD set the experimental environment and conducted the experiments with the mentioned datasets. JMH, MEP and SMA wrote the main manuscript text. All authors reviewed the manuscript.
Corresponding author
Ethics declarations
Conflict of Interest
The authors have no conflict of interest as defined by Springer, or other interests that might be perceived to influence the results and/or discussion reported in this paper.
Ethical Approval
Not applicable.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Haut, J.M., Franco-Valiente, J.M., Paoletti, M.E. et al. Hyperspectral Image Analysis Using Cloud-Based Support Vector Machines. SN COMPUT. SCI. 5, 719 (2024). https://doi.org/10.1007/s42979-024-03073-z