CN111428787A

CN111428787A - Hyperspectral image parallel classification method based on GPU

Info

Publication number: CN111428787A
Application number: CN202010212055.4A
Authority: CN
Inventors: 张明华; 邹亚晴; 宋巍; 黄冬梅; 杜艳玲
Original assignee: Shanghai Ocean University
Current assignee: Shanghai Ocean University
Priority date: 2020-03-24
Filing date: 2020-03-24
Publication date: 2020-07-17

Abstract

The invention relates to a GPU-based parallel classification method for hyperspectral images in the field of high-performance computing. The method comprises a first step of processing a hyperspectral image with the G-PNPE parallel image preprocessing algorithm to obtain Cube data, and a second step of constructing a CNN-SVM model for training and classification. The convolution operation is parallelized using the idea of the GEMM algorithm, and a G-PNPE data preprocessing algorithm is provided that stores the data in global memory, where the GPU can read it and perform matrix operations directly. Verified on a hyperspectral image classification algorithm, the method improves speed by 25%-30% over the original algorithm while keeping classification accuracy unchanged. The invention also shows good acceleration performance and generalization capability with multiple convolutional layers. It therefore both improves efficiency and offers better scalability.

Description

Hyperspectral image parallel classification method based on GPU

Technical Field

The invention relates to the field of high-performance computing, and in particular to a GPU-based parallel algorithm that improves the training and execution efficiency of hyperspectral image classification.

Background

Hyperspectral remote sensing is an important means of Earth observation: it acquires the spectral information of ground objects together with the surface image. Fine-grained classification of these images provides important decision support in key fields such as agriculture, environmental management, and urban planning.

A digital signal processor (DSP) is a specialized processor developed for implementing digital signal processing algorithms. Researchers often use DSPs for high-performance processing of hyperspectral remote sensing images. Based on a highly parallel, low-storage single-instruction multiple-data processor array, Chai et al. used multiple DSPs for on-orbit real-time processing of hyperspectral data, meeting the requirements of real-time processing and storage. Guo Wenchui et al. explored multi-DSP parallel processing for hyperspectral image anomaly detection: four DSPs were interconnected through a CPCI bus and shared a data bus and a storage module, the anomaly detection algorithm was divided into four parallel tasks, and computational efficiency improved fourfold.

A field-programmable gate array (FPGA) allows hardware/software co-design and can provide both powerful on-board computing capability and flexibility. Carlos Gonzalez used FPGA technology when extracting hyperspectral image endmembers with the N-FINDR algorithm and obtained a certain acceleration.

The rapid development of the GPU (Graphics Processing Unit) as a general-purpose computing architecture has continuously improved its computing performance, and its numerical computing capability far exceeds that of other general-purpose processors. In particular, as GPUs continue to improve in programmability and range of application, they offer a new technical means of accelerating remote sensing image processing.

Zebin Wu used CUDA to parallelize and accelerate the SVMCK (SVM with composite kernels) algorithm for hyperspectral image classification; by moving the kernel function computation to the GPU after analyzing the algorithm, the algorithm ran at least 6 times faster. Qicong Wang also used CUDA when classifying hyperspectral images on the GPU with a weighted Markov random field algorithm. Zheng et al. used CUDA to implement the SSGSCI-KSRC classification algorithm 30 times faster than the serial approach. Zhang Bing et al. used the CUDA computing model to process Tiangong-1 hyperspectral data and obtained a speedup of more than 7 times.

Stack et al. accelerated CNNs on a GPU by sharing data through additional registers instead of shared memory. Because register read/write latency is far lower than that of shared memory, the speedup over a CPU implementation is better. However, this approach is extremely sensitive to the shape of the input data and is less flexible. HanDong et al. proposed a GPU-based Cube-CNN-SVM parallel algorithm (GCN). It preprocesses the data with the PNPE (Parallel Neighbor Pixels Extraction) algorithm, which extracts Cube data from the original image in parallel; the Cube data is then loaded onto the GPU for CNN training, which effectively improves model training efficiency.

Although the above methods improve the efficiency of hyperspectral image classification to some extent, problems remain: 1) DSP- and FPGA-based implementations are difficult, and the application range of such special-purpose chips is narrow, so wide adoption is hard; 2) as GPU performance has developed, the efficiency of DSP and FPGA acceleration no longer meets current requirements; 3) existing GPU-based parallel methods do not fully use GPU resources, do not maximize efficiency, and leave considerable room for improvement; 4) existing GPU-based parallel methods scale poorly and place strict requirements on the type of model, so they cannot be applied widely.

Disclosure of Invention

The invention aims to provide a GPU-based parallel classification method for hyperspectral images. A new parallel strategy based on the GEMM (general matrix multiplication) algorithm is proposed, which strengthens the parallelism of the CNN (convolutional neural network) algorithm, improves its training efficiency, and addresses the insufficient parallelism and poor scalability of current algorithms.

The purpose of the invention is realized as follows: a hyperspectral image parallel classification method based on a GPU comprises a first step of processing a hyperspectral image by a G-PNPE parallel image preprocessing algorithm to obtain Cube data and a second step of constructing a CNN-SVM model for data training and classification;

in the first step:

Cube data is first extracted from the hyperspectral image, the data is reorganized into a data matrix using the G-PNPE algorithm, and the data matrix is input into the model for the convolution operation;

in the second step:

the hyperspectral image is first preprocessed to generate Cube samples, the samples are then input into the model and trained with the CNN algorithm, and the features learned in training are classified as high-level features using an SVM as the classifier.

Further, in the first step, the convolution operation is changed into a matrix multiplication form by using a GEMM algorithm, input data is expanded and copied, convolution kernels are rearranged, and preparation is made for parallelization.

Further, in the first step, the input of each convolution is reorganized such that each row contains all the input values required to compute one element of the output feature. The convolution kernels are likewise unrolled and then concatenated to form a convolution kernel matrix. Each output corresponds to a column in the new matrix.

Further, in the first step, when the new input matrix is multiplied by the kernel matrix, the output features are computed automatically.

Further, in the first step, the hyperspectral image is extracted in a Cube form in a parallel mode and stored in a global memory for training of a subsequent model;

the data extraction process comprises the following steps:

(A) the hyperspectral image is regarded as a three-dimensional matrix of dimension H × W × C; the index position (x, y, c) of each effective element (a pixel whose value is not empty) is scanned on the CPU, and the positions of its neighboring pixels can be obtained from this index; for the 3 × 3 × C elements in each Cube, the position of the i-th element in the data matrix is given by formula (1): $x \cdot H + y \cdot W + c \cdot 9 + i \cdot C \cdot H \cdot W$;

(B) the convolution kernel matrix is scanned in the same way as in step (A), and the position of the j-th element of the convolution kernel in the converted matrix is calculated by formula (2): $x \cdot H + y \cdot W + c \cdot 9 + i \cdot H \cdot W$;

where H is the height of the hyperspectral image, W is the width of the hyperspectral image, and C is the number of bands of the hyperspectral image.

Further, in the second step, different activation functions and loss functions are selected when the model is trained using the CNN algorithm.

Further, in the second step, parameters of the algorithm are adjusted by using an MBGD algorithm to improve the model accuracy.

Further, in the second step, the training phase of the model is divided into two parts: forward propagation and backward propagation; the classification result of each Cube is calculated in the forward propagation stage, and the weight of the network is updated in the backward propagation stage so as to obtain a better classification result.

Further, the hyperspectral image includes spectral information and spatial information.

The beneficial effects of the invention are as follows. The convolution operation is parallelized using the idea of the GEMM algorithm, and a G-PNPE data preprocessing algorithm is provided that stores the data in global memory, where the GPU can read it and perform matrix operations directly. Verified on a hyperspectral image classification algorithm, the method greatly improves speed over the original algorithm while keeping classification accuracy unchanged. The invention also shows good acceleration performance and generalization capability with multiple convolutional layers. It therefore both improves efficiency and offers better scalability.

Drawings

Fig. 1 shows the convolution operation in its definition form.

Fig. 2 shows the convolution operation in matrix multiplication form.

Fig. 3 is a framework diagram of the Cube-CNN-SVM model.

Fig. 4 is a schematic diagram of image preprocessing.

Detailed Description

The invention will be further described with reference to the accompanying figures 1-4 and the specific embodiments.

A hyperspectral image parallel classification method based on a GPU comprises a first step of processing a hyperspectral image by a G-PNPE parallel image preprocessing algorithm to obtain Cube data and a second step of constructing a CNN-SVM model for data training and classification.

The first step comprises the following substeps:

(1-1) implementation method of convolutional layer in convolutional neural network

The CNN is a type of artificial neural network that extracts features through image convolution. It is well suited to image classification and recognition, and has broad application scenarios and room for acceleration on graphics processors. The convolutional layer is an important part of the CNN model. Computing the convolution with a sliding window is the simplest and most direct approach, but its irregular memory access makes it hard to parallelize.

Implementing convolution by its definition typically involves several nested loops: the outer loops traverse the input image and the inner loops traverse the convolution kernel. FIG. 1 shows a conventional convolution operation in a convolutional layer; for simplicity, the other operations involved in the convolution are omitted.
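The nested-loop structure described here can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `conv2d_direct`, the single-channel input, stride 1 and the absence of padding are all assumptions made for brevity.

```python
def conv2d_direct(image, kernel):
    """Sliding-window convolution (cross-correlation, as CNN layers compute it).

    Assumes one channel, stride 1 and no padding; these are illustrative choices.
    """
    h, w = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out = [[0.0] * (w - kw + 1) for _ in range(h - kh + 1)]
    for oy in range(h - kh + 1):          # outer loops: traverse the input image
        for ox in range(w - kw + 1):
            acc = 0.0
            for ky in range(kh):          # inner loops: traverse the kernel
                for kx in range(kw):
                    acc += image[oy + ky][ox + kx] * kernel[ky][kx]
            out[oy][ox] = acc
    return out


image = [[1, 2, 3, 4],
         [5, 6, 7, 8],
         [9, 10, 11, 12],
         [13, 14, 15, 16]]
kernel = [[1, 0, 0],
          [0, 1, 0],
          [0, 0, 1]]
result = conv2d_direct(image, kernel)
```

Each output element reads a different, overlapping window of the input, which is the irregular access pattern that makes this form awkward to parallelize.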

Computing the convolution by its definition is inefficient and poorly suited to parallel computation. By reorganizing and duplicating the input data and rearranging the convolution kernel, the convolution can instead be converted into a matrix multiplication, which is easier to optimize.

The core idea of GEMM is to unroll and duplicate the input data and rearrange the convolution kernels. In forward propagation, each convolution is converted into a matrix product, and backward propagation can correspondingly be treated as another matrix product. This makes memory access more regular during training and makes parallel acceleration easier.

FIG. 2 gives an example of the matrix product form of the convolutional layer. The input of each convolution is reorganized so that each row contains all the input values needed to compute one element of the output feature. The convolution kernels are likewise unrolled and then concatenated to form a convolution kernel matrix. Each output corresponds to a column in the new matrix. When the new input matrix is multiplied by the kernel matrix, the output features are computed automatically.
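The reorganization described above is commonly called im2col. The sketch below illustrates it under simplifying assumptions (one input channel, stride 1, no padding); the helper names `im2col`, `kernels_to_matrix` and `matmul` are hypothetical and do not come from the patent.

```python
def im2col(image, kh, kw):
    """Build the input matrix: one row per output element,
    holding the kh*kw input values that element needs."""
    h, w = len(image), len(image[0])
    rows = []
    for oy in range(h - kh + 1):
        for ox in range(w - kw + 1):
            rows.append([image[oy + ky][ox + kx]
                         for ky in range(kh) for kx in range(kw)])
    return rows


def kernels_to_matrix(kernels):
    """Unroll each kernel into one column of a (kh*kw) x num_kernels matrix."""
    flat = [[v for row in k for v in row] for k in kernels]
    return [list(col) for col in zip(*flat)]


def matmul(a, b):
    """Plain matrix product; on a GPU this is the single GEMM call."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


image = [[1, 2, 3, 4],
         [5, 6, 7, 8],
         [9, 10, 11, 12],
         [13, 14, 15, 16]]
diag = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]      # sums the main diagonal of each window
center = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]    # picks the centre of each window

cols = im2col(image, 3, 3)                    # 4 output positions x 9 inputs
kmat = kernels_to_matrix([diag, center])      # 9 x 2
out = matmul(cols, kmat)                      # 4 x 2: one column per kernel
```

The duplication cost is visible in `cols`: the 16 input values become 36 matrix entries, which is the trade made for regular memory access.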

Convolutional layers are the most important part of the CNN model and the most time-consuming part of the entire model, so effectively improving their computational efficiency improves the efficiency of the whole algorithm. The idea of the GEMM algorithm is therefore used to convert the convolution into matrix multiplication form in preparation for parallelization.

(1-2) GEMM-based G-PNPE parallel image preprocessing algorithm

Traditional parallel methods do not use GEMM because hyperspectral images have high dimensionality, many bands, and a large data volume; converting that much data into matrix form takes a long time, and the resulting speedup is poor. The invention provides the G-PNPE parallel algorithm, which extracts the hyperspectral image into Cube form in parallel and stores it in global memory for training of the subsequent model. The data extraction process follows steps (A) and (B) set out in the Disclosure of Invention above.

In summary, Cube data is extracted from the hyperspectral image, reorganized into a data matrix by the G-PNPE algorithm, and then input into the model for the convolution operation.

The second step comprises the following substeps:

(2-1) Cube-CNN-SVM classification model

To improve classification accuracy, the spectral information and the spatial information of the hyperspectral image are fused and input into the classifier for training. When the preprocessing algorithm extracts a labeled pixel, it extracts the neighboring pixels at the same time. The labeled pixel and its neighboring pixels are input into the model together as training data. Data extracted in this way are called Cube samples.
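As an illustration of what a Cube sample contains, the sketch below gathers the 3 × 3 × C neighborhood of a labeled pixel from an H × W × C image stored as nested lists. The function name `extract_cube`, the flattening order, and the zero-filling of neighbors that fall outside the image are assumptions for this example; the patent text does not specify border handling.

```python
def extract_cube(image, x, y, radius=1):
    """Return the flattened (2r+1) x (2r+1) x C neighbourhood of pixel (x, y).

    x indexes rows (height), y indexes columns (width).
    Out-of-range neighbours are zero-filled (an assumption, not from the source).
    """
    h, w, c = len(image), len(image[0]), len(image[0][0])
    cube = []
    for dx in range(-radius, radius + 1):
        for dy in range(-radius, radius + 1):
            nx, ny = x + dx, y + dy
            if 0 <= nx < h and 0 <= ny < w:
                cube.extend(image[nx][ny])      # all C band values of the neighbour
            else:
                cube.extend([0] * c)
    return cube


# Toy 3 x 3 image with C = 2 bands; each value encodes its own position.
image = [[[10 * x + y, 100 + 10 * x + y] for y in range(3)] for x in range(3)]
labeled = [(1, 1), (0, 0)]                      # hypothetical labeled pixels

# Each cube depends only on its own pixel, so on a GPU one thread
# (or thread block) per labeled pixel could do this work independently.
cubes = [extract_cube(image, x, y) for x, y in labeled]
```

Every cube here has 3 × 3 × 2 = 18 values, with the labeled pixel's own spectrum in the middle of the flattened vector.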

The hyperspectral image is first preprocessed to generate Cube samples, which are then input into the model for training. The features learned in training are used as high-level features and classified with an SVM. FIG. 3 shows the overall architecture of the Cube-CNN-SVM.

The Cube-CNN-SVM model is trained with Cubes as input, and each Cube is independent during training, so the model is well suited to parallel acceleration.

(2-2) training of model

After preprocessing, the training samples are stored in global memory and can be accessed directly during model training. Different activation and loss functions can be chosen when training the model with the CNN.

Taking the softmax output-layer activation function and the cross-entropy loss function adopted by the invention as an example, the loss value is calculated as shown in formula (3):

where n denotes the sample classes, $y_i$ represents the expected output value of the model, and $a_i$ represents the actual output of the model.
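Formula (3) is not reproduced in this text. Assuming it is the standard cross-entropy loss over softmax outputs, $L = -\sum_{i=1}^{n} y_i \ln a_i$, the computation can be sketched as below. The symbol $L$ and the function names are illustrative choices, not taken from the patent.

```python
import math


def softmax(z):
    """Softmax activation: turns raw scores z into probabilities a."""
    m = max(z)                                   # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(y, a):
    """Standard cross-entropy, assumed here to be formula (3)."""
    # Terms with y_i == 0 contribute nothing, so they are skipped.
    return -sum(yi * math.log(ai) for yi, ai in zip(y, a) if yi > 0)


z = [2.0, 1.0, 0.1]        # raw scores for three classes (made-up values)
y = [1.0, 0.0, 0.0]        # one-hot expected output
a = softmax(z)             # actual output of the model
loss = cross_entropy(y, a)
```

With a one-hot target, the loss reduces to the negative log-probability assigned to the correct class.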

To improve model accuracy, the MBGD (Mini-batch Gradient Descent) algorithm is used to adjust the parameters of the algorithm. Taking the i-th neuron as an example, the partial derivative of the loss function defined above with respect to the output value can be computed:

The function of the output layer is defined as $a_i = f(z_i)$, where $z_i$ is the dot product of the weights and the input data. According to the chain rule, one can obtain:

By calculating the gradient of the loss function, the parameters are updated through back-propagation to the previous layer.
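The derivative formulas are not reproduced in this text. For softmax combined with cross-entropy, the chain rule gives the well-known result $\partial L / \partial z_i = a_i - y_i$. The sketch below checks that result numerically and applies one mini-batch update to a single linear layer; the layer shape, learning rate and toy data are hypothetical and serve only to illustrate the MBGD step.

```python
import math


def softmax(z):
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def loss(z, y):
    return -sum(yi * math.log(ai) for yi, ai in zip(y, softmax(z)) if yi > 0)


def grad_wrt_z(z, y):
    # Chain rule through softmax + cross-entropy collapses to a_i - y_i.
    return [ai - yi for ai, yi in zip(softmax(z), y)]


def forward(weights, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def mbgd_step(weights, batch, lr):
    """One mini-batch gradient descent update for z = W x (hypothetical single layer)."""
    grad = [[0.0] * len(weights[0]) for _ in weights]
    for x, target in batch:
        dz = grad_wrt_z(forward(weights, x), target)
        for i, dzi in enumerate(dz):
            for j, xj in enumerate(x):
                grad[i][j] += dzi * xj / len(batch)   # average over the mini-batch
    return [[w - lr * g for w, g in zip(wr, gr)] for wr, gr in zip(weights, grad)]


def batch_loss(weights, batch):
    return sum(loss(forward(weights, x), t) for x, t in batch) / len(batch)


# Finite-difference check of the analytic gradient.
z0, y0, eps = [0.5, -1.0, 2.0], [0.0, 1.0, 0.0], 1e-6
analytic = grad_wrt_z(z0, y0)
numeric = []
for i in range(len(z0)):
    zp, zm = list(z0), list(z0)
    zp[i] += eps
    zm[i] -= eps
    numeric.append((loss(zp, y0) - loss(zm, y0)) / (2 * eps))

# One update on a toy two-sample mini-batch.
weights = [[0.0, 0.0], [0.0, 0.0]]
batch = [([1.0, 0.0], [1.0, 0.0]), ([0.0, 1.0], [0.0, 1.0])]
before = batch_loss(weights, batch)
after = batch_loss(mbgd_step(weights, batch, lr=0.5), batch)
```

The analytic and numerical gradients agree to within the finite-difference error, and a single update lowers the average loss on the mini-batch.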

(2-3) implementation pseudo code of model

The training phase of the model is divided into two parts: forward propagation and backward propagation. The classification result of each Cube is calculated in the forward propagation stage, and the weight of the network is updated in the backward propagation stage so as to obtain a better classification result. The specific implementation pseudo code is as follows:

The invention parallelizes the convolution operation using the idea of the GEMM algorithm and provides a G-PNPE data preprocessing algorithm that stores the data in global memory, where the GPU can read it and perform matrix operations directly. Verified on a hyperspectral image classification algorithm, the method improves speed by 25%-30% over the original algorithm while keeping classification accuracy unchanged. The invention also shows good acceleration performance and generalization capability with multiple convolutional layers. It therefore both improves efficiency and offers better scalability.

The method of the invention is compared with other parallel methods to demonstrate its advantages, mainly against the GCN hyperspectral image parallel classification model and the non-parallel Cube-CNN-SVM model.

The evaluation metrics of the simulated comparison experiment are as follows:

(1) Share of total model time consumed by the convolutional layers:

With the method of the invention, the share of total model run time spent in the convolutional layers is reduced by 10%, a clear improvement for the model as a whole.

(2) Comparison of model accuracy:

The accuracy of the parallel model provided by the invention is almost identical to that of the non-parallel model, which shows that the parallel computing model has no effect on classification accuracy.

(3) Training time comparison of the model:

The average training time of both parallel models is much lower than that of the non-parallel model, with speedups of 5.3 and 6.8 respectively. Compared with the GCN model, the improved model of the invention improves time efficiency by about 20%-30%, a clear speed improvement.
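The two speedup figures can be related to the quoted 20%-30% range with a short calculation. This sketch assumes that 5.3 is the speedup of the GCN model and 6.8 that of the improved model, both measured against the same non-parallel baseline; the text does not state the assignment explicitly.

```python
gcn_speedup = 5.3        # assumed: GCN vs. non-parallel baseline
proposed_speedup = 6.8   # assumed: improved model vs. the same baseline

# Training time is baseline_time / speedup, so the baseline cancels out.
time_reduction = 1 - gcn_speedup / proposed_speedup      # fraction of GCN's time saved
throughput_gain = proposed_speedup / gcn_speedup - 1     # how much faster than GCN

print(f"time reduction vs. GCN: {time_reduction:.1%}")
print(f"throughput gain vs. GCN: {throughput_gain:.1%}")
```

Both ways of expressing the improvement fall inside the 20%-30% range stated above.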

While the preferred embodiments of the present invention have been described, those skilled in the art will appreciate that various changes and modifications can be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims

1. A hyperspectral image parallel classification method based on a GPU is characterized by comprising a first step of processing a hyperspectral image by using a G-PNPE parallel image preprocessing algorithm to obtain Cube data and a second step of constructing a CNN-SVM model for data training and classification;

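in the first step:

Cube data is first extracted from the hyperspectral image, the data is reorganized into a data matrix using the G-PNPE algorithm, and the data matrix is input into the model for the convolution operation;

in the second step:

the hyperspectral image is first preprocessed to generate Cube samples, the samples are then input into the model and trained with the CNN algorithm, and the features learned in training are classified as high-level features using an SVM as the classifier.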

2. The GPU-based hyperspectral image parallel classification method according to claim 1, characterized in that: in the first step, the GEMM algorithm is used to change the convolution operation into a matrix multiplication form, the input data is expanded and copied, and the convolution kernels are rearranged, so that preparation is made for parallelization.

3. The GPU-based hyperspectral image parallel classification method according to claim 2, characterized in that: in the first step, the input of each convolution is reorganized so that each row contains all the input values required to compute one element of the output feature. The convolution kernels are likewise unrolled and then concatenated to form a convolution kernel matrix. Each output corresponds to a column in the new matrix.

4. The GPU-based hyperspectral image parallel classification method according to claim 3, characterized in that: in the first step, the output features are computed automatically when the new input matrix is multiplied by the kernel matrix.

5. The GPU-based hyperspectral image parallel classification method according to claim 4 is characterized in that: in the first step, the hyperspectral image is extracted in a parallel mode into a Cube form and stored in a global memory for training of a subsequent model;

the data extraction process comprises the following steps:

(A) the hyperspectral image is regarded as a three-dimensional matrix of dimension H × W × C; the index position (x, y, c) of each effective element is scanned on the CPU, and the positions of its neighboring pixels can be obtained from this index; for the 3 × 3 × C elements in each Cube, the position of the i-th element in the data matrix is given by formula (1): $x \cdot H + y \cdot W + c \cdot 9 + i \cdot C \cdot H \cdot W$;

where H is the height of the hyperspectral image, W is the width of the hyperspectral image, and C is the number of bands of the hyperspectral image;

the effective elements are pixels whose pixel values are not null.

6. The GPU-based hyperspectral image parallel classification method according to claim 1, characterized in that: in the second step, different activation functions and loss functions are selected when training the model using the CNN algorithm.

7. The GPU-based hyperspectral image parallel classification method according to claim 6 is characterized in that: in the second step, the parameters of the algorithm are adjusted by using the MBGD algorithm to improve the model accuracy.

8. The GPU-based hyperspectral image parallel classification method according to claim 7 is characterized in that: in the second step, the training phase of the model is divided into two parts: forward propagation and backward propagation; the classification result of each Cube is calculated in the forward propagation stage, and the weight of the network is updated in the backward propagation stage so as to obtain a better classification result.

9. The GPU-based hyperspectral image parallel classification method according to any one of claims 1 to 8, characterized by comprising the following steps: the hyperspectral image includes spectral information and spatial information.