CN114581789A - Hyperspectral image classification method and system - Google Patents
- Publication number
- CN114581789A (application CN202210158771.8A)
- Authority
- CN
- China
- Prior art keywords
- hyperspectral image
- image classification
- compression
- expansion
- channel
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/213—Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2413—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
- G06F18/24133—Distances to prototypes
- G06F18/24137—Distances to cluster centroïds
- G06F18/2414—Smoothing the distance, e.g. radial basis function networks [RBFN]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/253—Fusion techniques of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/047—Probabilistic or stochastic networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Evolutionary Computation (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Computational Linguistics (AREA)
- Mathematical Physics (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Health & Medical Sciences (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Molecular Biology (AREA)
- General Health & Medical Sciences (AREA)
- Bioinformatics & Computational Biology (AREA)
- Computing Systems (AREA)
- Evolutionary Biology (AREA)
- Software Systems (AREA)
- Probability & Statistics with Applications (AREA)
- Image Analysis (AREA)
Abstract
The invention provides a hyperspectral image classification method and system comprising the following steps: acquiring a hyperspectral image and preprocessing it; performing channel interaction and compression along the pixel spectral dimension, and spatial-window expansion and alignment along the pixel-sample spatial dimension, on the preprocessed hyperspectral image to obtain mapping features; performing two successive rounds of information interaction on the mapping features to obtain spectral-spatial features; and obtaining the category of each pixel in the acquired hyperspectral image by applying a classifier to the spectral-spatial features. Each round of information interaction performs spectral-feature channel compression, non-local spatial information extraction and spectral-feature channel expansion on the input features in sequence, and then fuses the result with the input features to obtain the output features. The method enlarges the receptive field for capturing spatial information and captures richer and more robust spectral-spatial information, completing the hyperspectral image classification task efficiently.
Description
Technical Field
The invention belongs to the technical field of image processing, and particularly relates to a hyperspectral image classification method and system.
Background
The statements in this section merely provide background information related to the present disclosure and may not necessarily constitute prior art.
The hyperspectral image classification is a core technology of hyperspectral image processing and interpretation, and a hyperspectral image classification task aims to assign a unique semantic label to each hyperspectral image pixel according to a given ground object class set and based on the spectrum and spatial characteristics of a hyperspectral image.
Traditional hyperspectral image classification methods struggle to efficiently extract spectral-spatial features, in both the spectral and spatial dimensions, that are sufficiently discriminative and robust to complete the pixel-labeling task, and their generalization ability is often strained in complex scenes. In recent years, deep-learning-based hyperspectral image classification methods have extracted stacked spectral-spatial features in an end-to-end manner to achieve more robust pixel classification results, advancing the hyperspectral image classification task.
Among the many deep learning models, classification methods based on convolutional neural networks have attracted wide attention in the hyperspectral image classification task thanks to their local perception and parameter sharing, and have demonstrated excellent performance. As in typical computer vision tasks such as semantic segmentation, object detection and instance segmentation, modeling long-range or non-local spatial information also helps complete the pixel classification task. However, to aggregate locally captured filter responses globally, convolution-based architectures must stack convolutional layers deeply to capture spatial features at different granularities.
However, hyperspectral images inherently suffer from the small-sample problem, owing to the twin difficulties of image acquisition and data annotation, which limits the convolutional stacking depth of models used for the hyperspectral image classification task. Moreover, deeper convolutional models also degrade processing efficiency in terms of parameter count and runtime.
Disclosure of Invention
In order to solve the technical problems in the background art, the invention provides a hyperspectral image classification method and system that enlarge the receptive field for capturing spatial information and capture richer and more robust spectral-spatial information to complete the hyperspectral image classification task efficiently.
In order to achieve the purpose, the invention adopts the following technical scheme:
a first aspect of the present invention provides a hyperspectral image classification method, which includes:
acquiring a hyperspectral image and preprocessing it;
performing channel interaction and compression along the pixel spectral dimension, and spatial-window expansion and alignment along the pixel-sample spatial dimension, on the preprocessed hyperspectral image to obtain mapping features;
performing two successive rounds of information interaction on the mapping features to obtain spectral-spatial features;
obtaining the category of each pixel in the acquired hyperspectral image by applying a classifier to the spectral-spatial features;
wherein each round of information interaction performs spectral-feature channel compression, non-local spatial information extraction and spectral-feature channel expansion on the input features in sequence, and then fuses the result with the input features to obtain the output features.
Further, the channel interaction and compression of the pixel spectral dimension are realized by a spectral compression group consisting of a 1 × 1 convolutional layer, a batch normalization operation and a Relu activation function;
or,
the spatial-window expansion and alignment of the pixel-sample spatial dimension are realized by a spatial expansion group consisting of a 2 × 2 convolutional layer, a batch normalization operation and a Relu activation function.
Further, the spectral feature channel compression is realized by a first 1 × 1 convolutional layer;
the output channel of the first 1 x 1 convolutional layer is configured with a batch normalization operation and a Relu activation function.
Further, the non-local spatial information extraction is realized through a double-head self-attention layer;
the dot product multiplication and element-by-element addition operations of the double-headed self-attention layer configuration matrix, the Softmax activation function, and the relative position encoding computation operations.
Further, the spectral feature channel expansion is realized by a second 1 × 1 convolutional layer;
the output channel of the second 1 × 1 convolutional layer is configured with a Relu activation function.
Further, the fusion with the mapping feature specifically comprises: the mapping feature and the output of the second 1 × 1 convolutional layer undergo, in sequence, an element-wise addition operation and a Relu activation function to obtain the interaction information.
Further, the preprocessing comprises respectively performing mean-variance standardization processing on each spectral channel in the acquired hyperspectral image.
A second aspect of the present invention provides a hyperspectral image classification system comprising:
a preprocessing module configured to: acquire a hyperspectral image and preprocess it;
a compression-expansion module configured to: perform channel interaction and compression along the pixel spectral dimension, and spatial-window expansion and alignment along the pixel-sample spatial dimension, on the preprocessed hyperspectral image to obtain mapping features;
a self-attention module configured to: perform two successive rounds of information interaction on the mapping features to obtain spectral-spatial features;
a classification module configured to: obtain the category of each pixel in the acquired hyperspectral image by applying a classifier to the spectral-spatial features;
wherein each round of information interaction performs spectral-feature channel compression, non-local spatial information extraction and spectral-feature channel expansion on the input features in sequence, and then fuses the result with the input features to obtain the output features.
A third aspect of the invention provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the hyperspectral image classification method described above.
A fourth aspect of the invention provides a computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of a method for hyperspectral image classification as described above when executing the program.
Compared with the prior art, the invention has the beneficial effects that:
the invention provides a hyperspectral image classification method, which is characterized in that after spectrum characteristic channel compression, non-local spatial information extraction and spectrum characteristic channel expansion are sequentially carried out on mapping characteristics, the mapping characteristics are fused with the mapping characteristics to obtain interaction information, and the model performance and generalization capability of a model on a hyperspectral image classification task are further improved; in particular, a double-head self-attention mechanism is used for capturing non-local space interaction and correlation information, the mechanism enlarges the receptive field for capturing space information, and efficient matrix dot product multiplication is used for reducing the model calculation amount and the model parameter amount, so that the efficiency of classification is improved.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, are included to provide a further understanding of the invention; they illustrate exemplary embodiments of the invention and together with the description serve to explain the invention without limiting it.
FIG. 1 is a flow chart of a method according to a first embodiment of the present invention;
FIG. 2 is a flowchart of model training according to a first embodiment of the present invention;
FIG. 3 is a structural diagram of a Bottleneck Transformer sub-block according to the first embodiment of the present invention;
fig. 4 is a diagram of a dual-headed self-attention layer according to a first embodiment of the invention.
Detailed Description
The invention is further described with reference to the following figures and examples.
It is to be understood that the following detailed description is exemplary and is intended to provide further explanation of the invention as claimed. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit exemplary embodiments according to the invention. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well unless the context clearly indicates otherwise, and it should be understood that the terms "comprises" and/or "comprising", when used in this specification, specify the presence of the stated features, steps, operations, devices, components, and/or combinations thereof.
Example one
The embodiment provides a hyperspectral image classification method, as shown in fig. 1, which specifically includes the following steps:
step 1, acquiring a hyperspectral image, and preprocessing the hyperspectral image.
Specifically, the preprocessing is as follows: mean-variance standardization is applied separately to each spectral channel of the acquired hyperspectral image, to accelerate the convergence of the proposed network model during training.
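The per-channel mean-variance standardization described above can be sketched in a few lines of NumPy; the function name and the small epsilon guard are illustrative, not part of the patent:

```python
import numpy as np

def standardize_per_channel(cube: np.ndarray) -> np.ndarray:
    """Standardize each spectral channel of an H x W x C hyperspectral cube
    to zero mean and unit variance, statistics taken over the spatial axes."""
    mean = cube.mean(axis=(0, 1), keepdims=True)   # shape (1, 1, C)
    std = cube.std(axis=(0, 1), keepdims=True)
    return (cube - mean) / (std + 1e-8)            # epsilon guards flat channels

# Example: a random 9 x 9 window with 4 spectral channels
cube = np.random.default_rng(0).normal(5.0, 2.0, size=(9, 9, 4))
normed = standardize_per_channel(cube)
```

After this step every channel of `normed` has (approximately) zero mean and unit variance, which is what speeds up training convergence.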
Step 2, inputting the preprocessed hyperspectral image into the hyperspectral image classification network to obtain the category of each pixel in the acquired hyperspectral image. As shown in fig. 1, the hyperspectral image classification network comprises a compression-expansion block (containing the compression-expansion module), a self-attention block (self-attention module) and a classifier.
For the image blocks input to the network model, a compression-expansion block performs channel interaction and compression along the pixel spectral dimension, and spatial-window expansion and alignment along the pixel-sample spatial dimension, to obtain the mapping features. As shown in fig. 1, the compression-expansion block comprises a spectral compression group, consisting of a 1 × 1 convolutional layer, a batch normalization (BN) operation and a Relu activation function, and a spatial expansion group, consisting of a 2 × 2 convolutional layer, a batch normalization operation and a Relu activation function. The spectral compression group realizes channel interaction and compression along the spectral dimension of the pixel sample; a spectral compression coefficient α controls the degree of channel interaction and compression, and α is set to 0.64 by experimental verification. The spatial expansion group realizes the expansion and alignment of the spatial window along the pixel-sample spatial dimension, transforming the original 9 × 9 × αC mapping features (feature mapping blocks, or mapping feature maps) into 12 × 12 × αC mapping features.
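The spectral compression group's 1 × 1 convolution is just a per-pixel linear map over the channel axis, so its effect on shapes can be sketched directly in NumPy. This is a minimal sketch under stated assumptions: batch normalization is omitted, and the spatial expansion group's 2 × 2 convolution is not shown because the exact stride/padding that maps a 9 × 9 window to 12 × 12 is not specified in the text:

```python
import numpy as np

def conv1x1(x, w):
    """A 1x1 convolution over an H x W x C_in feature map: contract the
    channel axis with a C_in x C_out weight matrix, pixel by pixel."""
    return np.einsum('hwc,cd->hwd', x, w)

C, alpha = 200, 0.64                                 # alpha = 0.64 per the text
x = np.random.default_rng(1).normal(size=(9, 9, C))  # one 9 x 9 x C image block
w = np.random.default_rng(2).normal(size=(C, round(alpha * C)))
y = np.maximum(conv1x1(x, w), 0.0)                   # Relu activation
```

The output `y` has shape 9 × 9 × αC, i.e. the spectral channels are compressed from C = 200 to 128 while the spatial window is unchanged.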
The mapping features are input to the self-attention block, which extracts their interaction information (spectral interaction and non-local spatial interaction information, i.e. the spectral-spatial features). Specifically, two rounds of information interaction are carried out in sequence on the mapping features to obtain the spectral-spatial features; each round of information interaction performs spectral-feature channel compression, non-local spatial information extraction and spectral-feature channel expansion on the input features in sequence, and then fuses the result with the input features to obtain the output features. As shown in fig. 1, the self-attention block consists of two information-interaction sub-blocks (Bottleneck Transformer sub-blocks) connected in sequence. The Bottleneck Transformer sub-block, shown in fig. 3, is derived from Bottleneck, one of the basic units of the classical ResNet, which consists of a 1 × 1 convolutional layer group, a 3 × 3 convolutional layer group and a 1 × 1 convolutional layer group, matched with batch normalization operations and Relu activation functions, plus a residual connection. The Bottleneck Transformer sub-block replaces the 3 × 3 convolutional layer group in Bottleneck with a double-headed self-attention (Dual-Head Self-Attention) layer, with the remaining settings unchanged. Specifically, the Bottleneck Transformer sub-block is composed of an input layer, a first 1 × 1 convolutional layer, a double-headed self-attention layer, a second 1 × 1 convolutional layer and an output layer, connected in sequence.
The output channel of the first 1 × 1 convolutional layer is configured with a batch normalization operation and a Relu activation function, and the output channel of the second 1 × 1 convolutional layer is configured with a Relu activation function. The output of the input layer and the output of the second 1 × 1 convolutional layer are fed to the output layer after an element-wise addition operation and a Relu activation function. The first 1 × 1 convolutional layer performs the compression of the spectral feature channels, the second 1 × 1 convolutional layer performs their expansion, and the double-headed self-attention layer models the non-local spatial information. As shown in fig. 4, the double-headed self-attention layer transforms the input data through a fully connected layer and splits it into three shared matrices q, k and v of the same dimensions. In the figure, × denotes the matrix dot-product operation, + denotes the element-by-element matrix addition, 1 × 1 is the pointwise convolution operation, Softmax denotes the nonlinear activation by the Softmax function, W_q, W_k and W_v denote the trainable weight matrices corresponding to q, k and v respectively, R_h and R_w denote the relative position codes for the height and width of the input feature map, and r denotes the position-code matrix generated for the output feature map; in addition, a superscript capital T marks the transpose of a matrix. The double-headed self-attention layer is configured with matrix dot-product multiplication and element-by-element addition operations, a Softmax activation function, and relative position encoding calculation operations.
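The sub-block structure just described (compress, attend, expand, residual add) can be sketched as follows. This is a shape-level sketch, not the patented implementation: batch normalization is omitted, and the attention layer is passed in as a placeholder callable so the residual wiring is the only thing shown:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def bottleneck_transformer(x, W_down, W_up, attention):
    """Sketch of the sub-block: 1x1 channel compression, a self-attention
    layer, 1x1 channel expansion, then a residual add and Relu.
    (Batch normalization is omitted for brevity.)"""
    h = relu(np.einsum('hwc,cd->hwd', x, W_down))  # first 1x1 conv: compress
    h = attention(h)                               # non-local spatial modeling
    h = relu(np.einsum('hwc,cd->hwd', h, W_up))    # second 1x1 conv: expand
    return relu(x + h)                             # residual fusion + Relu

rng = np.random.default_rng(5)
x = rng.normal(size=(12, 12, 32))
y = bottleneck_transformer(x, rng.normal(size=(32, 8)),
                           rng.normal(size=(8, 32)), attention=lambda h: h)
```

Because the compression/expansion pair restores the channel count, the residual addition with the input is well defined, and the output shape matches the input shape.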
Specifically, the similarity of matrix q and matrix k is first computed by applying a softmax activation function to their dot-product result, where relative position codes are first embedded into q; the similarity result is then scaled by an appropriate dimension factor, and the corrected result serves as the self-attention weights of the matrix v; the matrix v, corrected by this self-attention weighting, is fed to the output layer. In other words, the input map is processed by 1 × 1 convolutional layers to obtain the initialized matrices q, k and v of the same dimensions; the dot product of q and k yields the matrix qk^T; the relative position codes R_h and R_w, corresponding to the height and width of the input map, are added element by element to obtain the matrix r; the dot product of q and r yields the matrix qr^T; the matrices qk^T and qr^T are added element by element and passed through a softmax activation function to obtain the similarity matrix of q and k; and the dot product of this similarity matrix with v gives the output result. The self-attention body of the double-headed self-attention layer used can be expressed as:
DHSA(q, k, v) = softmax((qk^T + qr^T) / √d_k) · v
where DHSA(q, k, v) denotes the output result (the interaction information) of the double-headed self-attention layer, d_k is the dimension of the k vector, used to scale the intermediate result, T denotes the transpose, and softmax is the softmax activation function.
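A single attention head of the formula above can be sketched in NumPy on a flattened feature map. Assumptions: one head is shown for brevity (the double-headed layer would run two such heads in parallel), and the relative-position matrix r is passed in directly rather than assembled from R_h and R_w:

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)        # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def dhsa(x, Wq, Wk, Wv, r):
    """One head of DHSA(q, k, v) = softmax((qk^T + qr^T)/sqrt(d_k)) v,
    on an N-token x d flattened feature map; r is an N x d position matrix."""
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    d_k = k.shape[-1]
    scores = (q @ k.T + q @ r.T) / np.sqrt(d_k)    # content + position terms
    return softmax(scores) @ v

rng = np.random.default_rng(3)
N, d = 16, 8                                       # e.g. a 4x4 window, 8 channels
x = rng.normal(size=(N, d))
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
out = dhsa(x, Wq, Wk, Wv, rng.normal(size=(N, d)))
```

Note the two matrix products qk^T and qr^T each cost O(N²d), which is the "efficient matrix dot-product multiplication" the description credits with keeping computation and parameters low relative to deep convolutional stacking.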
The interaction information is input to a classifier to obtain the category of each pixel in the acquired hyperspectral image. As shown in fig. 1, the method uses a classic three-layer classifier for the final hyperspectral pixel classification, comprising a mean pooling layer, a flattening layer and a fully connected layer; the final classifier uses a softmax activation function to predict and generate the label classification probability vector.
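The three-layer classifier head can be sketched as follows; with global mean pooling the flattening step is trivial, so it is folded into the pooling line, and the single fully connected layer's weights are illustrative:

```python
import numpy as np

def classifier_head(feat, W, b):
    """Mean-pool over the spatial window (flattening to a vector), apply one
    fully connected layer, and return softmax class probabilities."""
    pooled = feat.mean(axis=(0, 1))               # (C,) after mean pooling
    logits = pooled @ W + b                       # fully connected layer
    e = np.exp(logits - logits.max())             # stable softmax
    return e / e.sum()

rng = np.random.default_rng(4)
feat = rng.normal(size=(12, 12, 32))              # spectral-spatial features
probs = classifier_head(feat, rng.normal(size=(32, 16)), np.zeros(16))
```

`probs` is the label classification probability vector: one entry per ground-object class, summing to 1, with the predicted class taken as its argmax.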
As shown in fig. 2, the method for training the hyperspectral image classification network includes:
(1) and carrying out mean-variance standardization processing on each spectral channel of the hyperspectral image.
(2) The boundary of the preprocessed image is zero-padded, and the data set is split according to the proportion of ground-object class samples. For each category in the data scene, the samples are divided according to a training : validation : test ratio r1 : r2 : r3. For example, for the Indian Pines and Kennedy Space Center data sets, r1 : r2 : r3 = 0.1 : 0.01 : 0.89; for the Pavia University data set, r1 : r2 : r3 = 0.05 : 0.01 : 0.945. When a class has too few samples to meet the sampling requirement of the validation set, a minimum sampling number is set so that each class is sampled uniformly in proportion to its sample count. Specifically, for each pixel sample, an image block of size 9 × 9 × C is cut out centered on that pixel, where 9 × 9 is the spatial window size of the image block and C is the original spectral dimension of the image. The training, validation and test samples of every category are then aggregated into the training set, validation set and test set respectively.
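The per-class ratio split with a minimum-sample floor can be sketched like this; the function name, the exact rounding, and the choice of enforcing the floor on the training and validation parts are assumptions, since the patent only states that a minimum sampling number exists:

```python
import random

def stratified_split(labels, r_train, r_val, min_count=1, seed=0):
    """Split sample indices per class into train/val/test by the given ratios,
    taking at least `min_count` train and val samples from tiny classes."""
    by_class = {}
    for idx, y in enumerate(labels):
        by_class.setdefault(y, []).append(idx)
    train, val, test = [], [], []
    rng = random.Random(seed)
    for idxs in by_class.values():
        rng.shuffle(idxs)
        n_tr = max(min_count, round(len(idxs) * r_train))
        n_va = max(min_count, round(len(idxs) * r_val))
        train += idxs[:n_tr]
        val += idxs[n_tr:n_tr + n_va]
        test += idxs[n_tr + n_va:]
    return train, val, test

labels = [0] * 100 + [1] * 40 + [2] * 5          # an imbalanced toy scene
tr, va, te = stratified_split(labels, 0.1, 0.01) # the Indian Pines ratios
```

With the 0.1 : 0.01 ratios, the rare class (5 samples) still contributes one training and one validation sample thanks to the floor, mirroring the "minimum sampling number" rule in the text.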
(3) For each image block input to the network model, spectral-domain compression and spatial-domain expansion of the pixel sample are realized by the compression-expansion block, and the interaction information is extracted by the self-attention block.
(4) It is judged whether the number of iterations of the hyperspectral image classification network has reached the set number of training iterations; if so, step (5) is performed, otherwise the process returns to step (3).
(5) The interaction information (spectral-spatial features) is sent to the classifier for pixel classification, and the loss value is calculated against the labels. The classifier uses a softmax activation function to predict and generate the label classification probability vector; a cross-entropy loss function then computes the loss value, and the proposed hyperspectral image classification network model is iteratively trained by back-propagation, updating its parameters. The cross-entropy loss function used by the proposed hyperspectral image classification network model is expressed as follows:
ℒ = -(1/N) · Σ_{n=1}^{N} Σ_{k=1}^{K} In(y_n = k) · log p_{n,k}
where ℒ denotes the calculated loss value; N denotes the number of samples in a single batch of the training set under the mini-batch training mode adopted by the model, here 32; K denotes the number of classes in the data scene; n and k index the n-th sample of the current batch and the k-th class in the class label set; y_n denotes the true label of the n-th hyperspectral image block sample in the current batch; In(y_n = k) denotes the indicator function, which is 1 when y_n = k and 0 otherwise; and p_{n,k} denotes the softmax output probability that the n-th hyperspectral image block sample belongs to class k.
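Because the indicator function selects exactly one term per sample, the batch loss reduces to the mean negative log-probability of each sample's true class, which is a two-line NumPy computation (the epsilon guard is illustrative):

```python
import numpy as np

def cross_entropy(probs, labels):
    """Mean cross-entropy over a batch: probs is an N x K softmax output,
    labels holds the true class index y_n of each sample."""
    n = len(labels)
    return -np.log(probs[np.arange(n), labels] + 1e-12).mean()

probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1]])
loss = cross_entropy(probs, np.array([0, 1]))     # picks p=0.7 and p=0.8
```

Here the loss is -(ln 0.7 + ln 0.8)/2 ≈ 0.290; driving it down by back-propagation pushes the predicted probability of each true class toward 1.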
The proposed network model uses an Adam optimizer with the learning rate set to 0.001 and a mini-batch size of 32. During model training, the loss values of the training and validation sets are calculated after each iteration, and the model after 100 iterations is used as the final model. In the model testing stage, the cut image-block test sets are used to test the model in different hyperspectral scenes; the classification performance of the model in each scene can be measured quantitatively against the corresponding ground-truth labels, and by assigning a class label to every pixel in a scene the model produces a visual classification map of the whole scene.
To some extent, the method remedies the shortcomings of existing deep-learning-based methods on the hyperspectral image classification task, obtains richer and more robust spectral-spatial features, and further improves the model's performance and generalization ability on the task. In particular, the model uses a double-headed self-attention mechanism to capture non-local spatial interaction and correlation information; this mechanism enlarges the receptive field for capturing spatial information, and efficient matrix dot-product multiplication reduces the model's computation and parameter count, which helps improve its efficiency.
Example two
The embodiment provides a hyperspectral image classification system, which specifically comprises the following modules:
a preprocessing module configured to: acquire a hyperspectral image and preprocess the hyperspectral image;
a compression-expansion module configured to: perform channel interaction and compression along the pixel spectral dimension, and spatial-window expansion and alignment along the pixel-sample spatial dimension, on the preprocessed hyperspectral image to obtain mapping features;
a self-attention module configured to: perform information interaction twice in sequence based on the mapping features to obtain spectral-spatial features;
a classification module configured to: obtain the category of each pixel in the acquired hyperspectral image by using a classifier based on the spectral-spatial features;
wherein the information interaction sequentially performs spectral feature channel compression, non-local spatial information extraction, and spectral feature channel expansion on the input features, and then fuses the result with the input features to obtain the output features.
It should be noted that the modules in this embodiment correspond one-to-one to the steps in the first embodiment, and their specific implementation processes are the same, so they are not described again here.
Example three
The present embodiment provides a computer-readable storage medium on which a computer program is stored; when executed by a processor, the program implements the steps of the hyperspectral image classification method described in the first embodiment.
Example four
The present embodiment provides a computer device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor; when the processor executes the program, the steps of the hyperspectral image classification method described in the first embodiment are implemented.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of a hardware embodiment, a software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium, and when executed, can include the processes of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), or the like.
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.
Claims (10)
1. A hyperspectral image classification method, characterized by comprising the following steps:
acquiring a hyperspectral image and preprocessing the hyperspectral image;
performing channel interaction and compression along the pixel spectral dimension, and spatial-window expansion and alignment along the pixel-sample spatial dimension, on the preprocessed hyperspectral image to obtain mapping features;
performing information interaction twice in sequence based on the mapping features to obtain spectral-spatial features;
obtaining the category of each pixel in the acquired hyperspectral image by using a classifier based on the spectral-spatial features;
wherein the information interaction sequentially performs spectral feature channel compression, non-local spatial information extraction, and spectral feature channel expansion on the input features, and then fuses the result with the input features to obtain the output features.
2. The hyperspectral image classification method according to claim 1, wherein the channel interaction and compression of the pixel spectral dimension are realized by a spectral compression group consisting of a 1 × 1 convolutional layer, a batch normalization operation, and a ReLU activation function;
or,
the spatial-window expansion and alignment of the pixel-sample spatial dimension are realized by a spatial expansion group consisting of a 2 × 2 convolutional layer, a batch normalization operation, and a ReLU activation function.
3. The hyperspectral image classification method according to claim 1, wherein the spectral feature channel compression is realized by a first 1 × 1 convolutional layer;
the output channel of the first 1 × 1 convolutional layer is configured with a batch normalization operation and a ReLU activation function.
4. The hyperspectral image classification method according to claim 1, wherein the non-local spatial information extraction is realized by a double-headed self-attention layer;
the double-headed self-attention layer is configured with matrix dot-product multiplication and element-wise addition operations, a Softmax activation function, and a relative position encoding computation operation.
5. The hyperspectral image classification method according to claim 1, wherein the spectral feature channel expansion is realized by a second 1 × 1 convolutional layer;
the output channel of the second 1 × 1 convolutional layer is configured with a ReLU activation function.
6. The hyperspectral image classification method according to claim 5, wherein the fusion with the mapping features is specifically: the mapping features and the output of the second 1 × 1 convolutional layer are subjected in sequence to an addition operation and a ReLU activation function to obtain the interaction information.
7. The hyperspectral image classification method according to claim 1, wherein the preprocessing comprises performing mean-variance normalization on each spectral dimension of all pixels in the acquired hyperspectral image.
8. A hyperspectral image classification system, characterized by comprising:
a preprocessing module configured to: acquire a hyperspectral image and preprocess the hyperspectral image;
a compression-expansion module configured to: perform channel interaction and compression along the pixel spectral dimension, and spatial-window expansion and alignment along the pixel-sample spatial dimension, on the preprocessed hyperspectral image to obtain mapping features;
a self-attention module configured to: perform information interaction twice in sequence based on the mapping features to obtain spectral-spatial features;
a classification module configured to: obtain the category of each pixel in the acquired hyperspectral image by using a classifier based on the spectral-spatial features;
wherein the information interaction sequentially performs spectral feature channel compression, non-local spatial information extraction, and spectral feature channel expansion on the input features, and then fuses the result with the input features to obtain the output features.
9. A computer-readable storage medium on which a computer program is stored, characterized in that the program, when executed by a processor, implements the steps of the hyperspectral image classification method according to any one of claims 1 to 7.
10. A computer device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the program, implements the steps of the hyperspectral image classification method according to any one of claims 1 to 7.
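The spectral compression group of claim 2 (a 1 × 1 convolution, batch normalization, and ReLU) can be sketched in NumPy as follows; the 1 × 1 convolution is written as a per-pixel linear map over channels, and the channel counts and initialisation are illustrative assumptions:

```python
import numpy as np

def spectral_compression_group(x, W, gamma, beta, eps=1e-5):
    """Sketch of claim 2's spectral compression group: a 1x1 convolution
    (per-pixel linear map over channels), batch normalization over the
    batch and spatial axes, and a ReLU activation."""
    y = x @ W                                     # 1x1 conv: (N, H, W, Cin) -> (N, H, W, Cout)
    mu = y.mean(axis=(0, 1, 2), keepdims=True)    # per-channel batch statistics
    var = y.var(axis=(0, 1, 2), keepdims=True)
    y = gamma * (y - mu) / np.sqrt(var + eps) + beta
    return np.maximum(y, 0.0)                     # ReLU

rng = np.random.default_rng(3)
x = rng.normal(size=(4, 7, 7, 32))                # batch of 7x7 patches, 32 bands
W = rng.normal(size=(32, 8)) * 0.1                # compress 32 -> 8 channels
out = spectral_compression_group(x, W, gamma=1.0, beta=0.0)
# out has shape (4, 7, 7, 8); each channel is normalized, then clipped at 0
```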
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210158771.8A CN114581789A (en) | 2022-02-21 | 2022-02-21 | Hyperspectral image classification method and system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114581789A true CN114581789A (en) | 2022-06-03 |
Family
ID=81770430
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210158771.8A Pending CN114581789A (en) | 2022-02-21 | 2022-02-21 | Hyperspectral image classification method and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114581789A (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117237341A (en) * | 2023-11-13 | 2023-12-15 | 山东第一医科大学第一附属医院(山东省千佛山医院) | Human body peripheral blood sample detection method and system based on hyperspectral image |
CN117765297A (en) * | 2023-11-20 | 2024-03-26 | 中国地质大学(武汉) | Hyperspectral image classification method, hyperspectral image classification device, hyperspectral image classification equipment and storage medium |
CN117765297B (en) * | 2023-11-20 | 2024-06-07 | 中国地质大学(武汉) | Hyperspectral image classification method, hyperspectral image classification device, hyperspectral image classification equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||