CN114913434A - High-resolution remote sensing image change detection method based on global relationship reasoning - Google Patents
- Publication number
- CN114913434A (application number CN202210622122.9A)
- Authority
- CN
- China
- Prior art keywords
- network
- change detection
- remote sensing
- features
- training
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G06V20/13—Satellite images
- G06F18/253—Fusion techniques of extracted features
- G06N3/045—Combinations of networks
- G06N3/048—Activation functions
- G06N3/08—Learning methods
- G06V10/467—Encoded features or binary features, e.g. local binary patterns [LBP]
- G06V10/80—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
Abstract
A high-resolution remote sensing image change detection method based on global relationship reasoning comprises the following steps: first, multi-scale features of a bi-temporal remote sensing image pair are extracted by a pre-trained encoder; second, a global relationship reasoning module performs inter-region relational reasoning on the features at each scale; finally, a multi-scale feature fusion decoder is constructed and the final change detection result is generated through a semantic segmentation head. Besides acquiring local information from stacked convolutional layers, the invention fully considers the global semantic information that characterizes the intrinsic relations between changed objects. In addition, the method adopts an encoder-decoder network structure, which recovers detail information, effectively suppresses background-noise interference, and reduces false detections. By making full use of the multi-scale information and global semantic information of remote sensing images to generate discriminative, high-resolution features, the method effectively improves change detection accuracy and has broad application prospects.
Description
Technical Field
The invention belongs to the technical field of remote sensing image processing and relates to a high-resolution remote sensing image change detection method based on global relationship reasoning. The invention can be used to detect building changes between two optical remote sensing images acquired at different times, and can be applied to land use and land cover analysis, urban planning, and the like.
Background
The primary objective of the remote sensing image change detection task is to decide the change state of image pixels or objects by comparing bi-temporal or multi-temporal images separated in time; it can provide an important basis for upper-level management and decision-making in tasks such as urban spatial layout, land use monitoring, water body monitoring, and disaster monitoring. In recent years, with the rapid development of satellite sensor technology, the resolution of remote sensing images has progressed from tens or hundreds of meters to the meter and sub-meter level, showing a clear trend toward high resolution. High-resolution remote sensing images bring greater technical challenges to surface change detection: they contain more surface information, the texture details of ground objects are finer, the topological relations among ground objects are more complex, and the uncertainty of change detection results increases greatly. In addition, unlike ordinary optical imaging, remote sensing images are acquired from satellites or unmanned aerial vehicles looking down at the ground from high altitude; this process is easily affected by factors such as time, illumination, and environment, which can introduce shape distortion, overexposure, shadows, and other interference, further increasing the difficulty of change detection.
With the development of big data and computing resources, deep learning techniques have gradually been introduced into the field of change detection. A deep network can learn abstract representations of data layer by layer; its nonlinear learning capability makes complex pattern recognition easier to handle and provides a practical solution for high-resolution remote sensing image change detection. In the image domain, the basic learning unit of a deep network is the 2-D convolution. Relying on the translation invariance of convolution and its ability to smooth noise, the spatial hierarchy of patterns can be learned by stacking multiple convolutional layers, and the extracted multi-layer features contain rich local details. Therefore, many researchers at home and abroad have designed various convolutional neural networks for the remote sensing image change detection task.
Rodrigo Caye Daudt et al., in the paper "Daudt R C, Le Saux B, Boulch A. Fully Convolutional Siamese Networks for Change Detection[C]//IEEE International Conference on Image Processing, ICIP 2018, Athens, Greece, October 7-10, 2018: 4063-4067", present three fully convolutional (FC) network structures: FC-EF, FC-Siam-Conc, and FC-Siam-Diff. FC-EF comprises an encoder that extracts features at 4 scales and a decoder; the bi-temporal images are concatenated and fed into the network, a single-input single-output structure. FC-Siam-Conc comprises a twin (Siamese) encoder and a decoder; the bi-temporal images are input separately into the twin encoder, the extracted same-scale feature channels are concatenated, and the result is fed into the decoder, a dual-input single-output structure. FC-Siam-Diff likewise comprises a twin encoder and a decoder; the bi-temporal images are input separately into the twin encoder, and the difference of the extracted same-scale features is computed and fed into the decoder. Further, in the paper "Daudt R C, Le Saux B, Boulch A, et al. Multitask learning for large-scale semantic change detection[J]. Computer Vision and Image Understanding, 2019, 187: 102783", the same authors improve the FC-EF network by adding a residual block at the end of each layer of the FC-EF encoder to ease network training (FC-EF-Res).
Chen et al., in the paper "Chen H, Shi Z W. A Spatial-Temporal Attention-Based Method and a New Dataset for Remote Sensing Image Change Detection[J]. Remote Sensing, 2020, 12(10): 1662", propose the spatio-temporal attention network (STANet), which extracts visual features with a ResNet18 convolutional network and uses attention to adaptively focus on regions of interest in the image. Shi et al., in the paper "Shi Q, Liu M X, Li S C, et al. A Deeply Supervised Attention Metric-Based Network and an Open Aerial Image Dataset for Remote Sensing Change Detection[J]. IEEE Transactions on Geoscience and Remote Sensing, 2022, 60: 1-16", propose the deeply supervised attention metric-based network (DSAMNet), which also extracts image features with a ResNet18 convolutional network and uses spatial and channel attention mechanisms to generate discriminative features.
However, a deep network formed by stacking convolutional layers is limited, by the receptive field of the convolution kernel, to fine detail features within a small region. Moreover, the contextual connections among multi-scale features are loose, and semantic information is not fully exploited. It is known that, within the same scene, the semantic attributes of changed objects are related, i.e., the changed objects are largely consistent in kind. If semantic information can be well utilized, one or a few changed objects can be identified first and the remaining ones distinguished through their semantic relations, which improves not only detection accuracy but also detection efficiency. In view of these deficiencies of the prior art, a change detection method that comprehensively utilizes global semantic information and local information is needed.
Disclosure of Invention
Optical remote sensing images cover wide scenes with complex backgrounds and suffer from shadow noise, overexposure, cloud occlusion, and other problems caused by the imaging process. Detecting local changes in complex pairs of optical remote sensing images therefore remains an open and challenging task. Addressing the difficulty that many changed objects in high-resolution remote sensing images cannot be accurately detected from local detail information alone, the invention provides an effective change detection method that considers the global semantic information of the image and the intrinsic relations between objects. Specifically, aiming at the shortcoming that the prior art focuses on local detail information of optical remote sensing images without fully utilizing global semantic information, the invention provides a high-resolution remote sensing image change detection method based on global relationship reasoning, which can improve change detection accuracy.
In order to solve the problems, the invention adopts the technical scheme that:
a high-resolution remote sensing image change detection method based on global relationship reasoning is disclosed. The method not only obtains local information from the stacked convolution layer, but also fully considers global semantic information which can represent the internal relation between the changing objects, and particularly adopts a global relation reasoning module to model the semantic relation between different objects or regions on the remote sensing image characteristic diagram. In addition, the method is a network structure of the encoder-decoder, can realize detail information recovery, effectively weakens the interference of background noise, and reduces the phenomenon of false detection. The specific high-resolution remote sensing image change detection method based on global relationship reasoning comprises the following steps:
the first step, sample partitioning and data preprocessing.
1.1) Sample partitioning. A pair of high-resolution remote sensing images covering the same earth surface at different times t1 and t2, together with the corresponding ground-truth label, is taken as one sample, and the collected samples are divided into a training set, a verification set, and a test set;
1.2) Data preprocessing. Preprocessing consists of image cropping and standardization. Taking one sample as an example, the images and the ground-truth label are cropped to 256 × 256, and each cropped image is standardized channel by channel to obtain the processed time-t1 image X^{t1} ∈ ℝ^{C×H×W} and time-t2 image X^{t2} ∈ ℝ^{C×H×W}, where C, H, and W denote the number of channels, the height, and the width of the image, respectively.
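The cropping-and-standardization step above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions: the function name, the center-crop choice, and the toy statistics are not from the patent, which only specifies the 256 × 256 crop and channel-wise standardization.

```python
import numpy as np

def preprocess_pair(img1, img2, label, mean, std, size=256):
    """Crop a co-registered image pair and its label to size x size,
    then standardize each image channel by channel: (x - mean) / std.
    Images are (C, H, W) arrays; the label is (H, W)."""
    c, h, w = img1.shape
    top = (h - size) // 2            # center crop, one possible tiling choice
    left = (w - size) // 2
    sl = (slice(None), slice(top, top + size), slice(left, left + size))
    x1 = img1[sl].astype(np.float64)
    x2 = img2[sl].astype(np.float64)
    y = label[top:top + size, left:left + size]
    # per-channel standardization; mean/std are length-C vectors
    m = np.asarray(mean, dtype=np.float64).reshape(-1, 1, 1)
    s = np.asarray(std, dtype=np.float64).reshape(-1, 1, 1)
    return (x1 - m) / s, (x2 - m) / s, y

# toy example: a 3-channel 512x512 pair cropped to 256x256
rng = np.random.default_rng(0)
a = rng.integers(0, 256, (3, 512, 512))
b = rng.integers(0, 256, (3, 512, 512))
lab = rng.integers(0, 2, (512, 512))
x1, x2, y = preprocess_pair(a, b, lab, mean=[128, 128, 128], std=[64, 64, 64])
```

After standardization each channel is roughly zero-mean, which is what the pre-trained encoder of the next step expects.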
The second step, constructing a change detection network architecture based on global relationship reasoning.
2.1) Constructing the encoder. A twin encoder built on the ResNet18 network extracts feature maps at 4 scales from X^{t1} and X^{t2} respectively, yielding the multi-scale features {F_i^{t1}} and {F_i^{t2}}, i = 1, …, 4.
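For a 256 × 256 input, the four stage outputs of a standard ResNet18 have the channel widths and strides below; the patent does not list these numbers at this point, so they are stated here as an assumption based on the stock ResNet18 design.

```python
# Expected (C, H, W) shapes of the four ResNet18 stage outputs for a
# 256x256 input: channel widths 64/128/256/512 at strides 4/8/16/32.
H = W = 256
channels = [64, 128, 256, 512]
strides = [4, 8, 16, 32]
scales = [(c, H // s, W // s) for c, s in zip(channels, strides)]
print(scales)  # [(64, 64, 64), (128, 32, 32), (256, 16, 16), (512, 8, 8)]
```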
2.2) Constructing the global relationship reasoning module (Global Reasoning, GloRe). The multi-scale features {F_i^{t1}} and {F_i^{t2}} extracted by the twin encoder are each fed into 4 GloRe modules. Each GloRe module is followed in series by a convolution block with a 3 × 3 kernel, outputting the enhanced features {E_i^{t1}} and {E_i^{t2}}.
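The projection-reasoning-reprojection pattern of such a module can be sketched at shape level in NumPy as follows. Random matrices stand in for the learned 1 × 1 convolutions and node adjacency of the actual GloRe module, so this illustrates only the data flow, not the patented implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def glore(x, n_nodes=16, c_mid=8, rng=rng):
    """Minimal sketch of global relation reasoning: project local features
    onto a small fully connected graph of region nodes, reason over the
    graph, and project back with a residual connection."""
    c, h, w = x.shape
    xf = x.reshape(c, h * w)                       # flatten the spatial grid
    w_reduce = rng.standard_normal((c_mid, c))     # 1x1 conv: C -> C'
    w_proj = rng.standard_normal((n_nodes, c))     # 1x1 conv: C -> N (projection)
    w_graph = rng.standard_normal((c_mid, c_mid))  # node-state update weights
    w_expand = rng.standard_normal((c, c_mid))     # 1x1 conv: C' -> C
    xr = w_reduce @ xf                             # (C', L) reduced features
    b = w_proj @ xf                                # (N, L) soft region assignment
    v = b @ xr.T                                   # (N, C') node features
    adj = rng.standard_normal((n_nodes, n_nodes))  # stands in for learned adjacency
    v = np.maximum((np.eye(n_nodes) - adj) @ v @ w_graph, 0)  # graph conv + ReLU
    y = w_expand @ (b.T @ v).T                     # reproject nodes to (C, L)
    return x + y.reshape(c, h, w)                  # residual connection

out = glore(rng.standard_normal((32, 8, 8)))
```

The key point is that every node aggregates evidence from all spatial positions, so two distant changed buildings can influence each other in a single reasoning step.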
2.3) Constructing the decoder. The main roles of the decoder are fusing the multi-scale enhanced features and generating the difference feature, via the following sub-steps:
2.3.1) Multi-scale enhanced feature fusion. For each temporal stream, the smallest-scale feature is fused progressively toward the larger scales: E_4 and E_3 pass through a fusion unit to generate a fused feature D_3 of the same size as E_3; D_3 and E_2 generate a fused feature D_2 of the same size as E_2; and, by analogy, D_2 and E_1 finally generate a fused feature D_1 of the same size as E_1. Every fusion unit has the same structure. Taking the fusion of E_4 and E_3 as an example, the process is as follows:
1) Feed E_4 into an upsampling layer using bilinear interpolation, then into a convolutional layer with a 3 × 3 kernel, and through a ReLU activation layer to obtain the feature A, i.e., A = ReLU(Conv3×3(Up(E_4)));
2) Feed E_3 into a convolutional layer with a 3 × 3 kernel and through a ReLU activation layer to obtain the feature B, i.e., B = ReLU(Conv3×3(E_3));
3) Concatenate A and B along the channel dimension and feed the result into a convolution block with a 1 × 1 kernel to obtain the fused feature D_3, i.e., D_3 = Conv1×1([A, B]).
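A NumPy sketch of one fusion unit under simplifying assumptions: random weights stand in for the learned convolutions, and nearest-neighbour upsampling replaces bilinear interpolation for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv(x, w):
    """'Same' 2D convolution; x: (Cin, H, W), w: (Cout, Cin, kh, kw)."""
    cout, cin, kh, kw = w.shape
    _, h, wd = x.shape
    xp = np.pad(x, ((0, 0), (kh // 2, kh // 2), (kw // 2, kw // 2)))
    out = np.zeros((cout, h, wd))
    for i in range(kh):
        for j in range(kw):
            patch = xp[:, i:i + h, j:j + wd]
            out += np.einsum('oc,chw->ohw', w[:, :, i, j], patch)
    return out

def upsample2x(x):
    """Nearest-neighbour x2 upsampling (stands in for bilinear here)."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def fuse(e_low, e_high, rng=rng):
    """One fusion unit: upsample the coarser feature, refine both branches
    with a 3x3 conv + ReLU, concatenate on channels, merge with a 1x1 conv."""
    c_low, c_high = e_low.shape[0], e_high.shape[0]
    a = np.maximum(conv(upsample2x(e_low),
                        rng.standard_normal((c_high, c_low, 3, 3))), 0)
    b = np.maximum(conv(e_high,
                        rng.standard_normal((c_high, c_high, 3, 3))), 0)
    cat = np.concatenate([a, b], axis=0)
    return conv(cat, rng.standard_normal((c_high, 2 * c_high, 1, 1)))

# fuse the two smallest scales of a (toy) ResNet18 feature pyramid
fused = fuse(rng.standard_normal((512, 8, 8)), rng.standard_normal((256, 16, 16)))
```

The output matches the finer input's size, so the same unit can be chained up the pyramid.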
2.3.2) Generating the difference feature. The fused features of the two temporal streams (each of the same size as the largest-scale enhanced feature) are subtracted, and the difference is input into a convolution block with a 3 × 3 kernel to obtain the difference feature F_diff, i.e., F_diff = Conv3×3(D_1^{t1} − D_1^{t2}).
2.4) Constructing the change detection head. It consists of bilinear-interpolation upsampling layers and convolution blocks connected in series and is used to generate the change detection map, via the following sub-steps:
2.4.1) The difference feature F_diff is upsampled (scale factor 2, bilinear interpolation) and passed through a convolution block (Conv 3 × 3 + BN + ReLU) to obtain the updated feature F_diff(1);
2.4.2) The feature F_diff(1) is upsampled (scale factor 2, bilinear interpolation) and passed through a convolution block (Conv 3 × 3 + BN + ReLU) to obtain the feature F_diff(2);
2.4.3) The feature F_diff(2) passes through a convolutional layer (Conv 1 × 1, 1 output channel) and a Sigmoid layer to output the change detection map.
The third step, constructing the loss function.
The loss function is constructed as the combination of the Dice loss (Dice Loss) and the binary cross-entropy loss (BCE Loss), i.e., L = L_Dice + L_BCE.
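A minimal NumPy sketch of this combined loss; the Dice and BCE formulations below are the standard ones (with a smoothing constant ε), and the helper names are illustrative rather than the patent's.

```python
import numpy as np

def dice_loss(pred, target, eps=1e-5):
    """Dice loss over a probability map `pred` and binary `target`;
    eps smooths the ratio and prevents a zero denominator."""
    inter = (pred * target).sum()
    return 1.0 - (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)

def bce_loss(pred, target, eps=1e-7):
    """Pixel-averaged binary cross-entropy; predictions are clipped
    away from 0 and 1 for numerical stability."""
    p = np.clip(pred, eps, 1.0 - eps)
    return -(target * np.log(p) + (1 - target) * np.log(1 - p)).mean()

def change_loss(pred, target):
    """Combined training loss L = L_Dice + L_BCE."""
    return dice_loss(pred, target) + bce_loss(pred, target)

# a near-perfect prediction gives near-zero loss; an inverted one is penalized
target = np.array([[1.0, 0.0], [0.0, 1.0]])
good = change_loss(np.array([[0.99, 0.01], [0.01, 0.99]]), target)
bad = change_loss(np.array([[0.01, 0.99], [0.99, 0.01]]), target)
```

Pairing Dice with BCE is a common recipe: Dice counteracts the class imbalance between the few changed pixels and the many unchanged ones, while BCE provides smooth per-pixel gradients.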
The fourth step, training and verifying the network.
4.1) Basic settings. Initialize the parameters required by the network training process, including the number of iteration rounds (Epoch), the batch size (Batchsize), and the initial learning rate (LR). Set a learning-rate update strategy, such as linear decay, exponential decay, or fixed-step decay. Update the weights of the change detection network with the adaptive moment estimation (Adam) optimizer, and set its first-order moment decay coefficient β1 and second-order moment decay coefficient β2.
4.2) Network training. One training pass of the network corresponds to one Epoch and comprises the following sub-steps:
4.2.1) Input a batch of Batchsize training samples into the change detection network based on global relationship reasoning to obtain the change detection results;
4.2.2) Calculate the Dice Loss and BCE Loss, pass their sum to the Adam optimizer to update the weights of the change detection network, and repeat step 4.2.1) until the network has been trained on all training samples, yielding the change detection network trained for one Epoch.
4.3) Network verification. Input the verification set into the trained network to obtain its change detection results, and calculate the evaluation index of the network from the change detection results and the ground-truth sample labels.
The fifth step, repeating the network training and verification process.
After all Epochs of training are finished, the network with the optimal verification result is selected as the final network.
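The outer train-validate-select loop can be sketched as follows; `train_epoch` and `validate` are hypothetical callables, and the toy F1 sequence merely illustrates keeping the best-scoring epoch.

```python
def train_and_select(train_epoch, validate, n_epochs):
    """Skeleton of the outer loop: train one epoch, validate, and keep
    the epoch with the best validation F1 as the final network.
    train_epoch() updates the model in place; validate() returns F1."""
    best_f1, best_epoch = -1.0, -1
    for epoch in range(1, n_epochs + 1):
        train_epoch()
        f1 = validate()
        if f1 > best_f1:            # select the optimal verification result
            best_f1, best_epoch = f1, epoch
            # a real run would also snapshot the network weights here
    return best_epoch, best_f1

# toy stand-ins: validation F1 improves, then plateaus
scores = iter([0.80, 0.85, 0.84, 0.87, 0.86])
best_epoch, best_f1 = train_and_select(lambda: None, lambda: next(scores), 5)
```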
Compared with the prior art, the invention has the beneficial effects that:
(1) From the standpoint of practicality and operability, the invention constructs an end-to-end remote sensing image change detection network. The network acquires local information through the encoder and global semantic information through the global relationship reasoning module, and makes full use of both to detect changes in the images. It overcomes the missed and incomplete detection of changed objects caused by the low global-modeling efficiency of conventional convolutional neural networks, and therefore achieves high detection accuracy.
(2) The detection network constructed by the invention uses decoded features rather than encoded features to compute the change information, which effectively avoids interference from complex backgrounds and imaging conditions. Considering the complementary roles of multi-scale features, the multi-scale encoded features are fused in the decoder; they guide the detail recovery of changed objects and counteract their scale differences, making the identification of changed objects more accurate.
(3) The decoder constructed by the invention first fuses the multi-scale features of each image and then computes the difference feature, rather than fusing after computing per-layer difference features; this effectively avoids generating additional salt-and-pepper noise and yields an excellent visual change detection effect.
Drawings
FIG. 1 is a flow chart of a high resolution remote sensing image change detection method based on global relationship reasoning;
FIG. 2 is a block diagram of a high resolution remote sensing image change detection network based on global relationship reasoning;
FIG. 3 is a block diagram of a decoder;
FIG. 4 is a graph of the variation of the training loss of the network;
FIG. 5 is the change curve of the F1 evaluation index during network verification;
FIG. 6(a) is the ground-truth label of the LEVIR test set;
FIG. 6(b) is a graph showing the results of detecting changes in FC-EF;
FIG. 6(c) is a graph showing the results of detecting a change in FC-EF-Conc;
FIG. 6(d) is a graph showing the results of detecting changes in FC-EF-Diff;
FIG. 6(e) is a graph showing the results of detecting changes in FC-EF-Res;
FIG. 6(f) is a graph of change detection results for STANet;
FIG. 6(g) is a graph showing the results of detecting a change in DSAMNet;
FIG. 6(h) is a graph showing the results of change detection according to the present invention.
Detailed Description
In order to make the technical problems solved by the invention, the technical solutions adopted, and the technical effects achieved clearer, the invention is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are merely illustrative of the invention and do not limit it. It should also be noted that, for convenience of description, only the parts relevant to the invention are shown in the drawings.
1. Data and operating environment.
This example is illustrated on the LEVIR change detection public dataset, which contains building changes. The hardware platform of the simulation experiment is: an Intel(R) Core(TM) i7-8700 CPU at 3.2 GHz, 16 GB of RAM, and an NVIDIA GTX 1070 GPU with 8 GB of video memory. The software platform is Python 3.8.
2. Implementation steps.
(1) Data acquisition and data pre-processing.
(1a) Data acquisition. The LEVIR remote sensing images record the changes of various buildings (villas and high-rise apartments) in 20 areas of several cities in Texas, USA, from 2002 to 2018. The dataset contains 637 pairs of remote sensing images with a resolution of 0.5 m/pixel and the corresponding ground-truth labels; each image has 3 channels (red, green, blue) and a size of 1024 × 1024. The dataset is divided into training, verification, and test sets at a ratio of 7:1:2, comprising 445, 64, and 128 image pairs, respectively.
(1b) Data preprocessing. Preprocessing consists of image cropping and standardization. Owing to GPU memory limitations, the experiment cannot train directly on the whole images, so all samples are cropped to 256 × 256 and each cropped image is standardized channel by channel according to equation (1).
X̂ = (X − μ) / σ    (1)

where X^{t1} and X^{t2} denote one training sample acquired at times t1 and t2, μ is the per-channel (red, green, blue) mean of all images in the LEVIR dataset, and σ is the corresponding per-channel standard deviation, both computed channel by channel over the dataset.
(2) Input the LEVIR training set into the change detection network based on global relationship reasoning for forward propagation.
FIG. 2 shows the structure of the high-resolution remote sensing image change detection network based on global relationship reasoning provided by the invention. Taking a pair of preprocessed LEVIR images X^{t1} and X^{t2} as an example, the forward propagation process of the network is as follows:
(2a) Obtain the multi-scale features through the encoder. To speed up training, a ResNet18 pre-trained on the ImageNet dataset (containing over 1.2 million images in 1000 classes) is used for encoding. ResNet18 extracts features at 4 scales from X^{t1} and X^{t2} respectively, i.e., {F_i^{t1}} and {F_i^{t2}}, i = 1, …, 4.
(2b) Obtain the multi-scale enhanced features through the global relationship reasoning module (GloRe). The essence of GloRe is to treat local regions of the feature map as nodes, so that relations between nodes represent relations between local regions of the feature map. This example sets the number of nodes (N) of the 4 GloRe modules to 128, 64, 32, and 16, respectively. The multi-scale features extracted by the twin encoder pass through the GloRe modules and the corresponding 3 × 3 convolution blocks (Conv 3 × 3 + BN + ReLU) to output the enhanced features {E_i^{t1}} and {E_i^{t2}}.
(2c) Obtain the difference feature F_diff through the decoder. Referring to FIG. 3, the decoding process is as follows:
s2c.1 multi-scale enhancement features are obtained through a fusion unit andblend features of equal sizeThe minimum scale features gradually get close to the upper scale features. In particular, the amount of the solvent to be used,andthrough a fusion unit FF 1 Generating featuresAndthrough a fusion unit FF 2 Generating featuresAndthrough a fusion unit FF 3 Generating featuresFusion unit FF 1 、FF 2 And FF 3 Has the advantages ofThe same structure is adopted.
S2c.2 The difference of the fused features D_1 of the two temporal streams is input into a convolution block with a 3 × 3 kernel to obtain the difference feature F_diff.
(3) Calculate the loss according to L = L_Dice + L_BCE (2), where L is the total loss function, L_Dice is the Dice Loss, and L_BCE is the BCE Loss. The Dice Loss addresses the class-imbalance problem; its expression is shown in equation (3):

L_Dice = 1 − (2 Σ_{i,j} Ŷ_ij Y_ij + ε) / (Σ_{i,j} Ŷ_ij + Σ_{i,j} Y_ij + ε)    (3)
where Ŷ_ij and Y_ij denote the predicted value and the ground-truth value at row i, column j, respectively, and ε is a smoothing constant preventing the denominator from being 0, set to 10⁻⁵ in this example. The BCE Loss measures the agreement between the predicted change detection result and the ground-truth label; its expression is shown in equation (4):

L_BCE = −(1/(H·W)) Σ_{i,j} [Y_ij log Ŷ_ij + (1 − Y_ij) log(1 − Ŷ_ij)]    (4)
(4) Network iterative training and verification.
(4a) Parameter settings. The network training process sets Epoch = 50, Batchsize = 16, and LR = 0.001. The learning-rate update strategy is linear decay: the learning rate is kept constant for the first 25 Epochs and linearly decayed to 0 over the last 25. The Adam optimizer parameters are β1 = 0.9 and β2 = 0.999.
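The linear-decay schedule described here (constant for the first 25 Epochs, then linearly decayed to 0 over the last 25) can be written as a small function; a 0-based epoch index is assumed.

```python
def linear_decay_lr(epoch, base_lr=0.001, total=50):
    """Learning rate for a given 0-based epoch: constant at base_lr for
    the first half of training, then decayed linearly toward 0."""
    half = total // 2
    if epoch < half:
        return base_lr
    return base_lr * (total - epoch) / (total - half)

lrs = [linear_decay_lr(e) for e in range(50)]
```

In a framework run this would typically be wired up as a per-epoch scheduler callback rather than called manually.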
(4b) Network iterative training. One training pass of the network corresponds to one Epoch and comprises the following sub-steps:
S4b.1 A training batch of size Batchsize is input into the change detection network based on global relationship reasoning for forward propagation, obtaining the change detection result;
S4b.2 The corresponding loss is calculated, the network weights are updated with the Adam optimizer, and step S4b.1 is repeated until the network has been trained on all training samples, obtaining the change detection network trained for one Epoch.
(4c) Network verification. The verification set is input to the trained network to obtain its change detection result, and the evaluation index F1 score (equation (6)) is calculated from the detection result and the actual sample labels. The network training and verification process is repeated until all Epochs are completed. From fig. 4, where the training loss decays rapidly and then stabilizes, and fig. 5, which plots the F1 score of this example on the validation set, the network trained at the 50th Epoch is determined to be the optimal network.
(5) Network testing. The test set is input to the optimal network, and the following evaluation indexes are calculated.
Accuracy (Acc) measures the closeness of the predicted values to the true values, i.e. the proportion of correctly predicted changed and unchanged categories; its expression is shown in equation (5): Acc = (TP + TN) / (TP + TN + FP + FN).
Wherein, TP, TN, FP and FN are respectively a true positive case, a true negative case, a false positive case and a false negative case.
The F1 score can be regarded as the harmonic mean of precision and recall; its expression is shown in equation (6): F1 = 2 · Precision · Recall / (Precision + Recall).
The Mean Intersection over Union (MIoU) is the mean, over the changed and unchanged classes, of the ratio of the intersection to the union of the predicted and true sets; its expression is shown in equation (7): MIoU = (1/2) [TP / (TP + FP + FN) + TN / (TN + FP + FN)].
The Kappa Coefficient (KC) measures the consistency between the model predictions and the actual classes. Unlike Acc, it discounts the agreement expected by chance and therefore remains informative on class-imbalanced samples; its expression is shown in equation (8): KC = (p_o − p_e) / (1 − p_e), where p_o is the observed agreement (equal to Acc) and p_e is the chance agreement.
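The four indexes above can be computed directly from the confusion-matrix counts. A minimal sketch, assuming the standard binary-class definitions (the equation images for (5)–(8) are not reproduced in the text, so the exact forms are an assumption):

```python
def metrics_from_confusion(tp, tn, fp, fn):
    """Acc, F1, MIoU and Kappa from binary confusion counts, eqs. (5)-(8)."""
    n = tp + tn + fp + fn
    acc = (tp + tn) / n                               # eq. (5)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)  # eq. (6)
    miou = 0.5 * (tp / (tp + fp + fn) + tn / (tn + fp + fn))  # eq. (7)
    # chance agreement p_e from the marginal frequencies
    pe = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    kappa = (acc - pe) / (1 - pe)                     # eq. (8)
    return acc, f1, miou, kappa
```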
(6) And (5) comparing and analyzing results.
Comparative analyses were performed with 6 prior-art techniques and the present invention; the performance of all techniques on the LEVIR test set is shown in Table 1. "The invention" denotes the remote sensing image change detection method based on global relationship reasoning; FC-EF, FC-Siam-Conc, FC-Siam-Diff and FC-EF-Res denote the 4 fully convolutional change detection networks proposed by Daudt et al.; STANet denotes the spatial-temporal attention network proposed by Chen et al.; DSAMNet denotes the deeply supervised attention metric network proposed by Shi et al.
Table 1 performance evaluation table of the present invention and the existing remote sensing image change detection model
As shown in Table 1, the invention obtains excellent detection results, and all indexes are higher than those of the other techniques, which demonstrates that the invention achieves higher change detection accuracy. In addition, fig. 6 shows the visualized results of the present invention and the prior art on one of the test samples; the invention obtains a cleaner visual effect, and the changed objects have clearer edges and better internal consistency.
Finally, it should be noted that the above examples are only intended to illustrate the technical solution of the present invention, not to limit its scope. Although the invention has been described in detail with reference to these examples, it will be apparent to those skilled in the art that several variations and modifications can be made without departing from the spirit of the invention, and such equivalent modifications fall within the scope of the invention defined by the appended claims.
Claims (3)
1. A high-resolution remote sensing image change detection method based on global relationship reasoning is characterized by comprising the following steps:
firstly, dividing a sample and preprocessing data;
1.1) dividing samples;
1.2) data preprocessing to obtain the processed t_1-temporal image and t_2-temporal image, where C, H, W denote the number of channels, the height and the width of the images;
secondly, constructing a change detection network architecture based on global relationship reasoning;
2.1) constructing an encoder: twin encoders built on the ResNet18 network respectively extract feature maps of the t_1-temporal image and the t_2-temporal image at 4 scales, obtaining the multi-scale features;
2.2) constructing the global relationship reasoning module GloRe: the multi-scale features extracted by the twin encoders for each temporal image are respectively input to 4 GloRe modules; each GloRe module is connected in series with a convolution block with a 3 × 3 kernel to output the enhanced features;
2.3) constructing a decoder for fusion of the multi-scale enhancement features and generating difference features, having the following sub-steps:
2.3.1) multi-scale enhanced feature fusion: the minimum-scale feature is gradually brought toward the upper-scale features; that is, the two smallest-scale enhanced features pass through a fusion unit to generate a fused feature of the same size as the larger of the two; this fused feature and the next-scale enhanced feature pass through a fusion unit to generate a fused feature of that scale; by analogy, a fused feature of the same size as the largest-scale enhanced feature is finally generated. Each fusion unit has the same structure; taking the fusion of two adjacent-scale features as an example, the fusion process is described as follows:
1) the smaller-scale feature is input to an upsampling layer using bilinear interpolation, then to a convolution layer with a 3 × 3 kernel, and passed through a ReLU activation layer;
2) the larger-scale feature is input to a convolution layer with a 3 × 3 kernel and passed through a ReLU activation layer;
3) the two resulting features are merged by channel concatenation and fed to a convolution block with a 1 × 1 kernel to obtain the fused feature;
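The three substeps above can be sketched as a single module; the channel sizes and the class name are assumptions for illustration, since the claim does not fix them:

```python
import torch
import torch.nn as nn

class FusionUnit(nn.Module):
    """Sketch of one fusion unit: upsample the smaller-scale feature,
    refine both inputs with 3x3 conv + ReLU, concatenate, 1x1 conv."""
    def __init__(self, small_ch, large_ch, out_ch):
        super().__init__()
        self.up = nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False)
        self.conv_small = nn.Sequential(
            nn.Conv2d(small_ch, large_ch, 3, padding=1), nn.ReLU())
        self.conv_large = nn.Sequential(
            nn.Conv2d(large_ch, large_ch, 3, padding=1), nn.ReLU())
        self.merge = nn.Conv2d(2 * large_ch, out_ch, 1)

    def forward(self, f_small, f_large):
        a = self.conv_small(self.up(f_small))    # substep 1: upsample + 3x3 conv + ReLU
        b = self.conv_large(f_large)             # substep 2: 3x3 conv + ReLU
        return self.merge(torch.cat([a, b], 1))  # substep 3: channel concat + 1x1 conv
```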
2.3.2) generating the difference feature: the fused features of the two temporal branches are differenced, and the difference is input to a convolution block with a 3 × 3 kernel to obtain the difference feature F_diff;
2.4) constructing the change detection head, formed by connecting upsampling layers with bilinear interpolation and convolution blocks in series, for generating the change detection map, comprising the following substeps:
2.4.1) the difference feature F_diff is upsampled and passed through a convolution block to obtain the updated feature F_diff(1);
2.4.2) the feature F_diff(1) is upsampled and passed through a convolution block to obtain the feature F_diff(2);
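A minimal sketch of substeps 2.4.1)–2.4.2): two (bilinear upsample + convolution block) stages. The channel halving and the final sigmoid are assumptions, since the claim does not specify the widths or the output activation:

```python
import torch
import torch.nn as nn

class ChangeHead(nn.Module):
    """Sketch of the change detection head: two upsample + conv stages
    producing a single-channel change map."""
    def __init__(self, in_ch):
        super().__init__()
        self.stage1 = nn.Sequential(                 # 2.4.1): F_diff -> F_diff(1)
            nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False),
            nn.Conv2d(in_ch, in_ch // 2, 3, padding=1), nn.ReLU())
        self.stage2 = nn.Sequential(                 # 2.4.2): F_diff(1) -> F_diff(2)
            nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False),
            nn.Conv2d(in_ch // 2, 1, 3, padding=1))

    def forward(self, f_diff):
        return torch.sigmoid(self.stage2(self.stage1(f_diff)))
```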
Thirdly, constructing a loss function;
constructing the loss function as the combination of the Dice Loss and the binary cross-entropy loss (BCE Loss);
fourthly, network training and verification;
4.1) basic setting, including initializing the parameters required by the network training process, setting the learning-rate update strategy, updating the weights of the change detection network with the adaptive moment estimation optimizer Adam, and setting its first-moment decay coefficient β_1 and second-moment decay coefficient β_2;
4.2) network training; the network one-time training process corresponds to one Epoch and comprises the following sub-steps:
4.2.1) inputting a training sample with the size of Batchsize into a change detection network based on global relationship reasoning to obtain a change detection result;
4.2.2) calculating the Dice Loss and BCE Loss, passing their sum to the Adam optimizer, updating the weights of the change detection network, and repeating step 4.2.1) until the network has been trained on all training samples, obtaining the change detection network trained for one Epoch;
4.3) network verification: inputting the verification set to the trained network to obtain its change detection result, and calculating the evaluation index of the network from the detection result and the actual sample labels;
fifthly, repeating the network training and verifying process;
completing all Epochs of network training, and selecting the network with the optimal verification result as the final network.
2. The method for detecting changes in high-resolution remote sensing images based on global relationship reasoning according to claim 1, wherein in step 1.1) the sample division is specifically: a pair of high-resolution remote sensing images covering the same earth surface at different times t_1 and t_2, together with the corresponding actual label, is taken as one sample, and the collected samples are divided into a training set, a verification set and a test set.
3. The method for detecting the change of the high-resolution remote sensing image based on the global relationship reasoning according to claim 1, wherein in the step 1.2), the data preprocessing process comprises image cutting and standardization.
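The image cutting and standardization named in claim 3 could be sketched as follows; per-channel statistics and non-overlapping tiles are assumptions, since the claim does not specify them:

```python
import numpy as np

def standardize(image):
    """Per-channel standardization of a (C, H, W) image: zero mean, unit
    variance per channel (assumed interpretation of 'standardization')."""
    mean = image.mean(axis=(1, 2), keepdims=True)
    std = image.std(axis=(1, 2), keepdims=True) + 1e-8
    return (image - mean) / std

def crop_tiles(image, size):
    """Cut a (C, H, W) image into non-overlapping size x size tiles."""
    c, h, w = image.shape
    return [image[:, i:i + size, j:j + size]
            for i in range(0, h - size + 1, size)
            for j in range(0, w - size + 1, size)]
```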
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210622122.9A CN114913434B (en) | 2022-06-02 | 2022-06-02 | High-resolution remote sensing image change detection method based on global relation reasoning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114913434A true CN114913434A (en) | 2022-08-16 |
CN114913434B CN114913434B (en) | 2024-06-11 |
Family
ID=82771619
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210622122.9A Active CN114913434B (en) | 2022-06-02 | 2022-06-02 | High-resolution remote sensing image change detection method based on global relation reasoning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114913434B (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109063569A (en) * | 2018-07-04 | 2018-12-21 | 北京航空航天大学 | A kind of semantic class change detecting method based on remote sensing image |
AU2020103901A4 (en) * | 2020-12-04 | 2021-02-11 | Chongqing Normal University | Image Semantic Segmentation Method Based on Deep Full Convolutional Network and Conditional Random Field |
CN113420662A (en) * | 2021-06-23 | 2021-09-21 | 西安电子科技大学 | Remote sensing image change detection method based on twin multi-scale difference feature fusion |
CN113706482A (en) * | 2021-08-16 | 2021-11-26 | 武汉大学 | High-resolution remote sensing image change detection method |
CN113936217A (en) * | 2021-10-25 | 2022-01-14 | 华中师范大学 | Priori semantic knowledge guided high-resolution remote sensing image weakly supervised building change detection method |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115526886A (en) * | 2022-10-26 | 2022-12-27 | 中国铁路设计集团有限公司 | Optical satellite image pixel level change detection method based on multi-scale feature fusion |
CN115526886B (en) * | 2022-10-26 | 2023-05-26 | 中国铁路设计集团有限公司 | Optical satellite image pixel level change detection method based on multi-scale feature fusion |
CN117319610A (en) * | 2023-11-28 | 2023-12-29 | 松立控股集团股份有限公司 | Smart city road monitoring method based on high-order panoramic camera region enhancement |
CN117319610B (en) * | 2023-11-28 | 2024-01-30 | 松立控股集团股份有限公司 | Smart city road monitoring method based on high-order panoramic camera region enhancement |
Also Published As
Publication number | Publication date |
---|---|
CN114913434B (en) | 2024-06-11 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||