CN113222016B

CN113222016B - Change detection method and device based on cross enhancement of high-level and low-level features

Info

Publication number: CN113222016B
Application number: CN202110519208.4A
Authority: CN
Inventors: 黄睿; 邢艳; 鲁欢
Original assignee: Civil Aviation University of China
Current assignee: Civil Aviation University of China
Priority date: 2021-05-12
Filing date: 2021-05-12
Publication date: 2022-07-12
Anticipated expiration: 2041-05-12
Also published as: CN113222016A

Abstract

The invention discloses a change detection method and a device based on cross enhancement of high-level and low-level features, wherein the method comprises the following steps: and repeatedly optimizing the cross coding characteristics through a cross enhancement module, namely: multiplying the obtained low-level change characteristics and the high-level change characteristics element by element to obtain low-level characteristics with better representation capability; element-by-element multiplying the processed high-level change features with the low-level change features to update the high-level features; repeating the above operations to improve the representational capacity of the features; calculating loss and summing the change detection prediction results of the cross coding layer in the training process, calculating loss and summing the output of the high-layer and low-layer change characteristics at different stages in the cross enhancement process, and sequentially adding the loss and the loss of the final change detection result to obtain final loss; training is carried out based on a Pythrch deep learning network framework, and change detection is carried out based on a trained model. The invention obtains accurate change results by fusing multi-layer prediction graphs.

Description

Change detection method and device based on cross enhancement of high-level and low-level features

Technical Field

The invention relates to the field of change detection, in particular to a change detection method and device based on cross enhancement of high-level and low-level features.

Background

The change detection of images is an important research subject in the field of computer vision, and the main task of the change detection is to process images shot at different moments in the same scene so as to detect a change area of the captured images in two observations, so that the change detection has wide application in the fields of resource monitoring, anomaly detection, video monitoring, automatic driving and the like.

The traditional change detection method adopts manual characteristics and an optimization algorithm which are artificially designed. The early change detection method most commonly uses an image difference method. Although the detection method based on the image aberration value change is simple and intuitive, a proper threshold value needs to be selected to divide the changed area and the unchanged area. In addition, some complex features such as: methods such as gradient, Change Vector Analysis (CVA) are also introduced in the field of change detection. The document [1] proves that the change vector analysis method has potential practicability in multispectral monitoring of land coverage and land conditions. In order to improve the robustness of the detection result, more complex models are introduced into the change detection. For example, document [2] proposes to use a markov data fusion method to combine baud features and spatial context to generate precise variations; document [3] models change detection as a reconstruction problem with an iterative coupled dictionary learning model; document [4] detects changes in an image by reconstructing errors using a pixel-invariant joint dictionary. In order to overcome the influence of illumination and camera pose and obtain a better change detection result, document [5] proposes image alignment, illumination correction and low-rank change detection-based joint optimization of a scale from small to large. It is also possible to use image difference methods after image alignment and correction of the illumination, as in document [6 ]. Although the traditional change detection method is simple and intuitive to use, the detection result is greatly interfered by factors such as illumination, camera pose and the like in the application of a real scene.

In recent years, with the rapid development of deep learning, Convolutional Neural Networks (CNNs) have enjoyed great success in the field of computer vision. At present, different deep convolutional neural network system structures are designed in many change detection methods to overcome the camera pose difference and the illumination interference, so that the detection result has better robustness. For example, document [7] designs a small network with only two convolutional layers and one fully-connected layer to detect whether a 28 × 28 image pair has changed. After training, a full resolution prediction can be generated using a sliding window. Document [8] processes aerial images of buildings before and after tsunami by a convolutional neural network to determine damage to the buildings. However, the above methods are only different from the model input, but the basic network structure is realized by stacking a plurality of convolutional layers. Besides Convolutional Neural Networks (CNN), some other methods are also applied to change detection, such as: a countermeasure network (GAN), a cyclic convolutional neural network (RCNN), and a Long Short-Term Memory network (LSTM). In fact, most of the above-mentioned change detection methods based on deep learning merely introduce differences in the network structure.

However, how to design an efficient change detection network is still an open problem. The method introduces the idea of characteristic cross enhancement into change detection, ensures better description of image change, has better robustness on illumination, camera pose difference and seasonal change, and obtains a prediction result closer to real change.

Reference to the literature

[1]R.D.Johnson,E.Kasischke,Change vector analysis:A technique for the multispectral monitoring of land cover and condition,International Journal of Remote Sensing 19(3)(1998)411–426.

[2]G.Moser,E.Angiati,S.B.Serpico,Multiscale unsupervised change detection on optical images by markov random fifields and wavelets,IEEE Geoscience and Remote Sensing Letters 8(4)(2011)725–729.

[3]M.Gong,P.Zhang,L.Su,J.Liu,Coupled dictionary learning for change detection from multisource data,IEEE Transactions on Geoscience and Remote sensing 54(12)(2016)7077–7091.

[4]X.Lu,Y.Yuan,X.Zheng,Joint dictionary learning for multispectral change detection,IEEE transactions on cybernetics 47(4)(2017)884–897.

[5]W.Feng,F.-P.Tian,Q.Zhang,N.Zhang,L.Wan,J.Sun,Fine-grained change detection of misaligned scenes with varied illuminations,in:Proceedings of the IEEE International Conference on Computer Vision,2015,pp.1260–1268.

[6]S.Stent,R.Gherardi,B.Stenger,R.Cipolla,Precise deterministic change detection for smooth surfaces,in:2016IEEE Winter Conference on Applications of Computer Vision,IEEE,2016,pp.1–9.

[7]A.Ding,Q.Zhang,X.Zhou,B.Dai,Automatic recognition of landslide based on cnn and texture change detection,in:2016 31st Youth Academic Annual Conference of Chinese Association of Automation(YAC),IEEE,2016,pp.444–448.

[8]A.Fujita,K.Sakurada,T.Imaizumi,R.Ito,S.Hikosaka,R.Nakamura,Damage detection from aerial images via convolutional neural networks,in:2017 Fifteenth IAPR International Conference on Machine Vision Applications,IEEE,2017,pp.5–8.

Disclosure of Invention

The invention has provided a change detection method and apparatus based on characteristic cross enhancement of high level and low level, the invention is based on VGG (visual geometry group)16 twin network to withdraw the convolution characteristic of the reference (reference) picture and inquiry (query) picture separately, use inner coding and cross coding technology to get the multi-scale difference characteristic between two pictures sequentially, and divide it into high level characteristic and low level characteristic, optimize high level characteristic and low level characteristic repeatedly through the method of cross enhancement; and adopting a multilayer supervision mechanism at different layers, fusing different layers in cross coding layer by layer and prediction results of high-layer and low-layer characteristics after cross enhancement, and training the network by using a multilayer loss function, which is described in detail in the following description:

in a first aspect, a change detection method based on cross enhancement of high-level and low-level features, the method comprising:

and repeatedly optimizing the cross coding characteristics through a cross enhancement module, namely: multiplying the obtained low-level change characteristics and the high-level change characteristics element by element to obtain low-level characteristics with better representation capability; element-by-element multiplying the processed high-level change features with the low-level change features to update the high-level features; repeating the above operations to improve the representational capacity of the features;

calculating loss and summing the change detection prediction results of the cross coding layer in the training process, calculating loss and summing the output of the high-layer and low-layer change characteristics at different stages in the cross enhancement process, and sequentially adding the loss and the loss of the final change detection result to obtain final loss;

training is carried out based on a Pythrch deep learning network framework, and change detection is carried out based on a trained model.

Wherein the method further comprises:

and extracting the convolution characteristics output by the convolution layer module from the reference image and the query image respectively through a twin convolution neural network, and carrying out internal coding and cross coding operation on the convolution characteristics to obtain cross coding characteristics.

In a second aspect, a change detection apparatus based on high-level and low-level feature cross enhancement, the apparatus comprising: a processor and a memory, the memory having stored therein program instructions, the processor calling the program instructions stored in the memory to cause the apparatus to perform the method steps of the first aspect.

In a third aspect, a computer readable storage medium stores a computer program comprising program instructions which, when executed by a processor, cause the processor to perform the method steps of the first aspect.

The technical scheme provided by the invention has the beneficial effects that:

1. the twin convolutional neural network model designed by the invention is not a simple extension of the existing semantic segmentation method any more, and the image difference characteristics obtained by using the network are used for detecting changes, so that the twin convolutional neural network model has excellent detection performance on data sets such as PCD (panoramic change detection), VL-CMU-CD (visual positioning for change detection at the university of Chimerron in the card), CDnet (change detection network) and the like;

2. the invention designs a cross enhancement module, and repeatedly updates and optimizes the characteristics of the high layer and the low layer by using a cross enhancement method to obtain the multilayer characteristics which can accurately represent the change information; the difference between the positions of illumination and a camera is overcome, and a change detection result with better robustness is obtained;

3. the invention adopts a multi-layer supervision mode, so that the network layer finds the change characteristics of the image, and obtains an accurate change result by fusing the multi-layer prediction images.

Drawings

FIG. 1 is a schematic diagram of a change detection network based on cross enhancement of high-level and low-level features proposed by the present invention;

FIG. 2 is a schematic diagram of the detection results of the proposed method and other methods on a common data set PCD;

fig. 3 is a schematic diagram of the detection result of the proposed method and other methods on the common data set VL _ CMU _ CD according to the present invention;

FIG. 4 is a schematic diagram of the detection results of the method and other methods proposed in the present invention on the common data set CDnet;

fig. 5 is a schematic structural diagram of a change detection device based on cross enhancement of high-level and low-level features.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention are described in further detail below.

The method provides a change detection method based on cross enhancement of high-level and low-level features aiming at the problem of image change area detection, and obtains change features with better robustness by repeatedly cross enhancing the high-level features and the low-level features.

Example 1

A change detection method based on cross enhancement of high-level and low-level features, see fig. 1, the method comprising the steps of:

one, construct twin convolution neural network

Referring to fig. 1, the twin network for feature extraction in the method is a Unet network with a VGG16 network as a basic structure, and the invention deletes the last two fully-connected layers of the VGG16 and uses only the first five convolutional layer modules. And extracting convolution characteristics output by 5 convolution layer modules from the reference image X and the query image Y respectively by using the network.

The twin convolutional neural network mentioned in the embodiment of the present invention is formed by using independent feature extraction networks on two images.

Wherein, VGG16 network structure mainly includes: the network structures of the 5 convolutional layer modules Conv1-Conv5, the two fully connected layer modules FC6 and FC7, and the VGG16 are well known to those skilled in the art, and no further description is given in the embodiments of the present invention.

Secondly, performing inner coding and cross coding operation on the convolution characteristics of the reference image X and the query image Y respectively

Wherein, the inner coding module includes: and the first convolution unit performs up-sampling twice on the convolution characteristics obtained before and then performs convolution, batch normalization and ReLU (linear rectification function) operation.

In the specific implementation, the convolution, batch normalization, and ReLU operations are all well known to those skilled in the art, and are not described in detail in the embodiments of the present invention.

To reduce the amount of computation, the number of input/output channels of the convolutional layer in the second convolutional unit is reduced by half. And recovering the output channel in the last convolution unit to be the preset channel number.

Through the processing, the inner coding module can extract more robust semantic features, so that interference caused by illumination change and camera pose difference is avoided.

In one embodiment, the intra-coding module is formulated as follows:

wherein,

convolution characteristics of the ith convolution layer module being X or Y;

the convolution characteristic output by the ith convolution layer module of X or Y is the inner coding characteristic generated by the inner coding module; cat (-) is a splicing operation, will

And

splicing according to the channel; φ (-) is an inner encoding operation, comprising: convolution, batch normalization and upsampling operations.

It should be noted that

Inner coding feature of

The inner coding features of the images before and after the change are processed by a cross coding module to obtain the absolute difference features of the images, and the change of each feature layer can be accurately reflected. X and Y are generated after internal coding operationAnd performing difference on the 5 inner coding features to obtain 5 feature difference graphs of different levels. And combining the feature difference graph of each layer with the feature difference graph of the higher layer to obtain the cross coding feature with higher effectiveness.

The cross-coding operation formula is as follows:

wherein,

represents the cross-coding characteristics of the ith layer;

represents the computation process of the cross-coding module, and abs (·) represents the computation of the absolute difference between two features. It should be noted that the cross-coding feature of layer 5 is

The cross coding module also includes 3 convolution units, and the operation of each convolution unit is the same as that of the inner coding module, which is not described herein again in the embodiments of the present invention.

Thirdly, 5 cross coding characteristics are repeatedly optimized through a cross enhancement module

From the cross-coding module to 5 cross-coding features

And performing splicing operation and inputting the spliced operation into a convolution unit to reduce the number of channels to 128, so as to obtain low-level variation characteristics (LF).

The stitching operation is also performed to compress the number of channels to 128 by a convolution unit, resulting in high level variance features (HF).

Wherein, HF (F)^HF) And LF (F)^LF) The calculation formulas of (a) are respectively as follows:

where Conv (·,128) denotes a convolution unit composed of a convolution layer including 128 convolution kernels, a batch normalization layer, and a ReLU layer. Bi (-) represents a bilinear interpolation operation, which can change the resolution of the feature.

The element-by-element multiplication of the LF and HF features obtained in the method can obtain lower-layer features with better characterization capability. At the same time, the processed HF is also multiplied by the LF signature to update the higher layer signature.

And repeatedly using the previous step, so that the finally obtained result contains high-level semantic information and low-level texture information, and simultaneously, the influence of noise in a single high-level feature or low-level feature can be avoided. The formula for updating the HF and LF characteristics is as follows:

wherein, Bi (-) represents bilinear interpolation operation, which can change the resolution of the feature. As an element multiplication operation,

for the high level features obtained at the t-th iteration of the cross-enhancement operation,

and iterating the t-th time for cross enhancement operation to obtain the low-level features.

In addition, the operation of cross enhancement may be repeatedly used for a plurality of times to improve the representation capability of the features, and the specific execution times is not limited in this embodiment of the present invention.

Fourthly, prediction of change results

Multiscale change probability map P obtained from multiscale feature difference map at feature extraction stage₁-P₅This can be obtained from the following equation:

conv (. cndot., 2) is a convolutional layer that can generate two images, one of which is a probability map of change and the other of which is a probability map of no change.

Compared with the method of generating the prediction probability graph only at the network end, the prediction of the cross coding features is added, and the prediction value closer to the real change can be obtained. Prediction of LF and HF generation using crossover enhancement operation at the t-th time

And

this can be obtained from the following equation:

in order to obtain a more robust variation prediction result, P is required_iAnd

the concatenation is convolved. The final change detection result can be calculated by the following formula:

wherein T is the iteration number of the cross enhancement operation,

the predicted result generated for the Tth time.

It should be noted that since the resolution of the different predictors is different, all of the predictors are bi-linearly upsampled and sized to align with the reference image X. Final prediction result is represented by P_fThe first channel of (a).

Five change detection prediction results P of cross coding layer in feature extraction network are respectively detected in training process₁-P₅Calculating loss and summing, calculating loss and summing output of HF and LF characteristics at different stages in the cross enhancement process, and finally detecting change_fAre added in sequence to obtain the final loss

The final loss is composed of 4 parts in total, and the formula is as follows:

wherein,

in turn represent P_i、

And P_fIs lost.

Network training and testing

Based on the Pythrch deep learning network framework, the networks proposed in the first step to the fourth step are trained, and a trained network model can be obtained on a corresponding data set. And inputting the query image X and the reference image Y by using the network model, generating a change detection result after network calculation, and ending the process.

In summary, in the embodiments of the present invention, the depth features of the reference image and the query image are respectively extracted through the twin network, the multi-scale information of the images is fused through the inner coding operation, the multi-scale difference features between the reference image and the query image are extracted through the cross coding operation, and the change difference features of the current layer except the highest layer are fused with the change difference features of the previous layer. And constructing a cross enhancement module, and repeatedly optimizing the high-level features and the low-level features. And (4) training the network by using a plurality of layers of loss functions to obtain a change detection result which has accurate change information and can overcome the difference between the lighting and camera poses.

Example 2

The scheme of example 1 is further described below with reference to fig. 1, which is a specific example, and is described in detail below:

embodiments of the present invention employ an iterative strategy to repeatedly update the high-level and low-level features. Enhancing the high-level features by using the low-level features so that the high-level features obtain more variation details; the high-level features are used for enhancing the low-level features, so that the low-level features have better robustness to illumination, camera pose difference and seasonal variation. And obtaining a change detection result closer to the real value through repeated cross enhancement between the high-level features and the low-level features.

Specifically, the resulting cross-coded features are separated into high-level and low-level features according to their respective original feature extraction levels. Varying degrees may be described by high-level and low-level variation features. The high-level feature space resolution is small, but the high-level feature space resolution has high semantic abstraction, and unreal changes caused by illumination and camera pose differences can be effectively solved. And the spatial resolution of the low-level features is higher, the image details are rich, and the clearly changed boundary can be displayed. And the best results of the variation are not obtained by using only one of them. According to the method, the cross feature enhancement module can effectively improve the characterization capability of the high-level and low-level features.

In summary, the embodiment of the present invention introduces the cross enhancement method into the change detection, so as to ensure better description of the image change, and at the same time, have better robustness to the illumination, the camera pose difference and the seasonal change, and obtain the prediction result closer to the real change.

Example 3

The feasibility verification of the solutions of examples 1 and 2 is carried out below with reference to fig. 2-4, which are described in detail below:

the network base learning rate in the embodiment of the present invention is set to 1e-3 based on the network structure shown in fig. 1. The number of data samples grabbed by one training is set to be 6, and the network parameters are updated by using an Adam algorithm with a momentum parameter of 0.9 and a weight attenuation of 0.999. And the reference data set is detected in three changes of PCD, VL-CMU-CD and CDnet to verify the method.

As can be seen from fig. 2, 3, and 4, embodiments of the present invention and ADCD (known to those skilled in the art) networks can detect more subtle changes, such as branches and poles, than other methods. In the 4 th image of fig. 2, the detection result of the method is more accurate than that of the ADCD network, and excellent results are obtained on different data sets.

Example 4

Based on the same inventive concept, an embodiment of the present invention further provides a change detection apparatus based on cross enhancement of high-level and low-level features, and referring to fig. 5, the apparatus includes: a processor 1 and a memory 2, the memory 2 having stored therein program instructions, the processor 1 calling the program instructions stored in the memory 2 to cause the apparatus to perform the following method steps in an embodiment:

extracting convolution characteristics output by the convolution layer module from the reference image and the query image respectively through a twin convolution neural network, and carrying out internal coding and cross coding operation on the convolution characteristics to obtain cross coding characteristics;

The final loss is composed of 4 parts in total, and the formula is as follows:

wherein,

in turn represent P_i、

And P_fIs lost.

It should be noted that the device description in the above embodiments corresponds to the method description in the embodiments, and the embodiments of the present invention are not described herein again.

The execution main bodies of the processor 1 and the memory 2 may be devices having a calculation function, such as a computer, a single chip, a microcontroller, and the like, and in the specific implementation, the execution main bodies are not limited in the embodiment of the present invention, and are selected according to requirements in practical applications.

The memory 2 and the processor 1 transmit data signals through the bus 3, which is not described in detail in the embodiment of the present invention.

Example 5

Based on the same inventive concept, an embodiment of the present invention further provides a computer-readable storage medium, where the storage medium includes a stored program, and when the program runs, the apparatus on which the storage medium is located is controlled to execute the method steps in the foregoing embodiments.

The computer readable storage medium includes, but is not limited to, flash memory, hard disk, solid state disk, and the like.

It should be noted that the descriptions of the readable storage medium in the above embodiments correspond to the descriptions of the method in the embodiments, and the descriptions of the embodiments of the present invention are not repeated here.

In the above embodiments, the implementation may be wholly or partially realized by software, hardware, firmware, or any combination thereof. When implemented in software, may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. The procedures or functions according to the embodiments of the invention are brought about in whole or in part when the computer program instructions are loaded and executed on a computer.

The computer may be a general purpose computer, a special purpose computer, a network of computers, or other programmable device. The computer instructions may be stored on or transmitted over a computer-readable storage medium. The computer-readable storage medium can be any available medium that can be accessed by a computer or a data storage device, such as a server, a data center, etc., that incorporates one or more of the available media. The usable medium may be a magnetic medium or a semiconductor medium, etc.

In the embodiment of the present invention, except for the specific description of the model of each device, the model of other devices is not limited, as long as the device can perform the above functions.

Those skilled in the art will appreciate that the drawings are only schematic illustrations of preferred embodiments, and the above-mentioned serial numbers of the embodiments of the present invention are only for description and do not represent the merits of the embodiments.

The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims

1. A change detection method based on cross enhancement of high-level and low-level features, the method comprising:

extracting convolution characteristics output by the convolution layer module from the reference image and the query image respectively through a twin convolution neural network, and performing inner coding and cross coding operation on the convolution characteristics to obtain cross coding characteristics;

among 5 cross-coding features obtained by the cross-coding module, the feature

Splicing operation is carried out and input into a convolution unit, the number of channels is reduced to 128, low-level change characteristics LF are obtained,

similarly, splicing operation is carried out, the number of channels is compressed into 128 through a convolution unit, and high-level change characteristic HF is obtained;

training based on a Pythrch deep learning network frame, and performing change detection based on a trained model;

the representative ability of repeating the above operations to improve the features is:

wherein Bi (-) indicates a bilinear interpolation operation capable of changing the resolution of the feature, an element multiplication operation, F_t ^HFIterating the t-th time for cross enhancement operations to obtain high level features, F_t ^LFIterating the t-th time for cross enhancement operation to obtain low-level features; f^HFA high-level change feature; f^LFLow-level variation features; conv (·,128) denotes a convolution unit consisting of a convolution layer containing 128 convolution kernels, a batch normalization layer and a ReLU layer.

2. The method of claim 1, wherein the intra-coding operation is:

wherein,

convolution characteristics of the ith convolution layer module being either X or Y;

And

3. The method of claim 2, wherein the cross-coding operation is to:

wherein,

represents the cross-coding characteristics of the ith layer;

represents the computation process of the cross-coding module, abs (·) represents the computation of the absolute difference between two features,

the convolution characteristic output by the ith convolution layer module of X is the inner coding characteristic generated by the inner coding module;

the convolution characteristic output by the ith convolution layer module of Y is the inner coding characteristic generated after the inner coding module.

4. The method according to claim 1, wherein the final change detection result is:

wherein T is the iteration number of the cross enhancement operation,

for the predicted result of the Tth generation, P₁-P₅Is a multiscale change probability map.

5. The method of claim 4, wherein the final loss is composed of 4 parts, and the formula is as follows:

wherein,

sequentially representing a multiscale change probability map P_iPredicted result P of LF and HF generation when using cross enhancement operation for t time_t ^LFAnd P_t ^HFFinal change detection result P_fIs lost.

6. A change detection apparatus based on cross enhancement of high-level and low-level features, the apparatus comprising: a processor and a memory, the memory having stored therein program instructions, the processor calling upon the program instructions stored in the memory to cause the apparatus to perform the method steps of any of claims 1-5.

7. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program comprising program instructions which, when executed by a processor, cause the processor to carry out the method steps of any of claims 1-5.