CN118917997A - Remote sensing image cross-domain simulation translation method and device based on contrast learning - Google Patents
- Publication number
- CN118917997A (application number CN202411026449.5)
- Authority
- CN
- China
- Prior art keywords
- image
- source
- target
- loss function
- translated
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Image Analysis (AREA)
Abstract
The application discloses a remote sensing image cross-domain simulation translation method, device, medium and equipment based on contrast learning. The method comprises the following steps: acquiring a source image to be translated and a target image to be translated; inputting the source image to be translated and the target image to be translated into an image translation model, wherein the image translation model comprises a generator and a discriminator, and the generator comprises an encoder and a decoder; inputting the source image to be translated and the target image to be translated into the encoder to respectively obtain a source coding feature and a target coding feature; inputting the source coding feature and the target coding feature into the decoder to respectively obtain a source pseudo-sample image and a target pseudo-sample image; and inputting the source pseudo-sample image into the discriminator for discrimination. The image translation model is trained based on a content contrast loss function, a style contrast loss function and an adversarial loss function constructed from the source pseudo-sample image and the target pseudo-sample image. The application can avoid the problem of image detail distortion caused by excessive stylization of the image.
Description
Technical Field
The application relates to the technical field of remote sensing image processing, in particular to a remote sensing image cross-domain simulation translation method and device based on contrast learning, a storage medium and electronic equipment.
Background
Deep learning is increasingly used in the field of aerial remote sensing image interpretation, where it has achieved remarkable research results through large-scale datasets and complex network structures. However, in practical applications, remote sensing images are affected by various factors such as sensor type, band difference, geographical location, and seasonal variation, and exhibit significant heterogeneity. This heterogeneity causes the same object to appear differently under different spectral conditions, and different materials to appear similar under the same spectrum, which presents a significant challenge to models that rely on learning from data.
The performance of a trained model tends to degrade significantly when the spectral distribution of the target image changes. Especially when there is a significant change in the band or sensor, it is often necessary to retrain the model to accommodate the new data distribution. However, trainable datasets matching the target domain are extremely scarce, and the problem is more pronounced when dealing with small areas or specific application scenarios. Furthermore, while existing public labeled datasets provide valuable resources for aerial image interpretation, their utilization and exploration are not yet comprehensive or thorough. Existing methods tend to focus only on local features or specific tasks of individual datasets, while ignoring the relevance and complementarity between datasets. This limits further improvement of the model in terms of cross-domain adaptation and generalization capability.
In the field of image processing, traditional methods based on matching low-order statistics, such as histogram matching and MKL, are applied to image color alignment, but they mainly focus on matching at the color level and neglect the semantic logic of the image. With the continued advancement of computer vision technology, many style transfer and image translation methods have emerged in recent years, which aim to convert an input image into an output image of another style while preserving the content and semantics of the input image as much as possible. Among them, mainstream methods such as CycleGAN and its variants constrain image style and preserve content by means of an adversarial loss and a cycle-consistency loss, respectively. However, when these methods are applied to remote sensing images, it is often difficult to maintain semantic consistency, and excessive stylization may even occur, resulting in distortion of image details.
Disclosure of Invention
The embodiment of the application provides a remote sensing image cross-domain simulation translation method and device based on contrast learning, a storage medium and electronic equipment, which can avoid the problem of image detail distortion caused by excessive stylization of an image.
The embodiment of the application provides a remote sensing image cross-domain simulation translation method based on contrast learning, which comprises the following steps:
acquiring a source image to be translated in a source image domain and a target image to be translated in a target image domain;
Inputting the source to-be-translated image and the target to-be-translated image into an image translation model; wherein the image translation model comprises a generator and a discriminator, the generator comprising an encoder and a decoder;
Wherein the inputting the source to-be-translated image and the target to-be-translated image into an image translation model comprises:
inputting the source to-be-translated image and the target to-be-translated image into the encoder to respectively obtain source coding features and target coding features;
inputting the source coding feature and the target coding feature into the decoder to respectively obtain a source pseudo-sample image and a target pseudo-sample image;
inputting the source pseudo sample image into a discriminator for discrimination to obtain discrimination results;
The image translation model is trained based on a content contrast loss function, a style contrast loss function and an adversarial loss function constructed based on the source pseudo-sample image and the target pseudo-sample image.
Further, the remote sensing image cross-domain simulation translation method based on contrast learning, wherein the image translation model further comprises a content projection network, and the method further comprises:
inputting the source coding features into the content projection network to obtain feature vectors of the source coding features;
inputting the source pseudo-sample image into the content projection network to obtain a first query vector of the source pseudo-sample image;
matching the feature vectors of the source to-be-translated image and the source pseudo-sample image at the same position, taking the matched source to-be-translated image as a first positive sample, and taking the source to-be-translated images corresponding to the feature vectors of other positions in the source to-be-translated image as a first negative sample;
calculating a content contrast loss function based on the first positive sample, the first negative sample, and the first query vector;
the encoder is trained based on the content contrast loss function to constrain content consistency of the source to-be-translated image and the source pseudo-sample image.
Further, the remote sensing image cross-domain simulation translation method based on contrast learning, wherein the image translation model further comprises a style projection network, and the method further comprises:
Inputting the target coding features into the style projection network to obtain feature vectors of the target coding features;
Inputting the source pseudo-sample image into the style projection network to obtain a second query vector of the source pseudo-sample image;
matching the feature vectors of the target to-be-translated image and the source pseudo-sample image at the same position, taking the matched target to-be-translated image as a second positive sample, and taking the source to-be-translated image corresponding to the feature vectors of other positions in the source to-be-translated image as a second negative sample;
Calculating a style contrast loss function based on the second positive sample, the second negative sample, and the second query vector;
The encoder is trained based on the style contrast loss function to constrain style consistency of the target to-be-translated image and the source pseudo-sample image.
Further, the remote sensing image cross-domain simulation translation method based on contrast learning, wherein the method further comprises the following steps:
Constructing a semantic segmentation model;
and inputting the new stylized image output by the image translation model into the trained semantic segmentation model to obtain a semantic segmentation result.
Further, the remote sensing image cross-domain simulation translation method based on contrast learning, wherein the joint training process of the semantic segmentation model and the image translation model comprises the following steps:
If the images in the source image domain for training have corresponding training labels, calculating a first loss function based on the training labels corresponding to the images in the source image domain and the new stylized image, and performing iterative training based on the first loss function to obtain a trained semantic segmentation model;
If the images in the target image domain for training have corresponding training labels, calculating a second loss function based on the adversarial loss function, the content contrast loss function, the style contrast loss function, the identity consistency loss function and the semantic loss function, and performing iterative training based on the second loss function to obtain a trained semantic segmentation model;
And if neither the images in the source image domain for training nor the images in the target image domain for training have corresponding training labels, removing the semantic segmentation model, calculating a third loss function according to the adversarial loss function, the content contrast loss function and the style contrast loss function, and performing iterative training based on the third loss function to obtain a trained image translation model.
Further, the remote sensing image cross-domain simulation translation method based on contrast learning, wherein the calculating content contrast loss function based on the first positive sample, the first negative sample and the first query vector comprises:
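Calculating a content contrast loss function by a first formula:
$$L_{\mathrm{content}} = \sum_{l}\sum_{s} \ell\left(v_{l}^{s},\ v_{l}^{s+},\ v_{l}^{s-}\right)$$
wherein $v$ is the first query vector, $v^{+}$ is the first positive sample, $v^{-}$ is the first negative sample, $\ell$ is a cross entropy loss function, and the sums run over the selected layers $l$ and the query positions $s$.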
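Further, the remote sensing image cross-domain simulation translation method based on contrast learning, wherein the cross entropy loss function is calculated by a second formula, and the second formula is as follows:
$$\ell\left(v, v^{+}, v^{-}\right) = -\log\left[\frac{\exp\left(v \cdot v^{+}/\tau\right)}{\exp\left(v \cdot v^{+}/\tau\right) + \sum_{n=1}^{N}\exp\left(v \cdot v_{n}^{-}/\tau\right)}\right]$$
wherein $v$ is the first query vector, $v^{+}$ is a positive example vector, $v_{n}^{-}$ is the $n$-th negative example vector, and $\tau$ is a temperature coefficient.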
The embodiment of the application also provides a remote sensing image cross-domain simulation translation device based on contrast learning, which comprises:
the acquisition module is used for acquiring a source image to be translated in a source image domain and a target image to be translated in a target image domain;
A translation module for inputting the source to-be-translated image and the target to-be-translated image into an image translation model; wherein the image translation model comprises a generator and a discriminator, the generator comprising an encoder and a decoder;
Wherein the inputting the source to-be-translated image and the target to-be-translated image into an image translation model comprises:
inputting the source to-be-translated image and the target to-be-translated image into the encoder to respectively obtain source coding features and target coding features;
inputting the source coding feature and the target coding feature into the decoder to respectively obtain a source pseudo-sample image and a target pseudo-sample image;
inputting the source pseudo-sample image and the target pseudo-sample image into a discriminator for discrimination to obtain a first discrimination result and a second discrimination result respectively;
The image translation model is trained based on a content contrast loss function, a style contrast loss function and an adversarial loss function constructed based on the source pseudo-sample image and the target pseudo-sample image.
The embodiment of the application also provides a computer readable storage medium, wherein a plurality of instructions are stored in the computer readable storage medium, and the instructions are suitable for being loaded by a processor to execute the remote sensing image cross-domain simulation translation method based on contrast learning.
The embodiment of the application also provides electronic equipment, which comprises a processor and a memory, wherein the processor is electrically connected with the memory, the memory is used for storing instructions and data, and the processor is configured to perform the steps of the remote sensing image cross-domain simulation translation method based on contrast learning.
According to the remote sensing image cross-domain simulation translation method, device, storage medium and electronic equipment based on contrast learning, the image translation model is trained with the content contrast loss function and the style contrast loss function, so that image detail distortion caused by excessive stylization of an image can be avoided and the style characteristics of the target domain can be aligned rapidly under unsupervised conditions. The model has few parameters, and the model training time is greatly shortened.
Drawings
The technical solution and other advantageous effects of the present application will be made apparent by the following detailed description of the specific embodiments of the present application with reference to the accompanying drawings.
Fig. 1 is a flowchart of a remote sensing image cross-domain simulation translation method based on contrast learning according to an embodiment of the present application.
Fig. 2 is a schematic structural diagram of an image translation model according to an embodiment of the present application.
Fig. 3 is another schematic structural diagram of an image translation model according to an embodiment of the present application.
Fig. 4 is a flowchart of calculating a content contrast loss function and a style contrast loss function according to an embodiment of the present application.
FIG. 5 is a flowchart of the joint training of the semantic segmentation model and the image translation model provided by the application.
Fig. 6 is a schematic structural diagram of a remote sensing image cross-domain simulation translation device based on contrast learning according to an embodiment of the present application.
Fig. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Fig. 8 is another schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The following description of the embodiments of the present application will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present application, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to fall within the scope of the application.
Among conventional image color matching methods, histogram matching is the most classical and common. Its technical process can be summarized as follows: (1) acquiring a source image and a target image: a source image and a target image that need to be color matched are selected. (2) calculating image histograms: the histograms of the source image and the target image are calculated respectively to obtain the distribution of each pixel value in the image. (3) calculating cumulative histograms: the histograms of the source image and the target image are normalized and accumulated to obtain their respective cumulative distribution functions (CDFs). (4) calculating a matching function: a matching function is calculated from the difference between the two cumulative distribution functions; this function reflects the correspondence from the pixel values of the source image to those of the target image. (5) mapping source image pixel values: each pixel value of the source image is mapped with the matching function to obtain the corresponding pixel value of the target image. (6) generating a matching result: the mapped pixel values are combined into a new image to complete the color matching from the source image to the target image. The method makes full use of the histogram distribution information of the image and adjusts the color distribution of the source image to that of the target image through the mapping function, thereby realizing color matching. However, the method has the disadvantage that only color matching is considered and the semantic information of the image is ignored, which may make the matched image semantically inconsistent.
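The histogram matching steps above can be sketched for a single band as follows. This is a minimal illustration, not the procedure of any particular library: the function name `match_histogram` is hypothetical, and the matching function is realized here by interpolating between the two CDFs, which is one common choice among several.

```python
import numpy as np

def match_histogram(source: np.ndarray, target: np.ndarray) -> np.ndarray:
    """Remap the pixel values of `source` so that its CDF follows that of `target`."""
    # Steps (2)-(3): histograms and normalized cumulative distributions.
    src_values, src_idx, src_counts = np.unique(
        source.ravel(), return_inverse=True, return_counts=True)
    tgt_values, tgt_counts = np.unique(target.ravel(), return_counts=True)
    src_cdf = np.cumsum(src_counts) / source.size
    tgt_cdf = np.cumsum(tgt_counts) / target.size
    # Step (4): matching function, obtained by interpolating the target pixel
    # value at which the target CDF reaches each source CDF value.
    mapped_values = np.interp(src_cdf, tgt_cdf, tgt_values)
    # Steps (5)-(6): map every source pixel and restore the image shape.
    return mapped_values[src_idx].reshape(source.shape)
```

For a multi-band remote sensing image, the same function would be applied to each band independently, which is exactly why the result carries no guarantee of semantic consistency.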
Image translation methods based on CycleGAN and its variants are the most commonly used deep learning techniques for cross-domain image translation tasks. The technical flow is as follows: (1) building a generator G and a discriminator D: the generator G is responsible for converting the source image A into a pseudo image A' having the style of the target domain B, and the discriminator D is responsible for distinguishing the real image B from the pseudo image A'. (2) adversarial loss: the generator G is trained to generate realistic pseudo images so that the discriminator D has difficulty telling real from fake, while the discriminator D is trained to increase its discrimination capability. (3) cycle-consistency loss: the image A is converted into A' by G, and A' is then converted back into A by another generator G', which requires G and G' to be cycle-consistent. (4) joint training: the generators and the discriminator are trained jointly with the adversarial loss and the cycle-consistency loss. CycleGAN and its variants have the following disadvantages in image translation: (1) over-stylization: the adversarial loss easily causes the generator G to pay excessive attention to image style, resulting in distortion of the details of the generated image. (2) cycle-consistency limitation: a one-to-one correspondence between the source and target domains is assumed, but in practical applications this assumption is often too strict. (3) disregard of semantic information: the methods mainly focus on image style matching and ignore semantic information, which may result in semantic inconsistencies. (4) style loss function limitation: with a manually designed style loss function it is difficult to extract the style characteristics of the image automatically. (5) high training difficulty: the adversarial loss function suffers from vanishing gradients, which makes the training process difficult. (6) large computing resource consumption: the generator and the discriminator need to be trained simultaneously, which consumes a large amount of computing resources.
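The cycle-consistency constraint in step (3) can be illustrated with a small sketch. The callables `g` and `g_inv` stand in for the two generators G and G'; they and the function name are hypothetical, and the mean absolute difference is used as the distance, which is an assumption since the text does not state the norm.

```python
import numpy as np

def cycle_consistency_loss(a, g, g_inv):
    """Mean L1 distance between image `a` and its round trip g_inv(g(a))."""
    reconstructed = g_inv(g(a))          # A -> A' -> back to domain A
    return float(np.mean(np.abs(reconstructed - a)))
```

The loss is zero only when `g_inv` exactly undoes `g`, which is the one-to-one assumption criticized in disadvantage (2).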
In order to solve the above problems, the embodiment of the application provides a remote sensing image cross-domain simulation translation method and device based on contrast learning, a storage medium and electronic equipment. The remote sensing image cross-domain simulation translation device based on contrast learning provided by the embodiment of the application can be integrated in electronic equipment, wherein the electronic equipment can be a terminal, a server or other equipment, and the terminal can comprise a tablet computer, a notebook computer, a personal computer (PC), a micro-processing box or other devices.
Referring to fig. 1, fig. 1 is a flowchart of a remote sensing image cross-domain simulation translation method based on contrast learning, which is applied to an electronic device, according to an embodiment of the present application, the remote sensing image cross-domain simulation translation method based on contrast learning includes the following steps:
s1, acquiring a source image to be translated in a source image domain and a target image to be translated in a target image domain.
S2, inputting a source image to be translated and a target image to be translated into an image translation model; the image translation model includes a generator and a discriminator, and the generator includes an encoder and a decoder. Fig. 2 is a schematic structural diagram of the image translation model provided in the embodiment of the present application, to which reference may be made.
Wherein inputting the source to-be-translated image and the target to-be-translated image into the image translation model comprises:
s21, inputting the source image to be translated and the target image to be translated into an encoder to obtain source coding features and target coding features respectively.
Specifically, the source to-be-translated images $\{a_i\}_{i=1}^{N_A}$ and the target to-be-translated images $\{b_j\}_{j=1}^{N_B}$ are input into the encoder, where $N_A$ and $N_B$ respectively represent the number of images in domain $A$ and domain $B$.
In one embodiment, the encoder employs Resnet_9blocks as the backbone network, which consists of 9 residual blocks, each containing two or three convolutional layers together with batch normalization and a ReLU activation function. The purpose of the encoder is to extract high-level features from the input image.
Specifically, each residual block consists of two 3x3 convolution layers (or a first 1x1 convolution layer and a second 3x3 convolution layer), and the convolution layers are connected through a BN layer and a ReLU activation function. The residual connection adds the input of the block to its output to ensure efficient propagation of information in the network.
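A single residual block of the kind described above can be sketched in NumPy as follows. This is an illustration under stated assumptions rather than the patented network: `channel_norm` is a per-channel normalization used as a stand-in for batch normalization on one image, the second normalization before the skip connection is assumed, and all function names are hypothetical.

```python
import numpy as np

def conv3x3(x, w):
    """'Same' 3x3 convolution. x: (C_in, H, W); w: (C_out, C_in, 3, 3)."""
    _, h, wd = x.shape
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros((w.shape[0], h, wd))
    for dy in range(3):
        for dx in range(3):
            patch = xp[:, dy:dy + h, dx:dx + wd]
            out += np.einsum("oc,chw->ohw", w[:, :, dy, dx], patch)
    return out

def channel_norm(x, eps=1e-5):
    """Stand-in for BN: zero mean and unit variance per channel."""
    mean = x.mean(axis=(1, 2), keepdims=True)
    var = x.var(axis=(1, 2), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def residual_block(x, w1, w2):
    y = np.maximum(channel_norm(conv3x3(x, w1)), 0.0)  # conv -> norm -> ReLU
    y = channel_norm(conv3x3(y, w2))                   # conv -> norm
    return x + y                                       # residual connection
```

Because the block only adds a correction to its input, the spatial size and channel count are preserved, so nine such blocks can be stacked directly.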
S22, inputting the source coding feature and the target coding feature into a decoder to respectively obtain a source pseudo-sample image and a target pseudo-sample image.
Specifically, the decoder is a key component that converts the feature map extracted by the encoder back into image space to generate the target image. The decoder of the present application may use 2 convolution layers to increase the resolution of the feature map step by step. Unlike the convolution layers in the encoder, these layers employ an upsampling operation, and each of them is followed by a nonlinear ReLU activation layer. The last layer of the decoder is the output layer, which converts the feature map generated by the decoder into the final output image. The specific structure of the output layer depends on the requirements of the task.
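The two-stage upsampling path can be sketched as below. Several choices are assumptions made only to keep the example short: nearest-neighbour upsampling, 1x1 convolutions in place of the unspecified kernel size, and a `tanh` output layer; the function names are hypothetical.

```python
import numpy as np

def upsample2x(x):
    """Nearest-neighbour upsampling of a (C, H, W) feature map by a factor of 2."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def conv1x1(x, w):
    """Channel-mixing convolution; w has shape (C_out, C_in)."""
    return np.einsum("oc,chw->ohw", w, x)

def decode(features, w1, w2, w_out):
    y = np.maximum(conv1x1(upsample2x(features), w1), 0.0)  # stage 1: upsample, conv, ReLU
    y = np.maximum(conv1x1(upsample2x(y), w2), 0.0)         # stage 2: upsample, conv, ReLU
    return np.tanh(conv1x1(y, w_out))                       # assumed output layer
```

With two stages the spatial resolution grows by a factor of four, for example from a 4x4 feature map to a 16x16 image.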
S23, inputting the source pseudo sample image into a discriminator for discrimination to obtain a discrimination result. The discrimination result is a source pseudo-sample image having a target image domain style.
The discriminator D of the present invention may be a multi-scale discriminator (Multi-Scale Discriminator), which discriminates the generated source pseudo-sample image a' and assesses the authenticity of the generated image at different scales. This structure is designed to capture both the details and the global structure of the image at different resolutions, thereby improving the quality of the generated image. The discriminator D specifically comprises 16 feature extraction convolution layers, a fully connected layer and a multi-scale feature fusion module in which feature maps of different scales are fused. The fusion may be achieved by concatenation, addition, or other operations. The fused feature map contains information from multiple scales, which helps the discriminator evaluate the authenticity of the generated image more accurately.
The image translation model is trained based on a content contrast loss function, a style contrast loss function and an adversarial loss function constructed based on a source pseudo-sample image and a target pseudo-sample image.
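The multi-scale idea can be sketched as an image pyramid whose per-scale outputs are fused. This is a simplification: `score_fn` is a hypothetical placeholder for the convolutional feature extractor, average pooling is an assumed downsampling method, and fusion is shown as concatenation followed by averaging.

```python
import numpy as np

def avg_pool2x(x):
    """2x2 average pooling of a (C, H, W) array with even H and W."""
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).mean(axis=(2, 4))

def image_pyramid(image, num_scales=3):
    """Return the image at full, half, quarter, ... resolution."""
    scales = [image]
    for _ in range(num_scales - 1):
        scales.append(avg_pool2x(scales[-1]))
    return scales

def multi_scale_score(image, score_fn, num_scales=3):
    """Apply `score_fn` at every scale and fuse the outputs by concatenation."""
    outputs = [np.atleast_1d(score_fn(s)).ravel()
               for s in image_pyramid(image, num_scales)]
    fused = np.concatenate(outputs)
    return float(fused.mean())
```

Coarse scales respond to global structure while the full-resolution scale responds to fine texture, which is the motivation given for the design.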
During the training process, the multi-scale discriminator is trained with the generator. The generator is responsible for generating the image as realistic as possible, while the multi-scale discriminator is responsible for evaluating the authenticity of the generated image. By back-propagation algorithms and gradient descent methods, the system can continually optimize the parameters of the generator and discriminator to improve the quality of the generated image.
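The opposing objectives of the generator and the discriminator can be written down as a pair of losses. The text does not state which form of adversarial loss is used, so the least-squares form below is purely an assumption chosen for illustration, and the function names are hypothetical.

```python
import numpy as np

def discriminator_loss(d_real, d_fake):
    """Push scores of real target images toward 1 and of pseudo samples toward 0."""
    return float(0.5 * (np.mean((d_real - 1.0) ** 2) + np.mean(d_fake ** 2)))

def generator_loss(d_fake):
    """Push the discriminator's scores of pseudo samples toward 1."""
    return float(np.mean((d_fake - 1.0) ** 2))
```

In each training iteration the discriminator parameters would be updated to reduce the first loss and the generator parameters to reduce the second, both by gradient descent through backpropagation.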
Fig. 3 is another schematic structural diagram of an image translation model provided by the embodiment of the present application, where, as shown in fig. 3, the image translation model further includes a content projection network, and the remote sensing image cross-domain simulation translation method based on contrast learning provided by the present application further includes the following steps:
S31, inputting the source coding features into a content projection network to obtain feature vectors of the source coding features. Fig. 4 is a flowchart of the calculation of a content contrast loss function and a style contrast loss function according to an embodiment of the present application, to which reference may be made.
Specifically, the source coding features output by the encoder $E$ (which are multi-layer features) are input into the content projection network $H_c$ to obtain the feature vectors of each selected layer $l$ of the source coding features. These feature vectors represent the content features of the source to-be-translated image on each layer and can be expressed by the following formula:
$$z_l = H_c\left(E_l(x)\right)$$
wherein $x$ represents the to-be-translated image, $E_l(x)$ is the output of the $l$-th selected layer of the encoder, $z_l$ represents the feature vector, and $H_c$ is a two-layer multi-layer perceptron (MLP) with 256 units per layer, used to obtain the projected feature vectors.
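A two-layer projection MLP with 256 units per layer can be sketched as follows. The random initialization, the ReLU between the layers and the final L2 normalization are assumptions not stated in the text, and `make_projection_mlp` is a hypothetical helper name.

```python
import numpy as np

def make_projection_mlp(in_dim, hidden_dim=256, out_dim=256, seed=0):
    """Build a two-layer MLP that maps encoder features to projected vectors."""
    rng = np.random.default_rng(seed)
    w1 = rng.normal(0.0, 0.02, (in_dim, hidden_dim))
    w2 = rng.normal(0.0, 0.02, (hidden_dim, out_dim))

    def project(features):
        """features: (S, in_dim) encoder features sampled at S positions."""
        hidden = np.maximum(features @ w1, 0.0)   # first layer + ReLU
        z = hidden @ w2                           # second layer
        # Assumed: unit-length outputs so that dot products act as cosine similarity.
        return z / (np.linalg.norm(z, axis=1, keepdims=True) + 1e-12)

    return project
```

One such projector would be created per selected encoder layer, since the feature dimension `in_dim` generally differs between layers.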
S32, inputting the source pseudo-sample image into a content projection network to obtain a first query vector of the source pseudo-sample image.
The query points in the source pseudo-sample image are the points to be compared with the learned targets. They are mapped to K-dimensional vectors through the content projection network to obtain the first query vector $v$.
S33, matching the feature vectors of the source to-be-translated image and the source pseudo sample image at the same position, taking the matched source to-be-translated image as a first positive sample, and taking the source to-be-translated image corresponding to the feature vectors of other positions in the source to-be-translated image as a first negative sample.
Further, the feature vector of the same position of the matched image to be translated and the source pseudo sample image is used as a first positive example vector, and the feature vector of other positions in the source image to be translated is used as a first negative example vector.
S34, calculating a content contrast loss function based on the first positive sample, the first negative sample and the first query vector.
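Specifically, the content contrast loss function may be calculated by a first formula:
$$L_{\mathrm{content}} = \sum_{l}\sum_{s} \ell\left(v_{l}^{s},\ v_{l}^{s+},\ v_{l}^{s-}\right)$$
wherein $v$ is the first query vector, $v^{+}$ is the first positive sample, $v^{-}$ is the first negative sample, $\ell$ is a cross entropy loss function, and the sums run over the selected layers $l$ and the query positions $s$.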
Wherein the cross entropy loss functionIs defined as:
wherein, As a first positive example vector of the first example,The first negative example vector is used to determine,,Is a temperature coefficient.
In practice, constructing meaningful and effective positive and negative samples is the current difficulty in contrast learning. The present application converts the problem of maximizing mutual information into an (N+1)-way classification problem, in which the distances between the query and the other examples are scaled by a temperature coefficient set to 0.07.
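The (N+1)-way classification described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the application: the function names `info_nce_loss`, `patch_nce_loss`, and `l2_normalize` are hypothetical, the similarity is assumed to be a dot product of normalized vectors, and averaging over positions is an assumption. Only the temperature of 0.07 and the rule "same position is positive, other positions are negative" come from the text.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length so dot products behave like cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def info_nce_loss(query, positive, negatives, tau=0.07):
    """Cross-entropy over one positive and N negatives ((N+1)-way classification)."""
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))
    # Temperature-scaled similarity logits; index 0 is the positive example.
    logits = [dot(query, positive) / tau] + [dot(query, n) / tau for n in negatives]
    # Numerically stable log-sum-exp over all N+1 logits.
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
    # Negative log-probability of selecting the positive example.
    return log_sum - logits[0]

def patch_nce_loss(query_feats, key_feats, tau=0.07):
    """Average the loss over positions: the same position in key_feats is the
    positive, and every other position in key_feats is a negative."""
    total = 0.0
    for s, query in enumerate(query_feats):
        positive = key_feats[s]
        negatives = [v for i, v in enumerate(key_feats) if i != s]
        total += info_nce_loss(query, positive, negatives, tau)
    return total / len(query_feats)

# Toy feature vectors for two positions (illustrative values only).
keys = [l2_normalize([1.0, 0.0]), l2_normalize([0.0, 1.0])]
aligned = patch_nce_loss(keys, keys)
misaligned = patch_nce_loss(list(reversed(keys)), keys)
```

In this toy case the loss is close to zero when each query matches the feature at its own position and large when the positions are swapped, which is the behavior the content constraint relies on.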
S35, training the encoder based on the content contrast loss function to constrain the content consistency of the source to-be-translated image and the source pseudo-sample image.
Further, an identity consistency loss function is also used to maintain consistency of detail between the target to-be-translated image b and the target pseudo-sample image b', where b' is generated from b by the generator G.
The identity consistency loss is defined as the L1 distance between the two images.
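The definition above can be written as a formula. The symbols $b$, $b'$, and $G$ follow the text; the name $\mathcal{L}_{\mathrm{idt}}$ is an assumed label, and any averaging over pixels or over the training images is not specified here and is left out.

```latex
b' = G(b), \qquad \mathcal{L}_{\mathrm{idt}} = \lVert b' - b \rVert_{1}
```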
The content contrast loss primarily ensures that the content of the synthesized image is consistent with that of the original image. By comparing the feature vectors of the original image and the translated image at the same position, the application enables the model to keep the content details and textures of the image unchanged during translation. Applying multi-layer, patch-wise contrast learning to the single image itself extracts positive and negative examples more effectively than searching for them in other images across the entire dataset.
With continued reference to fig. 3, the image translation model further includes a style projection network, which acts as an image stylization constraint and ensures that the model can learn the style of the target image dataset. Unlike previous style transfer approaches, which use AdaIN to adaptively normalize the image mean and variance and compute a Gram loss, the style contrast loss encourages the model to learn the target style from an unsupervised training perspective. Similar to the content contrast loss, the invention also uses a multi-layer, patch-wise contrast loss in the loss function design.
In one embodiment, the remote sensing image cross-domain simulation translation method based on contrast learning provided by the application further comprises the following steps:
S41, inputting the target coding features into a style projection network to obtain feature vectors of the target coding features.
In particular, the target coding features output by the encoder (which are multi-layer features) are input to the style projection network to obtain a feature vector for each selected layer of the target coding features. These feature vectors represent the features of the target image to be translated at each layer.
The projection takes the target image to be translated as input and yields its feature vectors; the style projection network is a two-layer multi-layer perceptron (MLP) with 128 units per layer, which produces the projected feature vectors.
S42, inputting the source pseudo-sample image into the style projection network to obtain a second query vector of the source pseudo-sample image.
S43, matching the feature vectors of the target to-be-translated image and the source pseudo sample image at the same position, taking the matched target to-be-translated image as a second positive sample, and taking the source to-be-translated image corresponding to the feature vectors of other positions in the source to-be-translated image as a second negative sample.
Further, the feature vector of the target to-be-translated image at the position matching the source pseudo-sample image is taken as the second positive example vector, and the feature vectors at other positions in the source to-be-translated image are taken as the second negative example vectors.
S44, calculating a style contrast loss function based on the second positive sample, the second negative sample and the second query vector.
Wherein the terms of the style contrast loss function are the second positive sample, the second negative sample, and the second query vector, with a value of 256 set for each patch. The cross-entropy loss function is calculated in the same way as described above for the content contrast loss and is not described again here.
S45, training the encoder based on the style contrast loss function to constrain the content consistency of the target to-be-translated image and the source pseudo-sample image.
According to the application, training the decoder with the layered, patch-wise content contrast loss function maintains detail consistency and avoids the distortion of image details caused by excessive stylization. Training the decoder with the layered, patch-wise style contrast loss function achieves rapid alignment with the style characteristics of the target domain under unsupervised conditions. The model has few parameters, and the model training time is greatly shortened.
Furthermore, the application integrates a semantic segmentation model into the image translation framework and trains the semantic segmentation model jointly with the image translation model; users can choose whether to retrain it according to their own requirements and obtain the semantic segmentation result. The method specifically comprises the following steps:
S51, constructing a semantic segmentation model;
S52, inputting the new stylized image output by the image translation model into the trained semantic segmentation model to obtain a semantic segmentation result.
If the images in the source image domain for training or the images in the target image domain for training have corresponding training labels, these labels can be used to retrain or pretrain the semantic segmentation model to ensure the semantic consistency of the translated images. If both lack labels, the semantic segmentation network can be removed, but this may degrade performance. FIG. 5 is a flowchart of the joint training of the semantic segmentation model and the image translation model provided by the present application. As shown in FIG. 5, the joint training process specifically includes:
S61, if the images in the source image domain for training have corresponding training labels, calculating a first loss function based on the training labels corresponding to the images in the source image domain and the new stylized images, and performing iterative training based on the first loss function to obtain a trained semantic segmentation model.
Wherein the first loss function is defined in terms of the one-hot encoding of the true label, which is 1 for the labeled class and 0 for the rest, and the probability that the model predicts for that class.
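A loss built from a one-hot label and a predicted class probability is commonly a cross-entropy. The sketch below assumes that form; the function names `cross_entropy` and `semantic_loss`, the clamping constant `eps`, and the averaging over pixels are illustrative assumptions rather than details taken from the application.

```python
import math

def cross_entropy(one_hot, probs, eps=1e-12):
    """-sum_c y_c * log(p_c) for a single pixel; eps guards against log(0)."""
    return -sum(y * math.log(max(p, eps)) for y, p in zip(one_hot, probs))

def semantic_loss(labels, predictions):
    """Mean cross-entropy over all pixels (averaging is an assumption here)."""
    losses = [cross_entropy(y, p) for y, p in zip(labels, predictions)]
    return sum(losses) / len(losses)

# Two pixels, three classes (illustrative values only).
labels = [[1, 0, 0], [0, 1, 0]]
predictions = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]]
loss = semantic_loss(labels, predictions)
```

Because the label is one-hot, only the predicted probability of the labeled class contributes to each pixel's loss.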
S62, if the images in the target image domain for training have corresponding training labels, calculating a second loss function based on the antagonism loss function, the content contrast loss function, the style contrast loss function, the identity consistency loss function and the semantic loss function, and performing iterative training based on the second loss function to obtain a trained semantic segmentation model;
wherein the second loss function combines the antagonism loss function, the style contrast loss function, the content contrast loss function, the identity consistency loss function, and the semantic loss function (i.e., the first loss function).
S63, if neither the images in the source image domain for training nor the images in the target image domain for training have corresponding training labels, removing the semantic segmentation model, calculating a third loss function according to the antagonism loss function, the content contrast loss function and the style contrast loss function, and performing iterative training based on the third loss function to obtain a trained image translation model.
Wherein the loss terms are weighted by hyperparameters. By way of example only, and not by way of limitation, some of the weights may be set to 1, while the weight of an optional term is set to 0.5 if that term is required and to 0 otherwise.
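The way the loss terms are combined can be sketched as a weighted sum. This is an assumption-laden illustration: the function `combined_loss` and the weight names `w_adv`, `w_content`, `w_style`, `w_idt`, and `w_sem` are hypothetical, and the text does not make clear which weights take the value 1 and which take 0.5, so the values below are examples only.

```python
def combined_loss(adv, content, style, identity=0.0, semantic=0.0,
                  w_adv=1.0, w_content=1.0, w_style=1.0, w_idt=0.0, w_sem=0.0):
    """Weighted sum of the loss terms; a weight of 0 switches a term off."""
    return (w_adv * adv + w_content * content + w_style * style
            + w_idt * identity + w_sem * semantic)

# No labels available: antagonism, content and style terms only.
third = combined_loss(adv=0.8, content=1.2, style=0.5)

# Labels available: identity and semantic terms switched on (example weights).
second = combined_loss(adv=0.8, content=1.2, style=0.5,
                       identity=0.4, semantic=0.3, w_idt=0.5, w_sem=1.0)
```

Expressing optional terms through zero weights keeps a single code path for the labeled and unlabeled training cases.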
The application adds the semantic segmentation model and provides flexibility for users to choose whether to retrain the semantic segmentation model according to requirements. The design not only serves as semantic constraint of style conversion, but also creates a complementary closed loop, and further improves accuracy and reliability of image translation.
According to the method described in the above embodiments, this embodiment is further described from the perspective of a contrast-learning-based remote sensing image cross-domain simulation translation device. The device may be implemented as a separate entity or integrated in an electronic device such as a terminal or a server, where the terminal may include a tablet computer, a notebook computer, a personal computer (PC), a micro-processing box, or other devices.
Referring to fig. 6, fig. 6 specifically illustrates a remote sensing image cross-domain simulation translation device based on contrast learning, which is applied to an electronic device, and the remote sensing image cross-domain simulation translation device based on contrast learning may include:
the acquisition module is used for acquiring a source image to be translated in a source image domain and a target image to be translated in a target image domain;
the translation module is used for inputting the source image to be translated and the target image to be translated into the image translation model; the image translation model comprises a generator and a discriminator, wherein the generator comprises an encoder and a decoder;
Wherein inputting the source to-be-translated image and the target to-be-translated image into the image translation model comprises:
Inputting a source image to be translated and a target image to be translated into an encoder to respectively obtain a source coding feature and a target coding feature;
Inputting the source coding feature and the target coding feature into a decoder to respectively obtain a source pseudo-sample image and a target pseudo-sample image;
Inputting the source pseudo-sample image and the target pseudo-sample image into a discriminator for discrimination to obtain a first discrimination result and a second discrimination result respectively;
the image translation model is trained based on a content contrast loss function, a style contrast loss function and an antagonism loss function constructed based on a source pseudo-sample image and a target pseudo-sample image.
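The data flow through the translation module can be sketched as follows. This is a minimal illustration only: `translate_step` and the toy `encoder`, `decoder`, and `discriminator` callables are hypothetical stand-ins operating on flat lists of numbers, not the networks of the application.

```python
def translate_step(source_img, target_img, encoder, decoder, discriminator):
    """One forward pass: encode both images, decode them, and score the outputs."""
    # The encoder yields the source and target coding features.
    source_feat = encoder(source_img)
    target_feat = encoder(target_img)
    # The decoder turns each set of features into a pseudo-sample image.
    source_fake = decoder(source_feat)
    target_fake = decoder(target_feat)
    # The discriminator scores both pseudo-sample images.
    return {
        "source_fake": source_fake,
        "target_fake": target_fake,
        "first_result": discriminator(source_fake),
        "second_result": discriminator(target_fake),
    }

# Toy stand-ins (hypothetical), chosen only so the flow can be executed.
encoder = lambda img: [2.0 * p for p in img]
decoder = lambda feat: [f + 1.0 for f in feat]
discriminator = lambda img: sum(img) / len(img)

out = translate_step([0.0, 1.0], [1.0, 1.0], encoder, decoder, discriminator)
```

The two pseudo-sample images returned here are the inputs from which the content contrast, style contrast, and antagonism loss functions are constructed during training.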
In the implementation, each module and/or unit may be implemented as an independent entity, or may be combined arbitrarily and implemented as the same entity or a plurality of entities, where the implementation of each module and/or unit may refer to the foregoing method embodiment, and the specific beneficial effects that may be achieved may refer to the beneficial effects in the foregoing method embodiment, which are not described herein again.
In addition, an embodiment of the application also provides an electronic device, which may be a device such as a computer or a tablet computer. As shown in fig. 7, the electronic device 400 includes a processor 401 and a memory 402. The processor 401 is electrically connected to the memory 402.
The processor 401 is a control center of the electronic device 400, connects various parts of the entire electronic device using various interfaces and lines, and performs various functions of the electronic device and processes data by running or loading application programs stored in the memory 402 and calling data stored in the memory 402, thereby performing overall monitoring of the electronic device.
In this embodiment, the processor 401 in the electronic device 400 loads the instructions corresponding to the processes of one or more application programs into the memory 402 according to the following steps, and the processor 401 executes the application programs stored in the memory 402, so as to implement various functions:
acquiring a source image to be translated in a source image domain and a target image to be translated in a target image domain;
Inputting the source to-be-translated image and the target to-be-translated image into an image translation model; wherein the image translation model comprises a generator and a discriminator, the generator comprising an encoder and a decoder;
Wherein the inputting the source to-be-translated image and the target to-be-translated image into an image translation model comprises:
inputting the source to-be-translated image and the target to-be-translated image into the encoder to respectively obtain source coding features and target coding features;
inputting the source coding feature and the target coding feature into the decoder to respectively obtain a source pseudo-sample image and a target pseudo-sample image;
inputting the source pseudo sample image into a discriminator for discrimination to obtain discrimination results;
The image translation model is trained based on a content contrast loss function, a style contrast loss function and an antagonism loss function constructed based on the source pseudo-sample image and the target pseudo-sample image.
The electronic device can implement the steps in any embodiment of the remote sensing image cross-domain simulation translation method based on contrast learning provided by the embodiments of the application, and can therefore achieve the beneficial effects of any such method; for details, refer to the previous embodiments, which are not repeated here.
Fig. 8 shows a specific block diagram of an electronic device according to an embodiment of the present invention, which may be used to implement the remote sensing image cross-domain simulation translation method based on contrast learning provided in the foregoing embodiment. The electronic device 500 may be a terminal, a server, or the like, where the terminal may include a tablet computer, a notebook computer, a personal computer (PC), a mini-processing box, or other devices.
The RF circuit 510 is configured to receive and transmit electromagnetic waves and to convert between electromagnetic waves and electrical signals, thereby communicating with a communication network or other devices. The RF circuit 510 may include various existing circuit elements for performing these functions, such as an antenna, a radio frequency transceiver, a digital signal processor, an encryption/decryption chip, a Subscriber Identity Module (SIM) card, memory, and the like. The RF circuit 510 may communicate with various networks such as the internet, intranets, and wireless networks, or with other devices via wireless networks. The wireless network may include a cellular telephone network, a wireless local area network, or a metropolitan area network. The wireless network may use various communication standards, protocols, and technologies including, but not limited to, Global System for Mobile Communications (GSM), Enhanced Data GSM Environment (EDGE), Wideband Code Division Multiple Access (WCDMA), Code Division Multiple Access (CDMA), Time Division Multiple Access (TDMA), Wireless Fidelity (Wi-Fi) (e.g., Institute of Electrical and Electronics Engineers standards IEEE 802.11a, IEEE 802.11b, IEEE 802.11g, and/or IEEE 802.11n), Voice over Internet Protocol (VoIP), Worldwide Interoperability for Microwave Access (WiMAX), other protocols for mail, instant messaging, and short messaging, as well as any other suitable communication protocols, including even those not currently developed.
The memory 520 may be used to store software programs and modules, such as corresponding program instructions/modules in the above embodiments, and the processor 580 executes the software programs and modules stored in the memory 520 to perform various functional applications and data processing, that is, to implement functions of photographing by the front camera, processing photographed images, and switching display colors of display contents on the display screen. Memory 520 may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, memory 520 may further include memory located remotely from processor 580, which may be connected to electronic device 500 via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The input unit 530 may be used to receive input numeric or character information and to generate keyboard or mouse signal inputs related to user settings and function control.
The display unit 540 may be used to display information input by a user or provided to a user, as well as various graphical user interfaces, which may be composed of graphics, text, icons, video, and any combination thereof. The display unit 540 may include a display panel 541, and optionally, the display panel 541 may be configured in the form of an LCD (Liquid Crystal Display), an OLED (Organic Light-Emitting Diode), or the like.
The audio circuit 560, the speaker 561, and the microphone 562 may provide an audio interface between the user and the electronic device 500. The audio circuit 560 may convert received audio data into an electrical signal and transmit it to the speaker 561, which converts the electrical signal into a sound signal and outputs it; conversely, the microphone 562 converts a collected sound signal into an electrical signal, which is received by the audio circuit 560 and converted into audio data. The audio data is output to the processor 580 for processing and then transmitted to, for example, another terminal via the RF circuit 510, or output to the memory 520 for further processing. The audio circuit 560 may also include an earphone jack to provide communication between peripheral headphones and the electronic device 500.
The electronic device 500 may help the user receive requests, send information, and so on via the transmission module 570 (e.g., a Wi-Fi module), which provides the user with wireless broadband internet access. Although the transmission module 570 is illustrated, it is understood that it is not an essential component of the electronic device 500 and can be omitted entirely as needed without changing the essence of the invention.
The processor 580 is the control center of the electronic device 500. It uses various interfaces and lines to connect the various parts of the entire electronic device, and performs the various functions of the electronic device 500 and processes data by running or executing software programs and/or modules stored in the memory 520 and invoking data stored in the memory 520, thereby performing overall monitoring of the electronic device. Optionally, the processor 580 may include one or more processing cores; in some embodiments, the processor 580 may integrate an application processor, which primarily handles the operating system, user interfaces, applications, and the like, with a modem processor, which primarily handles wireless communications. It will be appreciated that the modem processor described above may also not be integrated into the processor 580.
The electronic device 500 also includes a power supply 590 (e.g., a battery) that provides power to the various components, and in some embodiments, may be logically coupled to the processor 580 via a power management system to perform functions such as managing charging, discharging, and power consumption via the power management system. Power supply 590 may also include one or more of any of a direct current or alternating current power supply, a recharging system, a power failure detection circuit, a power converter or inverter, a power status indicator, and the like.
Although not shown, the electronic device 500 further includes a camera (e.g., a front camera or a rear camera), a Bluetooth module, and the like, which are not described herein. In particular, in this embodiment, the display unit of the electronic device is a touch screen display, and the electronic device further includes a memory and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs including instructions for:
acquiring a source image to be translated in a source image domain and a target image to be translated in a target image domain;
Inputting the source to-be-translated image and the target to-be-translated image into an image translation model; wherein the image translation model comprises a generator and a discriminator, the generator comprising an encoder and a decoder;
Wherein the inputting the source to-be-translated image and the target to-be-translated image into an image translation model comprises:
inputting the source to-be-translated image and the target to-be-translated image into the encoder to respectively obtain source coding features and target coding features;
inputting the source coding feature and the target coding feature into the decoder to respectively obtain a source pseudo-sample image and a target pseudo-sample image;
inputting the source pseudo sample image into a discriminator for discrimination to obtain discrimination results;
The image translation model is trained based on a content contrast loss function, a style contrast loss function and an antagonism loss function constructed based on the source pseudo-sample image and the target pseudo-sample image.
In the implementation, each module may be implemented as an independent entity, or may be combined arbitrarily, and implemented as the same entity or several entities, and the implementation of each module may be referred to the foregoing method embodiment, which is not described herein again.
Those of ordinary skill in the art will appreciate that all or a portion of the steps of the various methods of the above embodiments may be performed by instructions, or by instructions controlling associated hardware, which may be stored in a computer-readable storage medium and loaded and executed by a processor. To this end, an embodiment of the present invention provides a storage medium having stored therein a plurality of instructions that can be loaded by a processor to perform the steps of any one of the embodiments of the contrast learning-based remote sensing image cross-domain simulation translation method provided by the embodiment of the present invention.
Wherein the computer-readable storage medium may comprise: read-only memory (ROM), random access memory (RAM), a magnetic disk, an optical disk, and the like.
Because the instructions stored in the storage medium can execute the steps in any embodiment of the remote sensing image cross-domain simulation translation method based on contrast learning provided by the embodiments of the present invention, they can achieve the beneficial effects of any such method; these are detailed in the previous embodiments and are not repeated here.
The remote sensing image cross-domain simulation translation method, device, storage medium and electronic equipment based on contrast learning provided by the embodiment of the application are described in detail, and specific examples are applied to illustrate the principle and implementation of the application, and the description of the above embodiments is only used for helping to understand the method and core ideas of the application; meanwhile, as those skilled in the art will have variations in the specific embodiments and application scope in light of the ideas of the present application, the present description should not be construed as limiting the present application.
Claims (10)
1. The remote sensing image cross-domain simulation translation method based on contrast learning is characterized by comprising the following steps of:
acquiring a source image to be translated in a source image domain and a target image to be translated in a target image domain;
Inputting the source to-be-translated image and the target to-be-translated image into an image translation model; wherein the image translation model comprises a generator and a discriminator, the generator comprising an encoder and a decoder;
Wherein the inputting the source to-be-translated image and the target to-be-translated image into an image translation model comprises:
inputting the source to-be-translated image and the target to-be-translated image into the encoder to respectively obtain source coding features and target coding features;
inputting the source coding feature and the target coding feature into the decoder to respectively obtain a source pseudo-sample image and a target pseudo-sample image;
inputting the source pseudo sample image into a discriminator for discrimination to obtain discrimination results;
The image translation model is trained based on a content contrast loss function, a style contrast loss function and an antagonism loss function constructed based on the source pseudo-sample image and the target pseudo-sample image.
2. The contrast learning-based remote sensing image cross-domain simulation translation method of claim 1, wherein the image translation model further comprises a content projection network, the method further comprising:
inputting the source coding features into the content projection network to obtain feature vectors of the source coding features;
inputting the source pseudo-sample image into the content projection network to obtain a first query vector of the source pseudo-sample image;
matching the feature vectors of the source to-be-translated image and the source pseudo-sample image at the same position, taking the matched source to-be-translated image as a first positive sample, and taking the source to-be-translated images corresponding to the feature vectors of other positions in the source to-be-translated image as a first negative sample;
calculating a content contrast loss function based on the first positive sample, the first negative sample, and the first query vector;
the encoder is trained based on the content contrast loss function to constrain content consistency of the source to-be-translated image and the source pseudo-sample image.
3. The contrast learning-based remote sensing image cross-domain simulation translation method of claim 1, wherein the image translation model further comprises a style projection network, the method further comprising:
Inputting the target coding features into the style projection network to obtain feature vectors of the target coding features;
Inputting the source pseudo-sample image into the style projection network to obtain a second query vector of the source pseudo-sample image;
matching the feature vectors of the target to-be-translated image and the source pseudo-sample image at the same position, taking the matched target to-be-translated image as a second positive sample, and taking the source to-be-translated image corresponding to the feature vectors of other positions in the source to-be-translated image as a second negative sample;
Calculating a style contrast loss function based on the second positive sample, the second negative sample, and the second query vector;
The encoder is trained based on the style contrast loss function to constrain content consistency of the target to-be-translated image and the source pseudo-sample image.
4. The contrast learning-based remote sensing image cross-domain simulation translation method of claim 1, further comprising:
Constructing a semantic segmentation model;
and inputting the new stylized image output by the image translation model into the trained semantic segmentation model to obtain a semantic segmentation result.
5. The contrast learning-based remote sensing image cross-domain simulation translation method of claim 4, wherein the joint training process of the semantic segmentation model and the image translation model comprises:
If the images in the source image domain for training have corresponding training labels, calculating a first loss function based on the training labels corresponding to the images in the source image domain and the new stylized image, and performing iterative training based on the first loss function to obtain a trained semantic segmentation model;
If the images in the target image domain for training have corresponding training labels, calculating a second loss function based on the antagonism loss function, the content contrast loss function, the style contrast loss function, the identity consistency loss function and the semantic loss function, and performing iterative training based on the second loss function to obtain a trained semantic segmentation model;
And if the images in the source image domain for training and the images in the target image domain for training have no corresponding training labels, removing the semantic segmentation model, calculating a third loss function according to the antagonism loss function, the content contrast loss function and the style contrast loss function, and performing iterative training based on the third loss function to obtain a trained image translation model.
6. The contrast learning-based remote sensing image cross-domain simulation translation method of claim 2, wherein the computing a content contrast loss function based on the first positive sample, the first negative sample, and the first query vector comprises:
Calculating the content contrast loss function by a first formula whose terms are the first query vector, the first positive sample, the first negative sample, and a cross-entropy loss function.
7. The contrast learning-based remote sensing image cross-domain simulation translation method of claim 6, wherein the cross-entropy loss function is calculated by a second formula whose terms are the first query vector, a positive example vector, and a negative example vector.
8. Remote sensing image cross-domain simulation translation device based on contrast learning is characterized by comprising:
the acquisition module is used for acquiring a source image to be translated in a source image domain and a target image to be translated in a target image domain;
A translation module for inputting the source to-be-translated image and the target to-be-translated image into an image translation model; wherein the image translation model comprises a generator and a discriminator, the generator comprising an encoder and a decoder;
Wherein the inputting the source to-be-translated image and the target to-be-translated image into an image translation model comprises:
inputting the source to-be-translated image and the target to-be-translated image into the encoder to respectively obtain source coding features and target coding features;
inputting the source coding feature and the target coding feature into the decoder to respectively obtain a source pseudo-sample image and a target pseudo-sample image;
inputting the source pseudo-sample image and the target pseudo-sample image into a discriminator for discrimination to obtain a first discrimination result and a second discrimination result respectively;
The image translation model is trained based on a content contrast loss function, a style contrast loss function and an antagonism loss function constructed based on the source pseudo-sample image and the target pseudo-sample image.
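The data flow recited in claim 8 can be sketched as a single forward pass. This is a minimal illustration only: the encoder, decoder, and discriminator are passed in as plain callables, and the names `forward_pass` and `ForwardResult` are hypothetical. How the three loss functions are computed and combined during training is not shown, since the claim does not specify it.

```python
from typing import Any, Callable, NamedTuple


class ForwardResult(NamedTuple):
    """Intermediate and final outputs of one forward pass (hypothetical container)."""
    source_features: Any
    target_features: Any
    source_pseudo: Any
    target_pseudo: Any
    first_result: Any
    second_result: Any


def forward_pass(
    encoder: Callable[[Any], Any],
    decoder: Callable[[Any], Any],
    discriminator: Callable[[Any], Any],
    source_image: Any,
    target_image: Any,
) -> ForwardResult:
    # Step 1: encode both to-be-translated images.
    source_features = encoder(source_image)
    target_features = encoder(target_image)

    # Step 2: decode the coding features into pseudo-sample images.
    source_pseudo = decoder(source_features)
    target_pseudo = decoder(target_features)

    # Step 3: discriminate each pseudo-sample image.
    first_result = discriminator(source_pseudo)
    second_result = discriminator(target_pseudo)

    return ForwardResult(
        source_features, target_features,
        source_pseudo, target_pseudo,
        first_result, second_result,
    )
```

With toy stand-ins such as `encoder = lambda img: [2 * x for x in img]`, the function can be exercised without any deep learning framework; in practice the three callables would be neural network modules.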
9. A computer-readable storage medium having stored therein a plurality of instructions adapted to be loaded by a processor to perform the contrast learning-based remote sensing image cross-domain simulation translation method of any one of claims 1 to 7.
10. An electronic device comprising a processor and a memory, the processor being electrically connected to the memory, the memory being configured to store instructions and data, and the processor being configured to perform the steps of the contrast learning-based remote sensing image cross-domain simulation translation method of any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202411026449.5A CN118917997A (en) | 2024-07-30 | 2024-07-30 | Remote sensing image cross-domain simulation translation method and device based on contrast learning |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202411026449.5A CN118917997A (en) | 2024-07-30 | 2024-07-30 | Remote sensing image cross-domain simulation translation method and device based on contrast learning |
Publications (1)
Publication Number | Publication Date |
---|---|
CN118917997A (en) | 2024-11-08 |
Family
ID=93297183
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202411026449.5A Pending CN118917997A (en) | 2024-07-30 | 2024-07-30 | Remote sensing image cross-domain simulation translation method and device based on contrast learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN118917997A (en) |
- 2024-07-30 CN CN202411026449.5A patent/CN118917997A/en active Pending
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication |