CN111724443A - Unified scene visual positioning method based on generating type countermeasure network - Google Patents
- Publication number
- CN111724443A (application CN202010517260.1A)
- Authority
- CN
- China
- Prior art keywords
- image
- dimensional
- loss
- network
- domain
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/70—Determining position or orientation of objects or cameras
- G06T7/73—Determining position or orientation of objects or cameras using feature-based methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T3/00—Geometric image transformations in the plane of the image
- G06T3/40—Scaling of whole images or parts thereof, e.g. expanding or contracting
- G06T3/4038—Image mosaicing, e.g. composing plane images from plane sub-images
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/10—Segmentation; Edge detection
- G06T7/174—Segmentation; Edge detection involving the use of two or more images
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/30—Determination of transform parameters for the alignment of images, i.e. image registration
- G06T7/33—Determination of transform parameters for the alignment of images, i.e. image registration using feature-based methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/80—Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration
Abstract
The invention belongs to the field of visual positioning, and particularly relates to a unified scene visual positioning method, system and device based on a generative adversarial network, aiming to solve the low positioning accuracy and poor robustness of existing visual positioning methods. The method comprises the following steps: acquiring a query image and performing semantic segmentation on it to obtain a semantic label map; concatenating the semantic label map with the query image and translating the result; extracting a global descriptor and two-dimensional local features of the translated query image, and matching the global descriptor against the global descriptors of all images in an image library to obtain candidate images; acquiring the three-dimensional model corresponding to the candidate images, and matching the two-dimensional local features with the three-dimensional point cloud within the range determined by the candidate images to obtain two-dimensional-three-dimensional matching point pairs; and calculating the 6-degree-of-freedom camera pose corresponding to the query image. The invention obtains a high-precision camera pose for the query image and achieves robust visual positioning over long time spans.
Description
Technical Field
The invention belongs to the field of visual positioning, and particularly relates to a unified scene visual positioning method, system and device based on a generative adversarial network.
Background
Visual localization is a key technology in three-dimensional spatial vision; its core goal is to estimate the 6-degree-of-freedom pose of a camera in a global coordinate system. One of its major difficulties is handling the appearance changes between query images and database images over long time spans. Current common visual localization methods focus on extracting more robust features from the image to cope with scene differences.
Existing mainstream visual localization methods fall into three categories:
(1) structure-based visual localization;
(2) image-based visual localization;
(3) learning-based visual localization.
Method (1) directly matches the feature points of the query image against all feature points stored in the three-dimensional model. The amount of data involved is therefore large, and because only local features are considered, the method is not robust to environmental changes.
Method (2) developed in two stages. Early methods used an extracted global descriptor for retrieval and took the pose of the most similar database image as the pose of the query image. Later, to improve positioning accuracy, image-based methods evolved to retrieve the several candidate images most similar to the query image among the images contained in the three-dimensional model, match the feature points contained in those images, and use the resulting two-dimensional-three-dimensional matches for pose calculation.
Methods (1) and (2) both rely heavily on features extracted from images, and image appearance varies greatly over a long time span. Challenging scenes in which lighting, weather or season vary widely therefore place higher demands on the robustness of the algorithm. Method (3) attempts to regress the camera pose directly in an end-to-end manner, but currently cannot reach the accuracy of the conventional methods. In view of this, the present invention provides a unified scene visual positioning method based on a generative adversarial network.
Disclosure of Invention
In order to solve the above problems in the prior art, namely the low positioning accuracy and poor robustness that scene changes over a long time span cause in conventional visual positioning methods, a first aspect of the present invention provides a unified scene visual positioning method based on a generative adversarial network, the method comprising:
step S100, acquiring a query image, and performing semantic segmentation on the query image through a semantic segmentation network to obtain a semantic label map;
step S200, concatenating the semantic label map with the query image, translating the result through the generator of a pre-trained generative adversarial network, and taking the translated image as a first image;
step S300, extracting a global descriptor and two-dimensional local features of the first image; matching the global descriptor of the first image against the global descriptor of each image in a preset image library to obtain candidate images; the image library is a database of images of the scene corresponding to the query image, stored after semantic segmentation and translation by the generator;
step S400, acquiring a pre-constructed three-dimensional model corresponding to the candidate images; matching the two-dimensional local features with the three-dimensional point cloud of the three-dimensional model within the range determined by the candidate images to obtain two-dimensional-three-dimensional matching point pairs;
step S500, based on the two-dimensional-three-dimensional matching point pairs, calculating the 6-degree-of-freedom camera pose corresponding to the query image through a PnP-RANSAC framework;
the generative adversarial network is optimized during training with a bidirectional reconstruction loss, a cycle consistency loss and an adversarial loss; the bidirectional reconstruction loss comprises an L1 loss and an MS-SSIM loss.
In some preferred embodiments, the generative adversarial network is trained as follows:
acquiring a training sample set; the training sample set comprises query images and database images; a database image is an image of the scene corresponding to a query image;
performing semantic segmentation on the query image and the database image through a semantic segmentation network, and concatenating each resulting semantic label map with its source image; taking the concatenated query image as a second image and the concatenated database image as a third image;
decomposing the second image and the third image each into a content code and a style code through the generator of the generative adversarial network;
recombining the style codes and content codes of the second image and the third image across the two images, and then decoding; and, based on the decoded images, obtaining the corresponding loss value through the discriminator of the generative adversarial network and updating the network parameters.
In some preferred embodiments, the loss value of the generative adversarial network is calculated as:
$$\min_{E_A,E_B,G_A,G_B}\ \max_{D_A,D_B}\ \mathcal{L} = \mathcal{L}_{GAN}^{x_A} + \mathcal{L}_{GAN}^{x_B} + \lambda_{cyc}\mathcal{L}_{cyc} + \lambda_x\left(\mathcal{L}_{recon}^{x_A} + \mathcal{L}_{recon}^{x_B}\right) + \lambda_c\left(\mathcal{L}_{recon}^{c_A} + \mathcal{L}_{recon}^{c_B}\right) + \lambda_s\left(\mathcal{L}_{recon}^{s_A} + \mathcal{L}_{recon}^{s_B}\right)$$
wherein $E_A$ is the encoder of the $X_A$ domain, $E_B$ is the encoder of the $X_B$ domain, $G_A$ is the decoder of the $X_A$ domain, $G_B$ is the decoder of the $X_B$ domain, $D_A$ is the discriminator that tries to distinguish translated images from real images in $X_A$, and $D_B$ is the discriminator that tries to distinguish translated images from real images in $X_B$; $\mathcal{L}_{GAN}^{x_A}$ is the adversarial loss value of the network in the $X_A$ domain, and $\mathcal{L}_{GAN}^{x_B}$ is the adversarial loss value of the network in the $X_B$ domain; $\mathcal{L}_{cyc}$ is the reconstruction loss value, relative to the original images, of images in the $X_A$ and $X_B$ domains after each is translated into the opposite domain and then translated back to its original domain; $\mathcal{L}_{recon}^{x_A}$ is the reconstruction loss value of an image $x_A$ in the $X_A$ domain relative to the original image after the encoding-decoding operation, and $\mathcal{L}_{recon}^{x_B}$ is the corresponding value for an image $x_B$ in the $X_B$ domain; $\mathcal{L}_{recon}^{c_A}$ is the reconstruction loss value of a content code $c_A$ in the $X_A$ domain relative to the original content code after the decoding-encoding operation, and $\mathcal{L}_{recon}^{c_B}$ is the corresponding value for a content code $c_B$ in the $X_B$ domain; $\mathcal{L}_{recon}^{s_A}$ is the reconstruction loss value of a style code $s_A$ in the $X_A$ domain relative to the original style code after the decoding-encoding operation, and $\mathcal{L}_{recon}^{s_B}$ is the corresponding value for a style code $s_B$ in the $X_B$ domain; $\lambda_{cyc}$, $\lambda_x$, $\lambda_c$ and $\lambda_s$ are the weights of the corresponding loss functions; $X_A$ denotes the scene of the query image in the generator, and $X_B$ denotes the scene of the database image in the generator.
In some preferred embodiments, the two-dimensional local features are extracted by SuperPoint.
In some preferred embodiments, the operation in step S300 of "matching the global descriptor of the first image against the global descriptor of each image in a preset image library to obtain candidate images" comprises:
extracting the global descriptors of the first image and of the images in the preset image library with NetVLAD;
calculating the L2 distance between the global descriptor of the first image and the global descriptor of each image in the preset image library;
and taking the image library images corresponding to the N smallest L2 distances as candidate images, wherein N is a positive integer.
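The retrieval step above amounts to a nearest-neighbour search in descriptor space. The following is a minimal sketch, assuming the descriptors have already been extracted and are stored as NumPy arrays; the function name `retrieve_candidates` and the toy four-dimensional descriptors are hypothetical stand-ins for NetVLAD outputs.

```python
import numpy as np

def retrieve_candidates(query_desc, library_descs, n):
    """Return indices and distances of the n library descriptors nearest to the query (L2)."""
    # L2 distance between the query descriptor and every library descriptor.
    dists = np.linalg.norm(library_descs - query_desc, axis=1)
    # Indices of the n smallest distances, ordered from nearest to farthest.
    order = np.argsort(dists)[:n]
    return order, dists[order]

# Toy 4-dimensional descriptors standing in for real global descriptors (hypothetical).
library = np.array([[1.0, 0.0, 0.0, 0.0],
                    [0.0, 1.0, 0.0, 0.0],
                    [0.9, 0.1, 0.0, 0.0],
                    [0.0, 0.0, 1.0, 0.0]])
query = np.array([1.0, 0.05, 0.0, 0.0])
idx, dist = retrieve_candidates(query, library, n=2)
```

For a large image library, a brute-force scan like this one would typically be replaced by an approximate nearest-neighbour index; the patent text itself only specifies the L2 distance and the choice of the N nearest images.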
In some preferred embodiments, the three-dimensional model is constructed by:
extracting local features of each image in the image library through SuperPoint;
based on the extracted local features, calibrating the camera poses and generating a sparse point cloud through structure from motion (SfM);
and constructing a three-dimensional model based on the camera pose and the sparse point cloud.
A second aspect of the invention provides a unified scene visual positioning system based on a generative adversarial network, which comprises a semantic segmentation module, a translation module, a descriptor matching module, a point pair acquisition module and a camera pose calculation module;
the semantic segmentation module is configured to acquire a query image and perform semantic segmentation on the query image through a semantic segmentation network to obtain a semantic label map;
the translation module is configured to concatenate the semantic label map with the query image, translate the result through the generator of a pre-trained generative adversarial network, and take the translated image as a first image;
the descriptor matching module is configured to extract a global descriptor and two-dimensional local features of the first image, and to match the global descriptor of the first image against the global descriptor of each image in a preset image library to obtain candidate images; the image library is a database of images of the scene corresponding to the query image, stored after semantic segmentation and translation by the generator;
the point pair acquisition module is configured to acquire a pre-constructed three-dimensional model corresponding to the candidate images, and to match the two-dimensional local features with the three-dimensional point cloud of the three-dimensional model within the range determined by the candidate images to obtain two-dimensional-three-dimensional matching point pairs;
the camera pose calculation module is configured to calculate the 6-degree-of-freedom camera pose corresponding to the query image through a PnP-RANSAC framework based on the two-dimensional-three-dimensional matching point pairs;
the generative adversarial network is optimized during training with a bidirectional reconstruction loss, a cycle consistency loss and an adversarial loss; the bidirectional reconstruction loss comprises an L1 loss and an MS-SSIM loss.
In a third aspect of the present invention, a storage device is provided, in which a plurality of programs are stored, the programs being adapted to be loaded and executed by a processor to implement the above unified scene visual positioning method based on a generative adversarial network.
In a fourth aspect of the present invention, a processing apparatus is provided, which includes a processor and a storage device; the processor is adapted to execute programs; the storage device is adapted to store a plurality of programs; the programs are adapted to be loaded and executed by the processor to implement the above unified scene visual positioning method based on a generative adversarial network.
The invention has the beneficial effects that:
According to the invention, the query image scene and the database image scene are unified into a standard scene, and through two processing steps, image retrieval and local matching, a high-precision camera pose corresponding to the query image is obtained and robust visual positioning over a long time span is achieved. Through cross-domain translation of the query image and the database images, images captured under different scenes are unified into the standard scene, which overcomes the challenges that scene changes such as illumination, weather and season pose for visual positioning over long time spans, and thereby enhances robustness. Meanwhile, the coarse-to-fine hierarchical positioning process, which combines global image retrieval with local feature matching, yields a high-precision 6-degree-of-freedom camera pose for the query image after scene unification. On the test dataset, the method achieves a higher recall rate at the positioning thresholds.
Drawings
Other features, objects and advantages of the present application will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings.
FIG. 1 is a flowchart illustrating a unified scene visual positioning method based on a generative adversarial network according to an embodiment of the present invention;
FIG. 2 is a block diagram of a unified scene visual positioning system based on a generative adversarial network according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of the training framework for the generator of a generative adversarial network according to an embodiment of the present invention;
FIG. 4 is a diagram comparing images before and after translation by the generator of a generative adversarial network according to an embodiment of the present invention;
FIG. 5 is a detailed flowchart of a unified scene visual positioning method based on a generative adversarial network according to an embodiment of the present invention;
FIG. 6 is a schematic block diagram of a computer system suitable for use with the electronic device to implement an embodiment of the invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is apparent that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The present application will be described in further detail with reference to the following drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the relevant invention and not restrictive of the invention. It should be noted that, for convenience of description, only the portions related to the related invention are shown in the drawings.
It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict.
The unified scene visual positioning method based on a generative adversarial network, as shown in FIG. 1, comprises the following steps:
step S100, acquiring a query image, and performing semantic segmentation on the query image through a semantic segmentation network to obtain a semantic label map;
step S200, concatenating the semantic label map with the query image, translating the result through the generator of a pre-trained generative adversarial network, and taking the translated image as a first image;
step S300, extracting a global descriptor and two-dimensional local features of the first image; matching the global descriptor of the first image against the global descriptor of each image in a preset image library to obtain candidate images; the image library is a database of images of the scene corresponding to the query image, stored after semantic segmentation and translation by the generator;
step S400, acquiring a pre-constructed three-dimensional model corresponding to the candidate images; matching the two-dimensional local features with the three-dimensional point cloud of the three-dimensional model within the range determined by the candidate images to obtain two-dimensional-three-dimensional matching point pairs;
step S500, based on the two-dimensional-three-dimensional matching point pairs, calculating the 6-degree-of-freedom camera pose corresponding to the query image through a PnP-RANSAC framework;
the generative adversarial network is optimized during training with a bidirectional reconstruction loss, a cycle consistency loss and an adversarial loss; the bidirectional reconstruction loss comprises an L1 loss and an MS-SSIM loss.
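Step S500 can be illustrated with a minimal RANSAC sketch over two-dimensional-three-dimensional matches. As an assumption made here to keep the example self-contained, it uses a six-point direct linear transform (DLT) that estimates a 3x4 projection matrix in place of a calibrated PnP solver; the function names, iteration count, pixel threshold and synthetic camera are all hypothetical and not taken from the patent.

```python
import numpy as np

def dlt_projection(pts3d, pts2d):
    """Estimate a 3x4 projection matrix from at least six 2D-3D pairs (DLT)."""
    rows = []
    for (x, y, z), (u, v) in zip(pts3d, pts2d):
        p = np.array([x, y, z, 1.0])
        rows.append(np.concatenate([p, np.zeros(4), -u * p]))
        rows.append(np.concatenate([np.zeros(4), p, -v * p]))
    # The solution is the right singular vector with the smallest singular value.
    _, _, vt = np.linalg.svd(np.stack(rows))
    return vt[-1].reshape(3, 4)

def reproj_error(P, pts3d, pts2d):
    """Pixel distance between projected 3D points and their 2D matches."""
    homo = np.hstack([pts3d, np.ones((len(pts3d), 1))]) @ P.T
    with np.errstate(divide="ignore", invalid="ignore"):
        proj = homo[:, :2] / homo[:, 2:3]
    return np.linalg.norm(proj - pts2d, axis=1)

def ransac_pose(pts3d, pts2d, iters=200, thresh=2.0, seed=0):
    """Keep the hypothesis with the most inliers, then refit on those inliers."""
    rng = np.random.default_rng(seed)
    best = np.zeros(len(pts3d), dtype=bool)
    for _ in range(iters):
        sample = rng.choice(len(pts3d), size=6, replace=False)
        P = dlt_projection(pts3d[sample], pts2d[sample])
        inliers = reproj_error(P, pts3d, pts2d) < thresh
        if inliers.sum() > best.sum():
            best = inliers
    return dlt_projection(pts3d[best], pts2d[best]), best

# Synthetic scene (hypothetical): 40 matches, the first 8 corrupted.
rng = np.random.default_rng(1)
pts3d = rng.uniform(-1.0, 1.0, size=(40, 3))
K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
a = 0.1
R = np.array([[np.cos(a), 0.0, np.sin(a)],
              [0.0, 1.0, 0.0],
              [-np.sin(a), 0.0, np.cos(a)]])
t = np.array([0.1, -0.2, 5.0])
P_true = K @ np.hstack([R, t[:, None]])
homo = np.hstack([pts3d, np.ones((40, 1))]) @ P_true.T
pts2d = homo[:, :2] / homo[:, 2:3]
pts2d[:8] += np.array([80.0, -60.0])  # simulate wrong matches
P_est, inliers = ransac_pose(pts3d, pts2d)
```

With known camera intrinsics, a calibrated solver such as OpenCV's `cv2.solvePnPRansac` would return the rotation and translation directly; the DLT variant is used here only because it needs nothing beyond NumPy.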
In order to describe the unified scene visual positioning method based on a generative adversarial network more clearly, the steps of an embodiment of the method are described in detail below with reference to the drawings.
In the following preferred embodiment, the training process of the generative adversarial network is detailed first, and then the acquisition of the 6-degree-of-freedom camera pose by the unified scene visual positioning method based on a generative adversarial network is detailed.
1. Training process of the generative adversarial network
Step A100, obtaining a training sample set
In this embodiment, the training sample images come from the CMU-Seasons dataset, a subset of the CMU Visual Localization dataset established by Badino et al. It covers urban, suburban and park scenes in the Pittsburgh area of the United States. Images were captured by two front-facing cameras mounted on a car, pointed about 45 degrees to the left and right of the vehicle's forward direction. The collected images span one year; one set of collected data is taken as the database (image library), and data collected under other seasonal conditions are used as queries. The park scene data include images of the same places under different weather and seasons, so the dataset is a suitable training dataset for the generative adversarial network of this embodiment; that is, the training dataset includes query images and database images. The generative adversarial network in the present invention is preferably a UniGAN network.
Step A200, preprocessing the query image and the database image
In this embodiment, the selected query images and database images are semantically segmented and then concatenated with the images before segmentation, as shown in FIG. 5. The specific steps are as follows:
constructing a semantic segmentation network based on a convolutional neural network, and training it to perform semantic segmentation using two-dimensional matching point pairs between images captured under different scene conditions;
after training is finished, inputting the query image and the database image into the trained semantic segmentation network, which outputs semantic label maps that assign a semantic label to every pixel of the query image and the database image;
and taking the semantic label map of each query image and database image as a fourth channel and concatenating it with the three RGB channels of the corresponding image, to obtain four-channel query images and database images that carry semantic labels, so that semantic information can assist image translation. The other parts of FIG. 5 are described below.
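The four-channel construction above can be sketched with NumPy as follows. This is a minimal illustration, assuming an H x W x 3 image array and an H x W integer label map; the function name `concat_semantic_channel` and the toy data are hypothetical.

```python
import numpy as np

def concat_semantic_channel(rgb, labels):
    """Stack an H x W label map onto an H x W x 3 RGB image as a fourth channel."""
    if rgb.shape[:2] != labels.shape:
        raise ValueError("image and label map must share height and width")
    # Cast labels to the image dtype so the result is a single homogeneous array.
    label_channel = labels.astype(rgb.dtype)[..., np.newaxis]
    return np.concatenate([rgb, label_channel], axis=-1)

# Toy inputs: a black 4 x 6 image and a label map with five classes (hypothetical).
rgb = np.zeros((4, 6, 3), dtype=np.float32)
labels = np.arange(24).reshape(4, 6) % 5
four_channel = concat_semantic_channel(rgb, labels)
```

Whether the label channel is kept as raw class indices or rescaled to the image value range is not stated in the text; the sketch keeps raw indices.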
Step A300, training the generative adversarial network based on the four-channel query images and database images.
In this embodiment, a generative adversarial network framework built on an autoencoder structure is adopted. The scenes of the four-channel query images and database images are defined as two domains ($X_A$, $X_B$), an overall loss function is set, and the UniGAN network is trained to translate between the two domains (that is, to translate the query image and the database image into a unified scene), using images spanning a long time period so that it becomes a cross-domain image translation model. The specific steps are as follows:
the autoencoder structure assumes that an image $x$ can be encoded in a latent space into a content code $c$ and a style code $s$, that is, the image is decomposed into a content code and a style code; the basic framework is as follows:
each domain $X_i$ ($i = A, B$) is provided with an encoder $E_i$ and a decoder $G_i$ such that $(c_i, s_i) = E_i(x_i)$ and $G_i(c_i, s_i) = x_i$. Under this architecture, image translation is achieved by swapping encoder-decoder pairs.
As shown in FIG. 3, $x_A$ (the query image) is encoded by its autoencoder into a content code $c_A$ and a style code $s_A$, and $x_B$ (the database image) is encoded by its autoencoder into a content code $c_B$ and a style code $s_B$. The style codes and content codes of the different domains are then recombined and decoded to obtain the translated images $x_{AB}$ and $x_{BA}$. The loss is obtained through the discriminators based on the translated images, and the network parameters are updated and optimized.
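The content/style decomposition and the code swap can be illustrated with a deliberately simple toy, not the learned networks of the embodiment. As assumptions made purely for illustration, the "style" here is the image mean and standard deviation, the "content" is the normalized image, and a single encoder/decoder pair is shared by both domains, whereas the text above uses a separate pair per domain.

```python
import numpy as np

def encode(x):
    """Toy encoder: content is the normalized image, style is (mean, std)."""
    mean, std = x.mean(), x.std() + 1e-8
    return (x - mean) / std, (mean, std)

def decode(content, style):
    """Toy decoder: re-apply a style (mean, std) to a content code."""
    mean, std = style
    return content * std + mean

rng = np.random.default_rng(0)
x_a = rng.uniform(0.0, 0.3, size=(8, 8))   # dark "query" image (hypothetical)
x_b = rng.uniform(0.5, 1.0, size=(8, 8))   # bright "database" image (hypothetical)

c_a, s_a = encode(x_a)
c_b, s_b = encode(x_b)

x_aa = decode(c_a, s_a)  # within-domain reconstruction of x_a
x_ab = decode(c_a, s_b)  # content of x_a rendered with the style of x_b
x_ba = decode(c_b, s_a)  # content of x_b rendered with the style of x_a
```

In this toy, `x_ab` keeps the spatial structure of `x_a` while taking on the brightness statistics of `x_b`, which mirrors the role of the recombined codes in the translation step.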
The loss of the generative adversarial network, i.e. the total loss, includes the bidirectional reconstruction loss, the cycle consistency loss and the adversarial loss; the bidirectional reconstruction loss includes an L1 loss and an MS-SSIM loss.
The bidirectional reconstruction loss means that the model should be able to reconstruct an image $x_i$ in the direction image → latent space → image, and that its latent codes $(c_i, s_i)$ should be reconstructed in the direction latent space → image → latent space. The construction of the bidirectional reconstruction loss is shown in equations (1), (2) and (3):
$$\mathcal{L}_{recon}^{x_A} = \mathbb{E}_{x_A}\left[(1-\alpha)\left\|G_A(E_A^c(x_A), E_A^s(x_A)) - x_A\right\|_1 + \alpha\,\mathcal{L}^{MS\text{-}SSIM}\left(G_A(E_A^c(x_A), E_A^s(x_A)),\ x_A\right)\right] \quad (1)$$
$$\mathcal{L}_{recon}^{c_A} = \mathbb{E}_{c_A, s_B}\left[\left\|E_B^c(G_B(c_A, s_B)) - c_A\right\|_1\right] \quad (2)$$
$$\mathcal{L}_{recon}^{s_A} = \mathbb{E}_{c_B, s_A}\left[\left\|E_A^s(G_A(c_B, s_A)) - s_A\right\|_1\right] \quad (3)$$
wherein $\mathcal{L}_{recon}^{x_A}$ is the reconstruction loss value of an image $x_A$ in the $X_A$ domain relative to the original image after the encoding-decoding operation, $\alpha$ is the proportion of the MS-SSIM loss in $\mathcal{L}_{recon}^{x_A}$, $G_A$ is the decoder of the $X_A$ domain, $G_B$ is the decoder of the $X_B$ domain, $E_A^c$ is the content encoder of the $X_A$ domain, $E_A^s$ is the style encoder of the $X_A$ domain, $\mathcal{L}_{recon}^{c_A}$ is the reconstruction loss value of a content code $c_A$ in the $X_A$ domain relative to the original content code after the decoding-encoding operation, $E_B^c$ is the content encoder of the $X_B$ domain, and $\mathcal{L}_{recon}^{s_A}$ is the reconstruction loss value of a style code $s_A$ in the $X_A$ domain relative to the original style code after the decoding-encoding operation.
The other bidirectional reconstruction loss terms are constructed in a similar manner: the reconstruction loss value $\mathcal{L}_{recon}^{x_B}$ of an image $x_B$ in the $X_B$ domain relative to the original image after the encoding-decoding operation, the reconstruction loss value $\mathcal{L}_{recon}^{c_B}$ of a content code $c_B$ in the $X_B$ domain relative to the original content code after the decoding-encoding operation, and the reconstruction loss value $\mathcal{L}_{recon}^{s_B}$ of a style code $s_B$ in the $X_B$ domain relative to the original style code after the decoding-encoding operation.
The image reconstruction loss term combines the L1 loss and the MS-SSIM loss to preserve color and brightness as well as contrast in high-frequency regions. For details of the MS-SSIM loss, refer to: "Zhao H, Gallo O, Frosio I, et al. Loss functions for image restoration with neural networks [J]. IEEE Transactions on Computational Imaging, 2016, 3(1): 47-57"; it is not described in detail here.
For convenience, the reconstruction loss (or difference loss) between two images $m$ and $n$ is written below using the following function, as shown in equation (4):
$$R_{Mix}(m, n) = (1-\alpha)\left\|m - n\right\|_1 + \alpha\,\mathcal{L}^{MS\text{-}SSIM}(m, n) \quad (4)$$
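A mixed image loss of this kind can be sketched as follows. Two simplifications are assumptions of this sketch rather than the patent's formulation: a single-scale, whole-image SSIM stands in for MS-SSIM, and the two terms are combined as a convex combination with a hypothetical weight `alpha`. The function names are likewise hypothetical.

```python
import numpy as np

def ssim_global(m, n, data_range=1.0):
    """Single-scale SSIM computed over the whole image (simplified stand-in for MS-SSIM)."""
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mu_m, mu_n = m.mean(), n.mean()
    var_m, var_n = m.var(), n.var()
    cov = ((m - mu_m) * (n - mu_n)).mean()
    luminance = (2 * mu_m * mu_n + c1) / (mu_m ** 2 + mu_n ** 2 + c1)
    structure = (2 * cov + c2) / (var_m + var_n + c2)
    return luminance * structure

def r_mix(m, n, alpha=0.5):
    """Mixed loss: (1 - alpha) * mean absolute error + alpha * (1 - SSIM)."""
    l1 = np.abs(m - n).mean()
    return (1 - alpha) * l1 + alpha * (1.0 - ssim_global(m, n))

rng = np.random.default_rng(0)
img = rng.uniform(0.0, 1.0, size=(16, 16))
loss_same = r_mix(img, img)            # identical images
loss_inverted = r_mix(img, 1.0 - img)  # inverted image
```

The loss vanishes for identical images and grows as pixel values and local structure diverge; a training implementation would use a differentiable multi-scale SSIM in a deep learning framework instead.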
The cycle consistency loss requires that an image translated into the other domain and then translated back to its original domain reconstruct the original image. It is calculated as shown in equation (5):
$$\mathcal{L}_{cyc} = R_{Mix}\left(G_{B\to A}(G_{A\to B}(x_A)),\ x_A\right) + R_{Mix}\left(G_{A\to B}(G_{B\to A}(x_B)),\ x_B\right) \quad (5)$$
$\mathcal{L}_{cyc}$ is the reconstruction loss value, relative to the original images, of images in the $X_A$ and $X_B$ domains after each is translated into the opposite domain and then translated back to its original domain; $G_{A\to B}$ is the generator that translates an image from the $X_A$ domain to the $X_B$ domain and is equivalent to $G_B(c_A, s_B)$, where $G_B$ is the decoder of the $X_B$ domain. The generator $G_{B\to A}$, which translates an image from the $X_B$ domain to the $X_A$ domain, is defined in a similar manner.
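The round-trip structure of the cycle consistency loss can be sketched with toy generators. In this illustration, a plain mean absolute error stands in for the mixed reconstruction function, and the affine "generators" are hypothetical placeholders for the learned translators.

```python
import numpy as np

def l1(m, n):
    """Mean absolute error, used here as a simple stand-in reconstruction loss."""
    return float(np.abs(m - n).mean())

def cycle_loss(x_a, x_b, g_ab, g_ba, recon=l1):
    """Sum of round-trip reconstruction losses A -> B -> A and B -> A -> B."""
    return recon(g_ba(g_ab(x_a)), x_a) + recon(g_ab(g_ba(x_b)), x_b)

# Hypothetical affine "generators": g_ba_exact inverts g_ab, g_ba_poor does not.
g_ab = lambda x: 0.5 * x + 0.4
g_ba_exact = lambda x: (x - 0.4) / 0.5
g_ba_poor = lambda x: x - 0.4

rng = np.random.default_rng(0)
x_a = rng.uniform(0.1, 0.3, size=(8, 8))
x_b = rng.uniform(0.6, 0.9, size=(8, 8))

loss_exact = cycle_loss(x_a, x_b, g_ab, g_ba_exact)
loss_poor = cycle_loss(x_a, x_b, g_ab, g_ba_poor)
```

When the two translators are mutual inverses the loss is zero up to rounding; any mismatch between them shows up as a positive round-trip error.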
The adversarial loss reflects that, in the generative adversarial network framework, the generator and the discriminator gradually strengthen each other during training until they finally reach a steady state. It is constructed as shown in equation (6):
where the loss term is the adversarial loss of the network in the X_B domain, and D_B is the discriminator that tries to distinguish translated images from real images in X_B. The discriminator D_A and the adversarial loss term of the network in the X_A domain are defined in a similar manner.
The total loss of the generative adversarial network is the weighted sum of the bidirectional reconstruction loss, the cycle consistency loss and the adversarial loss, as shown in equation (7):
where λ_cyc, λ_x, λ_c and λ_s are the weights of the corresponding loss functions.
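As an illustration of the weighted sum, the following Python sketch combines already-computed loss values. It assumes the two adversarial terms enter unweighted and that each weight multiplies the sum of its X_A and X_B terms; that grouping, the function name `total_loss` and the example numbers are assumptions, since equation (7) itself is not reproduced in this text.

```python
def total_loss(adv_a, adv_b, cyc,
               rec_x_a, rec_x_b, rec_c_a, rec_c_b, rec_s_a, rec_s_b,
               lambda_cyc, lambda_x, lambda_c, lambda_s):
    """Weighted sum of adversarial, cycle consistency and reconstruction losses."""
    return (adv_a + adv_b
            + lambda_cyc * cyc
            + lambda_x * (rec_x_a + rec_x_b)    # image reconstruction terms
            + lambda_c * (rec_c_a + rec_c_b)    # content code reconstruction terms
            + lambda_s * (rec_s_a + rec_s_b))   # style code reconstruction terms

# Illustrative values only; the weights are not taken from the patent.
loss = total_loss(0.5, 0.5, 0.2,
                  0.1, 0.1, 0.05, 0.05, 0.02, 0.02,
                  lambda_cyc=10.0, lambda_x=10.0, lambda_c=1.0, lambda_s=1.0)
```

Passing the weights explicitly keeps the sketch free of any assumed default values.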
Fig. 4 compares the appearance of some images before and after image translation. From the left column to the right column: the original image before translation (Origin), the semantic map obtained by mapping the semantic labels produced by semantic segmentation to their corresponding colors (Semantics), and the unified scene image obtained after UniGAN translation (Translated). The translated images largely eliminate the scene differences caused by environmental factors such as season and illumination, so that images from different scenes can first be unified to approximately the same standard scene before the subsequent positioning steps are executed, which improves robustness to scene changes.
2. Unified scene visual positioning method based on a generative adversarial network
Step S100: acquire a query image, and perform semantic segmentation on the query image through a semantic segmentation network to obtain a semantic label map.
In this embodiment, a query image to be positioned in the current scene is acquired, and semantic segmentation is performed on it through a trained semantic segmentation network to obtain a semantic label map.
Step S200: concatenate the semantic label map with the query image, translate the result with the generator of a pre-trained generative adversarial network, and take the translated image as a first image.
In this embodiment, the semantic label map is concatenated as a fourth channel with the three RGB channels of the query image to obtain a four-channel query image; the four-channel query image is translated by the generator of the pre-trained generative adversarial network, and the translated image is taken as the first image.
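The channel concatenation can be sketched in Python as follows, assuming NumPy, an H x W x 3 RGB array and an H x W array of integer class labels. The function name `stack_rgbs` and the cast to float32 without any normalization are assumptions; the text does not state how label values are scaled before entering the generator.

```python
import numpy as np

def stack_rgbs(rgb, labels):
    """Append a semantic label map to an RGB image as a fourth channel."""
    rgb = np.asarray(rgb)
    labels = np.asarray(labels)
    if rgb.ndim != 3 or rgb.shape[2] != 3:
        raise ValueError("rgb must have shape (H, W, 3)")
    if labels.shape != rgb.shape[:2]:
        raise ValueError("labels must have shape (H, W)")
    # Add a trailing axis to the labels so both arrays are (H, W, C).
    return np.concatenate(
        [rgb.astype(np.float32), labels[..., None].astype(np.float32)], axis=2
    )

rgb = np.zeros((2, 3, 3), dtype=np.uint8)
labels = np.arange(6).reshape(2, 3)
rgbs = stack_rgbs(rgb, labels)   # four-channel (RGBS) image
```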
Step S300: extract a global descriptor and two-dimensional local features of the first image; match the global descriptor of the first image against the global descriptor of each image in a preset image library to obtain candidate images. The image library is a database that stores the images of the scene corresponding to the query image after semantic segmentation and generator translation.
In this embodiment, the translated image, i.e. the first image, is searched for and matched in the pre-constructed image library to determine candidate images. The specific steps are as follows:
A global descriptor of the translated query image is extracted with NetVLAD. Using the L2 distance between global descriptors as the criterion, the L2 distances between the descriptor of the translated query image and the descriptor of each image in the pre-constructed image library are calculated and sorted, and the library images corresponding to the N smallest L2 distances are taken as candidate images; that is, N candidate images are obtained from the pre-constructed image library for the translated image, where N is a positive integer, preferably set to 10 in the present invention. The image library is the database described above, in which the images of the scene corresponding to the query image (database images) are stored after semantic segmentation and generator translation. For NetVLAD, see: "Relja Arandjelovic, Petr Gronat, Akihiko Torii, Tomas Pajdla, and Josef Sivic, 'NetVLAD: CNN architecture for weakly supervised place recognition,' in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 5297-5307"; it is not described in detail here.
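The retrieval step reduces to a nearest-neighbour search over global descriptors. A minimal Python sketch, assuming the descriptors have already been extracted and are given as equal-length sequences of floats (the function name `top_n_candidates` and the toy two-dimensional descriptors are hypothetical):

```python
import math

def top_n_candidates(query_desc, library_descs, n=10):
    """Return indices of the n library images closest to the query in L2 distance."""
    dists = [(math.dist(query_desc, d), i) for i, d in enumerate(library_descs)]
    dists.sort()                      # ascending distance, ties broken by index
    return [i for _, i in dists[:n]]

# Toy library of four descriptors.
library = [[0.0, 1.0], [1.0, 0.0], [0.6, 0.8], [-1.0, 0.0]]
candidates = top_n_candidates([0.0, 1.0], library, n=2)
```

For a large library, a vectorized or indexed search would normally replace the full sort; the brute-force version is used here only to keep the logic visible.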
In addition, feature point detection and descriptor extraction are performed on the translated image (the first image) with SuperPoint; that is, the two-dimensional local features are extracted. For SuperPoint, see: "Daniel DeTone, Tomasz Malisiewicz, and Andrew Rabinovich, 'SuperPoint: Self-supervised interest point detection and description,' in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2018, pp. 224-236"; it is not described in detail here.
Step S400: acquire the pre-constructed three-dimensional model corresponding to the candidate images; match the two-dimensional local features against the three-dimensional point cloud within the range determined by the candidate images in the three-dimensional model to obtain two-dimensional-three-dimensional matching point pairs.
When the generative adversarial network is trained, a three-dimensional model of the current scene is constructed from the translated images that the generator produces for the database images. The specific construction method is as follows:
extracting the two-dimensional local features of each image in the image library through SuperPoint;
based on the extracted two-dimensional local features, calibrating the camera poses through structure from motion (SfM), generating a sparse point cloud (i.e. the SfM point cloud), and performing sparse reconstruction to obtain a sparse three-dimensional model.
In this embodiment, the three-dimensional model pre-constructed for the scene corresponding to the N candidate images is obtained, and within this model the three-dimensional point cloud in the range determined by the candidate images is matched against the two-dimensional local features to obtain two-dimensional-three-dimensional matching point pairs. Specifically, similar candidate images are grouped into the same "place" according to the three-dimensional poses corresponding to the N candidate images, and the correspondence between two-dimensional points of the query image and three-dimensional points of the point cloud is obtained by matching against the two-dimensional feature points of each candidate "place".
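One way to form such 2D-3D pairs is sketched below in Python with NumPy. It assumes that every 3D point within the candidate range carries a local descriptor inherited from the database feature that triangulated it, and it uses mutual nearest-neighbour matching; both the matching rule and the function name `match_2d_3d` are assumptions, since the text does not specify the matcher.

```python
import numpy as np

def match_2d_3d(query_kpts, query_desc, point_xyz, point_desc):
    """Pair query keypoints with 3D points by mutual nearest-neighbour descriptors."""
    q = np.asarray(query_desc, dtype=float)
    p = np.asarray(point_desc, dtype=float)
    # Pairwise L2 distances, shape (num_query, num_points).
    d = np.linalg.norm(q[:, None, :] - p[None, :, :], axis=2)
    nn_q = d.argmin(axis=1)           # best 3D point for each query feature
    nn_p = d.argmin(axis=0)           # best query feature for each 3D point
    keep = [i for i in range(len(q)) if nn_p[nn_q[i]] == i]
    pts2d = np.asarray(query_kpts, dtype=float)[keep]
    pts3d = np.asarray(point_xyz, dtype=float)[nn_q[keep]]
    return pts2d, pts3d

query_kpts = [[10, 20], [30, 40], [50, 60]]
query_desc = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
point_xyz = [[0, 0, 1], [1, 0, 2]]
point_desc = [[0.9, 0.1], [0.1, 0.9]]
pts2d, pts3d = match_2d_3d(query_kpts, query_desc, point_xyz, point_desc)
```

The third query feature is ambiguous between the two 3D points and is rejected by the mutual check, leaving two 2D-3D pairs for the pose solver.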
Step S500: based on the two-dimensional-three-dimensional matching point pairs, calculate the 6-degree-of-freedom camera pose corresponding to the query image through a PnP-RANSAC framework.
In this embodiment, based on the two-dimensional-three-dimensional matching point pairs, the 6-degree-of-freedom camera pose corresponding to the query image is calculated with a PnP-RANSAC framework (the present invention preferably uses the solvePnPRansac() function provided by OpenCV). The 6-degree-of-freedom (6-DoF) pose consists of the (x, y, z) coordinates and the yaw, pitch and roll angles about the three coordinate axes.
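A PnP solver typically returns a rotation and translation that map world points into the camera frame. The Python sketch below, assuming NumPy, converts such a result into the (x, y, z, yaw, pitch, roll) form described above. The Z-Y-X Euler convention and the function name `pose_6dof` are assumptions; the text does not fix an angle convention.

```python
import math
import numpy as np

def pose_6dof(R_wc, t_wc):
    """Convert x_cam = R_wc @ X_world + t_wc into (x, y, z, yaw, pitch, roll).

    Angles are in degrees and follow R_cw = Rz(yaw) @ Ry(pitch) @ Rx(roll),
    where R_cw is the camera-to-world rotation.
    """
    R = np.asarray(R_wc, dtype=float)
    t = np.asarray(t_wc, dtype=float).reshape(3)
    R_cw = R.T                        # inverse of a rotation is its transpose
    center = -R_cw @ t                # camera position in world coordinates
    yaw = math.atan2(R_cw[1, 0], R_cw[0, 0])
    pitch = math.atan2(-R_cw[2, 0], math.hypot(R_cw[0, 0], R_cw[1, 0]))
    roll = math.atan2(R_cw[2, 1], R_cw[2, 2])
    angles = tuple(math.degrees(a) for a in (yaw, pitch, roll))
    return tuple(float(v) for v in center) + angles

# Camera at (1, 2, 3), rotated 90 degrees about the world z axis.
R_wc = [[0.0, 1.0, 0.0], [-1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
t_wc = [-2.0, 1.0, -3.0]
pose = pose_6dof(R_wc, t_wc)
```

When the solver returns a rotation vector rather than a matrix, as OpenCV's PnP functions do, it would first be converted to a matrix (for example with cv2.Rodrigues) before calling this helper.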
In addition, to demonstrate the effectiveness of the unified scene visual positioning method based on a generative adversarial network, the invention was evaluated on the CMU-Seasons dataset. As described above, the dataset contains urban, suburban and park scenes of the Pittsburgh area of the United States, with images spanning a 1-year period. It is therefore a suitable test bed for the problem studied by the invention. In the evaluation, the park scenes were used to train UniGAN, and the entire dataset was used to evaluate positioning accuracy. The evaluation results are shown in Table 1:
TABLE 1
In Table 1, Foliage denotes scenes with foliage, Mixed foliage denotes scenes with mixed foliage, No foliage denotes scenes without foliage, Urban denotes urban scenes, Suburban denotes suburban scenes, Park denotes park scenes, distance denotes the distance error threshold in meters, and orient denotes the angle error threshold in degrees.
Table 1 shows the recall [%] of three different positioning methods at different distance and angle thresholds. The first method performs no scene unification: it directly uses NetVLAD (NV) for global image retrieval and SuperPoint (SP) for local feature matching to achieve coarse-to-fine hierarchical positioning. The second method uses UniGAN to unify the scenes of the original three-channel (RGB) query images and database images and then performs hierarchical positioning. The third method first performs semantic segmentation on the query images and database images, then uses UniGAN to unify the scenes of the four-channel (RGBS) query images and database images that incorporate the semantic label maps, and then performs hierarchical positioning.
As the data in Table 1 show, in application environments with a long time span, especially at the high-precision threshold (0.25 m, 2°) and in scenes strongly affected by environmental factors, the positioning results of the methods with scene unification are generally better than those of the first method, and the third method, which incorporates semantic information, is generally better than the second.
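Recall at a pair of thresholds is the percentage of query images whose position error and orientation error are both within the thresholds. A minimal Python sketch follows; only the (0.25 m, 2°) threshold is stated in the text, so the other two thresholds, the error values and the function name `recall_percent` are illustrative assumptions.

```python
def recall_percent(errors, thresholds):
    """Percentage of queries within each (max_distance_m, max_angle_deg) pair.

    errors: one (distance_m, angle_deg) tuple per query image.
    """
    results = []
    for max_m, max_deg in thresholds:
        hits = sum(1 for d, a in errors if d <= max_m and a <= max_deg)
        results.append(100.0 * hits / len(errors))
    return results

# Made-up pose errors for four query images.
errors = [(0.1, 1.0), (0.3, 1.5), (0.2, 3.0), (4.0, 8.0)]
thresholds = [(0.25, 2.0), (0.5, 5.0), (5.0, 10.0)]
recalls = recall_percent(errors, thresholds)
```

A query counts as localized only when both errors pass, which is why the third query (small distance error, 3° angle error) fails the strictest threshold.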
A unified scene visual positioning system based on a generative adversarial network according to a second embodiment of the present invention, as shown in fig. 2, includes: a semantic segmentation module 100, a translation module 200, a descriptor matching module 300, a point pair acquisition module 400 and a camera pose calculation module 500;
the semantic segmentation module 100 is configured to acquire a query image, and perform semantic segmentation on the query image through a semantic segmentation network to obtain a semantic label map;
the translation module 200 is configured to concatenate the semantic label map with the query image, translate the result with the generator of a pre-trained generative adversarial network, and take the translated image as a first image;
the descriptor matching module 300 is configured to extract a global descriptor and two-dimensional local features of the first image, and to match the global descriptor of the first image against the global descriptor of each image in a preset image library to obtain candidate images; the image library is a database that stores the images of the scene corresponding to the query image after semantic segmentation and generator translation;
the point pair acquisition module 400 is configured to acquire the pre-constructed three-dimensional model corresponding to the candidate images, and to match the two-dimensional local features against the three-dimensional point cloud within the range determined by the candidate images in the three-dimensional model to obtain two-dimensional-three-dimensional matching point pairs;
the camera pose calculation module 500 is configured to calculate the 6-degree-of-freedom camera pose corresponding to the query image through a PnP-RANSAC framework based on the two-dimensional-three-dimensional matching point pairs;
the generative adversarial network is optimized during training with a bidirectional reconstruction loss, a cycle consistency loss and an adversarial loss; the bidirectional reconstruction loss comprises an L1 loss and an MS-SSIM loss.
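The way the five modules hand data to one another can be sketched as a small Python pipeline. The class name `LocalizationPipeline`, the field names and the stub callables are hypothetical; each field stands in for the real module with the corresponding reference numeral.

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class LocalizationPipeline:
    segment: Callable[[Any], Any]            # module 100: image -> label map
    translate: Callable[[Any, Any], Any]     # module 200: image, labels -> first image
    retrieve: Callable[[Any], Any]           # module 300: -> (candidates, local features)
    match_points: Callable[[Any, Any], Any]  # module 400: -> 2D-3D point pairs
    solve_pose: Callable[[Any], Any]         # module 500: pairs -> 6-DoF pose

    def localize(self, query_image):
        labels = self.segment(query_image)
        first_image = self.translate(query_image, labels)
        candidates, local_features = self.retrieve(first_image)
        pairs = self.match_points(local_features, candidates)
        return self.solve_pose(pairs)

# Stub modules that only record the order in which they are called.
calls = []
pipeline = LocalizationPipeline(
    segment=lambda img: calls.append("S100") or "labels",
    translate=lambda img, labels: calls.append("S200") or "first_image",
    retrieve=lambda first: calls.append("S300") or (["cand"], "local_feats"),
    match_points=lambda feats, cands: calls.append("S400") or [("2d", "3d")],
    solve_pose=lambda pairs: calls.append("S500") or (0.0, 0.0, 0.0, 0.0, 0.0, 0.0),
)
pose = pipeline.localize("query.png")
```

Keeping each module behind a plain callable matches the note that modules may be merged or split in practice without changing the overall flow.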
It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working process and related description of the system described above may refer to the corresponding process in the foregoing method embodiment, and details are not described herein again.
It should be noted that the unified scene visual positioning system based on a generative adversarial network provided in the above embodiment is only illustrated by the division of the above functional modules. In practical applications, the above functions may be allocated to different functional modules as needed; that is, the modules or steps in the embodiment of the present invention may be further decomposed or combined. For example, the modules in the above embodiment may be combined into one module, or may be further split into multiple sub-modules, so as to complete all or part of the functions described above. The names of the modules and steps involved in the embodiments of the present invention are only for distinguishing the modules or steps, and are not to be construed as unduly limiting the present invention.
A storage device according to a third embodiment of the present invention stores a plurality of programs, and the programs are adapted to be loaded and executed by a processor to implement the above unified scene visual positioning method based on a generative adversarial network.
A processing apparatus according to a fourth embodiment of the present invention includes a processor and a storage device; the processor is adapted to execute various programs; the storage device is adapted to store a plurality of programs; the programs are adapted to be loaded and executed by the processor to implement the above unified scene visual positioning method based on a generative adversarial network.
It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes and related descriptions of the storage device and the processing device described above may refer to the corresponding processes in the foregoing method examples, and are not described herein again.
Referring now to FIG. 6, there is illustrated a block diagram of a computer system suitable for use as a server in implementing embodiments of the method, system, and apparatus of the present application. The server in fig. 6 is only an example, and should not bring any limitation to the function and the scope of use of the embodiments of the present application.
As shown in fig. 6, the computer system includes a Central Processing Unit (CPU) 601, which can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 602 or a program loaded from a storage section 608 into a Random Access Memory (RAM) 603. The RAM 603 also stores various programs and data necessary for system operation. The CPU 601, ROM 602, and RAM 603 are connected to each other via a bus 604. An Input/Output (I/O) interface 605 is also connected to the bus 604.
The following components are connected to the I/O interface 605: an input section 606 including a keyboard, a mouse, and the like; an output section 607 including a Cathode Ray Tube (CRT) or Liquid Crystal Display (LCD), a speaker, and the like; a storage section 608 including a hard disk and the like; and a communication section 609 including a network interface card such as a LAN (Local Area Network) card, a modem, or the like. The communication section 609 performs communication processing via a network such as the Internet. A drive 610 is also connected to the I/O interface 605 as needed. A removable medium 611 such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory is mounted on the drive 610 as necessary, so that a computer program read from it is installed in the storage section 608 as necessary.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 609, and/or installed from the removable medium 611. The computer program performs the above-described functions defined in the method of the present application when executed by a Central Processing Unit (CPU) 601. It should be noted that the computer readable medium mentioned above in the present application may be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present application, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. 
In this application, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations for aspects of the present application may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The terms "first," "second," and the like are used for distinguishing between similar elements and not necessarily for describing or implying a particular order or sequence.
The terms "comprises," "comprising," or any other similar term are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.
So far, the technical solutions of the present invention have been described in connection with the preferred embodiments shown in the drawings, but it is easily understood by those skilled in the art that the scope of the present invention is obviously not limited to these specific embodiments. Equivalent changes or substitutions of related technical features can be made by those skilled in the art without departing from the principle of the invention, and the technical scheme after the changes or substitutions can fall into the protection scope of the invention.
Claims (9)
1. A unified scene visual positioning method based on a generative adversarial network, characterized by comprising the following steps:
step S100, acquiring a query image, and performing semantic segmentation on the query image through a semantic segmentation network to obtain a semantic label map;
step S200, concatenating the semantic label map with the query image, translating the result with a generator of a pre-trained generative adversarial network, and taking the translated image as a first image;
step S300, extracting a global descriptor and two-dimensional local features of the first image; matching the global descriptor of the first image against the global descriptor of each image in a preset image library to obtain candidate images; the image library is a database that stores the images of the scene corresponding to the query image after semantic segmentation and generator translation;
step S400, acquiring a pre-constructed three-dimensional model corresponding to the candidate images; matching the two-dimensional local features against the three-dimensional point cloud within the range determined by the candidate images in the three-dimensional model to obtain two-dimensional-three-dimensional matching point pairs;
step S500, based on the two-dimensional-three-dimensional matching point pairs, calculating a 6-degree-of-freedom camera pose corresponding to the query image through a PnP-RANSAC framework;
wherein the generative adversarial network is optimized during training with a bidirectional reconstruction loss, a cycle consistency loss and an adversarial loss; the bidirectional reconstruction loss comprises an L1 loss and an MS-SSIM loss.
2. The unified scene visual positioning method based on a generative adversarial network as claimed in claim 1, wherein the training method of the generative adversarial network is:
acquiring a training sample set; the training sample set comprises a query image and a database image; the database image is an image of the scene corresponding to the query image;
performing semantic segmentation on the query image and the database image respectively through a semantic segmentation network, and concatenating the resulting semantic label maps with the query image and the database image respectively; taking the concatenated query image as a second image and the concatenated database image as a third image;
decomposing the second image and the third image each into a content code and a style code by a generator of the generative adversarial network;
recombining the style code and the content code of the second image with the style code and the content code of the third image and then decoding; and based on the decoded images, obtaining the corresponding loss values through a discriminator of the generative adversarial network, and updating the network parameters.
3. The unified scene visual positioning method based on a generative adversarial network as claimed in claim 2, wherein the loss value of the generative adversarial network is calculated as:
wherein E_A is the encoder of the X_A domain, E_B is the encoder of the X_B domain, G_A is the decoder of the X_A domain, G_B is the decoder of the X_B domain, D_A is the discriminator that tries to distinguish translated images from real images in X_A, and D_B is the discriminator that tries to distinguish translated images from real images in X_B; the two adversarial terms are the adversarial loss of the network in the X_A domain and the adversarial loss of the network in the X_B domain; L_cyc is the reconstruction loss, relative to the original images, obtained when images in the X_A and X_B domains are each translated into the other domain and then translated back to the original domain; the six reconstruction terms are, respectively, the reconstruction loss of an image x_A in the X_A domain relative to the original image after the encode-decode operation, the reconstruction loss of an image x_B in the X_B domain relative to the original image after the encode-decode operation, the reconstruction loss of a content code c_A of the X_A domain relative to the original content code after the decode-encode operation, the reconstruction loss of a content code c_B of the X_B domain relative to the original content code after the decode-encode operation, the reconstruction loss of a style code s_A of the X_A domain relative to the original style code after the decode-encode operation, and the reconstruction loss of a style code s_B of the X_B domain relative to the original style code after the decode-encode operation; λ_cyc, λ_x, λ_c and λ_s are the weights corresponding to the respective loss functions; X_A denotes the scene corresponding to the query image in the generator, and X_B denotes the scene corresponding to the database image in the generator.
4. The method according to claim 1, wherein the two-dimensional local features are extracted by SuperPoint.
5. The method as claimed in claim 4, wherein in step S300, "matching the global descriptor of the first image with the global descriptors of the images in a preset image library to obtain candidate images" is performed by:
extracting global descriptors of the first image and images in a preset image library through a NetVLAD;
respectively calculating the L2 distance between the global descriptor of the first image and the global descriptor of each image in a preset image library;
and taking the images of the image library corresponding to the N minimum L2 distances as candidate images, wherein N is a positive integer.
6. The unified scene visual positioning method based on a generative adversarial network as claimed in claim 1, wherein the three-dimensional model is constructed by:
extracting local features of each image in the image library through SuperPoint;
based on the extracted local features, calibrating the camera poses through structure from motion (SfM) and generating a sparse point cloud;
and constructing a three-dimensional model based on the camera pose and the sparse point cloud.
7. A unified scene visual positioning system based on a generative adversarial network, the system comprising: a semantic segmentation module, a translation module, a descriptor matching module, a point pair acquisition module and a camera pose calculation module;
the semantic segmentation module is configured to acquire a query image and perform semantic segmentation on the query image through a semantic segmentation network to obtain a semantic label map;
the translation module is configured to concatenate the semantic label map with the query image, translate the result with a generator of a pre-trained generative adversarial network, and take the translated image as a first image;
the descriptor matching module is configured to extract a global descriptor and two-dimensional local features of the first image, and to match the global descriptor of the first image against the global descriptor of each image in a preset image library to obtain candidate images; the image library is a database that stores the images of the scene corresponding to the query image after semantic segmentation and generator translation;
the point pair acquisition module is configured to acquire a pre-constructed three-dimensional model corresponding to the candidate images, and to match the two-dimensional local features against the three-dimensional point cloud within the range determined by the candidate images in the three-dimensional model to obtain two-dimensional-three-dimensional matching point pairs;
the camera pose calculation module is configured to calculate a 6-degree-of-freedom camera pose corresponding to the query image through a PnP-RANSAC framework based on the two-dimensional-three-dimensional matching point pairs;
wherein the generative adversarial network is optimized during training with a bidirectional reconstruction loss, a cycle consistency loss and an adversarial loss; the bidirectional reconstruction loss comprises an L1 loss and an MS-SSIM loss.
8. A storage device having stored therein a plurality of programs, wherein the programs are adapted to be loaded and executed by a processor to implement the unified scene visual positioning method based on a generative adversarial network of any of claims 1 to 6.
9. A processing device comprising a processor and a storage device; the processor is adapted to execute various programs; the storage device is adapted to store a plurality of programs; characterized in that the programs are adapted to be loaded and executed by the processor to implement the unified scene visual positioning method based on a generative adversarial network according to any of claims 1 to 6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010517260.1A CN111724443B (en) | 2020-06-09 | 2020-06-09 | Unified scene visual positioning method based on generative confrontation network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111724443A (en) | 2020-09-29 |
CN111724443B (en) | 2022-11-08 |
Family
ID=72567640
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010517260.1A Active CN111724443B (en) | 2020-06-09 | 2020-06-09 | Unified scene visual positioning method based on generative confrontation network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111724443B (en) |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112798020A (en) * | 2020-12-31 | 2021-05-14 | 中汽研(天津)汽车工程研究院有限公司 | System and method for evaluating positioning accuracy of intelligent automobile |
CN113313771A (en) * | 2021-07-19 | 2021-08-27 | 山东捷瑞数字科技股份有限公司 | Omnibearing measuring method for industrial complex equipment |
CN113379646A (en) * | 2021-07-07 | 2021-09-10 | 厦门大学 | Algorithm for performing dense point cloud completion by using generated countermeasure network |
CN113570535A (en) * | 2021-07-30 | 2021-10-29 | 深圳市慧鲤科技有限公司 | Visual positioning method and related device and equipment |
CN113963188A (en) * | 2021-09-16 | 2022-01-21 | 杭州易现先进科技有限公司 | Method, system, device and medium for visual positioning by combining map information |
CN114743013A (en) * | 2022-03-25 | 2022-07-12 | 中国科学院自动化研究所 | Local descriptor generation method, device, electronic equipment and computer program product |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20180373979A1 (en) * | 2017-06-22 | 2018-12-27 | Adobe Systems Incorporated | Image captioning utilizing semantic text modeling and adversarial learning |
CN110363215A (en) * | 2019-05-31 | 2019-10-22 | 中国矿业大学 | Method for converting SAR images into optical images based on a generative adversarial network |
CN111046125A (en) * | 2019-12-16 | 2020-04-21 | 视辰信息科技(上海)有限公司 | Visual positioning method, system and computer readable storage medium |
US20200143079A1 (en) * | 2018-11-07 | 2020-05-07 | Nec Laboratories America, Inc. | Privacy-preserving visual recognition via adversarial learning |
Non-Patent Citations (1)
Title |
---|
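Li Qi: "Research on Optimization of Simulated Prosthetic Vision Images Based on Generative Adversarial Networks", China Master's Theses Full-text Database * |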
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112798020A (en) * | 2020-12-31 | 2021-05-14 | 中汽研(天津)汽车工程研究院有限公司 | System and method for evaluating positioning accuracy of intelligent automobile |
CN113379646A (en) * | 2021-07-07 | 2021-09-10 | 厦门大学 | Algorithm for performing dense point cloud completion by using generated countermeasure network |
CN113379646B (en) * | 2021-07-07 | 2022-06-21 | 厦门大学 | Algorithm for performing dense point cloud completion by using generated countermeasure network |
CN113313771A (en) * | 2021-07-19 | 2021-08-27 | 山东捷瑞数字科技股份有限公司 | Omnibearing measuring method for industrial complex equipment |
CN113313771B (en) * | 2021-07-19 | 2021-10-12 | 山东捷瑞数字科技股份有限公司 | Omnibearing measuring method for industrial complex equipment |
CN113570535A (en) * | 2021-07-30 | 2021-10-29 | 深圳市慧鲤科技有限公司 | Visual positioning method and related device and equipment |
CN113963188A (en) * | 2021-09-16 | 2022-01-21 | 杭州易现先进科技有限公司 | Method, system, device and medium for visual positioning by combining map information |
CN113963188B (en) * | 2021-09-16 | 2024-08-23 | 杭州易现先进科技有限公司 | Method, system, device and medium for combining map information visual positioning |
CN114743013A (en) * | 2022-03-25 | 2022-07-12 | 中国科学院自动化研究所 | Local descriptor generation method, device, electronic equipment and computer program product |
Also Published As
Publication number | Publication date |
---|---|
CN111724443B (en) | 2022-11-08 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111724443B (en) | Unified scene visual positioning method based on generative adversarial network | |
Lin et al. | Line segment extraction for large scale unorganized point clouds | |
Dai et al. | RADANet: Road augmented deformable attention network for road extraction from complex high-resolution remote-sensing images | |
CN110246181B (en) | Anchor point-based attitude estimation model training method, attitude estimation method and system | |
Choi et al. | Depth analogy: Data-driven approach for single image depth estimation using gradient samples | |
Su et al. | DLA-Net: Learning dual local attention features for semantic segmentation of large-scale building facade point clouds | |
CN112990152B (en) | Vehicle weight identification method based on key point detection and local feature alignment | |
Koch et al. | Real estate image analysis: A literature review | |
Li et al. | MF-SRCDNet: Multi-feature fusion super-resolution building change detection framework for multi-sensor high-resolution remote sensing imagery | |
Cheng et al. | Hierarchical visual localization for visually impaired people using multimodal images | |
Liu et al. | Registration of infrared and visible light image based on visual saliency and scale invariant feature transform | |
CN115272599A (en) | Three-dimensional semantic map construction method oriented to city information model | |
CN116844129A (en) | Road side target detection method, system and device for multi-mode feature alignment fusion | |
CN115577768A (en) | Semi-supervised model training method and device | |
US20220155441A1 (en) | Lidar localization using optical flow | |
CN114943747A (en) | Image analysis method and device, video editing method and device, and medium | |
CN117437274A (en) | Monocular image depth estimation method and system | |
CN110135474A (en) | Oblique aerial image matching method and system based on deep learning | |
CN116562234A (en) | Multi-source data fusion voice indoor positioning method and related equipment | |
Jung et al. | Progressive modeling of 3D building rooftops from airborne Lidar and imagery | |
Chen et al. | An improved BIM aided indoor localization method via enhancing cross-domain image retrieval based on deep learning | |
CN114764880A (en) | Multi-component GAN reconstructed remote sensing image scene classification method | |
CN114972937A (en) | Feature point detection and descriptor generation method based on deep learning | |
Han et al. | Scene-unified image translation for visual localization | |
Li et al. | AFENet: An Attention-Focused Feature Enhancement Network for the Efficient Semantic Segmentation of Remote Sensing Images. |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||