
CN114913295A - Visual mapping method, device, storage medium and computer program product - Google Patents

Visual mapping method, device, storage medium and computer program product

Info

Publication number
CN114913295A
CN114913295A
Authority
CN
China
Prior art keywords
data
feature point
image frame
feature points
state estimation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210334446.2A
Other languages
Chinese (zh)
Inventor
肖中阳
韩冰
张涛
黄帅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba China Co Ltd
Original Assignee
Alibaba China Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba China Co Ltd filed Critical Alibaba China Co Ltd
Priority to CN202210334446.2A
Publication of CN114913295A
Current legal status: Pending


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00 Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G06T17/05 Geographic models
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T19/00 Manipulating 3D models or images for computer graphics
    • G06T19/20 Editing of 3D images, e.g. changing shapes or colours, aligning objects or positioning parts

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Geometry (AREA)
  • Computer Graphics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Remote Sensing (AREA)
  • Architecture (AREA)
  • Computer Hardware Design (AREA)
  • General Engineering & Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

The embodiments of the present application provide a visual mapping method, a visual mapping apparatus, a storage medium and a computer program product. The visual mapping method includes the following steps: acquiring image frames shot by a camera and inertial navigation observation data corresponding to the image frames; determining conventional feature points and additional feature points in the image frames; performing online state estimation of the camera pose and of the positions of the conventional feature points in the actual physical space, and establishing online map data according to the result of the online state estimation; performing offline state estimation of the positions of the additional feature points in the actual physical space, and establishing offline map data according to the result of the offline state estimation; and combining the online map data and the offline map data to obtain complete map data. The computational load of online real-time positioning and mapping is thereby reduced, the efficiency of online real-time positioning is ensured, and high accuracy of the complete map data is also ensured.

Description

Visual mapping method, device, storage medium and computer program product
Technical Field
The embodiment of the application relates to the technical field of vision, in particular to a visual mapping method, a visual mapping device, a storage medium and a computer program product.
Background
Positioning technology is a key technology generally needed in the fields of Augmented Reality (AR), automatic driving, robots and the like. Among the positioning technologies, the visual positioning technology is widely used due to its low equipment cost and relatively mature technology.
Among visual positioning technologies, Visual Simultaneous Localization and Mapping (V-SLAM), unlike technologies such as satellite positioning and network positioning, can operate without depending on satellite signals or network communication; it performs state estimation of the position and attitude of a device in an unknown environment and of the positions of other elements in the environment, and obtains a positioning result and an environment map in real time.
In actual working scenarios based on the V-SLAM technology, an image acquisition device such as a camera may fail to be positioned effectively due to abnormal conditions such as occlusion and shaking, and continuous state estimation and positioning then have to be restored by repositioning with the aid of map data. However, in the process of establishing the map data, feature points need to be extracted for map construction: if there are too many feature points, the amount of computation is large, too many resources are consumed, and the efficiency of real-time positioning is affected; if there are too few feature points, the accuracy of the map data is affected, which may lead to inaccurate repositioning.
Disclosure of Invention
Embodiments of the present application provide a method, an apparatus, a storage medium, and a computer program product for visual mapping to at least partially solve the above problems.
According to a first aspect of embodiments of the present application, there is provided a visual mapping method, including: acquiring image frames shot by a camera and inertial navigation observation data corresponding to the image frames; determining a normal feature point and an additional feature point in the image frame; based on observation data of the positions of the conventional feature points in the image frames and inertial navigation observation data corresponding to the image frames, carrying out online state estimation on the positions of the camera pose and the conventional feature points in the actual physical space, and establishing online map data according to the online state estimation result; based on the observation data of the position of the additional feature point in the image frame and the state data of the camera pose obtained by online state estimation, performing offline state estimation on the position of the additional feature point in the actual physical space, and establishing offline map data according to the result of the offline state estimation; and combining the online map data and the offline map data to obtain complete map data.
According to a second aspect of the embodiments of the present application, there is provided a visual mapping apparatus, including: an acquisition module, configured to acquire image frames shot by a camera and inertial navigation observation data corresponding to the image frames; a feature point module, configured to determine conventional feature points and additional feature points in the image frames; an online state estimation module, configured to perform online state estimation of the camera pose and of the positions of the conventional feature points in the actual physical space based on the observation data of the positions of the conventional feature points in the image frames and the inertial navigation observation data corresponding to the image frames, and to establish online map data according to the result of the online state estimation; an offline state estimation module, configured to perform offline state estimation of the positions of the additional feature points in the actual physical space based on the observation data of the positions of the additional feature points in the image frames and the state data of the camera pose obtained by the online state estimation, and to establish offline map data according to the result of the offline state estimation; and a map module, configured to combine the online map data and the offline map data to obtain complete map data.
According to a third aspect of embodiments of the present application, there is provided an electronic apparatus, including: the system comprises a processor, a memory, a communication interface and a communication bus, wherein the processor, the memory and the communication interface complete mutual communication through the communication bus; the memory is used for storing at least one executable instruction, and the executable instruction causes the processor to execute the corresponding operation of the visual mapping method according to the first aspect.
According to a fourth aspect of embodiments of the present application, there is provided a storage medium having stored thereon a computer program which, when executed by a processor, implements the visual mapping method according to the first aspect.
According to the visual mapping method, the visual mapping device, the storage medium and the computer program product, image frames shot by a camera and inertial navigation observation data corresponding to the image frames are obtained; determining a normal feature point and an additional feature point in the image frame; based on observation data of the positions of the conventional feature points in the image frames and inertial navigation observation data corresponding to the image frames, carrying out online state estimation on the positions of the camera pose and the conventional feature points in the actual physical space, and establishing online map data according to the online state estimation result; based on the observation data of the position of the additional feature point in the image frame and the state data of the camera pose obtained by online state estimation, performing offline state estimation on the position of the additional feature point in the actual physical space, and establishing offline map data according to the result of the offline state estimation; and combining the online map data and the offline map data to obtain complete map data. Because the establishment of the online real-time positioning depends on the conventional characteristic points which are not all characteristic points, the calculation amount is reduced, the efficiency of the online real-time positioning is ensured, the complete map data comprises the online map data and the offline map data, the offline map data is established in an offline state based on the additional characteristic points, the calculation power of the online real-time positioning is not occupied, and the high precision of the complete map data is ensured.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments described in the embodiments of the present application, and other drawings can be obtained by those skilled in the art according to the drawings.
FIG. 1 is a flowchart of a visual mapping method according to an embodiment of the present application;
FIG. 2 is a diagram illustrating a conventional feature point according to a first embodiment of the present application;
FIG. 3 is a flowchart of a visual mapping method according to the second embodiment of the present application;
FIG. 4 is a schematic diagram of a feature point according to a second embodiment of the present application;
fig. 5A is a flowchart of a relocation method according to a third embodiment of the present application;
FIG. 5B is a schematic diagram of the feature point extraction, tracking, and mapping process in the embodiment shown in FIG. 5A;
FIG. 5C is a schematic diagram of an instant positioning and conventional feature point mapping in the embodiment shown in FIG. 5A;
FIG. 5D is a schematic diagram of an additional feature point map construction in the embodiment shown in FIG. 5A;
FIG. 6 is a block diagram of a visual mapping apparatus according to a fourth embodiment of the present application;
fig. 7 is a schematic structural diagram of an electronic device according to a fifth embodiment of the present application.
Detailed Description
In order to make those skilled in the art better understand the technical solutions in the embodiments of the present application, the technical solutions in the embodiments of the present application will be described clearly and completely below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, but not all embodiments. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in the present application shall fall within the scope of the protection of the embodiments in the present application.
The following further describes specific implementations of embodiments of the present application with reference to the drawings of the embodiments of the present application.
Example one
Referring to fig. 1, a flowchart of a visual mapping method according to an embodiment of the present application is shown.
The visual mapping method of the embodiment comprises the following steps:
step S101: and acquiring image frames shot by a camera and inertial navigation observation data corresponding to the image frames.
It should be noted that the Inertial navigation observation data refers to data obtained based on an Inertial navigation technology, and exemplarily, the Inertial navigation observation data may include a camera pose determined by using the Inertial navigation technology, and the Inertial navigation observation data may be obtained based on a sensor such as an Inertial Measurement Unit (IMU). The inertial navigation observation data corresponding to the image frame refers to inertial navigation observation data measured in the same time period as the image frame. The time period may be 1 second, 2 seconds, 5 seconds, etc., and the length of the time period may be set in advance. It should be further noted that the number of the image frames may be one or more, correspondingly, the number of the inertial navigation observation data is also one or more, and the image frames and the corresponding inertial navigation observation data may be periodically acquired according to a time sequence. Of course, this is merely an example.
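For illustration only, the following minimal sketch (Python; the container names and the fixed pairing period are assumptions, not part of the patent) groups IMU samples with the image frame captured in the same time period:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class ImuSample:
    t: float        # timestamp in seconds
    accel: tuple    # (ax, ay, az)
    gyro: tuple     # (gx, gy, gz)

@dataclass
class ImageFrame:
    t: float        # capture timestamp in seconds
    image: object   # e.g. a numpy array

def pair_frame_with_imu(frame: ImageFrame, imu_stream: List[ImuSample],
                        period: float = 1.0) -> List[ImuSample]:
    """Return the IMU samples measured in the same time period as the frame.

    'period' is the preset window length (1 s here purely as an example);
    samples with timestamps in (frame.t - period, frame.t] are associated
    with this image frame.
    """
    return [s for s in imu_stream if frame.t - period < s.t <= frame.t]
```

In a periodic acquisition loop, each newly captured frame would be passed through such a pairing step before feature extraction.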
Step S102: regular feature points and additional feature points are determined in the image frame.
It should be noted that, in the present application, the regular feature points are feature points for performing online positioning and online state estimation, and the additional feature points are feature points for performing offline state estimation, and if there are a plurality of image frames, after the regular feature points and the additional feature points are determined in one image frame, the corresponding regular feature points and additional feature points in other image frames can be determined. For example, as shown in fig. 2, fig. 2 is a schematic diagram of a conventional feature point according to a first embodiment of the present application, where the 1 st image frame includes conventional feature points a1, B1, and C1, and corresponding object points in an actual physical space (in a map) are A, B, C, that is, a1, B1, and C1 are image points of A, B, C, respectively, and then the conventional feature points in the 2 nd image frame are image points a2 corresponding to a point a in the 2 nd image frame, B2 corresponding to a point B in the 2 nd image frame, and C2 corresponding to a point C in the 2 nd image frame. Of course, this is merely illustrative. There are various ways to determine the regular feature points and the extra feature points in the image frame, and here, two specific implementations are exemplified:
optionally, in the first example, determining the regular feature points and the additional feature points in the image frame includes: calculating response values of points in the image frame, and determining points whose response values are greater than a preset response threshold as feature points; dividing the image frame into at least two grid regions, determining the first Nc% of the feature points in each grid region, sorted by response value from large to small, as regular feature points, and determining the remaining feature points in the grid region as additional feature points, Nc being a number greater than 0 and less than 100. Illustratively, Nc may be equal to 50, 60, 40, etc. A larger Nc means more regular feature points, more accurate online positioning, but more computation; a smaller Nc means fewer regular feature points, less accurate online positioning, but less computation and higher efficiency. Whatever value Nc takes, the total number of feature points is not affected, so the accuracy of the finally obtained complete map data can still be ensured. A feature point may be a relatively prominent point in the image frame; in some application scenarios, a feature point may be an extreme point. The response value of a feature point may indicate how prominent the feature point is in the image frame: the larger the response value, the more prominent the feature point. After the image frame is divided into at least two grid regions, some regular feature points and some additional feature points are selected in each grid region, so that both the regular feature points and the additional feature points are uniformly distributed over the image frame, which improves the accuracy of online positioning and online state estimation as well as the accuracy of offline state estimation. Optionally, the response value of a point may represent the similarity between the point and the points in its neighborhood: the smaller the similarity (the larger the difference), the larger the response value; the larger the similarity (the smaller the difference), the smaller the response value. Illustratively, the average of the differences between the pixel value of a point and the pixel values of the other points in its neighborhood may be taken as the response value of that point; this is only an illustrative example.
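As an illustration of this grid-based split, the sketch below (Python/numpy; the grid size, the value of Nc, and all function and parameter names are assumptions for illustration, not the patent's prescribed implementation) partitions already-thresholded feature points into regular and additional sets per grid cell:

```python
import numpy as np

def split_feature_points(points, responses, img_w, img_h,
                         grid_rows=4, grid_cols=4, nc_percent=50.0):
    """Split feature points into 'regular' and 'additional' sets per grid cell.

    points    : (N, 2) array of (x, y) pixel coordinates.
    responses : (N,) array of response (score) values, already thresholded.
    Returns two index arrays: regular_idx, additional_idx.
    """
    points = np.asarray(points, dtype=np.float64)
    responses = np.asarray(responses, dtype=np.float64)

    # Assign each point to a grid cell.
    col = np.clip((points[:, 0] / img_w * grid_cols).astype(int), 0, grid_cols - 1)
    row = np.clip((points[:, 1] / img_h * grid_rows).astype(int), 0, grid_rows - 1)
    cell = row * grid_cols + col

    regular, additional = [], []
    for c in np.unique(cell):
        idx = np.where(cell == c)[0]
        # Sort this cell's points by response, largest first.
        idx = idx[np.argsort(-responses[idx])]
        # Top Nc% become regular feature points, the rest additional.
        n_reg = int(np.ceil(len(idx) * nc_percent / 100.0))
        regular.extend(idx[:n_reg])
        additional.extend(idx[n_reg:])
    return np.array(regular, dtype=int), np.array(additional, dtype=int)
```

Selecting per cell rather than over the whole image is what keeps both point sets uniformly distributed across the frame.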
Step S103: and performing online state estimation on the camera pose and the position of the conventional feature point in the actual physical space based on the observation data of the position of the conventional feature point in the image frame and the inertial navigation observation data corresponding to the image frame, and establishing online map data according to the online state estimation result.
The online map data refers to map data determined by online state estimation. Alternatively, the observation data of the position of the regular feature point in the image frame may include the coordinates of the regular feature point in the image frame. Here, the specific example is given to explain how to perform online state estimation and establish online map data, and performing online state estimation on the camera pose and the position of the conventional feature point in the actual physical space based on the observation data of the position of the conventional feature point in the image frame and the inertial navigation observation data corresponding to the image frame, and establishing online map data according to the result of the online state estimation includes:
calculating estimation data of a camera pose and estimation data of the position of the conventional characteristic point in an actual physical space according to observation data of the position of the conventional characteristic point in an image frame and inertial navigation observation data corresponding to the image frame; calculating residual error data based on the estimation data of the camera pose and the estimation data of the positions of the conventional feature points in the actual physical space, and aiming at the same conventional feature point in at least two image frames, taking the estimation data of the position of the conventional feature point in the actual physical space as the state data of the conventional feature point when the sum of the residual error data of the conventional feature point is minimum; and establishing online map data based on the state data of the at least one conventional feature point.
And residual data are calculated, and the state data of the conventional characteristic points are determined according to the residual data, so that the error can be optimized, and the graph building precision is improved. It should be noted that the estimation data of the camera pose can be determined according to the inertial navigation observation data; the estimation data of the position of the regular feature point in the actual physical space may be coordinates of the estimated regular feature point in the actual physical space, and the state data of the regular feature point may be coordinates of the finally determined regular feature point in the actual physical space. It should be noted that, if a plurality of image frames ordered in time are included, the residual data of the conventional feature points may represent an error between the observation of the current image frame and the observation of the historical image frame, which is only an exemplary illustration here. Further optionally, the residual data may include at least one of a residual formed by temporal window marginalization prior constraints, a residual of inertial navigation observation data between two adjacent frames, and a re-projection residual of a conventional feature point. When the residual error data contains a plurality of residual errors, the optimization effect can be improved, so that the state data of the conventional characteristic points is more accurate, namely the positioning of the conventional characteristic points in the actual physical space is more accurate. In addition, further, for the camera pose, the estimated data of the camera pose when the sum of the residual data of at least two conventional feature points is minimized for one image frame may be used as the state data of the camera pose. The camera pose is optimized by using the residual errors, and the accuracy of the camera pose can be improved.
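As a sketch of the reprojection residual of a conventional feature point mentioned above (Python/numpy; the plain pinhole model and the parameter names are assumptions, since the patent does not prescribe a specific residual form):

```python
import numpy as np

def reprojection_residual(R, t, K, point_world, observed_uv):
    """Reprojection residual of one conventional feature point in one frame.

    R, t        : camera rotation (3x3) and translation (3,) mapping world
                  coordinates into the camera frame (the camera pose estimate).
    K           : 3x3 camera intrinsic matrix.
    point_world : estimated 3D position of the feature point (3,).
    observed_uv : observed pixel coordinates of the feature point (2,).
    Returns the 2-vector (projected - observed); its squared norm is the
    residual term that the state estimation seeks to minimize.
    """
    p_cam = R @ point_world + t            # transform into the camera frame
    uv_hom = K @ p_cam                     # project with the pinhole model
    projected = uv_hom[:2] / uv_hom[2]     # normalize homogeneous coordinates
    return projected - np.asarray(observed_uv, dtype=np.float64)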
Step S104: and performing off-line state estimation on the position of the additional feature point in the actual physical space based on the observation data of the position of the additional feature point in the image frame and the state data of the camera pose obtained by on-line state estimation, and establishing off-line map data according to the off-line state estimation result.
The offline map data refers to map data determined by offline state estimation. Alternatively, a specific example is set forth herein to illustrate how to perform offline state estimation and build offline map data. The method comprises the following steps of performing off-line state estimation on the position of an extra feature point in an actual physical space based on observation data of the position of the extra feature point in an image frame and state data of a camera pose obtained by on-line state estimation, and establishing off-line map data according to the off-line state estimation result, wherein the off-line state estimation comprises the following steps:
calculating estimation data of the position of the additional feature point in the actual physical space according to the observation data of the position of the additional feature point in the image frame and the estimation data of the camera pose; calculating residual error data based on the estimation data of the positions of the additional feature points in the actual physical space, and regarding the same additional feature point in at least two image frames, when the sum of the residual error data of the additional feature points is minimum, taking the estimation data of the positions of the additional feature points in the actual physical space as the state data of the additional feature points; offline map data is established based on the state data of the at least one additional feature point. It should be noted that, if a plurality of image frames are included in a time sequence, the residual data of the additional feature points may represent an error between an observation of the current image frame and an observation of the historical image frame, which is only an exemplary illustration here. And the state data of the additional characteristic points are optimized by using the residual errors, so that the positioning accuracy of the additional characteristic points in the map can be improved.
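For illustration, the following sketch (Python/numpy; a linear DLT triangulation is only one possible way to obtain the estimation data of an additional feature point's position, and is an assumption rather than the patent's required method) estimates the point's position from its pixel observations and the camera poses already fixed by the online state estimation:

```python
import numpy as np

def triangulate_point(projection_matrices, observations):
    """Linear (DLT) triangulation of one additional feature point.

    projection_matrices : list of 3x4 matrices P_j = K [R_j | t_j], one per
                          image frame, built from the online pose estimates.
    observations        : list of (u, v) pixel observations of the point,
                          one per frame, in the same order as the matrices.
    Returns the estimated 3D position in the actual physical space.
    """
    rows = []
    for P, (u, v) in zip(projection_matrices, observations):
        rows.append(u * P[2] - P[0])
        rows.append(v * P[2] - P[1])
    A = np.stack(rows)
    # Solution: right singular vector of A with the smallest singular value.
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]
```

Such an estimate can then be refined by minimizing the residual data as described above.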
Step S105: and combining the online map data and the offline map data to obtain complete map data.
With reference to step S103 and step S104, and optionally taking the case where the state data and the estimation data represent the coordinates of a feature point in the actual physical space, the online map data may include the coordinates of the conventional feature points in the actual physical space, that is, the positions of the conventional feature points in the actual physical space; the offline map data may include the coordinates of the additional feature points in the actual physical space, that is, the positions of the additional feature points in the actual physical space. In this way, the online map data and the offline map data are combined to obtain complete map data, that is, map data including both the coordinates of the conventional feature points and the coordinates of the additional feature points.
Optionally, after the complete map data is obtained, visual repositioning may be performed based on the complete map data when the visual positioning becomes abnormal. Exemplarily, when it is determined from the anomaly detection result that the visual positioning is abnormal, feature point extraction is performed on the image frame to be positioned; a candidate image frame is acquired from the complete map data, the feature points in the candidate image frame are matched with the feature points in the image frame to be positioned, and the initial position of the image frame to be positioned is calculated; and, based on the complete map data, reprojection matching of the feature points is performed according to the initial position, and the pose of the image frame to be positioned is solved.
In combination with steps S101-S105, because the establishment of the online real-time positioning depends on the conventional feature points, which are not all feature points, the computation workload is reduced, and the efficiency of the online real-time positioning is ensured, while the complete map data includes the online map data and the offline map data, which are established in an offline state based on the additional feature points, the computation workload of the online real-time positioning is not occupied, and the precision of the complete map data is ensured to be higher.
Example two
Based on the visual mapping method described in the first embodiment, a second embodiment of the present application provides another visual mapping method, which further describes the method described in the first embodiment, as shown in fig. 3, the method includes the following steps:
step S301: and acquiring an image frame shot by the camera and inertial navigation observation data corresponding to the image frame.
It should be noted that the image frames and the corresponding inertial navigation observation data may be acquired periodically, for example, one image frame and the corresponding inertial navigation observation data are acquired every second, and for example, one image frame and the corresponding inertial navigation observation data are acquired every 0.1 second. Inertial navigation observation data can be acquired by using an IMU, which is only exemplary.
Step S302: regular feature points and additional feature points are determined in the image frame.
Specifically, as shown in fig. 4, fig. 4 is a schematic diagram of a feature point according to a second embodiment of the present application. Dividing an image frame into at least two grid areas, then calculating response values of points in the image frame, determining the points with the response values larger than a preset response threshold value as feature points, determining at least two feature points in the image frame, determining the feature points with the response values positioned at Nc% in the front in the descending order in each grid area as conventional feature points, and determining the rest feature points as extra feature points.
After step S302, performing online positioning and online state estimation based on the conventional feature points:
step S303: and performing online state estimation on the position of the camera pose and the position of the conventional feature point in the actual physical space based on the observation data of the position of the conventional feature point in the image frame and the inertial navigation observation data corresponding to the image frame.
The online state estimation of the conventional feature points may also be referred to as tracking of the conventional feature points. A specific example is given here: the camera pose may be estimated using the inertial navigation observation data, i.e., estimation data of the camera pose is obtained; the position (i.e., coordinates) of a conventional feature point in the image frame is determined from the image frame; and the position of the conventional feature point in the actual physical space is then estimated by projection based on geometric constraints, using the camera pose determined from the inertial navigation observation data and the position of the conventional feature point in the image frame, which yields the estimation data of the conventional feature point in the actual physical space. For the same conventional feature point, one set of estimation data can be obtained from each image frame containing that feature point together with the corresponding inertial navigation observation data, and there are errors between these sets of estimation data, so the estimation data that minimizes the error between them may be selected through optimization. In another specific example, the estimation of the camera pose is explained: the camera pose can be determined using the inertial navigation observation data, but different conventional feature points in the same image frame correspond to the same camera pose, while different estimation data of the camera pose cause different errors in the estimation data of the feature points; therefore, the estimation data of the camera pose that minimizes the sum of the residual data of the different conventional feature points should be selected.
In combination with the above description, optionally, the state quantity is estimated by a least squares optimization method in the maximum a posteriori sense. Constructing an optimization problem in a time window with a certain time length, wherein the optimized variables are as follows:
χ = [x_0, x_1, …, x_n, p_0, p_1, …, p_m]

where x_k denotes the states (position, attitude, velocity, etc.) of the image frame in the k-th period, n is the number of image frames in the time window, p_l is the three-dimensional position of the l-th conventional map point, and m is the number of all conventional feature points observed in all image frames within the time window. The optimization problem constructed according to the maximum a posteriori criterion is formula one:

$$\min_{\chi}\left\{\left\|r_p - H_p\chi\right\|^2 + \sum_{k\in\mathcal{B}}\left\|r_{\mathcal{B}}\!\left(\hat{z}_{b_{k+1}}^{b_k},\chi\right)\right\|^2 + \sum_{(l,j)\in\mathcal{C}}\left\|r_{\mathcal{C}}\!\left(\hat{z}_{l}^{c_j},\chi\right)\right\|^2\right\}$$

where min over χ denotes taking the minimum of the residual data with respect to χ; the first term ||r_p - H_p χ||^2 is the residual formed by the time-window marginalization prior constraint, and {r_p, H_p} represents the marginalized prior information; r_B(ẑ_{b_{k+1}}^{b_k}, χ) is the residual formed by the IMU observation data between the k-th frame and the (k+1)-th frame in the time window, where B denotes the data set of all IMU observations; r_C(ẑ_l^{c_j}, χ) is the visual reprojection residual formed by the observation of the l-th conventional feature point in the j-th image frame, where C denotes the set of conventional feature point observations in the current time window. The specific implementation of each term can be realized by those skilled in the art through an appropriate equation or function according to actual needs, and the embodiments of the present application are not limited in this respect. It should be noted that, in the specific implementation equation or function of each term above, the bias of the IMU and the state quantities (i.e., the state estimation results) of the conventional image frames and conventional map points directly take their initial values and are not optimized or updated.
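The structure of this least-squares problem can be sketched in code. The following minimal sketch (Python with SciPy; it keeps only the visual reprojection term of formula one, omits the marginalization prior and the IMU residuals, and all function and parameter names are assumptions for illustration rather than the patent's implementation) jointly refines the window's camera poses and the conventional map point positions:

```python
import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation

def pack(poses, points):
    # poses: (n, 6) array of [rotation vector (3), translation (3)] per frame
    # points: (m, 3) conventional map point positions
    return np.concatenate([poses.ravel(), points.ravel()])

def unpack(x, n, m):
    return x[:6 * n].reshape(n, 6), x[6 * n:].reshape(m, 3)

def residuals(x, n, m, K, observations):
    # observations: list of (frame_idx, point_idx, u, v) for conventional points
    poses, points = unpack(x, n, m)
    res = []
    for j, l, u, v in observations:
        R = Rotation.from_rotvec(poses[int(j), :3]).as_matrix()
        p_cam = R @ points[int(l)] + poses[int(j), 3:]
        uvw = K @ p_cam
        res.append(uvw[0] / uvw[2] - u)   # reprojection error, u component
        res.append(uvw[1] / uvw[2] - v)   # reprojection error, v component
    return np.array(res)

def optimize_window(init_poses, init_points, K, observations):
    # Jointly refine camera poses and conventional map point positions in the window.
    n, m = len(init_poses), len(init_points)
    x0 = pack(np.asarray(init_poses, dtype=float),
              np.asarray(init_points, dtype=float))
    sol = least_squares(residuals, x0, args=(n, m, K, observations))
    return unpack(sol.x, n, m)
```

The returned poses and point positions play the role of the state data that the online map data is built from, under the stated simplifications.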
Step S304: and determining the state data of the camera pose and the state data of the conventional feature points according to the result of the online state estimation, and establishing online map data based on the state data of at least one conventional feature point.
In combination with formula one, when the residual data reaches its minimum, the estimation data of the camera pose is taken as the state data of the camera pose, and the estimation data of the conventional feature points is taken as the state data of the conventional feature points.
After step S302, off-line state estimation is performed based on the extra feature points:
step S305: and performing off-line state estimation on the position of the additional feature point in the actual physical space based on the observation data of the position of the additional feature point in the image frame and the state data of the camera pose obtained by on-line state estimation.
For the off-line state estimation of the extra feature point, which may also be referred to as tracking of the extra feature point, a specific example is enumerated here to illustrate, and according to the state data of the camera pose and the observation data of the extra feature point (the coordinates of the extra feature point in the image frame), projection is performed in the actual physical space based on geometric constraints, so that the estimation data of the extra feature point, that is, the estimated coordinates of the extra feature point in the actual physical space, may be obtained.
Optionally, in one embodiment, a least-squares optimization method in the maximum likelihood sense is used for the solution. For the q-th extra feature point in the system, the optimized variable is the estimation data (i.e., the coordinates of the extra feature point in the actual physical space) p_q. The corresponding optimization problem is formula two:

$$\min_{p_q}\sum_{j\in\mathcal{C}_q}\left\|r_{\mathcal{C}}\!\left(\hat{z}_{q}^{c_j},p_q\right)\right\|^2$$

where r_C(ẑ_q^{c_j}, p_q) is the visual reprojection residual formed by the observation of the q-th extra feature point in the j-th image frame, and C_q denotes the set of image frames in which this extra feature point is observed. Alternatively, the state estimation optimization of an additional feature point may be performed when the number of image frames observing the additional feature point exceeds a preset number, that is, offline state estimation is performed using formula two; this is only an example.
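A minimal sketch of formula two follows (Python with SciPy; the plain pinhole residual and the parameter names are assumptions), refining a single additional feature point while the camera poses from the online state estimation stay fixed:

```python
import numpy as np
from scipy.optimize import least_squares

def refine_extra_point(p0, cam_poses, K, observations):
    """Solve formula two for one additional feature point.

    p0           : initial 3D estimate of the point (e.g. from triangulation).
    cam_poses    : list of (R, t) pairs fixed by the online state estimation.
    observations : list of (u, v) pixel observations, one per pose.
    """
    def residual(p):
        res = []
        for (R, t), (u, v) in zip(cam_poses, observations):
            uvw = K @ (R @ p + t)
            res.extend([uvw[0] / uvw[2] - u, uvw[1] / uvw[2] - v])
        return np.array(res)

    return least_squares(residual, np.asarray(p0, dtype=float)).x
```

Because only p_q is optimized and the poses are held constant, this step can run outside the real-time positioning loop.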
Step S306: determining state data of the additional feature points according to the result of the offline state estimation, and establishing offline map data based on the state data of at least one additional feature point.
It should be noted that, the execution sequence of steps S303 to S304 and steps S305 to S306 are not sequential, and may also be executed by using different threads.
After step S304 and step S306, the method further includes:
step S307: and combining the online map data and the offline map data to obtain complete map data.
In the embodiment, the establishment of the online real-time positioning depends on the conventional feature points, the conventional feature points are not all feature points, the calculation amount is reduced, the online real-time positioning efficiency is ensured, the complete map data comprises the online map data and the offline map data, the offline map data is established in an offline state based on the additional feature points, the calculation power of the online real-time positioning is not occupied, and the high accuracy of the complete map data is ensured.
The visual mapping method of this embodiment may be executed by any suitable electronic device with visual mapping capability, including but not limited to: mobile terminals (such as mobile phones, PADs, etc.), PCs, servers, etc.
EXAMPLE III
The present embodiment is based on the method of the first or second embodiment, and a relocation method based on the method is described, as shown in fig. 5A, the method includes the following steps:
step S100: and constructing a relocation map.
Construction of the relocation map is performed before camera occlusion/fast motion occurs, including estimation of the regular and extra feature point position states. If the tracking of the feature points is stable in the process, the positions of the camera pose and the map points of the conventional feature points can be estimated on line in real time, and meanwhile, the positions of the additional feature points are estimated in an off-line mode, so that a basis is provided for relocation in advance.
In the above process, the process of extracting visual feature points from the normal feature points and the extra feature points is shown in fig. 5B, and includes:
process 1: extracting characteristic points and calculating a descriptor.
In the present embodiment, the visual feature points are classified into two types: regular feature points (e.g., star symbols in fig. 4) and extra feature points (e.g., rectangle symbols in fig. 4). In the same frame of image, there is no intersection between the conventional feature point set and the additional feature point set. In this embodiment, the feature points are uniformly extracted by the same method and are divided into the conventional feature points and the additional feature points, but the specific types of the feature points are not limited. In order to improve the precision of real-time positioning and mapping, the two types of feature points need to be uniformly distributed in the image. For this reason, the present embodiment divides the image frame into a plurality of grids, and extracts a certain proportion of regular feature points and extra feature points in each grid.
In one embodiment, a certain response value (score) threshold is set, and N FAST corner points are extracted from the whole image. According to the response values, the top Nc% of FAST corner points with the largest responses in each grid are used as conventional feature points, and the remaining Ne% of FAST corner points are used as extra feature points.
Process 2: performing conventional feature point tracking and extra feature point tracking based on the conventional feature points and the extra feature points.
After the conventional feature point and the additional feature point are extracted, the descriptor of the feature point is calculated, and the descriptor type is not specified in the embodiment. In one embodiment, an ORB descriptor is used as a local feature for the description. And respectively tracking the conventional characteristic points and the additional characteristic points among frames. The present embodiment does not specify a tracking method. In one embodiment, the LK optical flow method is used for inter-frame tracking. According to the inter-frame association relation of the conventional feature points and in combination with other sensors, the position of the conventional feature points and the real-time pose of the camera can be subjected to real-time state estimation, and a current camera pose and a conventional feature point map are obtained; and estimating the position of the additional feature point in another thread according to the positioning result of the camera and the inter-frame tracking of the additional feature point to obtain an additional feature point map.
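For illustration, a sketch of the descriptor computation and inter-frame tracking mentioned above (Python with OpenCV; ORB descriptors and LK optical flow are the choices named in this embodiment, but the array handling and function names here are assumptions):

```python
import cv2
import numpy as np

def describe_and_track(prev_gray, curr_gray, prev_keypoints):
    """Compute ORB descriptors for the previous frame's feature points and
    track those points into the current frame with LK optical flow.

    prev_keypoints : list of cv2.KeyPoint objects (e.g. FAST corners).
    """
    # ORB descriptors serve as the local features of the extracted corners.
    orb = cv2.ORB_create()
    prev_keypoints, descriptors = orb.compute(prev_gray, prev_keypoints)

    # LK optical flow expects an (N, 1, 2) float32 array of point coordinates.
    prev_pts = np.array([kp.pt for kp in prev_keypoints],
                        dtype=np.float32).reshape(-1, 1, 2)
    curr_pts, status, _err = cv2.calcOpticalFlowPyrLK(
        prev_gray, curr_gray, prev_pts, None)

    tracked = status.ravel() == 1   # points successfully tracked into this frame
    return descriptors, prev_pts[tracked], curr_pts[tracked]
```

The same routine can be applied separately to the conventional and the extra feature point sets, since they are tracked independently.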
Process 3: instant positioning and conventional feature point map construction (instant localization and mapping).
The instant positioning and conventional feature point map construction solve the problem of estimating the real-time pose state of the camera in the world coordinate system and the position states of the conventional feature points in the world coordinate system. Instant positioning and conventional feature point map construction can be summarized as one system state estimation problem. As shown in FIG. 5C, the system observations may include (but are not limited to) the camera observations of the conventional feature points over a period of time, ẑ_l^{c_j} (solid lines, symbol 5), and the observations of the inertial navigation module (IMU), ẑ_{b_{k+1}}^{b_k} (dashed lines, symbol 4); the state quantities may include (but are not limited to) the pose states of the camera over a period of time (symbol 1), the position states of the conventional feature points (symbol 2), the bias state of the inertial navigation module, and so on. It should be noted that, in order to control the power consumption, in instant positioning and conventional feature point mapping the observations do not include the camera observations of the extra feature points (symbol 3), and the state quantities do not include the position states of the extra feature points.
The present embodiment does not limit the method of solving the state estimation problem. In one embodiment, the state quantities are estimated using a least squares optimization in the maximum a posteriori sense. Constructing an optimization problem in a time window with a certain time length, wherein the optimized variables are as follows:
χ = [x_0, x_1, …, x_n, p_0, p_1, …, p_m]

where x_k is the camera pose at time k, n is the number of camera instants in the time window, p_l is the three-dimensional position of the l-th conventional feature point, and m is the number of all conventional feature points observed in all conventional frames within the time window. As mentioned before, the state quantities may also include the camera velocity, the IMU bias, etc., which are not shown in the above equation. The optimization problem constructed according to the maximum a posteriori criterion is:

$$\min_{\chi}\left\{\left\|r_p - H_p\chi\right\|^2 + \sum_{k\in\mathcal{B}}\left\|r_{\mathcal{B}}\!\left(\hat{z}_{b_{k+1}}^{b_k},\chi\right)\right\|^2 + \sum_{(l,j)\in\mathcal{C}}\left\|r_{\mathcal{C}}\!\left(\hat{z}_{l}^{c_j},\chi\right)\right\|^2\right\}$$

where the meaning of each parameter can be found in the description of formula one; for example, the first term is the residual formed by the sliding-time-window marginalization prior constraint, r_B(ẑ_{b_{k+1}}^{b_k}, χ) is the residual formed by the IMU observations between the k-th frame and the (k+1)-th frame in the sliding time window, and r_C(ẑ_l^{c_j}, χ) is the visual reprojection residual constructed from the observation of the l-th conventional feature point in the j-th camera frame.
Process 4: extra feature point map construction based on offline mapping.
Fig. 5D is a diagram illustrating extra feature point map construction. As part of process 3, the camera pose (symbol 7) has already been estimated. Extra feature point map construction solves the problem of estimating the position states of the extra feature points from the camera poses and the camera observations of the extra feature points taken from multiple viewing angles. Reduced to a state estimation problem, the state quantities to be estimated are the positions of the extra feature points (symbol 6), and the observations are the camera observations of the extra feature points, ẑ_q^{c_j} (symbol 8).

The present embodiment does not limit the method of solving the aforementioned state estimation problem. In one embodiment, a least-squares optimization method in the maximum likelihood sense is used for the solution. For the q-th extra feature point in the system, the optimized variable is p_q. The corresponding optimization problem is:

$$\min_{p_q}\sum_{j\in\mathcal{C}_q}\left\|r_{\mathcal{C}}\!\left(\hat{z}_{q}^{c_j},p_q\right)\right\|^2$$

where the meaning of each parameter is as described for formula two; for example, r_C(ẑ_q^{c_j}, p_q) denotes the residual formed by observing the point in the j-th frame image.
It should be noted that, unlike the real-time solution of complex optimization problems in the instant positioning and conventional feature point map construction, the construction of the additional feature point map is not necessarily performed in real time, and the solution of the state estimation optimization problems corresponding to different feature points can be performed in different threads. In one embodiment, when the number of image frames observed by an additional feature point exceeds a certain threshold, the feature point state estimation optimization problem is solved.
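As a sketch of running the additional feature point optimization off the real-time path (Python standard library threading; the queue-based hand-off, the threshold value, and the helper name referenced in the comments are assumptions about one possible arrangement, not the patent's required design):

```python
import threading
import queue

MIN_OBSERVATIONS = 5          # assumed threshold on the number of observing frames
task_queue = queue.Queue()    # feature tracks handed over by the online thread

def offline_mapping_worker(solve_point, offline_map):
    """Consume feature tracks and estimate additional feature point positions.

    solve_point(track) is expected to solve the per-point least-squares
    problem (for instance the refine_extra_point sketch above) and return
    the point's 3D position.
    """
    while True:
        track = task_queue.get()          # blocks until a track is available
        if track is None:                 # sentinel used to stop the worker
            break
        if len(track["observations"]) >= MIN_OBSERVATIONS:
            offline_map[track["point_id"]] = solve_point(track)
        task_queue.task_done()

# Usage sketch: the online (tracking) thread only enqueues finished tracks,
# so the real-time budget of instant positioning is not consumed:
#
#   worker = threading.Thread(target=offline_mapping_worker,
#                             args=(my_solve_point, offline_map), daemon=True)
#   worker.start()
#   task_queue.put({"point_id": q, "observations": [...], "poses": [...]})
```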
Step S200: and carrying out relocation based on the constructed relocation map.
After feature point tracking fails because the camera is occluded or moves quickly, the repositioning process is entered. In this process, feature points are extracted from the repositioning frame. An initial association with the key frames in the map is performed according to a certain method (such as Bag of Words) to obtain candidate matching key frames; the feature points in the candidate key frames are then matched with the feature points in the repositioning frame according to the feature point descriptors and geometric constraints, and a rough position of the current frame is calculated from these feature points; finally, reprojection matching is performed against all other feature points in the map according to the rough position, yielding the final association between the repositioning frame and the map points, and the pose of the repositioning frame is solved. Feature point matching and pose estimation in repositioning can be performed with mainstream algorithms such as ORB-SLAM and VINS-Mono, and details are not repeated here.
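To illustrate the last two stages of this repositioning flow (Python with OpenCV; binary ORB-style descriptors, the matcher configuration, and the RANSAC PnP step are assumptions consistent with, but not dictated by, the description above):

```python
import cv2
import numpy as np

def relocalize(frame_descriptors, frame_points2d,
               map_descriptors, map_points3d, K):
    """Associate a repositioning frame with map points and solve its pose.

    frame_descriptors : binary descriptors of feature points in the frame.
    frame_points2d    : (N, 2) pixel coordinates of those feature points.
    map_descriptors   : descriptors of candidate map points (e.g. from the
                        candidate key frame selected via Bag of Words).
    map_points3d      : (M, 3) positions of those map points in the map.
    """
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = matcher.match(frame_descriptors, map_descriptors)

    obj = np.float32([map_points3d[m.trainIdx] for m in matches])
    img = np.float32([frame_points2d[m.queryIdx] for m in matches])

    # RANSAC PnP yields the pose of the repositioning frame and the inlier
    # set, i.e. the final association between the frame and the map points.
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(obj, img, K, None)
    return ok, rvec, tvec, inliers
```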
With the present embodiment, effective visual repositioning can be performed based on the constructed repositioning map.
example four
With reference to fig. 6, a block diagram of a visual mapping apparatus according to a fourth embodiment of the present application is shown.
The visual image creating apparatus 50 of the present embodiment includes: the acquiring module 501 is configured to acquire an image frame captured by a camera and inertial navigation observation data corresponding to the image frame; a feature point module 502 for determining regular feature points and additional feature points in the image frame; the online state estimation module 503 is configured to perform online state estimation on the camera pose and the position of the conventional feature point in the actual physical space based on observation data of the position of the conventional feature point in the image frame and inertial navigation observation data corresponding to the image frame, and establish online map data according to a result of the online state estimation; an offline state estimation module 504, configured to perform offline state estimation on the position of the additional feature point in the actual physical space based on observation data of the position of the additional feature point in the image frame and state data of a camera pose obtained through online state estimation, and establish offline map data according to a result of the offline state estimation; and a map module 505, configured to combine the online map data and the offline map data to obtain complete map data.
Optionally, the feature point module 502 is configured to calculate a response value of a point in the image frame, and determine a point, of which the response value is greater than a preset response threshold, as a feature point; the image frame is divided into at least two grid regions, the first Nc% of the feature points in the grid regions, in which the response values are sorted from large to small, are determined as regular feature points, and the remaining feature points in the grid regions are determined as additional feature points, Nc being a number greater than 0 and less than 100.
Optionally, the online state estimation module 503 is configured to calculate estimation data of the camera pose and estimation data of the positions of the conventional feature points in the actual physical space according to the observation data of the positions of the conventional feature points in the image frame and the inertial navigation observation data corresponding to the image frame; calculate residual data based on the estimation data of the camera pose and the estimation data of the positions of the conventional feature points in the actual physical space, and, for the same conventional feature point in at least two image frames, take the estimation data of the position of the conventional feature point in the actual physical space as its state data when the sum of the residual data of the conventional feature point is minimized; and establish the online map data based on the state data of at least one conventional feature point.
Optionally, the online state estimation module 503 is configured to take, for one image frame, the estimation data of the camera pose when the sum of the residual data of at least two conventional feature points is minimized as the state data of the camera pose.
Optionally, the residual data includes at least one of a residual formed by temporal window marginalization prior constraints, a residual of inertial navigation observation data between two adjacent frames, and a re-projection residual of a conventional feature point.
Optionally, the offline state estimation module 504 is configured to calculate estimation data of the position of the additional feature point in the actual physical space according to the observation data of the position of the additional feature point in the image frame and the estimation data of the camera pose; calculate residual data based on the estimation data of the positions of the additional feature points in the actual physical space, and, for the same additional feature point in at least two image frames, take the estimation data of the position of the additional feature point in the actual physical space as its state data when the sum of the residual data of the additional feature point is minimized; and establish the offline map data based on the state data of at least one additional feature point.
Optionally, the visual mapping apparatus 50 further includes a repositioning module 506, configured to perform feature point extraction on the image frame to be positioned when it is determined that the visual positioning abnormality occurs according to the abnormality detection result; acquiring a candidate image frame from the complete map data, matching feature points in the candidate image frame with feature points in an image frame to be determined, and calculating the initial position of the image frame to be positioned; based on the complete map data, carrying out re-projection matching on the feature points according to the initial position, and resolving the pose of the image frame to be positioned.
The visual mapping device of this embodiment is used to implement the corresponding visual mapping method in the foregoing method embodiments, and has the beneficial effects of the corresponding method embodiments, which are not described herein again. In addition, the functional implementation of each module in the visual image creating apparatus of this embodiment can refer to the description of the corresponding part in the foregoing method embodiment, and is not repeated here.
EXAMPLE five
With reference to fig. 7, a schematic structural diagram of an electronic device according to a fifth embodiment of the present application is shown in combination with the first embodiment and the third embodiment, and the specific embodiment of the present application does not limit the specific implementation of the electronic device.
As shown in fig. 7, the electronic device may include: a processor (processor) 602, a Communications Interface 604, a memory 606, and a communication bus 608.
Wherein:
processor 602, communication interface 604, and memory 606 communicate with one another via a communication bus 608.
A communication interface 604 for communicating with other electronic devices or servers.
The processor 602 is configured to execute the program 610, and may specifically execute the relevant steps in the above embodiments of the visual mapping method.
In particular, program 610 may include program code comprising computer operating instructions.
The processor 602 may be a CPU, an Application Specific Integrated Circuit (ASIC), or one or more integrated circuits configured to implement the embodiments of the present application. The intelligent device includes one or more processors, which may be processors of the same type, such as one or more CPUs, or processors of different types, such as one or more CPUs and one or more ASICs.
And a memory 606 for storing a program 610. Memory 606 may comprise high-speed RAM memory, and may also include non-volatile memory (non-volatile memory), such as at least one disk memory.
The program 610 may specifically be configured to cause the processor 602 to perform the following operations: acquiring image frames shot by a camera and inertial navigation observation data corresponding to the image frames; determining a normal feature point and an additional feature point in the image frame; based on observation data of the positions of the conventional feature points in the image frames and inertial navigation observation data corresponding to the image frames, carrying out online state estimation on the positions of the camera pose and the conventional feature points in the actual physical space, and establishing online map data according to the online state estimation result; based on the observation data of the position of the additional feature point in the image frame and the state data of the camera pose obtained by online state estimation, performing offline state estimation on the position of the additional feature point in the actual physical space, and establishing offline map data according to the result of the offline state estimation; and combining the online map data and the offline map data to obtain complete map data.
In an alternative embodiment, the program 410 is further configured to cause the processor 402 to calculate a response value of a point in the image frame, and determine a point where the response value is greater than a preset response threshold as the feature point; the image frame is divided into at least two grid regions, the first Nc% of the feature points in the grid regions, in which the response values are sorted from large to small, are determined as regular feature points, and the remaining feature points in the grid regions are determined as additional feature points, Nc being a number greater than 0 and less than 100.
In an alternative embodiment, the program 410 is further configured to cause the processor 402 to calculate estimated data of the camera pose and estimated data of the position of the regular feature point in the actual physical space according to the observation data of the position of the regular feature point in the image frame and the inertial navigation observation data corresponding to the image frame; calculating residual error data based on the estimation data of the camera pose and the estimation data of the positions of the conventional feature points in the actual physical space, and aiming at the same conventional feature point in at least two image frames, when the sum of the residual error data of the conventional feature points is minimum, the estimation data of the positions of the conventional feature points in the actual physical space; on-line map data is established based on estimated data of the position of the at least one conventional feature point in the actual physical space.
In an alternative embodiment, the program 610 is further configured to cause the processor 602 to take, for one image frame, the estimated data of the camera pose when the sum of the residual data of at least two conventional feature points is minimal as the state data of the camera pose.
In an alternative embodiment, the residual data includes at least one of a residual formed by a temporal window marginalization prior constraint, a residual of inertial navigation observation data between two adjacent frames, and a re-projection residual of a conventional feature point.
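As a simplified sketch of the second of these residual types, the Python snippet below shows the position and velocity components of an inertial navigation residual between two adjacent frames i and j using preintegrated deltas; the rotation and bias terms, the marginalization prior, and the exact form used by the embodiment are not shown, and the z-up world frame with gravity g is an assumption.

import numpy as np

def imu_residual(R_i, p_i, v_i, p_j, v_j, delta_p, delta_v, dt,
                 g=np.array([0.0, 0.0, -9.81])):
    # R_i: rotation of frame i; p, v: position and velocity states of frames i and j
    # delta_p, delta_v: preintegrated inertial deltas accumulated between the two frames
    r_p = R_i.T @ (p_j - p_i - v_i * dt - 0.5 * g * dt ** 2) - delta_p
    r_v = R_i.T @ (v_j - v_i - g * dt) - delta_v
    return np.concatenate([r_p, r_v])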
In an alternative embodiment, the program 610 is further configured to cause the processor 602 to calculate estimated data of the positions of the additional feature points in the actual physical space based on the observation data of the positions of the additional feature points in the image frames and the state data of the camera pose; calculate residual data based on the estimated data of the positions of the additional feature points in the actual physical space, and, for the same additional feature point in at least two image frames, take the estimated data of the position of the additional feature point in the actual physical space when the sum of the residual data of the additional feature point is minimal as the state data of the additional feature point; and establish the offline map data based on the state data of at least one additional feature point.
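The sketch below illustrates, under assumptions, how the initial estimated data of one additional feature point could be obtained with the camera pose state data held fixed, using OpenCV's linear triangulation; the residual-sum refinement then proceeds analogously to the online case over the additional feature points only. The embodiment does not prescribe this particular triangulation.

import cv2
import numpy as np

def triangulate_additional_point(K, pose1, pose2, uv1, uv2):
    # pose1, pose2: (R, t) camera pose state data fixed by the online state estimation
    # uv1, uv2: observed pixel positions of the same additional feature point in two frames
    P1 = K @ np.hstack([pose1[0], pose1[1].reshape(3, 1)])   # 3x4 projection matrices
    P2 = K @ np.hstack([pose2[0], pose2[1].reshape(3, 1)])
    X = cv2.triangulatePoints(P1, P2,
                              np.asarray(uv1, dtype=float).reshape(2, 1),
                              np.asarray(uv2, dtype=float).reshape(2, 1))
    return (X[:3] / X[3]).ravel()                            # homogeneous -> 3D position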
In an optional implementation manner, the program 610 is further configured to cause the processor 602 to perform feature point extraction on an image frame to be positioned when it is determined, according to an abnormality detection result, that a visual positioning abnormality has occurred; acquire a candidate image frame from the complete map data, match feature points in the candidate image frame with feature points in the image frame to be positioned, and calculate an initial position of the image frame to be positioned; and, based on the complete map data, perform re-projection matching on the feature points according to the initial position and solve the pose of the image frame to be positioned.
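A hedged sketch of this relocalization flow follows: ORB descriptors, brute-force matching, and OpenCV's PnP-with-RANSAC solver are assumptions of the sketch, and the candidate map-frame layout (the 'descriptors' and per-keypoint 'points3d' fields) is hypothetical; the embodiment does not specify the descriptor, the matching strategy, or the solver.

import cv2
import numpy as np

def relocalize(query_gray, candidate, K):
    # candidate: assumed map entry with 'descriptors' and aligned 3D map points 'points3d'
    orb = cv2.ORB_create()
    keypoints, descriptors = orb.detectAndCompute(query_gray, None)
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = matcher.match(descriptors, candidate['descriptors'])
    pts_2d = np.float32([keypoints[m.queryIdx].pt for m in matches])
    pts_3d = np.float32([candidate['points3d'][m.trainIdx] for m in matches])
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(pts_3d, pts_2d, K, None)
    return (rvec, tvec) if ok else None   # pose of the image frame to be positioned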
For the specific implementation of each step in the program 610, reference may be made to the corresponding descriptions of the corresponding steps in the above embodiment of the visual mapping method, which are not repeated here. It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes of the devices and modules described above may refer to the corresponding process descriptions in the foregoing method embodiments, and are not repeated here.
With the electronic device of this embodiment, online real-time positioning depends only on the conventional feature points rather than on all feature points, so the amount of computation is reduced and the efficiency of online real-time positioning is ensured.
It should be noted that, according to implementation needs, each component/step described in the embodiment of the present application may be divided into more components/steps, and two or more components/steps or partial operations of the components/steps may also be combined into a new component/step to achieve the purpose of the embodiment of the present application.
The above-described methods according to the embodiments of the present application may be implemented in hardware or firmware, or as software or computer code that can be stored in a recording medium such as a CD-ROM, a RAM, a floppy disk, a hard disk, or a magneto-optical disk, or as computer code originally stored in a remote recording medium or a non-transitory machine-readable medium, downloaded through a network, and stored in a local recording medium, so that the methods described herein may be processed as such software stored on a recording medium using a general-purpose computer, a dedicated processor, or programmable or dedicated hardware such as an ASIC or FPGA. It will be appreciated that the computer, processor, microprocessor controller, or programmable hardware includes memory components (e.g., RAM, ROM, flash memory, etc.) that can store or receive software or computer code that, when accessed and executed by the computer, processor, or hardware, implements the visual mapping methods described herein. Further, when a general-purpose computer accesses code for implementing the visual mapping methods illustrated herein, execution of the code transforms the general-purpose computer into a special-purpose computer for performing the visual mapping methods illustrated herein.
Those of ordinary skill in the art will appreciate that the various illustrative elements and method steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the technical solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the embodiments of the present application.
The above embodiments are only used for illustrating the embodiments of the present application, and not for limiting the embodiments of the present application, and those skilled in the relevant art can make various changes and modifications without departing from the spirit and scope of the embodiments of the present application, so that all equivalent technical solutions also belong to the scope of the embodiments of the present application, and the scope of patent protection of the embodiments of the present application should be defined by the claims.

Claims (10)

1. A visual mapping method, comprising:
acquiring an image frame shot by a camera and inertial navigation observation data corresponding to the image frame;
determining regular feature points and additional feature points in the image frame;
based on observation data of the positions of the conventional feature points in the image frames and inertial navigation observation data corresponding to the image frames, carrying out online state estimation on the positions of the camera pose and the conventional feature points in the actual physical space, and establishing online map data according to the online state estimation result;
performing off-line state estimation on the position of the additional feature point in the actual physical space based on the observation data of the position of the additional feature point in the image frame and the state data of the camera pose obtained by on-line state estimation, and establishing off-line map data according to the result of the off-line state estimation;
and combining the online map data and the offline map data to obtain complete map data.
2. The method of claim 1, wherein the determining regular feature points and extra feature points in the image frame comprises:
calculating a response value of a point in the image frame, and determining a point of which the response value is greater than a preset response threshold value as a feature point;
dividing the image frame into at least two grid regions, determining feature points in the grid regions whose response values rank in the top Nc% when sorted from largest to smallest as the regular feature points, and determining the remaining feature points in the grid regions as the additional feature points, Nc being a number greater than 0 and less than 100.
3. The method of claim 1, wherein the performing online state estimation on the camera pose and the position of the regular feature point in the actual physical space based on the observation data of the position of the regular feature point in the image frame and the inertial navigation observation data corresponding to the image frame, and establishing online map data according to the online state estimation result comprises:
calculating estimated data of the camera pose and estimated data of the position of the conventional feature point in an actual physical space according to the observed data of the position of the conventional feature point in the image frame and inertial navigation observed data corresponding to the image frame;
calculating residual data based on the estimated data of the camera pose and the estimated data of the position of the conventional feature point in the actual physical space, and regarding the same conventional feature point in at least two image frames, taking the estimated data of the position of the conventional feature point in the actual physical space as the state data of the conventional feature point when the sum of the residual data of the conventional feature point is minimum;
and establishing the online map data based on the state data of at least one conventional feature point.
4. The method of claim 3, wherein the method further comprises:
and for one image frame, when the sum of residual error data of at least two conventional feature points is minimum, the estimated data of the camera pose is used as the state data of the camera pose.
5. The method of claim 3, wherein the residual data comprises at least one of a residual formed by temporal window marginalization prior constraints, a residual of inertial navigation observation data between two adjacent frames, and a reprojection residual of the regular feature points.
6. The method of claim 1, wherein the performing offline state estimation on the position of the additional feature point in the actual physical space based on the observed data of the position of the additional feature point in the image frame and the state data of the camera pose obtained by online state estimation, and establishing offline map data according to the result of the offline state estimation comprises:
calculating estimated data of the position of the additional feature point in the actual physical space according to the observed data of the position of the additional feature point in the image frame and the state data of the camera pose;
calculating residual data based on the estimation data of the positions of the additional feature points in the actual physical space, and taking the estimation data of the positions of the additional feature points in the actual physical space as the state data of the additional feature points when the sum of the residual data of the additional feature points is minimum for the same additional feature point in at least two image frames;
establishing the offline map data based on state data of at least one of the additional feature points.
7. The method of any of claims 1-6, wherein the method further comprises:
when the abnormal visual positioning is determined to occur according to the abnormal detection result, extracting feature points of the image frame to be positioned;
acquiring a candidate image frame from the complete map data, matching feature points in the candidate image frame with feature points in the image frame to be positioned, and calculating an initial position of the image frame to be positioned;
based on the complete map data, carrying out re-projection matching on the feature points according to the initial position, and calculating the pose of the image frame to be positioned.
8. A visual mapping apparatus, comprising:
the acquisition module is used for acquiring image frames shot by a camera and inertial navigation observation data corresponding to the image frames;
a feature point module to determine conventional feature points and additional feature points in the image frame;
the online state estimation module is used for estimating the online state of the camera pose and the position of the conventional feature point in the actual physical space based on the observation data of the position of the conventional feature point in the image frame and the inertial navigation observation data corresponding to the image frame, and establishing online map data according to the result of online state estimation;
the off-line state estimation module is used for carrying out off-line state estimation on the position of the additional feature point in the actual physical space based on the observation data of the position of the additional feature point in the image frame and the state data of the camera pose obtained by on-line state estimation, and establishing off-line map data according to the result of the off-line state estimation;
and the map module is used for combining the online map data and the offline map data to obtain complete map data.
9. A storage medium having stored thereon a computer program which, when executed by a processor, carries out the method according to any one of claims 1-7.
10. A computer program product which, when executed by a processor, implements the method of any one of claims 1-7.
CN202210334446.2A 2022-03-31 2022-03-31 Visual mapping method, device, storage medium and computer program product Pending CN114913295A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210334446.2A CN114913295A (en) 2022-03-31 2022-03-31 Visual mapping method, device, storage medium and computer program product

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210334446.2A CN114913295A (en) 2022-03-31 2022-03-31 Visual mapping method, device, storage medium and computer program product

Publications (1)

Publication Number Publication Date
CN114913295A true CN114913295A (en) 2022-08-16

Family

ID=82763499

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210334446.2A Pending CN114913295A (en) 2022-03-31 2022-03-31 Visual mapping method, device, storage medium and computer program product

Country Status (1)

Country Link
CN (1) CN114913295A (en)

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017151436A1 (en) * 2016-03-04 2017-09-08 Pcms Holdings,Inc. System and method for aggregating rgb-d data and creating a virtual environment based on physical locations
CN107869989A (en) * 2017-11-06 2018-04-03 东北大学 A kind of localization method and system of the fusion of view-based access control model inertial navigation information
CN109084732A (en) * 2018-06-29 2018-12-25 北京旷视科技有限公司 Positioning and air navigation aid, device and processing equipment
CN109993113A (en) * 2019-03-29 2019-07-09 东北大学 A kind of position and orientation estimation method based on the fusion of RGB-D and IMU information
KR20210058686A (en) * 2019-11-14 2021-05-24 삼성전자주식회사 Device and method of implementing simultaneous localization and mapping
US20200111233A1 (en) * 2019-12-06 2020-04-09 Intel Corporation Adaptive virtual camera for indirect-sparse simultaneous localization and mapping systems
CN113124854A (en) * 2019-12-31 2021-07-16 杭州海康机器人技术有限公司 Visual positioning method, map construction method and map construction device
CN112258600A (en) * 2020-10-19 2021-01-22 浙江大学 Simultaneous positioning and map construction method based on vision and laser radar

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
T. Qin et al., "Relocalization, Global Optimization and Map Merging for Monocular Visual-Inertial SLAM", 13 September 2018 (2018-09-13) *

Similar Documents

Publication Publication Date Title
CN110108258B (en) Monocular vision odometer positioning method
CN114623817B (en) Self-calibration-contained visual inertial odometer method based on key frame sliding window filtering
JP2022539422A (en) METHOD AND APPARATUS FOR CONSTRUCTING SIGNS MAP BASED ON VISUAL SIGNS
CN112560684A (en) Lane line detection method, lane line detection device, electronic apparatus, storage medium, and vehicle
CN115307646B (en) Multi-sensor fusion robot positioning method, system and device
CN117029817A (en) Two-dimensional grid map fusion method and system
WO2023142353A1 (en) Pose prediction method and apparatus
CN115773759A (en) Indoor positioning method, device and equipment of autonomous mobile robot and storage medium
CN115457152A (en) External parameter calibration method and device, electronic equipment and storage medium
CN117392241B (en) Sensor calibration method and device in automatic driving and electronic equipment
CN114913295A (en) Visual mapping method, device, storage medium and computer program product
CN112308917A (en) Vision-based mobile robot positioning method
CN116958452A (en) Three-dimensional reconstruction method and system
CN111738085A (en) System construction method and device for realizing automatic driving and simultaneously positioning and mapping
CN114612544B (en) Image processing method, device, equipment and storage medium
CN113628284B (en) Pose calibration data set generation method, device and system, electronic equipment and medium
CN115409903A (en) Camera calibration method, camera calibration equipment and computer storage medium
CN115937002A (en) Method, apparatus, electronic device and storage medium for estimating video rotation
CN114241011A (en) Target detection method, device, equipment and storage medium
CN112507964A (en) Detection method and device for lane-level event, road side equipment and cloud control platform
CN111829552A (en) Error correction method and device for visual inertial system
CN117289686B (en) Parameter calibration method and device, electronic equipment and storage medium
CN117346782B (en) Positioning optimization method, device, electronic equipment and storage medium
CN117611762B (en) Multi-level map construction method, system and electronic equipment
CN115128655B (en) Positioning method and device for automatic driving vehicle, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination