
CN117036447A - Indoor scene dense three-dimensional reconstruction method and device based on multi-sensor fusion - Google Patents

Indoor scene dense three-dimensional reconstruction method and device based on multi-sensor fusion

Info

Publication number
CN117036447A
CN117036447A (application CN202310964130.6A)
Authority
CN
China
Prior art keywords
data
odometer
imu
visual
radar
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310964130.6A
Other languages
Chinese (zh)
Inventor
詹文强
徐庶
宫成业
刘庆杰
毛翔
吴婕
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanhu Research Institute Of Electronic Technology Of China
Original Assignee
Nanhu Research Institute Of Electronic Technology Of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanhu Research Institute Of Electronic Technology Of China filed Critical Nanhu Research Institute Of Electronic Technology Of China
Priority to CN202310964130.6A priority Critical patent/CN117036447A/en
Publication of CN117036447A publication Critical patent/CN117036447A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/50 Depth or shape recovery
    • G06T7/55 Depth or shape recovery from multiple images
    • G06T7/593 Depth or shape recovery from multiple images from stereo images
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00 Three dimensional [3D] modelling, e.g. data description of 3D objects

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computer Graphics (AREA)
  • Geometry (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses an indoor scene dense three-dimensional reconstruction method and device based on multi-sensor fusion. The method comprises: obtaining an IMU odometer from IMU data; correcting radar data with the IMU data and obtaining a radar odometer from adjacent corrected radar data; taking the IMU odometer as an initialization parameter, obtaining a visual initial pose from visual data, and introducing a historical global three-dimensional map to optimize the visual initial pose to obtain a visual odometer; and extracting a dense planar point cloud from the radar data and the visual data, transforming the radar point cloud and the dense planar point cloud to obtain a three-dimensional model, determining the pixel of each point of the three-dimensional model in the visual data based on the matched radar odometer and visual odometer, and mapping the corresponding pixel values onto the three-dimensional model to obtain a new global three-dimensional map. The invention uses visual data, IMU data and radar data to stably and effectively build a high-precision global three-dimensional map of an indoor scene.

Description

Indoor scene dense three-dimensional reconstruction method and device based on multi-sensor fusion
Technical Field
The invention belongs to the technical field of three-dimensional reconstruction, and particularly relates to an indoor scene dense three-dimensional reconstruction method and device based on multi-sensor fusion.
Background
In intelligent control applications such as unmanned aerial vehicle navigation, autonomous driving and robot positioning, three-dimensional reconstruction and environment perception are key enabling technologies. Existing three-dimensional reconstruction and environment perception methods typically rely on a single type of sensor, such as a vision camera, an IMU (inertial measurement unit) or a lidar. However, a single type of sensor usually has limitations: the performance of a vision camera degrades in low-light environments, an IMU accumulates error over long periods of operation, and a lidar is affected by object reflectivity.
In the prior art, for example, Chinese patent document CN202211305636.8 discloses a three-dimensional scene reconstruction method, apparatus, device and storage medium. The method obtains an image sequence of a target scene, the image sequence comprising a plurality of consecutive image groups; generates an optical-flow estimation image corresponding to a first image based on the first image and a second image in the current image group; generates an optical-flow estimation image corresponding to the second image based on the second image and a third image in the current image group; and performs scene construction based on the first image, the second image and the two optical-flow estimation images to obtain a constructed scene corresponding to the current image group. The method builds on conventional three-dimensional reconstruction and uses an optical-flow algorithm to remove moving objects from the scene, but because vision is strongly affected by the environment, its application scenarios are limited.
In the prior art, for example, Chinese patent document CN202211315512.8 discloses a three-dimensional scene reconstruction method, apparatus, device and storage medium. The method determines a terrain layer from j target images; performs semantic recognition and pixel depth estimation on at least one target image to obtain semantic objects; takes a semantic object satisfying a first preset condition as a first semantic object and determines its vector outline on the terrain layer from the camera parameters; takes a semantic object satisfying a second preset condition as a second semantic object and determines its modeling position on the terrain layer from the camera parameters; and performs three-dimensional reconstruction from the vector outline, the modeling position and the terrain layer to obtain a three-dimensional reconstruction model of the outdoor scene. The reconstruction efficiency for outdoor scenes is improved by automatically generating the terrain layer, the first semantic object and the second semantic object, and automatically extracting the vector outline of the first semantic object and the modeling position of the second semantic object on the terrain layer. However, the method is mainly intended for outdoor scenes and determines the topographic structure of the environment from image contour information; since the features it relies on are not very general, it is difficult to apply to different scenes.
The prior art, for example Chinese patent document CN202211599959.2, also discloses a three-dimensional scene reconstruction, online home-decoration and commodity acquisition method, device and medium. Based on the image corresponding to a first space object, the method detects 2D visual information such as boundary lines and vanishing points in the image, and, combining the constraint relations between the boundary lines and between the vanishing points, obtains the 3D structure information corresponding to the first space, i.e. a target three-dimensional model. The target three-dimensional model can reflect the real scene information corresponding to the first space object, so applications built on it, such as home-decoration matching, can be improved. Because the method mainly targets indoor building environments and relies on geometric structure information for reconstruction, it is strongly scene-specific and difficult to use in different environments.
Disclosure of Invention
The invention aims to provide an indoor scene dense three-dimensional reconstruction method based on multi-sensor fusion, which uses visual data, IMU data and radar data to stably and effectively build a high-precision global three-dimensional map of an indoor scene.
In order to achieve the above purpose, the technical scheme adopted by the invention is as follows:
an indoor scene dense three-dimensional reconstruction method based on multi-sensor fusion, which comprises the following steps:
collecting radar data, IMU data and vision data;
obtaining an IMU odometer according to the IMU data;
correcting radar data by using the IMU data, and obtaining a radar odometer according to the adjacent corrected radar data;
taking the IMU odometer as an initialization parameter, obtaining a visual initial pose according to visual data, and introducing a historical global three-dimensional map to optimize the visual initial pose to obtain the visual odometer;
and extracting a dense planar point cloud from the radar data and the visual data, transforming the radar point cloud and the dense planar point cloud to obtain a three-dimensional model, determining the pixel of each point of the three-dimensional model in the visual data based on the matched radar odometer and visual odometer, and mapping the corresponding pixel values onto the three-dimensional model to obtain a new global three-dimensional map.
Several alternatives are provided below. They are not additional limitations on the overall scheme described above, but only further refinements or preferences; each alternative may be combined individually with the overall scheme, or several alternatives may be combined with one another, provided there is no technical or logical contradiction.
Preferably, the obtaining the IMU odometer according to the IMU data includes:
and obtaining the IMU milestones at preset moments by using a pre-integration algorithm based on the IMU milestones corresponding to the radar milestones of the previous frame or the IMU milestones corresponding to the vision milestones of the previous frame.
Preferably, the method for calculating the IMU odometer corresponding to the radar odometer of the previous frame or the IMU odometer corresponding to the visual odometer of the previous frame includes:
fusing the radar odometer of the previous frame with the IMU odometer at the corresponding moment obtained through the pre-integration algorithm to obtain the IMU odometer corresponding to the radar odometer of the previous frame;
or fusing the visual odometer of the previous frame with the IMU odometer at the corresponding moment obtained through the pre-integration algorithm to obtain the IMU odometer corresponding to the visual odometer of the previous frame.
Preferably, the correcting the radar data using the IMU data includes:
and carrying out time integration on each point in the radar data according to the IMU data to obtain the relative pose change from each point to the ending time of the radar data, and projecting each point in the radar data to the coordinate system of the ending time based on the relative pose change to finish the correction of the radar data.
Preferably, the obtaining a radar odometer according to the adjacent corrected radar data includes:
extracting characteristics of the corrected radar data;
matching the characteristics of the radar data of the current frame and the radar data of the previous frame, and taking the successfully matched points;
and executing a point-to-plane residual calculation algorithm based on the point successfully matched in the current frame and the plane characteristics extracted from the radar data in the previous frame, taking the calculation result of the point-to-plane residual calculation algorithm as an observation value, taking the IMU odometer as a predicted value, and performing iterative calculation by using an iterative Kalman filtering algorithm until convergence to obtain a final optimized pose result as a radar odometer.
Preferably, the obtaining the visual initial pose according to the visual data by using the IMU odometer as an initialization parameter includes:
acquiring a current frame image and a previous frame image, and extracting characteristic points of the two frames of images;
searching corresponding feature points in the current frame image according to the positions of the tracked feature points in the previous frame image;
matching the feature points of the previous frame with the feature points searched in the current frame image, taking the successfully matched feature points and updating their position information;
and taking the IMU odometer as an initial value, and solving the PnP problem through a BA optimization algorithm according to the successfully matched feature points to obtain the visual initial pose.
Preferably, the optimizing the visual initial pose by introducing the historical global three-dimensional map to obtain the visual odometer comprises the following steps:
acquiring the color information corresponding to the visual initial pose in the current frame image;
acquiring the color information at the corresponding position of the historical global three-dimensional map according to the visual initial pose;
and obtaining the visual odometer by minimizing the photometric error between the two acquired pieces of color information.
Preferably, the extracting a dense planar point cloud from the radar data and the visual data includes:
dividing the regions belonging to the same plane in the visual data into the same category by using an image semantic segmentation algorithm;
finding the points belonging to the same plane in the radar data by using a point cloud plane detection algorithm, and taking the corresponding pixel values of these points in the image data of the corresponding category;
for points belonging to the same plane in radar data, using an interpolation algorithm to convert the sparse planar point cloud into a dense planar point cloud.
Preferably, the transforming the radar point cloud and the dense planar point cloud to obtain a three-dimensional model includes: transforming all points of the radar point cloud and the dense planar point cloud into a global coordinate system to obtain the three-dimensional model.
Compared with the prior art, the indoor scene dense three-dimensional reconstruction method based on multi-sensor fusion has the following beneficial effects:
(1) By fusing visual data, IMU data and radar data, environment perception with higher accuracy and robustness is realized. Compared with methods that use only a single type of sensor, the invention can more effectively address challenges in different environments and conditions.
(2) A planar point cloud extraction scheme is used, so that a dense three-dimensional model can be reconstructed without relying on the sparse lidar point cloud alone; and a unified three-dimensional map is used, which ensures the consistency of the global three-dimensional map. In large-scale scenes the invention can reduce accumulated error and improve the accuracy of the reconstruction result; it also improves data utilization, reconstructing a dense three-dimensional scene model from less acquired data than existing algorithms.
(3) According to the method, the three-dimensional model is generated from the planar point cloud and the texture is mapped onto the model, so that the three-dimensional reconstruction result carries richer visual information, improving user experience and practical application value.
The second object of the present invention is to provide an indoor scene dense three-dimensional reconstruction device based on multi-sensor fusion, which comprises a processor and a memory storing a plurality of computer instructions, wherein the computer instructions, when executed by the processor, implement the steps of the indoor scene dense three-dimensional reconstruction method based on multi-sensor fusion.
Drawings
Fig. 1 is a flowchart of an indoor scene dense three-dimensional reconstruction method based on multi-sensor fusion.
Detailed Description
The following describes the embodiments of the present invention clearly and completely with reference to the accompanying drawings. It is apparent that the described embodiments are only some, not all, embodiments of the present invention. All other embodiments obtained by those skilled in the art based on the embodiments of the invention without inventive effort fall within the scope of the invention.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used herein in the description of the invention is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention.
In order to overcome the problems in the prior art, the invention provides a three-dimensional reconstruction method based on visual data, IMU data and radar data, which achieves three-dimensional scene reconstruction with higher precision and robustness. The point cloud of an ordinary lidar is sparse, while high-resolution lidars are expensive. To improve the accuracy and consistency of three-dimensional reconstruction with an ordinary lidar, the invention proposes a three-dimensional reconstruction algorithm based on plane detection, which achieves high reconstruction accuracy from a small amount of data.
As shown in fig. 1, the core idea of the method provided by the invention is to realize environment sensing with higher precision and robustness by directly and indirectly fusing the data of three sensors. Specifically, the invention comprises the following steps:
(1) Sensor calibration: the vision camera-IMU and lidar-IMU pairs are calibrated respectively, and the extrinsic matrices of the vision camera and the lidar relative to the IMU are computed with the IMU coordinate system as the reference.
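To make the use of the calibration concrete, the sketch below (a minimal Python illustration with placeholder extrinsics, not taken from the patent) shows how a point measured in the lidar frame is carried into the camera frame by composing the two extrinsic matrices referenced to the IMU coordinate system:

```python
import numpy as np

def make_T(R, t):
    """Build a 4x4 homogeneous transform from rotation R (3x3) and translation t (3,)."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

# Hypothetical calibration results: lidar->IMU and camera->IMU extrinsics.
T_imu_lidar = make_T(np.eye(3), np.array([0.10, 0.0, 0.05]))   # lidar expressed in the IMU frame
T_imu_cam   = make_T(np.eye(3), np.array([0.05, 0.02, 0.0]))   # camera expressed in the IMU frame

# A lidar point can then be expressed in the camera frame via the IMU frame.
T_cam_lidar = np.linalg.inv(T_imu_cam) @ T_imu_lidar
p_lidar = np.array([1.0, 0.5, 0.2, 1.0])                       # homogeneous lidar point
p_cam = T_cam_lidar @ p_lidar
print(p_cam[:3])
```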
(2) Visual data acquisition: image data of the environment is acquired by a stereoscopic camera.
(3) IMU data acquisition: motion sensor data including acceleration, angular velocity, magnetic field strength, and the like is acquired by the IMU.
(4) Radar data acquisition: and acquiring point cloud data of the environment, including three-dimensional coordinates and reflectivity information, through a laser radar.
(5) Odometer estimation: lidar-based, vision-based and IMU-based odometer outputs are obtained by processing the radar data, the visual data and the IMU data respectively, and three-dimensional reconstruction of the target is carried out based on these multiple odometers, so that the obtained map has higher accuracy and stability.
(5.1) for IMU odometry: and obtaining the IMU odometer according to the IMU data.
Forward propagation: state estimates at different moments are obtained with a pre-integration algorithm. Specifically, the IMU odometer at each preset moment is obtained by the pre-integration algorithm, starting from the IMU odometer corresponding to the radar odometer of the previous frame or the IMU odometer corresponding to the visual odometer of the previous frame. "Each moment" here means each particular moment required in the subsequent calculations.
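The following is a minimal sketch of such forward propagation, assuming a standard first-order discrete integration of gyroscope and accelerometer samples between two odometer moments; the state layout and sample values are illustrative, not taken from the patent:

```python
import numpy as np

def skew(w):
    return np.array([[0, -w[2], w[1]],
                     [w[2], 0, -w[0]],
                     [-w[1], w[0], 0]])

def propagate_imu(R, p, v, gyro, accel, dt, g=np.array([0.0, 0.0, -9.81])):
    """One forward-propagation step: rotation R, position p, velocity v in the world frame."""
    R_new = R @ (np.eye(3) + skew(gyro * dt))          # first-order rotation update
    a_world = R @ accel + g                            # remove gravity in the world frame
    p_new = p + v * dt + 0.5 * a_world * dt ** 2
    v_new = v + a_world * dt
    return R_new, p_new, v_new

# Start from the IMU odometer of the previous radar/visual frame and integrate
# all IMU samples up to the required moment.
R, p, v = np.eye(3), np.zeros(3), np.zeros(3)
imu_samples = [(np.array([0.0, 0.0, 0.01]), np.array([0.1, 0.0, 9.81]), 0.005)] * 20
for gyro, accel, dt in imu_samples:
    R, p, v = propagate_imu(R, p, v, gyro, accel, dt)
print(p, v)
```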
Accuracy improvement: the state estimation results of the visual odometer and the radar odometer are fused to improve the accuracy of the IMU state estimate. Specifically, the radar odometer of the previous frame is fused with the IMU odometer at the corresponding moment obtained through the pre-integration algorithm to obtain the IMU odometer corresponding to that radar odometer; or the visual odometer of the previous frame is fused with the IMU odometer at the corresponding moment obtained through the pre-integration algorithm to obtain the IMU odometer corresponding to that visual odometer.
This embodiment optimizes the IMU odometer of each frame to improve the accuracy of the state estimate obtained by forward propagation; the fusion algorithm may be, for example, a Kalman filter, or another fusion algorithm. In the IMU odometer at the corresponding moment obtained through the pre-integration algorithm, the pre-integration is executed from the data of the previous frame, and the corresponding moment is the moment of the radar odometer of the previous frame or the moment of the visual odometer of the previous frame.
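As one hedged illustration of the fusion, the sketch below uses a plain linear Kalman update in which the pre-integrated IMU pose is the prediction and the previous-frame radar (or visual) odometer is the measurement; the pose is treated as a flat 6-vector and the covariances are placeholders, so rotation-manifold details are deliberately glossed over:

```python
import numpy as np

def kalman_pose_update(x_pred, P_pred, z_meas, R_meas):
    """Fuse a predicted pose (from IMU pre-integration) with an odometer measurement.
    x_pred, z_meas: 6-vectors [x, y, z, roll, pitch, yaw]; P_pred, R_meas: 6x6 covariances."""
    H = np.eye(6)                                   # the odometer observes the pose directly
    S = H @ P_pred @ H.T + R_meas
    K = P_pred @ H.T @ np.linalg.inv(S)             # Kalman gain
    x_upd = x_pred + K @ (z_meas - H @ x_pred)
    P_upd = (np.eye(6) - K @ H) @ P_pred
    return x_upd, P_upd

x_imu = np.array([1.00, 0.50, 0.00, 0.0, 0.0, 0.10])   # pre-integrated IMU pose (prediction)
z_radar = np.array([1.02, 0.48, 0.01, 0.0, 0.0, 0.09]) # previous-frame radar odometer (measurement)
x, P = kalman_pose_update(x_imu, np.eye(6) * 0.05, z_radar, np.eye(6) * 0.01)
print(x)
```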
(5.2) for radar odometry: and correcting the radar data by using the IMU data, and obtaining the radar odometer according to the adjacent corrected radar data.
Data preprocessing: the raw point cloud collected by the lidar is preprocessed to remove noise and unwanted data. Preprocessing may include filtering and removing points that are too far or too near.
Motion compensation correction: in a back-propagation step, each point in the radar data is time-integrated according to the IMU data to obtain the relative pose change from its sampling time to the end time of the current radar frame, and each point is projected into the coordinate system at the end time based on this relative pose change, completing the correction of the radar data. With the estimated relative pose change, every point can be projected to the scan end time according to its exact sampling time within the scan, which improves the quality of the lidar point cloud over one scan period.
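A simplified de-skew sketch under a constant-velocity assumption over one scan is shown below; in the patent the per-point relative pose comes from integrating the IMU data, whereas here it is approximated from a single linear velocity and yaw rate, and the sign of the correction depends on the chosen pose convention:

```python
import numpy as np

def deskew_scan(points, timestamps, t_end, v_lin, w_yaw):
    """Project every point to the scan-end frame assuming constant linear velocity v_lin (3,)
    and constant yaw rate w_yaw over the scan. points: (N,3), timestamps: (N,)."""
    corrected = np.empty_like(points)
    for i, (p, t) in enumerate(zip(points, timestamps)):
        dt = t_end - t                               # remaining motion from this point's time to scan end
        yaw = w_yaw * dt
        c, s = np.cos(yaw), np.sin(yaw)
        R = np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])   # relative rotation (yaw only, for brevity)
        corrected[i] = R @ p + v_lin * dt            # move the point into the end-of-scan frame
    return corrected

pts = np.random.rand(100, 3) * 10.0
ts = np.linspace(0.0, 0.1, 100)                      # a 100 ms scan
deskewed = deskew_scan(pts, ts, t_end=0.1, v_lin=np.array([1.0, 0.0, 0.0]), w_yaw=0.2)
print(deskewed.shape)
```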
Feature extraction: features are extracted from the lidar data. For 3D point cloud data, common features include planes, corner points, edges, etc. The feature extraction method comprises line segment extraction (such as line segment matching, split-and-Merge algorithm), plane extraction (such as RANSAC algorithm), key point extraction (such as ISS (Intrinsic Shape Signatures) algorithm and Harris 3D algorithm), etc.
Feature matching: and matching the characteristics in the radar data of the current frame and the previous frame. Descriptors (e.g., PFH, FPFH, SHOT, etc.) may be used to describe features and used for feature matching. In order to improve the accuracy of matching, a robust algorithm such as RANSAC can be adopted to remove wrong matching pairs, so that a successfully matched point is obtained.
State estimation: based on the point successfully matched in the current frame and the plane feature extracted from the radar data of the previous frame, a point-to-plane residual calculation algorithm is executed (in the point-to-plane residual calculation algorithm, a point-to-plane residual is defined first, then an LM algorithm is used for minimizing a residual vector to obtain a calculation result, the calculation process is a conventional technology, details are not needed in the embodiment), the calculation result of the point-to-plane residual calculation algorithm is used as an observation value, an IMU odometer is used as a prediction value, and an iterative Kalman filtering algorithm is used for iterative calculation until convergence, so that the finally optimized pose result is obtained as a radar odometer.
(5.3) for visual odometer: and taking the IMU odometer as an initialization parameter, obtaining a visual initial pose according to visual data, and introducing a historical global three-dimensional map to optimize the visual initial pose to obtain the visual odometer.
Initializing: and calibrating the vision camera in advance to obtain the camera internal reference matrix. Thereafter, the visual odometer is initialized, two frames of images are captured from the visual camera, and the images are prepared by some preprocessing operation (e.g., noise reduction, de-distortion).
Feature extraction: in each frame of image, feature points (such as corner points, edges, etc.) are extracted. Various feature extraction algorithms may be used, such as SIFT algorithm, SURF algorithm, ORB algorithm, FAST algorithm, etc.
Searching the characteristic points: in the current frame image, corresponding feature points are searched according to the positions of the tracked feature points in the previous frame image. Common feature point searching methods include template matching, kalman filtering, and the like.
Feature point matching: and matching the characteristic point of the previous frame with the characteristic point searched in the current frame image. The feature descriptors can be used to calculate the similarity between feature points and select the best match. In order to improve the accuracy of matching, a robust algorithm such as RANSAC can be adopted to reject false matching.
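A common OpenCV-based realization of the extraction, search and matching steps just described might look as follows; ORB with a brute-force Hamming matcher is one possible choice among the algorithms listed above, and the synthetic images are placeholders:

```python
import cv2
import numpy as np

def extract_and_match(img_prev, img_curr, n_features=1000):
    """Extract ORB features in two frames and return matched keypoint coordinates."""
    orb = cv2.ORB_create(nfeatures=n_features)
    kp1, des1 = orb.detectAndCompute(img_prev, None)
    kp2, des2 = orb.detectAndCompute(img_curr, None)
    if des1 is None or des2 is None:
        return np.empty((0, 2)), np.empty((0, 2))
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)
    pts_prev = np.float32([kp1[m.queryIdx].pt for m in matches])
    pts_curr = np.float32([kp2[m.trainIdx].pt for m in matches])
    return pts_prev, pts_curr

# Example with two synthetic grayscale frames (real use would pass camera images).
img1 = np.random.randint(0, 255, (480, 640), dtype=np.uint8)
img2 = np.roll(img1, 3, axis=1)
p_prev, p_curr = extract_and_match(img1, img2)
print(len(p_prev), "matches")
```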
Updating the characteristic points: and updating the position of the successfully matched feature point in the current frame as the position in the current frame image. For feature points that fail tracking, they can be removed from the tracking list and new feature points re-detected if needed.
State estimation: and taking the IMU odometer as an initial value of optimization, solving the PNP problem through a BA optimization algorithm according to the successfully matched characteristic points to obtain the visual initial pose, wherein the specific solving process is a conventional technology, and the embodiment is not repeated.
Optimization based on photometric consistency: finally, according to the established global colored map, a photometric consistency algorithm is used to improve the accuracy of the state estimate. This is realized from two pieces of color information: the color information corresponding to the visual initial pose in the current frame image is obtained, the color information at the corresponding position of the historical global three-dimensional map is obtained according to the visual initial pose, and the visual odometer is obtained by minimizing the photometric error between the two.
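A toy version of the frame-to-map photometric residual is sketched below: the color stored for each map point is compared with the image color at its projected pixel, and the sum of squared differences is the quantity the pose optimization would minimize (the optimization itself is omitted, and all data are synthetic):

```python
import numpy as np

def photometric_residuals(map_colors, image, proj_uv):
    """Difference between stored map colors and the image colors at the projected pixels.
    map_colors: (N,3) RGB of map points; proj_uv: (N,2) integer pixel coordinates."""
    u = np.clip(proj_uv[:, 0], 0, image.shape[1] - 1)
    v = np.clip(proj_uv[:, 1], 0, image.shape[0] - 1)
    img_colors = image[v, u].astype(np.float64)
    return (img_colors - map_colors.astype(np.float64)).ravel()   # minimized over the pose

image = np.random.randint(0, 255, (480, 640, 3), dtype=np.uint8)
map_colors = np.random.randint(0, 255, (50, 3))
proj_uv = np.random.randint(0, 600, (50, 2))
res = photometric_residuals(map_colors, image, proj_uv)
print(np.sum(res ** 2))   # the photometric error the visual odometer seeks to minimize
```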
(6) Dense point cloud generation based on a plane detection algorithm: regions belonging to the same plane in the visual data are grouped into the same category with an image semantic segmentation algorithm (e.g., FCN, U-Net, SegNet, DeepLab); points belonging to the same plane in the radar data are found with a point cloud plane detection algorithm (e.g., the Hough Transform), and the corresponding pixel values of these points are taken from the image data of the corresponding category. In this embodiment the two detected plane regions are fused, which improves detection accuracy; then, for the points belonging to the same plane in the radar data, the coordinates of additional points in space are obtained by interpolation, so that the sparse planar point cloud is converted into a dense planar point cloud, which solves the hole problem caused by the sparsity of lidar scan points.
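One possible realization of this densification step is sketched below: a small RANSAC plane fit in NumPy followed by regular resampling of the plane over the bounding box of the inliers. The algorithm choices, thresholds and grid step are illustrative, and the semantic-segmentation fusion is omitted:

```python
import numpy as np

def ransac_plane(points, iters=200, thresh=0.02, rng=np.random.default_rng(0)):
    """Fit a plane n.x + d = 0 to a point cloud with RANSAC; return (n, d, inlier mask)."""
    best_inliers = np.zeros(len(points), dtype=bool)
    best_n, best_d = np.array([0.0, 0.0, 1.0]), 0.0
    for _ in range(iters):
        sample = points[rng.choice(len(points), 3, replace=False)]
        n = np.cross(sample[1] - sample[0], sample[2] - sample[0])
        if np.linalg.norm(n) < 1e-9:
            continue
        n = n / np.linalg.norm(n)
        d = -n @ sample[0]
        inliers = np.abs(points @ n + d) < thresh
        if inliers.sum() > best_inliers.sum():
            best_n, best_d, best_inliers = n, d, inliers
    return best_n, best_d, best_inliers

def densify_plane(inlier_pts, n, d, step=0.05):
    """Resample the plane on a regular x-y grid over the inliers' bounding box (assumes the
    plane is not vertical, i.e. n_z != 0; a real implementation would pick a better basis)."""
    xmin, ymin = inlier_pts[:, :2].min(axis=0)
    xmax, ymax = inlier_pts[:, :2].max(axis=0)
    xs, ys = np.meshgrid(np.arange(xmin, xmax, step), np.arange(ymin, ymax, step))
    zs = -(n[0] * xs + n[1] * ys + d) / n[2]
    return np.stack([xs.ravel(), ys.ravel(), zs.ravel()], axis=1)

sparse = np.random.rand(300, 3); sparse[:, 2] = 0.01 * np.random.randn(300)   # a noisy "floor"
n, d, mask = ransac_plane(sparse)
dense = densify_plane(sparse[mask], n, d)
print(len(sparse), "->", len(dense), "points")
```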
(7) Building a global three-dimensional map: according to the state estimates at different moments, the dense planar point cloud obtained in the previous step and the point cloud of the radar data are transformed into the global coordinate system to obtain a three-dimensional model, which is then downsampled and smoothed. At the same time, based on the matched radar odometer and visual odometer, the pixel of each point of the three-dimensional model in the visual data is determined, and the corresponding pixel values are mapped onto the three-dimensional model to obtain a new global three-dimensional map.
In this embodiment, the poses of the radar and the vision camera are computed separately, and the RGB values of the pixels corresponding to the point cloud in the visual data are then found from the poses at which the two kinds of data were collected, so that the newly added points are colored. The three-dimensional reconstruction result thus carries richer visual information, which improves user experience and practical application value.
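A condensed sketch of step (7) and the coloring just described: points are carried into the global frame with the radar odometer pose, then projected into the current image with the visual odometer pose and the camera intrinsics to pick up their RGB values. All poses, intrinsics and the image below are placeholders:

```python
import numpy as np

def to_global(points, T_world_sensor):
    """Transform (N,3) points into the target frame with a 4x4 pose."""
    homo = np.hstack([points, np.ones((len(points), 1))])
    return (T_world_sensor @ homo.T).T[:, :3]

def colorize(points_world, T_world_cam, K, image):
    """Project global points into the image and sample their RGB values."""
    pts_cam = to_global(points_world, np.linalg.inv(T_world_cam))   # world -> camera frame
    colors = np.zeros((len(points_world), 3), dtype=np.uint8)
    in_front = pts_cam[:, 2] > 1e-6
    uv = (K @ pts_cam[in_front].T).T
    uv = (uv[:, :2] / uv[:, 2:3]).astype(int)
    valid = (uv[:, 0] >= 0) & (uv[:, 0] < image.shape[1]) & (uv[:, 1] >= 0) & (uv[:, 1] < image.shape[0])
    idx = np.flatnonzero(in_front)[valid]
    colors[idx] = image[uv[valid, 1], uv[valid, 0]]
    return colors

K = np.array([[500.0, 0, 320.0], [0, 500.0, 240.0], [0, 0, 1.0]])
T_world_lidar = np.eye(4)                      # radar odometer pose (placeholder)
T_world_cam = np.eye(4)                        # visual odometer pose (placeholder)
image = np.random.randint(0, 255, (480, 640, 3), dtype=np.uint8)
local_pts = np.random.rand(200, 3) * np.array([4.0, 4.0, 1.0]) + np.array([-2.0, -2.0, 1.0])
global_pts = to_global(local_pts, T_world_lidar)
rgb = colorize(global_pts, T_world_cam, K, image)
print(global_pts.shape, rgb.shape)
```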
This embodiment is based on direct and indirect fusion of multi-sensor data and significantly improves the accuracy and stability of three-dimensional reconstruction, as follows:
Direct fusion of multi-sensor data: in the forward propagation of the IMU, coarse state estimates at different moments are obtained by pre-integration, and the estimated states are then fed to the back propagation of the lidar odometry algorithm to compensate for the radar point cloud distortion caused by motion. In the point-to-plane residual calculation of the radar odometer and in solving the PnP problem of the visual odometer through BA optimization, the results of these two algorithms are optimally fused with the IMU-based state estimate, yielding an accurate state estimate.
Indirect fusion of multi-sensor data: a global three-dimensional map is maintained by the visual odometer and the lidar odometer algorithms. The three-dimensional model is constructed from the lidar data and the visual data, the point cloud is colored with the visual data, the visual data is fused directly and effectively by minimizing the frame-to-map photometric error, and the data of the multiple sensors are thereby fused indirectly.
In another embodiment, the invention also provides an indoor scene dense three-dimensional reconstruction device based on multi-sensor fusion, which comprises a processor and a memory storing a plurality of computer instructions, wherein the computer instructions realize the steps of the indoor scene dense three-dimensional reconstruction method based on multi-sensor fusion when being executed by the processor.
For the specific limitations of the indoor scene dense three-dimensional reconstruction device based on multi-sensor fusion, reference may be made to the limitations of the indoor scene dense three-dimensional reconstruction method based on multi-sensor fusion above, which are not repeated here.
The memory and the processor are electrically connected directly or indirectly to each other for data transmission or interaction. For example, the components may be electrically connected to each other by one or more communication buses or signal lines. The memory stores a computer program executable on a processor that implements the method of the embodiments of the present invention by running the computer program stored in the memory.
The memory may be, but is not limited to, Random Access Memory (RAM), Read-Only Memory (ROM), Programmable Read-Only Memory (PROM), Erasable Programmable Read-Only Memory (EPROM), Electrically Erasable Programmable Read-Only Memory (EEPROM), etc. The memory is used for storing a program, and the processor executes the program after receiving an execution instruction.
The processor may be an integrated circuit chip having data processing capabilities. The processor may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), and the like, and may implement or execute the methods, steps and logic blocks disclosed in the embodiments of the present invention. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, as long as a combination contains no contradiction, it should be regarded as within the scope of this description.
The above examples merely represent a few embodiments of the present invention and are described in detail, but they are not to be construed as limiting the scope of the invention. It should be noted that those skilled in the art can make several variations and improvements without departing from the concept of the invention, all of which fall within the protection scope of the invention. Accordingly, the protection scope of the invention shall be determined by the appended claims.

Claims (10)

1. The indoor scene dense three-dimensional reconstruction method based on the multi-sensor fusion is characterized by comprising the following steps of:
collecting radar data, IMU data and vision data;
obtaining an IMU odometer according to the IMU data;
correcting radar data by using the IMU data, and obtaining a radar odometer according to the adjacent corrected radar data;
taking the IMU odometer as an initialization parameter, obtaining a visual initial pose according to visual data, and introducing a historical global three-dimensional map to optimize the visual initial pose to obtain the visual odometer;
and extracting a dense planar point cloud from the radar data and the visual data, transforming the radar point cloud and the dense planar point cloud to obtain a three-dimensional model, determining the pixel of each point of the three-dimensional model in the visual data based on the matched radar odometer and visual odometer, and mapping the corresponding pixel values onto the three-dimensional model to obtain a new global three-dimensional map.
2. The method for dense three-dimensional reconstruction of an indoor scene based on multi-sensor fusion according to claim 1, wherein the obtaining an IMU odometer according to IMU data comprises:
obtaining the IMU odometer at each preset moment by using a pre-integration algorithm, based on the IMU odometer corresponding to the radar odometer of the previous frame or the IMU odometer corresponding to the visual odometer of the previous frame.
3. The method for dense three-dimensional reconstruction of an indoor scene based on multi-sensor fusion according to claim 2, wherein the method for calculating the IMU odometer corresponding to the radar odometer of the previous frame or the IMU odometer corresponding to the visual odometer of the previous frame comprises:
fusing the radar odometer of the previous frame with the IMU odometer at the corresponding moment obtained through the pre-integration algorithm to obtain the IMU odometer corresponding to the radar odometer of the previous frame;
or fusing the visual odometer of the previous frame with the IMU odometer at the corresponding moment obtained through the pre-integration algorithm to obtain the IMU odometer corresponding to the visual odometer of the previous frame.
4. The method for dense three-dimensional reconstruction of an indoor scene based on multi-sensor fusion according to claim 1, wherein the correcting radar data using IMU data comprises:
performing time integration on each point in the radar data according to the IMU data to obtain the relative pose change from the sampling time of each point to the end time of the radar data, and projecting each point of the radar data into the coordinate system at the end time based on the relative pose change, thereby completing the correction of the radar data.
5. The method for dense three-dimensional reconstruction of an indoor scene based on multi-sensor fusion according to claim 1, wherein the step of obtaining a radar odometer from adjacent corrected radar data comprises:
extracting characteristics of the corrected radar data;
matching the characteristics of the radar data of the current frame and the radar data of the previous frame, and taking the successfully matched points;
and executing a point-to-plane residual calculation algorithm based on the points successfully matched in the current frame and the plane features extracted from the radar data of the previous frame, taking the result of the point-to-plane residual calculation as the observation and the IMU odometer as the prediction, and iterating with an iterated Kalman filter until convergence; the finally optimized pose result is taken as the radar odometer.
6. The method for dense three-dimensional reconstruction of an indoor scene based on multi-sensor fusion according to claim 1, wherein the obtaining the visual initial pose according to the visual data by using the IMU odometer as an initialization parameter comprises:
acquiring a current frame image and a previous frame image, and extracting characteristic points of the two frames of images;
searching corresponding feature points in the current frame image according to the positions of the tracked feature points in the previous frame image;
matching the feature points of the previous frame with the feature points searched in the current frame image, taking the successfully matched feature points and updating their position information;
and taking the IMU odometer as an initial value, and solving the PnP problem through a BA optimization algorithm according to the successfully matched feature points to obtain the visual initial pose.
7. The method for dense three-dimensional reconstruction of an indoor scene based on multi-sensor fusion according to claim 1, wherein the optimizing the visual initial pose by introducing the historical global three-dimensional map to obtain the visual odometer comprises the following steps:
acquiring the color information corresponding to the visual initial pose in the current frame image;
acquiring the color information at the corresponding position of the historical global three-dimensional map according to the visual initial pose;
and obtaining the visual odometer by minimizing the photometric error between the two acquired pieces of color information.
8. The multi-sensor fusion-based indoor scene dense three-dimensional reconstruction method of claim 1, wherein the extracting a dense planar point cloud from radar data and visual data comprises:
dividing the regions belonging to the same plane in the visual data into the same category by using an image semantic segmentation algorithm;
finding the points belonging to the same plane in the radar data by using a point cloud plane detection algorithm, and taking the corresponding pixel values of these points in the image data of the corresponding category;
for points belonging to the same plane in radar data, using an interpolation algorithm to convert the sparse planar point cloud into a dense planar point cloud.
9. The method for dense three-dimensional reconstruction of an indoor scene based on multi-sensor fusion according to claim 1, wherein the transforming the radar point cloud and the dense planar point cloud to obtain a three-dimensional model comprises: transforming all points of the radar point cloud and the dense planar point cloud into a global coordinate system to obtain the three-dimensional model.
10. An indoor scene dense three-dimensional reconstruction device based on multi-sensor fusion, comprising a processor and a memory storing a plurality of computer instructions, wherein the computer instructions, when executed by the processor, implement the steps of the indoor scene dense three-dimensional reconstruction method based on multi-sensor fusion as claimed in any one of claims 1 to 9.
CN202310964130.6A 2023-08-01 2023-08-01 Indoor scene dense three-dimensional reconstruction method and device based on multi-sensor fusion Pending CN117036447A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310964130.6A CN117036447A (en) 2023-08-01 2023-08-01 Indoor scene dense three-dimensional reconstruction method and device based on multi-sensor fusion

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310964130.6A CN117036447A (en) 2023-08-01 2023-08-01 Indoor scene dense three-dimensional reconstruction method and device based on multi-sensor fusion

Publications (1)

Publication Number Publication Date
CN117036447A 2023-11-10

Family

ID=88640575

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310964130.6A Pending CN117036447A (en) 2023-08-01 2023-08-01 Indoor scene dense three-dimensional reconstruction method and device based on multi-sensor fusion

Country Status (1)

Country Link
CN (1) CN117036447A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117523105A (en) * 2023-11-24 2024-02-06 哈工大郑州研究院 Three-dimensional scene reconstruction method for laser radar and multi-camera data fusion
CN117523105B (en) * 2023-11-24 2024-05-28 哈工大郑州研究院 Three-dimensional scene reconstruction method for laser radar and multi-camera data fusion

Similar Documents

Publication Publication Date Title
Huang Review on LiDAR-based SLAM techniques
CN110223348B (en) Robot scene self-adaptive pose estimation method based on RGB-D camera
CN110988912B (en) Road target and distance detection method, system and device for automatic driving vehicle
CN108319655B (en) Method and device for generating grid map
CN110176032B (en) Three-dimensional reconstruction method and device
US8199977B2 (en) System and method for extraction of features from a 3-D point cloud
CN108519102B (en) Binocular vision mileage calculation method based on secondary projection
CN112752028B (en) Pose determination method, device and equipment of mobile platform and storage medium
CN112097732A (en) Binocular camera-based three-dimensional distance measurement method, system, equipment and readable storage medium
CN110349249B (en) Real-time dense reconstruction method and system based on RGB-D data
CN116255992A (en) Method and device for simultaneously positioning and mapping
EP3293700A1 (en) 3d reconstruction for vehicle
CN115953535A (en) Three-dimensional reconstruction method and device, computing equipment and storage medium
CN114217665A (en) Camera and laser radar time synchronization method, device and storage medium
CN117367427A (en) Multi-mode slam method applicable to vision-assisted laser fusion IMU in indoor environment
CN117036447A (en) Indoor scene dense three-dimensional reconstruction method and device based on multi-sensor fusion
Rothermel et al. Fast and robust generation of semantic urban terrain models from UAV video streams
CN111899277A (en) Moving object detection method and device, storage medium and electronic device
CN113034601A (en) Scene map point and image frame matching method in environment modeling
CN116363615B (en) Data fusion method, device, vehicle and storage medium
CN117974919A (en) High-precision three-dimensional map reconstruction method and system
CN114088103B (en) Method and device for determining vehicle positioning information
Li et al. An SLAM algorithm based on laser radar and vision fusion with loop detection optimization
CN111127474B (en) Airborne LiDAR point cloud assisted orthophoto mosaic line automatic selection method and system
CN118279515B (en) Real-time multi-terminal map fusion method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination