CN116310120A

CN116310120A - Multi-view 3D reconstruction method, device, equipment and storage medium

Info

Publication number: CN116310120A
Application number: CN202310271277.7A
Authority: CN
Inventors: 赵敏达; 李林橙; 张永强; 刘柏; 范长杰; 胡志鹏
Original assignee: Netease Hangzhou Network Co Ltd
Current assignee: Netease Hangzhou Network Co Ltd
Priority date: 2023-03-14
Filing date: 2023-03-14
Publication date: 2023-06-23
Anticipated expiration: 2043-03-14
Also published as: CN116310120B

Abstract

The application provides a multi-view three-dimensional reconstruction method, device, equipment and storage medium, and relates to the technical field of three-dimensional reconstruction. The method comprises the following steps: acquiring two-dimensional images of a target object under a plurality of view angles; respectively determining point cloud data under a plurality of view angles according to the two-dimensional images under the plurality of view angles; sampling point clouds in a preset voxel space in point cloud data under a plurality of view angles based on the preset voxel space to obtain target voxel points; determining a three-dimensional surface of the target object according to the target voxel point; determining an intersection point interval of each sampling ray and the three-dimensional surface according to the camera pose under each view angle; determining the surface color of the target object based on each intersection point interval by adopting a color network; the target object is rendered based on the surface color. Compared with the prior art, a large amount of redundant and invalid calculation is avoided, and the efficiency and the accuracy of three-dimensional reconstruction are improved.

Description

Multi-view 3D reconstruction method, device, equipment and storage medium

技术领域technical field

本申请涉及三维重建技术领域，具体而言，涉及一种多视角三维重建方法、装置、设备及存储介质。The present application relates to the technical field of three-dimensional reconstruction, and in particular, relates to a multi-view three-dimensional reconstruction method, device, equipment, and storage medium.

背景技术Background technique

多视角三维重建是一种利用从不同视角拍摄待重建物体的图片进行三维(3D)物体形状重建的技术。近些年来，随着3D游戏、建筑设计、虚拟现实的发展，对真实世界的物体进行三维重建的需求日趋增加。当前基于深度学习的三维重建的发展较快。Multi-view 3D reconstruction is a technique for reconstructing the shape of a three-dimensional (3D) object by taking pictures of the object to be reconstructed from different perspectives. In recent years, with the development of 3D games, architectural design, and virtual reality, the demand for 3D reconstruction of real-world objects is increasing. At present, the development of 3D reconstruction based on deep learning is rapid.

现有技术一般通过使用符号距离函数(Signed Distance Function，SDF)的神经表面重建方法。该方法依据运动恢复结构(Structure From Motion，SFM)方法计算的稀疏点云获取物体的初始边界框(BBox)。基于相机位姿估计结果设计观察的采样射线，并在BBox内部进行有效采样。对于每个目标体素点，通过多层感知器(Multilayer Perceptron，MLP)预测SDF值，以及在该目标体素点处的颜色。然后通过Logistic概率密度分布函数，把SDF值转为体密度值，结合每个目标体素点的预测颜色，利用神经体渲染方法来训练SDF表示。The prior art generally uses a neural surface reconstruction method using a signed distance function (Signed Distance Function, SDF). This method obtains the initial bounding box (BBox) of the object based on the sparse point cloud calculated by the Structure From Motion (SFM) method. Based on the camera pose estimation results, the observed sampling ray is designed and effectively sampled inside the BBox. For each target voxel point, the SDF value and the color at the target voxel point are predicted by a multilayer perceptron (MLP). Then, through the Logistic probability density distribution function, the SDF value is converted into a volume density value, combined with the predicted color of each target voxel point, and the neural volume rendering method is used to train the SDF representation.

但是这样的重建方式由于实际待重建物体通过只占据BBox中的一部分，因此基于整个BBox的射线采样存在大量冗余与无效计算，从而降低了优化的效率，导致最终重建效果的下降。However, in this reconstruction method, because the actual object to be reconstructed only occupies a part of the BBox, there are a lot of redundancy and invalid calculations based on the ray sampling of the entire BBox, which reduces the efficiency of optimization and leads to a decline in the final reconstruction effect.

发明内容Contents of the invention

本申请的目的在于，针对上述现有技术中的不足，提供一种多视角三维重建方法、装置、设备及存储介质，以解决现有技术中存在大量冗余与无效计算，降低了优化效率导致重建效果下降的问题。The purpose of the present application is to provide a multi-view 3D reconstruction method, device, equipment and storage medium for the deficiencies in the above-mentioned prior art, so as to solve the problem of a large number of redundant and invalid calculations in the prior art, which reduces the optimization efficiency and leads to A problem with reduced rebuild performance.

为实现上述目的，本申请实施例采用的技术方案如下：In order to achieve the above purpose, the technical solution adopted in the embodiment of the present application is as follows:

第一方面，本申请一实施例提供了一种多视角三维重建方法，所述方法包括：In the first aspect, an embodiment of the present application provides a multi-view 3D reconstruction method, the method comprising:

获取目标物体在多个视角下的二维图像；Acquire two-dimensional images of the target object under multiple viewing angles;

根据所述多个视角下的二维图像，分别确定所述多个视角下的点云数据；According to the two-dimensional images under the multiple viewing angles, respectively determine the point cloud data under the multiple viewing angles;

基于预设体素空间，对多个视角下的点云数据中在所述预设体素空间中的点云进行采样，得到目标体素点；Based on the preset voxel space, sampling the point cloud in the preset voxel space in the point cloud data under multiple viewing angles to obtain the target voxel point;

根据所述目标体素点确定所述目标物体的三维表面；determining a three-dimensional surface of the target object according to the target voxel points;

根据各所述视角下的相机位姿，确定每条采样射线与所述三维表面的交点区间；Determining the intersection interval between each sampling ray and the three-dimensional surface according to the camera pose under each of the viewing angles;

采用颜色网络基于各所述交点区间，确定所述目标物体的表面颜色；Using a color network to determine the surface color of the target object based on each of the intersection intervals;

基于所述表面颜色，对所述目标物体进行渲染。Render the target object based on the surface color.

第二方面，本申请另一实施例提供了一种多视角三维重建装置，所述装置包括：获取模块、确定模块、采样模块和渲染模块，其中：In the second aspect, another embodiment of the present application provides a multi-view 3D reconstruction device, the device includes: an acquisition module, a determination module, a sampling module and a rendering module, wherein:

所述获取模块，用于获取目标物体在多个视角下的二维图像；The acquiring module is configured to acquire two-dimensional images of the target object under multiple viewing angles;

所述确定模块，用于根据所述多个视角下的二维图像，分别确定所述多个视角下的点云数据；The determining module is configured to respectively determine point cloud data under the multiple viewing angles according to the two-dimensional images under the multiple viewing angles;

所述采样模块，用于基于预设体素空间，对多个视角下的点云数据中在所述预设体素空间中的点云进行采样，得到目标体素点；The sampling module is configured to, based on a preset voxel space, sample point clouds in the preset voxel space in the point cloud data under multiple viewing angles to obtain target voxel points;

所述确定模块，具体用于根据所述目标体素点确定所述目标物体的三维表面；根据各所述视角下的相机位姿，确定每条采样射线与所述三维表面的交点区间；采用颜色网络基于各所述交点区间，确定所述目标物体的表面颜色；The determining module is specifically configured to determine the three-dimensional surface of the target object according to the target voxel points; determine the intersection interval between each sampling ray and the three-dimensional surface according to the camera poses under each of the viewing angles; The color network determines the surface color of the target object based on each of the intersection intervals;

所述渲染模块，用于基于所述表面颜色，对所述目标物体进行渲染。The rendering module is configured to render the target object based on the surface color.

第三方面，本申请另一实施例提供了一种多视角三维重建设备，包括：处理器、存储介质和总线，所述存储介质存储有所述处理器可执行的机器可读指令，当多视角三维重建设备运行时，所述处理器与所述存储介质之间通过总线通信，所述处理器执行所述机器可读指令，以执行如上述第一方面任一所述方法的步骤。In the third aspect, another embodiment of the present application provides a multi-view 3D reconstruction device, including: a processor, a storage medium, and a bus, and the storage medium stores machine-readable instructions executable by the processor. When the perspective three-dimensional reconstruction device is running, the processor communicates with the storage medium through a bus, and the processor executes the machine-readable instructions to perform the steps of any one of the methods described in the first aspect above.

第四方面，本申请另一实施例提供了一种存储介质，所述存储介质上存储有计算机程序，所述计算机程序被处理器运行时执行如上述第一方面任一所述方法的步骤。In a fourth aspect, another embodiment of the present application provides a storage medium, on which a computer program is stored, and when the computer program is run by a processor, the steps of the method described in any one of the above-mentioned first aspects are executed.

本申请的有益效果是：采用本申请提供的多视角三维重建方法，获取目标物体在多个视角下的二维图像，随后基于多个二维图像分别确定多个视角下的点云数据，并基于预设体素空间，对在预设体素空间中的点云进行采样，确定目标体素点，基于目标体素点确定目标物体的三维表面，随后，在基于采样射线确定目标物体的表面颜色时，首先确定每条采样射线与三维表面的交点区间，随后仅基于颜色网络确定焦点区间内的每条采样射线的颜色，从而确定目标物体的表面颜色，这样的确定方式可以缩小采样射线的采样范围，从而提高确定目标物体表面颜色的效率，以及提高目标物体的渲染效率。The beneficial effect of the present application is: adopt the multi-view three-dimensional reconstruction method provided by the present application to obtain two-dimensional images of the target object under multiple viewing angles, and then determine the point cloud data under multiple viewing angles based on the multiple two-dimensional images, and Based on the preset voxel space, sample the point cloud in the preset voxel space, determine the target voxel point, determine the three-dimensional surface of the target object based on the target voxel point, and then determine the surface of the target object based on the sampling ray When using color, first determine the intersection interval between each sampling ray and the three-dimensional surface, and then determine the color of each sampling ray in the focus area based on the color network, thereby determining the surface color of the target object. This method of determination can reduce the size of the sampling ray. Sampling range, so as to improve the efficiency of determining the surface color of the target object, and improve the rendering efficiency of the target object.

附图说明Description of drawings

为了更清楚地说明本申请实施例的技术方案，下面将对实施例中所需要使用的附图作简单地介绍，应当理解，以下附图仅示出了本申请的某些实施例，因此不应被看作是对范围的限定，对于本领域普通技术人员来讲，在不付出创造性劳动的前提下，还可以根据这些附图获得其他相关的附图。In order to more clearly illustrate the technical solutions of the embodiments of the present application, the accompanying drawings that are required in the embodiments will be briefly introduced below. It should be understood that the following drawings only show some embodiments of the present application, and thus It should be regarded as a limitation on the scope, and those skilled in the art can also obtain other related drawings based on these drawings without creative work.

图1为本申请一实施例提供的多视角三维重建方法的流程示意图；FIG. 1 is a schematic flowchart of a multi-view three-dimensional reconstruction method provided by an embodiment of the present application;

图2为本申请另一实施例提供的多视角三维重建方法的流程示意图；FIG. 2 is a schematic flowchart of a multi-view 3D reconstruction method provided by another embodiment of the present application;

图3为本申请另一实施例提供的多视角三维重建方法的流程示意图；FIG. 3 is a schematic flowchart of a multi-view three-dimensional reconstruction method provided by another embodiment of the present application;

图4为本申请另一实施例提供的多视角三维重建方法的流程示意图；FIG. 4 is a schematic flowchart of a multi-view 3D reconstruction method provided by another embodiment of the present application;

图5为本申请一实施例提供的多视角三维重建装置的结构示意图；FIG. 5 is a schematic structural diagram of a multi-view 3D reconstruction device provided by an embodiment of the present application;

图6为本申请另一实施例提供的多视角三维重建装置的结构示意图；FIG. 6 is a schematic structural diagram of a multi-view 3D reconstruction device provided by another embodiment of the present application;

图7为本申请一实施例提供的多视角三维重建设备的结构示意图。FIG. 7 is a schematic structural diagram of a multi-view three-dimensional reconstruction device provided by an embodiment of the present application.

具体实施方式Detailed ways

为使本申请实施例的目的、技术方案和优点更加清楚，下面将结合本申请实施例中的附图，对本申请实施例中的技术方案进行清楚、完整地描述，显然，所描述的实施例是本申请一部分实施例，而不是全部的实施例。In order to make the purposes, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below in conjunction with the drawings in the embodiments of the present application. Obviously, the described embodiments It is a part of the embodiments of this application, not all of them.

通常在此处附图中描述和示出的本申请实施例的组件可以以各种不同的配置来布置和设计。因此，以下对在附图中提供的本申请的实施例的详细描述并非旨在限制要求保护的本申请的范围，而是仅仅表示本申请的选定实施例。基于本申请的实施例，本领域技术人员在没有做出创造性劳动的前提下所获得的所有其他实施例，都属于本申请保护的范围。The components of the embodiments of the application generally described and illustrated in the figures herein may be arranged and designed in a variety of different configurations. Accordingly, the following detailed description of the embodiments of the application provided in the accompanying drawings is not intended to limit the scope of the claimed application, but merely represents selected embodiments of the application. Based on the embodiments of the present application, all other embodiments obtained by those skilled in the art without making creative efforts belong to the scope of protection of the present application.

另外，本申请中使用的流程图示出了根据本申请的一些实施例实现的操作。应该理解，流程图的操作可以不按顺序实现，没有逻辑的上下文关系的步骤可以反转顺序或者同时实施。此外，本领域技术人员在本申请内容的指引下，可以向流程图添加一个或多个其他操作，也可以从流程图中移除一个或多个操作。Additionally, the flowcharts used in this application illustrate operations implemented in accordance with some embodiments of the application. It should be understood that the operations of the flowcharts may be performed out of order, and steps that have no logical context may be performed in reverse order or concurrently. In addition, those skilled in the art may add one or more other operations to the flowchart or remove one or more operations from the flowchart under the guidance of the content of the present application.

为方便对本申请的理解，下述对本申请设计的部分名词进行解释说明：For the convenience of understanding the application, some nouns designed in the application are explained below:

显著性物体检测(salient object detection，SOD)：旨在区分视觉上最明显的区域。利用传统计算机视觉技术或者基于数据驱动的深度学习算法标记图像中属于显著性前景的部分。Salient object detection (SOD): Aims to distinguish the most visually salient regions. Use traditional computer vision techniques or data-driven deep learning algorithms to mark parts of the image that are salient foregrounds.

符号距离函数(Signed Distance Function，SDF)三维物体表面的隐式表达，在空间中的一个有限区域上确定一个点到物体表面的距离并同时对距离的符号进行定义：点在物体内部为正，外部为负，位于表面上时为0。可借助多层感知器(MLP)网络来隐式地表征SDF。The implicit expression of the signed distance function (Signed Distance Function, SDF) three-dimensional object surface determines the distance from a point to the object surface on a limited area in space and defines the sign of the distance at the same time: the point is positive inside the object, Negative outside, 0 when on the surface. SDFs can be implicitly represented by means of a multi-layer perceptron (MLP) network.

边界框(BoundingBox，BBox)，3D空间中指包围物体的最小边界框。Bounding Box (BoundingBox, BBox), refers to the smallest bounding box surrounding an object in 3D space.

运动恢复结构(Structure From Motion，SFM)是从一系列包含视觉运动信息的多幅二维图像中估计三维结构的技术，采用SFM可以得到每个视角的相机参数以及稀疏点云。Structure From Motion (SFM) is a technique for estimating three-dimensional structures from a series of multiple two-dimensional images containing visual motion information. Using SFM, camera parameters and sparse point clouds of each view can be obtained.

Marching Cubes：表面提取算法，又称表面绘制算法，是一种用于在体积数据中提取等值面的算法。核心概念是在设定好的三维空间及顶点标量值后，通过比较顶点和用户指定阈值时，确定体素的哪些边与等值面相交，创建三角贴片，并连接等值面边界上所有立方体的面片，以此得到一个表面。Marching Cubes: Surface extraction algorithm, also known as surface rendering algorithm, is an algorithm for extracting isosurfaces in volume data. The core concept is to determine which sides of the voxel intersect with the isosurface by comparing the vertices with the user-specified threshold after setting the 3D space and vertex scalar values, create triangular patches, and connect the isosurface boundaries All the facets of the cube to get a surface.

本申请实施例所提供的一种多视角三维重建方法方法可以由终端或服务器执行。终端是具有计算硬件的任何设备，该计算硬件能够支持和执行对应的软件产品。如下结合多个具体的应用示例，对本申请实施例所提供的一种多视角三维重建方法进行解释说明。图1为本申请一实施例提供的一种多视角三维重建方法的流程示意图，如图1所示，该方法包括：A method for multi-view 3D reconstruction provided in an embodiment of the present application may be executed by a terminal or a server. A terminal is any device having computing hardware capable of supporting and executing a corresponding software product. A multi-view three-dimensional reconstruction method provided in the embodiment of the present application is explained below in conjunction with multiple specific application examples. Fig. 1 is a schematic flowchart of a multi-view 3D reconstruction method provided by an embodiment of the present application. As shown in Fig. 1, the method includes:

S101：获取目标物体在多个视角下的二维图像。S101: Acquire two-dimensional images of a target object under multiple viewing angles.

在本申请的实施例中，在一些可能的实施例中，多个视角例如可以为50-80个视角，应当理解，上述实施例仅为示例性说明，具体多个视角下的二维图像仅为需要为针对目标物体进行360°环绕后的视角多个视角图像即可，也可以为小于50张的任意整数张，或大于80张的任意整数张，具体多个视角的具体个数限制可以根据用户需要灵活调整，并不以上述实施例给出的为限。In the embodiments of the present application, in some possible embodiments, the multiple viewing angles may be, for example, 50-80 viewing angles. Multiple viewing angle images are sufficient for the 360° surround view of the target object, or any integer number less than 50, or any integer number greater than 80. The specific number of multiple viewing angles can be limited. It can be flexibly adjusted according to user needs, and is not limited to those given in the above-mentioned embodiments.

在一些可能的实施例中，获取多个视角下的二维图像的方式例如可以为：获取预设相机在多个视角下对真实环境的目标物体进行拍摄得到的多个二维图像，该多个二维图像分别对应多个视角。In some possible embodiments, the manner of acquiring two-dimensional images under multiple viewing angles may be, for example: acquiring multiple two-dimensional images obtained by shooting a target object in a real environment with a preset camera under multiple viewing angles, the multiple Each two-dimensional image corresponds to multiple viewing angles.

S102：根据多个视角下的二维图像，分别确定多个视角下的点云数据。S102: According to the two-dimensional images under multiple viewing angles, respectively determine point cloud data under multiple viewing angles.

在一些可能的实施例中，例如可以对多个二维图像分别进行估计，得到多个视角下的三维点云数据，其中，每个视角下的三维点云例如可以为稀疏点云。示例的，可以采用预设的SFM网络对每个视角下的二维图像进行估计，得到每个视角下的三维点云数据，以及多个视角下的相机位姿。In some possible embodiments, for example, multiple two-dimensional images may be estimated separately to obtain three-dimensional point cloud data under multiple viewing angles, wherein the three-dimensional point cloud under each viewing angle may be, for example, a sparse point cloud. As an example, the preset SFM network can be used to estimate the 2D image at each viewing angle, to obtain the 3D point cloud data at each viewing angle, and the camera poses at multiple viewing angles.

S103：基于预设体素空间，对多个视角下的点云数据中在预设体素空间中的点云进行采样，得到目标体素点。S103: Based on the preset voxel space, sampling point clouds in the preset voxel space in the point cloud data under multiple viewing angles to obtain target voxel points.

在本申请的一个实施例中，S103之前例如可以根据多个视角下的点云数据，确定目标物体的边界框的信息，其中，目标物体的边界框的信息即为包围目标物体的最小边界框的信息。In one embodiment of the present application, before S103, for example, according to the point cloud data under multiple viewing angles, the information of the bounding box of the target object can be determined, wherein the information of the bounding box of the target object is the minimum bounding box surrounding the target object Information.

确定目标物体的边界框的方式例如可以为：可以根据每个视角下的三维点云数据，对每个视角下的相机位姿进行校准，得到每个视角下的目标相机位姿，并根据更新后的目标相机位姿对目标物体进行边界框的估计，从而得到目标物体的边界框信息；也即本申请实施例中基于校准后的多个视角下的目标相机位姿进行目标物体的边框的估计，得到目标物体的边界框的信息，实现了目标物体的边界框的自动化估计，可保证估计得到的目标物体的边界框更紧致。The method of determining the bounding box of the target object can be, for example, as follows: the camera pose at each view angle can be calibrated according to the 3D point cloud data at each view angle, and the target camera pose at each view angle can be obtained, and updated according to The target camera pose estimates the bounding box of the target object, thereby obtaining the bounding box information of the target object; that is, the frame of the target object is estimated based on the calibrated target camera pose under multiple viewing angles in the embodiment of the present application. Estimation, the information of the bounding box of the target object is obtained, and the automatic estimation of the bounding box of the target object is realized, which can ensure that the estimated bounding box of the target object is more compact.

针对目标物体的边界框的信息，在一些可能的实施例中，也可以采用上述SFM对多个视角的二维图像的估计结果，进行自动化的物体边框估计，从而得到目标物体的边界框的信息。For the information of the bounding box of the target object, in some possible embodiments, the estimation results of the above-mentioned SFM on the two-dimensional images of multiple viewing angles can also be used to perform automatic object frame estimation, so as to obtain the information of the bounding box of the target object .

在本申请的实施例中，在确定目标物体的边界框BBox之后，还需要先将目标物体的边界框在三维空间中的坐标进行归一化，归一化到[-1，1]获取归一化后的点云数据，获取方式如下：In the embodiment of the present application, after determining the bounding box BBox of the target object, it is necessary to first normalize the coordinates of the bounding box of the target object in the three-dimensional space, and normalize to [-1, 1] to obtain the normalized The normalized point cloud data can be obtained as follows:

将上述BBox在三维空间中的坐标归一化到[-1,1]，并计算缩放矩阵为scale_mat∈R^4×4。其中，B∈R³；

i∈(x,y,z)。Normalize the coordinates of the above BBox in the three-dimensional space to [-1,1], and calculate the scaling matrix as scale_mat∈R ^4×4 . Among them, B ∈ R ³ ;

i∈(x,y,z).

R^4×4表示4x4矩阵，R³表示三维向量空间，c_x、c_y、c_z表示在x，y，z方向上的偏移量。B_x，B_y，B_z表示BBOX在x，y，z方向上的坐标值。R ^4×4 represents a 4x4 matrix, R ³ represents a three-dimensional vector space, and c _x , _cy , and c _z represent offsets in the x, y, and z directions. B _x , B _y , B _z represent the coordinate values of the BBOX in the x, y, and z directions.

由于本申请提供的方法中，目标物体的边界框是自动计算估计的，而并非人为估计的，因此，减小了三维重建过程中的应用复杂度。In the method provided by the present application, the bounding box of the target object is calculated and estimated automatically instead of artificially, so the application complexity in the three-dimensional reconstruction process is reduced.

在本申请的实施例中，首先需要对预设体素空间进行初始化，初始化方式例如可以为：根据目标物体的目标精度，设置预设体素空间内采样的体素数量。In the embodiment of the present application, firstly, the preset voxel space needs to be initialized, and the initialization method may be, for example, setting the number of voxels sampled in the preset voxel space according to the target accuracy of the target object.

其中，目标精度表示期望获得的物体的三维模型的精细度，不同的精细度对应的目标精度不同，精细度越高，目标精度越高。举例说明，例如中等精细度约对应在三维空间内按照每个维度下采样100体素的方式进行采样。例如可以基于预设体素空间，根据目标物体边界框的信息，对多个视角下的归一化后的点云数据中在预设体素空间中的点云进行采样，确定多个视角对应的目标体素点。Wherein, the target precision represents the fineness of the three-dimensional model of the object to be obtained, and different finenesses correspond to different target precisions, and the higher the fineness, the higher the target precision. For example, for example, medium fineness corresponds to sampling in a manner of downsampling 100 voxels in each dimension in a three-dimensional space. For example, based on the preset voxel space, according to the information of the bounding box of the target object, the point cloud in the preset voxel space in the normalized point cloud data under multiple viewing angles can be sampled to determine the correspondence between multiple viewing angles. target voxel point.

在三维空间内对预设体素空间进行初始化V∈[-1,1]³，并设置每个维度内采样的体素数量为ρ，其中ρ设置的大小是根据目标物体的目标精度确定的，目标精度越高时，ρ设置的越大，目标精度越低时，ρ设置的越小，则在三维空间中的每个维度内的坐标

定义每个体素属于物体内部的计数数组为O∈[-1,1]³。Initialize the preset voxel space V∈[-1,1] ³ in the three-dimensional space, and set the number of voxels sampled in each dimension to ρ, where the size of ρ is determined according to the target accuracy of the target object , the higher the target accuracy, the larger the ρ setting, the lower the target accuracy, the smaller the ρ setting, the coordinates in each dimension in the three-dimensional space

Define the count array that each voxel belongs to inside the object as O∈[-1,1] ³ .

针对第i个视角，假设当前计算得到的从世界坐标系到当前2D图像的变换矩阵为w2c_mat_i∈R^3×4。对于V∈[-1,1]³中的每一个点(i,j,k),投影到当前2D图像的坐标为

其中横纵坐标表示为/>

p＝(p₀,p₁,p₂)。则/>

For the i-th viewing angle, assume that the currently calculated transformation matrix from the world coordinate system to the current 2D image is w2c_mat _i ∈R ^3×4 . For each point (i,j,k) in V∈[-1,1] ³ , the coordinates projected to the current 2D image are

The horizontal and vertical coordinates are expressed as />

p = (p ₀ , p ₁ , p ₂ ). Then />

R^3×4表示3x4矩阵，p₀,p₁,p₂表示P的x，y，z方向上的分量。O_i,j,k表示计数数组O在三维空间中坐标为(i,j,k)的分量。R ^3×4 represents a 3x4 matrix, and p ₀ , p ₁ , and p ₂ represent the components of P in the x, y, and z directions. O _{i, j, k} represents the component of the count array O whose coordinates are (i, j, k) in the three-dimensional space.

S104：根据目标体素点确定目标物体的三维表面。S104: Determine the three-dimensional surface of the target object according to the target voxel points.

在本申请的一个实施例中，确定目标物体的三维表面的方式例如可以为：采用等值面提取算法基于目标体素点确定目标物体的三维表面。In an embodiment of the present application, the manner of determining the three-dimensional surface of the target object may be, for example, using an isosurface extraction algorithm to determine the three-dimensional surface of the target object based on the target voxel points.

在本申请的实施例中，针对所有视角下的二维图像重复上述确定目标体素点的过程，获取最终的计数数组O。对于O中的每一个位置(i,j,k)，若

则认为该体素点属于物体内部，否则输入物体外部。其中阈值γ∈(0,1)。根据等值面提取Marching cubes算法提取属于物体内部的体素点构成的三维表面(3D凸包)。In the embodiment of the present application, the above-mentioned process of determining the target voxel point is repeated for the two-dimensional images under all viewing angles, and the final count array O is obtained. For each position (i,j,k) in O, if

It is considered that the voxel point belongs to the inside of the object, otherwise it is input outside the object. where the threshold γ∈(0,1). According to the isosurface extraction Marching cubes algorithm, the three-dimensional surface (3D convex hull) composed of voxel points belonging to the interior of the object is extracted.

S105：根据各视角下的相机位姿，确定每条采样射线与三维表面的交点区间。S105: Determine an intersection interval between each sampling ray and the three-dimensional surface according to the camera poses at each viewing angle.

针对每个视角下的相机位姿，计算每条采样射线与上述计算得到的物体3D凸包的表面交点，前后两次相交的交点分别称为近交点与远交点。假设第n个相机原点在世界坐标系下的坐标为o_n,待采样的射线的单位向量为

(假设该射线为经过相机原点和2D图像中(i,j)的直线)，则定义中心点到相机原点的距离为/>

因此可以得到射线与第1步中计算得到的BBox的近交点/>

远交点为

此外如果射线与3D凸包存在交点，并记o_n与3D凸包的第一个交点为

o_n相对于/>

的对称点为/>

并记其与3D凸包的第一个交点为/>

则远交点/>

当射线与3D凸包无交点时，使用射线与BBox的交点作为/>

和/>

即如果射线d_i,j与3D凸包无交点，则使用下式替换：For the camera pose at each viewing angle, calculate the surface intersection point of each sampling ray and the 3D convex hull of the object calculated above. Assuming that the coordinates of the nth camera origin in the world coordinate system are o _n , the unit vector of the ray to be sampled is

(Assuming that the ray is a straight line passing through the origin of the camera and (i,j) in the 2D image), the distance from the center point to the origin of the camera is defined as />

Therefore, the close intersection point of the ray and the BBox calculated in step 1 can be obtained />

far node is

In addition, if there is an intersection point between the ray and the 3D convex hull, and the first intersection point between o _n and the 3D convex hull is

o _n relative to />

The symmetry point of

And record its first intersection with the 3D convex hull as />

then far node />

When the ray has no intersection with the 3D convex hull, use the intersection of the ray and the BBox as />

and />

That is, if the ray d _{i, j} has no intersection with the 3D convex hull, use the following formula to replace:

至此得到每条采样射线的有效近交点

和远交点/>

So far, the effective near intersection point of each sampled ray is obtained

and far node />

如果存在采样射线与目标物体相切的情况，也即该采样射线的第一交点和第二交点相重合。If the sampling ray is tangent to the target object, that is, the first intersection point and the second intersection point of the sampling ray coincide.

S106：采用颜色网络基于各交点区间，确定目标物体的表面颜色。S106: Using the color network to determine the surface color of the target object based on each intersection interval.

其中，颜色网络为预先训练好的网络模型，其使用多层感知机(MultilayerPerceptron，MLP)网络构建，并通过训练得到网络参数，该网络的输出即为空间中各点的预测颜色值。Among them, the color network is a pre-trained network model, which is constructed using a Multilayer Perceptron (MLP) network, and network parameters are obtained through training. The output of the network is the predicted color value of each point in the space.

也即，至此已经得到了目标三维表面的数据以及表面颜色，也即得到了目标物体的三维模型数据，后续可以基于得到的三维模型数据直接对目标物体进行三维世界内的渲染。That is to say, the data and surface color of the 3D surface of the target have been obtained so far, that is, the 3D model data of the target object has been obtained, and the target object can be directly rendered in the 3D world based on the obtained 3D model data.

S107：基于表面颜色，对目标物体进行渲染。S107: Render the target object based on the surface color.

在得到目标物体的三维模型数据之后，可以将三维模型数据进行存储，并根据后续实际的业务需求，通过模型渲染软件打开并加载三维模型数据后，实现对目标物体对应的三维模型的渲染与展示。After obtaining the 3D model data of the target object, the 3D model data can be stored, and according to the subsequent actual business needs, after opening and loading the 3D model data through the model rendering software, the rendering and display of the 3D model corresponding to the target object can be realized .

由于本申请实施例提供的三维重建方法中在进行三维点云中目标体素点的采集的过程中，是通过对多个视角下的多个二维图像进行边界框估计，得到的目标物体的边界框的信息进行的，而并非采用人为估计的边界框的信息进行的，因此，相对于人为估计的边界框，本申请自动估计的方式其可实现紧致的物体边界框的估计，从而可使得三维点云的采样更加精确，继而可使得三维重建得到的三维模型数据更准确，继而提高了三维模型的重建精度和模型细节的完整度。In the 3D reconstruction method provided by the embodiment of the present application, in the process of collecting the target voxel points in the 3D point cloud, the target object is obtained by estimating the bounding boxes of multiple 2D images under multiple viewing angles. Therefore, compared with the artificially estimated bounding box, the automatic estimation method of this application can realize the estimation of the compact object bounding box, so that The sampling of the 3D point cloud is made more accurate, and then the 3D model data obtained by 3D reconstruction can be made more accurate, thereby improving the reconstruction accuracy of the 3D model and the integrity of the model details.

采用本申请提供的多视角三维重建方法，获取目标物体在多个视角下的二维图像，随后基于多个二维图像分别确定多个视角下的点云数据和目标物体的边界框的信息，并基于预设体素空间，对在预设体素空间中的点云进行采样，确定目标体素点，基于目标体素点确定目标物体的三维表面，随后，在基于采样射线确定目标物体的表面颜色时，首先确定每条采样射线与三维表面的交点区间，随后仅基于颜色网络确定焦点区间内的每条采样射线的颜色，从而确定目标物体的表面颜色，这样的确定方式可以缩小采样射线的采样范围，从而提高确定目标物体表面颜色的效率，以及提高目标物体的渲染效率。Using the multi-view 3D reconstruction method provided by the present application, two-dimensional images of the target object under multiple viewing angles are obtained, and then the point cloud data under multiple viewing angles and the information of the bounding box of the target object are respectively determined based on the multiple two-dimensional images, And based on the preset voxel space, the point cloud in the preset voxel space is sampled, the target voxel point is determined, the three-dimensional surface of the target object is determined based on the target voxel point, and then the target object is determined based on the sampling ray For the surface color, first determine the intersection interval between each sampling ray and the three-dimensional surface, and then determine the color of each sampling ray in the focus area based on the color network, thereby determining the surface color of the target object. This method of determination can reduce the sampling ray The sampling range, thereby improving the efficiency of determining the surface color of the target object, and improving the rendering efficiency of the target object.

可选地，在上述实施例的基础上，本申请实施例还可提供一种多视角三维重建方法，如下结合附图对上述方法中确定目标物体的表面颜色的实现过程进行示例说明。图2为本申请另一实施例提供的一种多视角三维重建方法的流程示意图，如图2所示，S106可包括：Optionally, on the basis of the above embodiments, the embodiments of the present application may also provide a multi-view three-dimensional reconstruction method. The implementation process of determining the surface color of the target object in the above method is illustrated as follows with reference to the accompanying drawings. Fig. 2 is a schematic flowchart of a multi-view three-dimensional reconstruction method provided by another embodiment of the present application. As shown in Fig. 2, S106 may include:

S111：根据符号距离函数网络，确定目标体素点的符号距离函数值、法向量以及目标体素点间距。S111: According to the signed distance function network, determine the signed distance function value of the target voxel point, the normal vector, and the distance between the target voxel points.

在本申请的实施例中，针对第n个视角下的采样射线

在/>

和

之间进行3D采样，通过上述SDF网络确定各目标体素点的符号距离函数值、法向量以及目标体素点间距。In the embodiment of this application, for the sampling ray at the nth viewing angle

at />

and

3D sampling is performed between them, and the signed distance function value, normal vector and target voxel point spacing of each target voxel point are determined through the above SDF network.

S112：根据颜色网络，确定沿着每条采样射线的颜色值。S112: Determine the color value along each sampling ray according to the color network.

同样，在确定各采样射线的颜色值时，也是针对第n个视角下的采样射线

在

和/>

之间进行3D采样，通过颜色网络，确定沿着每条采样射线的颜色值。Similarly, when determining the color value of each sampling ray, it is also for the sampling ray at the nth viewing angle

exist

and />

3D sampling is carried out between, and the color value along each sampling ray is determined through the color network.

S113：获得每条采样射线的目标渲染颜色值。S113: Obtain the target rendering color value of each sampling ray.

在本申请的实施例中，确定每条采样射线的目标渲染颜色值的方式例如可以为通过Logistic累积分布函数将加权的SDF值转为体密度值。并根据每条射线上所有采样点的体密度值、颜色值和采样间距，进行体渲染，获得每条射线最终渲染的颜色值。In the embodiment of the present application, the manner of determining the target rendering color value of each sampling ray may be, for example, converting the weighted SDF value into a volume density value through a Logistic cumulative distribution function. And perform volume rendering according to the volume density value, color value and sampling interval of all sampling points on each ray, and obtain the final rendered color value of each ray.

可选地，在上述实施例的基础上，本申请实施例还可提供一种多视角三维重建方法，如下结合附图对上述方法中确定目标体素点的实现过程进行示例说明。图3为本申请另一实施例提供的一种多视角三维重建方法的流程示意图，如图3所示，S103可包括：Optionally, on the basis of the above-mentioned embodiments, the embodiments of the present application may also provide a multi-view three-dimensional reconstruction method, and the implementation process of determining the target voxel point in the above-mentioned method is illustrated as follows with reference to the accompanying drawings. Fig. 3 is a schematic flowchart of a multi-view three-dimensional reconstruction method provided by another embodiment of the present application. As shown in Fig. 3, S103 may include:

S121：确定多个视角下的二维图像中的每个像素点是否为显著性前景。S121: Determine whether each pixel in the two-dimensional image under multiple viewing angles is a salient foreground.

显著性前景为二维图像中具有明确轮廓的物体实例；首先，本申请可以基于显著性前景检测算法进行显著性检测，一般二维图像中被高频细节所包围的物体即为物体实例，也即通过显著性检测检测出多个视角下的二维图像中的一个或多个具有明确轮廓的物体实例，随后，分别判断每个像素点的像素点位置是否在具有明确轮廓的物体实例的内部，若在，也即该像素点落入具有明确轮廓的物体实例的区域范围之内，也即该像素点为显著性前景像素点；否则，说明该像素点在具有明确轮廓的物体实例的之外，也即该像素点为背景，不属于显著性前景。A salient foreground is an object instance with a clear outline in a two-dimensional image; first, this application can perform saliency detection based on a salient foreground detection algorithm. Generally, an object surrounded by high-frequency details in a two-dimensional image is an object instance. That is, one or more object instances with clear outlines in the two-dimensional images under multiple viewing angles are detected through saliency detection, and then, whether the pixel position of each pixel is inside the object instance with clear outlines is judged respectively , if it is, that is, the pixel falls within the range of the object instance with a clear outline, that is, the pixel is a salient foreground pixel; otherwise, it means that the pixel is between the object instance with a clear outline In addition, that is, the pixel is the background and does not belong to the salient foreground.

例如可以对于每个视角下的二维图像，对于图像中的每一个像素点，若属于显著性前景则标记为1，或不属于显著性前景(属于背景)则标记为0。假设存在N个视角，每个视角下2D显著性图像为I_n∈R^w×h，其中w，h分别为图像宽和高。若坐标[x,y]属于显著性前景则I_n(x,y)＝1，否则I_n(x,y)＝0。For example, for a two-dimensional image at each viewing angle, for each pixel in the image, if it belongs to the salient foreground, it is marked as 1, or if it does not belong to the salient foreground (belongs to the background), it is marked as 0. Assuming that there are N viewing angles, the 2D saliency image at each viewing angle is I _n ∈ ^{R w×h} , where w, h are the image width and height, respectively. If the coordinate [x, y] belongs to the salient foreground, I _n (x, y)=1, otherwise I _n (x, y)=0.

R^w×h表示w、x、h的矩阵。I_n(x,y)表示I_n中坐标为(x,y)的像素。R ^w×h represents a matrix of w, x, and h. I _n (x, y) represents the pixel whose coordinates are (x, y) in I _n .

S122：根据是否为显著性前景的确定结果，基于预设体素空间，在多个视角下的点云数据中，确定属于目标物体内部的目标体素点。S122: According to the determination result of whether it is a salient foreground, based on the preset voxel space, determine the target voxel points belonging to the inside of the target object in the point cloud data under multiple viewing angles.

在本申请的实施例中，确定属于目标物体内部的目标体素点的方式例如可以为：根据目标体素点和多个视角的视角数量，确定第一比值；若第一比值大于预设阈值，则确定目标体素点属于目标物体内部。In the embodiment of the present application, the method of determining the target voxel point belonging to the inside of the target object may be, for example: determining the first ratio according to the target voxel point and the number of viewing angles of multiple viewing angles; if the first ratio is greater than the preset threshold , it is determined that the target voxel point belongs to the interior of the target object.

图4为本申请一实施例提供的多视角三维重建方法的流程示意图，如图4所示，以完整的流程图对本申请提供的三维重建方法中符号距离函数网络和颜色网络的训练流程进行解释说明：Fig. 4 is a schematic flow chart of the multi-view 3D reconstruction method provided by an embodiment of the present application. As shown in Fig. 4, the training process of the symbolic distance function network and the color network in the 3D reconstruction method provided by the present application is explained with a complete flow chart illustrate:

S201：输入目标物体在各个视角下的二维图像、相机位姿与稀疏点云。S201: Input the 2D image, camera pose and sparse point cloud of the target object under various viewing angles.

S202：获取BBox并将稀疏点云进行归一化。S202: Obtain a BBox and normalize the sparse point cloud.

其中，根据SFM获取的稀疏点云结果获取目标物体的边界框BBox，并将物体在三维空间中的x、y、z坐标归一化到[-1,1]，并计算缩放矩阵。Among them, the bounding box BBox of the target object is obtained according to the sparse point cloud result obtained by SFM, and the x, y, z coordinates of the object in the three-dimensional space are normalized to [-1,1], and the scaling matrix is calculated.

S203：计算每个视角下的二维图像的显著前景。S203: Calculate the salient foreground of the two-dimensional image under each viewing angle.

计算每个视角下拍摄的二维图片中的显著性前景区域，对于二维图像中的每一个像素，若属于前景标记为1，若属于背景标记为0。Calculate the salient foreground area in the two-dimensional image taken at each viewing angle. For each pixel in the two-dimensional image, if it belongs to the foreground, it is marked as 1, and if it belongs to the background, it is marked as 0.

S204：初始化体素空间并判断是否属于目标物体的内部。S204: Initialize the voxel space and determine whether it belongs to the interior of the target object.

在三维空间中的单位空间内初始化体素空间，并设置体素间隔为ρ。将上述初始化后的体素空间中的每一个点根据相机参数投影到2D图像中，并判断每个体素点投影到2D图像中属于显著前景或者背景。Initialize the voxel space in the unit space in the three-dimensional space, and set the voxel interval to ρ. Each point in the above-mentioned initialized voxel space is projected into the 2D image according to the camera parameters, and it is determined that each voxel point projected into the 2D image belongs to the salient foreground or background.

判断每个体素点在各个视角下投影到2D图中属于前景的比例，若比例超过阈值，则确定该体素点属于目标物体的内部，否则属于目标物体的外部。Judging the proportion of each voxel point projected into the 2D map belonging to the foreground at each viewing angle, if the proportion exceeds the threshold, it is determined that the voxel point belongs to the inside of the target object, otherwise it belongs to the outside of the target object.

S205：根据目标体素点生成目标物体的三维表面。S205: Generate a three-dimensional surface of the target object according to the target voxel points.

确定属于目标物体的内部的点为目标体素点，并根据目标体素点的确定结果建立目标物体的三维表面，也即3D凸包。The points inside the target object are determined as the target voxel points, and the three-dimensional surface of the target object, that is, the 3D convex hull is established according to the determination result of the target voxel points.

S206：计算每条采样射线与目标物体的交点区间。S206: Calculate the intersection interval between each sampling ray and the target object.

针对每个视角下的相机位姿，计算每条采样射线与上述计算得到的物体3D凸包的表面交点，计算最近点与最远点，基于最近点和最远点，确定每条采样射线与目标物体的交点区间。For the camera pose at each viewing angle, calculate the surface intersection point between each sampling ray and the 3D convex hull of the object calculated above, calculate the closest point and the farthest point, and determine the relationship between each sampling ray and the farthest point based on the closest point and the farthest point The intersection interval of the target object.

S207：随机采样部分射线并计算三维模型数据。S207: Randomly sample some rays and calculate 3D model data.

接着，在3D凸包的最近点与最远点之间沿着每条观察射线进行3D采样，通过上述符号距离函数(Signed Distance Function，SDF)网络和Color网络计算采样点的SDF值、法向量、采样点间距以及沿着观察射线的颜色值；其中，上述SDF值、法向量、采样点间距以及沿着观察射线的颜色值统称为三维模型数据。Then, 3D sampling is performed along each observation ray between the nearest point and the farthest point of the 3D convex hull, and the SDF value and normal vector of the sampling point are calculated through the above-mentioned Signed Distance Function (SDF) network and Color network , sampling point spacing, and color values along the observation ray; wherein, the above-mentioned SDF value, normal vector, sampling point spacing, and color values along the observation ray are collectively referred to as 3D model data.

S208：体密度渲染。S208: volume density rendering.

接着，再通过Logistic累积分布函数将加权的SDF值转为体密度值。并根据每条射线上所有采样点的体密度值、颜色值和采样间距，进行体渲染，获得每条射线最终渲染的颜色值。Then, the weighted SDF value is converted into a volume density value through the Logistic cumulative distribution function. And perform volume rendering according to the volume density value, color value and sampling interval of all sampling points on each ray, and obtain the final rendered color value of each ray.

S209：计算损失函数。S209: Calculate a loss function.

S210：优化网络参数。S210: Optimizing network parameters.

其中，优化的网络参数分别包括优化符号距离函数网络参数，以及颜色网络的网络参数。Wherein, the optimized network parameters respectively include optimized signed distance function network parameters and network parameters of the color network.

S211：是否达到最大迭代次数。S211: Whether the maximum number of iterations is reached.

若否，则返回执行S207，继续随机采样部分射线并计算。If not, go back to S207 and continue to randomly sample and calculate some rays.

若是，则执行S212。If yes, execute S212.

S212：保存网络参数。S212: Save network parameters.

若达到最大迭代次数，则确定符号距离函数网络以及颜色网络迭代完成，此时获取迭代后的模型并投入使用。If the maximum number of iterations is reached, it is determined that the iteration of the signed distance function network and the color network is completed, and at this time the iterated model is obtained and put into use.

采用本申请提供的多视角三维重建方法，在解决对二维图像的三维重建和渲染的挤出上，提出了一种基于三维表面对二维图像进行三维重建的方法，可以优化目标物体周围有效范围内的三维采样射线的采样，仅基于颜色网络确定焦点区间内的每条采样射线的颜色，从而确定目标物体的表面颜色，这样的确定方式可以缩小采样射线的采样范围，从而提高确定目标物体表面颜色的效率，以及提高目标物体的渲染效率，提高三维重建的效率和精度。Using the multi-view 3D reconstruction method provided by this application, in solving the extrusion of 3D reconstruction and rendering of 2D images, a method of 3D reconstruction of 2D images based on 3D surfaces is proposed, which can optimize the effective The sampling of three-dimensional sampling rays within the range only determines the color of each sampling ray in the focal area based on the color network, thereby determining the surface color of the target object. This determination method can narrow the sampling range of the sampling ray, thereby improving the determination of the target object. Improve the efficiency of surface color, improve the rendering efficiency of target objects, and improve the efficiency and accuracy of 3D reconstruction.

下述结合附图对本申请所提供的多视角三维重建装置进行解释说明，该多视角三维重建装置可执行上述图1-图4任一多视角三维重建方法，其具体实现以及有益效果参照上述，如下不再赘述。The multi-view 3D reconstruction device provided by the present application will be explained below in conjunction with the accompanying drawings. The multi-view 3D reconstruction device can execute any of the above-mentioned multi-view 3D reconstruction methods in Figures 1-4. For its specific implementation and beneficial effects, refer to the above. No further details will be given below.

图5为本申请一实施例提供的多视角三维重建装置的结构示意图，如图5所示，该装置包括：获取模块201、确定模块202、采样模块203和渲染模块204，其中：FIG. 5 is a schematic structural diagram of a multi-view 3D reconstruction device provided by an embodiment of the present application. As shown in FIG. 5 , the device includes: an acquisition module 201, a determination module 202, a sampling module 203, and a rendering module 204, wherein:

获取模块201，用于获取目标物体在多个视角下的二维图像；An acquisition module 201, configured to acquire two-dimensional images of the target object under multiple viewing angles;

确定模块202，用于根据多个视角下的二维图像，分别确定多个视角下的点云数据；A determining module 202, configured to determine point cloud data under multiple viewing angles according to the two-dimensional images under multiple viewing angles;

采样模块203，用于基于预设体素空间，对多个视角下的点云数据中在预设体素空间中的点云进行采样，得到目标体素点；The sampling module 203 is configured to sample the point cloud in the preset voxel space in the point cloud data under multiple viewing angles based on the preset voxel space, to obtain the target voxel point;

确定模块202，具体用于根据目标体素点确定目标物体的三维表面；根据各视角下的相机位姿，确定每条采样射线与三维表面的交点区间；采用颜色网络基于各交点区间，确定目标物体的表面颜色；The determining module 202 is specifically used to determine the three-dimensional surface of the target object according to the target voxel points; determine the intersection interval between each sampling ray and the three-dimensional surface according to the camera pose under each viewing angle; use the color network to determine the target based on each intersection interval the surface color of the object;

渲染模块204，用于基于表面颜色，对目标物体进行渲染。The rendering module 204 is configured to render the target object based on the surface color.

可选地，确定模块202，具体用于根据符号距离函数网络，确定目标体素点的符号距离函数值、法向量以及目标体素点间距；根据颜色网络，确定沿着每条采样射线的颜色值；获得每条采样射线的目标渲染颜色值。Optionally, the determining module 202 is specifically configured to determine the signed distance function value, the normal vector, and the target voxel point spacing of the target voxel point according to the signed distance function network; determine the color along each sampling ray according to the color network Value; get the target render color value for each sampled ray.

可选地，确定模块202，具体用于采用等值面提取算法基于目标体素点确定目标物体的三维表面。Optionally, the determining module 202 is specifically configured to determine the three-dimensional surface of the target object based on the target voxel points using an isosurface extraction algorithm.

可选地，在上述实施例的基础上，本申请实施例还可提供一种多视角三维重建装置，如下结合附图对上述图5给出的装置的实现过程进行示例说明。图6为本申请另一实施例提供的多视角三维重建装置的结构示意图，如图6所示，该装置还包括：设置模块205，用于根据目标物体的目标精度，设置预设体素空间内采样的体素数量。Optionally, on the basis of the above-mentioned embodiments, the embodiments of the present application may also provide a multi-view three-dimensional reconstruction device. The implementation process of the above-mentioned device shown in FIG. 5 is illustrated as follows with reference to the accompanying drawings. Fig. 6 is a schematic structural diagram of a multi-view 3D reconstruction device provided by another embodiment of the present application. As shown in Fig. 6, the device further includes: a setting module 205, which is used to set a preset voxel space according to the target accuracy of the target object The number of voxels to sample within.

可选地，确定模块202，具体用于确定多个视角下的二维图像中的每个像素点是否为显著性前景；根据是否为显著性前景的确定结果，基于预设体素空间，在多个视角下的点云数据中，确定属于目标物体内部的目标体素点。Optionally, the determination module 202 is specifically configured to determine whether each pixel in the two-dimensional image under multiple viewing angles is a salient foreground; according to the determination result of whether it is a salient foreground, based on a preset voxel space, in In the point cloud data under multiple viewing angles, the target voxel points belonging to the interior of the target object are determined.

可选地，确定模块202，具体用于根据目标体素点和多个视角的视角数量，确定第一比值；若第一比值大于预设阈值，则确定目标体素点属于目标物体内部。Optionally, the determining module 202 is specifically configured to determine a first ratio according to the target voxel point and the number of viewing angles of multiple viewing angles; if the first ratio is greater than a preset threshold, determine that the target voxel point belongs to the inside of the target object.

可选地，确定模块202，具体用于根据所述多个视角下的点云数据，确定目标物体的边界框的信息；Optionally, the determination module 202 is specifically configured to determine the information of the bounding box of the target object according to the point cloud data under the multiple viewing angles;

获取模块201，具体用于对目标物体的边界框进行归一化，获取归一化后的边界框数据；The acquisition module 201 is specifically used to normalize the bounding box of the target object, and acquire normalized bounding box data;

采样模块203，具体用于基于预设体素空间，对多个视角下的归一化后的边界框数据中在预设体素空间中的点云进行采样，得到目标体素点。The sampling module 203 is specifically configured to, based on the preset voxel space, sample point clouds in the preset voxel space in the normalized bounding box data under multiple viewing angles to obtain target voxel points.

上述装置用于执行前述实施例提供的方法，其实现原理和技术效果类似，在此不再赘述。The above-mentioned device is used to execute the methods provided in the foregoing embodiments, and its implementation principles and technical effects are similar, and details are not repeated here.

以上这些模块可以是被配置成实施以上方法的一个或多个集成电路，例如：一个或多个特定集成电路(Application Specific Integrated Circuit，简称ASIC)，或，一个或多个微处理器，或，一个或者多个现场可编程门阵列(Field Programmable Gate Array，简称FPGA)等。再如，当以上某个模块通过处理元件调度程序代码的形式实现时，该处理元件可以是通用处理器，例如中央处理器(Central Processing Unit，简称CPU)或其它可以调用程序代码的处理器。再如，这些模块可以集成在一起，以片上系统(system-on-a-chip，简称SOC)的形式实现。The above modules may be one or more integrated circuits configured to implement the above method, for example: one or more specific integrated circuits (Application Specific Integrated Circuit, referred to as ASIC), or, one or more microprocessors, or, One or more Field Programmable Gate Arrays (Field Programmable Gate Array, FPGA for short), etc. For another example, when one of the above modules is implemented in the form of a processing element scheduling program code, the processing element may be a general-purpose processor, such as a central processing unit (Central Processing Unit, CPU for short) or other processors that can call program codes. For another example, these modules can be integrated together and implemented in the form of a system-on-a-chip (SOC for short).

图7为本申请一实施例提供的多视角三维重建设备的结构示意图，该多视角三维重建设备可以集成于终端设备或者终端设备的芯片。FIG. 7 is a schematic structural diagram of a multi-view 3D reconstruction device provided by an embodiment of the present application. The multi-view 3D reconstruction device may be integrated into a terminal device or a chip of the terminal device.

如图7所示，该多视角三维重建设备包括：处理器501、总线502和存储介质503。As shown in FIG. 7 , the multi-view three-dimensional reconstruction device includes: a processor 501 , a bus 502 and a storage medium 503 .

处理器501用于存储程序，处理器501调用存储介质503存储的程序，以执行上述图1-图4对应的方法实施例。具体实现方式和技术效果类似，这里不再赘述。The processor 501 is configured to store a program, and the processor 501 invokes the program stored in the storage medium 503 to execute the above-mentioned method embodiments corresponding to FIGS. 1-4 . The specific implementation manner and technical effect are similar, and will not be repeated here.

可选地，本申请还提供一种程序产品，例如存储介质，该存储介质上存储有计算机程序，包括程序，该程序在被处理器运行时执行上述方法对应的实施例。Optionally, the present application further provides a program product, such as a storage medium, on which a computer program is stored, including a program. When the program is run by a processor, the corresponding embodiment of the above method is executed.

在本申请所提供的几个实施例中，应该理解到，所揭露的装置和方法，可以通过其它的方式实现。例如，以上所描述的装置实施例仅仅是示意性的，例如，单元的划分，仅仅为一种逻辑功能划分，实际实现时可以有另外的划分方式，例如多个单元或组件可以结合或者可以集成到另一个系统，或一些特征可以忽略，或不执行。另一点，所显示或讨论的相互之间的耦合或直接耦合或通信连接可以是通过一些接口，装置或单元的间接耦合或通信连接，可以是电性，机械或其它的形式。In the several embodiments provided in this application, it should be understood that the disclosed devices and methods may be implemented in other ways. For example, the device embodiments described above are only illustrative. For example, the division of units is only a logical function division. In actual implementation, there may be other division methods. For example, multiple units or components can be combined or integrated. to another system, or some features may be ignored, or not implemented. In another point, the mutual coupling or direct coupling or communication connection shown or discussed may be through some interfaces, and the indirect coupling or communication connection of devices or units may be in electrical, mechanical or other forms.

所述作为分离部件说明的单元可以是或者也可以不是物理上分开的，作为单元显示的部件可以是或者也可以不是物理单元，即可以位于一个地方，或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部单元来实现本实施例方案的目的。The units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units, that is, they may be located in one place, or may be distributed to multiple network units. Part or all of the units can be selected according to actual needs to achieve the purpose of the solution of this embodiment.

另外，在本申请各个实施例中的各功能单元可以集成在一个处理单元中，也可以是各个单元单独物理存在，也可以两个或两个以上单元集成在一个单元中。上述集成的单元既可以采用硬件的形式实现，也可以采用硬件加软件功能单元的形式实现。In addition, each functional unit in each embodiment of the present application may be integrated into one processing unit, each unit may exist separately physically, or two or more units may be integrated into one unit. The above-mentioned integrated units can be implemented in the form of hardware, or in the form of hardware plus software functional units.

上述以软件功能单元的形式实现的集成的单元，可以存储在一个计算机可读取存储介质中。上述软件功能单元存储在一个存储介质中，包括若干指令用以使得一台计算机设备(可以是个人计算机，服务器，或者网络设备等)或处理器(英文：processor)执行本申请各个实施例所述方法的部分步骤。而前述的存储介质包括：U盘、移动硬盘、只读存储器(英文：Read-Only Memory，简称：ROM)、随机存取存储器(英文：Random Access Memory，简称：RAM)、磁碟或者光盘等各种可以存储程序代码的介质。The above-mentioned integrated units implemented in the form of software functional units may be stored in a computer-readable storage medium. The above-mentioned software functional units are stored in a storage medium, and include several instructions to enable a computer device (which may be a personal computer, server, or network device, etc.) or a processor (English: processor) to execute the functions described in various embodiments of the present application. part of the method. The aforementioned storage media include: U disk, mobile hard disk, read-only memory (English: Read-Only Memory, abbreviated: ROM), random access memory (English: Random Access Memory, abbreviated: RAM), magnetic disk or optical disc, etc. Various media that can store program code.

Claims

1. A method for multi-view three-dimensional reconstruction, characterized in that the method comprises:

Acquire two-dimensional images of the target object under multiple viewing angles;

According to the two-dimensional images under the multiple viewing angles, respectively determine the point cloud data under the multiple viewing angles;

Based on the preset voxel space, sampling the point cloud in the preset voxel space in the point cloud data under multiple viewing angles to obtain the target voxel point;

determining a three-dimensional surface of the target object according to the target voxel points;

Determining the intersection interval between each sampling ray and the three-dimensional surface according to the camera pose under each of the viewing angles;

Using a color network to determine the surface color of the target object based on each of the intersection intervals;

Render the target object based on the surface color.

2. The method according to claim 1, wherein said adopting a color network to determine the surface color of said target object based on each said intersection interval comprises:

According to the signed distance function network, determine the signed distance function value of the target voxel point, the normal vector and the target voxel point spacing;

determining a color value along each of the sampling rays according to the color network;

Gets the target render color value for each of said sampled rays.

3. The method according to claim 1, wherein said determining the three-dimensional surface of the target object according to the target voxel points comprises:

An isosurface extraction algorithm is used to determine the three-dimensional surface of the target object based on the target voxel points.

4. The method according to claim 1, wherein, based on the preset voxel space, the point cloud in the preset voxel space in the point cloud data under multiple viewing angles is sampled to obtain Before the target voxel point, the method also includes:

According to the target accuracy of the target object, the number of voxels sampled in the preset voxel space is set.

5. The method according to claim 1, wherein, based on the preset voxel space, the point cloud in the preset voxel space in the point cloud data under multiple viewing angles is sampled to obtain Target voxel points, including:

determining whether each pixel in the two-dimensional images under the plurality of viewing angles is a salient foreground;

According to the determination result of whether it is a salient foreground, based on the preset voxel space, the target voxel points belonging to the inside of the target object are determined in the point cloud data under multiple viewing angles.

6. The method according to claim 5, wherein the determining the target voxel points belonging to the inside of the target object comprises:

determining a first ratio according to the target voxel point and the number of viewing angles of multiple viewing angles;

If the first ratio is greater than a preset threshold, it is determined that the target voxel point belongs to the inside of the target object.

7. The method according to claim 1, wherein, based on the preset voxel space, the point cloud in the preset voxel space in the point cloud data under multiple viewing angles is sampled to obtain Before the target voxel point, the method also includes:

Determine the information of the bounding box of the target object according to the point cloud data under the plurality of viewing angles;

normalizing the bounding box of the target object, and obtaining normalized bounding box data;

Based on the preset voxel space, the point cloud in the preset voxel space in the point cloud data under multiple viewing angles is sampled to obtain the target voxel point, including:

Based on the preset voxel space, the point cloud in the preset voxel space in the normalized bounding box data under multiple viewing angles is sampled to obtain the target voxel point.

8. A multi-view three-dimensional reconstruction device, characterized in that the device comprises: an acquisition module, a determination module, a sampling module and a rendering module, wherein:

The acquiring module is configured to acquire two-dimensional images of the target object under multiple viewing angles;

The determining module is configured to respectively determine point cloud data under the multiple viewing angles according to the two-dimensional images under the multiple viewing angles;

The sampling module is configured to, based on a preset voxel space, sample point clouds in the preset voxel space in the point cloud data under multiple viewing angles to obtain target voxel points;

The determining module is specifically configured to determine the three-dimensional surface of the target object according to the target voxel points; determine the intersection interval between each sampling ray and the three-dimensional surface according to the camera poses under each of the viewing angles; The color network determines the surface color of the target object based on each of the intersection intervals;

The rendering module is configured to render the target object based on the surface color.

9. A multi-view three-dimensional reconstruction device, characterized in that the device comprises: a processor, a storage medium and a bus, the storage medium stores machine-readable instructions executable by the processor, and when the multi-view When the three-dimensional reconstruction device is running, the processor communicates with the storage medium through a bus, and the processor executes the machine-readable instructions to perform the method described in any one of claims 1-7.

10. A storage medium, wherein a computer program is stored on the storage medium, and the computer program executes the method according to any one of claims 1-7 when run by a processor.