
CN116310120A - Multi-view three-dimensional reconstruction method, device, equipment and storage medium - Google Patents

Multi-view three-dimensional reconstruction method, device, equipment and storage medium Download PDF

Info

Publication number
CN116310120A
CN116310120A
Authority
CN
China
Prior art keywords
point
target
determining
target object
voxel
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310271277.7A
Other languages
Chinese (zh)
Inventor
赵敏达
李林橙
张永强
刘柏
范长杰
胡志鹏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Netease Hangzhou Network Co Ltd
Original Assignee
Netease Hangzhou Network Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Netease Hangzhou Network Co Ltd
Priority to CN202310271277.7A priority Critical patent/CN116310120A/en
Publication of CN116310120A publication Critical patent/CN116310120A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 17/00 Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 15/00 3D [Three Dimensional] image rendering
    • G06T 15/005 General purpose rendering architectures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 15/00 3D [Three Dimensional] image rendering
    • G06T 15/50 Lighting effects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/70 Determining position or orientation of objects or cameras
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/90 Determination of colour characteristics
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 Image acquisition modality
    • G06T 2207/10028 Range image; Depth image; 3D point clouds
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20084 Artificial neural networks [ANN]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/30 Subject of image; Context of image processing
    • G06T 2207/30244 Camera pose

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Graphics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Geometry (AREA)
  • Software Systems (AREA)
  • Image Generation (AREA)

Abstract

The application provides a multi-view three-dimensional reconstruction method, device, equipment and storage medium, and relates to the technical field of three-dimensional reconstruction. The method comprises the following steps: acquiring two-dimensional images of a target object under a plurality of view angles; respectively determining point cloud data under a plurality of view angles according to the two-dimensional images under the plurality of view angles; sampling point clouds in a preset voxel space in point cloud data under a plurality of view angles based on the preset voxel space to obtain target voxel points; determining a three-dimensional surface of the target object according to the target voxel point; determining an intersection point interval of each sampling ray and the three-dimensional surface according to the camera pose under each view angle; determining the surface color of the target object based on each intersection point interval by adopting a color network; the target object is rendered based on the surface color. Compared with the prior art, a large amount of redundant and invalid calculation is avoided, and the efficiency and the accuracy of three-dimensional reconstruction are improved.

Description

Multi-view three-dimensional reconstruction method, device, equipment and storage medium
Technical Field
The present application relates to the field of three-dimensional reconstruction technologies, and in particular, to a multi-view three-dimensional reconstruction method, apparatus, device, and storage medium.
Background
Multi-view three-dimensional reconstruction is a technique for reconstructing the three-dimensional (3D) shape of an object using pictures of the object taken from different view angles. In recent years, with the development of 3D games, architectural design, and virtual reality, the demand for three-dimensional reconstruction of real-world objects has increased, and three-dimensional reconstruction based on deep learning has been developing rapidly.
The prior art generally uses neural surface reconstruction methods based on a signed distance function (Signed Distance Function, SDF). Such a method obtains an initial bounding box (BBox) of the object from a sparse point cloud calculated by a structure-from-motion (Structure From Motion, SFM) method. Sampling rays for observation are designed based on the camera pose estimation result, and sampling is performed within the BBox. For each target voxel point, a multi-layer perceptron (Multilayer Perceptron, MLP) predicts the SDF value as well as the color at that target voxel point. The SDF values are then converted into volume density values through a Logistic probability density distribution function, and the SDF representation is trained using a neural volume rendering method in combination with the predicted color of each target voxel point.
However, in such a reconstruction mode, since the actual object to be reconstructed occupies only a part of the BBox, a large amount of redundant and invalid computation exists in ray sampling performed over the whole BBox, which reduces the optimization efficiency and degrades the final reconstruction quality.
Disclosure of Invention
The invention aims to overcome the defects in the prior art and provide a multi-view three-dimensional reconstruction method, device, equipment and storage medium, so as to solve the problem in the prior art that a large amount of redundant and invalid computation reduces the optimization efficiency and degrades the reconstruction effect.
In order to achieve the above purpose, the technical solution adopted in the embodiment of the present application is as follows:
in a first aspect, an embodiment of the present application provides a multi-view three-dimensional reconstruction method, including:
acquiring two-dimensional images of a target object under a plurality of view angles;
respectively determining point cloud data under the multiple view angles according to the two-dimensional images under the multiple view angles;
sampling point clouds in a preset voxel space in point cloud data under a plurality of view angles based on the preset voxel space to obtain target voxel points;
determining the three-dimensional surface of the target object according to the target voxel point;
determining an intersection point interval of each sampling ray and the three-dimensional surface according to the camera pose under each view angle;
determining the surface color of the target object based on each intersection point interval by adopting a color network;
rendering the target object based on the surface color.
In a second aspect, another embodiment of the present application provides a multi-view three-dimensional reconstruction apparatus, the apparatus comprising: the device comprises an acquisition module, a determination module, a sampling module and a rendering module, wherein:
the acquisition module is used for acquiring two-dimensional images of the target object under a plurality of view angles;
the determining module is used for respectively determining point cloud data under the multiple view angles according to the two-dimensional images under the multiple view angles;
the sampling module is used for sampling point clouds in the preset voxel space in the point cloud data under a plurality of view angles based on the preset voxel space to obtain target voxel points;
the determining module is specifically configured to determine a three-dimensional surface of the target object according to the target voxel point; determining an intersection point interval of each sampling ray and the three-dimensional surface according to the camera pose under each view angle; determining the surface color of the target object based on each intersection point interval by adopting a color network;
the rendering module is used for rendering the target object based on the surface color.
In a third aspect, another embodiment of the present application provides a multi-view three-dimensional reconstruction apparatus, including: a processor, a storage medium and a bus, the storage medium storing machine-readable instructions executable by the processor, the processor communicating with the storage medium via the bus when the multi-view three-dimensional reconstruction apparatus is running, and the processor executing the machine-readable instructions to perform the steps of the method according to any of the first aspect above.
In a fourth aspect, another embodiment of the present application provides a storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the method according to any of the first aspects described above.
The beneficial effects of this application are: with the multi-view three-dimensional reconstruction method provided by the application, two-dimensional images of the target object under a plurality of view angles are obtained; point cloud data under the plurality of view angles are then determined from these two-dimensional images; the point clouds within a preset voxel space are sampled based on the preset voxel space to determine target voxel points; the three-dimensional surface of the target object is determined based on the target voxel points; then, when determining the surface color of the target object from the sampling rays, the intersection interval of each sampling ray with the three-dimensional surface is first determined, and the color of each sampling ray is determined by the color network only within that intersection interval, thereby determining the surface color of the target object. In this way, the sampling range of the sampling rays is reduced, redundant and invalid computation is avoided, and the efficiency and accuracy of three-dimensional reconstruction are improved.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the embodiments will be briefly described below, it being understood that the following drawings only illustrate some embodiments of the present application and therefore should not be considered limiting the scope, and that other related drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a flow chart of a multi-view three-dimensional reconstruction method according to an embodiment of the present application;
fig. 2 is a flow chart of a multi-view three-dimensional reconstruction method according to another embodiment of the present application;
fig. 3 is a flow chart of a multi-view three-dimensional reconstruction method according to another embodiment of the present application;
fig. 4 is a flow chart of a multi-view three-dimensional reconstruction method according to another embodiment of the present application;
FIG. 5 is a schematic structural diagram of a multi-view three-dimensional reconstruction device according to an embodiment of the present disclosure;
FIG. 6 is a schematic structural diagram of a multi-view three-dimensional reconstruction device according to another embodiment of the present application;
fig. 7 is a schematic structural diagram of a multi-view three-dimensional reconstruction device according to an embodiment of the present application.
Detailed Description
For the purposes of making the objects, technical solutions and advantages of the embodiments of the present application more clear, the technical solutions of the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is apparent that the described embodiments are some embodiments of the present application, but not all embodiments.
The components of the embodiments of the present application, which are generally described and illustrated in the figures herein, may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present application, as provided in the accompanying drawings, is not intended to limit the scope of the application, as claimed, but is merely representative of selected embodiments of the application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the present application without making any inventive effort, are intended to be within the scope of the present application.
Additionally, a flowchart, as used in this application, illustrates operations implemented in accordance with some embodiments of the present application. It should be understood that the operations of the flow diagrams may be implemented out of order and that steps without logical context may be performed in reverse order or concurrently. Moreover, one or more other operations may be added to the flow diagrams and one or more operations may be removed from the flow diagrams as directed by those skilled in the art.
To facilitate an understanding of the present application, some of the terms used in this application are explained below:
Salient object detection (salient object detection, SOD): aims to identify the most visually distinctive regions of an image. The portions of the image belonging to the salient foreground are marked using conventional computer vision techniques or data-driven deep learning algorithms.
Signed distance function (Signed Distance Function, SDF): an implicit representation of a three-dimensional object surface. Over a limited region of space it gives the distance from a point to the object surface and also defines the sign of that distance: the sign is positive for points inside the object, negative for points outside, and 0 for points on the surface. The SDF may be implicitly characterized by a multi-layer perceptron (MLP) network.
Bounding box (Bounding Box, BBox): the smallest box in 3D space that encloses the object.
Structure from motion (Structure From Motion, SFM): a technique for estimating three-dimensional structure from a series of two-dimensional images containing visual motion information; with it, the camera parameters at each view angle and a sparse point cloud can be obtained.
Marching Cubes: an iso-surface extraction algorithm, also known as a surface rendering algorithm, for extracting an iso-surface from volumetric data. Its core idea is that, given a three-dimensional grid and a scalar value at each vertex, the vertices are compared with a user-specified threshold to determine which edges of each voxel intersect the iso-surface; the corresponding triangular patches are created, and the patches of all cubes on the iso-surface boundary are connected to obtain the surface.
The multi-view three-dimensional reconstruction method provided by the embodiment of the application can be executed by a terminal or a server. A terminal is any device having computing hardware capable of supporting and executing a corresponding software product. The following explains a multi-view three-dimensional reconstruction method provided by the embodiment of the present application in conjunction with a plurality of specific application examples. Fig. 1 is a flow chart of a multi-view three-dimensional reconstruction method according to an embodiment of the present application, as shown in fig. 1, the method includes:
s101: two-dimensional images of a target object at a plurality of viewing angles are acquired.
In some possible embodiments of the present application, the plurality of view angles may be, for example, 50 to 80 view angles. It should be understood that this is only illustrative: the two-dimensional images are images captured from view angles distributed 360° around the target object, the number of view angles may also be any integer smaller than 50 or larger than 80, and the specific number of view angles can be flexibly adjusted according to the needs of the user and is not limited to the above example.
In some possible embodiments, the manner of acquiring the two-dimensional images at multiple viewing angles may be, for example: and acquiring a plurality of two-dimensional images obtained by shooting a target object in a real environment under a plurality of view angles by a preset camera, wherein the two-dimensional images respectively correspond to the view angles.
S102: and respectively determining point cloud data under the multiple view angles according to the two-dimensional images under the multiple view angles.
In some possible embodiments, for example, the multiple two-dimensional images may be estimated separately to obtain three-dimensional point cloud data under multiple views, where the three-dimensional point cloud under each view may be, for example, a sparse point cloud. For example, a preset SFM network may be used to estimate the two-dimensional image under each view angle, so as to obtain three-dimensional point cloud data under each view angle and camera pose under multiple view angles.
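As an illustrative sketch of this step (not part of the claimed method), the SFM estimation can be run with an off-the-shelf tool such as COLMAP; the directory names below are assumptions chosen for the example.

```python
# Illustrative sketch: obtain camera poses and a sparse point cloud with COLMAP's
# standard SFM pipeline, invoked from Python. Paths ("images", "work") are assumptions.
import subprocess
from pathlib import Path

def run_sfm(image_dir: str = "images", work_dir: str = "work") -> Path:
    """Run feature extraction, matching and incremental mapping; return the sparse model dir."""
    work = Path(work_dir)
    work.mkdir(parents=True, exist_ok=True)
    db = work / "database.db"
    sparse = work / "sparse"
    sparse.mkdir(exist_ok=True)
    subprocess.run(["colmap", "feature_extractor",
                    "--database_path", str(db), "--image_path", image_dir], check=True)
    subprocess.run(["colmap", "exhaustive_matcher",
                    "--database_path", str(db)], check=True)
    subprocess.run(["colmap", "mapper",
                    "--database_path", str(db), "--image_path", image_dir,
                    "--output_path", str(sparse)], check=True)
    return sparse  # contains cameras, images (poses) and points3D (sparse point cloud)
```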
S103: and sampling point clouds in the preset voxel space in the point cloud data under a plurality of view angles based on the preset voxel space to obtain target voxel points.
In an embodiment of the present application, S103 may determine, for example, information of a bounding box of the target object according to point cloud data under multiple view angles, where the information of the bounding box of the target object is information of a minimum bounding box surrounding the target object.
The manner of determining the bounding box of the target object may be, for example: the camera pose under each view angle can be calibrated according to the three-dimensional point cloud data under each view angle to obtain the target camera pose under each view angle, and the boundary frame of the target object is estimated according to the updated target camera pose, so that the boundary frame information of the target object is obtained; in other words, in the embodiment of the application, the frame of the target object is estimated based on the calibrated target camera pose under a plurality of view angles, so that the information of the boundary frame of the target object is obtained, the automatic estimation of the boundary frame of the target object is realized, and the estimated boundary frame of the target object can be ensured to be tighter.
For the information of the bounding box of the target object, in some possible embodiments, the above-mentioned SFM estimation result of the two-dimensional images of multiple perspectives may also be used to perform automatic object frame estimation, so as to obtain the information of the bounding box of the target object.
In the embodiment of the present application, after the bounding box BBox of the target object is determined, the coordinates of the bounding box of the target object in three-dimensional space first need to be normalized to [-1, 1] to obtain normalized point cloud data, in the following manner:

the coordinates of the BBox in three-dimensional space are normalized to [-1, 1], and the scaling matrix scale_mat ∈ R^{4×4} is calculated, where B ∈ R^3. The offsets are taken as the box center,

c_i = (max(B_i) + min(B_i)) / 2, i ∈ (x, y, z),

and the half-extent of the longest side of the BBox is used as the scale, so that scale_mat contains the scale on its diagonal and the offsets c_x, c_y, c_z in its last column.

R^{4×4} denotes a 4×4 matrix, R^3 denotes three-dimensional vector space, and c_x, c_y, c_z denote the offsets in the x, y, z directions. B_x, B_y, B_z denote the coordinate values of the BBox in the x, y, z directions.
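A minimal sketch of this normalization step, assuming the sparse point cloud is given as an N×3 array and that the offset and scale take the center-and-half-extent form described above; the function and variable names are illustrative.

```python
import numpy as np

def compute_scale_mat(points: np.ndarray) -> np.ndarray:
    """Map the bounding box of `points` (N x 3) into the cube [-1, 1]^3.

    Returns scale_mat in R^{4x4}; applying its inverse to homogeneous world
    coordinates normalizes them, matching the c_i / scale definition above."""
    bb_min = points.min(axis=0)
    bb_max = points.max(axis=0)
    center = (bb_max + bb_min) / 2.0          # offsets c_x, c_y, c_z
    scale = (bb_max - bb_min).max() / 2.0     # isotropic half-extent of the longest side
    scale_mat = np.eye(4)
    scale_mat[:3, :3] *= scale
    scale_mat[:3, 3] = center
    return scale_mat
```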
In the method provided by the application, the bounding box of the target object is automatically calculated and estimated instead of being estimated manually, so that the application complexity in the three-dimensional reconstruction process is reduced.
In the embodiment of the present application, the preset voxel space needs to be initialized first, and the initialization manner may be, for example: and setting the number of voxels sampled in a preset voxel space according to the target precision of the target object.
The target precision represents the desired fineness of the three-dimensional model of the object; different levels of fineness correspond to different target precisions, and the higher the fineness, the higher the target precision. For example, medium fineness corresponds to sampling with 100 voxels per dimension in the three dimensions. For example, based on the preset voxel space and according to the information of the bounding box of the target object, the point clouds within the preset voxel space in the normalized point cloud data under the plurality of view angles are sampled, and the target voxel points corresponding to the plurality of view angles are determined.
The preset voxel space is initialized in three-dimensional space as V ∈ [-1, 1]^3, and the number of voxels sampled in each dimension is set to ρ, where ρ is determined according to the target precision of the target object: the higher the target precision, the larger ρ is set; the lower the target precision, the smaller ρ is set. The coordinate of the k-th sample in each dimension of the three-dimensional space is

v_k = -1 + 2k / (ρ - 1), k = 0, 1, ..., ρ - 1.

A count array O, recording for each voxel of V ∈ [-1, 1]^3 whether it belongs to the interior of the object, is defined with entries O_{i,j,k}.

For the n-th view, let the currently calculated transformation matrix from the world coordinate system to the current 2D image be w2c_mat_n ∈ R^{3×4}. Each point (i, j, k) of V ∈ [-1, 1]^3 is projected into the current 2D image as

p = w2c_mat_n · (v_i, v_j, v_k, 1)^T, p = (p_0, p_1, p_2),

and its pixel coordinates in the image are

(x, y) = (p_0 / p_2, p_1 / p_2).

The count entry of the voxel is incremented when the projected pixel falls in the salient foreground of the current view (the 2D saliency image I_n described below), i.e.

O_{i,j,k} = O_{i,j,k} + I_n(x, y).

R^{3×4} denotes a 3×4 matrix, p_0, p_1, p_2 denote the components of p, and O_{i,j,k} denotes the entry of the count array O at three-dimensional index (i, j, k).
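The following sketch illustrates this voxel-voting step under simplifying assumptions: `w2c_mats` is a list of 3×4 world-to-image matrices that already include the camera intrinsics, and `masks` is the list of binary saliency images I_n; all names are illustrative.

```python
import numpy as np

def count_foreground_votes(masks, w2c_mats, rho=100):
    """Accumulate, per voxel of the grid V in [-1,1]^3, how many views project it
    into the salient foreground. Returns the count array O of shape (rho, rho, rho)."""
    axis = np.linspace(-1.0, 1.0, rho)
    vx, vy, vz = np.meshgrid(axis, axis, axis, indexing="ij")
    pts = np.stack([vx, vy, vz, np.ones_like(vx)], axis=-1).reshape(-1, 4)  # homogeneous
    counts = np.zeros(pts.shape[0], dtype=np.int32)
    for mask, w2c in zip(masks, w2c_mats):
        h, w = mask.shape
        p = pts @ w2c.T                                  # (num_voxels, 3): p0, p1, p2
        x = np.round(p[:, 0] / p[:, 2]).astype(int)      # pixel column
        y = np.round(p[:, 1] / p[:, 2]).astype(int)      # pixel row
        valid = (p[:, 2] > 0) & (x >= 0) & (x < w) & (y >= 0) & (y < h)
        counts[valid] += mask[y[valid], x[valid]]        # I_n(x, y) is 1 for foreground
    return counts.reshape(rho, rho, rho)
```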
S104: and determining the three-dimensional surface of the target object according to the target voxel point.
In one embodiment of the present application, the manner of determining the three-dimensional surface of the target object may be, for example: and determining the three-dimensional surface of the target object based on the target voxel point by adopting an isosurface extraction algorithm.
In the embodiment of the present application, the above process of determining the target voxel points is repeated for the two-dimensional images under all view angles, and the final count array O is obtained. For each position (i, j, k) in O, if

O_{i,j,k} / N > γ,

where N is the number of view angles, the voxel point is considered to belong to the interior of the object; otherwise it belongs to the exterior of the object. The threshold γ ∈ (0, 1). A three-dimensional surface (3D convex hull) formed by the voxel points belonging to the interior of the object is then extracted using the Marching Cubes iso-surface extraction algorithm.
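Continuing the sketch above, the interior mask and its surface can then be obtained with a standard Marching Cubes implementation; `skimage.measure.marching_cubes` is used here as one available option, and the default threshold value is an assumption.

```python
import numpy as np
from skimage import measure

def extract_surface(counts: np.ndarray, num_views: int, gamma: float = 0.5):
    """Threshold the per-voxel foreground ratio and extract the iso-surface."""
    occupancy = (counts.astype(np.float32) / num_views > gamma).astype(np.float32)
    # level=0.5 places the iso-surface between interior (1) and exterior (0) voxels
    verts, faces, normals, values = measure.marching_cubes(occupancy, level=0.5)
    # convert voxel indices back to [-1, 1] coordinates
    rho = counts.shape[0]
    verts = verts / (rho - 1) * 2.0 - 1.0
    return verts, faces
```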
S105: and determining an intersection point interval of each sampling ray and the three-dimensional surface according to the camera pose under each view angle.
According to the camera pose at each view angle, the intersection points of each sampling ray with the surface of the computed 3D convex hull of the object are calculated; the first and second intersection points along the ray are called the near intersection point and the far intersection point, respectively. Let the coordinates of the n-th camera origin in the world coordinate system be o_n, and let the unit direction vector of the ray to be sampled be d_{i,j} (the ray is the straight line passing through the camera origin and pixel (i, j) of the 2D image); the distance from the scene center point to the camera origin is ||o_n||. From this, the near intersection point of the ray with the BBox calculated in step 1, denoted t_near^bbox, and the far intersection point, denoted t_far^bbox, can be obtained.

In addition, if the ray intersects the 3D convex hull, the first intersection point of the ray from o_n with the 3D convex hull is recorded, the distance from o_n to this first intersection point is taken as the near intersection t_near, and the distance to the last intersection point is taken as the far intersection t_far. When the ray has no intersection point with the 3D convex hull, the intersection points of the ray with the BBox are used instead; that is, if the ray d_{i,j} has no intersection point with the three-dimensional convex hull, the following substitution is used:

t_near = t_near^bbox, t_far = t_far^bbox.

At this point, the effective near intersection t_near and far intersection t_far of each sampling ray have been obtained. There may also be cases where the sampling ray is tangent to the target object, i.e. the first intersection point and the second intersection point of the sampling ray coincide.
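As a sketch of the fallback case described above, the near and far intersections of a ray with the normalized BBox can be computed with the standard slab method; intersecting the ray with the 3D convex hull mesh itself would typically be done with a mesh ray-casting library and is not shown here.

```python
import numpy as np

def ray_box_near_far(o, d, box_min=-1.0, box_max=1.0):
    """Slab-method intersection of a ray (origin o, unit direction d) with the
    axis-aligned box [box_min, box_max]^3. Returns (t_near, t_far), or None if missed."""
    inv_d = 1.0 / np.where(np.abs(d) > 1e-12, d, 1e-12)   # avoid division by zero
    t0 = (box_min - o) * inv_d
    t1 = (box_max - o) * inv_d
    t_near = np.minimum(t0, t1).max()
    t_far = np.maximum(t0, t1).min()
    if t_far < max(t_near, 0.0):
        return None
    return max(t_near, 0.0), t_far
```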
S106: and determining the surface color of the target object based on each intersection point interval by adopting a color network.
The color network is a pre-trained network model built with a multi-layer perceptron (Multilayer Perceptron, MLP) network; its parameters are obtained through training, and its output is the predicted color value of each point in space.
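A minimal sketch of such a color network is given below, assuming a PyTorch MLP that takes a 3D point, its surface normal, the viewing direction and a feature vector (inputs commonly used by neural surface renderers); the layer sizes and input choices are assumptions.

```python
import torch
import torch.nn as nn

class ColorNetwork(nn.Module):
    """MLP predicting an RGB color for each 3D sample point."""
    def __init__(self, feature_dim: int = 256, hidden: int = 256):
        super().__init__()
        in_dim = 3 + 3 + 3 + feature_dim   # point, normal, view direction, feature
        self.mlp = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(inplace=True),
            nn.Linear(hidden, hidden), nn.ReLU(inplace=True),
            nn.Linear(hidden, 3),
        )

    def forward(self, points, normals, view_dirs, features):
        x = torch.cat([points, normals, view_dirs, features], dim=-1)
        return torch.sigmoid(self.mlp(x))   # colors in [0, 1]
```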
That is, at this point the three-dimensional surface and the surface color of the target object, i.e. the three-dimensional model data of the target object, have been obtained, and rendering of the target object in the three-dimensional world can then be performed directly based on the obtained three-dimensional model data.
S107: the target object is rendered based on the surface color.
After the three-dimensional model data of the target object are obtained, the three-dimensional model data can be stored, and according to the follow-up actual business requirements, the three-dimensional model data are opened and loaded through model rendering software, so that the rendering and the displaying of the three-dimensional model corresponding to the target object are realized.
In the three-dimensional reconstruction method provided by the embodiment of the invention, in the process of collecting the target voxel points from the three-dimensional point cloud, the bounding box is estimated from the plurality of two-dimensional images under the plurality of view angles rather than taken from a manually estimated bounding box. This automatic estimation yields a tight object bounding box, which makes the sampling of the three-dimensional point cloud more accurate, makes the three-dimensional model data obtained by the three-dimensional reconstruction more accurate, and thereby improves the reconstruction precision of the three-dimensional model and the completeness of model details.
According to the multi-view three-dimensional reconstruction method, two-dimensional images of the target object under a plurality of view angles are obtained; point cloud data under the plurality of view angles and the information of the bounding box of the target object are then determined from these two-dimensional images; the point clouds within the preset voxel space are sampled based on the preset voxel space to determine target voxel points; the three-dimensional surface of the target object is determined based on the target voxel points; then, when determining the surface color of the target object from the sampling rays, the intersection interval of each sampling ray with the three-dimensional surface is first determined, and the color of each sampling ray is determined by the color network only within that intersection interval, thereby determining the surface color of the target object. In this way, the sampling range of the sampling rays is reduced, redundant and invalid computation is avoided, and the efficiency and accuracy of three-dimensional reconstruction are improved.
Optionally, on the basis of the foregoing embodiment, the embodiment of the present application may further provide a multi-view three-dimensional reconstruction method, and an implementation process of determining a surface color of a target object in the foregoing method is described below with reference to the accompanying drawings. Fig. 2 is a flow chart of a multi-view three-dimensional reconstruction method according to another embodiment of the present application, as shown in fig. 2, S106 may include:
s111: and determining the symbol distance function value, the normal vector and the target voxel point distance of the target voxel point according to the symbol distance function network.
In embodiments of the present application, for each sampling ray d_{i,j} at the n-th view angle, 3D sampling is carried out between the near intersection t_near and the far intersection t_far, and the signed distance function value, the normal vector and the target voxel point distance of each target voxel point are determined through the SDF network.
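A sketch of the 3D sampling between the near and far intersections is shown below; uniform sampling is assumed here, although stratified or importance sampling could equally be used.

```python
import torch

def sample_along_rays(rays_o, rays_d, t_near, t_far, n_samples=64):
    """Sample points p = o + t * d with t uniformly spaced in [t_near, t_far].

    rays_o, rays_d: (R, 3); t_near, t_far: (R,). Returns points (R, n_samples, 3)
    and the per-ray sample distances t of shape (R, n_samples)."""
    steps = torch.linspace(0.0, 1.0, n_samples, device=rays_o.device)      # (S,)
    t = t_near[:, None] + (t_far - t_near)[:, None] * steps[None, :]       # (R, S)
    points = rays_o[:, None, :] + t[..., None] * rays_d[:, None, :]        # (R, S, 3)
    return points, t
```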
S112: from the color network, color values along each sampled ray are determined.
Similarly, when determining the color value of each sampling ray, 3D sampling is likewise carried out between t_near and t_far for the sampling ray d_{i,j} at the n-th view angle, and the color values along each sampling ray are determined through the color network.
S113: a target rendering color value for each sampled ray is obtained.
In an embodiment of the present application, the target rendering color value of each sampled ray may be determined, for example, by converting the weighted SDF values into volume density values through a Logistic cumulative distribution function, and then performing volume rendering according to the volume density values, the color values and the sampling intervals of all the sampling points on each ray, so as to obtain the finally rendered color value of each ray.
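One common formulation of this step, a NeuS-style conversion of SDF values to opacities via the logistic (sigmoid) CDF followed by alpha compositing, is sketched below; the inverse standard deviation `s` is normally a learned scalar and is treated here as a given parameter.

```python
import torch

def render_ray_colors(sdf, colors, s=64.0):
    """Volume-render per-ray colors from per-sample SDF values and colors.

    sdf: (R, S), colors: (R, S, 3); samples are ordered from near to far.
    Segment opacity is derived from the logistic CDF of consecutive SDF values."""
    cdf = torch.sigmoid(s * sdf)                                             # Phi_s(sdf)
    alpha = ((cdf[:, :-1] - cdf[:, 1:]) / (cdf[:, :-1] + 1e-6)).clamp(min=0.0)  # (R, S-1)
    trans = torch.cumprod(
        torch.cat([torch.ones_like(alpha[:, :1]), 1.0 - alpha + 1e-7], dim=-1), dim=-1
    )[:, :-1]                                                                # transmittance
    weights = alpha * trans                                                  # (R, S-1)
    rgb = (weights[..., None] * colors[:, :-1, :]).sum(dim=1)                # (R, 3)
    return rgb, weights
```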
Optionally, on the basis of the foregoing embodiment, the embodiment of the present application may further provide a multi-view three-dimensional reconstruction method, and an implementation process of determining the target voxel point in the foregoing method is described below with reference to the accompanying drawings. Fig. 3 is a flow chart of a multi-view three-dimensional reconstruction method according to another embodiment of the present application, as shown in fig. 3, S103 may include:
s121: it is determined whether each pixel point in the two-dimensional image at a plurality of viewing angles is a saliency foreground.
The salient foreground is an object instance with a clear outline in a two-dimensional image. First, saliency detection can be performed based on a salient-foreground detection algorithm; in a typical two-dimensional image, an object surrounded by high-frequency detail is an object instance, so saliency detection identifies one or more object instances with clear contours in the two-dimensional images under the plurality of view angles. It is then judged, for each pixel, whether its position lies inside such an object instance. If so, the pixel falls within the region of an object instance with a clear contour, i.e. it is a salient-foreground pixel; otherwise, the pixel lies outside the object instances with clear contours, i.e. it belongs to the background and not to the salient foreground.
For example, for the two-dimensional image at each view angle, each pixel in the image may be marked as 1 if it belongs to the salient foreground, or as 0 if it does not (i.e. belongs to the background). Assuming there are N views, the 2D saliency image at each view is I_n ∈ R^{w×h}, where w and h are the image width and height, respectively. If the pixel at coordinates [x, y] belongs to the salient foreground, then I_n(x, y) = 1; otherwise I_n(x, y) = 0.
R^{w×h} denotes a w×h matrix, and I_n(x, y) denotes the pixel of I_n at coordinates (x, y).
S122: and determining target voxel points belonging to the interior of the target object in the point cloud data under a plurality of view angles based on a preset voxel space according to the determination result of whether the target object is the significance prospect.
In the embodiment of the present application, the manner of determining a target voxel point belonging to the interior of the target object may be, for example: determining a first ratio according to the number of view angles in which the voxel point is projected into the salient foreground and the total number of the plurality of view angles; if the first ratio is greater than a preset threshold, the voxel point is determined to belong to the interior of the target object.
Fig. 4 is a flow chart of a multi-view three-dimensional reconstruction method according to an embodiment of the present application, and as shown in fig. 4, a complete flow chart is used to explain a training flow of a symbol distance function network and a color network in the three-dimensional reconstruction method according to the present application:
s201: and inputting a two-dimensional image, a camera pose and sparse point clouds of the target object under each view angle.
S202: and obtaining BBox and normalizing the sparse point cloud.
And obtaining a boundary box BBox of the target object according to the sparse point cloud result obtained by the SFM, normalizing x, y and z coordinates of the object in a three-dimensional space to [ -1,1], and calculating a scaling matrix.
S203: the salient foreground of the two-dimensional image at each view angle is calculated.
And calculating the salient foreground region in the two-dimensional picture taken at each view angle, wherein each pixel of the two-dimensional image is marked as 1 if it belongs to the foreground and as 0 if it belongs to the background.
S204: the voxel space is initialized and it is determined whether it belongs to the interior of the target object.
The voxel space is initialized within a unit space in the three-dimensional space, and the number of voxels sampled in each dimension is set to ρ. Each point of the initialized voxel space is then projected into the 2D image according to the camera parameters, and it is judged whether each projected voxel point falls in the salient foreground or the background of the 2D image.
And judging the proportion of each voxel point belonging to the foreground in the 2D image under each view angle, if the proportion exceeds a threshold value, determining that the voxel point belongs to the interior of the target object, and otherwise, determining that the voxel point belongs to the exterior of the target object.
S205: and generating a three-dimensional surface of the target object according to the target voxel point.
And determining points belonging to the interior of the target object as target voxel points, and establishing a three-dimensional surface of the target object, namely a 3D convex hull, according to the determination result of the target voxel points.
S206: and calculating an intersection point interval of each sampling ray and the target object.
And calculating the surface intersection point of each sampling ray and the 3D convex hull of the object obtained by calculation according to the camera pose under each view angle, calculating the closest point and the farthest point, and determining the intersection point interval of each sampling ray and the target object based on the closest point and the farthest point.
S207: a portion of the rays are randomly sampled and three-dimensional model data is calculated.
Then, 3D sampling is carried out along each observation ray between the nearest point and the farthest point on the 3D convex hull, and the SDF value, the normal vector, the sampling point distance and the color value along the observation ray are calculated for the sampling points through the signed distance function (Signed Distance Function, SDF) network and the color network; the SDF value, the normal vector, the sampling point distance and the color value along the observation ray are collectively referred to as three-dimensional model data.
S208: and (5) rendering the volume density.
The weighted SDF values are then converted into volume density values through a Logistic cumulative distribution function, and volume rendering is performed according to the volume density values, the color values and the sampling intervals of all the sampling points on each ray to obtain the finally rendered color value of each ray.
S209: a loss function is calculated.
S210: network parameters are optimized.
Wherein the optimized network parameters include optimized symbol distance function network parameters and color network parameters, respectively.
S211: whether the maximum number of iterations is reached.
If not, return to S207, continue to sample part of the rays randomly and calculate.
If yes, then execution proceeds to S212.
S212: network parameters are saved.
If the maximum number of iterations has been reached, it is determined that the iteration of the signed distance function network and the color network is complete, and the trained model is obtained and put into use.
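Putting the preceding sketches together, the training flow of S207 to S212 can be summarized as follows. `sdf_network` and `ray_batcher` are assumed interfaces (the SDF network returning per-sample SDF values, gradients and features, and the batcher returning random rays with their near/far bounds and ground-truth colors); the photometric L1 loss and eikonal regularization are typical choices, and the loss weight, learning rate and iteration count are illustrative.

```python
import torch
import torch.nn.functional as F

def train(sdf_network, color_network, ray_batcher, max_iters=300_000, lr=5e-4):
    """Jointly optimize the SDF network and color network by volume rendering."""
    params = list(sdf_network.parameters()) + list(color_network.parameters())
    optimizer = torch.optim.Adam(params, lr=lr)
    for it in range(max_iters):                                   # S211: stop at max iterations
        rays_o, rays_d, t_near, t_far, gt_rgb = ray_batcher()     # S207: random ray batch
        pts, t = sample_along_rays(rays_o, rays_d, t_near, t_far)
        sdf, gradients, features = sdf_network(pts)               # assumed interface
        normals = F.normalize(gradients, dim=-1)
        colors = color_network(pts, normals,
                               rays_d[:, None, :].expand_as(pts), features)
        pred_rgb, _ = render_ray_colors(sdf, colors)              # S208: volume rendering
        loss = (pred_rgb - gt_rgb).abs().mean()                   # S209: photometric L1 loss
        loss = loss + 0.1 * ((gradients.norm(dim=-1) - 1.0) ** 2).mean()  # eikonal term
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()                                          # S210: optimize parameters
    torch.save({"sdf": sdf_network.state_dict(),
                "color": color_network.state_dict()}, "model.pt")  # S212: save parameters
```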
With the multi-view three-dimensional reconstruction method provided by the application, for the problem of three-dimensional reconstruction and rendering from two-dimensional images, a method is provided that performs three-dimensional reconstruction of two-dimensional images based on the three-dimensional surface. The sampling of the three-dimensional sampling rays can be restricted to an effective range around the target object, and the color of each sampling ray is determined by the color network only within the intersection interval, thereby determining the surface color of the target object. This reduces the sampling range of the sampling rays, improves the efficiency of determining the surface color of the target object, improves the rendering efficiency of the target object, and improves the efficiency and precision of three-dimensional reconstruction.
The multi-view three-dimensional reconstruction device provided in the present application is explained below with reference to the accompanying drawings, and the multi-view three-dimensional reconstruction device may execute any one of the multi-view three-dimensional reconstruction methods of fig. 1 to 4, and the specific implementation and the beneficial effects thereof are referred to the above and are not repeated below.
Fig. 5 is a schematic structural diagram of a multi-view three-dimensional reconstruction device according to an embodiment of the present application, as shown in fig. 5, where the device includes: an acquisition module 201, a determination module 202, a sampling module 203, and a rendering module 204, wherein:
an acquisition module 201, configured to acquire two-dimensional images of a target object under multiple viewing angles;
a determining module 202, configured to determine point cloud data under a plurality of view angles according to two-dimensional images under the plurality of view angles;
the sampling module 203 is configured to sample, based on a preset voxel space, point clouds in the preset voxel space in the point cloud data under multiple view angles, so as to obtain a target voxel point;
a determining module 202, specifically configured to determine a three-dimensional surface of the target object according to the target voxel point; determining an intersection point interval of each sampling ray and the three-dimensional surface according to the camera pose under each view angle; determining the surface color of the target object based on each intersection point interval by adopting a color network;
a rendering module 204, configured to render the target object based on the surface color.
Optionally, the determining module 202 is specifically configured to determine, according to the signed distance function network, a signed distance function value, a normal vector, and a target voxel point distance of the target voxel point; determine a color value along each sampled ray according to the color network; and obtain a target rendering color value for each sampled ray.
Optionally, the determining module 202 is specifically configured to determine a three-dimensional surface of the target object based on the target voxel point using an iso-surface extraction algorithm.
Optionally, on the basis of the foregoing embodiments, embodiments of the present application may further provide a multi-view three-dimensional reconstruction device, where an implementation procedure of the device illustrated in fig. 5 is described below with reference to the accompanying drawings. Fig. 6 is a schematic structural diagram of a multi-view three-dimensional reconstruction device according to another embodiment of the present application, as shown in fig. 6, where the device further includes: the setting module 205 is configured to set, according to the target precision of the target object, the number of voxels sampled in the preset voxel space.
Optionally, the determining module 202 is specifically configured to determine whether each pixel point in the two-dimensional image under multiple viewing angles is a saliency foreground; and determining target voxel points belonging to the interior of the target object in the point cloud data under a plurality of view angles based on a preset voxel space according to the determination result of whether the target object is the significance prospect.
Optionally, the determining module 202 is specifically configured to determine a first ratio according to the target voxel point and the number of views of the plurality of views; if the first ratio is greater than the preset threshold, determining that the target voxel point belongs to the interior of the target object.
Optionally, the determining module 202 is specifically configured to determine information of a bounding box of the target object according to the point cloud data under the multiple view angles;
the obtaining module 201 is specifically configured to normalize a bounding box of the target object, and obtain normalized bounding box data;
the sampling module 203 is specifically configured to sample, based on a preset voxel space, a point cloud in the preset voxel space in normalized bounding box data under multiple view angles, so as to obtain a target voxel point.
The foregoing apparatus is used for executing the method provided in the foregoing embodiment, and its implementation principle and technical effects are similar, and are not described herein again.
The above modules may be one or more integrated circuits configured to implement the above methods, for example: one or more application specific integrated circuits (Application Specific Integrated Circuit, abbreviated as ASICs), or one or more microprocessors, or one or more field programmable gate arrays (Field Programmable Gate Array, abbreviated as FPGAs), etc. For another example, when a module above is implemented in the form of a processing element scheduler code, the processing element may be a general-purpose processor, such as a central processing unit (Central Processing Unit, CPU) or other processor that may invoke the program code. For another example, the modules may be integrated together and implemented in the form of a system-on-a-chip (SOC).
Fig. 7 is a schematic structural diagram of a multi-view three-dimensional reconstruction device according to an embodiment of the present application, where the multi-view three-dimensional reconstruction device may be integrated in a terminal device or a chip of the terminal device.
As shown in fig. 7, the multi-view three-dimensional reconstruction apparatus includes: a processor 501, a bus 502, and a storage medium 503.
The storage medium 503 is configured to store a program, and the processor 501 invokes the program stored in the storage medium 503 to perform the method embodiments corresponding to fig. 1-4. The specific implementation manner and the technical effect are similar, and are not repeated here.
Optionally, the present application also provides a program product, such as a storage medium, on which a computer program is stored, including a program which, when being executed by a processor, performs the corresponding embodiments of the above-mentioned method.
In the several embodiments provided in this application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative, e.g., the division of elements is merely a logical functional division, and there may be additional divisions of actual implementation, e.g., multiple elements or components may be combined or integrated into another system, or some features may be omitted, or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, which may be in electrical, mechanical or other form.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in each embodiment of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in hardware plus software functional units.
The integrated units implemented in the form of software functional units described above may be stored in a computer readable storage medium. The software functional unit is stored in a storage medium, and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) or a processor (english: processor) to perform part of the steps of the methods described in the embodiments of the present application. And the aforementioned storage medium includes: u disk, mobile hard disk, read-Only Memory (ROM), random access Memory (Random Access Memory, RAM), magnetic disk or optical disk, etc.

Claims (10)

1. A multi-view three-dimensional reconstruction method, the method comprising:
acquiring two-dimensional images of a target object under a plurality of view angles;
respectively determining point cloud data under the multiple view angles according to the two-dimensional images under the multiple view angles;
sampling point clouds in a preset voxel space in point cloud data under a plurality of view angles based on the preset voxel space to obtain target voxel points;
determining the three-dimensional surface of the target object according to the target voxel point;
determining an intersection point interval of each sampling ray and the three-dimensional surface according to the camera pose under each view angle;
determining the surface color of the target object based on each intersection point interval by adopting a color network;
rendering the target object based on the surface color.
2. The method of claim 1, wherein said determining the surface color of the target object using a color network based on each of the intersection intervals comprises:
determining a signed distance function value, a normal vector and a target voxel point distance of the target voxel point according to the signed distance function network;
determining a color value along each of the sampled rays according to the color network;
and obtaining a target rendering color value of each sampling ray.
3. The method of claim 1, wherein said determining a three-dimensional surface of the target object from the target voxel point comprises:
and determining the three-dimensional surface of the target object based on the target voxel point by adopting an isosurface extraction algorithm.
4. The method of claim 1, wherein the sampling, based on a preset voxel space, of point clouds in the preset voxel space in the point cloud data under a plurality of view angles, before obtaining the target voxel point, the method further comprises:
and setting the number of voxels sampled in a preset voxel space according to the target precision of the target object.
5. The method of claim 1, wherein the sampling, based on a preset voxel space, a point cloud in the preset voxel space in the point cloud data under a plurality of view angles to obtain a target voxel point includes:
determining whether each pixel point in the two-dimensional image at the plurality of viewing angles is a saliency foreground;
and determining target voxel points belonging to the interior of the target object in the point cloud data under a plurality of view angles based on a preset voxel space according to a determination result of whether the target object is the significance prospect.
6. The method of claim 5, wherein the determining a target voxel point belonging to the interior of the target object comprises:
determining a first ratio according to the target voxel point and the view angle number of the plurality of view angles;
and if the first ratio is larger than a preset threshold value, determining that the target voxel point belongs to the interior of the target object.
7. The method of claim 1, wherein the sampling, based on a preset voxel space, a point cloud in the preset voxel space in the point cloud data under a plurality of view angles, before obtaining a target voxel point, the method further comprises:
determining the information of the boundary frame of the target object according to the point cloud data under the multiple view angles;
normalizing the boundary frame of the target object to obtain normalized boundary frame data;
the sampling, based on a preset voxel space, point clouds in the preset voxel space in point cloud data under multiple view angles to obtain a target voxel point includes:
and sampling point clouds in the preset voxel space in the normalized boundary box data under a plurality of view angles based on the preset voxel space to obtain the target voxel point.
8. A multi-view three-dimensional reconstruction apparatus, the apparatus comprising: the device comprises an acquisition module, a determination module, a sampling module and a rendering module, wherein:
the acquisition module is used for acquiring two-dimensional images of the target object under a plurality of view angles;
the determining module is used for respectively determining point cloud data under the multiple view angles according to the two-dimensional images under the multiple view angles;
the sampling module is used for sampling point clouds in the preset voxel space in the point cloud data under a plurality of view angles based on the preset voxel space to obtain target voxel points;
the determining module is specifically configured to determine a three-dimensional surface of the target object according to the target voxel point; determining an intersection point interval of each sampling ray and the three-dimensional surface according to the camera pose under each view angle; determining the surface color of the target object based on each intersection point interval by adopting a color network;
the rendering module is used for rendering the target object based on the surface color.
9. A multi-view three-dimensional reconstruction apparatus, the apparatus comprising: a processor, a storage medium and a bus, the storage medium storing machine-readable instructions executable by the processor, the processor in communication with the storage medium via the bus when the multi-view three-dimensional reconstruction apparatus is running, the processor executing the machine-readable instructions to perform the method of any of the preceding claims 1-7.
10. A storage medium having stored thereon a computer program which, when executed by a processor, performs the method of any of the preceding claims 1-7.
CN202310271277.7A 2023-03-14 2023-03-14 Multi-view three-dimensional reconstruction method, device, equipment and storage medium Pending CN116310120A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310271277.7A CN116310120A (en) 2023-03-14 2023-03-14 Multi-view three-dimensional reconstruction method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310271277.7A CN116310120A (en) 2023-03-14 2023-03-14 Multi-view three-dimensional reconstruction method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN116310120A true CN116310120A (en) 2023-06-23

Family

ID=86825354

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310271277.7A Pending CN116310120A (en) 2023-03-14 2023-03-14 Multi-view three-dimensional reconstruction method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116310120A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118521719A (en) * 2024-07-23 2024-08-20 浙江核新同花顺网络信息股份有限公司 Virtual person three-dimensional model determining method, device, equipment and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120147008A1 (en) * 2010-12-13 2012-06-14 Huei-Yung Lin Non-uniformly sampled 3d information representation method
EP3503030A1 (en) * 2017-12-22 2019-06-26 The Provost, Fellows, Foundation Scholars, & the other members of Board, of the College of the Holy & Undiv. Trinity of Queen Elizabeth, Method and apparatus for generating a three-dimensional model
CN114782630A (en) * 2022-04-27 2022-07-22 美智纵横科技有限责任公司 Point cloud data generation method and device, readable storage medium and sweeping robot
CN115294275A (en) * 2022-08-05 2022-11-04 珠海普罗米修斯视觉技术有限公司 Method and device for reconstructing three-dimensional model and computer readable storage medium
CN115731348A (en) * 2022-11-18 2023-03-03 深圳博升光电科技有限公司 Reconstruction method of multi-view three-dimensional point cloud


Similar Documents

Publication Publication Date Title
Long et al. Adaptive surface normal constraint for depth estimation
Xu et al. Multi-scale geometric consistency guided and planar prior assisted multi-view stereo
Hamzah et al. Stereo matching algorithm based on per pixel difference adjustment, iterative guided filter and graph segmentation
Zhuang et al. Acdnet: Adaptively combined dilated convolution for monocular panorama depth estimation
Zach Fast and high quality fusion of depth maps
Turkulainen et al. DN-Splatter: Depth and Normal Priors for Gaussian Splatting and Meshing
CN111414923B (en) Indoor scene three-dimensional reconstruction method and system based on single RGB image
Sergey et al. Fast ray casting of function-based surfaces
CN115330940B (en) Three-dimensional reconstruction method, device, equipment and medium
CN116563493A (en) Model training method based on three-dimensional reconstruction, three-dimensional reconstruction method and device
Lin et al. Multiview textured mesh recovery by differentiable rendering
CN116310120A (en) Multi-view three-dimensional reconstruction method, device, equipment and storage medium
CN116402976A (en) Training method and device for three-dimensional target detection model
CN113281779B (en) 3D object rapid detection method, device, equipment and medium
Chen et al. Ground 3D object reconstruction based on multi-view 3D occupancy network using satellite remote sensing image
Tylecek et al. Depth map fusion with camera position refinement
Lin et al. A-SATMVSNet: An attention-aware multi-view stereo matching network based on satellite imagery
Li et al. Edge-aware neural implicit surface reconstruction
CN115375847B (en) Material recovery method, three-dimensional model generation method and model training method
CN116758219A (en) Region-aware multi-view stereo matching three-dimensional reconstruction method based on neural network
Dong et al. Learning stratified 3D reconstruction
Huang et al. Examplar-based shape from shading
EP4445337A1 (en) Three-dimensional reconstruction method and device, and storage medium
CN112785494B (en) Three-dimensional model construction method and device, electronic equipment and storage medium
CN113554102A (en) Aviation image DSM matching method for cost calculation dynamic programming

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination