

WO2023078335A1 - Method, System and Storage Medium for Three-Dimensional Reconstruction (用于三维重建的方法、系统和存储介质)

Info

Publication number: WO2023078335A1
Authority: WIPO (PCT)
Prior art keywords: local, geometric, target object, global, envelope
Application number: PCT/CN2022/129484
Other languages: English (en), French (fr)
Inventors: 尚弘, 李翔, 施展, 许宽宏
Original Assignee: 索尼集团公司 (Sony Group Corporation); 尚弘
Application filed by 索尼集团公司 (Sony Group Corporation) and 尚弘.
Priority to CN202280072092.7A (published as CN118302798A).
Publication of WO2023078335A1.


Classifications

    • G06N 3/0464: Computing arrangements based on biological models; neural networks; architecture, e.g. interconnection topology; convolutional networks [CNN, ConvNet]
    • G06N 3/08: Computing arrangements based on biological models; neural networks; learning methods
    • G06T 17/00: Three dimensional [3D] modelling, e.g. data description of 3D objects

Definitions

  • the present disclosure relates generally to three-dimensional reconstruction techniques, and in particular to deep neural network-based three-dimensional reconstruction techniques.
  • High-precision 3D reconstruction can play an important role in situations that planar vision handles poorly or cannot handle at all, such as industrial automation, medical assistance applications, virtual reality applications, and visual navigation.
  • Traditional high-precision 3D reconstruction technology needs to obtain image information or depth information of the target object from multiple perspectives.
  • The accuracy of 3D reconstruction is directly related to the density of the viewing angles: the sparser the angles, the lower the accuracy, to the point where modeling may become impossible.
  • The method for training the model includes: generating an initial voxel envelope of the target object based on images obtained by shooting the target object from multiple perspectives; randomly sampling points within the initial voxel envelope to obtain a set of sampling points; performing global feature extraction on the images to obtain a global feature map; determining, based on geometric association, the global features corresponding to the sampling points from the global feature map; encoding the geometric information about the sampling points to generate geometric encoding information; and training a model based on at least the global features and the geometric encoding information.
  • The method for three-dimensional reconstruction includes: generating an initial voxel envelope of the target object based on images obtained by shooting the target object from multiple perspectives; randomly sampling points within the initial voxel envelope to obtain a set of sampling points; performing global feature extraction on the images to obtain a global feature map; determining, based on geometric association, the global features corresponding to the sampling points from the global feature map; encoding the geometric information about the sampling points to generate geometric encoding information; and inputting the global features and corresponding geometric encoding information into the model for 3D reconstruction to judge the geometric relationship between the sampling points and the surface of the target object.
  • A system for three-dimensional reconstruction includes: a training unit configured to execute the method for training a three-dimensional reconstruction model according to various embodiments of the present disclosure; and an inference unit configured to execute the method for three-dimensional reconstruction according to various embodiments of the present disclosure.
  • Yet another aspect of the disclosure relates to a computer-readable storage medium storing one or more instructions.
  • the one or more instructions may, when executed by the processor, cause the processor to perform the steps of the methods according to the embodiments of the present disclosure.
  • Still another aspect of the present disclosure relates to various apparatuses, including components or units for performing the steps of the methods according to the embodiments of the present disclosure.
  • FIG. 1 is a schematic diagram showing an example of the configuration of a system for three-dimensional reconstruction according to an embodiment of the present disclosure.
  • FIG. 2 is a flowchart illustrating an example of steps of a method for training a three-dimensional reconstruction model according to an embodiment of the present disclosure.
  • Fig. 3 is a flow chart showing an example of sub-steps of some steps of the method for training a three-dimensional reconstruction model according to an embodiment of the present disclosure.
  • FIG. 4 is a flow chart illustrating an example of the steps of performing focused training on a local area according to an embodiment of the present disclosure.
  • FIG. 5 is a flow chart illustrating an example of sub-steps of some steps of performing focused training on a local area according to an embodiment of the present disclosure.
  • FIG. 6 is a flowchart illustrating an example of steps for training a depth information extractor according to an embodiment of the present disclosure.
  • FIG. 7A shows a schematic diagram of an example of generating a visual hull according to an embodiment of the present disclosure.
  • FIG. 7B shows a schematic diagram of an example of applying constraint conditions according to an embodiment of the present disclosure.
  • FIG. 7C shows a schematic diagram of another example of applying constraints according to an embodiment of the present disclosure.
  • FIG. 8 is a flowchart illustrating an example of steps of a method for three-dimensional reconstruction according to an embodiment of the present disclosure.
  • Fig. 9 is a flow chart illustrating an example of sub-steps of some steps of the method for three-dimensional reconstruction according to an embodiment of the present disclosure.
  • Fig. 10 shows a schematic diagram of an example of three-dimensional reconstruction of a target object according to an embodiment of the present disclosure.
  • FIG. 11 is a flowchart illustrating an example of steps of a method for voxel transparency according to an embodiment of the present disclosure.
  • Fig. 12 shows a schematic diagram of an example of performing voxel transparency according to an embodiment of the present disclosure.
  • An example of the configuration of a system for three-dimensional reconstruction according to an embodiment of the present disclosure is exemplarily described below with reference to FIG. 1.
  • the system 100 for three-dimensional reconstruction may include a training unit 112 and an inference unit 114 .
  • the training unit 112 is used for training the 3D reconstruction model.
  • the training unit 112 may be configured to perform steps of a method for training a three-dimensional reconstruction model that will be described later.
  • the inference unit 114 is used to perform 3D reconstruction using the 3D reconstruction model.
  • the inference unit 114 may be configured to perform the steps of the method for three-dimensional reconstruction that will be described later.
  • the system 100 for three-dimensional reconstruction further includes a voxel transparency unit 116.
  • the voxel transparency unit 116 may be configured to perform transparency processing on some voxels within the target voxel envelope obtained from the 3D reconstruction.
  • the voxel transparency unit 116 may set some voxels corresponding to objects with certain transparency, such as glass and hair, in the target voxel envelope obtained from 3D reconstruction to have corresponding transparency.
  • the information processing module 110 is used below to refer collectively to the units of the system 100, such as the training unit 112, the inference unit 114 and the voxel transparency unit 116, that process information to realize 3D reconstruction.
  • each of the above units may be implemented as an independent physical entity, or may also be implemented by a single entity (for example, a processor (CPU or DSP, etc.), an integrated circuit, etc.). If the respective units in the information processing module 110 are implemented as independent physical entities, they may be deployed together or separated from each other. For example, in some embodiments, one of the training unit 112 and the reasoning unit 114 may be deployed at a local end, while the other may be deployed at a remote end.
  • the system 100 for three-dimensional reconstruction may further include a camera 120 .
  • the camera 120 may be configured to photograph the target object 130 from multiple perspectives. Camera 120 may be pre-calibrated. Thus, the mapping relationship from the camera coordinate system to the world coordinate system can be obtained.
  • the camera 120 may include at least one of an ordinary camera and a depth camera such as an RGB-D camera.
  • the number of cameras may be one or more.
  • information from the camera 120 may be transmitted to various components in the information processing module 110 .
  • the information processing module 110 may be deployed near the camera 120 .
  • at least a part of the information processing module 110 may be deployed separately from the camera 120 .
  • at least a part of the information processing module 110 may be deployed on a remote server.
  • although the system 100 illustrated in FIG. 1 includes a camera 120, the system 100 itself may not include the camera 120, and may instead use images of the target object captured by a camera external to the system.
  • the method for training a three-dimensional reconstruction model according to an embodiment of the present disclosure is exemplarily described below with reference to FIGS. 2-6 and 7A-7C.
  • the content described above in conjunction with FIG. 1 is also applicable to the corresponding features.
  • a method 200 for training a three-dimensional reconstruction model may mainly include the following steps:
  • in step 202, an initial voxel envelope of the target object is generated based on the images obtained by shooting the target object from multiple perspectives;
  • in step 204, the points in the initial voxel envelope are randomly sampled to obtain a set of sampling points;
  • in step 206, global feature extraction is performed on the images to obtain a global feature map;
  • in step 208, based on the geometric association, the global features corresponding to the sampling points are determined from the global feature map;
  • in step 210, the geometric information about the sampling points is encoded to generate geometric encoding information; and
  • in step 212, a model is trained based on at least the global features and the geometric encoding information.
  • the method for training the three-dimensional reconstruction model may further include calibrating the camera for capturing the target object from multiple perspectives, so as to obtain a mapping relationship from the camera coordinate system to the world coordinate system.
  • calibration information about the camera is known in advance.
  • an initial voxel envelope of the target object may be generated based on images obtained by photographing the target object from multiple perspectives (step 202 ).
  • the number M of images may be an integer greater than or equal to 1.
  • generating the initial voxel envelope of the target object may be based on Visual-hull technology.
  • generating the initial voxel envelope of the target object may include generating a visual hull of the target object.
  • FIG. 7A shows a schematic diagram of an example of generating a visual hull according to an embodiment of the present disclosure.
  • contour lines of the target object observed at various angles of view can be obtained.
  • the contour lines at each viewpoint, together with the corresponding camera centers, define a 3D cone within which the target object must lie.
  • by intersecting these cones, a rough voxel envelope containing the target object can be obtained, called the visual hull.
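As a concrete sketch of the silhouette-intersection procedure above: a voxel is kept only if every calibrated view projects it inside the object silhouette, and the surviving voxels form the rough envelope. This is a minimal illustration assuming 3×4 projection matrices and binary silhouette masks as inputs; all function and variable names are hypothetical, not taken from the patent.

```python
import numpy as np

def visual_hull(grid_points, projections, silhouettes):
    """Carve a rough voxel envelope (visual hull) from multi-view silhouettes.

    grid_points: (V, 3) world coordinates of candidate voxel centers.
    projections: list of (3, 4) camera projection matrices (pre-calibrated).
    silhouettes: list of (H, W) boolean masks, True where the object is seen.
    Returns a boolean mask over grid_points: True = inside the visual hull.
    """
    homo = np.hstack([grid_points, np.ones((len(grid_points), 1))])  # (V, 4)
    inside = np.ones(len(grid_points), dtype=bool)
    for P, mask in zip(projections, silhouettes):
        uvw = homo @ P.T                                # project to the image
        u = (uvw[:, 0] / uvw[:, 2]).round().astype(int)
        v = (uvw[:, 1] / uvw[:, 2]).round().astype(int)
        h, w = mask.shape
        visible = (0 <= u) & (u < w) & (0 <= v) & (v < h)
        hit = np.zeros(len(grid_points), dtype=bool)
        hit[visible] = mask[v[visible], u[visible]]
        inside &= hit        # must fall inside the silhouette in every view
    return inside
```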
  • generating the initial voxel envelope of the target object may further include applying constraints to the visual hull. Specifically, the initial voxel envelope of the target object is determined or refined by applying one or more constraints on the basis of the visual hull.
  • FIGS. 7B-7C show schematic diagrams of examples of applying constraints according to embodiments of the present disclosure.
  • the constraints may include constraints based on depth information of the target object.
  • At least one camera may be a depth camera.
  • the depth camera can obtain the depth information of the captured target object.
  • a finer initial voxel envelope can be formed by modifying the corresponding 3D cones using the depth information.
  • the constraints may include inherent morphological features of the target object.
  • the inherent morphological features may include, but are not limited to, constraints of the human body.
  • the human body constraints include, but are not limited to, one or more of: the number of torso parts and facial features, limit relative positions, degree-of-freedom constraints, sizes, lengths, and the like.
  • the visual hull can be optimized in combination with technologies such as image-based body segmentation and skeleton extraction, thereby forming a more refined initial voxel envelope.
  • determining the initial voxel envelope of the target object by applying constraints can overcome the problem that an initial voxel envelope relying only on limited image information is not fine enough or is even prone to errors, and improves the accuracy and fineness of the initial voxel envelope.
  • points within the initial voxel envelope may be randomly sampled to obtain a set of sampling points (step 204 ).
  • in existing 3D reconstruction technology, it may be necessary to judge sampling points distributed over the whole imaging area. But the imaging region may contain a large number of sampling points that are actually far away from the reconstructed object. These sampling points do not contribute to 3D reconstruction, so the effectiveness of sampling is reduced, which affects the accuracy of reconstruction, increases unnecessary processing overhead, and so on.
  • the inventors of the present application realized that by selecting appropriate sampling points, the efficiency of sampling can be improved.
  • the range of random sampling is limited within the generated initial voxel envelope.
  • this limitation can effectively reduce the scope of sampling, thus improving the effectiveness of sampling to optimize 3D reconstruction, avoid unnecessary processing overhead, and the like.
  • the target object to be reconstructed must lie within this initial voxel envelope. Therefore, defining the range of random sampling within the generated initial voxel envelope can also advantageously improve the accuracy of 3D reconstruction.
  • the number N of sampling points can be selected according to needs.
  • N is a positive integer.
  • the points within the initial voxel envelope are uniformly randomly sampled.
  • the points within the initial voxel envelope are non-uniformly randomly sampled.
  • enhanced (i.e., denser) random sampling is performed on regions corresponding to specific parts.
  • randomly sampling points within the initial voxel envelope may also include determining a specific range in the image corresponding to a specific part of the target object based on image recognition.
  • the specific parts include, but are not limited to, one or more of hands or human faces.
  • the specific parts are hands.
  • the image recognition method may include, but is not limited to, any one or a combination of face detection, gesture detection, and the like.
  • enhanced random sampling may be performed on points within the specific area corresponding to the specific range during random sampling.
  • a specific region corresponding to a specific range in the image may be acquired through the principle of multi-view vision.
  • uniform random sampling may be performed within the entire initial voxel envelope, and enhanced random sampling may be performed within specific regions.
  • the union of all obtained sampling points can be set as a set of sampling points.
  • uniform random sampling may be performed on areas inside the initial voxel envelope except specific areas, and enhanced random sampling may be performed on specific areas.
  • the union of all obtained sampling points can be set as a set of sampling points.
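A sketch of the sampling strategies just described: uniform random sampling over the envelope, plus enhanced (denser) sampling inside a specific region such as a hand or face, with the union taken as the sampling set. `inside_envelope` and `inside_region` are assumed to be vectorized membership tests; all names are illustrative.

```python
import numpy as np

def sample_points(bounds_min, bounds_max, inside_envelope, inside_region,
                  n_uniform=10000, n_enhanced=5000, rng=None):
    """Uniform sampling inside the initial voxel envelope plus enhanced
    sampling inside a specific region (e.g. hands or face)."""
    rng = rng or np.random.default_rng()

    def rejection_sample(accept, n):
        # draw candidates in the bounding box, keep those passing the test
        pts = []
        while len(pts) < n:
            cand = rng.uniform(bounds_min, bounds_max, size=(4 * n, 3))
            pts.extend(cand[accept(cand)])
        return np.asarray(pts[:n])

    uniform = rejection_sample(inside_envelope, n_uniform)
    enhanced = rejection_sample(
        lambda p: inside_envelope(p) & inside_region(p), n_enhanced)
    return np.vstack([uniform, enhanced])  # union as the set of sampling points
```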
  • global feature extraction can be performed on the image to obtain a global feature map (step 206).
  • the image may be input into a global feature extractor for global feature extraction.
  • the global feature extractor may include, but is not limited to, any one or a combination of neural networks, autoencoders, SIFT, HOG, etc.
  • a global feature map for each image can be obtained. Once the global feature extraction for all images is completed, the number of obtained global feature maps can be equal to the number of images.
  • the global feature map may consist of feature elements.
  • Each feature element can be expressed in the form of a multidimensional vector.
  • the feature elements in the global feature map can respectively correspond to pixel blocks on the image.
  • the "correspondence" between a feature element and a pixel block means that the feature element can represent the feature of the corresponding pixel block.
  • performing global feature extraction on the image may further include preprocessing the image, such as by downsampling, before inputting the image to the global feature extractor, to reduce the resolution of the image.
  • an image with a resolution of 512×512 may be compressed into an image with a resolution of 64×64 before the image is input to the global feature extractor.
  • the global feature corresponding to the sampling point may be determined from the global feature map based on the geometric association (step 208 ).
  • the feature elements in the global feature map can respectively correspond to pixel blocks on the image.
  • the pixel block on which the sampling point is imaged can be determined through a geometric relationship, that is, the pixel block corresponding to the sampling point.
  • a correspondence relationship from sampling points to feature elements based on geometric association can be established.
  • the number M of images may be greater than 1
  • the number of global features corresponding to each sampling point may be greater than 1
  • the total number P of global features may be greater than the number N of sampling points.
  • the total number of global features can be expressed as $P = \sum_{i=1}^{M} H_i$, where $H_i$ is the number of pixel blocks corresponding to the sampling points on the i-th image. Limited by the viewing angle, not every sampling point has a corresponding pixel block on every image, that is, $H_i \le N$.
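The geometric association described above amounts to projecting each 3D sampling point through a calibrated camera and reading the feature element whose pixel block contains the projection. A minimal sketch under assumed array shapes (names are hypothetical); returning None when the point projects outside the image mirrors the remark that, limited by the viewing angle, not every sampling point has a corresponding pixel block in every view.

```python
import numpy as np

def lookup_global_feature(point, P, feature_map, image_size):
    """Map one 3D sampling point to its feature element in one view.

    point: (3,) world coordinate; P: (3, 4) projection matrix;
    feature_map: (Hf, Wf, C); image_size: (H, W) of the original image.
    Returns the C-dim feature vector, or None if the point is not imaged.
    """
    u, v, w = P @ np.append(point, 1.0)
    u, v = u / w, v / w
    H, W = image_size
    if not (0 <= u < W and 0 <= v < H):
        return None                      # no corresponding pixel block
    hf, wf = feature_map.shape[:2]
    # each feature element represents an (H/hf) x (W/wf) pixel block
    return feature_map[int(v * hf / H), int(u * wf / W)]
```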
  • the geometric information about the sampling points may be encoded to generate geometric encoding information (step 210 ).
  • the geometric information about the sampling point may include at least a part of the spatial coordinates of the sampling point and the interior and exterior orientation information of the camera imaging the sampling point.
  • the geometric information about the sampling point may only include the spatial coordinates of the sampling point.
  • the generated geometrically encoded information may only relate to the sample points themselves.
  • a piece of geometric coding information may be associated with at least one pixel block or at least one global feature corresponding to the same sampling point.
  • the geometric information about the sampling point may include not only the spatial coordinates of the sampling point but also the interior and exterior orientation elements of the camera.
  • a piece of geometrically encoded information may be associated with a single pixel block or a single global feature collectively defined by the aforementioned geometric information.
  • the generated geometry encoding information may be a multi-dimensional vector.
  • the geometric encoding information may include a multidimensional vector corresponding to the spatial coordinates of the sampling points and a multidimensional vector corresponding to the interior and exterior orientation information of the camera.
  • the inventors of the present application realized that, because the geometric encoding information contains various kinds of information such as the above, it can represent geometric features more accurately than raw geometric information. Therefore, using geometric encoding information to represent geometric features is beneficial to improving the accuracy of 3D reconstruction.
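The text does not fix a particular encoding scheme. As one plausible sketch, a sinusoidal positional encoding (a common choice in implicit-surface methods) turns the raw geometric information, i.e. the sampling point's coordinates optionally concatenated with camera orientation parameters, into a multidimensional vector; the frequency count and the 6-parameter pose layout are assumptions.

```python
import numpy as np

def positional_encoding(x, num_freqs=6):
    """Encode raw geometric values into a multidimensional vector.

    x: (D,) raw geometric information, e.g. the spatial coordinates of a
    sampling point, optionally concatenated with the camera's interior and
    exterior orientation parameters.
    Returns a (2 * num_freqs * D,) geometric encoding vector.
    """
    freqs = 2.0 ** np.arange(num_freqs) * np.pi   # octave-spaced frequencies
    scaled = np.outer(freqs, x).ravel()           # (num_freqs * D,)
    return np.concatenate([np.sin(scaled), np.cos(scaled)])

# illustrative use: encode a point together with a 6-parameter camera pose
point_xyz = np.array([0.1, -0.2, 0.3])
camera_pose = np.zeros(6)    # placeholder exterior orientation
z = positional_encoding(np.concatenate([point_xyz, camera_pose]))
```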
  • the model may be trained based on at least global feature and geometric encoding information (step 212).
  • the trained model can be used for three-dimensional reconstruction of the target object.
  • global features and corresponding geometric encoding information can be input into the model to determine the geometric relationship between the sampling point and the surface of the target object (sub-step 302).
  • the global features corresponding to the sampling point and corresponding geometric encoding information can be input into the model.
  • the number of global features corresponding to each sampling point may be greater than one.
  • multiple global features and corresponding geometric encoding information may be input for one sampling point. Therefore, increasing the number of images can not only form a more detailed initial voxel envelope, but also increase the training data to further improve the accuracy of the model.
  • if the geometric encoding information relates only to the spatial coordinates of the sampling point itself, then when the number of images is greater than 1, the same geometric encoding information may be associated with multiple global features corresponding to the same sampling point. Therefore, in some embodiments, the geometric encoding information input along with the multiple global features for one sampling point may be identical. Geometric encoding information associated in this way with the information of more sampling points can provide more accurate training data, thereby improving the accuracy of the model.
  • the trained model can output a judgment result indicating the geometric relationship between the sampling point and the surface of the target object.
  • the judgment result may be numerical.
  • the judgment result may be a numerical value indicating the probability that the sampling point is located inside/outside the surface of the target object.
  • when the judgment result is 1, it may indicate that the sampling point is located inside the surface of the target object; correspondingly, when the judgment result is 0, it may indicate that the sampling point is located outside the surface of the target object, or vice versa. In other cases, the judgment result can be between 0 and 1.
  • the trained model can be represented by an implicit function f that outputs, according to the above-mentioned input, the probability that the sampling point is located inside/outside the surface of the target object.
  • a discrimination error for each sampling point may be calculated (sub-step 304), obtaining the judgment error of each sampling point.
  • an implicit function f * may be used to describe the real target object surface.
  • the real target object surface may be the 0.5 isosurface of the function value of f * .
  • the discrimination error L of a sampling point can be calculated as the difference between the value of the implicit function f representing the model and the value of the implicit function f* representing the real target object surface, for example using the absolute value of the difference: $L = \left| f\big(F_G(X), Z(X)\big) - f^*(X) \right|$ (Equation 2), where $F_G(X)$ and $Z(X)$ respectively denote the global feature corresponding to the sampling point X and the geometric encoding information about the sampling point X.
  • the calculation method of the discrimination error is not limited to this.
  • although Equation 2 describes the case where each sampling point has one global feature and corresponding geometric encoding information, a similar calculation is also applicable when each sampling point has multiple global features and corresponding geometric encoding information.
  • the global discrimination error of the model may be calculated (sub-step 306).
  • the global discrimination error $L_G$ of the model can be expressed as the mean square error between the values of the implicit function f of the model and the values of the implicit function f* of the real target object surface: $L_G = \frac{1}{N}\sum_{i=1}^{N}\big(f(F_G(X_i), Z(X_i)) - f^*(X_i)\big)^2$ (Equation 3), where $F_G(X_i)$ and $Z(X_i)$ respectively denote the global feature corresponding to the sampling point $X_i$ and the geometric encoding information about $X_i$.
  • although Equation 3 describes the case where each sampling point has one global feature and corresponding geometric encoding information, a similar calculation is also applicable when each sampling point has multiple global features and corresponding geometric encoding information.
  • the parameters of the model may be updated (sub-step 310) based on whether the global discrimination error meets the accuracy requirement (sub-step 308).
  • if the global discrimination error does not meet the accuracy requirement, the process may proceed to sub-step 310 to update the parameters of the model; processing then returns to sub-step 302.
  • sub-steps 302-310 can be repeated until the global discrimination error meets the accuracy requirement. That is, the model is trained by iterative optimization.
  • any suitable method can be used for iterative optimization of the model, including but not limited to gradient descent method, stochastic gradient descent method and the like.
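Putting sub-steps 302-310 together, the following is a minimal sketch of the iterative optimization, with the implicit function f realized as a small PyTorch MLP with a sigmoid output and the global discrimination error of Equation 3 as the loss. The architecture and width are assumptions; the text only requires that f map a global feature plus geometric encoding information to an inside/outside probability.

```python
import torch
import torch.nn as nn

class ImplicitSurface(nn.Module):
    """Implicit function f: (global feature, geometric encoding) ->
    probability that the sampling point lies inside the object surface."""
    def __init__(self, feat_dim, code_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim + code_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1), nn.Sigmoid())

    def forward(self, feats, codes):
        return self.net(torch.cat([feats, codes], dim=-1)).squeeze(-1)

def train_step(model, optimizer, feats, codes, f_star):
    """One pass of sub-steps 302-310: predict, compute the global
    discrimination error L_G (Equation 3), and update the parameters."""
    optimizer.zero_grad()
    pred = model(feats, codes)              # f(F_G(X_i), Z(X_i))
    loss = ((pred - f_star) ** 2).mean()    # mean square error L_G
    loss.backward()
    optimizer.step()
    return loss.item()
```

In use, `train_step` would be called repeatedly until the returned error meets the accuracy requirement, matching the loop of sub-steps 302-310.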
  • the inventors of the present application realized that it is possible to selectively focus on the points with larger errors for training. In this way, better and faster model fitting can be achieved by assigning different weights related to the size of the error to different sampling points.
  • training the model may also include: selecting a local area according to the discrimination errors of the sampling points, and performing focused training on the local area (sub-step 314).
  • local regions with relatively larger discrimination errors can be selected for focused training.
  • the sampling points may be sorted according to the magnitude of the discrimination error, so that the sorting order reflects the relative sizes of the discrimination errors. If sorted in descending order, the sampling points at the top of the list have relatively larger discrimination errors; conversely, if sorted in ascending order, the sampling points at the bottom of the list have relatively larger discrimination errors.
  • At least a partial area in the area where the subset of sampling points with a relatively larger discrimination error is located can be determined as a local area.
  • the number N' of sampling points included in the subset can be preset, and N' is a positive integer smaller than N.
  • the area where the subset of sampling points with relatively larger discrimination errors are located may be an area defined according to the distribution of these sampling points. In other embodiments, these areas may be pre-divided areas.
  • a local area with relatively more sampling points to be optimized may be selected for focused training.
  • a sampling point to be optimized is a sampling point whose discrimination error has not yet met the predetermined requirement.
  • the method of selecting a local area according to the discrimination error of the sampling point is not limited to the example method described above.
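One simple realization of the selection described above: take the N' sampling points with the largest discrimination errors and use their bounding box as the local area. The bounding-box heuristic is an illustrative choice; as just noted, the text permits other ways of defining the area.

```python
import numpy as np

def select_local_area(points, errors, n_prime):
    """Pick the n_prime sampling points with the largest discrimination
    errors and return an axis-aligned box around them as the local area."""
    worst = np.argsort(errors)[::-1][:n_prime]   # descending error order
    focus = points[worst]
    return focus.min(axis=0), focus.max(axis=0)  # corners of the local area
```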
  • local feature extraction is performed on a local sub-image corresponding to a local area in the image to obtain a local feature map (step 402 ).
  • local sub-images can be input into a local feature extractor for local feature extraction.
  • the local feature extractor may include, but is not limited to, any one or a combination of neural networks, autoencoders, SIFT, HOG, etc.
  • a local feature map for each local sub-image can be obtained.
  • the number of obtained local feature maps may be equal to the number of local sub-images.
  • the feature elements constituting the local feature map can also be expressed in the form of multi-dimensional vectors.
  • the feature elements in the local feature map may respectively correspond to pixel blocks on the local sub-image.
  • the local sub-images input into the local feature extractor for local feature extraction can have higher resolution than the images input into the global feature extractor for global feature extraction.
  • local sub-images that have not undergone preprocessing such as downsampling may be directly input to the local feature extractor.
  • local features corresponding to sampling points in the local area may be determined from the local feature map based on geometric association (step 404 ).
  • the related description of determining the global features corresponding to the sampling points from the global feature map is also basically applicable to determining the local features corresponding to the sampling points in the local area from the local feature map, and the description is not repeated here.
  • the model is trained with focus using the local features and corresponding geometric encoding information (step 406).
  • the focused training of the model may mainly include the following sub-steps 502-508.
  • in sub-step 502, local features and corresponding geometric encoding information may be input into the model to determine the geometric relationship between the sampling points in the local area and the surface of the target object.
  • a local discrimination error of the model may be calculated.
  • the local discrimination error $L_L$ of the model can be expressed as the mean square error, over the N' sampling points in the local area, between the values of the implicit function f of the model and the values of the implicit function f* of the real target object surface: $L_L = \frac{1}{N'}\sum_{i=1}^{N'}\big(f(F_L(X_i), Z(X_i)) - f^*(X_i)\big)^2$, where $F_L(X_i)$ and $Z(X_i)$ respectively denote the local feature corresponding to the sampling point $X_i$ and the geometric encoding information about $X_i$.
  • if the local discrimination error does not meet the precision requirement, the process may proceed to sub-step 508 to update the parameters of the model; processing may then return to sub-step 502.
  • substeps 502-508 can be repeated until the local discrimination error meets the precision requirement. That is, focused training of the model is done by iteratively optimizing the model for local regions.
  • any suitable method can be used for iterative optimization of the model, including but not limited to gradient descent method, stochastic gradient descent method and the like.
  • the processing of sub-steps 502-508 may be similar to that of sub-steps 302 and 306-310, except that the input changes from global features and corresponding geometric encoding information to finer local features and corresponding geometric encoding information; partially repeated descriptions are omitted here.
  • this double loop improves the speed and quality of model fitting by iteratively optimizing the model.
  • training the model based at least on the global features and geometrically encoded information may also include training a depth information extractor for extracting depth information from the global features (sub-step 312).
  • Depth information can intuitively represent the distance between the target object and the camera, which is very important for 3D reconstruction.
  • the inventors of the present application realized that it is possible to train a depth information extractor for extracting depth information from image features such as global features. In this way, the present application can not only use the image features such as texture itself for 3D reconstruction, but also use the depth information extracted from the image features to improve the ability to perceive the depth of the scene.
  • the actual depth map D may be obtained by photographing the target object using, for example, one or more depth cameras. In some embodiments, the actual depth map D may include actual depth information of each point of the photographed object.
  • in step 602, the global features are input into the depth information extractor $f_D$ to obtain a fitted depth map D'.
  • the fitted depth map D' may include the fitted depth information extracted by the depth information extractor from the input global features.
  • the fitted depth map D' may include fitted depth information for each sampling point.
  • in step 604, the actual depth map D is compared with the fitted depth map D' to obtain a depth error $L_D$.
  • the depth error $L_D$ may be the absolute value or the square of the difference between the fitted depth information and the actual depth information for each sampling point.
  • the form of the depth error $L_D$ is not particularly limited, as long as it can express the difference between the fitted depth map D' and the actual depth map D.
  • in step 606, it is judged whether the depth error meets the precision requirement.
  • in step 608, the parameters of the depth information extractor $f_D$ are updated.
  • steps 602-608 are repeated until the depth error meets the accuracy requirement (“Yes”).
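A minimal sketch of the loop of steps 602-608, assuming `f_d` is a differentiable network regressing per-point depth from global features and that the actual depths come from one or more depth cameras; the tolerance and the iteration cap are illustrative.

```python
import torch

def train_depth_extractor(f_d, optimizer, global_feats, actual_depth,
                          tol=1e-3, max_iters=10000):
    """Iterate steps 602-608: fit depth from global features until the
    depth error L_D meets the accuracy requirement."""
    for _ in range(max_iters):
        optimizer.zero_grad()
        fitted = f_d(global_feats).squeeze(-1)        # fitted depth map D'
        loss = ((fitted - actual_depth) ** 2).mean()  # depth error L_D
        if loss.item() < tol:                         # step 606: "Yes"
            break
        loss.backward()                               # step 608: update f_D
        optimizer.step()
    return loss.item()
```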
  • the method for three-dimensional reconstruction may mainly include the following steps:
  • in step 802, an initial voxel envelope of the target object is generated based on images obtained by photographing the target object from multiple perspectives;
  • in step 804, points within the initial voxel envelope are randomly sampled to obtain a set of sampling points;
  • in step 806, global feature extraction is performed on the images to obtain a global feature map;
  • in step 808, based on the geometric association, the global features corresponding to the sampling points are determined from the global feature map;
  • in step 810, the geometric information about the sampling points is encoded to generate geometric encoding information; and
  • in step 812, the global features and corresponding geometric encoding information are input into the model for 3D reconstruction, and the geometric relationship between the sampling points and the surface of the target object is judged.
  • the method for three-dimensional reconstruction may further include calibrating the camera for capturing the target object from multiple perspectives, so as to obtain a mapping relationship from the camera coordinate system to the world coordinate system.
  • calibration information about the camera is known in advance.
  • an initial voxel envelope of the target object as illustrated in FIG. 10 may be generated based on the images, also illustrated in FIG. 10, obtained by shooting the target object from multiple perspectives (step 802 in FIG. 8).
  • generating the initial voxel envelope of the target object may be based on visual hull techniques.
  • generating the initial voxel envelope of the target object may include generating a visual hull of the target object.
  • generating the initial voxel envelope of the target object may further include applying constraints to the visual hull. Specifically, the initial voxel envelope of the target object is determined or refined by applying one or more constraints on the basis of the visual hull.
  • the constraints may include constraints based on depth information of the target object.
  • the constraints may include inherent morphological features of the target object.
  • the inherent morphological features may include, but are not limited to, constraints of the human body.
  • the human body constraints include, but are not limited to, one or more of: the number of torso parts and facial features, limit relative positions, degree-of-freedom constraints, sizes, lengths, and the like.
  • determining the initial voxel envelope of the target object by applying constraints can overcome the problem that an initial voxel envelope relying only on limited image information is not fine enough or is even prone to errors, and improves the accuracy and fineness of the initial voxel envelope.
  • points within the initial voxel envelope may be randomly sampled to obtain a set of sampling points as illustrated in FIG. 10 (step 804 in FIG. 8 ).
  • limiting the range of random sampling to within the generated initial voxel envelope can effectively reduce the scope of sampling, thus improving the effectiveness of sampling to optimize 3D reconstruction, avoiding unnecessary processing overhead, and the like.
  • defining the range of random sampling within the generated initial voxel envelope can also advantageously improve the accuracy of 3D reconstruction.
  • the points within the initial voxel envelope are uniformly randomly sampled.
  • the points within the initial voxel envelope are non-uniformly randomly sampled.
  • enhanced (i.e., denser) random sampling is performed on regions corresponding to specific parts.
  • randomly sampling points within the initial voxel envelope may also include determining a specific range in the image corresponding to a specific part of the target object based on image recognition.
  • the specific parts include, but are not limited to, one or more of hands or human faces.
  • the specific parts are hands.
  • the image recognition method may include, but is not limited to, any one or a combination of face detection, gesture detection, and the like.
  • enhanced random sampling may be performed on points within the specific area corresponding to the specific range during random sampling.
  • a specific region corresponding to a specific range in the image may be acquired through the principle of multi-view vision.
  • Global feature extraction can be performed on the image to obtain a global feature map (step 806 in FIG. 8 ).
  • the image may be input into a global feature extractor for global feature extraction.
  • the global feature extractor may include, but is not limited to, any one or a combination of neural networks, autoencoders, SIFT, HOG, etc.
  • a global feature map for each image can be obtained. Once the global feature extraction for all images is completed, the number of obtained global feature maps can be equal to the number of images.
  • the global feature map may consist of feature elements.
  • Each feature element can be expressed in the form of a multidimensional vector.
  • the feature elements in the global feature map can respectively correspond to pixel blocks on the image.
  • the "correspondence" between a feature element and a pixel block means that the feature element can represent the feature of the corresponding pixel block.
  • performing global feature extraction on the image may further include preprocessing the image, such as by downsampling, before inputting the image to the global feature extractor, to reduce the resolution of the image.
  • an image with a resolution of 512×512 may be compressed into an image with a resolution of 64×64 before the image is input to the global feature extractor.
  • the global feature corresponding to the sampling point may be determined from the global feature map based on the geometric association (step 808 in FIG. 8 ).
  • the feature elements in the global feature map can respectively correspond to pixel blocks on the image.
  • the pixel block on which the sampling point is imaged can be determined through a geometric relationship, that is, the pixel block corresponding to the sampling point.
  • a correspondence relationship from sampling points to feature elements based on geometric association can be established.
  • Geometric information about the sampling points may be encoded to generate geometric encoding information (step 810 in FIG. 8 ).
  • the geometric information about the sampling point may include at least a part of the spatial coordinates of the sampling point and the interior and exterior orientation information of the camera imaging the sampling point.
  • the geometric information about the sampling point may only include the spatial coordinates of the sampling point.
  • the geometric information about the sampling point may include not only the spatial coordinates of the sampling point but also the interior and exterior orientation elements of the camera.
  • the inventors of the present application have recognized that geometrically encoded information can more accurately represent geometric features relative to intuitive geometric information. Therefore, using geometric coding information to represent geometric features is beneficial to improve the accuracy of 3D reconstruction.
  • steps 802 to 810 for 3D reconstruction may be similar to the processing of steps 202 to 210 for training a 3D reconstruction model in terms of the flow of steps.
  • the descriptions of steps 202-210 are also basically applicable to steps 802-810, so part of the description of steps 802-810 is omitted herein.
  • however, the processing of steps 802-810 and that of steps 202-210 may differ in the specific implementation manner of each step.
  • the global features and corresponding geometric coding information can be input into the model for 3D reconstruction, and the geometric relationship between the sampling points and the surface of the target object can be judged (step 812 in FIG. 8 ).
  • the global features corresponding to the sampling point and corresponding geometric encoding information can be input into the model.
  • the model for 3D reconstruction can be obtained by training using the method for training a 3D reconstruction model according to an embodiment of the present disclosure.
  • the model can judge the geometric relationship between the sampling point and the surface of the target object, and output the judgment result.
  • the judgment result may be numerical.
  • the judgment result may be a numerical value indicating the probability that the sampling point is located inside/outside the surface of the target object.
  • when the judgment result is 1, it may indicate that the sampling point is located inside the surface of the target object; correspondingly, when the judgment result is 0, it may indicate that the sampling point is located outside the surface of the target object, or vice versa. In other cases, the judgment result can be between 0 and 1.
  • the model can be represented by an implicit function f that outputs, according to the above-mentioned input, the probability that the sampling point is located inside/outside the surface of the target object.
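At inference time the judged probabilities can be written into a dense grid over the envelope, and the target object surface recovered as the 0.5 isosurface mentioned above, for instance with marching cubes. A sketch using scikit-image; the dense-grid representation is an assumption, since the text only specifies the 0.5 isosurface.

```python
import numpy as np
from skimage import measure

def extract_surface(occupancy_grid, voxel_size=1.0):
    """Recover the object surface as the 0.5 isosurface of the judged
    inside/outside probabilities (occupancy_grid: (X, Y, Z) float array)."""
    verts, faces, normals, _ = measure.marching_cubes(
        occupancy_grid, level=0.5, spacing=(voxel_size,) * 3)
    return verts, faces, normals
```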
  • the inventors of the present application realized that the region whose geometric relationship with the surface of the target object cannot be clearly determined can be selectively enlarged and re-determined, thereby improving the accuracy of the three-dimensional reconstruction.
  • the method for 3D reconstruction may further include: selecting a local blurred area according to the confidence of the judgment result, and performing fine 3D reconstruction on the local blurred area (step 814 in FIG. 8 ).
  • local blurred regions with relatively lower confidence can be selected for fine 3D reconstruction.
  • confidence can indicate the certainty of a judgment. For example, when the judgment result is a numerical value indicating the probability that the sampling point is located inside/outside the surface of the target object, a judgment result of 1 or 0 determines with high confidence that the sampling point is located inside or outside the surface of the target object. In contrast, when the judgment result is 0.5, it cannot be determined whether the sampling point is located inside or outside the surface, and the confidence is low.
  • the sampling points may be sorted according to confidence, so that the sorting order reflects the relative magnitudes of confidence. If sorted in descending order, the sampling points at the bottom of the list have relatively lower confidence; conversely, if sorted in ascending order, the sampling points at the top of the list have relatively lower confidence.
  • At least a partial area in the area where the subset of sampling points with lower confidence is located may be determined as a local blurred area.
  • the number of sampling points included in the subset can be preset.
  • the area where the subset of sampling points with relatively lower confidence is located may be an area defined according to the distribution of these sampling points. In other embodiments, these areas may be pre-divided areas.
  • a local area with relatively more blurred sampling points can be selected for fine 3D reconstruction.
  • a blurred sampling point is a sampling point whose confidence has not yet met the predetermined requirement.
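Since the judgment result is a probability, one natural confidence measure (an illustrative choice, not mandated by the text) is the distance of the result from 0.5, rescaled to [0, 1]; points near 0.5 are the blurred ones.

```python
import numpy as np

def select_blurred_points(points, probs, threshold=0.3):
    """Return the sampling points whose confidence, defined here as the
    rescaled distance of the judged probability from 0.5, falls below the
    predetermined requirement."""
    confidence = np.abs(probs - 0.5) * 2.0   # 0 = fully blurred, 1 = certain
    return points[confidence < threshold]
```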
  • local feature extraction may be performed on a local sub-image in the image corresponding to a local blurred area as illustrated in FIG. 10 to obtain a local feature map (step 902 in FIG. 9 ).
  • local sub-images may be input into a local feature extractor for local feature extraction.
  • the local feature extractor may include, but is not limited to, any one or a combination of neural networks, autoencoders, SIFT, HOG, etc.
  • a local feature map for each local sub-image can be obtained.
  • the number of obtained local feature maps may be equal to the number of local sub-images.
  • the feature elements constituting the local feature map can also be expressed in the form of multi-dimensional vectors.
  • the feature elements in the local feature map may respectively correspond to pixel blocks on the local sub-image.
  • the local sub-images input into the local feature extractor for local feature extraction can have higher resolution than the images input into the global feature extractor for global feature extraction.
  • local sub-images that have not undergone preprocessing such as downsampling may be directly input to the local feature extractor.
  • the local features corresponding to the sampling points in the local blurred area may be determined from the local feature map based on the geometric association (step 904 in FIG. 9 ).
  • the related description of determining the global features corresponding to the sampling points from the global feature map is also basically applicable to determining the local features corresponding to the sampling points in the local blurred area from the local feature map, and the description is not repeated here.
  • local features and corresponding geometric encoding information can be input into the model for 3D reconstruction, and the geometric relationship between the sampling points in the local blurred area and the surface of the target object can be re-judged (step 906 in FIG. 9).
  • the local features corresponding to the sampling point and the corresponding geometric coding information can be input into the model.
  • the model can re-judge the geometric relationship between the sampling points and the surface of the target object, and output an updated judgment result to correct the geometric relationship between the sampling points in the local blurred area and the surface of the target object.
  • the local sub-images used for local feature extraction can have a higher resolution than the images used for global feature extraction, so that local features can represent the characteristics of the corresponding sampling points more accurately and finely than global features. Therefore, the 3D reconstruction of local blurred areas is finer.
  • the method for three-dimensional reconstruction may further include performing three-dimensional reconstruction on the target object based on the geometric relationship between the sampling points and the surface of the target object.
  • the three-dimensional reconstructed target voxel envelope can be obtained.
  • the surface of the target object can be determined.
  • the method for 3D reconstruction may further include performing transparency processing on some voxels within the target voxel envelope obtained from 3D reconstruction.
  • part of the voxels in the target voxel envelope obtained by 3D reconstruction can be made transparent, so that the voxels corresponding to objects with a certain transparency, such as glass (for example, cups or eyeglasses) and hair, show transparency consistent with the actual situation; this helps make the target voxel envelope obtained by 3D reconstruction appear more natural.
  • the method 1100 for transparentizing some voxels within the target voxel envelope obtained from 3D reconstruction mainly includes steps 1102 - 1106 described in detail below.
  • the transparency of transparent pixels in the image may be obtained (step 1102 in FIG. 11 ).
  • processing such as image matting can be applied to the captured image $I_O$ to obtain a processed image $I_I$ with transparent pixels, and to obtain the transparency of those transparent pixels.
  • voxels corresponding to transparent pixels may be solved for (step 1104 in FIG. 11 ).
  • the envelope in the world coordinate system corresponding to the transparent pixel area in the image can be obtained, and the voxels where this envelope intersects the target voxel envelope $V_O$ obtained by 3D reconstruction, that is, the voxels corresponding to the transparent pixels, can be solved for.
  • the transparency of the voxels corresponding to the transparent pixels may be set based on the transparency of the transparent pixels (step 1106 in FIG. 11 ).
  • the transparency of a voxel corresponding to a transparent pixel can be set equal to the transparency of the corresponding transparent pixel, thereby obtaining a target voxel envelope $V_I$ with transparent voxels.
  • performing voxel transparency processing enables more accurate visual expression of objects with certain transparency, such as glass and hair.
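A minimal sketch of steps 1102-1106 under assumed inputs: a per-pixel transparency map already produced by image matting, and a precomputed association from voxels to their projected pixels (for example via the projection used earlier for feature lookup). All names are illustrative.

```python
import numpy as np

def apply_voxel_transparency(voxel_alpha, voxel_to_pixel, alpha_map):
    """Set the transparency of each voxel corresponding to a transparent
    pixel equal to that pixel's transparency (step 1106).

    voxel_alpha: (V,) array initialized to 1.0 (opaque).
    voxel_to_pixel: dict mapping voxel index -> (row, col) of its pixel.
    alpha_map: (H, W) per-pixel transparency from image matting (step 1102).
    """
    for vox, (r, c) in voxel_to_pixel.items():
        a = alpha_map[r, c]
        if a < 1.0:                 # the pixel is (partially) transparent
            voxel_alpha[vox] = a    # copy its transparency onto the voxel
    return voxel_alpha
```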
  • the method for training a three-dimensional reconstruction model and the method for three-dimensional reconstruction according to the embodiments of the present disclosure can improve sampling efficiency and data accuracy, and re-judge by zooming in on local areas where the judgment effect is poor, thereby achieving more accurate 3D reconstruction at a lower cost.
  • the present disclosure can realize high-precision three-dimensional reconstruction of the target object only by using sparse cameras (imaging at sparse angles). The cost of three-dimensional modeling can be reduced and/or the accuracy of three-dimensional modeling can be improved.
  • Embodiments of the present disclosure also provide a computer-readable storage medium storing one or more instructions. When these instructions are executed by a processor, the processor may execute the steps of the method for training a three-dimensional reconstruction model or of the method for three-dimensional reconstruction in the above embodiments.
  • the instructions in the computer-readable storage medium according to the embodiments of the present disclosure may be configured to perform operations corresponding to the above-mentioned system and method embodiments.
  • Embodiments of computer-readable storage media will be apparent to those skilled in the art when referring to the above-described system and method embodiments, and thus will not be described again.
  • Computer-readable storage media for carrying or including the above-described instructions also fall within the scope of the present disclosure.
  • Such computer-readable storage media may include, but are not limited to, floppy disks, optical disks, magneto-optical disks, memory cards, memory sticks, and the like.
  • Embodiments of the present disclosure also provide various devices including components or units for performing the steps of the method for training a 3D reconstruction model or the steps of the 3D reconstruction method in the above embodiments.
  • each of the above components or units may be implemented as an independent physical entity, or may also be implemented by a single entity (for example, a processor (CPU or DSP, etc.), an integrated circuit, etc.).
  • a plurality of functions included in one unit in the above embodiments may be realized by separate devices.
  • a plurality of functions implemented by a plurality of units in the above embodiments may be respectively implemented by separate devices.
  • one of the above functions may be realized by a plurality of units.
  • a method for training a three-dimensional reconstruction model comprising:
  • the model is trained based on at least global features and geometrically encoded information.
  • training the model comprises:
  • the parameters of the model are updated.
  • a local area is selected, and focused training is performed on the local area.
  • selecting a local area comprises:
  • determining, as a local area, at least a partial area in the area where the subset of sampling points with relatively larger discrimination errors is located.
  • performing focused training on the local area comprises:
  • training the depth information extractor comprises:
  • the parameters of the depth information extractor are updated.
  • randomly sampling points within the initial voxel envelope comprises:
  • a computer-readable storage medium having stored thereon one or more instructions which, when executed by a processor, cause the processor to perform the steps of the method according to any one of items 1-9.
  • An apparatus for training a model for three-dimensional reconstruction comprising means for performing the steps of the method according to any one of items 1-9.
  • a method for three-dimensional reconstruction comprising:
  • the global features and corresponding geometric coding information are input into the model for 3D reconstruction, and the geometric relationship between the sampling point and the surface of the target object is judged.
  • a local blurred area is selected, and fine 3D reconstruction is performed on the local blurred area.
  • Transparency processing is performed on some voxels within the target voxel envelope obtained from 3D reconstruction.
  • the transparency of the voxel corresponding to the transparent pixel is set.
  • a computer readable storage medium having stored thereon one or more instructions which, when executed by a processor, cause the processor to perform the steps of the method according to any one of items 12-16.
  • An apparatus for three-dimensional reconstruction comprising means for performing the steps of the method according to any one of items 12-16.
  • a system for three-dimensional reconstruction comprising:
  • a training unit configured to perform the steps of the method according to any one of items 1-9;
  • An inference unit configured to perform the steps of the method according to any one of items 12-14.
  • the voxel transparency unit is configured to perform transparency processing on some voxels within the envelope of the target voxels obtained from the three-dimensional reconstruction.


Abstract

The present disclosure relates to a method, system and storage medium for three-dimensional reconstruction. Various embodiments concerning three-dimensional reconstruction are described. In one embodiment, a method for training a three-dimensional reconstruction model comprises: generating an initial voxel envelope of a target object based on images obtained by photographing the target object from multiple viewing angles; randomly sampling points within the initial voxel envelope to obtain a set of sampling points; performing global feature extraction on the images to obtain global feature maps; determining, based on geometric association, the global features corresponding to the sampling points from the global feature maps; encoding geometric information about the sampling points to generate geometric encoding information; and training the model based on at least the global features and the geometric encoding information.

Description

Method, system and storage medium for three-dimensional reconstruction
Cross-reference to related applications
This application is based on, and claims priority to, Chinese application No. 202111296646.5 filed on November 4, 2021, the disclosure of which is hereby incorporated into this application in its entirety.
Technical field
The present disclosure relates generally to three-dimensional reconstruction technology, and in particular to three-dimensional reconstruction technology based on deep neural networks.
Background
High-precision three-dimensional reconstruction can play an important role in scenarios that planar vision handles poorly or not at all, such as industrial automation, medical assistance applications, virtual reality applications, and visual navigation.
Traditional high-precision three-dimensional reconstruction techniques need to acquire image information or depth information of the target object from multiple viewing angles. In general, the accuracy of the reconstruction is directly related to how densely the angles are sampled: the sparser the angles, the lower the accuracy, to the point where modeling may fail altogether.
Summary of the invention
One aspect of the present disclosure relates to a method for training a three-dimensional reconstruction model. According to embodiments of the present disclosure, the method for training the model comprises: generating an initial voxel envelope of a target object based on images obtained by photographing the target object from multiple viewing angles; randomly sampling points within the initial voxel envelope to obtain a set of sampling points; performing global feature extraction on the images to obtain global feature maps; determining, based on geometric association, the global features corresponding to the sampling points from the global feature maps; encoding geometric information about the sampling points to generate geometric encoding information; and training the model based on at least the global features and the geometric encoding information.
One aspect of the present disclosure relates to a method for three-dimensional reconstruction. According to embodiments of the present disclosure, the method comprises: generating an initial voxel envelope of a target object based on images obtained by photographing the target object from multiple viewing angles; randomly sampling points within the initial voxel envelope to obtain a set of sampling points; performing global feature extraction on the images to obtain global feature maps; determining, based on geometric association, the global features corresponding to the sampling points from the global feature maps; encoding geometric information about the sampling points to generate geometric encoding information; and inputting the global features and the corresponding geometric encoding information into a model for three-dimensional reconstruction to judge the geometric relationship between the sampling points and the surface of the target object.
One aspect of the present disclosure relates to a system for three-dimensional reconstruction. According to embodiments of the present disclosure, the system comprises: a training unit configured to perform the method for training a three-dimensional reconstruction model according to various embodiments of the present disclosure; and an inference unit configured to perform the method for three-dimensional reconstruction according to various embodiments of the present disclosure.
A further aspect of the present disclosure relates to a computer-readable storage medium storing one or more instructions. In some embodiments, the one or more instructions may, when executed by a processor, cause the processor to perform the steps of the methods according to embodiments of the present disclosure.
A further aspect of the present disclosure relates to various devices comprising components or units for performing the steps of the methods according to embodiments of the present disclosure.
The above overview is provided to summarize some exemplary embodiments so as to provide a basic understanding of aspects of the subject matter described herein. Accordingly, the above features are merely examples and should not be construed as narrowing the scope or spirit of the subject matter described herein in any way. Other features, aspects and advantages of the subject matter described herein will become apparent from the following detailed description taken in conjunction with the accompanying drawings.
Brief description of the drawings
A better understanding of the present disclosure can be obtained when the following detailed description of the embodiments is considered in conjunction with the accompanying drawings. The same or similar reference numerals are used throughout the drawings to denote the same or similar components. The drawings, together with the detailed description below, are incorporated into and form part of the specification to illustrate the embodiments of the present disclosure and to explain its principles and advantages. In the drawings:
Fig. 1 is a schematic diagram showing an example of the configuration of a system for three-dimensional reconstruction according to an embodiment of the present disclosure.
Fig. 2 is a flowchart showing an example of the steps of a method for training a three-dimensional reconstruction model according to an embodiment of the present disclosure.
Fig. 3 is a flowchart showing an example of sub-steps of some steps of the method for training a three-dimensional reconstruction model according to an embodiment of the present disclosure.
Fig. 4 is a flowchart showing an example of the steps of focused training of a local region according to an embodiment of the present disclosure.
Fig. 5 is a flowchart showing an example of sub-steps of some steps of the focused training of a local region according to an embodiment of the present disclosure.
Fig. 6 is a flowchart showing an example of the steps for training a depth information extractor according to an embodiment of the present disclosure.
Fig. 7A is a schematic diagram showing an example of generating a visual hull according to an embodiment of the present disclosure.
Fig. 7B is a schematic diagram showing an example of applying constraints according to an embodiment of the present disclosure.
Fig. 7C is a schematic diagram showing a further example of applying constraints according to an embodiment of the present disclosure.
Fig. 8 is a flowchart showing an example of the steps of a method for three-dimensional reconstruction according to an embodiment of the present disclosure.
Fig. 9 is a flowchart showing an example of sub-steps of some steps of the method for three-dimensional reconstruction according to an embodiment of the present disclosure.
Fig. 10 is a schematic diagram showing an example of three-dimensional reconstruction of a target object according to an embodiment of the present disclosure.
Fig. 11 is a flowchart showing an example of the steps of a method for voxel transparency processing according to an embodiment of the present disclosure.
Fig. 12 is a schematic diagram showing an example of voxel transparency processing according to an embodiment of the present disclosure.
While the embodiments described in the present disclosure may be susceptible to various modifications and alternative forms, specific embodiments thereof are shown by way of example in the drawings and are described in detail herein. It should be understood, however, that the drawings and the detailed description thereof are not intended to limit the embodiments to the particular forms disclosed; on the contrary, the intention is to cover all modifications, equivalents and alternatives falling within the spirit and scope of the claims.
Detailed description
Representative applications of various aspects such as the apparatus and methods according to the present disclosure are described below. These examples are described merely to add context and aid in understanding of the described embodiments. It will therefore be apparent to those skilled in the art that the embodiments described below may be practiced without some or all of the specific details. In other instances, well-known process steps have not been described in detail in order to avoid unnecessarily obscuring the described embodiments. Other applications are also possible, and the solutions of the present disclosure are not limited to these examples.
An example of the configuration of a system for three-dimensional reconstruction according to an embodiment of the present disclosure is described below with reference to Fig. 1.
According to an embodiment of the present disclosure, a system 100 for three-dimensional reconstruction may comprise a training unit 112 and an inference unit 114.
The training unit 112 is used to train the three-dimensional reconstruction model. In particular, the training unit 112 may be configured to perform the steps of the method for training a three-dimensional reconstruction model, which will be described later.
The inference unit 114 is used to perform three-dimensional reconstruction using the three-dimensional reconstruction model. In particular, the inference unit 114 may be configured to perform the steps of the method for three-dimensional reconstruction, which will be described later.
In some embodiments, the system 100 for three-dimensional reconstruction further comprises a voxel transparency unit 116.
In some embodiments, the voxel transparency unit 116 may be configured to perform transparency processing on some of the voxels within the target voxel envelope obtained from three-dimensional reconstruction.
Specifically, the voxel transparency unit 116 may set those voxels of the target voxel envelope obtained from three-dimensional reconstruction that correspond to partially transparent objects, such as glass or hair, to have the corresponding transparency.
For ease of description, the information processing module 110 is used below to refer collectively to the units of the system 100 that process information to implement three-dimensional reconstruction, such as the training unit 112, the inference unit 114 and the voxel transparency unit 116.
It should be noted that each of the above units is merely a logical module divided according to the specific function it implements, and is not intended to limit the specific implementation; the units may be implemented, for example, in software, in hardware, or in a combination of software and hardware. In actual implementation, each of the above units may be implemented as an independent physical entity, or may be implemented by a single entity (for example, a processor (CPU or DSP, etc.), an integrated circuit, etc.). If the units of the information processing module 110 are implemented as independent physical entities, they may be deployed together or separately from one another. For example, in some embodiments, one of the training unit 112 and the inference unit 114 may be deployed locally while the other is deployed remotely.
In some embodiments, the system 100 for three-dimensional reconstruction may further comprise cameras 120. The cameras 120 may be configured to photograph a target object 130 from multiple viewing angles. The cameras 120 may be calibrated in advance, whereby the mapping relationship from the camera coordinate system to the world coordinate system can be obtained.
In some embodiments, the cameras 120 may include at least one of an ordinary camera and a depth camera such as an RGB-D camera.
In some embodiments, the number of cameras may be one or more.
As shown in Fig. 1, information from the cameras 120 may be transmitted to the respective components of the information processing module 110.
In some embodiments, the information processing module 110 may be deployed near the cameras 120. Alternatively, in some embodiments, at least part of the information processing module 110 may be deployed separately from the cameras 120, for example at a remote server. Those skilled in the art will appreciate that there is no particular restriction on the positional relationship between the information processing module 110 and the cameras 120; it may be chosen according to the actual application, as long as the information processing module 110 can acquire the information to be processed from the cameras 120.
Although the system 100 illustrated in Fig. 1 includes the cameras 120, those skilled in the art will appreciate that the system 100 itself may not include the cameras 120 and may instead use images of the target object captured by cameras external to the system.
A method for training a three-dimensional reconstruction model according to embodiments of the present disclosure is described below by way of example with reference to Figs. 2-6 and Figs. 7A-7C. What has been described above in connection with Fig. 1 also applies to the corresponding features.
As shown in Fig. 2, according to an embodiment of the present disclosure, a method 200 for training a three-dimensional reconstruction model may mainly comprise the following steps:
in step 202, generating an initial voxel envelope of the target object based on images obtained by photographing the target object from multiple viewing angles;
in step 204, randomly sampling points within the initial voxel envelope to obtain a set of sampling points;
in step 206, performing global feature extraction on the images to obtain global feature maps;
in step 208, determining, based on geometric association, the global features corresponding to the sampling points from the global feature maps;
in step 210, encoding geometric information about the sampling points to generate geometric encoding information; and
in step 212, training the model based on at least the global features and the geometric encoding information.
In some embodiments, the method for training a three-dimensional reconstruction model may further comprise calibrating the cameras that photograph the target object from multiple viewing angles, so as to obtain the mapping relationship from the camera coordinate system to the world coordinate system.
Alternatively, in some embodiments, the calibration information about the cameras is known in advance.
As illustrated in Fig. 2, an initial voxel envelope of the target object may be generated based on the images obtained by photographing the target object from multiple viewing angles (step 202).
In some embodiments, the number M of images may be an integer greater than or equal to 1.
In some embodiments, generating the initial voxel envelope of the target object may be based on the visual-hull technique.
Specifically, generating the initial voxel envelope of the target object may comprise generating a visual hull of the target object.
Fig. 7A shows a schematic diagram of an example of generating a visual hull according to an embodiment of the present disclosure. As shown in Fig. 7A, when cameras photograph the target object from multiple viewing angles, the silhouette of the target object observed from each viewing angle can be obtained. The silhouette at each viewing angle, together with the corresponding camera principal point, jointly determines a three-dimensional cone within which the target object must lie. By taking the intersection of all the cones thus obtained, a coarse voxel envelope containing the target object, called the visual hull, can be obtained. Those skilled in the art will readily appreciate that the more viewing angles are captured, the finer the resulting initial voxel envelope.
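By way of illustration, the cone-intersection construction described above might be sketched as follows (a minimal example assuming NumPy, binary silhouette masks `masks`, and calibrated 3x4 projection matrices `projs`; all names are illustrative, not part of the original disclosure):

```python
import numpy as np

def visual_hull(masks, projs, bounds, res=128):
    """Carve a coarse voxel envelope: a voxel is kept only if its
    projection falls inside the silhouette in every view."""
    lo, hi = bounds                                   # (3,) world-space corners
    axes = [np.linspace(lo[d], hi[d], res) for d in range(3)]
    X, Y, Z = np.meshgrid(*axes, indexing="ij")
    pts = np.stack([X, Y, Z, np.ones_like(X)], -1).reshape(-1, 4)
    inside = np.ones(len(pts), dtype=bool)
    for mask, P in zip(masks, projs):                 # one cone per view
        uvw = pts @ P.T                               # project to the image plane
        uv = (uvw[:, :2] / uvw[:, 2:3]).astype(int)
        h, w = mask.shape
        ok = (0 <= uv[:, 0]) & (uv[:, 0] < w) & (0 <= uv[:, 1]) & (uv[:, 1] < h)
        hit = np.zeros(len(pts), dtype=bool)
        hit[ok] = mask[uv[ok, 1], uv[ok, 0]] > 0
        inside &= hit                                 # intersection of all view cones
    return inside.reshape(res, res, res)
```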
In some embodiments, generating the initial voxel envelope of the target object may further comprise applying constraints to the visual hull. Specifically, on the basis of the visual hull, the initial voxel envelope of the target object is determined or refined by applying one or more constraints.
Figs. 7B-7C show schematic diagrams of examples of applying constraints according to embodiments of the present disclosure.
As shown in Fig. 7B, in some embodiments, the constraints may include constraints based on depth information of the target object.
For example, in some embodiments, at least one camera may be a depth camera. A depth camera can acquire depth information of the photographed target object. Thus, by using the depth information to correct the corresponding three-dimensional cone, a finer initial voxel envelope can be formed.
In some embodiments, the constraints may include inherent morphological characteristics of the target object.
For example, when the target object is a human body, the inherent morphological characteristics may include, but are not limited to, human-body constraints.
Specifically, in some embodiments, the human-body constraints include, but are not limited to, one or more of: the number of torso parts and facial features, their extreme relative positions, degree-of-freedom constraints, sizes, lengths, and so on.
Using the human-body constraints, the visual hull can be optimized in combination with techniques such as human-body image segmentation and skeleton extraction, thereby forming a finer initial voxel envelope.
For example, owing to the inherent limitations of the visual-hull technique, erroneous voxels may be produced where the cameras cannot see, such as extra legs or arms. As shown in Fig. 7C, the human-body constraints can be used to eliminate such errors and improve the accuracy of the initial voxel envelope.
Advantageously, determining the initial voxel envelope of the target object by applying constraints overcomes the problem that an envelope built purely from limited image information is insufficiently fine and even prone to errors, improving both the accuracy and the fineness of the initial voxel envelope.
As illustrated in Fig. 2, once the initial voxel envelope of the target object has been generated, points within the initial voxel envelope may be randomly sampled to obtain a set of sampling points (step 204).
In existing three-dimensional reconstruction techniques, it may be necessary to make judgments on sampling points distributed over the entire imaging region. However, the imaging region may contain a large number of sampling points that are in fact far from the object to be reconstructed. These sampling points contribute nothing to the reconstruction; they therefore reduce the effectiveness of the sampling, degrading the accuracy of the reconstruction and adding unnecessary processing overhead. The inventors of the present application recognized that the efficiency of sampling can be improved by choosing appropriate sampling points.
In various embodiments of the present disclosure, the range of random sampling is restricted to the interior of the generated initial voxel envelope.
Advantageously, this restriction effectively narrows the sampling range, thereby improving the effectiveness of the sampling so as to optimize the three-dimensional reconstruction, avoid unnecessary processing overhead, and so on.
Moreover, as described above, when a technique such as the visual hull is used to construct the initial voxel envelope of the target object, the object to be reconstructed necessarily lies within that envelope. Hence, restricting the random sampling range to the interior of the generated initial voxel envelope also advantageously improves the accuracy of the three-dimensional reconstruction.
In some embodiments, the number N of sampling points may be chosen as required. N is a positive integer.
In some embodiments, the points within the initial voxel envelope are randomly sampled uniformly.
Alternatively, in other embodiments, the points within the initial voxel envelope are randomly sampled non-uniformly. For example, in order to model specific parts such as the face or hands in finer three-dimensional detail, the regions corresponding to those parts are sampled with enhanced (i.e., denser) random sampling.
Accordingly, in some embodiments, randomly sampling the points within the initial voxel envelope may further comprise determining, based on image recognition, the specific range in the images that corresponds to a specific part of the target object.
In some embodiments, the specific part includes, but is not limited to, one or more of the hands, the face, and so on. For example, in one embodiment, the specific part is both hands.
In some embodiments, the image recognition method may include, but is not limited to, any one or a combination of face detection, gesture detection, and the like.
In some embodiments, once the specific range corresponding to the specific part has been identified, enhanced random sampling may be applied during random sampling to the points within the specific region corresponding to the specific range.
For example, in some embodiments, the specific region corresponding to the specific range in the images may be obtained by the principle of multi-view vision.
In some embodiments, uniform random sampling may be performed throughout the interior of the initial voxel envelope, with enhanced random sampling performed within the specific region. The union of all the sampling points thus obtained may then be taken as the set of sampling points.
Alternatively, in some embodiments, uniform random sampling may be performed within the initial voxel envelope excluding the specific region, with enhanced random sampling performed within the specific region. The union of all the sampling points thus obtained may then be taken as the set of sampling points.
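A minimal sketch of this sampling strategy (assuming NumPy; `grid_pts` holds the world coordinates of the voxel grid, and `roi_mask` marks the specific region; all names are illustrative) might look like:

```python
import numpy as np

def sample_points(envelope_mask, grid_pts, n_uniform, roi_mask=None, n_roi=0, rng=None):
    """Uniform random sampling inside the initial voxel envelope, plus
    denser ('enhanced') sampling inside an optional region of interest."""
    rng = rng or np.random.default_rng()
    inside = grid_pts[envelope_mask.ravel()]
    samples = [inside[rng.integers(0, len(inside), n_uniform)]]
    if roi_mask is not None and n_roi > 0:
        roi = grid_pts[(envelope_mask & roi_mask).ravel()]
        samples.append(roi[rng.integers(0, len(roi), n_roi)])
    return np.concatenate(samples)  # union of the two sample sets
```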
As illustrated in Fig. 2, global feature extraction may be performed on the images to obtain global feature maps (step 206).
Specifically, in some embodiments, the images may be input into a global feature extractor for global feature extraction.
In some embodiments, the global feature extractor may include, but is not limited to, any one or a combination of a neural network, an auto-encoder/decoder, SIFT, HOG, and the like.
As the output of the global feature extractor, a global feature map for each image can be obtained. Once global feature extraction has been completed for all images, the number of global feature maps obtained may equal the number of images.
In some embodiments, a global feature map may be composed of feature elements. Each feature element may be expressed in the form of a multi-dimensional vector. The feature elements in the global feature map may each correspond to a pixel block in the image. Here, the "correspondence" between a feature element and a pixel block means that the feature element can represent the features of the corresponding pixel block. Those skilled in the art will readily appreciate that the higher the resolution of the image or the smaller the pixel blocks, the more accurately the extracted global feature map can represent the image, but the greater the corresponding workload.
In some embodiments, in order to avoid huge computational overhead, performing global feature extraction on the images further comprises pre-processing, such as downsampling, the images before inputting them into the global feature extractor, so as to reduce the image resolution.
For example, in some embodiments, an image with a resolution of 512*512 may be compressed to an image with a resolution of 64*64 before being input into the global feature extractor.
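For instance, a toy extractor along these lines could be sketched with PyTorch as follows (the architecture and dimensions are illustrative assumptions, not the disclosed design):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GlobalFeatureExtractor(nn.Module):
    """Toy CNN encoder: a 512x512 image is first downsampled to 64x64,
    then mapped to a feature map whose cells correspond to pixel blocks."""
    def __init__(self, feat_dim=64):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, feat_dim, 3, padding=1), nn.ReLU(),
        )

    def forward(self, img):                      # img: (B, 3, 512, 512)
        img = F.interpolate(img, size=(64, 64))  # pre-processing: downsample
        return self.conv(img)                    # (B, feat_dim, 64, 64)
```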
As illustrated in Fig. 2, the global features corresponding to the sampling points may be determined from the global feature maps based on geometric association (step 208).
As described above, the feature elements in a global feature map may each correspond to a pixel block in the image. Furthermore, the pixel block onto which a sampling point is imaged, i.e., the pixel block corresponding to the sampling point, can be determined through geometric relationships. A correspondence from sampling points to feature elements, based on geometric association, can thereby be established.
It should be noted that, since the number M of images may be greater than 1, in some embodiments the number of global features corresponding to each sampling point may be greater than 1, so that the total number P of global features may be greater than the number N of sampling points. For example, the total number of global features may be expressed as

P = Σ_{i=1}^{M} H_i,

where H_i is the number of pixel blocks in the i-th image that correspond to sampling points. Owing to viewing-angle limitations, not every sampling point necessarily has a corresponding pixel block in every image, i.e., H_i ≤ N.
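This lookup can be sketched as follows (assuming NumPy, a 3x4 projection matrix `P`, and a 64x64 feature map for a 512x512 image; all names are illustrative):

```python
import numpy as np

def feature_for_point(point, P, feat_map, img_size=512):
    """Project a 3-D sampling point with camera matrix P (3x4) and read
    the feature element of the pixel block it lands in (None if the
    point is not visible in this view, which is why H_i <= N)."""
    u, v, w = P @ np.append(point, 1.0)
    u, v = u / w, v / w
    if not (0 <= u < img_size and 0 <= v < img_size):
        return None
    _, h, _ = feat_map.shape            # feat_map: (feat_dim, 64, 64)
    block = img_size // h               # pixels per feature element
    return feat_map[:, int(v) // block, int(u) // block]
```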
As illustrated in Fig. 2, the geometric information about the sampling points may be encoded to generate geometric encoding information (step 210).
In some embodiments, the geometric information about a sampling point may include the spatial coordinates of the sampling point and at least part of the interior and exterior orientation information of the camera that images the sampling point.
For example, in some embodiments, the geometric information about a sampling point may include only the spatial coordinates of the sampling point. In these embodiments, the generated geometric encoding information may relate only to the sampling point itself. When the number of images is greater than 1, one piece of geometric encoding information may be associated with at least one pixel block, or at least one global feature, corresponding to the same sampling point.
For example, in other embodiments, the geometric information about a sampling point may include not only the spatial coordinates of the sampling point but also the interior and exterior orientation elements of the camera. In these embodiments, one piece of geometric encoding information may be associated with a single pixel block, or a single global feature, jointly defined by the above geometric information.
In some embodiments, the generated geometric encoding information may be a multi-dimensional vector. For example, the geometric encoding information may include a multi-dimensional vector corresponding to the spatial coordinates of the sampling point and a multi-dimensional vector corresponding to the interior and exterior orientation information of the camera.
The inventors of the present application recognized that, because the geometric encoding information incorporates multiple kinds of information such as those described above, it can represent geometric features more accurately than raw geometric information. Representing geometric features by geometric encoding information is therefore conducive to improving the accuracy of the three-dimensional reconstruction.
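One plausible encoding of this kind is a sinusoidal (positional) encoding; the following sketch is an illustrative assumption rather than the specific encoder of the disclosure:

```python
import numpy as np

def encode_geometry(xyz, cam_params=None, n_freq=6):
    """Sinusoidal encoding of a sampling point's spatial coordinates,
    optionally concatenated with flattened camera interior/exterior
    orientation parameters; the result is a multi-dimensional vector."""
    freqs = (2.0 ** np.arange(n_freq)) * np.pi
    enc = np.concatenate([f(xyz * s) for s in freqs for f in (np.sin, np.cos)])
    if cam_params is not None:
        enc = np.concatenate([enc, np.ravel(cam_params)])
    return enc  # 3 * 2 * n_freq (+ camera) dimensions
```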
As illustrated in Fig. 2, the model may be trained based on at least the global features and the geometric encoding information (step 212).
In some embodiments, the trained model may be used for three-dimensional reconstruction of the target object.
An example of the sub-steps of training the model based on at least the global features and the geometric encoding information according to an embodiment of the present disclosure is described in detail below in connection with Fig. 3.
As illustrated in Fig. 3, in some embodiments, the global features and the corresponding geometric encoding information may be input into the model to judge the geometric relationship between the sampling points and the surface of the target object (sub-step 302).
Specifically, for each sampling point, the global features corresponding to that sampling point and the corresponding geometric encoding information may be input into the model.
On the one hand, as analyzed above, when the number of images is greater than 1, the number of global features corresponding to each sampling point may be greater than 1. Thus, in some embodiments, multiple global features and corresponding geometric encoding information may be input for a single sampling point. Increasing the number of images therefore not only yields a finer initial voxel envelope but also increases the training data, further improving the accuracy of the model.
On the other hand, as analyzed above, if the geometric encoding information relates only to the spatial coordinates of the sampling point itself, then when the number of images is greater than 1, the same geometric encoding information may be associated with multiple global features corresponding to the same sampling point. Thus, in some embodiments, the geometric encoding information associated with the multiple global features input for one sampling point may be identical. Geometric encoding information related to the information of more sampling points can therefore provide more accurate training data, improving the accuracy of the model.
Those skilled in the art will appreciate that any model capable of judging the geometric relationship between the corresponding sampling point and the surface of the target object from the above inputs may be used. Based on the global features for any sampling point and the corresponding geometric encoding information, the trained model can output a judgment result indicating the geometric relationship between that sampling point and the surface of the target object.
In some embodiments, the judgment result may be numerical.
For example, in some embodiments, the judgment result may be a numerical value indicating the probability that the sampling point lies inside/outside the surface of the target object.
For example, a judgment result of 1 may indicate that the sampling point lies inside the surface of the target object, while a judgment result of 0 may indicate that it lies outside, or vice versa. In other cases, the judgment result may lie between 0 and 1.
Accordingly, in some embodiments, the trained model may be represented by an implicit function f that, from the above inputs, outputs the probability that the sampling point lies inside/outside the surface of the target object. Some of the sub-steps of step 212 are described below taking this case as an example, but those skilled in the art will appreciate that the present disclosure is not limited thereto.
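A minimal PyTorch sketch of such an implicit function f, realized as an MLP with a sigmoid output (an illustrative assumption, not the disclosed architecture), is:

```python
import torch
import torch.nn as nn

class ImplicitSurface(nn.Module):
    """MLP realization of the implicit function f: it maps a (global
    feature, geometric encoding) pair to the probability that the
    sampling point lies inside the target object's surface."""
    def __init__(self, feat_dim=64, geo_dim=36, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim + geo_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1), nn.Sigmoid(),  # probability in [0, 1]
        )

    def forward(self, feat, geo):
        return self.net(torch.cat([feat, geo], dim=-1)).squeeze(-1)
```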
As illustrated in Fig. 3, in some embodiments, the discrimination error of each sampling point may be computed (sub-step 304).
In some embodiments, the discrimination error of each sampling point can be obtained by comparing the judgment result output by the model with the true surface of the target object.
For example, in some embodiments, an implicit function f* may be used to describe the true surface of the target object:

[Eq. 1]
f*(X) = 1, if the point X is inside the surface of the target object; f*(X) = 0, if it is outside.

That is, if the point X is inside the surface of the target object, the value of f* is 1; if it is outside, the value of f* is 0. The true surface of the target object may be taken as the 0.5 iso-surface of the values of f*.
Accordingly, in some embodiments, the discrimination error L of a sampling point may be computed as the difference between the value of the implicit function f representing the model and the value of the implicit function f* representing the true surface of the target object:

[Eq. 2]
L = | f(F_G(X), Z(X)) − f*(X) |,

where F_G(X) and Z(X) denote, respectively, the global feature corresponding to the sampling point X and the geometric encoding information about that sampling point X.
Although Eq. 2 describes a specific example that uses the absolute value of the difference to compute the discrimination error of a sampling point, those skilled in the art will appreciate that the way the discrimination error is computed is not limited thereto.
Moreover, although Eq. 2 describes the case where each sampling point has one global feature and corresponding geometric encoding information, a similar computation also applies to the case where each sampling point has multiple global features and corresponding geometric encoding information.
As illustrated in Fig. 3, in some embodiments, the global discrimination error of the model may be computed (sub-step 306).
For example, the global discrimination error L_G of the model may be expressed as the mean squared error between the values of the implicit function f representing the model and the values of the implicit function f* representing the true surface of the target object:

[Eq. 3]
L_G = (1/N) Σ_{i=1}^{N} ( f(F_G(X_i), Z(X_i)) − f*(X_i) )²,

where F_G(X_i) and Z(X_i) denote, respectively, the global feature corresponding to the sampling point X_i and the geometric encoding information about that sampling point X_i.
Although a specific example using the mean squared error to compute the global discrimination error of the model has been described above in connection with Eq. 3, those skilled in the art will appreciate that the way the global discrimination error is computed is not limited thereto.
Moreover, although Eq. 3 describes the case where each sampling point has one global feature and corresponding geometric encoding information, a similar computation also applies to the case where each sampling point has multiple global features and corresponding geometric encoding information.
As illustrated in Fig. 3, in some embodiments, the parameters of the model may be updated (sub-step 310) based on whether the global discrimination error meets the accuracy requirement (sub-step 308).
In some embodiments, whether the global discrimination error meets the accuracy requirement can be determined by comparing it with a preset threshold.
If the global discrimination error meets the accuracy requirement ("yes"), the process may end.
Otherwise, if the global discrimination error does not meet the accuracy requirement ("no"), the process may proceed to sub-step 310 to update the parameters of the model, after which the process returns to sub-step 302.
In this way, sub-steps 302-310 may be repeated until the global discrimination error meets the accuracy requirement. That is, training of the model can be completed by iteratively optimizing the model.
Those skilled in the art will appreciate that any suitable method may be used for the iterative optimization of the model, including but not limited to gradient descent, stochastic gradient descent, and the like.
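The iterative loop of sub-steps 302-310 can be sketched as follows (gradient descent on Eq. 3, assuming PyTorch tensors `feats`, `geos` and ground-truth labels f*(X_i); names and hyperparameters are illustrative):

```python
import torch

def train_model(model, feats, geos, labels, lr=1e-3, tol=1e-3, max_iter=10_000):
    """Iteratively update the model parameters until the global
    discrimination error L_G of Eq. 3 meets the accuracy requirement."""
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(max_iter):
        loss = torch.mean((model(feats, geos) - labels) ** 2)  # Eq. 3
        if loss.item() < tol:
            break                       # accuracy requirement met ("yes")
        opt.zero_grad()
        loss.backward()
        opt.step()                      # sub-step 310: update parameters
    return model
```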
The inventors of the present application recognized that focused training can be applied selectively to the points with larger errors, thereby achieving better and faster model fitting by assigning different sampling points weights related to the magnitude of their errors.
In particular, in some embodiments, training the model may further comprise: selecting a local region according to the discrimination errors of the sampling points and applying focused training to the local region (sub-step 314).
In some embodiments, a local region with relatively larger discrimination errors may be selected for focused training.
For example, in some embodiments, the sampling points may be sorted by the magnitude of their discrimination errors; that is, the sort order reflects the relative magnitudes of the discrimination errors. If sorting is in descending order, the top-ranked sampling points have relatively larger discrimination errors; conversely, if sorting is in ascending order, the bottom-ranked sampling points have relatively larger discrimination errors.
Accordingly, at least part of the region in which the subset of sampling points with relatively larger discrimination errors is located may be determined as the local region.
In some embodiments, the number N' of sampling points contained in the subset may be preset, N' being a positive integer smaller than N.
In some embodiments, the region in which the subset of sampling points with relatively larger discrimination errors is located may be a region delimited according to the distribution of those sampling points. In other embodiments, these regions may be pre-partitioned regions.
Alternatively, in some embodiments, a local region containing relatively more sampling points still to be optimized may be selected for focused training, where a sampling point still to be optimized is one whose discrimination error has not yet met the predetermined requirement.
Those skilled in the art will readily appreciate that the way the local region is selected according to the discrimination errors of the sampling points is not limited to the example approaches described above.
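For example, a simple way to pick the subset (assuming NumPy and per-point errors computed as in Eq. 2) is:

```python
import numpy as np

def select_local_subset(errors, n_prime):
    """Indices of the N' sampling points with the largest per-point
    discrimination errors (Eq. 2); the local region is then taken as
    (part of) the region where these points are located."""
    return np.argsort(np.asarray(errors))[::-1][:n_prime]
```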
The steps of a method 400 for focused training of a local region according to an embodiment of the present disclosure are described below by way of example with reference to Fig. 4.
As illustrated in Fig. 4, in some embodiments, local feature extraction is performed on the local sub-images of the images that correspond to the local region, to obtain local feature maps (step 402).
In some embodiments, similarly to global feature extraction, the local sub-images may be input into a local feature extractor for local feature extraction.
In some embodiments, the local feature extractor may include, but is not limited to, any one or a combination of a neural network, an auto-encoder/decoder, SIFT, HOG, and the like.
As the output of the local feature extractor, a local feature map for each local sub-image can be obtained. Once local feature extraction has been completed for all local sub-images, the number of local feature maps obtained may equal the number of local sub-images.
In some embodiments, similarly to global feature extraction, the feature elements composing a local feature map may also be expressed in the form of multi-dimensional vectors. The feature elements in the local feature map may each correspond to a pixel block in the local sub-image.
As analyzed above, the higher the resolution of the image or the smaller the pixel blocks, the more accurately the extracted feature map can represent the image. Therefore, in order to obtain more detail about the local sub-images, the local sub-images input into the local feature extractor may have a higher resolution than the images input into the global feature extractor. For example, in some embodiments, local sub-images that have not undergone pre-processing such as downsampling may be input directly into the local feature extractor.
As illustrated in Fig. 4, in some embodiments, the local features corresponding to the sampling points in the local region may be determined from the local feature maps based on geometric association (step 404).
In the various embodiments of the present disclosure, the description of determining the global features corresponding to the sampling points from the global feature maps also applies, in substance, to determining the local features corresponding to the sampling points in the local region from the local feature maps, and is not repeated here.
As illustrated in Fig. 4, in some embodiments, the model is trained with focus using the local features and the corresponding geometric encoding information (step 406).
An example of the sub-steps of focused training of the model using the local features and the corresponding geometric encoding information (step 406) according to an embodiment of the present disclosure is described below in connection with Fig. 5.
As illustrated in Fig. 5, in some embodiments, focused training of the model may mainly comprise the following sub-steps 502-508.
In sub-step 502, the local features and the corresponding geometric encoding information may be input into the model to judge the geometric relationship between the sampling points in the local region and the surface of the target object.
Specifically, for each sampling point in the local region, the local features corresponding to that sampling point and the corresponding geometric encoding information may be input into the model.
In sub-step 504, the local discrimination error of the model may be computed.
For example, in some embodiments, the local discrimination error L_L of the model may be expressed as the mean squared error between the values of the implicit function f representing the model and the values of the implicit function f* representing the true surface of the target object:

[Eq. 4]
L_L = (1/N') Σ_{i=1}^{N'} ( f(F_L(X_i), Z(X_i)) − f*(X_i) )²,

where F_L(X_i) and Z(X_i) denote, respectively, the local feature corresponding to the sampling point X_i and the geometric encoding information about that sampling point X_i.
Although a specific example using the mean squared error to compute the local discrimination error of the model has been described above in connection with Eq. 4, those skilled in the art will appreciate that the way the local discrimination error is computed is not limited thereto. Likewise, a similar computation also applies to the case where each sampling point has multiple local features and corresponding geometric encoding information.
In sub-step 506, it may be determined whether the local discrimination error meets the accuracy requirement.
In some embodiments, whether the local discrimination error meets the accuracy requirement can be determined by comparing it with a preset threshold.
If the local discrimination error meets the accuracy requirement ("yes"), the process may end.
Otherwise, if the local discrimination error does not meet the accuracy requirement ("no"), the process may proceed to sub-step 508 to update the parameters of the model, after which the process may return to sub-step 502.
In this way, sub-steps 502-508 may be repeated until the local discrimination error meets the accuracy requirement. That is, focused training of the model is completed by iteratively optimizing the model for the local region.
Those skilled in the art will appreciate that any suitable method may be used for the iterative optimization of the model, including but not limited to gradient descent, stochastic gradient descent, and the like.
In the various embodiments, apart from the input signal changing from the global features and corresponding geometric encoding information to the finer local features and corresponding geometric encoding information, the processing of sub-steps 502-508 may be similar to that of sub-steps 302 and 306-310, so some repeated description is omitted here.
Advantageously, compared with optimizing the model in a single iterative loop based only on the global discrimination error, additionally applying focused training to the regions with higher errors using clearer image blocks (i.e., magnifying the regions with higher errors) allows the model to be optimized in a double iterative loop, improving both the speed and the quality of model fitting.
In addition, in some embodiments, training the model based on at least the global features and the geometric encoding information may further comprise training a depth information extractor for extracting depth information from the global features (sub-step 312).
Depth information can directly represent the distance between the target object and the camera and is very important for three-dimensional reconstruction. The inventors of the present application recognized that a depth information extractor can be trained to extract depth information from image features such as global features. In this way, the present application can not only use image features themselves, such as texture, for three-dimensional reconstruction, but can also use the depth information extracted from the image features to improve the perception of scene depth.
The steps of a method 600 for training a depth information extractor according to an embodiment of the present disclosure are briefly introduced below in connection with Fig. 6.
In some embodiments, an actual depth map D may be obtained by photographing the target object with, for example, one or more depth cameras. In some embodiments, the actual depth map D may include the actual depth information of each point of the photographed object.
As shown in Fig. 6, first, in step 602, the global features are input into the depth information extractor f_D to obtain a fitted depth map D'.
In some embodiments, the fitted depth map D' may include the fitted depth information extracted by the depth information extractor from the input global features. In particular, in some embodiments, the fitted depth map D' may include the fitted depth information of each sampling point.
As shown in Fig. 6, in step 604, the actual depth map D and the fitted depth map D' are compared to obtain a depth error L_D.
In some embodiments, the depth error L_D may be, for example, the absolute value or the square of the difference between the fitted depth information and the actual depth information of each sampling point. Those skilled in the art will readily appreciate, however, that there is no particular restriction on the form of the depth error L_D, as long as it can express the difference of the fitted depth map D' relative to the actual depth map D.
As shown in Fig. 6, in step 606, it is determined whether the depth error meets the accuracy requirement.
In some embodiments, whether the depth error meets the accuracy requirement can be determined by comparing it with a preset threshold.
If the depth error meets the accuracy requirement ("yes"), the process of training the depth information extractor ends.
Otherwise, if the depth error does not meet the accuracy requirement ("no"), the process proceeds to step 608 to update the parameters of the depth information extractor f_D.
Steps 602-608 are then repeated until the depth error meets the accuracy requirement ("yes").
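The loop of steps 602-608 can be sketched as follows (assuming PyTorch, an L1 depth error, and a ground-truth depth map `depth_gt` from a depth camera; names and hyperparameters are illustrative):

```python
import torch

def train_depth_extractor(f_d, feats, depth_gt, lr=1e-3, tol=1e-2, max_iter=5000):
    """Steps 602-608: fit f_D so that the fitted depth map D' regressed
    from the global features matches the actual depth map D."""
    opt = torch.optim.Adam(f_d.parameters(), lr=lr)
    for _ in range(max_iter):
        depth_fit = f_d(feats)                              # step 602: D'
        err = torch.mean(torch.abs(depth_fit - depth_gt))   # step 604: L_D
        if err.item() < tol:                                # step 606
            break
        opt.zero_grad()
        err.backward()
        opt.step()                                          # step 608
    return f_d
```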
A method for three-dimensional reconstruction according to an embodiment of the present disclosure is described below with reference to the example flowcharts illustrated in Figs. 8 and 9 and the example schematic diagram of three-dimensional reconstruction of a target object illustrated in Fig. 10.
As shown in Fig. 8, according to an embodiment of the present disclosure, the method for three-dimensional reconstruction may mainly comprise the following steps:
in step 802, generating an initial voxel envelope of the target object based on images obtained by photographing the target object from multiple viewing angles;
in step 804, randomly sampling points within the initial voxel envelope to obtain a set of sampling points;
in step 806, performing global feature extraction on the images to obtain global feature maps;
in step 808, determining, based on geometric association, the global features corresponding to the sampling points from the global feature maps;
in step 810, encoding geometric information about the sampling points to generate geometric encoding information; and
in step 812, inputting the global features and the corresponding geometric encoding information into a model for three-dimensional reconstruction to judge the geometric relationship between the sampling points and the surface of the target object.
In some embodiments, the method for three-dimensional reconstruction may further comprise calibrating the cameras that photograph the target object from multiple viewing angles, so as to obtain the mapping relationship from the camera coordinate system to the world coordinate system.
Alternatively, in some embodiments, the calibration information about the cameras is known in advance.
For ease of understanding, some steps of the flowcharts illustrated in Figs. 8 and 9 are explained schematically below in connection with the schematic diagram illustrated in Fig. 10.
An initial voxel envelope of the target object as illustrated in Fig. 10 may be generated based on the images, as illustrated in Fig. 10, obtained by photographing the target object from multiple viewing angles (step 802 in Fig. 8).
In some embodiments, generating the initial voxel envelope of the target object may be based on the visual-hull technique.
Specifically, generating the initial voxel envelope of the target object may comprise generating a visual hull of the target object.
In some embodiments, generating the initial voxel envelope of the target object may further comprise applying constraints to the visual hull. Specifically, on the basis of the visual hull, the initial voxel envelope of the target object is determined or refined by applying one or more constraints.
In some embodiments, the constraints may include constraints based on depth information of the target object. In other embodiments, the constraints may include inherent morphological characteristics of the target object. For example, when the target object is a human body, the inherent morphological characteristics may include, but are not limited to, human-body constraints. Specifically, in some embodiments, the human-body constraints include, but are not limited to, one or more of: the number of torso parts and facial features, their extreme relative positions, degree-of-freedom constraints, sizes, lengths, and so on.
Advantageously, determining the initial voxel envelope of the target object by applying constraints overcomes the problem that an envelope built purely from limited image information is insufficiently fine and even prone to errors, improving both the accuracy and the fineness of the initial voxel envelope.
Once the initial voxel envelope of the target object has been generated, points within the initial voxel envelope may be randomly sampled to obtain a set of sampling points as illustrated in Fig. 10 (step 804 in Fig. 8).
Advantageously, this restriction effectively narrows the sampling range, thereby improving the effectiveness of the sampling so as to optimize the three-dimensional reconstruction, avoid unnecessary processing overhead, and so on. Moreover, restricting the random sampling range to the interior of the generated initial voxel envelope also advantageously improves the accuracy of the three-dimensional reconstruction.
In some embodiments, the points within the initial voxel envelope are randomly sampled uniformly.
Alternatively, in other embodiments, the points within the initial voxel envelope are randomly sampled non-uniformly.
For example, in order to model specific parts such as the face or hands in finer three-dimensional detail, the regions corresponding to those parts are sampled with enhanced (i.e., denser) random sampling.
Accordingly, in some embodiments, randomly sampling the points within the initial voxel envelope may further comprise determining, based on image recognition, the specific range in the images that corresponds to a specific part of the target object.
In some embodiments, the specific part includes, but is not limited to, one or more of the hands, the face, and so on. For example, in one embodiment, the specific part is both hands.
In some embodiments, the image recognition method may include, but is not limited to, any one or a combination of face detection, gesture detection, and the like.
In some embodiments, once the specific range corresponding to the specific part has been identified, enhanced random sampling may be applied during random sampling to the points within the specific region corresponding to the specific range.
For example, in some embodiments, the specific region corresponding to the specific range in the images may be obtained by the principle of multi-view vision.
Global feature extraction may be performed on the images to obtain global feature maps (step 806 in Fig. 8).
Specifically, in some embodiments, the images may be input into a global feature extractor for global feature extraction.
In some embodiments, the global feature extractor may include, but is not limited to, any one or a combination of a neural network, an auto-encoder/decoder, SIFT, HOG, and the like.
As the output of the global feature extractor, a global feature map for each image can be obtained. Once global feature extraction has been completed for all images, the number of global feature maps obtained may equal the number of images.
In some embodiments, a global feature map may be composed of feature elements. Each feature element may be expressed in the form of a multi-dimensional vector. The feature elements in the global feature map may each correspond to a pixel block in the image. Here, the "correspondence" between a feature element and a pixel block means that the feature element can represent the features of the corresponding pixel block. Those skilled in the art will readily appreciate that the higher the resolution of the image or the smaller the pixel blocks, the more accurately the extracted global feature map can represent the image, but the greater the corresponding workload.
In some embodiments, in order to avoid huge computational overhead, performing global feature extraction on the images further comprises pre-processing, such as downsampling, the images before inputting them into the global feature extractor, so as to reduce the image resolution.
For example, in some embodiments, an image with a resolution of 512*512 may be compressed to an image with a resolution of 64*64 before being input into the global feature extractor.
The global features corresponding to the sampling points may be determined from the global feature maps based on geometric association (step 808 in Fig. 8).
As described above, the feature elements in a global feature map may each correspond to a pixel block in the image. Furthermore, the pixel block onto which a sampling point is imaged, i.e., the pixel block corresponding to the sampling point, can be determined through geometric relationships. A correspondence from sampling points to feature elements, based on geometric association, can thereby be established.
The geometric information about the sampling points may be encoded to generate geometric encoding information (step 810 in Fig. 8).
In some embodiments, the geometric information about a sampling point may include the spatial coordinates of the sampling point and at least part of the interior and exterior orientation information of the camera that images the sampling point.
For example, in some embodiments, the geometric information about a sampling point may include only the spatial coordinates of the sampling point.
For example, in other embodiments, the geometric information about a sampling point may include not only the spatial coordinates of the sampling point but also the interior and exterior orientation elements of the camera.
The inventors of the present application recognized that geometric encoding information can represent geometric features more accurately than raw geometric information. Representing geometric features by geometric encoding information is therefore conducive to improving the accuracy of the three-dimensional reconstruction.
In embodiments of the present disclosure, the processing of steps 802-810 for three-dimensional reconstruction may be similar, in terms of the flow of steps, to the processing of steps 202-210 for training the three-dimensional reconstruction model. In the various embodiments of the present disclosure, the description of steps 202-210 also applies, in substance, to steps 802-810, so some description of steps 802-810 is omitted herein.
It should be noted, however, that the processing of steps 802-810 may differ from the processing of steps 202-210 in the specific implementation of each step.
As illustrated in Fig. 10, the global features and the corresponding geometric encoding information may be input into the model for three-dimensional reconstruction to judge the geometric relationship between the sampling points and the surface of the target object (step 812 in Fig. 8).
Specifically, for each sampling point, the global features corresponding to that sampling point and the corresponding geometric encoding information may be input into the model.
In some embodiments, the model for three-dimensional reconstruction may be obtained by training using the method for training a three-dimensional reconstruction model according to embodiments of the present disclosure.
However, those skilled in the art will appreciate that any model capable of judging the geometric relationship between the corresponding sampling point and the surface of the target object from the above inputs may be used.
Thus, as illustrated in Fig. 10, based on the global features for any sampling point and the corresponding geometric encoding information, the model can judge the geometric relationship between the sampling point and the surface of the target object and output a judgment result.
In some embodiments, the judgment result may be numerical.
For example, in some embodiments, the judgment result may be a numerical value indicating the probability that the sampling point lies inside/outside the surface of the target object.
For example, a judgment result of 1 may indicate that the sampling point lies inside the surface of the target object, while a judgment result of 0 may indicate that it lies outside, or vice versa. In other cases, the judgment result may lie between 0 and 1.
In some embodiments, the model may be represented by an implicit function f that, from the above inputs, outputs the probability that the sampling point lies inside/outside the surface of the target object. Those skilled in the art will appreciate, however, that the present disclosure is not limited thereto.
The inventors of the present application recognized that regions whose geometric relationship with the surface of the target object cannot yet be clearly judged can be selectively magnified and re-judged, thereby improving the accuracy of the three-dimensional reconstruction.
Therefore, in some embodiments, the method for three-dimensional reconstruction may further comprise: selecting a locally ambiguous region according to the confidence of the judgment results and performing fine-grained three-dimensional reconstruction on the locally ambiguous region (step 814 in Fig. 8).
In some embodiments, a locally ambiguous region with relatively lower confidence may be selected for fine-grained three-dimensional reconstruction.
Confidence may indicate the certainty of the judgment. For example, when the judgment result is a numerical value indicating the probability that the sampling point lies inside/outside the surface of the target object, a judgment result of 1 or 0 makes it certain that the sampling point lies inside or outside the surface, so the confidence is high. By contrast, a judgment result of 0.5 makes it impossible to determine whether the sampling point lies inside or outside the surface, so the confidence is low.
For example, in some embodiments, the sampling points may be sorted by the magnitude of their confidence; that is, the sort order reflects the relative magnitudes of the confidences. If sorting is in descending order, the bottom-ranked sampling points have relatively lower confidence; conversely, if sorting is in ascending order, the top-ranked sampling points have relatively lower confidence.
Accordingly, at least part of the region in which the subset of sampling points with lower confidence is located may be determined as the locally ambiguous region.
In some embodiments, the number of sampling points contained in the subset may be preset.
In some embodiments, the region in which the subset of sampling points with relatively lower confidence is located may be a region delimited according to the distribution of those sampling points. In other embodiments, these regions may be pre-partitioned regions.
Alternatively, in some embodiments, a local region containing relatively more ambiguous sampling points may be selected for fine-grained reconstruction, where an ambiguous sampling point is one whose confidence has not yet met the predetermined requirement.
Those skilled in the art will readily appreciate that the way the locally ambiguous region is selected according to the confidence of the judgment results is not limited to the example approaches described above.
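A simple realization of this selection (assuming NumPy, with model outputs `probs` in [0, 1]) is:

```python
import numpy as np

def select_ambiguous_points(probs, n_top):
    """Low confidence corresponds to outputs near 0.5; return the
    indices of the n_top most ambiguous sampling points."""
    confidence = np.abs(np.asarray(probs) - 0.5)
    return np.argsort(confidence)[:n_top]
```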
In some embodiments, local feature extraction may be performed on the local sub-images of the images, as illustrated in Fig. 10, that correspond to the locally ambiguous region, to obtain local feature maps (step 902 in Fig. 9).
In some embodiments, similarly to global feature extraction, the local sub-images may be input into a local feature extractor for local feature extraction.
In some embodiments, the local feature extractor may include, but is not limited to, any one or a combination of a neural network, an auto-encoder/decoder, SIFT, HOG, and the like.
As the output of the local feature extractor, a local feature map for each local sub-image can be obtained. Once local feature extraction has been completed for all local sub-images, the number of local feature maps obtained may equal the number of local sub-images.
In some embodiments, similarly to global feature extraction, the feature elements composing a local feature map may also be expressed in the form of multi-dimensional vectors. The feature elements in the local feature map may each correspond to a pixel block in the local sub-image.
As analyzed above, the higher the resolution of the image or the smaller the pixel blocks, the more accurately the extracted feature map can represent the image. Therefore, in order to obtain more detail about the local sub-images, the local sub-images input into the local feature extractor may have a higher resolution than the images input into the global feature extractor. For example, in some embodiments, local sub-images that have not undergone pre-processing such as downsampling may be input directly into the local feature extractor.
In some embodiments, the local features corresponding to the sampling points in the locally ambiguous region may be determined from the local feature maps based on geometric association (step 904 in Fig. 9).
In the various embodiments of the present disclosure, the description of determining the global features corresponding to the sampling points from the global feature maps also applies, in substance, to determining the local features corresponding to the sampling points in the locally ambiguous region from the local feature maps, and is not repeated here.
In some embodiments, as illustrated in Fig. 10, the local features and the corresponding geometric encoding information may be input into the model for three-dimensional reconstruction to re-judge the geometric relationship between the sampling points in the locally ambiguous region and the surface of the target object (step 906 in Fig. 9).
Specifically, for each sampling point in the locally ambiguous region, the local features corresponding to that sampling point and the corresponding geometric encoding information may be input into the model.
Thus, for any sampling point in the locally ambiguous region, the model can re-judge the geometric relationship between that sampling point and the surface of the target object and output an updated judgment result, so as to correct the geometric relationship between the sampling points in the locally ambiguous region and the surface of the target object.
As described above, the local sub-images used for local feature extraction may have a higher resolution than the images used for global feature extraction, so that local features can represent the characteristics of the corresponding sampling points more accurately and in finer detail than global features. The three-dimensional reconstruction performed on the locally ambiguous region is therefore finer.
Advantageously, compared with performing three-dimensional reconstruction based only on global features, additionally re-judging the low-confidence regions with clearer image blocks, i.e., magnifying the ambiguous regions, allows the reconstructed three-dimensional voxels to fit those regions better.
In some embodiments, as illustrated in Fig. 10, the method for three-dimensional reconstruction may further comprise performing three-dimensional reconstruction of the target object based on the geometric relationships between the sampling points and the surface of the target object.
In some embodiments, by judging the geometric relationships between all the sampling points and the surface of the target object, the target voxel envelope of the three-dimensional reconstruction can be obtained.
For example, when the judgment result is a numerical value indicating the probability that the sampling point lies inside/outside the surface of the target object, the surface of the target object can be determined by extracting the 0.5 iso-surface.
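For example, the 0.5 iso-surface can be extracted with the marching-cubes routine of scikit-image (a standard choice, used here as an illustrative assumption rather than the disclosed implementation):

```python
import numpy as np
from skimage import measure

def extract_surface(prob_volume, spacing=(1.0, 1.0, 1.0)):
    """Extract the 0.5 iso-surface of the inside/outside probability
    volume as a triangle mesh (vertices, faces, vertex normals)."""
    verts, faces, normals, _ = measure.marching_cubes(
        prob_volume.astype(np.float32), level=0.5, spacing=spacing)
    return verts, faces, normals
```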
In addition, in some embodiments, the method for three-dimensional reconstruction may further comprise performing transparency processing on some of the voxels within the target voxel envelope obtained from the three-dimensional reconstruction.
The inventors of the present application recognized that performing transparency processing on some of the voxels within the target voxel envelope obtained from three-dimensional reconstruction, so that the voxels corresponding to partially transparent objects such as glass (e.g., cups, eyeglasses) or strands of hair exhibit a transparency consistent with reality, will help make the reconstructed target voxel envelope appear more natural.
An example of a method of voxel transparency processing according to an embodiment of the present disclosure is described below with reference to the flowchart illustrated in Fig. 11 and the schematic diagram illustrated in Fig. 12.
As illustrated in Fig. 11, a method 1100 of performing transparency processing on some of the voxels within the target voxel envelope obtained from three-dimensional reconstruction mainly comprises steps 1102-1106, described in detail below.
In some embodiments, the transparency of the transparent pixels in the images may be acquired (step 1102 in Fig. 11).
For example, in some embodiments, as illustrated in Fig. 12, processing such as image matting may be applied to a captured image I_O to obtain a processed image I_I with transparent pixels and to acquire the transparency of the transparent pixels.
In some embodiments, the voxels corresponding to the transparent pixels may be solved for (step 1104 in Fig. 11).
For example, in some embodiments, as illustrated in Fig. 12, the envelope in the world coordinate system corresponding to the transparent-pixel region in the image may be obtained from the mapping relationship from the camera coordinate system to the world coordinate system, and the intersection voxels of this envelope with the target voxel envelope V_O obtained from three-dimensional reconstruction, i.e., the voxels corresponding to the transparent pixels, may be solved for.
In some embodiments, the transparency of the voxels corresponding to the transparent pixels may be set based on the transparency of the transparent pixels (step 1106 in Fig. 11).
For example, in some embodiments, the transparency of a voxel corresponding to a transparent pixel may be set equal to the transparency of the corresponding transparent pixel, thereby obtaining a voxel-transparency-processed target voxel envelope V_I with transparent voxels.
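A minimal sketch of step 1106 (assuming NumPy, a per-voxel alpha volume, the intersection voxel indices from step 1104, and the matting alphas from step 1102; all names are illustrative):

```python
import numpy as np

def apply_voxel_transparency(alpha_volume, voxel_indices, pixel_alphas):
    """Step 1106: set each intersection voxel's transparency equal to
    the transparency of its corresponding transparent pixel."""
    for (i, j, k), a in zip(voxel_indices, pixel_alphas):
        alpha_volume[i, j, k] = a
    return alpha_volume
```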
Advantageously, voxel transparency processing enables a more accurate visual representation of partially transparent objects such as glass and strands of hair.
The method for training a three-dimensional reconstruction model and the method for three-dimensional reconstruction according to embodiments of the present disclosure can improve the efficiency of sampling and the accuracy of the data, and can magnify and re-judge the local regions where the judgment performs poorly, thereby achieving more accurate three-dimensional reconstruction at lower cost. On this basis, the present disclosure can achieve high-precision three-dimensional reconstruction of the target object using only sparse cameras (imaging from sparse angles), which can lower the cost of three-dimensional modeling and/or improve its accuracy.
It is worth noting that the boundaries between the steps in the methods described above are merely illustrative. In actual operation, the steps may be combined arbitrarily, or even merged into a single step. Moreover, the execution order of the steps is not limited by the order of description, and some steps may be omitted. The operational steps of the various embodiments may also be combined with one another in any appropriate order, thereby similarly implementing more or fewer operations than described.
Embodiments of the present disclosure also provide a computer-readable storage medium storing one or more instructions which, when executed by a processor, may cause the processor to perform the steps of the method for training a three-dimensional reconstruction model or the method for three-dimensional reconstruction in the above embodiments.
It should be understood that the instructions in the computer-readable storage medium according to embodiments of the present disclosure may be configured to perform operations corresponding to the above system and method embodiments. Embodiments of the computer-readable storage medium will be apparent to those skilled in the art with reference to the above system and method embodiments, and are therefore not described again. Computer-readable storage media for carrying or including the above instructions also fall within the scope of the present disclosure. Such computer-readable storage media may include, but are not limited to, floppy disks, optical disks, magneto-optical disks, memory cards, memory sticks, and the like.
Embodiments of the present disclosure also provide various devices comprising components or units for performing the steps of the method for training a three-dimensional reconstruction model or the method for three-dimensional reconstruction in the above embodiments.
It should be noted that each of the above components or units is merely a logical module divided according to the specific function it implements, and is not intended to limit the specific implementation; the components or units may be implemented, for example, in software, in hardware, or in a combination of software and hardware. In actual implementation, each of the above components or units may be implemented as an independent physical entity, or may be implemented by a single entity (for example, a processor (CPU or DSP, etc.), an integrated circuit, etc.). For example, multiple functions included in one unit in the above embodiments may be implemented by separate devices. Alternatively, multiple functions implemented by multiple units in the above embodiments may each be implemented by separate devices. In addition, one of the above functions may be implemented by multiple units.
Exemplary embodiments of the present disclosure have been described above with reference to the drawings, but the present disclosure is of course not limited to the above examples. Those skilled in the art may make various changes and modifications within the scope of the appended claims, and it should be understood that such changes and modifications will naturally fall within the technical scope of the present disclosure.
While the present disclosure and its advantages have been described in detail, it should be understood that various changes, substitutions and alterations can be made without departing from the spirit and scope of the present disclosure as defined by the appended claims. Moreover, the terms "comprise", "include" or any other variation thereof in embodiments of the present disclosure are intended to cover a non-exclusive inclusion, such that a process, method, article or device that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article or device. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article or device that comprises the element.
Embodiments of the present disclosure also include:
1. A method for training a three-dimensional reconstruction model, the method comprising:
generating an initial voxel envelope of a target object based on images obtained by photographing the target object from multiple viewing angles;
randomly sampling points within the initial voxel envelope to obtain a set of sampling points;
performing global feature extraction on the images to obtain global feature maps;
determining, based on geometric association, global features corresponding to the sampling points from the global feature maps;
encoding geometric information about the sampling points to generate geometric encoding information; and
training the model based on at least the global features and the geometric encoding information.
2. The method according to item 1, wherein training the model comprises:
inputting the global features and the corresponding geometric encoding information into the model to judge the geometric relationship between the sampling points and the surface of the target object;
computing the discrimination errors of the sampling points;
computing the global discrimination error of the model; and
updating the parameters of the model based on whether the global discrimination error meets the accuracy requirement.
3. The method according to item 2, wherein training the model further comprises:
selecting a local region according to the discrimination errors of the sampling points, and applying focused training to the local region.
4. The method according to item 3, wherein selecting the local region comprises:
sorting the sampling points by the magnitude of their discrimination errors; and
determining, as the local region, at least part of the region in which the subset of sampling points with relatively larger discrimination errors is located.
5. The method according to item 3, wherein applying focused training to the local region comprises:
performing local feature extraction on local sub-images of the images corresponding to the local region to obtain local feature maps;
determining, based on geometric association, local features corresponding to the sampling points in the local region from the local feature maps; and
training the model with focus using the local features and the corresponding geometric encoding information.
6. The method according to item 1, wherein training the model further comprises:
training a depth information extractor for extracting depth information from the global features.
7. The method according to item 6, wherein training the depth information extractor comprises:
inputting the global features into the depth information extractor to obtain a fitted depth map;
comparing an actual depth map with the fitted depth map to obtain a depth error; and
updating the parameters of the depth information extractor based on whether the depth error meets the accuracy requirement.
8. The method according to item 1, wherein generating the initial voxel envelope of the target object comprises:
generating a visual hull of the target object; and
applying constraints to the visual hull to determine or refine the initial voxel envelope of the target object.
9. The method according to item 1, wherein randomly sampling points within the initial voxel envelope comprises:
determining, based on image recognition, a specific range in the images corresponding to a specific part of the target object; and
performing enhanced random sampling of points in a specific region corresponding to the specific range.
10. A computer-readable storage medium having stored thereon one or more instructions which, when executed by a processor, cause the processor to perform the steps of the method according to any one of items 1-9.
11. An apparatus for training a model for three-dimensional reconstruction, comprising means for performing the steps of the method according to any one of items 1-9.
12. A method for three-dimensional reconstruction, comprising:
generating an initial voxel envelope of a target object based on images obtained by photographing the target object from multiple viewing angles;
randomly sampling points within the initial voxel envelope to obtain a set of sampling points;
performing global feature extraction on the images to obtain global feature maps;
determining, based on geometric association, global features corresponding to the sampling points from the global feature maps;
encoding geometric information about the sampling points to generate geometric encoding information; and
inputting the global features and the corresponding geometric encoding information into a model for three-dimensional reconstruction to judge the geometric relationship between the sampling points and the surface of the target object.
13. The method according to item 12, further comprising:
selecting a locally ambiguous region according to the confidence of the judgment results, and performing fine-grained three-dimensional reconstruction on the locally ambiguous region.
14. The method according to item 13, wherein performing fine-grained three-dimensional reconstruction on the locally ambiguous region comprises:
performing local feature extraction on local sub-images of the images corresponding to the locally ambiguous region to obtain local feature maps;
determining, based on geometric association, local features corresponding to the sampling points in the locally ambiguous region from the local feature maps; and
inputting the local features and the corresponding geometric encoding information into the model for three-dimensional reconstruction to re-judge the geometric relationship between the sampling points in the locally ambiguous region and the surface of the target object.
15. The method according to item 12, the method further comprising:
performing transparency processing on some voxels within the target voxel envelope obtained from the three-dimensional reconstruction.
16. The method according to item 15, wherein performing transparency processing on some voxels within the target voxel envelope obtained from the three-dimensional reconstruction comprises:
acquiring the transparency of transparent pixels in the images;
solving for the voxels corresponding to the transparent pixels; and
setting the transparency of the voxels corresponding to the transparent pixels based on the transparency of the transparent pixels.
17. A computer-readable storage medium having stored thereon one or more instructions which, when executed by a processor, cause the processor to perform the steps of the method according to any one of items 12-16.
18. An apparatus for three-dimensional reconstruction, comprising means for performing the steps of the method according to any one of items 12-16.
19. A system for three-dimensional reconstruction, comprising:
a training unit configured to perform the steps of the method according to any one of items 1-9; and
an inference unit configured to perform the steps of the method according to any one of items 12-14.
20. The system according to item 19, further comprising:
a voxel transparency unit configured to perform transparency processing on some voxels within the target voxel envelope obtained from the three-dimensional reconstruction.

Claims (20)

  1. A method for training a three-dimensional reconstruction model, the method comprising:
    generating an initial voxel envelope of a target object based on images obtained by photographing the target object from multiple viewing angles;
    randomly sampling points within the initial voxel envelope to obtain a set of sampling points;
    performing global feature extraction on the images to obtain global feature maps;
    determining, based on geometric association, global features corresponding to the sampling points from the global feature maps;
    encoding geometric information about the sampling points to generate geometric encoding information; and
    training the model based on at least the global features and the geometric encoding information.
  2. The method according to claim 1, wherein training the model comprises:
    inputting the global features and the corresponding geometric encoding information into the model to judge the geometric relationship between the sampling points and the surface of the target object;
    computing the discrimination errors of the sampling points;
    computing the global discrimination error of the model; and
    updating the parameters of the model based on whether the global discrimination error meets the accuracy requirement.
  3. The method according to claim 2, wherein training the model further comprises:
    selecting a local region according to the discrimination errors of the sampling points, and applying focused training to the local region.
  4. The method according to claim 3, wherein selecting the local region comprises:
    sorting the sampling points by the magnitude of their discrimination errors; and
    determining, as the local region, at least part of the region in which the subset of sampling points with relatively larger discrimination errors is located.
  5. The method according to claim 3, wherein applying focused training to the local region comprises:
    performing local feature extraction on local sub-images of the images corresponding to the local region to obtain local feature maps;
    determining, based on geometric association, local features corresponding to the sampling points in the local region from the local feature maps; and
    training the model with focus using the local features and the corresponding geometric encoding information.
  6. The method according to claim 1, wherein training the model further comprises:
    training a depth information extractor for extracting depth information from the global features.
  7. The method according to claim 6, wherein training the depth information extractor comprises:
    inputting the global features into the depth information extractor to obtain a fitted depth map;
    comparing an actual depth map with the fitted depth map to obtain a depth error; and
    updating the parameters of the depth information extractor based on whether the depth error meets the accuracy requirement.
  8. The method according to claim 1, wherein generating the initial voxel envelope of the target object comprises:
    generating a visual hull of the target object; and
    applying constraints to the visual hull to determine or refine the initial voxel envelope of the target object.
  9. The method according to claim 1, wherein randomly sampling points within the initial voxel envelope comprises:
    determining, based on image recognition, a specific range in the images corresponding to a specific part of the target object; and
    performing enhanced random sampling of points in a specific region corresponding to the specific range.
  10. A computer-readable storage medium having stored thereon one or more instructions which, when executed by a processor, cause the processor to perform the steps of the method according to any one of claims 1-9.
  11. An apparatus for training a model for three-dimensional reconstruction, comprising means for performing the steps of the method according to any one of claims 1-9.
  12. A method for three-dimensional reconstruction, comprising:
    generating an initial voxel envelope of a target object based on images obtained by photographing the target object from multiple viewing angles;
    randomly sampling points within the initial voxel envelope to obtain a set of sampling points;
    performing global feature extraction on the images to obtain global feature maps;
    determining, based on geometric association, global features corresponding to the sampling points from the global feature maps;
    encoding geometric information about the sampling points to generate geometric encoding information; and
    inputting the global features and the corresponding geometric encoding information into a model for three-dimensional reconstruction to judge the geometric relationship between the sampling points and the surface of the target object.
  13. The method according to claim 12, further comprising:
    selecting a locally ambiguous region according to the confidence of the judgment results, and performing fine-grained three-dimensional reconstruction on the locally ambiguous region.
  14. The method according to claim 13, wherein performing fine-grained three-dimensional reconstruction on the locally ambiguous region comprises:
    performing local feature extraction on local sub-images of the images corresponding to the locally ambiguous region to obtain local feature maps;
    determining, based on geometric association, local features corresponding to the sampling points in the locally ambiguous region from the local feature maps; and
    inputting the local features and the corresponding geometric encoding information into the model for three-dimensional reconstruction to re-judge the geometric relationship between the sampling points in the locally ambiguous region and the surface of the target object.
  15. The method according to claim 12, the method further comprising:
    performing transparency processing on some voxels within the target voxel envelope obtained from the three-dimensional reconstruction.
  16. The method according to claim 15, wherein performing transparency processing on some voxels within the target voxel envelope obtained from the three-dimensional reconstruction comprises:
    acquiring the transparency of transparent pixels in the images;
    solving for the voxels corresponding to the transparent pixels; and
    setting the transparency of the voxels corresponding to the transparent pixels based on the transparency of the transparent pixels.
  17. A computer-readable storage medium having stored thereon one or more instructions which, when executed by a processor, cause the processor to perform the steps of the method according to any one of claims 12-16.
  18. An apparatus for three-dimensional reconstruction, comprising means for performing the steps of the method according to any one of claims 12-16.
  19. A system for three-dimensional reconstruction, comprising:
    a training unit configured to perform the steps of the method according to any one of claims 1-9; and
    an inference unit configured to perform the steps of the method according to any one of claims 12-14.
  20. The system according to claim 19, further comprising:
    a voxel transparency unit configured to perform transparency processing on some voxels within the target voxel envelope obtained from the three-dimensional reconstruction.
PCT/CN2022/129484 2021-11-04 2022-11-03 Method, system and storage medium for three-dimensional reconstruction WO2023078335A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202280072092.7A 2021-11-04 2022-11-03 Method, system and storage medium for three-dimensional reconstruction

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111296646.5 2021-11-04
CN202111296646.5A 2021-11-04 2021-11-04 Method, system and storage medium for three-dimensional reconstruction

Publications (1)

Publication Number Publication Date
WO2023078335A1 true WO2023078335A1 (zh) 2023-05-11

Family

ID=86199651

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/129484 WO2023078335A1 (zh) 2021-11-04 2022-11-03 用于三维重建的方法、系统和存储介质

Country Status (2)

Country Link
CN (2) CN116091686A (zh)
WO (1) WO2023078335A1 (zh)

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10970518B1 (en) * 2017-11-14 2021-04-06 Apple Inc. Voxel-based feature learning network
CN111563875A (zh) * 2020-03-09 2020-08-21 北京灵医灵科技有限公司 基于动态边缘预测的核磁共振影像中肾脏分离方法及装置

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
"Master's Thesis", 1 December 2020, UNIVERSITY OF CHINESE ACADEMY OF SCIENCES (SCHOOL OF ARTIFICIAL INTELLIGENCE, UNIVERSITY OF CHINESE ACADEMY OF SCIENCES), CN, article LIU CHENNING: "Research and Application of 3D Reconstruction Based on Multiview Images", pages: 1 - 78, XP009545532, DOI: 10.27824/d.cnki.gzkdx.2020.000064 *
WU, DAN ET AL.: "Three-dimension Reconstruction Method Based on Silhouette for Pot Rice", JOURNAL OF AGRICULTURAL SCIENCE AND TECHNOLOGY, vol. 22, no. 09, 15 September 2020 (2020-09-15), XP009546040, ISSN: 1008-0864 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117911631A (zh) * 2024-03-19 2024-04-19 Guangdong University of Petrochemical Technology Three-dimensional reconstruction method based on heterologous image matching
CN117911631B (zh) * 2024-03-19 2024-05-28 Guangdong University of Petrochemical Technology Three-dimensional reconstruction method based on heterologous image matching
CN118297999A (zh) * 2024-06-04 2024-07-05 Zhejiang Dahua Technology Co., Ltd. Image generation method, electronic device and storage medium
CN118521699A (zh) * 2024-07-23 2024-08-20 Zhejiang Hexin Tonghuashun Network Information Co., Ltd. Method and system for generating a three-dimensional hair-strand hairstyle for a virtual human
CN118570397A (zh) * 2024-07-31 2024-08-30 Yangcheng Coal Mine of Shandong Jikuang Luneng Coal Power Co., Ltd. 3D image generation and analysis system for coal accumulation and tail ropes at the bottom of a coal-mine main shaft

Also Published As

Publication number Publication date
CN118302798A (zh) 2024-07-05
CN116091686A (zh) 2023-05-09


Legal Events

121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 22889357; Country of ref document: EP; Kind code of ref document: A1)
WWE Wipo information: entry into national phase (Ref document number: 18704660; Country of ref document: US)
WWE Wipo information: entry into national phase (Ref document number: 202280072092.7; Country of ref document: CN)
122 Ep: pct application non-entry in european phase (Ref document number: 22889357; Country of ref document: EP; Kind code of ref document: A1)