WO2022237249A1 - Three-dimensional reconstruction method, apparatus and system, medium, and computer device

Info

Publication number: WO2022237249A1
Application number: PCT/CN2022/075636
Authority: WO
Inventors: 曹智杰; 汪旻; 刘文韬; 钱晨; 马利庄
Original assignee: 上海商汤智能科技有限公司
Priority date: 2021-05-10
Filing date: 2022-02-09
Publication date: 2022-11-17
Also published as: JP2023547888A; CN113160418A; TW202244853A; CN113160418B; KR20230078777A

Abstract

The present disclosure provides a three-dimensional reconstruction method, apparatus and system, a medium, and a computer device. The method comprises: carrying out three-dimensional reconstruction on a target object in an image by means of a three-dimensional reconstruction network, and obtaining an initial value of a parameter of the target object, the initial value of the parameter being used for building a three-dimensional model of the target object; optimizing the initial value of the parameter on the basis of pre-acquired supervision information used for indicating features of the target object, so as to obtain an optimized value of the parameter; and performing skinned mesh processing on the basis of the optimized value of the parameter, and establishing a three-dimensional model of the target object.

Description

Three-dimensional reconstruction method, apparatus and system, medium, and computer device

Cross-Reference to Related Applications

This disclosure claims priority to Chinese patent application No. 202110506464X, titled "Three-dimensional reconstruction method, apparatus and system, medium, and computer device" and filed on May 10, 2021, which is incorporated herein by reference.

Technical Field

The present disclosure relates to the technical field of computer vision, and in particular to a three-dimensional reconstruction method, apparatus and system, a medium, and a computer device.

Background

Three-dimensional reconstruction is one of the important technologies in computer vision and has many potential applications in fields such as augmented reality and virtual reality. Performing three-dimensional reconstruction on a target object makes it possible to reconstruct the body shape and limb rotations of the target object. However, traditional three-dimensional reconstruction approaches cannot achieve both accuracy and reliability in the reconstruction result.

Summary

The present disclosure provides a three-dimensional reconstruction method, apparatus and system, a medium, and a computer device.

According to a first aspect of the embodiments of the present disclosure, a three-dimensional reconstruction method is provided. The method includes: performing three-dimensional reconstruction on a target object in an image through a three-dimensional reconstruction network to obtain initial values of parameters of the target object, where the initial values of the parameters are used to build a three-dimensional model of the target object; optimizing the initial values of the parameters based on pre-acquired supervision information representing features of the target object, to obtain optimized values of the parameters; and performing skinned mesh processing based on the optimized values of the parameters to build the three-dimensional model of the target object.

In some embodiments, the supervision information includes first supervision information, or the supervision information includes first supervision information and second supervision information. The first supervision information includes at least one of the following: initial two-dimensional key points of the target object, and semantic information of a plurality of pixels on the target object in the image. The second supervision information includes an initial three-dimensional point cloud of the surface of the target object. The embodiments of the present disclosure may use only the initial two-dimensional key points of the target object or the semantic information of the pixels as the supervision information to optimize the initial values of the parameters, which gives high optimization efficiency and low optimization complexity. Alternatively, the initial three-dimensional point cloud of the surface of the target object may be used together with the aforementioned initial two-dimensional key points or semantic information of the pixels as the supervision information, which improves the accuracy of the optimized values of the parameters.

In some embodiments, the method further includes: extracting information of the initial two-dimensional key points of the target object from the image through a key point extraction network. Using the information of the initial two-dimensional key points extracted by the key point extraction network as supervision information makes it possible to generate relatively natural and plausible motions for the three-dimensional model.

In some embodiments, the image includes a depth image of the target object, and the method further includes: extracting depth information of a plurality of pixels on the target object from the depth image; and back-projecting, based on the depth information, the plurality of pixels on the target object in the depth image into three-dimensional space to obtain the initial three-dimensional point cloud of the surface of the target object. By extracting the depth information and back-projecting the pixels of the two-dimensional image into three-dimensional space based on the depth information, the initial three-dimensional point cloud of the surface of the target object is obtained. The initial three-dimensional point cloud can then be used as supervision information to optimize the initial values of the parameters, which further improves the accuracy of the parameter optimization.
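By way of illustration only, the back-projection described above can be sketched as follows. The sketch assumes a pinhole camera model whose intrinsics (`fx`, `fy`, `cx`, `cy`) are known; this disclosure does not specify a camera model, and the function name `back_project` is hypothetical.

```python
import numpy as np

def back_project(depth, mask, fx, fy, cx, cy):
    """Back-project masked depth pixels into camera-space 3D points.

    Assumes a pinhole camera (an assumption, not stated in the disclosure).
    """
    valid = mask.astype(bool) & (depth > 0)   # ignore pixels without depth
    v, u = np.nonzero(valid)                  # pixel rows (v) and columns (u)
    z = depth[v, u].astype(np.float64)
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=1)        # shape (N, 3)

# Tiny example: two pixels on the target object, both at depth 2.0.
depth = np.zeros((4, 4))
depth[1, 2] = 2.0
depth[2, 2] = 2.0
mask = depth > 0
points = back_project(depth, mask, fx=2.0, fy=2.0, cx=2.0, cy=2.0)
```

Each row of `points` is one point of the initial three-dimensional point cloud, expressed in the coordinate frame of the image acquisition device.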

In some embodiments, the image further includes an RGB image of the target object, and extracting the depth information of the plurality of pixels on the target object from the depth image includes: performing image segmentation on the RGB image; determining, based on the result of the image segmentation, the image region where the target object is located in the RGB image; determining, based on the image region where the target object is located in the RGB image, the image region where the target object is located in the depth image; and acquiring the depth information of a plurality of pixels in the image region where the target object is located in the depth image. Performing image segmentation on the RGB image allows the position of the target object to be determined accurately, so that the depth information of the target object can be extracted accurately.
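A minimal sketch of this region transfer is given below. It assumes that the RGB image and the depth image are pixel-aligned, so that a segmentation label map computed on the RGB image can index the depth image directly; the label value `target_label` and the function name are hypothetical, and the segmentation network itself is outside the sketch.

```python
import numpy as np

def target_depth_values(seg_labels, depth, target_label=1):
    """Select depth values inside the segmented target region.

    seg_labels: per-pixel labels from segmenting the RGB image
                (assumed pixel-aligned with the depth image).
    """
    region = seg_labels == target_label   # region of the target in the RGB image
    valid = region & (depth > 0)          # same region in the depth image, valid depth only
    return valid, depth[valid]

seg_labels = np.array([[0, 1],
                       [1, 1]])
depth = np.array([[5.0, 0.0],
                  [1.5, 2.5]])
valid_mask, values = target_depth_values(seg_labels, depth)
```

If the two images were not aligned, a registration step would be needed before the mask could be reused; that step is not shown here.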

In some embodiments, the method further includes: filtering out outliers from the initial three-dimensional point cloud, and using the filtered initial three-dimensional point cloud as the second supervision information. Filtering out the outliers reduces their interference and further improves the accuracy of the parameter optimization process.
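This passage does not say how outliers are detected. One common choice, shown here purely as an assumption, is statistical outlier removal: a point is dropped when its mean distance to its `k` nearest neighbours is far above the average for the cloud. The function name and the parameters `k` and `std_ratio` are hypothetical.

```python
import numpy as np

def filter_outliers(points, k=3, std_ratio=2.0):
    """Drop points whose mean k-nearest-neighbour distance is unusually large.

    Brute-force O(N^2) distances; suitable only for small clouds.
    """
    diff = points[:, None, :] - points[None, :, :]
    dist = np.linalg.norm(diff, axis=2)
    knn = np.sort(dist, axis=1)[:, 1:k + 1]   # skip column 0 (distance to itself)
    mean_knn = knn.mean(axis=1)
    threshold = mean_knn.mean() + std_ratio * mean_knn.std()
    return points[mean_knn <= threshold]

# A 3x3x3 grid of surface-like points plus one far-away outlier.
grid = np.array([[x, y, z] for x in range(3) for y in range(3) for z in range(3)],
                dtype=float)
cloud = np.vstack([grid, [[100.0, 100.0, 100.0]]])
filtered = filter_outliers(cloud)
```

In this example the single distant point is removed and the 27 grid points are kept.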

In some embodiments, the image of the target object is captured by an image acquisition device, and the parameters include: a global rotation parameter of the target object, key point rotation parameters of the respective key points of the target object, a body shape parameter of the target object, and a displacement parameter of the image acquisition device. Optimizing the initial values of the parameters based on the pre-acquired supervision information representing features of the target object includes: with the initial value of the body shape parameter and the initial values of the key point rotation parameters kept unchanged, optimizing the current value of the displacement parameter of the image acquisition device and the initial value of the global rotation parameter based on the supervision information and the initial value of the displacement parameter, to obtain an optimized value of the displacement parameter and an optimized value of the global rotation parameter; and optimizing the initial values of the key point rotation parameters and the initial value of the body shape parameter based on the optimized value of the displacement parameter and the optimized value of the global rotation parameter, to obtain optimized values of the key point rotation parameters and an optimized value of the body shape parameter. During optimization, both changing the position of the image acquisition device and changing the positions of the three-dimensional key points change the two-dimensional projections of the three-dimensional key points, which makes the optimization process very unstable. The two-stage optimization first fixes the initial values of the key point rotation parameters and the initial value of the body shape parameter while optimizing the initial value of the displacement parameter of the image acquisition device and the initial value of the global rotation parameter, and then fixes the displacement parameter and the global rotation parameter at the values obtained in the first stage while optimizing the initial values of the key point rotation parameters and the initial value of the body shape parameter, which improves the stability of the optimization process.
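The two-stage schedule can be sketched with a toy objective. Everything in the sketch is an assumption made for illustration: the quadratic `toy_loss` stands in for the real losses, the parameter names and sizes are hypothetical, and plain gradient descent with finite-difference gradients replaces whatever optimizer an implementation would actually use. The only point shown is that each stage updates one group of parameters while the other group stays frozen.

```python
import numpy as np

def descend(loss_fn, params, free, steps=200, lr=0.1, eps=1e-5):
    """Gradient descent that updates only the parameter groups named in `free`."""
    params = {k: np.array(v, dtype=float) for k, v in params.items()}  # copy inputs
    for _ in range(steps):
        grads = {}
        for name in free:
            g = np.zeros_like(params[name])
            for idx in np.ndindex(g.shape):          # central finite differences
                original = params[name][idx]
                params[name][idx] = original + eps
                up = loss_fn(params)
                params[name][idx] = original - eps
                down = loss_fn(params)
                params[name][idx] = original
                g[idx] = (up - down) / (2 * eps)
            grads[name] = g
        for name in free:
            params[name] -= lr * grads[name]
    return params

# Toy stand-in objective: squared distance to arbitrary target values.
TARGET = {"displacement": [0.0, 0.0, 3.0], "global_rot": [0.1, 0.2, 0.3],
          "keypoint_rot": [0.5, -0.5], "body_shape": [1.0]}

def toy_loss(p):
    return sum(float(np.sum((p[k] - np.array(TARGET[k])) ** 2)) for k in TARGET)

init = {k: np.zeros(len(v)) for k, v in TARGET.items()}
# Stage 1: key point rotations and body shape frozen.
stage1 = descend(toy_loss, init, free=["displacement", "global_rot"])
# Stage 2: displacement and global rotation frozen at their stage-1 values.
stage2 = descend(toy_loss, stage1, free=["keypoint_rot", "body_shape"])
```

A joint refinement over all four groups would correspond to calling `descend` once more with every name listed in `free`.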

In some embodiments, the supervision information includes the initial two-dimensional key points of the target object. Optimizing the current value of the displacement parameter of the image acquisition device and the initial value of the global rotation parameter based on the supervision information and the initial value of the displacement parameter includes: acquiring, from the two-dimensional projection key points corresponding to the three-dimensional key points of the target object, target two-dimensional projection key points that belong to a preset part of the target object, where the three-dimensional key points of the target object are obtained based on the initial value of the global rotation parameter, the initial values of the key point rotation parameters and the initial value of the body shape parameter, and the two-dimensional projection key points are obtained by projecting the three-dimensional key points of the target object based on the current value of the displacement parameter and the initial value of the global rotation parameter; acquiring a first loss between the target two-dimensional projection key points and the initial two-dimensional key points; acquiring a second loss between the initial value of the displacement parameter and the current value of the displacement parameter; and optimizing the current value of the displacement parameter and the initial value of the global rotation parameter based on the first loss and the second loss. The preset part may be a part such as the torso. Because different motions have little effect on the key points of the torso, determining the first loss from the torso key points reduces the influence of different motions on key point positions and improves the accuracy of the optimization result. Because the two-dimensional key points are supervision information on a two-dimensional plane, whereas the displacement parameter of the image acquisition device is a parameter in three-dimensional space, acquiring the second loss reduces cases in which the optimization result falls into a local optimum on the two-dimensional plane and thereby deviates from the true solution.
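The first and second losses can be sketched as below. The sketch assumes a pinhole projection with a single focal length, squared Euclidean error for both losses, and a weighted sum as the combined objective; none of these forms is stated in this passage. The global rotation is assumed to have been applied to `joints_3d` already, and the names `project`, `stage_one_loss`, `part_idx` and `weight` are hypothetical.

```python
import numpy as np

def project(points_3d, displacement, focal):
    """Pinhole projection of 3D key points shifted by the camera displacement."""
    cam = points_3d + displacement
    return focal * cam[:, :2] / cam[:, 2:3]

def stage_one_loss(joints_3d, keypoints_2d, part_idx, t_current, t_init,
                   focal=1.0, weight=1.0):
    projected = project(joints_3d, t_current, focal)
    # First loss: reprojection error restricted to the preset part (e.g. torso).
    first = np.sum((projected[part_idx] - keypoints_2d[part_idx]) ** 2)
    # Second loss: keep the displacement close to its initial value.
    second = np.sum((t_current - t_init) ** 2)
    return float(first + weight * second)

joints = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
t_init = np.array([0.0, 0.0, 2.0])
observed = project(joints, t_init, focal=1.0)   # stand-in for detected 2D key points
torso = [0, 1]                                  # hypothetical torso key point indices
loss_at_init = stage_one_loss(joints, observed, torso, t_init, t_init)
loss_moved = stage_one_loss(joints, observed, torso, np.array([0.0, 0.0, 4.0]), t_init)
```

Moving the displacement away from its initial value raises both terms here, which is the behaviour the second loss is meant to encourage when the two-dimensional evidence alone is ambiguous.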

In some embodiments, the supervision information includes the initial two-dimensional key points of the target object. Optimizing the initial values of the key point rotation parameters and the initial value of the body shape parameter based on the optimized value of the displacement parameter and the optimized value of the global rotation parameter includes: acquiring a third loss between optimized two-dimensional projection key points of the target object and the initial two-dimensional key points, where the optimized two-dimensional projection key points are obtained by projecting optimized three-dimensional key points of the target object based on the optimized value of the displacement parameter and the optimized value of the global rotation parameter, and the optimized three-dimensional key points are obtained based on the optimized value of the global rotation parameter, the initial values of the key point rotation parameters and the initial value of the body shape parameter; acquiring a fourth loss, where the fourth loss characterizes the plausibility of the pose corresponding to the optimized value of the global rotation parameter, the initial values of the key point rotation parameters and the initial value of the body shape parameter; and optimizing the initial values of the key point rotation parameters and the initial value of the body shape parameter based on the third loss and the fourth loss. This embodiment optimizes the initial values of the key point rotation parameters and the initial value of the body shape parameter based on the optimized value of the displacement parameter and the optimized value of the global rotation parameter, which improves the stability of the optimization process. At the same time, the fourth loss ensures that the pose corresponding to the optimized parameters is plausible.
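This passage states what the fourth loss is for but not how it is computed. The sketch below substitutes a simple joint-limit penalty as a hypothetical plausibility term and combines it with a squared reprojection error as the third loss. The angle limits, the weight and all function names are assumptions, and an implementation could equally use a learned pose prior.

```python
import numpy as np

def plausibility_penalty(joint_angles, lower, upper):
    """Hypothetical fourth loss: squared violation of per-joint angle limits."""
    below = np.clip(lower - joint_angles, 0.0, None)
    above = np.clip(joint_angles - upper, 0.0, None)
    return float(np.sum(below ** 2 + above ** 2))

def stage_two_loss(reprojection_error, joint_angles, lower, upper, prior_weight=1.0):
    third = float(np.sum(reprojection_error ** 2))            # 2D key point error
    fourth = plausibility_penalty(joint_angles, lower, upper)  # pose plausibility
    return third + prior_weight * fourth

angles = np.array([0.0, 2.0, -3.0])
penalty = plausibility_penalty(angles, lower=-1.0, upper=1.0)
total = stage_two_loss(np.array([[3.0, 4.0]]), angles, lower=-1.0, upper=1.0)
```

Angles inside the limits contribute nothing, so the penalty only acts when the reprojection term would otherwise pull the pose into an implausible configuration.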

In some embodiments, the method further includes: after optimizing the initial values of the key point rotation parameters and the initial value of the body shape parameter based on the optimized value of the displacement parameter and the optimized value of the global rotation parameter, jointly optimizing the optimized value of the global rotation parameter, the optimized values of the key point rotation parameters, the optimized value of the body shape parameter and the optimized value of the displacement parameter. This embodiment jointly optimizes the already optimized parameters on top of the preceding optimization, which further improves the accuracy of the optimization result.

In some embodiments, the supervision information includes the initial two-dimensional key points of the target object and the initial three-dimensional point cloud of the surface of the target object. Optimizing the current value of the displacement parameter of the image acquisition device and the initial value of the global rotation parameter based on the supervision information and the initial value of the displacement parameter includes: acquiring, from the two-dimensional projection key points corresponding to the three-dimensional key points of the target object, target two-dimensional projection key points that belong to a preset part of the target object, where the three-dimensional key points of the target object are obtained based on the initial value of the global rotation parameter, the initial values of the key point rotation parameters and the initial value of the body shape parameter, and the two-dimensional projection key points are obtained by projecting the three-dimensional key points of the target object based on the current value of the displacement parameter and the initial value of the global rotation parameter; acquiring a first loss between the target two-dimensional projection key points and the initial two-dimensional key points; acquiring a second loss between the initial value of the displacement parameter and the current value of the displacement parameter; acquiring a fifth loss between a first three-dimensional point cloud of the surface of the target object and the initial three-dimensional point cloud, where the first three-dimensional point cloud is obtained based on the initial value of the global rotation parameter, the initial values of the key point rotation parameters and the initial value of the body shape parameter; and optimizing the current value of the displacement parameter and the initial value of the global rotation parameter based on the first loss, the second loss and the fifth loss. This embodiment adds the three-dimensional point cloud to the supervision information used to optimize the initial parameters, which improves the accuracy of the optimization result.
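A loss between two point clouds can take several forms, and this passage does not name one. As an assumption, the sketch below uses the symmetric Chamfer distance, a common choice when the two clouds have no point-to-point correspondence. The function name is hypothetical.

```python
import numpy as np

def chamfer_distance(cloud_a, cloud_b):
    """Symmetric Chamfer distance between two point clouds of shape (N, 3) and (M, 3)."""
    d = np.linalg.norm(cloud_a[:, None, :] - cloud_b[None, :, :], axis=2)
    # Mean nearest-neighbour distance in both directions.
    return float(d.min(axis=1).mean() + d.min(axis=0).mean())

model_cloud = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])   # stand-in for the model surface
observed_cloud = model_cloud + np.array([0.0, 0.0, 1.0])      # stand-in for the depth-derived cloud
same = chamfer_distance(model_cloud, model_cloud)
shifted = chamfer_distance(model_cloud, observed_cloud)
```

The same function could serve for any later loss that compares a model-derived point cloud with the initial three-dimensional point cloud.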

In some embodiments, jointly optimizing the optimized value of the global rotation parameter, the optimized values of the key point rotation parameters, the optimized value of the body shape parameter and the optimized value of the displacement parameter includes: acquiring a sixth loss between optimized two-dimensional projection key points of the target object and the initial two-dimensional key points, where the optimized two-dimensional projection key points are obtained by projecting optimized three-dimensional key points of the target object based on the optimized value of the displacement parameter and the optimized value of the global rotation parameter, and the optimized three-dimensional key points are obtained based on the optimized value of the global rotation parameter, the optimized values of the key point rotation parameters and the optimized value of the body shape parameter; acquiring a seventh loss, where the seventh loss characterizes the plausibility of the pose corresponding to the optimized value of the global rotation parameter, the optimized values of the key point rotation parameters and the optimized value of the body shape parameter; acquiring an eighth loss between a second three-dimensional point cloud of the surface of the target object and the initial three-dimensional point cloud, where the second three-dimensional point cloud is obtained based on the optimized value of the global rotation parameter, the optimized values of the key point rotation parameters and the optimized value of the body shape parameter; and jointly optimizing the optimized value of the global rotation parameter, the optimized values of the key point rotation parameters, the optimized value of the body shape parameter and the optimized value of the displacement parameter based on the sixth loss, the seventh loss and the eighth loss. This embodiment adds the three-dimensional point cloud to the supervision information used to optimize the initial parameters, which improves the accuracy of the optimization result.

According to a second aspect of the embodiments of the present disclosure, a three-dimensional reconstruction apparatus is provided. The apparatus includes: a first three-dimensional reconstruction module, configured to perform three-dimensional reconstruction on a target object in an image through a three-dimensional reconstruction network to obtain initial values of parameters of the target object, where the initial values of the parameters are used to build a three-dimensional model of the target object; an optimization module, configured to optimize the initial values of the parameters based on pre-acquired supervision information representing features of the target object, to obtain optimized values of the parameters; and a second three-dimensional reconstruction module, configured to perform skinned mesh processing based on the optimized values of the parameters to build the three-dimensional model of the target object.

In some embodiments, the apparatus further includes: a two-dimensional key point extraction module, configured to extract information of the initial two-dimensional key points of the target object from the image through a key point extraction network. Using the information of the initial two-dimensional key points extracted by the key point extraction network as supervision information makes it possible to generate relatively natural and plausible motions for the three-dimensional model.

In some embodiments, the image includes a depth image of the target object, and the apparatus further includes: a depth information extraction module, configured to extract depth information of a plurality of pixels on the target object from the depth image; and a back-projection module, configured to back-project, based on the depth information, the plurality of pixels on the target object in the depth image into three-dimensional space to obtain the initial three-dimensional point cloud of the surface of the target object. By extracting the depth information and back-projecting the pixels of the two-dimensional image into three-dimensional space based on the depth information, the initial three-dimensional point cloud of the surface of the target object is obtained. The initial three-dimensional point cloud can then be used as supervision information to optimize the initial values of the parameters, which further improves the accuracy of the parameter optimization.

In some embodiments, the image further includes an RGB image of the target object, and the depth information extraction module includes: an image segmentation unit, configured to perform image segmentation on the RGB image; an image region determination unit, configured to determine, based on the result of the image segmentation, the image region where the target object is located in the RGB image, and to determine, based on the image region where the target object is located in the RGB image, the image region where the target object is located in the depth image; and a depth information acquisition unit, configured to acquire the depth information of a plurality of pixels in the image region where the target object is located in the depth image. Performing image segmentation on the RGB image allows the position of the target object to be determined accurately, so that the depth information of the target object can be extracted accurately.

In some embodiments, the apparatus further includes: a filtering module, configured to filter out outliers from the initial three-dimensional point cloud and to use the filtered initial three-dimensional point cloud as the second supervision information. Filtering out the outliers reduces their interference and further improves the accuracy of the parameter optimization process.

In some embodiments, the image of the target object is captured by an image acquisition device, and the parameters include: a global rotation parameter of the target object, key point rotation parameters of the respective key points of the target object, a body shape parameter of the target object, and a displacement parameter of the image acquisition device. The optimization module includes: a first optimization unit, configured to, with the initial value of the body shape parameter and the initial values of the key point rotation parameters kept unchanged, optimize the current value of the displacement parameter of the image acquisition device and the initial value of the global rotation parameter based on the supervision information and the initial value of the displacement parameter, to obtain an optimized value of the displacement parameter and an optimized value of the global rotation parameter; and a second optimization unit, configured to optimize the initial values of the key point rotation parameters and the initial value of the body shape parameter based on the optimized value of the displacement parameter and the optimized value of the global rotation parameter, to obtain optimized values of the key point rotation parameters and an optimized value of the body shape parameter. During optimization, both changing the position of the image acquisition device and changing the positions of the three-dimensional key points change the two-dimensional projections of the three-dimensional key points, which makes the optimization process very unstable. The two-stage optimization first fixes the initial values of the key point rotation parameters and the initial value of the body shape parameter while optimizing the initial value of the displacement parameter of the image acquisition device and the initial value of the global rotation parameter, and then fixes the displacement parameter and the global rotation parameter at the values obtained in the first stage while optimizing the initial values of the key point rotation parameters and the initial value of the body shape parameter, which improves the stability of the optimization process.

In some embodiments, the supervision information includes the initial two-dimensional key points of the target object, and the first optimization unit is configured to: acquire, from the two-dimensional projection key points corresponding to the three-dimensional key points of the target object, target two-dimensional projection key points that belong to a preset part of the target object, where the three-dimensional key points of the target object are obtained based on the initial value of the global rotation parameter, the initial values of the key point rotation parameters and the initial value of the body shape parameter, and the two-dimensional projection key points are obtained by projecting the three-dimensional key points of the target object based on the current value of the displacement parameter and the initial value of the global rotation parameter; acquire a first loss between the target two-dimensional projection key points and the initial two-dimensional key points; acquire a second loss between the initial value of the displacement parameter and the current value of the displacement parameter; and optimize the current value of the displacement parameter and the initial value of the global rotation parameter based on the first loss and the second loss. The preset part may be a part such as the torso. Because different motions have little effect on the key points of the torso, determining the first loss from the torso key points reduces the influence of different motions on key point positions and improves the accuracy of the optimization result. Because the two-dimensional key points are supervision information on a two-dimensional plane, whereas the displacement parameter of the image acquisition device is a parameter in three-dimensional space, acquiring the second loss reduces cases in which the optimization result falls into a local optimum on the two-dimensional plane and thereby deviates from the true solution.

In some embodiments, the supervision information includes the initial two-dimensional key points of the target object, and the second optimization unit is configured to: acquire a third loss between optimized two-dimensional projection key points of the target object and the initial two-dimensional key points, where the optimized two-dimensional projection key points are obtained by projecting optimized three-dimensional key points of the target object based on the optimized value of the displacement parameter and the optimized value of the global rotation parameter, and the optimized three-dimensional key points are obtained based on the optimized value of the global rotation parameter, the initial values of the key point rotation parameters and the initial value of the body shape parameter; acquire a fourth loss, where the fourth loss characterizes the plausibility of the pose corresponding to the optimized value of the global rotation parameter, the initial values of the key point rotation parameters and the initial value of the body shape parameter; and optimize the initial values of the key point rotation parameters and the initial value of the body shape parameter based on the third loss and the fourth loss. This embodiment optimizes the initial values of the key point rotation parameters and the initial value of the body shape parameter based on the optimized value of the displacement parameter and the optimized value of the global rotation parameter, which improves the stability of the optimization process. At the same time, the fourth loss ensures that the pose corresponding to the optimized parameters is plausible.

In some embodiments, the apparatus further includes a joint optimization module configured to, after the initial value of the key point rotation parameter and the initial value of the body shape parameter have been optimized based on the optimized value of the displacement parameter and the optimized value of the global rotation parameter, jointly optimize the optimized value of the global rotation parameter, the optimized value of the key point rotation parameter, the optimized value of the body shape parameter, and the optimized value of the displacement parameter. In this embodiment, the optimized parameters are jointly optimized on the basis of the preceding optimization, which further improves the accuracy of the optimization result.

In some embodiments, the supervisory information includes initial two-dimensional key points of the target object and an initial three-dimensional point cloud of the surface of the target object, and the first optimization unit is configured to: obtain, among the two-dimensional projected key points corresponding to the three-dimensional key points of the target object, target two-dimensional projected key points belonging to a preset part of the target object, where the three-dimensional key points of the target object are obtained based on the initial value of the global rotation parameter, the initial value of the key point rotation parameter, and the initial value of the body shape parameter, and the two-dimensional projected key points are obtained by projecting the three-dimensional key points of the target object based on the current value of the displacement parameter and the initial value of the global rotation parameter; obtain a first loss between the target two-dimensional projected key points and the initial two-dimensional key points; obtain a second loss between the initial value of the displacement parameter and the current value of the displacement parameter; obtain a fifth loss between a first three-dimensional point cloud of the surface of the target object and the initial three-dimensional point cloud, where the first three-dimensional point cloud is obtained based on the initial value of the global rotation parameter, the initial value of the key point rotation parameter, and the initial value of the body shape parameter; and optimize the current value of the displacement parameter and the initial value of the global rotation parameter based on the first loss, the second loss, and the fifth loss. In this embodiment, the three-dimensional point cloud is added to the supervisory information used to optimize the initial parameters, which improves the accuracy of the optimization result.
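The text does not state how the loss between the model point cloud and the observed point cloud is computed. One common choice, shown here purely as an assumption, is a one-sided nearest-neighbor (Chamfer-style) distance from each observed point to the closest model point; the function names are hypothetical.

```python
import numpy as np

def nearest_sq_dist(src, dst):
    """For each point in src (N, 3), squared distance to its nearest point in dst (M, 3)."""
    diff = src[:, None, :] - dst[None, :, :]
    return np.min(np.sum(diff ** 2, axis=-1), axis=1)

def point_cloud_loss(model_pts, observed_pts):
    """Mean squared distance from each observed point to the nearest model point.

    Assumed form of the point cloud loss; the source does not specify it.
    """
    return nearest_sq_dist(observed_pts, model_pts).mean()
```

A one-sided distance is used in this sketch because a depth camera typically observes only the visible side of the surface, so model points on the hidden side have no counterpart in the observation.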

In some embodiments, the joint optimization module includes: a first acquisition unit configured to obtain a sixth loss between optimized two-dimensional projected key points of the target object and the initial two-dimensional key points, where the optimized two-dimensional projected key points are obtained by projecting optimized three-dimensional key points of the target object based on the optimized value of the displacement parameter and the optimized value of the global rotation parameter, and the optimized three-dimensional key points are obtained based on the optimized value of the global rotation parameter, the optimized value of the key point rotation parameter, and the optimized value of the body shape parameter; a second acquisition unit configured to obtain a seventh loss, which characterizes the plausibility of the pose corresponding to the optimized value of the global rotation parameter, the optimized value of the key point rotation parameter, and the optimized value of the body shape parameter; a third acquisition unit configured to obtain an eighth loss between a second three-dimensional point cloud of the surface of the target object and the initial three-dimensional point cloud, where the second three-dimensional point cloud is obtained based on the optimized value of the global rotation parameter, the optimized value of the key point rotation parameter, and the optimized value of the body shape parameter; and a joint optimization unit configured to jointly optimize the optimized value of the global rotation parameter, the optimized value of the key point rotation parameter, the optimized value of the body shape parameter, and the optimized value of the displacement parameter based on the sixth loss, the seventh loss, and the eighth loss. In this embodiment, the three-dimensional point cloud is added to the supervisory information used to optimize the initial parameters, which improves the accuracy of the optimization result.

According to a third aspect of the embodiments of the present disclosure, a three-dimensional reconstruction system is provided. The system includes: an image acquisition device configured to acquire an image of a target object; and a processing unit communicatively connected to the image acquisition device and configured to perform three-dimensional reconstruction on the target object in the image through a three-dimensional reconstruction network to obtain initial values of parameters of the target object, the initial values of the parameters being used to establish a three-dimensional model of the target object; optimize the initial values of the parameters based on pre-acquired supervisory information representing features of the target object to obtain optimized values of the parameters; and perform bone skinning processing based on the optimized values of the parameters to establish the three-dimensional model of the target object.

According to a fourth aspect of the embodiments of the present disclosure, a computer-readable storage medium is provided, on which a computer program is stored. When the computer program is executed by a processor, the method described in any embodiment is implemented.

According to a fifth aspect of the embodiments of the present disclosure, a computer device is provided, including a memory, a processor, and a computer program stored in the memory and executable on the processor. When the processor executes the computer program, the method described in any embodiment is implemented.

According to a sixth aspect of the embodiments of the present disclosure, a computer program product is provided. The computer program product is stored in a storage medium and includes a computer program executable on a processor. When the processor executes the computer program, the method described in any embodiment is implemented.

In the embodiments of the present disclosure, a three-dimensional reconstruction network performs three-dimensional reconstruction on an image of the target object to obtain initial values of the parameters, the initial values are then optimized based on supervisory information, and a three-dimensional model of the target object is established based on the optimized values of the parameters. The advantage of parameter optimization is that it can give relatively accurate three-dimensional reconstruction results that conform to the two-dimensional observation features of the image, but it often gives unnatural and implausible action results, so its reliability is low. Network regression through a three-dimensional reconstruction network, in contrast, can give relatively natural and plausible action results. Therefore, using the output of the three-dimensional reconstruction network as the initial values of the parameters for optimization preserves the reliability of the three-dimensional reconstruction result while also achieving accuracy.

It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the present disclosure.

Description of the drawings

The accompanying drawings are incorporated into and constitute a part of this specification. They show embodiments consistent with the present disclosure and, together with the specification, serve to explain the technical solutions of the present disclosure.

FIG. 1A and FIG. 1B are schematic diagrams of three-dimensional models of some embodiments.

FIG. 2 is a flowchart of a three-dimensional reconstruction method according to an embodiment of the present disclosure.

FIG. 3 is an overall flowchart of an embodiment of the present disclosure.

FIG. 4A and FIG. 4B are schematic diagrams of application scenarios of embodiments of the present disclosure.

FIG. 5 is a block diagram of a three-dimensional reconstruction apparatus according to an embodiment of the present disclosure.

FIG. 6 is a schematic diagram of a three-dimensional reconstruction system according to an embodiment of the present disclosure.

FIG. 7 is a schematic structural diagram of a computer device according to an embodiment of the present disclosure.

Detailed description

Exemplary embodiments will now be described in detail, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numerals in different drawings denote the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatuses and methods consistent with some aspects of the present disclosure as recited in the appended claims.

The terminology used in the present disclosure is for the purpose of describing particular embodiments only and is not intended to limit the present disclosure. As used in the present disclosure and the appended claims, the singular forms "a", "said", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and includes any and all possible combinations of one or more of the associated listed items. In addition, the term "at least one" herein means any one of a plurality of items or any combination of at least two of a plurality of items.

It should be understood that although the terms first, second, third, and so on may be used in the present disclosure to describe various information, the information should not be limited by these terms. These terms are only used to distinguish information of the same type from one another. For example, without departing from the scope of the present disclosure, first information may also be called second information, and similarly, second information may also be called first information. Depending on the context, the word "if" as used herein may be interpreted as "when", "upon", or "in response to determining".

To enable those skilled in the art to better understand the technical solutions in the embodiments of the present disclosure, and to make the above objects, features, and advantages of the embodiments more apparent, the technical solutions in the embodiments of the present disclosure are described in further detail below with reference to the accompanying drawings.

Three-dimensional reconstruction of a target object requires reconstructing the body shape and limb rotations of the target object. A parametric model, rather than three-dimensional key points alone, is usually used to express the body shape and limb rotations. For example, when three-dimensional reconstruction is performed on different people, a three-dimensional model of a thinner person (as shown in FIG. 1A) and a three-dimensional model of a heavier person (as shown in FIG. 1B) are reconstructed. Because the person shown in FIG. 1A and the person shown in FIG. 1B are in the same pose, their key point information is the same, and the difference in body shape between the two cannot be expressed by key point information alone.

In the related art, three-dimensional reconstruction is generally performed in one of two ways: parameter optimization or network regression. Parameter optimization methods usually select a set of standard parameters and, according to two-dimensional visual features of the image of the target object, use gradient descent to iteratively optimize the initial values of the parameters of the three-dimensional model of the target object, where the two-dimensional visual features of the image may be, for example, two-dimensional key points. The advantage of parameter optimization is that it can give relatively accurate parameter estimates that conform to the two-dimensional visual features of the image. However, it often gives unnatural and implausible action results, and its final performance depends heavily on the initial values of the parameters, so three-dimensional reconstruction based on parameter optimization has low reliability.

Network regression methods typically train an end-to-end neural network to learn the mapping from images to three-dimensional model parameters. The advantage of network regression is that it can give relatively natural and plausible action results. However, because large amounts of training data are lacking, the three-dimensional reconstruction result may not match the two-dimensional visual features in the image, so three-dimensional reconstruction based on network regression has low accuracy. The three-dimensional reconstruction approaches in the related art therefore cannot achieve both accuracy and reliability.

Based on this, an embodiment of the present disclosure provides a three-dimensional reconstruction method. As shown in FIG. 2, the method includes:

Step 201: performing three-dimensional reconstruction on a target object in an image through a three-dimensional reconstruction network to obtain initial values of parameters of the target object, where the initial values of the parameters are used to establish a three-dimensional model of the target object;

Step 202: optimizing the initial values of the parameters based on pre-acquired supervisory information representing features of the target object to obtain optimized values of the parameters;

Step 203: performing bone skinning processing based on the optimized values of the parameters to establish the three-dimensional model of the target object.
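The three steps above can be sketched as a small pipeline. The function name `reconstruct` and the four callables passed to it are hypothetical placeholders for the three-dimensional reconstruction network, the supervision extractor, the parameter optimizer, and the skinning function; none of them is an API defined by the disclosure.

```python
def reconstruct(image, regress, extract_supervision, optimize, skin):
    """Chain steps 201 to 203; every callable is a caller-supplied placeholder."""
    init_params = regress(image)                     # step 201: network regression
    supervision = extract_supervision(image)         # supervisory information, e.g. 2D key points
    opt_params = optimize(init_params, supervision)  # step 202: parameter optimization
    return skin(opt_params)                          # step 203: bone skinning
```

Passing the stages in as arguments keeps the sketch independent of any particular network or optimizer.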

In step 201, the target object may be a three-dimensional object, such as a person, an animal, or a robot in physical space, or one or more regions of the three-dimensional object, such as a face or a limb. For ease of description, the following takes the case in which the target object is a person and the three-dimensional reconstruction is human body reconstruction as an example. The image of the target object may be a single image, or may include multiple images obtained by photographing the target object from multiple different viewing angles. Three-dimensional human body reconstruction based on a single image is called monocular three-dimensional human body reconstruction, and three-dimensional human body reconstruction based on multiple images from different viewing angles is called multi-view three-dimensional human body reconstruction. Each image may be a grayscale image, an RGB image, or an RGBD image. The image may be captured in real time by an image acquisition device (for example, a still camera or a video camera) around the target object, or may be an image captured and stored in advance.

The image of the target object can be reconstructed in three dimensions through a three-dimensional reconstruction network, which may be a pre-trained neural network. The three-dimensional reconstruction network performs three-dimensional reconstruction based on the image and estimates natural and plausible initial values of the parameters. The initial values of the parameters can be represented by a vector, which may be, for example, 85-dimensional. The vector contains three parts: the rotation information of the moving limbs of the human body (that is, the initial values of the pose parameters, including the initial value of the global rotation parameter of the human body and the initial values of the key point rotation parameters of 23 key points), the initial value of the body shape parameter, and the initial values of the camera parameters. The human body can be represented by key points and the limb bones connecting them. The key points of the human body may include one or more of the top of the head, nose, neck, left and right eyes, left and right ears, chest, left and right shoulders, left and right elbows, left and right wrists, left and right hips, left and right buttocks, left and right knees, left and right ankles, and so on. The initial values of the pose parameters are used to determine the positions of the key points of the human body in three-dimensional space. The initial value of the body shape parameter is used to determine figure information of the human body, such as height and build. The initial values of the camera parameters are used to determine the absolute position of the human body in three-dimensional space in the camera coordinate system. The camera parameters include a displacement parameter between the camera and the human body and a pose parameter of the camera, where the initial value of the pose parameter of the camera can be replaced by the initial value of the global rotation parameter of the human body. The human body parameters can be expressed in the parametric form of the Skinned Multi-Person Linear (SMPL) model (referred to as SMPL parameters). After the values of the SMPL parameters are obtained, bone skinning processing can be performed based on them; that is, a mapping function M(θ,β) maps the initial value of the body shape parameter and the initial values of the pose parameters to a three-dimensional model of the human body surface. This three-dimensional model includes 6890 vertices, which form triangular patches through fixed connection relationships. A pre-trained regressor W can then be used to regress the three-dimensional key points of the human body from the vertices of the body surface model, that is, by applying W to the vertices given by M(θ,β).
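As an illustration of the parameter layout and the key point regression described above, the following sketch splits an 85-dimensional vector and applies a linear regressor to the surface vertices. The ordering of the parts within the vector (3 camera values, then 24 × 3 axis-angle rotations, then 10 shape coefficients) and the function names are assumptions made for this example; the text gives only the total dimension, the 23 key point rotations, and the 6890 vertices.

```python
import numpy as np

def split_params(vec):
    """Split an 85-d vector into camera, global rotation, joint rotations, and shape.

    The ordering used here is an assumption, not stated in the source.
    """
    vec = np.asarray(vec, dtype=float)
    assert vec.shape == (85,)
    cam = vec[:3]                     # displacement between camera and body
    pose = vec[3:75].reshape(24, 3)   # axis-angle rotations
    global_rot, joint_rot = pose[0], pose[1:]
    shape = vec[75:]                  # body shape coefficients
    return cam, global_rot, joint_rot, shape

def regress_keypoints(W, vertices):
    """Linear key point regression: (K, 6890) @ (6890, 3) -> (K, 3)."""
    return W @ vertices
```

Here `W` is treated as a plain matrix whose rows weight the surface vertices, so each key point is a weighted combination of vertex positions.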

In step 202, the supervisory information may be two-dimensional visual features of the image (also called two-dimensional observation features), for example, at least one of the two-dimensional key points of the target object in the image and the semantic information of multiple pixels on the target object. The semantic information of a pixel indicates which region of the target object the pixel lies in, such as the region of the head, an arm, the torso, or a leg. When two-dimensional key point information is used as the supervisory information, a two-dimensional key point extraction network can be used to estimate the positions of the human body key points in the image. Any two-dimensional pose estimation method, such as OpenPose, can be used here. In addition to using two-dimensional visual features as the supervisory information, the two-dimensional visual features and an initial three-dimensional point cloud of the surface of the target object can be used together as the supervisory information, which further improves the accuracy of the three-dimensional reconstruction.

When the image includes a depth image (for example, when the image is an RGBD image), depth information of multiple pixels on the target object can be extracted from the depth image, and the multiple pixels on the target object in the depth image can be projected into three-dimensional space based on the depth information to obtain the initial three-dimensional point cloud of the surface of the target object.
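A minimal sketch of this back-projection is shown below, assuming a pinhole camera with known intrinsics. The intrinsic parameters `fx`, `fy`, `cx`, `cy`, the optional `mask` argument, and the convention that a depth of zero marks an invalid pixel are assumptions of the example rather than details given in the text.

```python
import numpy as np

def depth_to_point_cloud(depth, fx, fy, cx, cy, mask=None):
    """Back-project a depth image (H, W) to an (N, 3) point cloud in camera coordinates."""
    h, w = depth.shape
    v, u = np.mgrid[0:h, 0:w]           # v: row index, u: column index
    valid = depth > 0                   # assumed convention: 0 means no measurement
    if mask is not None:
        valid &= mask.astype(bool)      # keep only pixels on the target object
    z = depth[valid]
    x = (u[valid] - cx) * z / fx
    y = (v[valid] - cy) * z / fy
    return np.stack([x, y, z], axis=1)
```

The `mask` argument can carry a segmentation of the target object so that background pixels do not enter the point cloud.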

The multiple pixels may be some or all of the pixels on the target object in the image. For example, they may include pixels of each region of the target object that needs to be reconstructed in three dimensions, and the number of pixels in each region should be greater than or equal to the number required for three-dimensional reconstruction.

An image generally includes both the target object and a background region. Therefore, image segmentation can be performed on the RGB image included in the image to obtain the image region in which the target object is located in the RGB image; the image region in which the target object is located in the depth image is then determined based on that region of the RGB image, and the depth information of multiple pixels in the image region of the depth image in which the target object is located is obtained. Image segmentation makes it possible to extract from the image the region containing the target object to be reconstructed and to avoid the influence of the background region on the three-dimensional reconstruction. In some embodiments, the pixels in the depth image correspond one-to-one to the pixels in the RGB image. For example, the image may be an RGBD image.

Further, outliers can be filtered out of the three-dimensional point cloud (that is, the initial three-dimensional point cloud), and the supervisory information may include the filtered three-dimensional point cloud. The filtering can be implemented with a point cloud filter. Filtering out outliers yields a finer three-dimensional point cloud of the surface of the target object and thus further improves the accuracy of the three-dimensional reconstruction. For each target three-dimensional point in the point cloud, the average distance from the n three-dimensional points nearest to it is obtained. Assuming that the average distances corresponding to the target three-dimensional points follow a statistical distribution (for example, a Gaussian distribution), the mean and variance of that distribution can be calculated and a threshold s can be set based on them. Three-dimensional points whose average distance falls outside the range given by the threshold s can then be regarded as outliers and filtered out of the point cloud.
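The outlier filtering described above could look like the following sketch. The threshold form s = mean + `std_ratio` × standard deviation, the default values of `n_neighbors` and `std_ratio`, and the brute-force distance computation are assumptions for illustration; the text says only that s is set from the mean and variance.

```python
import numpy as np

def remove_outliers(points, n_neighbors=8, std_ratio=2.0):
    """Drop points whose mean distance to their nearest neighbors is unusually large."""
    pts = np.asarray(points, dtype=float)
    # Pairwise distances; O(N^2) memory, adequate only for small clouds.
    d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    # Sort each row and skip column 0, which is the distance of a point to itself.
    nearest = np.sort(d, axis=1)[:, 1:n_neighbors + 1]
    mean_d = nearest.mean(axis=1)
    s = mean_d.mean() + std_ratio * mean_d.std()  # assumed threshold form
    return pts[mean_d <= s]
```

For large point clouds a spatial index such as a k-d tree would replace the pairwise distance matrix, which is used here only to keep the sketch dependency-free.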

In practical applications, if the image is an RGB image, the two-dimensional observation features can be used as the supervisory information to iteratively optimize the initial values of the parameters. If the image is an RGBD image, the two-dimensional observation features and the three-dimensional point cloud of the surface of the target object can be used together as the supervisory information to iteratively optimize the initial values of the parameters. The optimization may use, for example, gradient descent, which is not limited in the present disclosure.

In step 203, bone skinning processing can be performed based on the optimized values of the parameters to obtain the three-dimensional model of the target object.

FIG. 3 shows the overall flow of an embodiment of the present disclosure. When the input is an RGB image, the three-dimensional reconstruction network performs three-dimensional reconstruction on the RGB image to obtain the human body parameter values of the person in the image, and a key point extraction network extracts the key points of the person in the image to obtain two-dimensional human body key points. The human body parameter values are then used as the initial values of the parameters and the two-dimensional human body key points are used as the supervisory information; a parameter optimization module optimizes the initial values of the human body parameters to obtain their optimized values, and bone skinning processing is performed based on the optimized values to obtain a reconstructed human body model.

When the input is an RGBD image, the image can be decomposed into an RGB image and a TOF (time of flight) depth map, where the TOF depth map includes the depth information of each pixel in the RGB image. The three-dimensional reconstruction network performs three-dimensional reconstruction on the RGB image to obtain the human body parameter values of the person in the image, and the key point extraction network extracts the key points of the person in the image to obtain two-dimensional human body key points. A point cloud reconstruction module can also be used to reconstruct a point cloud of the human body surface based on the depth information in the TOF depth map. The human body parameter values are then used as the initial values of the parameters, and the two-dimensional human body key points and the human body surface point cloud are used together as the supervisory information; the parameter optimization module optimizes the initial values of the human body parameters to obtain their optimized values, and bone skinning processing is performed based on the optimized values to obtain a reconstructed human body model.

Further, after the reconstructed human body model is obtained, color processing can be performed on it based on the color information in the RGB image or the RGBD image, so that the reconstructed human body model matches the color information of the person in the image.

In the embodiments of the present disclosure, a three-dimensional reconstruction network performs three-dimensional reconstruction on the target object in the image to obtain initial values of the parameters, the initial values are then optimized based on the supervisory information, and a three-dimensional model of the target object is established based on the optimized values of the parameters. The advantage of parameter optimization is that it can give relatively accurate three-dimensional reconstruction results that conform to the two-dimensional observation features of the image, but it often gives unnatural and implausible action results, so its reliability is low. Network regression through a three-dimensional reconstruction network, in contrast, can give relatively natural and plausible action results. Therefore, using the output of the three-dimensional reconstruction network as the initial values of the parameters for parameter optimization preserves the reliability of the three-dimensional reconstruction result while also achieving accuracy.

In some embodiments, a multi-stage optimization method may be used in the parameter optimization phase. The multi-stage optimization method may include a camera optimization stage and a pose optimization stage. In the camera optimization stage, the optimization targets are the value R of the global rotation parameter and the current value t of the displacement parameter between the image acquisition device and the target object, where t and R are both three-dimensional vectors and R is expressed in axis-angle form. In the pose optimization stage, the optimization targets are the values of the key point rotation parameters and the values of the body shape parameters.
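Because R is stored as a three-dimensional axis-angle vector, it has to be expanded into a rotation matrix before it can be applied to key points. The sketch below uses the standard Rodrigues formula for that conversion; the function name `axis_angle_to_matrix` is hypothetical, and the disclosure does not state which conversion routine is used.

```python
import math


def axis_angle_to_matrix(r):
    """Convert an axis-angle 3-vector into a 3x3 rotation matrix (Rodrigues' formula).

    The direction of r is the rotation axis and its length is the angle in radians.
    """
    theta = math.sqrt(sum(comp * comp for comp in r))
    if theta < 1e-12:
        # A (near-)zero vector means no rotation.
        return [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    kx, ky, kz = (comp / theta for comp in r)
    c, s = math.cos(theta), math.sin(theta)
    v = 1.0 - c
    return [
        [c + kx * kx * v, kx * ky * v - kz * s, kx * kz * v + ky * s],
        [ky * kx * v + kz * s, c + ky * ky * v, ky * kz * v - kx * s],
        [kz * kx * v - ky * s, kz * ky * v + kx * s, c + kz * kz * v],
    ]


# A quarter turn about the z axis maps the x axis onto the y axis.
rot = axis_angle_to_matrix((0.0, 0.0, math.pi / 2))
```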

During optimization, changing the camera position and changing the positions of the three-dimensional key points of the human body can both change the two-dimensional projections of the three-dimensional key points, which makes the optimization process very unstable. Therefore, the human body pose is fixed in the camera optimization stage and the camera position is fixed in the pose optimization stage, which improves the stability of the optimization process. That is, with the initial value of the body shape parameter and the initial value of the key point rotation parameter held unchanged, the current value of the displacement parameter of the image acquisition device and the initial value of the global rotation parameter are optimized based on the supervision information and the initial value of the displacement parameter, yielding the optimized value of the displacement parameter and the optimized value of the global rotation parameter. Then, with those two optimized values held unchanged, the initial value of the key point rotation parameter and the initial value of the body shape parameter are optimized based on them, yielding the optimized value of the key point rotation parameter and the optimized value of the body shape parameter.
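The freeze-then-optimize schedule described above can be sketched with a toy optimizer that updates only a named subset of parameters. Everything here is an illustrative assumption: `minimize_subset` is a hypothetical helper using finite-difference gradient descent, and the scalar entries `"t"` and `"pose"` merely stand in for the camera parameters and the pose parameters; the disclosure does not name a specific optimizer.

```python
def minimize_subset(loss, params, free_keys, lr=0.1, steps=200, eps=1e-6):
    """Gradient descent on loss(params) that updates only the entries in free_keys.

    Entries not listed in free_keys stay frozen at their incoming values.
    Gradients are estimated with forward finite differences.
    """
    params = dict(params)
    for _ in range(steps):
        base = loss(params)
        grads = {}
        for key in free_keys:
            bumped = dict(params)
            bumped[key] = params[key] + eps
            grads[key] = (loss(bumped) - base) / eps
        for key in free_keys:
            params[key] -= lr * grads[key]
    return params


def toy_loss(p):
    # Stand-in objective with its minimum at t = 2, pose = -1.
    return (p["t"] - 2.0) ** 2 + (p["pose"] + 1.0) ** 2


initial = {"t": 0.0, "pose": 0.0}
stage1 = minimize_subset(toy_loss, initial, free_keys=["t"])    # camera stage: pose frozen
stage2 = minimize_subset(toy_loss, stage1, free_keys=["pose"])  # pose stage: camera frozen
```

Keeping the two groups of parameters in separate stages means that a change in the projected key points can be attributed to only one group at a time, which is the source of the stability gain the text describes.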

Further, among the two-dimensional projection key points corresponding to the three-dimensional key points of the target object, the target two-dimensional projection key points belonging to a preset part of the target object may be acquired. Here, the three-dimensional key points of the target object are obtained based on the initial value of the global rotation parameter, the initial value of the key point rotation parameter, and the initial value of the body shape parameter; the two-dimensional projection key points are obtained by projecting the three-dimensional key points of the target object based on the current value of the displacement parameter and the initial value of the global rotation parameter. A first loss between the target two-dimensional projection key points and the initial two-dimensional key points is acquired. A second loss between the initial value of the displacement parameter and the current value of the displacement parameter is acquired. The current value of the displacement parameter and the initial value of the global rotation parameter are then optimized based on the first loss and the second loss.

The preset part may be the torso, and the target two-dimensional projection key points may include key points such as the left and right shoulder points, the left and right hip points, and the spine center point. Because different motions have little effect on the torso key points, building the first loss from the torso key points reduces the influence of different motions on the key point positions and improves the accuracy of the optimization result. The first loss may also be called the torso key point projection loss, and the second loss may also be called the camera displacement regularization loss. The first loss can be obtained by formula (1) below, and the second loss by formula (2) below:

$$L_{cam} = \|t - t_{net}\|_2 \quad (2);$$

where $L_{torso}$ and $L_{cam}$ denote the first loss and the second loss respectively; $L_{torso}$ is computed between the target two-dimensional projection key points $x_{torso}$ and the initial two-dimensional key points; and $t$ and $t_{net}$ denote, respectively, the current value and the initial value of the displacement parameter between the image acquisition device and the target object. A first target loss $L_1$ can be determined based on the first loss and the second loss. For example, the first target loss may be determined as the sum of the first loss and the second loss, by formula (3) below:

$$L_1 = L_{torso} + L_{cam} \quad (3).$$
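To make the camera-stage objective concrete, the sketch below evaluates a first target loss of the form L₁ = L_torso + L_cam. Several details are assumptions, since the form of the torso loss and the projection model are not recoverable from this text: a pinhole projection with a single focal length, a per-key-point Euclidean distance summed over the torso key points, and the hypothetical names `project`, `l2`, and `camera_stage_loss`. The global rotation is passed in as a 3x3 matrix already converted from its axis-angle form.

```python
import math


def project(points3d, rot, t, focal, center):
    """Rotate and translate 3D key points, then apply an assumed pinhole projection."""
    projected = []
    for p in points3d:
        cam = [sum(rot[i][j] * p[j] for j in range(3)) + t[i] for i in range(3)]
        projected.append((focal * cam[0] / cam[2] + center[0],
                          focal * cam[1] / cam[2] + center[1]))
    return projected


def l2(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def camera_stage_loss(joints3d, torso_ids, kp2d_init, rot, t, t_net, focal, center):
    """First target loss: torso projection loss plus camera displacement regularization."""
    proj = project(joints3d, rot, t, focal, center)
    l_torso = sum(l2(proj[i], kp2d_init[i]) for i in torso_ids)  # assumed form
    l_cam = l2(t, t_net)  # formula (2): ||t - t_net||_2
    return l_torso + l_cam


identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
joints = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
detected = [(0.0, 0.0), (1.0, 0.0)]
loss = camera_stage_loss(joints, [0, 1], detected, identity,
                         t=(0.0, 0.0, 2.0), t_net=(0.0, 0.0, 5.0),
                         focal=2.0, center=(0.0, 0.0))
```

In this toy setup the projections coincide with the detected key points, so the only contribution comes from the regularization term that penalizes drifting away from the network's displacement estimate.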

A third loss between the optimized two-dimensional projection key points of the target object and the initial two-dimensional key points may be acquired, where the optimized two-dimensional projection key points are obtained by projecting the optimized three-dimensional key points of the target object based on the optimized value of the displacement parameter and the optimized value of the global rotation parameter, and the optimized three-dimensional key points are obtained based on the optimized value of the global rotation parameter, the initial value of the key point rotation parameter, and the initial value of the body shape parameter. A fourth loss is acquired, which characterizes the plausibility of the pose corresponding to the optimized value of the global rotation parameter, the initial value of the key point rotation parameter, and the initial value of the body shape parameter. The initial value of the key point rotation parameter and the initial value of the body shape parameter are then optimized based on the third loss and the fourth loss.

The third loss may also be called the two-dimensional key point projection loss, and the fourth loss may also be called the pose plausibility loss. The third loss can be determined by formula (4) below:

where $L_{2d}$ is the third loss, computed between the optimized two-dimensional projection key points $x$ and the initial two-dimensional key points. A second target loss can be determined based on the third loss and the fourth loss. For example, the second target loss may be determined as the sum of the third loss and the fourth loss, by formula (5) below:

$$L_2 = L_{2d} + L_{prior} \quad (5);$$

where $L_2$ is the second target loss and $L_{prior}$ is the fourth loss. $L_{prior}$ can be obtained using a Gaussian Mixture Model (GMM); it judges whether the pose corresponding to the optimized value of the global rotation parameter, the initial value of the key point rotation parameter, and the initial value of the body shape parameter is plausible, and outputs a larger loss for an implausible pose.
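One common way to turn a Gaussian mixture into a plausibility loss is to use the negative log-likelihood of the pose vector under the mixture, so that poses far from every component receive a large loss. The sketch below follows that idea under stated assumptions: diagonal covariances, a hypothetical function name `gmm_prior_loss`, and mixture parameters supplied by the caller. The disclosure only says that a GMM is used and does not give this formula.

```python
import math


def gmm_prior_loss(pose, weights, means, variances):
    """Negative log-likelihood of a pose vector under a diagonal-covariance GMM.

    weights: mixture weights summing to 1.
    means, variances: one vector per component, same length as pose.
    """
    log_terms = []
    for w, mu, var in zip(weights, means, variances):
        log_p = math.log(w)
        for x, m, v in zip(pose, mu, var):
            log_p += -0.5 * (math.log(2.0 * math.pi * v) + (x - m) ** 2 / v)
        log_terms.append(log_p)
    # Log-sum-exp over components for numerical stability.
    top = max(log_terms)
    return -(top + math.log(sum(math.exp(term - top) for term in log_terms)))


# A pose at the component mean is scored as more plausible (lower loss)
# than a pose three standard deviations away.
near = gmm_prior_loss([0.0], [1.0], [[0.0]], [[1.0]])
far = gmm_prior_loss([3.0], [1.0], [[0.0]], [[1.0]])
```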

After the initial value of the key point rotation parameter and the initial value of the body shape parameter have been optimized based on the optimized value of the displacement parameter and the optimized value of the global rotation parameter, the optimized values of the global rotation parameter, the key point rotation parameter, the body shape parameter, and the displacement parameter may additionally be jointly optimized; that is, a three-stage optimization method is used. When the supervision information includes information about a three-dimensional point cloud of the target object's surface, the three-stage optimization method may be used, comprising a camera optimization stage, a pose optimization stage, and a point cloud optimization stage.

In the camera optimization stage, among the two-dimensional projection key points corresponding to the three-dimensional key points of the target object, the target two-dimensional projection key points belonging to a preset part of the target object may be acquired. Here, the three-dimensional key points of the target object are obtained based on the initial value of the global rotation parameter, the initial value of the key point rotation parameter, and the initial value of the body shape parameter, and the two-dimensional projection key points are obtained by projecting the three-dimensional key points of the target object based on the current value of the displacement parameter and the initial value of the global rotation parameter. A first loss between the target two-dimensional projection key points and the initial two-dimensional key points is acquired. A second loss between the initial value of the displacement parameter and the current value of the displacement parameter is acquired. A fifth loss between a first three-dimensional point cloud of the target object's surface and the initial three-dimensional point cloud is acquired, where the first three-dimensional point cloud is obtained based on the initial value of the global rotation parameter, the initial value of the key point rotation parameter, and the initial value of the body shape parameter. The current value of the displacement parameter and the initial value of the global rotation parameter are then optimized based on the first loss, the second loss, and the fifth loss. The fifth loss may also be called the Iterative Closest Point (ICP) point cloud registration loss and can be determined by formula (6) below:

In the formula, $L_{icp}$ is the fifth loss. Regarding the initial three-dimensional point cloud as point cloud $P$ and the first three-dimensional point cloud as point cloud $Q$, $K_1 = \{(p, q)\}$ is the set of point pairs formed by each point in $P$ and its nearest point in $Q$, and $K_2 = \{(p, q)\}$ is the set of point pairs formed by each point in $Q$ and its nearest point in $P$. The first loss and the second loss are given by formula (7) and formula (8) respectively:

$$L_{cam} = \|t - t_{net}\|_2 \quad (8);$$

where $L_{torso}$ and $L_{cam}$ denote the first loss and the second loss respectively; $L_{torso}$ is computed between the target two-dimensional projection key points $x_{torso}$ and the initial two-dimensional key points; and $t$ and $t_{net}$ denote, respectively, the current value and the initial value of the displacement parameter. The first target loss $L_1$ can be determined as the sum of the first loss, the second loss, and the fifth loss, and the current value of the displacement parameter and the initial value of the global rotation parameter are then optimized based on the first target loss, as in formula (9) below:

$$L_1 = L_{torso} + L_{cam} + L_{icp} \quad (9).$$
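The bidirectional nearest-point pairing behind the point cloud registration loss can be sketched as follows. The pairing into K₁ and K₂ follows the description above; how the distances are aggregated is an assumption (a plain sum of Euclidean distances over both sets), as are the names `nearest` and `icp_loss`. The brute-force search is only meant for small illustrative clouds.

```python
import math


def nearest(point, cloud):
    """Return the point in cloud closest to point (brute-force search)."""
    return min(cloud, key=lambda other: math.dist(point, other))


def icp_loss(cloud_p, cloud_q):
    """Bidirectional nearest-point loss between point clouds P and Q.

    K1 pairs each point of P with its nearest point in Q;
    K2 pairs each point of Q with its nearest point in P.
    """
    k1 = [(p, nearest(p, cloud_q)) for p in cloud_p]
    k2 = [(nearest(q, cloud_p), q) for q in cloud_q]
    # Assumed aggregation: sum of Euclidean distances over both pair sets.
    return (sum(math.dist(p, q) for p, q in k1)
            + sum(math.dist(p, q) for p, q in k2))


observed = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]  # stands in for the initial point cloud P
model = [(0.0, 0.0, 1.0)]                      # stands in for the model surface cloud Q
loss = icp_loss(observed, model)
```

Pairing in both directions penalizes observed points that the model surface fails to cover as well as model points that lie far from any observation.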

The pose optimization stage of the three-stage optimization process is the same as the pose optimization stage of the two-stage optimization process and is not described again here.

In the point cloud optimization stage, a sixth loss between the optimized two-dimensional projection key points of the target object and the initial two-dimensional key points may be acquired, where the optimized two-dimensional projection key points are obtained by projecting the optimized three-dimensional key points of the target object based on the optimized value of the displacement parameter and the optimized value of the global rotation parameter, and the optimized three-dimensional key points are obtained based on the optimized value of the global rotation parameter, the optimized value of the key point rotation parameter, and the optimized value of the body shape parameter. A seventh loss is acquired, which characterizes the plausibility of the pose corresponding to the optimized value of the global rotation parameter, the optimized value of the key point rotation parameter, and the optimized value of the body shape parameter. An eighth loss between a second three-dimensional point cloud of the target object's surface and the initial three-dimensional point cloud is acquired, where the second three-dimensional point cloud is obtained based on the optimized value of the global rotation parameter, the optimized value of the key point rotation parameter, and the optimized value of the body shape parameter. The optimized values of the global rotation parameter, the key point rotation parameter, the body shape parameter, and the displacement parameter are then jointly optimized based on the sixth loss, the seventh loss, and the eighth loss, using formula (10) and formula (11) below:

In these formulas, the sixth loss is the term $L_{2d}$ of formula (12), computed between the optimized two-dimensional projection key points and the initial two-dimensional key points. The seventh loss, $L_{prior}$, can be obtained using a Gaussian mixture model; it judges whether the pose corresponding to the optimized value of the global rotation parameter, the optimized value of the key point rotation parameter, and the optimized value of the body shape parameter is plausible, and outputs a larger loss for an implausible pose. The eighth loss, $L_{icp}$, is computed between the initial three-dimensional point cloud, regarded as point cloud $P$, and the second three-dimensional point cloud. It uses two sets of point pairs: one formed by each point in $P$ and its nearest point in the second three-dimensional point cloud, and one formed by each point in the second three-dimensional point cloud and its nearest point in $P$. Further, the sum of the sixth loss, the seventh loss, and the eighth loss can be determined as a third target loss $L_3$, and the optimized values of the global rotation parameter, the key point rotation parameter, the body shape parameter, and the displacement parameter are jointly optimized based on the third target loss, by formula (12) below:

$$L_3 = L_{2d} + L_{prior} + L_{icp} \quad (12).$$

When the image of the target object is an RGB image, parameter optimization may be performed using the aforementioned two-stage optimization method comprising the camera optimization stage and the pose optimization stage. When the image of the target object is an RGBD image, parameter optimization may be performed using the aforementioned three-stage optimization method comprising the camera optimization stage, the pose optimization stage, and the point cloud optimization stage.
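The choice between the two schedules depends only on whether depth is available, which can be expressed as a small dispatch function. The function name `optimization_stages` and the stage labels are hypothetical and serve only to illustrate the selection rule stated above.

```python
def optimization_stages(image_type):
    """Return the ordered optimization stages for the given input image type."""
    kind = image_type.upper()
    if kind not in ("RGB", "RGBD"):
        raise ValueError(f"unsupported image type: {image_type!r}")
    stages = ["camera", "pose"]
    if kind == "RGBD":
        # Depth provides a surface point cloud, enabling the joint third stage.
        stages.append("point_cloud")
    return stages


print(optimization_stages("RGB"))   # ['camera', 'pose']
print(optimization_stages("RGBD"))  # ['camera', 'pose', 'point_cloud']
```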

This solution applies to a wide range of scenarios and can provide natural, plausible, and accurate human body reconstruction models in scenarios such as virtual fitting rooms, virtual anchors, and video motion transfer.

FIG. 4A is a schematic diagram of a virtual fitting room application scenario according to an embodiment of the present disclosure. An image of the user 401 can be captured by the camera 403 and sent to a processor (not shown in the figure) for three-dimensional human body reconstruction, so as to obtain the human body reconstruction model 404 corresponding to the user 401; the human body reconstruction model 404 is displayed on the display interface 402 for the user 401 to view. The user 401 can also select the desired apparel 405, including but not limited to the clothing 4051 and the hat 4052. The apparel 405 can be displayed on the display interface 402 based on the human body reconstruction model 404, so that the user 401 can see how the apparel 405 looks when worn.

FIG. 4B is a schematic diagram of a virtual live broadcast room application scenario according to an embodiment of the present disclosure. During a live broadcast, the anchor client 407 can capture an image of the anchor user 406 and send it to the server 408 for three-dimensional reconstruction, yielding the human body reconstruction model of the anchor user, that is, the virtual anchor. The server 408 can return the human body reconstruction model of the anchor user to the anchor client 407 for display, as shown by the model 4071 in the figure. In addition, the anchor client 407 can collect the voice information of the anchor user and send it to the server 408, so that the server 408 fuses the human body reconstruction model with the voice information. The server 408 can send the fused human body reconstruction model and voice information to the viewer client 409 watching the live program for display and playback, where the displayed human body reconstruction model is shown as the model 4091 in the figure. In this way, the viewer client 409 can display the virtual anchor performing the live broadcast.

Those skilled in the art will understand that, in the above methods of the specific implementations, the order in which the steps are written does not imply a strict execution order and does not limit the implementation process in any way; the specific execution order of the steps should be determined by their functions and possible internal logic.

As shown in FIG. 5, the present disclosure also provides a three-dimensional reconstruction apparatus, the apparatus comprising:

a first three-dimensional reconstruction module 501, configured to perform three-dimensional reconstruction on a target object in an image through a three-dimensional reconstruction network to obtain an initial value of a parameter of the target object, the initial value of the parameter being used to establish a three-dimensional model of the target object;

an optimization module 502, configured to optimize the initial value of the parameter based on pre-acquired supervision information representing features of the target object, to obtain an optimized value of the parameter; and

a second three-dimensional reconstruction module 503, configured to perform bone skinning based on the optimized value of the parameter to establish the three-dimensional model of the target object.

In some embodiments, the functions or modules of the apparatus provided by the embodiments of the present disclosure may be used to execute the methods described in the method embodiments above. For their specific implementation, refer to the description of the method embodiments above; for brevity, it is not repeated here.

As shown in FIG. 6, the present disclosure also provides a three-dimensional reconstruction system, the system comprising:

an image acquisition device 601, configured to acquire an image of a target object; and

a processing unit 602 communicatively connected to the image acquisition device 601, configured to: perform three-dimensional reconstruction on the target object in the image through a three-dimensional reconstruction network to obtain an initial value of a parameter of the target object, the initial value of the parameter being used to establish a three-dimensional model of the target object; optimize the initial value of the parameter based on pre-acquired supervision information representing features of the target object, to obtain an optimized value of the parameter; and perform bone skinning based on the optimized value of the parameter to establish the three-dimensional model of the target object.

The image acquisition device 601 in the embodiments of the present disclosure may be a device with an image acquisition function, such as a still camera or a video camera. The images acquired by the image acquisition device 601 may be transmitted to the processing unit 602 in real time, or stored and then transmitted from the storage space to the processing unit 602 when needed. The processing unit 602 may be a single server or a server cluster composed of multiple servers. For the method executed by the processing unit 602, see the foregoing embodiments of the three-dimensional reconstruction method; details are not repeated here.

The embodiments of this specification also provide a computer device, which includes at least a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the method of any of the foregoing embodiments when executing the program.

FIG. 7 is a more specific schematic diagram of the hardware structure of a computing device provided by an embodiment of this specification. The device may include a processor 701, a memory 702, an input/output interface 703, a communication interface 704, and a bus 705. The processor 701, the memory 702, the input/output interface 703, and the communication interface 704 are communicatively connected to one another inside the device through the bus 705.

The processor 701 may be implemented as a general-purpose CPU (Central Processing Unit), a microprocessor, an Application Specific Integrated Circuit (ASIC), or one or more integrated circuits, and is used to execute the relevant programs so as to implement the technical solutions provided by the embodiments of this specification. The processor 701 may also include a graphics card, which may be, for example, an NVIDIA Titan X graphics card or a 1080Ti graphics card.

The memory 702 may be implemented as ROM (Read Only Memory), RAM (Random Access Memory), a static storage device, a dynamic storage device, or the like. The memory 702 may store an operating system and other application programs. When the technical solutions provided by the embodiments of this specification are implemented in software or firmware, the relevant program code is stored in the memory 702 and invoked and executed by the processor 701.

The input/output interface 703 is used to connect an input/output module for information input and output. The input/output module may be configured in the device as a component (not shown in the figure), or may be externally connected to the device to provide the corresponding functions. Input devices may include a keyboard, a mouse, a touch screen, a microphone, and various sensors; output devices may include a display, a speaker, a vibrator, and indicator lights.

The communication interface 704 is used to connect a communication module (not shown in the figure) for communication between this device and other devices. The communication module may communicate by wired means (such as USB or a network cable) or by wireless means (such as a mobile network, Wi-Fi, or Bluetooth).

The bus 705 includes a path that transfers information between the components of the device (for example, the processor 701, the memory 702, the input/output interface 703, and the communication interface 704).

It should be noted that although the above device shows only the processor 701, the memory 702, the input/output interface 703, the communication interface 704, and the bus 705, in a specific implementation the device may also include other components necessary for normal operation. In addition, those skilled in the art will understand that the above device may include only the components necessary to implement the solutions of the embodiments of this specification, and need not include all the components shown in the figure.

The embodiments of the present disclosure also provide a computer-readable storage medium on which a computer program is stored; when the program is executed by a processor, the method of any of the foregoing embodiments is implemented.

Computer-readable media include permanent and non-permanent, removable and non-removable media, and may store information by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible to a computing device. As defined herein, computer-readable media do not include transitory media, such as modulated data signals and carrier waves.

From the above description of the implementations, those skilled in the art can clearly understand that the embodiments of this specification can be implemented by software together with a necessary general-purpose hardware platform. Based on this understanding, the essence of the technical solutions of the embodiments of this specification, or the part that contributes over the prior art, can be embodied in the form of a software product. The computer software product can be stored in a storage medium, such as ROM/RAM, a magnetic disk, or an optical disc, and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute the methods described in the embodiments of this specification or in certain parts of those embodiments.

The systems, apparatuses, modules, or units described in the above embodiments may be implemented by a computer chip or entity, or by a product with a certain function. A typical implementing device is a computer, which may take the form of a personal computer, a laptop computer, a cellular phone, a camera phone, a smartphone, a personal digital assistant, a media player, a navigation device, an e-mail device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.

The embodiments in this specification are described in a progressive manner. For identical or similar parts, the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, because the apparatus embodiments are substantially similar to the method embodiments, they are described relatively briefly; for relevant details, refer to the corresponding description of the method embodiments. The apparatus embodiments described above are merely illustrative: the modules described as separate components may or may not be physically separate, and when the solutions of the embodiments of this specification are implemented, the functions of the modules may be implemented in one or more pieces of software and/or hardware. Some or all of the modules may also be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement this without creative effort.

Claims

A three-dimensional reconstruction method, the method comprising:

performing three-dimensional reconstruction on a target object in an image through a three-dimensional reconstruction network to obtain an initial value of a parameter of the target object, wherein the initial value of the parameter is used to establish a three-dimensional model of the target object;

optimizing the initial value of the parameter based on pre-acquired supervision information representing features of the target object to obtain an optimized value of the parameter; and

performing skeletal skinning based on the optimized value of the parameter to establish the three-dimensional model of the target object.
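
As an illustration of the final skinning step, the sketch below applies linear blend skinning, one common way to perform skeletal skinning. The claim does not name a specific skinning algorithm, so the choice of linear blend skinning, the function name `linear_blend_skinning`, and the array layouts are assumptions made for this example only.

```python
import numpy as np


def linear_blend_skinning(vertices, weights, joint_transforms):
    """Deform rest-pose vertices with per-joint transforms (illustrative only).

    vertices:         (V, 3) rest-pose mesh vertices (hypothetical layout)
    weights:          (V, J) skinning weights, each row sums to 1
    joint_transforms: (J, 4, 4) homogeneous transform of each joint
    """
    ones = np.ones((vertices.shape[0], 1))
    homogeneous = np.concatenate([vertices, ones], axis=1)  # (V, 4)
    # Blend the joint transforms per vertex, then apply the blended transform.
    blended = np.einsum("vj,jab->vab", weights, joint_transforms)  # (V, 4, 4)
    skinned = np.einsum("vab,vb->va", blended, homogeneous)  # (V, 4)
    return skinned[:, :3]


# Two joints: joint 0 stays in place, joint 1 translates by 2 along x.
transforms = np.stack([np.eye(4), np.eye(4)])
transforms[1, 0, 3] = 2.0
rest_vertices = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
skin_weights = np.array([[1.0, 0.0], [0.5, 0.5]])
posed_vertices = linear_blend_skinning(rest_vertices, skin_weights, transforms)
```

In this toy setup the first vertex follows joint 0 only and stays at the origin, while the second vertex is influenced equally by both joints and moves by half of joint 1's translation.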

The method according to claim 1, wherein the supervision information comprises first supervision information, or the supervision information comprises first supervision information and second supervision information;

the first supervision information comprises at least one of the following: initial two-dimensional key points of the target object, or semantic information of a plurality of pixels on the target object in the image; and

the second supervision information comprises an initial three-dimensional point cloud of the surface of the target object.

The method according to claim 2, further comprising:

extracting information of the initial two-dimensional key points of the target object from the image through a key point extraction network.

The method according to claim 2 or 3, wherein the image comprises a depth image of the target object, and the method further comprises:

extracting depth information of the plurality of pixels on the target object from the depth image; and

back-projecting the plurality of pixels on the target object in the depth image into three-dimensional space based on the depth information to obtain the initial three-dimensional point cloud of the surface of the target object.
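
The back-projection step above can be sketched as follows. This is a minimal example assuming a pinhole camera with known intrinsics; the names `fx`, `fy`, `cx`, `cy`, the function `back_project`, and the convention that a depth of zero marks a missing measurement are assumptions for illustration and are not taken from the claims.

```python
import numpy as np


def back_project(depth, mask, fx, fy, cx, cy):
    """Lift masked depth pixels to camera-space 3D points (pinhole model).

    depth: (H, W) depth image, 0 where no measurement is available (assumed)
    mask:  (H, W) boolean mask of pixels that belong to the target object
    """
    rows, cols = np.nonzero(mask & (depth > 0))
    z = depth[rows, cols].astype(np.float64)
    x = (cols - cx) * z / fx
    y = (rows - cy) * z / fy
    return np.stack([x, y, z], axis=1)  # (N, 3) point cloud


# Toy 4x4 depth image at a constant depth of 2, with a 2x2 object mask.
depth_image = np.full((4, 4), 2.0)
object_mask = np.zeros((4, 4), dtype=bool)
object_mask[1:3, 1:3] = True
cloud = back_project(depth_image, object_mask, fx=2.0, fy=2.0, cx=2.0, cy=2.0)
```

Each masked pixel yields one 3D point, so the four masked pixels here produce a point cloud of shape `(4, 3)`.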

The method according to claim 4, wherein the image further comprises an RGB image of the target object, and extracting the depth information of the plurality of pixels on the target object from the depth image comprises:

performing image segmentation on the RGB image;

determining, based on the image segmentation result, the image region where the target object is located in the RGB image;

determining the image region where the target object is located in the depth image based on the image region where the target object is located in the RGB image; and

acquiring depth information of the plurality of pixels in the image region where the target object is located in the depth image.

The method according to any one of claims 2 to 5, wherein the method further comprises:

filtering out outliers from the initial three-dimensional point cloud, and using the filtered initial three-dimensional point cloud as the second supervision information.
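
One possible way to filter outliers is statistical outlier removal based on nearest-neighbor distances, sketched below. The claim does not specify a filtering criterion, so the method, the function name `filter_outliers`, and the parameters `k` and `std_ratio` are assumptions chosen for illustration. The brute-force pairwise distance matrix is only practical for small point clouds.

```python
import numpy as np


def filter_outliers(points, k=3, std_ratio=2.0):
    """Drop points whose mean distance to their k nearest neighbors is unusually large."""
    pairwise = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    # After sorting each row, column 0 is the point itself (distance 0).
    knn = np.sort(pairwise, axis=1)[:, 1:k + 1]
    mean_knn = knn.mean(axis=1)
    threshold = mean_knn.mean() + std_ratio * mean_knn.std()
    return points[mean_knn <= threshold]


# A 4x5 planar grid of surface points plus one stray point far from the surface.
grid_x, grid_y = np.meshgrid(np.arange(4.0), np.arange(5.0))
surface = np.stack([grid_x.ravel(), grid_y.ravel(), np.zeros(20)], axis=1)
noisy_cloud = np.vstack([surface, [[100.0, 100.0, 100.0]]])
filtered_cloud = filter_outliers(noisy_cloud)
```

With these settings the stray point is removed and the 20 grid points are kept.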

The method according to any one of claims 1 to 6, wherein the image of the target object is acquired by an image acquisition device, and the parameters comprise: a global rotation parameter of the target object, a key point rotation parameter of each key point of the target object, a body shape parameter of the target object, and a displacement parameter of the image acquisition device;

optimizing the initial value of the parameter based on the pre-acquired supervision information representing features of the target object comprises:

with the initial value of the body shape parameter and the initial value of the key point rotation parameter held unchanged, optimizing a current value of the displacement parameter of the image acquisition device and the initial value of the global rotation parameter based on the supervision information and the initial value of the displacement parameter, to obtain an optimized value of the displacement parameter and an optimized value of the global rotation parameter; and

optimizing the initial value of the key point rotation parameter and the initial value of the body shape parameter based on the optimized value of the displacement parameter and the optimized value of the global rotation parameter, to obtain an optimized value of the key point rotation parameter and an optimized value of the body shape parameter.

The method according to claim 7, wherein the supervision information comprises initial two-dimensional key points of the target object;

optimizing the current value of the displacement parameter of the image acquisition device and the initial value of the global rotation parameter based on the supervision information and the initial value of the displacement parameter comprises:

acquiring, from among two-dimensional projected key points corresponding to three-dimensional key points of the target object, target two-dimensional projected key points that belong to a preset part of the target object; wherein the three-dimensional key points of the target object are obtained based on the initial value of the global rotation parameter, the initial value of the key point rotation parameter, and the initial value of the body shape parameter, and the two-dimensional projected key points are obtained by projecting the three-dimensional key points of the target object based on the current value of the displacement parameter and the initial value of the global rotation parameter;

obtaining a first loss between the target two-dimensional projected key points and the initial two-dimensional key points;

obtaining a second loss between the initial value of the displacement parameter and the current value of the displacement parameter; and

optimizing the current value of the displacement parameter and the initial value of the global rotation parameter based on the first loss and the second loss.
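
The first and second losses above can be sketched as a reprojection term over a subset of key points plus a regularizer that keeps the displacement near its initial value. The claim does not fix the form of either loss, so the squared-error form, the pinhole projection with `focal` and `center`, the weight `w_reg`, and all function names are assumptions made for this example.

```python
import numpy as np


def project(points_3d, rotation, translation, focal, center):
    """Pinhole projection of 3D key points after a global rotation and camera displacement."""
    cam = points_3d @ rotation.T + translation
    return focal * cam[:, :2] / cam[:, 2:3] + center


def stage_one_loss(kp3d, kp2d, rotation, t_cur, t_init, part_idx,
                   focal, center, w_reg=1.0):
    """First loss (2D reprojection on a preset part) plus second loss (displacement drift)."""
    projected = project(kp3d, rotation, t_cur, focal, center)
    first = np.sum((projected[part_idx] - kp2d[part_idx]) ** 2)
    second = np.sum((t_cur - t_init) ** 2)
    return first + w_reg * second


# Toy data: four 3D key points observed with identity rotation and displacement t_init.
kp3d = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [1.0, 1.0, 1.0]])
t_init = np.array([0.0, 0.0, 5.0])
center = np.array([320.0, 240.0])
kp2d = project(kp3d, np.eye(3), t_init, 500.0, center)
part_idx = np.array([0, 1, 2])  # hypothetical indices of the preset part

loss_at_init = stage_one_loss(kp3d, kp2d, np.eye(3), t_init, t_init,
                              part_idx, 500.0, center)
loss_shifted = stage_one_loss(kp3d, kp2d, np.eye(3), t_init + [0.1, 0.0, 0.0],
                              t_init, part_idx, 500.0, center)
```

In practice such a loss would be minimized over the displacement and global rotation with a gradient-based optimizer; only the loss evaluation is shown here.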

The method according to claim 7 or 8, wherein the supervision information comprises the initial two-dimensional key points of the target object; optimizing the initial value of the key point rotation parameter and the initial value of the body shape parameter based on the optimized value of the displacement parameter and the optimized value of the global rotation parameter comprises:

obtaining a third loss between optimized two-dimensional projected key points of the target object and the initial two-dimensional key points, wherein the optimized two-dimensional projected key points are obtained by projecting optimized three-dimensional key points of the target object based on the optimized value of the displacement parameter and the optimized value of the global rotation parameter, and the optimized three-dimensional key points are obtained based on the optimized value of the global rotation parameter, the initial value of the key point rotation parameter, and the initial value of the body shape parameter;

obtaining a fourth loss, the fourth loss characterizing the plausibility of the pose corresponding to the optimized value of the global rotation parameter, the initial value of the key point rotation parameter, and the initial value of the body shape parameter; and

optimizing the initial value of the key point rotation parameter and the initial value of the body shape parameter based on the third loss and the fourth loss.

The method according to any one of claims 7 to 9, wherein after the initial value of the key point rotation parameter and the initial value of the body shape parameter are optimized based on the optimized value of the displacement parameter and the optimized value of the global rotation parameter, the method further comprises:

jointly optimizing the optimized value of the global rotation parameter, the optimized value of the key point rotation parameter, the optimized value of the body shape parameter, and the optimized value of the displacement parameter.

The method according to claim 10, wherein the supervision information comprises initial two-dimensional key points of the target object and an initial three-dimensional point cloud of the surface of the target object; optimizing the current value of the displacement parameter of the image acquisition device and the initial value of the global rotation parameter based on the supervision information and the initial value of the displacement parameter comprises:

acquiring, from among two-dimensional projected key points corresponding to three-dimensional key points of the target object, target two-dimensional projected key points that belong to a preset part of the target object; wherein the three-dimensional key points of the target object are obtained based on the initial value of the global rotation parameter, the initial value of the key point rotation parameter, and the initial value of the body shape parameter, and the two-dimensional projected key points are obtained by projecting the three-dimensional key points of the target object based on the current value of the displacement parameter and the initial value of the global rotation parameter;

obtaining a fifth loss between a first three-dimensional point cloud of the surface of the target object and the initial three-dimensional point cloud, wherein the first three-dimensional point cloud is obtained based on the initial value of the global rotation parameter, the initial value of the key point rotation parameter, and the initial value of the body shape parameter; and

optimizing the current value of the displacement parameter and the initial value of the global rotation parameter based on the first loss, the second loss, and the fifth loss.

The method according to claim 10 or 11, wherein jointly optimizing the optimized value of the global rotation parameter, the optimized value of the key point rotation parameter, the optimized value of the body shape parameter, and the optimized value of the displacement parameter comprises:

obtaining a sixth loss between optimized two-dimensional projected key points of the target object and the initial two-dimensional key points, wherein the optimized two-dimensional projected key points are obtained by projecting optimized three-dimensional key points of the target object based on the optimized value of the displacement parameter and the optimized value of the global rotation parameter, and the optimized three-dimensional key points are obtained based on the optimized value of the global rotation parameter, the optimized value of the key point rotation parameter, and the optimized value of the body shape parameter;

obtaining a seventh loss, the seventh loss characterizing the plausibility of the pose corresponding to the optimized value of the global rotation parameter, the optimized value of the key point rotation parameter, and the optimized value of the body shape parameter;

obtaining an eighth loss between a second three-dimensional point cloud of the surface of the target object and the initial three-dimensional point cloud, wherein the second three-dimensional point cloud is obtained based on the optimized value of the global rotation parameter, the optimized value of the key point rotation parameter, and the optimized value of the body shape parameter; and

jointly optimizing the optimized value of the global rotation parameter, the optimized value of the key point rotation parameter, the optimized value of the body shape parameter, and the optimized value of the displacement parameter based on the sixth loss, the seventh loss, and the eighth loss.
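
A point cloud loss such as the eighth loss, and its combination with the other two terms, might be sketched as below. The claim does not state how the distance between point clouds is measured or how the losses are combined, so the symmetric Chamfer distance, the weighted sum, the weights `w_2d`, `w_prior`, `w_cloud`, and the function names are all assumptions for illustration.

```python
import numpy as np


def chamfer_distance(model_points, observed_points):
    """Symmetric mean nearest-neighbor distance between two point clouds."""
    pairwise = np.linalg.norm(
        model_points[:, None, :] - observed_points[None, :, :], axis=2
    )
    return pairwise.min(axis=1).mean() + pairwise.min(axis=0).mean()


def joint_loss(loss_2d, loss_prior, loss_cloud, w_2d=1.0, w_prior=1.0, w_cloud=1.0):
    """Weighted sum of the 2D key point, pose plausibility, and point cloud terms."""
    return w_2d * loss_2d + w_prior * loss_prior + w_cloud * loss_cloud


# Toy clouds: the observed cloud is the model cloud shifted by 0.5 along z.
model_cloud = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
observed_cloud = model_cloud + np.array([0.0, 0.0, 0.5])
cloud_term = chamfer_distance(model_cloud, observed_cloud)
# The 2D and prior terms are placeholder numbers, not computed values.
total = joint_loss(loss_2d=2.0, loss_prior=0.5, loss_cloud=cloud_term)
```

The relative weights control how strongly the depth-derived point cloud constrains the fit compared with the 2D key points and the pose prior; suitable values would have to be tuned for the data at hand.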

A three-dimensional reconstruction apparatus, the apparatus comprising:

a first three-dimensional reconstruction module, configured to perform three-dimensional reconstruction on a target object in an image through a three-dimensional reconstruction network to obtain an initial value of a parameter of the target object, wherein the initial value of the parameter is used to establish a three-dimensional model of the target object;

an optimization module, configured to optimize the initial value of the parameter based on pre-acquired supervision information representing features of the target object to obtain an optimized value of the parameter; and

a second three-dimensional reconstruction module, configured to perform skeletal skinning based on the optimized value of the parameter to establish the three-dimensional model of the target object.

A three-dimensional reconstruction system, the system comprising:

an image acquisition device for acquiring an image of a target object; and

a processing unit communicatively connected to the image acquisition device and configured to: perform three-dimensional reconstruction on the target object in the image through a three-dimensional reconstruction network to obtain an initial value of a parameter of the target object, wherein the initial value of the parameter is used to establish a three-dimensional model of the target object; optimize the initial value of the parameter based on pre-acquired supervision information representing features of the target object to obtain an optimized value of the parameter; and perform skeletal skinning based on the optimized value of the parameter to establish the three-dimensional model of the target object.

A computer-readable storage medium storing a computer program which, when executed by a processor, implements the method according to any one of claims 1 to 12.

A computer device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the method according to any one of claims 1 to 12 when executing the program.