
CN118470222B - Medical ultrasonic image three-dimensional reconstruction method and system based on SDF diffusion - Google Patents


Info

Publication number: CN118470222B
Application number: CN202410917497.7A
Authority: CN (China)
Prior art keywords: sdf, ultrasonic image, voxel, diffusion, noise
Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis)
Other languages: Chinese (zh)
Other versions: CN118470222A
Inventors: 蔡青, 童亨, 仇世纪, 刘治, 董军宇
Current Assignee: Ocean University of China (the listed assignees may be inaccurate; Google has not performed a legal analysis)
Original Assignee: Ocean University of China
Application filed by Ocean University of China; application granted; patent is active


Classifications

    • G06T17/00 — Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G06N3/04 — Neural networks; architecture, e.g. interconnection topology
    • G06N3/084 — Neural network learning methods; backpropagation, e.g. using gradient descent
    • G06V10/454 — Local feature extraction; integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
    • G06V10/806 — Fusion, i.e. combining data from various sources, at the level of extracted features
    • G06V10/82 — Image or video recognition or understanding using neural networks
    • Y02T10/40 — Engine management systems


Abstract

The invention discloses a medical ultrasonic image three-dimensional reconstruction method and system based on SDF diffusion, belonging to the technical field of computer vision. Features are first extracted from an input ultrasound image by a visual model pre-trained on medical data. The SDF diffusion process then begins: the reverse diffusion process starts from completely random noise, which is gradually removed to obtain a clean SDF field. During diffusion, the SDF features and the ultrasound image features are fused using a state space model and cross attention, so that the three-dimensional surface represented by the SDF field is consistent with the ultrasound image. Practical verification shows that the proposed SDF-diffusion-based three-dimensional reconstruction method for medical ultrasound images is both efficient and accurate.

Description

Medical ultrasonic image three-dimensional reconstruction method and system based on SDF diffusion
Technical Field
The invention relates to a medical ultrasonic image three-dimensional reconstruction method and system based on SDF diffusion, and belongs to the technical field of computer vision.
Background
Three-dimensional reconstruction of ultrasound images refers to acquiring and processing a series of two-dimensional ultrasound images to finally generate a three-dimensional anatomical structure image or volume data; it has important clinical significance and application value. The three-dimensional shape of an object can be expressed in various ways, such as voxels, point clouds, and meshes. Voxel-based methods can use 3D convolutional neural networks (CNNs) to reconstruct objects of arbitrary topology; however, huge memory requirements and computation time limit most such methods to low-resolution results, so ultra-high-precision reconstruction cannot be achieved. The point cloud representation is simple and highly flexible, but because a point cloud is not a regular structure, it does not adapt well to traditional 3D CNN networks. Similarly, the mesh representation can express three-dimensional shapes with high accuracy, but it is likewise an irregular, discrete structure that conventional neural networks cannot process well.
The denoising diffusion probabilistic model is a generative model based on iteratively inverting a Markov noising process. In vision, early work formulated the problem as learning a variational lower bound, or framed it as optimizing a score-based generative model or a discretization of a continuous stochastic process. Many recent works have demonstrated the great potential of diffusion models in content generation tasks, so using diffusion for the three-dimensional reconstruction task can also take full advantage of this class of models.
In the field of three-dimensional reconstruction of ultrasound images, there are three conventional approaches. Position-sensor-based methods track the position and posture of the probe in three-dimensional space in real time using a position sensor (e.g., a magnetic or optical position sensor) attached to the probe, store the spatial position of each two-dimensional ultrasound image at acquisition time, and then reconstruct the two-dimensional images into three-dimensional data from this position information. However, these methods require an external position-sensor system, which greatly increases system complexity and cost, and problems such as cable interference and probe occlusion degrade accuracy. Freehand-scanning methods estimate the spatial relationship between the two-dimensional images from information such as the operator's motion trajectory and the ultrasound beam direction while the probe is freely moved over the body surface, and reconstruct three-dimensional data from these estimates. However, they demand high accuracy in estimating the probe trajectory and beam direction, which is not only computationally complex but also poor in real-time performance. Image-registration methods extract information such as gray levels, gradients, and features from a series of adjacent two-dimensional ultrasound images, register them via similarity measures, estimate the geometric transformations between the images, and finally obtain three-dimensional data. But they place high demands on image quality and on the uniformity of the gray-level distribution within the field of view, their precision is easily affected, and the computation is complex.
Analyzing and summarizing the existing ultrasound three-dimensional reconstruction methods reveals the following defects: (1) additional equipment or complex algorithms are needed to determine the probe's motion trajectory and direction, which increases implementation difficulty and cost; (2) the reconstruction process requires a continuous ultrasound video or a series of consecutive ultrasound images, so the required data volume is large and the computation is complex; (3) the reconstruction process is time-consuming and its accuracy is only moderate.
Disclosure of Invention
The invention aims to provide a medical ultrasonic image three-dimensional reconstruction method based on SDF (Signed Distance Field) diffusion, so as to improve the speed and accuracy of three-dimensional reconstruction from two-dimensional ultrasound images.
The SDF is an implicit representation of an object's three-dimensional shape. Its essence is to store, for each point, the nearest distance from that point to the shape's surface, with the surface dividing space: points outside the model surface take values greater than 0, and points inside take values less than 0. An SDF can conveniently be processed by 3D convolutional networks, and can also be converted into a high-precision mesh representation by the marching cubes algorithm.
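As a concrete illustration of the sign convention just described (negative inside, positive outside, zero on the surface), the following is a minimal sketch of a sphere's SDF sampled on a voxel grid; the grid extent and resolution here are illustrative choices, not values from the patent:

```python
import numpy as np

def sphere_sdf(resolution=32, radius=0.5):
    """SDF of a sphere of given radius centered at the origin, sampled on a
    [-1, 1]^3 voxel grid: negative inside, zero on the surface, positive outside."""
    axis = np.linspace(-1.0, 1.0, resolution)
    x, y, z = np.meshgrid(axis, axis, axis, indexing="ij")
    return np.sqrt(x**2 + y**2 + z**2) - radius

sdf = sphere_sdf()
center_val = sdf[16, 16, 16]   # near the center: negative (inside the sphere)
corner_val = sdf[0, 0, 0]      # grid corner: positive (outside the sphere)
```

Such a dense scalar grid is exactly the kind of data a 3D convolutional network or marching cubes can consume directly.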
In order to achieve the aim of the invention, the invention adopts the following specific technical scheme:
A medical ultrasonic image three-dimensional reconstruction method based on SDF diffusion comprises the following steps:
S1: acquire medical ultrasound image data and extract features from the ultrasound image, to guide the subsequent diffusion process;
S2: randomly initialize the SDF field, drawing the initial field x_T from a normal distribution;
S3: perform the SDF diffusion process using a diffusion model, which contains two key processes: a forward diffusion process and a reverse diffusion process; fuse the voxel and ultrasound image features with a state space model to obtain voxel features infused with the ultrasound image features, and fuse the voxel features and ultrasound image features with cross attention at the stages of smaller voxel resolution;
S4: extract the mesh using marching cubes, finally reconstructing a triangular mesh model guided by the ultrasound image.
Further, in S1, a visual model MedSAM (Medical Segment Anything Model) is first trained using a large amount of medical ultrasound image data; the trained MedSAM image encoder is then used to extract features from a single input medical ultrasound image, and the output is mapped to a feature vector c. This feature is used to guide the SDF toward the image features in the subsequent diffusion process, realizing SDF-based three-dimensional reconstruction of the ultrasound image.
Further, in S2, the SDF field is a continuous three-dimensional scene representation that treats a three-dimensional object or scene as a three-dimensional scalar field consisting of distance values. Specifically, for any point in the scene, the value of the SDF is defined as the signed distance from that point to the object surface: if the point is inside the object, the SDF value is the negative of the distance from that point to the nearest surface; if the point is outside, it is the positive distance; and if the point lies exactly on the object surface, the SDF value is 0.
Further, the step S3 specifically includes:
S3-1: The forward diffusion process (Forward Diffusion Process) is used during training; it gradually adds Gaussian noise to the data until the data is completely corrupted into pure Gaussian noise. This process can be represented by a Markov chain: a certain amount of Gaussian noise is added at each time step t until, at the final step T, the original data has become pure noise. Given a data sample $x_0 \sim q(x_0)$, the forward process $q(x_{1:T} \mid x_0)$ gradually converts the data sample into pure Gaussian noise:

$$q(x_{1:T} \mid x_0) = \prod_{t=1}^{T} q(x_t \mid x_{t-1}) \tag{1}$$

where $x_t$ denotes the sample at time step $t$, $x_{t-1}$ the sample at the previous time step, and $q(x_t \mid x_{t-1})$ the Gaussian transition probability:

$$q(x_t \mid x_{t-1}) = \mathcal{N}\big(x_t;\ \sqrt{1-\beta_t}\,x_{t-1},\ \beta_t \mathbf{I}\big) \tag{2}$$

where $\mathcal{N}$ denotes a normal distribution and $\beta_t$ is the noise-scheduling hyperparameter, which is either a learnable coefficient or set as a constant. The forward process allows sampling $x_t$ at any time step $t$ in closed form:

$$q(x_t \mid x_0) = \mathcal{N}\big(x_t;\ \sqrt{\bar{\alpha}_t}\,x_0,\ (1-\bar{\alpha}_t)\,\mathbf{I}\big) \tag{3}$$

where $\alpha_t = 1-\beta_t$ and $\bar{\alpha}_t = \prod_{s=1}^{t} \alpha_s$. Thus, $x_t$ can be sampled directly by:

$$x_t = \sqrt{\bar{\alpha}_t}\,x_0 + \sqrt{1-\bar{\alpha}_t}\,\epsilon \tag{4}$$

where $\epsilon \sim \mathcal{N}(0, \mathbf{I})$ is noise randomly sampled from a standard normal distribution.
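The closed-form sampling of equation (4) can be implemented in a few lines. A minimal sketch with a linear β schedule (the schedule values below are illustrative defaults, not the patent's):

```python
import numpy as np

def forward_noise(x0, t, betas, rng=np.random.default_rng(0)):
    """Sample x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps  (eq. 4)."""
    alphas = 1.0 - betas
    alpha_bar = np.cumprod(alphas)[t]          # \bar{alpha}_t = prod_{s<=t} alpha_s
    eps = rng.standard_normal(x0.shape)        # eps ~ N(0, I)
    return np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * eps

betas = np.linspace(1e-4, 0.02, 1000)          # linear noise schedule (illustrative)
x0 = np.ones((4, 4, 4))                        # toy "clean SDF" voxel grid
xT = forward_noise(x0, t=999, betas=betas)     # at t = T almost all signal is gone
```

With this schedule, $\sqrt{\bar{\alpha}_T}$ is nearly zero, so $x_T$ is effectively pure noise, matching the description of the forward process.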
S3-2: The reverse diffusion process (Reverse Diffusion Process) occurs at generation time and aims to reconstruct the original data from pure noise samples, i.e. the joint distribution $p_\theta(x_{0:T})$. This reverse process is defined as a Markov chain with learned Gaussian transitions:

$$p_\theta(x_{0:T}) = p(x_T) \prod_{t=1}^{T} p_\theta(x_{t-1} \mid x_t) \tag{5}$$

$$p_\theta(x_{t-1} \mid x_t) = \mathcal{N}\big(x_{t-1};\ \mu_\theta(x_t, t),\ \sigma_t^2 \mathbf{I}\big) \tag{6}$$

where $p(x_T) = \mathcal{N}(x_T; 0, \mathbf{I})$ is a standard normal distribution, $\mu_\theta$ denotes the denoising function parameterized by $\theta$, and $\sigma_t^2$ is a time-step-dependent variance set to $\sigma_t^2 = \beta_t$. Sampling starts from $x_T \sim \mathcal{N}(0, \mathbf{I})$ and then draws $x_{t-1} \sim p_\theta(x_{t-1} \mid x_t)$ via equation (6), namely:

$$x_{t-1} = \mu_\theta(x_t, t) + \sigma_t z,\quad z \sim \mathcal{N}(0, \mathbf{I}) \tag{7}$$

$$\mu_\theta(x_t, t) = \frac{1}{\sqrt{\alpha_t}}\left(x_t - \frac{\beta_t}{\sqrt{1-\bar{\alpha}_t}}\,\epsilon_\theta(x_t, t)\right) \tag{8}$$

where $\epsilon_\theta$ is a neural network parameterized by $\theta$ that takes the noisy input $x_t$ and predicts the noise. By repeating this process, $x_0$ can finally be generated.

Thus, the objective function compares the predicted noise $\epsilon_\theta(x_t, t)$ and the applied noise $\epsilon$:

$$L = \mathbb{E}_{x_0,\, \epsilon,\, t}\left[\left\| \epsilon - \epsilon_\theta(x_t, t) \right\|^2\right] \tag{9}$$

Alternatively, the neural network can predict the noise-free data $\hat{x}_\theta(x_t, t)$ rather than the noise $\epsilon$, which modifies $\mu_\theta$ to:

$$\mu_\theta(x_t, t) = \frac{\sqrt{\bar{\alpha}_{t-1}}\,\beta_t}{1-\bar{\alpha}_t}\,\hat{x}_\theta(x_t, t) + \frac{\sqrt{\alpha_t}\,(1-\bar{\alpha}_{t-1})}{1-\bar{\alpha}_t}\,x_t \tag{10}$$

where $\hat{x}_\theta$ is a neural network parameterized by $\theta$ that predicts the noise-free data. In this case, the objective function becomes:

$$L = \mathbb{E}_{x_0,\, \epsilon,\, t}\left[\left\| x_0 - \hat{x}_\theta(x_t, t) \right\|^2\right] \tag{11}$$
Specifically, given a clean data sample $x_0$ (here, a ground-truth SDF voxel grid), a noisy sample $x_t$ of the same shape is obtained by the forward process according to equation (4):

$$x_t = \sqrt{\bar{\alpha}_t}\,x_0 + \sqrt{1-\bar{\alpha}_t}\,\epsilon \tag{12}$$

Then a network $\hat{x}_\theta$ is trained to predict the noise-free data, with the following MSE objective function:

$$L = \mathbb{E}_{x_0,\, \epsilon,\, t}\left[\left\| x_0 - \hat{x}_\theta(x_t, t, c) \right\|^2\right] \tag{13}$$

where $c$ is the ultrasound image feature extracted in S1.
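One training iteration of this x_0-prediction objective can be sketched as follows. The "model" argument is a stand-in callable so the example stays self-contained; it is in no way the patent's U-shaped architecture:

```python
import numpy as np

def training_loss(x0, c, t, betas, model, rng=np.random.default_rng(1)):
    """MSE objective (13): noise x0 to x_t via eq. (12), then compare the
    network's prediction of the clean sample against x0."""
    alpha_bar = np.cumprod(1.0 - betas)[t]
    eps = rng.standard_normal(x0.shape)
    x_t = np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * eps   # eq. (12)
    x0_hat = model(x_t, t, c)                                        # \hat{x}_theta(x_t, t, c)
    return np.mean((x0 - x0_hat) ** 2)                               # eq. (13)

betas = np.linspace(1e-4, 0.02, 1000)
x0 = np.zeros((4, 4, 4))                                             # toy clean SDF grid
loss = training_loss(x0, c=None, t=500, betas=betas,
                     model=lambda x, t, c: np.zeros_like(x))         # perfect toy predictor
```

A real training step would backpropagate this loss through the network; here the toy predictor simply returns the (all-zero) clean sample, so the loss is exactly zero.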
During inference, the trained neural network $\hat{x}_\theta$ starts from $x_T \sim \mathcal{N}(0, \mathbf{I})$ and progressively predicts less-noisy samples $x_{t-1}$ according to equation (14), generating new SDF voxels:

$$x_{t-1} = \frac{\sqrt{\bar{\alpha}_{t-1}}\,\beta_t}{1-\bar{\alpha}_t}\,\hat{x}_\theta(x_t, t, c) + \frac{\sqrt{\alpha_t}\,(1-\bar{\alpha}_{t-1})}{1-\bar{\alpha}_t}\,x_t + \sigma_t z \tag{14}$$
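The inference recursion iterates from pure noise down to t = 1. A sketch using x_0-prediction with $\sigma_t = \sqrt{\beta_t}$, again with a stand-in predictor in place of the trained network:

```python
import numpy as np

def sample_sdf(model, c, betas, shape=(4, 4, 4), rng=np.random.default_rng(2)):
    """Generate an SDF voxel grid from noise via the x0-prediction update (eq. 14)."""
    alphas = 1.0 - betas
    alpha_bar = np.cumprod(alphas)
    x = rng.standard_normal(shape)                 # x_T ~ N(0, I)
    T = len(betas)
    for t in range(T - 1, 0, -1):
        ab_t, ab_prev = alpha_bar[t], alpha_bar[t - 1]
        x0_hat = model(x, t, c)                    # predicted clean sample
        mean = (np.sqrt(ab_prev) * betas[t] * x0_hat
                + np.sqrt(alphas[t]) * (1.0 - ab_prev) * x) / (1.0 - ab_t)
        x = mean + np.sqrt(betas[t]) * rng.standard_normal(shape)  # sigma_t * z
    return x

# Toy predictor: clamp the current sample into a plausible SDF range.
out = sample_sdf(lambda x, t, c: np.clip(x, -1.0, 1.0), c=None,
                 betas=np.linspace(1e-4, 0.02, 50))
```

In the patent's system the predictor is the U-shaped network conditioned on the ultrasound feature c; the clamp here only keeps the toy loop numerically tame.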
S3-3: The diffusion process uses a U-shaped network in which the voxel feature resolution of each layer is halved (e.g. 64 → 32 → 16) and the channel count is doubled. To keep the object represented by the generated SDF field consistent with the ultrasound image, the SDF features and the ultrasound image features are fused during diffusion, so that the ultrasound image features guide the SDF diffusion process. At the top of the model, because the feature resolution is large, the number of voxel features remains on a large order of magnitude and cross attention cannot be used directly to fuse the voxel and ultrasound image features; therefore a state space model is used to process them.
The State Space Model (SSM) is a mathematical model framework commonly used in time-series analysis and control theory. It describes the evolution of a dynamic system as a combination of a state equation and an observation equation, mapping a one-dimensional sequence $x(t) \in \mathbb{R}$ to $y(t) \in \mathbb{R}$ through a hidden state $h(t) \in \mathbb{R}^N$. The system uses $\mathbf{A}$ as the evolution parameter and $\mathbf{B}, \mathbf{C}$ as projection parameters:

$$h'(t) = \mathbf{A}\,h(t) + \mathbf{B}\,x(t) \tag{15}$$

$$y(t) = \mathbf{C}\,h(t) \tag{16}$$

Mamba is a discrete version of this continuous system; it includes a time-scale parameter $\Delta$ that converts the continuous parameters $\mathbf{A}, \mathbf{B}$ into discrete parameters $\bar{\mathbf{A}}, \bar{\mathbf{B}}$. A common transformation method is the zero-order hold (ZOH), defined as follows:

$$\bar{\mathbf{A}} = \exp(\Delta \mathbf{A}) \tag{17}$$

$$\bar{\mathbf{B}} = (\Delta \mathbf{A})^{-1}\big(\exp(\Delta \mathbf{A}) - \mathbf{I}\big)\cdot \Delta \mathbf{B} \tag{18}$$

After discretizing $\mathbf{A}, \mathbf{B}$, equations (15)(16) can be expressed as:

$$h_t = \bar{\mathbf{A}}\,h_{t-1} + \bar{\mathbf{B}}\,x_t \tag{19}$$

$$y_t = \mathbf{C}\,h_t \tag{20}$$

Finally, the model output is computed via a global convolution:

$$\bar{\mathbf{K}} = \big(\mathbf{C}\bar{\mathbf{B}},\ \mathbf{C}\bar{\mathbf{A}}\bar{\mathbf{B}},\ \ldots,\ \mathbf{C}\bar{\mathbf{A}}^{M-1}\bar{\mathbf{B}}\big) \tag{21}$$

$$y = x * \bar{\mathbf{K}} \tag{22}$$

where $M$ is the length of the input sequence $x$ and $\bar{\mathbf{K}} \in \mathbb{R}^M$ is a structured convolution kernel.
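Equations (17)-(20) can be checked numerically with scalar parameters. A minimal sketch of ZOH discretization followed by the recurrent scan (a toy scalar SSM, not Mamba's selective, input-dependent version):

```python
import numpy as np

def ssm_scan(x, A=-1.0, B=1.0, C=1.0, delta=0.1):
    """Discretize (A, B) with zero-order hold (eqs. 17-18) and run the recurrence
    h_t = Abar*h_{t-1} + Bbar*x_t, y_t = C*h_t (eqs. 19-20), scalar case."""
    Abar = np.exp(delta * A)              # eq. (17)
    Bbar = (Abar - 1.0) / A * B           # eq. (18) reduced to the scalar case
    h, ys = 0.0, []
    for xt in x:
        h = Abar * h + Bbar * xt          # eq. (19): state update
        ys.append(C * h)                  # eq. (20): observation
    return np.array(ys)

y = ssm_scan(np.ones(20))                 # step response of a leaky integrator
```

With A = -1 and a constant unit input, the state decays toward the steady-state value 1, so the output rises monotonically toward 1; the convolutional form (21)-(22) would produce the identical sequence.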
To apply this to the SDF diffusion process, the voxel features are unfolded into a one-dimensional sequence, and the time step t from S3-1 is passed through a sine-cosine transformation, linearly mapped to a one-dimensional feature of the same size as the voxel features, and added to them. To fuse the ultrasound image features with the voxel features, the ultrasound image feature c is linearly mapped to the same channel count as the voxel features and then concatenated with them to form the SSM input x.
After the Mamba computation, the output y is obtained; the voxel-feature part is taken out, and the one-dimensional sequence is reshaped back to the original voxel-feature shape, yielding voxel features fused with the ultrasound image features.
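The flatten-concatenate-reshape bookkeeping described above could look like the following sketch. The shapes are illustrative, and the Mamba block is replaced by an identity stand-in so the round trip can be checked:

```python
import numpy as np

def fuse_with_ssm(voxel_feat, image_feat, t_embed, mamba=lambda seq: seq):
    """voxel_feat: (C, D, H, W) voxel features; image_feat: (L, C) ultrasound
    features already mapped to C channels; t_embed: (C,) time-step embedding."""
    C, D, H, W = voxel_feat.shape
    seq = voxel_feat.reshape(C, -1).T               # unfold voxels to (D*H*W, C)
    seq = seq + t_embed                             # add the time-step embedding
    x = np.concatenate([seq, image_feat], axis=0)   # append image tokens -> SSM input
    y = mamba(x)                                    # state space model over the sequence
    vox = y[: D * H * W]                            # take back only the voxel part
    return vox.T.reshape(C, D, H, W)                # reshape to the original voxel shape

out = fuse_with_ssm(np.zeros((8, 4, 4, 4)), np.zeros((16, 8)), np.zeros(8))
```

With an identity "mamba" and zero embeddings the function is an exact round trip, which confirms the unfold/reshape pair is consistent.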
S3-4: After the voxel features pass through the top of the model, downsampling reduces the voxel feature resolution while the channel count increases, decreasing the overall data volume; at these small-voxel-resolution stages, the voxel features and ultrasound image features are fused using cross attention.
The attention function is described as mapping a query and a set of key-value pairs to an output, where the query, keys, values, and output are all vectors and the output is a weighted sum of the values. In practice, the attention function is computed for a set of queries simultaneously, packed into a matrix Q, with the keys and values likewise packed into matrices K and V; the output matrix is computed as:

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_k}}\right) V \tag{23}$$

Multi-head attention allows the model to jointly attend to information from different representation subspaces at different positions; the formulas are:

$$\mathrm{MultiHead}(Q, K, V) = \mathrm{concat}(\mathrm{head}_1, \ldots, \mathrm{head}_h)\, W^O \tag{24}$$

$$\mathrm{head}_i = \mathrm{Attention}\big(Q W_i^Q,\ K W_i^K,\ V W_i^V\big) \tag{25}$$

where concat denotes a splicing (concatenation) operation and $W^O, W_i^Q, W_i^K, W_i^V$ are all learnable weight matrices.
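Equations (23)-(25) are the standard scaled dot-product and multi-head attention; a compact numpy sketch with illustrative dimensions:

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)   # numerically stable softmax
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V   (eq. 23)"""
    d_k = Q.shape[-1]
    return softmax(Q @ K.T / np.sqrt(d_k)) @ V

def multi_head(Q, K, V, WQ, WK, WV, WO):
    """MultiHead = concat(head_1..head_h) W^O with head_i per eq. (25)."""
    heads = [attention(Q @ wq, K @ wk, V @ wv) for wq, wk, wv in zip(WQ, WK, WV)]
    return np.concatenate(heads, axis=-1) @ WO

rng = np.random.default_rng(0)
Q = K = V = rng.standard_normal((5, 8))       # 5 tokens, model dim 8
h, d_h = 2, 4                                 # 2 heads of dim 4 (illustrative)
WQ = [rng.standard_normal((8, d_h)) for _ in range(h)]
WK = [rng.standard_normal((8, d_h)) for _ in range(h)]
WV = [rng.standard_normal((8, d_h)) for _ in range(h)]
WO = rng.standard_normal((h * d_h, 8))
out = multi_head(Q, K, V, WQ, WK, WV, WO)
```

Each attention row is a convex combination of the value rows, since the softmax weights sum to 1.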
For a voxel $v$ in the grid, its center is projected onto the image to obtain projection coordinates $p(v)$. Image patches adjacent to $p(v)$ are selected to interact with $v$, because their features are most likely to affect the local geometry governed by $v$. The neighborhood image patch set, denoted $\mathcal{U}(v)$, is chosen as follows: if the distance between $p(v)$ and the center of a patch $u$ is less than a distance threshold $\tau$, then patch $u$ belongs to $\mathcal{U}(v)$.

Multi-head attention is used to model the feature interactions between the voxel feature $f_v$ at voxel $v$ and the ultrasound image patch feature set $F_{\mathcal{U}(v)}$ of the patches belonging to $\mathcal{U}(v)$, as follows:

$$Q = f_v W^Q,\quad K = F_{\mathcal{U}(v)} W^K,\quad V = F_{\mathcal{U}(v)} W^V \tag{26}$$

$$f_v' = \mathrm{MultiHead}(Q, K, V;\ M) \tag{27}$$

Here MultiHead is the standard multi-head attention operation, $W^Q, W^K, W^V$ are learnable matrices, M is the attention mask induced by the view projection, and absolute position encodings are used for the voxels and ultrasound image patches.
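The projection-induced attention mask M can be built from the projected voxel centers and the patch centers. A sketch assuming both sets of coordinates are already given in 2-D image space (the projection itself depends on the probe geometry and is omitted here):

```python
import numpy as np

def neighborhood_mask(voxel_proj, patch_centers, tau):
    """voxel_proj: (Nv, 2) projected voxel centers p(v); patch_centers: (Np, 2).
    Returns a boolean (Nv, Np) mask: mask[i, j] is True iff patch j lies within
    distance tau of p(v_i), i.e. patch j belongs to the neighborhood set of v_i."""
    diff = voxel_proj[:, None, :] - patch_centers[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    return dist < tau   # True entries are attended to; False entries are masked out

mask = neighborhood_mask(np.array([[0.0, 0.0]]),
                         np.array([[0.0, 1.0], [3.0, 3.0]]), tau=2.0)
```

In the attention computation, masked-out (False) pairs would receive a weight of $-\infty$ before the softmax so they contribute nothing to the voxel's output feature.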
After the cross attention, the ultrasound image features and voxel features are further fused, and the output voxel features form a high-precision SDF field.
Further, the marching cubes algorithm in S4 is a classical algorithm for extracting iso-surfaces from a three-dimensional scalar field, widely applied in volume data visualization, geometric reconstruction, and related fields. Algorithm steps: set an iso-surface threshold T and traverse all cube cells C in the SDF field:
(1) Compute the vertex state code index from the comparison of C's 8 vertex scalar values with T;
(2) Look up the corresponding intersection configuration pattern in a predefined lookup table using the index;
(3) Compute the intersection points with the iso-surface on the 12 edges of the cube using linear interpolation;
(4) Connect the corresponding intersection points according to the pattern to form one or more triangles;
(5) Add the triangles to the mesh surface data.
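Step (3), the linear interpolation of each edge intersection, is the numerical core of the algorithm. A sketch of that single step for one cube edge:

```python
import numpy as np

def edge_intersection(p0, p1, s0, s1, iso=0.0):
    """Linearly interpolate the point on segment p0-p1 where the scalar field
    crosses the iso-value, given the vertex scalars s0 and s1 (assumes s0 != s1
    and that the edge actually straddles the iso-value)."""
    t = (iso - s0) / (s1 - s0)       # fraction along the edge
    return p0 + t * (p1 - p0)

# Edge from (0,0,0) with SDF -0.25 to (1,0,0) with SDF +0.75:
pt = edge_intersection(np.array([0.0, 0.0, 0.0]),
                       np.array([1.0, 0.0, 0.0]), -0.25, 0.75)
# -> [0.25, 0., 0.]: a quarter of the way along, where the SDF crosses zero
```

For the SDF produced in S3, the iso-value is 0, so these interpolated points sample the reconstructed surface.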
A medical ultrasonic image three-dimensional reconstruction system based on SDF diffusion consists of an ultrasound image feature extraction module, an SDF diffusion module, and a mesh generation module.
The ultrasound image feature extraction module extracts features from the input ultrasound image: the image encoder of the visual model SAM-Med2D, pre-trained on a large amount of medical image data, performs the feature extraction, providing rich and accurate feature guidance for the subsequent diffusion process.
The SDF diffusion module consists of a U-shaped neural network with an embedded state space model and cross attention. It executes the SDF diffusion process, denoising step by step from random noise to obtain a clean SDF field; during diffusion, the state space model and cross attention fuse the voxel features and ultrasound image features so that the reconstructed mesh remains consistent with the ultrasound image features.
The mesh generation module executes the marching cubes algorithm to extract a high-precision three-dimensional mesh from the reconstructed SDF field.
The advantages and beneficial effects of the invention are as follows:
By introducing the state space model and cross attention into the SDF diffusion process and fully fusing the ultrasound image features with the SDF features, the invention achieves end-to-end, fast, and accurate three-dimensional reconstruction of ultrasound images. This alleviates, to a certain extent, problems in ultrasound three-dimensional reconstruction such as the need for additional equipment or complex algorithms to determine the probe's motion trajectory and direction, and long processing times.
Practical verification shows that the proposed ultrasound image three-dimensional reconstruction method is both efficient and accurate.
Drawings
Fig. 1 is an overall flow chart of the present invention.
Fig. 2 is a frame diagram of the present invention.
FIG. 3 is a graph of the results of the present invention.
Detailed Description
The invention will be further described with reference to fig. 1-3 and the specific examples.
Example 1:
A medical ultrasonic image three-dimensional reconstruction method based on SDF diffusion, whose overall flow is shown in Fig. 1, comprises the following steps:
S1: The image encoder of the visual model SAM-Med2D, pre-trained with a large amount of medical image data, extracts features from the input single view and outputs a feature vector c. In the subsequent diffusion process, this feature is fused with the SDF features through the state space model and cross attention, guiding the reconstruction and thereby realizing three-dimensional reconstruction of the ultrasound image.
S2: The voxel SDF field is randomly initialized, where SDF stands for Signed Distance Function. An SDF field is a continuous three-dimensional scene representation that treats a three-dimensional object or scene as a three-dimensional scalar field consisting of distance values. Specifically, for any point in the scene, the value of the SDF is defined as the signed distance from that point to the object surface: if the point is inside the object, the SDF value is the negative of the distance to the nearest surface; if the point lies exactly on the surface, the SDF value is 0. At initialization, the SDF field x_T is drawn randomly from a normal distribution.
S3: The SDF field undergoes the diffusion process. Diffusion models are an emerging class of deep generative models that train a neural network through a reversible diffusion process to generate the required data from a noise distribution. The generative diffusion model comprises two key processes: a forward diffusion process and a reverse diffusion process. Specifically:
S3-1: the forward diffusion process (Forward Diffusion Process) is a process used in training that adds data gradually to gaussian noise until eventually completely corrupted, becoming pure noise with gaussian distribution. This process can be represented by a Markov chain, adding a certain amount of Gaussian noise per time step T until the final time T, the original data becomes pure noise; given a data sample Forward processIt gradually converts the data samples into pure gaussian noise:
(1);
Wherein the method comprises the steps of A sample at the time step t is represented,A sample representing a previous time step is taken,The gaussian transition probability:
(2);
Wherein the method comprises the steps of A normal distribution is indicated and the distribution is determined,Representing the noise scheduling super parameter, which is a leachable coefficient or is set as a constant; the forward process allows sampling at any time step t in a closed form
(3);
Wherein the method comprises the steps of; Thus, the first and second substrates are bonded together,Sampling is directly performed by the following formula:
(4);
therein, wherein Is a randomly sampled noise from a normal distribution.
S3-2: the back diffusion process (Reverse Diffusion Process) occurs at generation time and aims to reconstruct the original data from the pure noise samples, i.e. joint distributionThis inverse process is defined as a markov chain with learned gaussian transitions:
(5);
(6);
Wherein the method comprises the steps of Is a standard normal distribution of the distribution,The representation is composed ofA parameterized denoising function,Is a time-step dependent variance set to; From the slaveMiddle sampling and then plotting by equation (6)The method comprises the following steps:
(7);
(8);
Wherein the method comprises the steps of Is a group consisting ofParameterized neural network, which is input from noiseAnd predicting noise. By repeating this process, it is finally possible to generate
Thus, the objective function compares the prediction noiseAnd apply noiseThe difference between:
(9);
in addition, prediction by neural networks Rather than noiseTo modify
(10);
Wherein, Is composed ofParameterized neural network that predicts noise-free data; In this case, the objective function becomes:
(11);
specifically, a clean data sample is given Sampling the samples having the same shape according to the formula (4) in the forward direction
(12);
Then, train a networkTo predict noiseless data whose MSE objective function is as follows:
(13);
therein, wherein The ultrasonic image features extracted in the S1;
in the reasoning process, a trained neural network is used Through the process ofGradually predicting smaller noise samples according to equation (14)To generate new SDF voxels:
(14);
S3-3: the diffusion process uses a U-like network, the voxel feature resolution of each layer is halved (e.g. 64- > 32- > 16), and the channel is doubled; in order to keep the object represented by the generated SDF field consistent with the ultrasonic image, the SDF features and the ultrasonic image features are fused in the diffusion process, so that the ultrasonic image features guide the SDF diffusion process; at the top of the model, the voxel features remain in a larger order of magnitude due to the larger feature resolution, and the cross-attention cannot be directly used for fusing the voxel and ultrasonic image features, so that the state space model is used for processing the voxel and ultrasonic image features;
the state space Model (STATE SPACE Model) is a mathematical Model framework commonly used for time series analysis and control theory. It describes the evolution process of a dynamic system as a combination of state and observation equations. A one-dimensional sequence is formed Through a hidden stateMapping toThis system uses a as the evolution parameter, B, C as the projection parameter:
(15);
(16);
mamba is used as a discrete version of the continuous system, which includes a time scale parameter Converting continuous parameters A, B to discrete parameters; A common transformation method is zero-order hold (ZOH), defined as follows:
(17);
(18);
At the position of After discretization, equations (15) (16) can be expressed as:
(19);
(20);
finally, the model is output through global convolution calculation:
(21);
(22
Where M is the length of the input sequence x, Is a structured convolution kernel.
In order to apply the method to the SDF diffusion process, the voxel features are unfolded into one-dimensional features, and the time step t in S3-1 is subjected to sine-cosine transformation and then is mapped to one-dimensional features with the same size as the voxel features in a linear mode and added with the voxel features. To fuse the ultrasound image features with voxel features, the ultrasound image features are compared with each otherAnd (3) linearly mapping the input x to the same channel number as the voxel characteristic, and then connecting the input x with the voxel characteristic to obtain the SSM.
After the Mamba computation the output y is obtained; the voxel-feature part is taken out and the one-dimensional sequence is reshaped back to the original voxel-feature shape, giving voxel features fused with the ultrasound image features.
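The data plumbing described above (unfolding the voxel grid, adding a sine-cosine time-step embedding, projecting and concatenating the image features to form the SSM input, then reshaping the voxel part back) can be sketched as follows; all sizes and the random projection matrix are illustrative assumptions:

```python
import numpy as np

def timestep_embedding(t, dim):
    """Sine-cosine embedding of a scalar diffusion step t (as used in S3-1)."""
    half = dim // 2
    freqs = np.exp(-np.log(10000.0) * np.arange(half) / half)
    ang = t * freqs
    return np.concatenate([np.sin(ang), np.cos(ang)])

rng = np.random.default_rng(0)
C, R = 8, 4                                # channels, voxel resolution (toy sizes)
voxel = rng.normal(size=(C, R, R, R))      # voxel feature grid
img_feat = rng.normal(size=(16, 32))       # 16 ultrasound patch tokens, dim 32

# 1) unfold the voxels into a 1-D token sequence of shape (R^3, C)
seq = voxel.reshape(C, -1).T

# 2) add the time-step embedding (broadcast over all tokens)
seq = seq + timestep_embedding(t=37, dim=C)

# 3) project the image features to C channels and concatenate along the sequence
W = rng.normal(size=(32, C))               # stand-in for the learnable linear map
x_ssm = np.concatenate([seq, img_feat @ W], axis=0)   # the SSM input x
assert x_ssm.shape == (R**3 + 16, C)

# after the Mamba block, the first R^3 tokens are taken back out and
# reshaped to (C, R, R, R) to recover the fused voxel feature grid
fused = x_ssm[:R**3].T.reshape(C, R, R, R)
assert fused.shape == voxel.shape
```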
S3-4: after the voxel features pass through the top of the model, they are downsampled to reduce the voxel feature resolution and increase the number of channels, which reduces the overall data volume; at this stage of small voxel resolution the voxel features and the ultrasonic image features are fused using cross attention.
The attention function is described as mapping a query and a set of key-value pairs to an output, where the query, keys, values and output are all vectors and the output is a weighted sum of the values. In practice, the attention functions of a set of queries are computed simultaneously by packing the queries into a matrix Q and the keys and values into matrices K and V; the computed output matrix is:
Attention(Q,K,V)=softmax(QK^T/√d_k)V (23);
Multi-head attention allows the model to jointly attend to information from different representation subspaces at different positions; the formulas are as follows:
MultiHead(Q,K,V)=Concat(head_1,…,head_h)W^O (24);
head_i=Attention(QW_i^Q, KW_i^K, VW_i^V) (25);
where Concat denotes the concatenation operation and W^O, W_i^Q, W_i^K, W_i^V are all learnable weight matrices.
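A minimal NumPy sketch of equations (23)-(25); the token count, dimensions and random weight matrices are illustrative assumptions:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attention(Q, K, V):
    """Eq. (23): softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    return softmax(Q @ K.T / np.sqrt(d_k)) @ V

def multi_head(Q, K, V, Wq, Wk, Wv, Wo, h):
    """Eqs. (24)-(25): run h heads on projected inputs, concatenate, project."""
    heads = [attention(Q @ Wq[i], K @ Wk[i], V @ Wv[i]) for i in range(h)]
    return np.concatenate(heads, axis=-1) @ Wo

rng = np.random.default_rng(1)
n, d, h, dh = 6, 16, 4, 4                 # tokens, model dim, heads, head dim
Q = K = V = rng.normal(size=(n, d))       # self-attention: same tokens throughout
Wq, Wk, Wv = (rng.normal(size=(h, d, dh)) for _ in range(3))
Wo = rng.normal(size=(h * dh, d))
out = multi_head(Q, K, V, Wq, Wk, Wv, Wo, h)
assert out.shape == (n, d)

# each attention row is a convex combination: the weights sum to 1
w = softmax(Q @ K.T / np.sqrt(d))
assert np.allclose(w.sum(axis=1), 1.0)
```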
For each voxel v in the grid, its center is projected onto the image to obtain projection coordinates P; image patches close to P are selected to interact with v, because their features are the most likely to influence the local geometry controlled by v; the neighborhood image patch set N_v is chosen as follows: if the distance between P and the center of patch P_j is less than a distance threshold d_δ, then patch P_j belongs to N_v.
Multi-head self-attention is used to model the feature interactions between the voxel feature f_v at v and the ultrasound image patch feature set f_I of the patches belonging to N_v, as follows:
Q=f_vW^Q, K=f_IW^K, V=f_IW^V (26);
f_v′=MultiHead(Q,K,V,M) (27);
Here MultiHead(·) is the standard multi-head attention operation, W^Q, W^K, W^V are learnable matrices, M is the attention-computation mask induced by the view projection, and absolute position encodings are used for the voxels and the ultrasound image patches.
After the cross attention, the ultrasonic image features and the voxel features are further fused, and the output voxel features are high-precision SDF fields.
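The neighborhood selection of S3-4 can be sketched as a mask over the attention logits; the patch grid, the projected coordinate P and the threshold d_δ below are illustrative assumptions:

```python
import numpy as np

def neighborhood_mask(proj_xy, patch_centers, d_delta):
    """Mask M for the cross attention: patch j participates in the voxel's
    attention iff |P - center_j| < d_delta; excluded entries get -inf, so
    adding M to the logits before the softmax zeroes their weights."""
    dist = np.linalg.norm(patch_centers - proj_xy, axis=1)
    return np.where(dist < d_delta, 0.0, -np.inf)

# toy example: a 4x4 grid of 16x16-pixel patches, one projected voxel center
centers = np.array([[8 + 16 * i, 8 + 16 * j] for i in range(4) for j in range(4)],
                   dtype=float)
P = np.array([20.0, 20.0])               # projected voxel center (illustrative)
M = neighborhood_mask(P, centers, d_delta=18.0)

kept = np.isfinite(M)                    # patches inside the neighborhood
assert kept.sum() == 4                   # the four patches nearest to P survive
```

Restricting attention to this neighborhood keeps the interaction local, matching the intuition that only nearby patches influence the geometry a voxel controls.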
S4: the marching cubes algorithm is executed to extract the mesh from the SDF field. An isosurface threshold T is set, and all cube cells C in the SDF field are traversed:
(1) Calculating vertex state code index according to the size relation between the 8 vertex scalar values of C and T;
(2) Searching a corresponding intersection point configuration pattern in a predefined condition table according to the index;
(3) Calculating the intersection points with the isosurface on 12 sides of the cube by utilizing linear interpolation;
(4) Connecting corresponding intersection points according to pattern to form one or more triangles;
(5) Adding triangles into the grid surface data;
The result is the final high-precision triangular mesh model reconstructed under the guidance of the ultrasonic image.
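Steps (1) and (3) of the marching cubes loop can be sketched as follows; the full 256-entry triangle configuration table used in steps (2) and (4) is omitted, and the corner values are toy SDF samples:

```python
import numpy as np

def vertex_index(corner_vals, T):
    """Step (1): one bit per cube corner, set when the SDF value is below
    the iso-threshold T; the resulting 8-bit code indexes the triangle table."""
    idx = 0
    for bit, v in enumerate(corner_vals):
        if v < T:
            idx |= 1 << bit
    return idx

def edge_crossing(p0, p1, v0, v1, T):
    """Step (3): linear interpolation of the isosurface crossing on the
    edge p0-p1, where v0, v1 are the SDF values at the two endpoints."""
    t = (T - v0) / (v1 - v0)
    return p0 + t * (p1 - p0)

vals = [-0.4, 0.3, 0.5, -0.1, 0.2, 0.6, 0.9, 0.7]   # SDF at the 8 corners
idx = vertex_index(vals, T=0.0)
assert idx == 0b00001001          # corners 0 and 3 lie inside the surface

p = edge_crossing(np.zeros(3), np.array([1.0, 0.0, 0.0]), -0.4, 0.3, T=0.0)
assert np.allclose(p, [4 / 7, 0.0, 0.0])
```

In practice the lookup table maps each of the 256 index values to the set of cube edges whose crossings form the triangles added in steps (4)-(5).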
Example 2: building on the method of Example 1, this example describes the module design.
The medical ultrasonic image three-dimensional reconstruction system based on SDF diffusion consists of an ultrasonic image feature extraction module, an SDF diffusion module and a grid generation module, as shown in fig. 2, the following details are given:
An ultrasonic image feature extraction module: the module is used for extracting the characteristics of an input ultrasonic image, and the image encoder of a visual model SAM-Med2D pre-trained by a large amount of medical image data is used for extracting the characteristics of the input ultrasonic image, so that rich and accurate characteristic guidance is provided for the subsequent diffusion process.
SDF diffusion module: the module consists of a U-shaped neural network embedded with a state space model and cross attention, and is used for executing an SDF diffusion process, denoising gradually from random noise to obtain a clean SDF field, and fusing voxel characteristics and ultrasonic picture characteristics by the state space model and the cross attention in the diffusion process to achieve the aim of keeping consistency of a reconstruction grid and the ultrasonic picture characteristics.
And a grid generation module: the module executes a marching cubes algorithm to extract a high precision three-dimensional grid from the reconstructed SDF field.
Example 3: this example verifies the method and system experimentally. To validate the accuracy and efficiency of the ultrasonic image SDF three-dimensional reconstruction, experiments were performed on the USOVA D dataset. The results shown in Fig. 3 indicate that the proposed method (Unet-mamba) generates three-dimensional shapes from the input ultrasonic images more faithfully and with a shorter output time; the constructed model achieves both higher accuracy and higher speed in ultrasonic image SDF three-dimensional reconstruction.
The above is merely one implementation of the present invention, and the scope of the invention is not limited thereto; any substitution or alteration conceivable by those skilled in the art falls within the scope of the invention, which shall therefore be defined by the claims.

Claims (4)

1. The medical ultrasonic image three-dimensional reconstruction method based on SDF diffusion is characterized by comprising the following steps of:
S1: acquiring medical ultrasonic image data and extracting features of the ultrasonic image to guide the subsequent diffusion process;
S2: randomly initializing the SDF field V according to a normal distribution;
S3: the SDF diffusion process is performed using a diffusion model that contains two key processes: a forward diffusion process and a reverse diffusion process; fusing the voxel and the ultrasonic image features by using a state space model to obtain voxel features fused with the ultrasonic image features, and fusing the voxel features and the ultrasonic image features by using cross attention at a stage with smaller voxel resolution; the method specifically comprises the following steps:
S3-1: the forward diffusion process is represented by a Markov chain: a certain amount of Gaussian noise is added at each time step t until, at the final time T, the original data has become pure noise; given a data sample x_0~q(x_0), the forward process q(x_(1:T)|x_0) gradually converts the data sample into pure Gaussian noise:
q(x_(1:T)|x_0)=∏_(t=1)^T q(x_t|x_(t-1)) (1)
where x_t denotes the sample at time step t, x_(t-1) denotes the sample at the previous time step, and q(x_t|x_(t-1)) is the Gaussian transition probability:
q(x_t|x_(t-1))=N(x_t; √(1-β_t)·x_(t-1), β_t·I) (2)
where N denotes a normal distribution and β_t is a noise-scheduling hyper-parameter, either learned or set as a constant; the forward process samples x_t in closed form at any time step t:
q(x_t|x_0)=N(x_t; √(ᾱ_t)·x_0, (1-ᾱ_t)·I) (3)
where α_t=1-β_t and ᾱ_t=∏_(s=1)^t α_s; thus x_t is sampled directly by:
x_t=√(ᾱ_t)·x_0+√(1-ᾱ_t)·ε (4)
where ε is a noise randomly sampled from a normal distribution;
S3-2: the reverse diffusion process is defined as a Markov chain with learned Gaussian transitions:
p_θ(x_(0:T))=p(x_T)·∏_(t=1)^T p_θ(x_(t-1)|x_t) (5)
p_θ(x_(t-1)|x_t)=N(x_(t-1); μ_θ(x_t,t), σ_t²·I) (6)
where p(x_T)=N(0,I) is a standard normal distribution, μ_θ(x_t,t) denotes a denoising function parameterized by θ, and σ_t² is a time-step-dependent variance set to β_t; x_T is sampled from p(x_T), and x_(t-1)~p_θ(x_(t-1)|x_t) is then drawn via equation (6) as:
x_(t-1)=(1/√(α_t))·(x_t-(β_t/√(1-ᾱ_t))·ε_θ(x_t,t))+σ_t·z, z~N(0,I) (7)
where ε_θ(x_t,t) is a neural network parameterized by θ that predicts the noise from the noisy input x_t; by repeating this process, x_0 is finally generated;
thus, the objective function compares the difference between the predicted noise ε_θ(x_t,t) and the applied noise ε:
L=E_(t,x_0,ε)[‖ε-ε_θ(x_t,t)‖²] (8)
Furthermore, μ_θ(x_t,t) is modified so that the neural network predicts x_0 instead of the noise ε:
μθ(xt,t)=γtfθ(xt,t)+δtxt (10)
where γ_t=√(ᾱ_(t-1))·β_t/(1-ᾱ_t) and δ_t=√(α_t)·(1-ᾱ_(t-1))/(1-ᾱ_t) are time-dependent coefficients, and f_θ(x_t,t) is a neural network parameterized by θ that predicts the noise-free data x_0; in this case, the objective function becomes:
L=E_(t,x_0)[‖x_0-f_θ(x_t,t)‖²] (11)
specifically, given a clean data sample V_0, a noisy sample V_t of the same shape is sampled in the forward direction according to equation (4);
then a network f_θ is trained to predict the noiseless data, with the following MSE objective function:
L=E[‖V_0-f_θ(V_t,t,F_p)‖²] (12)
F p is the ultrasonic image characteristic extracted in S1;
in the inference process, the trained neural network f_θ is used over t=T,…,1 to predict progressively less noisy samples V_(t-1) according to equation (14), generating new SDF voxels:
V_(t-1)=γ_t·f_θ(V_t,t,F_p)+δ_t·V_t+σ_t·z (14)
s3-3: the diffusion process uses a U-shaped network, the voxel feature resolution of each layer is halved, and the channel is doubled; in order to keep the object represented by the generated SDF field consistent with the ultrasonic image, the SDF features and the ultrasonic image features are fused in the diffusion process, so that the ultrasonic image features guide the SDF diffusion process;
the state space model will map a one-dimensional sequence x (t) to y (t) through a hidden state h (t), this system using a as evolution parameter, B, C as projection parameter:
h′(t)=Ah(t)+Bx(t) (15)
y(t)=Ch(t) (16)
mamba including a time scale parameter delta converts the continuous parameter A, B into a discrete parameter A common transformation method is zero-order preservation, defined as follows:
after discretization with Ā, B̄, equations (15)-(16) are expressed as:
h_t=Ā·h_(t-1)+B̄·x_t (19)
yt=Cht (20)
finally, the model output is computed through a global convolution:
K̄=(CB̄, CĀB̄, …, CĀ^(M−1)B̄) (21)
y=x∗K̄ (22)
where M is the length of the input sequence x and K̄ is a structured convolution kernel;
In order to fuse the ultrasonic image characteristics and the voxel characteristics, the ultrasonic image characteristics F p are linearly mapped to the same channel number as the voxel characteristics and then are connected with the voxel characteristics, so that an input x of the SSM is obtained;
Obtaining output y after Mamba calculation, taking out the voxel characteristic part, and remolding the one-dimensional sequence back to the original shape of the voxel characteristic to obtain the voxel characteristic after the ultrasonic image characteristic is fused;
S3-4: after the voxel features pass through the top of the model, they are downsampled to reduce the voxel feature resolution and increase the number of channels, reducing the overall data volume; at this stage of smaller voxel resolution the voxel features and the ultrasonic image features are fused using cross attention; the attention functions of a set of queries are computed simultaneously by packing the queries into a matrix Q and the keys and values into matrices K and V; the computed output matrix is:
Attention(Q,K,V)=softmax(QK^T/√d_k)V (23)
multi-head attention is used, with the following formulas:
MultiHead(Q,K,V)=Concat(head1,...,headh)WO (24)
headi=Attention(QWi Q,KWi K,VWi V) (25)
Wherein concat represents a stitching operation, and W O、Wi Q、Wi K、Wi V are all learnable weight matrices;
For the voxels v in the grid, projecting the center of the voxel v onto an image to obtain a projection coordinate P; selecting adjacent image blocks close to P to interact with v, wherein a neighborhood image patch set is denoted by N V, and the selection is as follows: if the distance between P and the center of P j is less than the distance threshold d δ, then patch P j belongs to N V;
Feature interactions between voxel feature f V at v and ultrasound image patch feature set f I belonging to the patch in N V are modeled using multi-headed self-attention as follows:
Q=f_V·W^Q, K=f_I·W^K, V=f_I·W^V (26)
f_V′=MultiHead(Q,K,V,M) (27)
Here MultiHead (·) is a standard multi-headed attention operation, W Q、WK、WV is a learnable matrix, M is a mask of attention calculations caused by view projection, absolute position encoding is used for voxels and ultrasound image tiles;
s3-5: further fusing the ultrasonic image features and the voxel features after the cross attention, wherein the output voxel features are high-precision SDF fields;
S4: the meshes are extracted by using the marching cubes, and finally, the triangular mesh model guided by the ultrasonic image is reconstructed.
2. The SDF diffusion-based medical ultrasonic image three-dimensional reconstruction method of claim 1, wherein in S1, a visual model MedSAM is first trained using medical ultrasonic image data, then features of an input single medical ultrasonic image are extracted using the trained MedSAM image encoder, and the output is mapped to a feature vector F_p, which is used to guide the SDF toward the picture features in the subsequent diffusion process, realizing ultrasonic image SDF three-dimensional reconstruction.
3. The SDF diffusion-based medical ultrasound image three-dimensional reconstruction method according to claim 1, wherein the marching cubes algorithm step in S4: setting an isosurface threshold value T, and traversing all cube units C in the SDF field:
(1) Calculating vertex state code index according to the size relation between the 8 vertex scalar values of C and T;
(2) Searching a corresponding intersection point configuration pattern in a predefined condition table according to the index;
(3) Calculating the intersection points with the isosurface on 12 sides of the cube by utilizing linear interpolation;
(4) Connecting corresponding intersection points according to pattern to form one or more triangles;
(5) Triangles are added to the mesh surface data.
4. The system based on the medical ultrasonic image three-dimensional reconstruction method based on SDF diffusion according to claim 1, which is characterized by comprising an ultrasonic image feature extraction module, an SDF diffusion module and a grid generation module;
The ultrasonic image feature extraction module is used for: the module is used for extracting the characteristics of an input ultrasonic image, and the characteristics of the input ultrasonic image are extracted by using a visual model SAM-Med2D image encoder which is pre-trained by a large amount of medical image data;
The SDF diffusion module: the module consists of a U-shaped neural network embedded with a state space model and cross attention, and is used for executing an SDF diffusion process, denoising gradually from random noise to obtain a clean SDF field, and fusing voxel characteristics and ultrasonic picture characteristics by the state space model and the cross attention in the diffusion process to achieve the aim of keeping consistency of a reconstruction grid and the ultrasonic picture characteristics;
The grid generation module: the module executes a marching cubes algorithm to extract a high precision three-dimensional grid from the reconstructed SDF field.
CN202410917497.7A 2024-07-10 2024-07-10 Medical ultrasonic image three-dimensional reconstruction method and system based on SDF diffusion Active CN118470222B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410917497.7A CN118470222B (en) 2024-07-10 2024-07-10 Medical ultrasonic image three-dimensional reconstruction method and system based on SDF diffusion


Publications (2)

Publication Number Publication Date
CN118470222A CN118470222A (en) 2024-08-09
CN118470222B true CN118470222B (en) 2024-09-06

Family

ID=92162282


Country Status (1)

Country Link
CN (1) CN118470222B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116580172A (en) * 2023-05-19 2023-08-11 湖南大学 Diffusion model-based three-dimensional shape generation method applied to industrial vision requirements
CN116912419A (en) * 2023-07-21 2023-10-20 广州大学 Three-dimensional human body reconstruction method, system, equipment and medium based on diffusion model

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11087459B2 (en) * 2015-08-14 2021-08-10 Elucid Bioimaging Inc. Quantitative imaging for fractional flow reserve (FFR)
CN117689805A (en) * 2023-05-09 2024-03-12 北京航空航天大学 Large-scale cloud scene simulation method based on noise and particles
CN117115355A (en) * 2023-09-07 2023-11-24 上海微创电生理医疗科技股份有限公司 Three-dimensional ultrasonic modeling method, system, electronic device and readable storage medium
CN117541471B (en) * 2023-11-09 2024-06-07 西安电子科技大学 SPH heuristic PG-SPECT image super-resolution reconstruction method
CN117953180B (en) * 2024-03-26 2024-10-08 厦门大学 Text-to-three-dimensional object generation method based on dual-mode latent variable diffusion
CN118247414A (en) * 2024-03-26 2024-06-25 杭州电子科技大学 Small sample image reconstruction method based on combined diffusion texture constraint nerve radiation field
CN118196121B (en) * 2024-04-08 2024-09-20 兰州交通大学 Breast ultrasound image segmentation method based on denoising diffusion probability model
CN118229886A (en) * 2024-04-28 2024-06-21 中国人民解放军空军工程大学 Engine blade in-situ three-dimensional reconstruction method, equipment and storage medium based on deep learning and nerve radiation field




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant