CN113810736B - An AI-driven real-time point cloud video transmission method and system - Google Patents
An AI-driven real-time point cloud video transmission method and system
- Publication number
- CN113810736B (application CN202110985757.0A)
- Authority
- CN
- China
- Prior art keywords
- point cloud
- video data
- features
- layer
- cloud video
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/20—Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
- H04N21/23—Processing of content or additional data; Elementary server operations; Server middleware
- H04N21/234—Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
- H04N21/23418—Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/50—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
- H04N19/597—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding specially adapted for multi-view video sequence encoding
Abstract
Description
Technical Field
The present invention relates to the field of point cloud video streaming, and in particular to an AI-driven real-time point cloud video transmission method and system.
Background Art
Point clouds are a typical and popular data format for describing volumetric media and holographic video, and can be captured by RGB-D cameras with depth sensors. A point cloud allows users to experience a scene with six degrees of freedom, changing both position and orientation within it, whereas traditional virtual reality (VR) video offers only three degrees of freedom. Volumetric media presents 3D scenes from multiple angles and is widely used in many fields, including education, medicine, and entertainment.
At present, transmitting dense point cloud streams is very challenging even in existing network environments, 5G included, for the following reasons: (1) transmitting a large volume of point cloud video requires extremely high bandwidth; even after conventional compression it can reach the gigabit-per-second (Gbps) level, exceeding the capability of current 5G networks; (2) 3D volumetric media is computationally expensive, because only software codecs are available, and inefficient codecs further slow down transmission; (3) traditional techniques such as the rate adaptation and buffer control of adaptive bitrate streaming (ABR) do not apply to 3D volumetric media, so advanced techniques for delivering volumetric media need to be explored.
Most existing point cloud video transmission techniques rely on conventional compression (lossy or lossless) to reduce the amount of data to be transmitted, but many shortcomings remain. On the one hand, lossless point cloud compression is still insufficient for efficient transmission and a good user experience. On the other hand, under limited network conditions, lossy compression can hardly guarantee that the recovered point cloud faithfully matches the original video. Other point cloud transmission techniques, such as those extending current VR video streaming, transmit at the data-block level; these methods incur high mobile energy consumption, and the processing delay on the receiving device is often unacceptable. In addition, each transmitted block is vulnerable to network fluctuations and to packet loss during reassembly.
In summary, the transmission capacity of traditional techniques falls far short of the bandwidth requirements of real-time point cloud video streaming. It is therefore necessary to explore an advanced transmission scheme that guarantees good service over existing networks.
Summary of the Invention
To address the problems in the related art, the present invention proposes an AI-driven real-time point cloud video transmission method and system that can significantly reduce the transmission volume and energy consumption of point cloud video streams. The system avoids the cumbersome multi-stage processing of traditional transmission schemes by designing and training an end-to-end deep learning network spanning raw data acquisition to final rendering and playback.
The technical solution of the present invention is realized as follows:
According to one aspect of the present invention, an AI-driven real-time point cloud video transmission method is provided.
The AI-driven real-time point cloud video transmission method includes:

acquiring video data information with an AI generation device and processing the video data information to obtain point cloud video data;

performing hierarchical feature extraction on the point cloud video data, determining the key point cloud features in each point cloud video frame, and transmitting the key point cloud features;

receiving the key point cloud features, and expanding and reconstructing them to obtain point cloud information similar to the original input point cloud, forming a point cloud video that is visually equivalent to the original.
Acquiring video data information with the AI generation device and processing it to obtain point cloud video data includes: scanning the 3D model to be transmitted with multiple depth cameras at different angles and obtaining the point cloud stream of each depth camera; and stitching the multiple point clouds gathered from the multi-view cameras into one complete point cloud, obtaining the point cloud video data.
In addition, the AI-driven real-time point cloud video transmission method further includes:

training a pre-built deep neural network offline on a large training set composed of 3D models of various scales, obtaining multiple candidate neural network models;

splitting each trained neural network model into a hierarchical feature extraction module and a point cloud recovery and reconstruction module based on a generative adversarial network (GAN), and deploying them respectively to a high-performance edge server near the input end and to user terminal devices;

while deploying the hierarchical feature extraction module and the GAN-based point cloud recovery and reconstruction module to the high-performance edge server and the user terminal devices, deploying an adaptive matcher, so that the matcher adaptively selects, according to network bandwidth changes monitored in real time, the neural network model that satisfies real-time point cloud video frame feature extraction and reconstruction under the current network.
Further, the hierarchical feature extraction module contains three cascaded set abstraction levels for hierarchical feature learning, capturing the local structure of the original point cloud; each set abstraction level consists of three basic layers: a sampling layer, a grouping layer, and a mini-PointNet layer.

The sampling layer selects a subset from the output of the previous level using farthest point sampling, each point in the subset representing the center of a local region; the grouping layer finds the n nearest neighbors around each local region center and assembles them into local region sets; the mini-PointNet layer transforms the local region sets into feature vectors using three 2D convolutional layers and one max pooling layer; the feature vector output by the mini-PointNet layer of the last set abstraction level is the data to be transmitted.
Further, the point cloud recovery and reconstruction module includes a point cloud feature expansion part and a final point set generation part. Upon receiving the transmitted key point cloud features, the feature expansion part unifies the feature dimension through a multilayer perceptron and then, through an up-down-up expansion unit, produces a larger and more diverse number of points and feature dimensions. The final point set generation part contains two multilayer perceptron layers, through which the expanded point cloud features are reconstructed into 3D coordinate form.
In addition, performing hierarchical feature extraction on the point cloud video data and determining the key point cloud features in point cloud video frames includes: performing hierarchical feature extraction on the point cloud video data through the hierarchical feature extraction module deployed on the high-performance edge server near the input end, and determining the key point cloud features in point cloud video frames.
Furthermore, expanding and reconstructing the received key point cloud features to obtain point cloud information similar to the original input point cloud and form a visually equivalent point cloud video includes: expanding and reconstructing the key point cloud features through the GAN-based point cloud recovery and reconstruction module deployed on the user terminal devices, obtaining point cloud information similar to the original input point cloud and forming a visually equivalent point cloud video.
According to another aspect of the present invention, an AI-driven real-time point cloud video transmission system is provided.

The AI-driven real-time point cloud video transmission system includes:

a point cloud video data acquisition module, configured to acquire video data information with an AI generation device and process the video data information to obtain point cloud video data;

a key point cloud feature extraction module, configured to perform hierarchical feature extraction on the point cloud video data, determine the key point cloud features in point cloud video frames, and transmit the key point cloud features;

a point cloud recovery and reconstruction module, configured to receive the key point cloud features and expand and reconstruct them, obtaining point cloud information similar to the original input point cloud and forming a visually equivalent point cloud video.
When acquiring video data information with the AI generation device and processing it to obtain point cloud video data, the point cloud video data acquisition module scans the 3D model to be transmitted with multiple depth cameras at different angles, obtains the point cloud stream of each depth camera, and stitches the multiple point clouds gathered from the multi-view cameras into one complete point cloud, obtaining the point cloud video data.
In addition, the AI-driven real-time point cloud video transmission system further includes:

a neural network training module, configured to train a pre-built deep neural network offline on a large training set composed of 3D models of various scales, obtaining multiple candidate neural network models;

a neural network deployment module, configured to split each trained neural network model into a hierarchical feature extraction module and a GAN-based point cloud recovery and reconstruction module, and to deploy them respectively to a high-performance edge server near the input end and to user terminal devices;

a neural network matching module, configured to deploy an adaptive matcher alongside the hierarchical feature extraction module and the GAN-based point cloud recovery and reconstruction module, so that the matcher adaptively selects, according to network bandwidth changes monitored in real time, the neural network model that satisfies real-time point cloud video frame feature extraction and reconstruction under the current network.
Further, the hierarchical feature extraction module contains three cascaded set abstraction levels for hierarchical feature learning, capturing the local structure of the original point cloud; each set abstraction level consists of three basic layers: a sampling layer, a grouping layer, and a mini-PointNet layer.

The sampling layer selects a subset from the output of the previous level using farthest point sampling, each point in the subset representing the center of a local region; the grouping layer finds the n nearest neighbors around each local region center and assembles them into local region sets; the mini-PointNet layer transforms the local region sets into feature vectors using three 2D convolutional layers and one max pooling layer; the feature vector output by the mini-PointNet layer of the last set abstraction level is the data to be transmitted.
Further, the point cloud recovery and reconstruction module includes a point cloud feature expansion part and a final point set generation part. Upon receiving the transmitted key point cloud features, the feature expansion part unifies the feature dimension through a multilayer perceptron and then, through an up-down-up expansion unit, produces a larger and more diverse number of points and feature dimensions. The final point set generation part contains two multilayer perceptron layers, used to reconstruct the expanded point cloud features into 3D coordinate form.
In addition, when performing hierarchical feature extraction on the point cloud video data and determining the key point cloud features in point cloud video frames, the key point cloud feature extraction module does so through the hierarchical feature extraction module deployed on the high-performance edge server near the input end.
Furthermore, when expanding and reconstructing the received key point cloud features to obtain point cloud information similar to the original input point cloud and form a visually equivalent point cloud video, the point cloud recovery and reconstruction module does so through the GAN-based point cloud recovery and reconstruction module deployed on the user terminal devices.
Beneficial Effects:
By extracting features from the point cloud video stream that originally needs to be transmitted, transmitting only a subset of key point cloud features, and finally recovering and reconstructing them at the receiving end, the present invention achieves the visual effect of transmitting the original point cloud video while significantly reducing the transmission volume and energy consumption of the stream. It avoids the cumbersome multi-stage processing of traditional transmission schemes and greatly reduces the amount of transmitted data, making it better suited to existing network environments. The present invention also takes the dynamics and instability of the network environment into account, incorporating them into the end-to-end network design and training, and provides an adaptive transmission control algorithm to balance transmission delay against reconstruction accuracy.
Brief Description of the Drawings
In order to explain the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings needed in the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention; those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic flowchart of AI-driven real-time point cloud video transmission according to an embodiment of the present invention;

Fig. 2 is a schematic structural block diagram of an AI-driven real-time point cloud video transmission system according to an embodiment of the present invention;

Fig. 3 is a schematic diagram of the principle of an AI-driven real-time point cloud video transmission method according to an embodiment of the present invention;

Fig. 4 is a schematic diagram of the structural design of the deep neural network model according to an embodiment of the present invention.
Detailed Description of the Embodiments
The technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention fall within the protection scope of the present invention.
According to an embodiment of the present invention, an AI-driven real-time point cloud video transmission method is provided.

As shown in Fig. 1, the AI-driven real-time point cloud video transmission method according to an embodiment of the present invention includes:

Step S101: acquiring video data information with an AI generation device and processing the video data information to obtain point cloud video data;

Step S103: performing hierarchical feature extraction on the point cloud video data, determining the key point cloud features in each point cloud video frame, and transmitting the key point cloud features;

Step S105: receiving the key point cloud features, expanding and reconstructing them, obtaining point cloud information similar to the original input point cloud, and forming a visually equivalent point cloud video.
Acquiring video data information with the AI generation device and processing it to obtain point cloud video data includes: scanning the 3D model to be transmitted with multiple depth cameras at different angles and obtaining the point cloud stream of each depth camera; and stitching the multiple point clouds gathered from the multi-view cameras into one complete point cloud, obtaining the point cloud video data.
In addition, the AI-driven real-time point cloud video transmission method further includes:

training a pre-built deep neural network offline on a large training set composed of 3D models of various scales, obtaining multiple candidate neural network models;

splitting each trained neural network model into a hierarchical feature extraction module and a GAN-based point cloud recovery and reconstruction module, and deploying them respectively to a high-performance edge server near the input end and to user terminal devices;

while deploying the two modules to the high-performance edge server and the user terminal devices, deploying an adaptive matcher, so that the matcher adaptively selects, according to network bandwidth changes monitored in real time, the neural network model that satisfies real-time point cloud video frame feature extraction and reconstruction under the current network.
Further, the hierarchical feature extraction module contains three cascaded set abstraction levels for hierarchical feature learning, capturing the local structure of the original point cloud; each set abstraction level consists of three basic layers: a sampling layer, a grouping layer, and a mini-PointNet layer.

The sampling layer selects a subset from the output of the previous level using farthest point sampling, each point in the subset representing the center of a local region; the grouping layer finds the n nearest neighbors around each local region center and assembles them into local region sets; the mini-PointNet layer transforms the local region sets into feature vectors using three 2D convolutional layers and one max pooling layer; the feature vector output by the mini-PointNet layer of the last set abstraction level is the data to be transmitted.
Further, the point cloud recovery and reconstruction module includes a point cloud feature expansion part and a final point set generation part. Upon receiving the transmitted key point cloud features, the feature expansion part unifies the feature dimension through a multilayer perceptron and then, through an up-down-up expansion unit, produces a larger and more diverse number of points and feature dimensions. The final point set generation part contains two multilayer perceptron layers, through which the expanded point cloud features are reconstructed into 3D coordinate form.
In addition, performing hierarchical feature extraction on the point cloud video data and determining the key point cloud features in point cloud video frames includes: performing hierarchical feature extraction on the point cloud video data through the hierarchical feature extraction module deployed on the high-performance edge server near the input end, and determining the key point cloud features in point cloud video frames.
Furthermore, expanding and reconstructing the received key point cloud features to obtain point cloud information similar to the original input point cloud and form a visually equivalent point cloud video includes: expanding and reconstructing the key point cloud features through the GAN-based point cloud recovery and reconstruction module deployed on the user terminal devices, obtaining point cloud information similar to the original input point cloud and forming a visually equivalent point cloud video.
According to an embodiment of the present invention, an AI-driven real-time point cloud video transmission system is provided.

As shown in Fig. 2, the AI-driven real-time point cloud video transmission system according to an embodiment of the present invention includes:

a point cloud video data acquisition module 201, configured to acquire video data information with an AI generation device and process the video data information to obtain point cloud video data;

a key point cloud feature extraction module 203, configured to perform hierarchical feature extraction on the point cloud video data, determine the key point cloud features in point cloud video frames, and transmit the key point cloud features;

a point cloud recovery and reconstruction module 205, configured to receive the key point cloud features and expand and reconstruct them, obtaining point cloud information similar to the original input point cloud and forming a visually equivalent point cloud video.
When acquiring video data information with the AI generation device and processing it to obtain point cloud video data, the point cloud video data acquisition module 201 scans the 3D model to be transmitted with multiple depth cameras at different angles, obtains the point cloud stream of each depth camera, and stitches the multiple point clouds gathered from the multi-view cameras into one complete point cloud, obtaining the point cloud video data.
In addition, the AI-driven real-time point cloud video transmission system further includes: a neural network training module (not shown), configured to train a pre-built deep neural network offline on a large training set composed of 3D models of various scales, obtaining multiple candidate neural network models; a neural network deployment module (not shown), configured to split each trained neural network model into a hierarchical feature extraction module and a GAN-based point cloud recovery and reconstruction module and to deploy them respectively to a high-performance edge server near the input end and to user terminal devices; and a neural network matching module (not shown), configured to deploy an adaptive matcher alongside the two modules, so that the matcher adaptively selects, according to network bandwidth changes monitored in real time, the neural network model that satisfies real-time point cloud video frame feature extraction and reconstruction under the current network.
The hierarchical feature extraction module contains three cascaded set abstraction levels for hierarchical feature learning, capturing the local structure of the original point cloud; each set abstraction level consists of three basic layers: a sampling layer, a grouping layer, and a mini-PointNet layer.

The sampling layer selects a subset from the output of the previous level using farthest point sampling, each point in the subset representing the center of a local region; the grouping layer finds the n nearest neighbors around each local region center and assembles them into local region sets; the mini-PointNet layer transforms the local region sets into feature vectors using three 2D convolutional layers and one max pooling layer; the feature vector output by the mini-PointNet layer of the last set abstraction level is the data to be transmitted.
The point cloud recovery and reconstruction module includes a point cloud feature expansion part and a final point set generation part. Upon receiving the transmitted key point cloud features, the feature expansion part unifies the feature dimension through a multilayer perceptron and then, through an up-down-up expansion unit, produces a larger and more diverse number of points and feature dimensions. The final point set generation part contains two multilayer perceptron layers, used to reconstruct the expanded point cloud features into 3D coordinate form.
In addition, when performing hierarchical feature extraction on the point cloud video data and determining the key point cloud features in point cloud video frames, the key point cloud feature extraction module 203 does so through the hierarchical feature extraction module deployed on the high-performance edge server near the input end.
Furthermore, when expanding and reconstructing the received key point cloud features to obtain point cloud information similar to the original input point cloud and form a visually equivalent point cloud video, the point cloud recovery and reconstruction module 205 does so through the GAN-based point cloud recovery and reconstruction module deployed on the user terminal devices.
To facilitate a clearer understanding of the above technical solution of the present invention, it is described in detail below from the perspective of its working principle.

Fig. 3 is a schematic diagram of the principle of the present invention. As can be seen from Fig. 3, the method of the present invention works as follows:
(1) Multi-view cameras. The system uses multiple depth cameras placed at different angles to capture the original point clouds, preprocesses them over USB connections, and synchronizes the point cloud stream of each camera to a high-performance edge server for stitching.

(2) Point cloud feature extraction and point cloud information recovery and reconstruction. Key point cloud feature extraction is the hierarchical feature extraction module provided by the present invention, which extracts key features from the stitched point cloud; GAN-based point cloud information recovery and reconstruction is the point cloud recovery and reconstruction module provided by the present invention, which recovers and reconstructs the received point cloud features.

(3) Adaptive matcher. It senses the network condition of the connected terminal and selects the optimal inference model, keeping communication running smoothly and improving the user experience (a sketch of one possible selection policy follows this list).

(4) Base station. The system provides real-time point cloud transmission in the current network environment, using existing base stations to wirelessly transmit the key point cloud features to various terminals.

(5) User terminals. The system can serve many kinds of terminals across a wide range of application scenarios: for example, real-time holographic communication on a smartphone, or a more immersive experience in which the point cloud is rendered on AR glasses or an AR headset so that the user can interact inside the point cloud video.
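The patent does not specify the matcher's selection logic. The following Python sketch shows one plausible policy, assuming candidate models are indexed by the (N, M) shape of the features they transmit per fragment and compared against the per-fragment bit budget implied by the measured bandwidth; all names, numbers, and thresholds here are hypothetical, not taken from the patent.

```python
# Minimal sketch of an adaptive matcher (hypothetical policy).
from dataclasses import dataclass

@dataclass
class CandidateModel:
    n: int            # number of transmitted feature centers per fragment
    m: int            # feature dimension per center
    accuracy: float   # offline reconstruction accuracy of this candidate

    def bits_per_fragment(self, bytes_per_value: int = 4) -> int:
        # Transmitted payload: N x M values, assumed 32-bit floats.
        return self.n * self.m * bytes_per_value * 8

def select_model(candidates, bandwidth_bps, fragments_per_frame, fps=30.0):
    """Pick the most accurate model whose traffic fits the measured bandwidth."""
    budget = bandwidth_bps / (fps * fragments_per_frame)  # bits per fragment
    feasible = [c for c in candidates if c.bits_per_fragment() <= budget]
    if not feasible:              # degrade gracefully to the smallest model
        return min(candidates, key=lambda c: c.bits_per_fragment())
    return max(feasible, key=lambda c: c.accuracy)

# Example: three candidates trained with different (N, M) combinations.
models = [CandidateModel(5, 5, 0.90), CandidateModel(16, 16, 0.95),
          CandidateModel(64, 32, 0.98)]
best = select_model(models, bandwidth_bps=50e6, fragments_per_frame=400)
```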
Fig. 4 is a schematic diagram of the structural design of the deep neural network model. As can be seen from Fig. 4, the model consists of a hierarchical feature extraction module and a point cloud recovery and reconstruction module, specifically:

(1) Hierarchical feature extraction module. It learns from the input point cloud and extracts the features of a subset of key points for transmission. Specifically, the module performs hierarchical feature learning over multiple set abstraction levels, capturing the local structure of the original point cloud. Each set abstraction level consists of three basic layers: a sampling layer, a grouping layer, and a mini-PointNet layer.

The sampling layer selects a subset of the previous level's output to represent the centers of local regions; the grouping layer finds the n nearest neighbors around each center to construct local region sets; and the mini-PointNet layer transforms the local region sets into feature vectors using three 2D convolutional layers and one max pooling layer.

(2) Point cloud recovery and reconstruction module. It recovers and reconstructs the key point cloud features received at the receiving end. Specifically, the module uses only the generation part of a generative adversarial network, which has fewer parameters and less computation than the full GAN and is therefore easier to deploy on resource-constrained terminals. It includes a point cloud feature expansion part and a final point set generation part.

The feature expansion part first receives the transmitted point cloud feature matrix and unifies the feature dimension through a multilayer perceptron layer, then produces a larger and more diverse number of points and feature dimensions through an up-down-up expansion unit.

The final point set generation part reconstructs the expanded features into 3D coordinate form through two multilayer perceptron layers.
In practice, when extracting point cloud features from the raw data through the hierarchical feature extraction module of the deep neural network, a specific implementation may proceed as follows:

(1) Training stage
a. On the surface of each point cloud model in the training set, 200 points are randomly selected as seed coordinates. Centered on each seed, farthest point sampling finds the 256 points around it, such that the region formed by these 256 points covers roughly 5% of the model surface. The seed coordinate and these 256 points define one fragment, and the point coordinates within the fragment are normalized into a unit sphere. In this embodiment, each sample fed into the neural network is one fragment containing 256 points, each with 3D coordinates, so the input can be denoted (256, 3).
b. The (256, 3) input sample is fed into the sampling layer: farthest point sampling selects 128 points, giving a sparse point set denoted (128, 3). Farthest point sampling is chosen because it covers the whole point set better than random sampling. The number of center points is set manually; in this embodiment it is 128.

c. The sparse point set (128, 3) is fed into the grouping layer: centered on the 128 points, the ball query method generates 128 local regions, each containing 32 points within a ball radius of 0.2, giving grouped features denoted (128, 32, 3). The number of points per region and the ball radius are set manually, here 32 and 0.2. This step can also be implemented with the k-nearest-neighbor method; the choice has little effect on the result.

d. The grouped features (128, 32, 3) are fed into the mini-PointNet layer: they pass through three 2D convolutional layers and one max pooling layer, outputting hierarchical feature information (128, 64). In this embodiment the three convolutional layers have 64, 64, and 64 output channels, 1×1 kernels, 1×1 strides, and no padding.

e. The hierarchical feature information (128, 64) is fed into the next sampling layer: farthest point sampling selects 64 points, outputting sparse point set features (64, 64). The number of center points is set manually, here 64.

f. The sparse point set features (64, 64) are fed into the grouping layer: centered on the 64 points, the ball query method generates 64 local regions, each containing 64 points within a ball radius of 0.3, giving grouped features (64, 64, 64). The number of points per region and the ball radius are set manually, here 64 and 0.3. This step can also use the k-nearest-neighbor method.

g. The grouped features (64, 64, 64) are fed into the mini-PointNet layer: three 2D convolutional layers and one max pooling layer output hierarchical feature information (64, 32). In this embodiment the three convolutional layers have 64, 64, and 32 output channels, 1×1 kernels, 1×1 strides, and no padding.

h. The hierarchical feature information (64, 32) is fed into the sampling layer again: farthest point sampling selects N points, giving sparse point set features (N, 32). The number of center points N is a variable.
i. The sparse point set features (N, 32) are fed into the grouping layer: centered on the N points, the ball query method generates N local regions, each containing 64 points within a radius of 0.4, giving grouped features (N, 64, 32). The number of points per region and the radius are set manually, here 64 and 0.4. N is a variable. This step can also use the k-nearest-neighbor method.
j. The grouped features (N, 64, 32) are fed into the mini-PointNet layer: three 2D convolutional layers and one max pooling layer yield the hierarchical point cloud feature information (N, M). In this embodiment the three convolutional layers have 32, 32, and M output channels, 1×1 kernels, 1×1 strides, and no padding.
In the hierarchical feature extraction module, one sampling layer, one grouping layer, and one mini-PointNet layer together form a set abstraction level: steps b-d are set abstraction level 1, steps e-g level 2, and steps h-j level 3. The number of set abstraction levels is set manually, here 3, and the output of the last level is the point cloud feature (N, M). All training samples are fed into the deep neural network; through steps b-j, forward propagation, and backpropagation, the loss function is computed and the network weights are updated to train the neural network model. Setting N and M to different combinations trains several candidate models. A sketch of one set abstraction level is given below.
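The patent describes the set abstraction levels in prose only. The following PyTorch sketch shows, under our own naming and simplifications, one such level combining farthest point sampling, ball query grouping, and a mini-PointNet of three 1×1 convolutions plus max pooling; the parameter values reproduce level 1 of the embodiment.

```python
import torch
import torch.nn as nn

def farthest_point_sampling(xyz: torch.Tensor, k: int) -> torch.Tensor:
    """xyz: (B, P, 3). Returns indices (B, k) of k well-spread points."""
    B, P, _ = xyz.shape
    idx = torch.zeros(B, k, dtype=torch.long, device=xyz.device)
    dist = torch.full((B, P), float("inf"), device=xyz.device)
    farthest = torch.zeros(B, dtype=torch.long, device=xyz.device)
    for i in range(k):
        idx[:, i] = farthest
        centroid = xyz[torch.arange(B), farthest].unsqueeze(1)       # (B, 1, 3)
        dist = torch.minimum(dist, ((xyz - centroid) ** 2).sum(-1))  # (B, P)
        farthest = dist.argmax(dim=-1)
    return idx

def ball_query(xyz, centers, radius, n):
    """Indices (B, C, n) of up to n points within `radius` of each center.
    Simplification: if fewer than n points fall inside the ball, the nearest
    points outside it are used instead of duplicating the first neighbor."""
    d2 = torch.cdist(centers, xyz) ** 2                # (B, C, P)
    d2 = d2 + (d2 > radius ** 2).float() * 1e6         # push far points away
    return d2.topk(n, dim=-1, largest=False).indices

class SetAbstraction(nn.Module):
    """One sampling + grouping + mini-PointNet level. Level 1 of the embodiment:
    128 centers, 32 neighbors, ball radius 0.2, conv channels (64, 64, 64)."""
    def __init__(self, n_centers, n_neighbors, radius, in_ch, channels):
        super().__init__()
        self.n_centers, self.n_neighbors, self.radius = n_centers, n_neighbors, radius
        layers, c = [], in_ch
        for out_c in channels:
            layers += [nn.Conv2d(c, out_c, kernel_size=1), nn.ReLU()]
            c = out_c
        self.mini_pointnet = nn.Sequential(*layers)

    def forward(self, xyz, feats):
        # xyz: (B, P, 3) coordinates; feats: (B, P, C_in) per-point features
        B = xyz.size(0)
        batch = torch.arange(B, device=xyz.device)
        center_idx = farthest_point_sampling(xyz, self.n_centers)    # (B, C)
        centers = xyz[batch[:, None], center_idx]                    # (B, C, 3)
        group_idx = ball_query(xyz, centers, self.radius, self.n_neighbors)
        grouped = feats[batch[:, None, None], group_idx]             # (B, C, n, C_in)
        grouped = grouped.permute(0, 3, 1, 2)                        # (B, C_in, C, n)
        out = self.mini_pointnet(grouped).max(dim=-1).values         # pool over n
        return centers, out.permute(0, 2, 1)                         # (B, C, C_out)

# Level 1 of the embodiment: a (256, 3) fragment -> 128 centers with 64-d features.
sa1 = SetAbstraction(n_centers=128, n_neighbors=32, radius=0.2,
                     in_ch=3, channels=(64, 64, 64))
fragment = torch.rand(1, 256, 3)
centers, features = sa1(fragment, fragment)   # (1, 128, 3), (1, 128, 64)
```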
(2) Inference stage

a. On the surface of the target object in the video frame to be transmitted, several points are randomly selected as seed coordinates. Centered on each seed, farthest point sampling finds the 256 points around it, forming a fragment, and the point coordinates within the fragment are normalized into a unit sphere. The number of seeds is set manually; in the present invention it is the number of points in the target point cloud divided by 256. (A sketch of this fragment extraction follows this list.)

b. The optimal inference model is selected according to network fluctuations.

c. Fragments are fed, one at a time, into the optimal inference model for forward inference, yielding the point cloud features that finally need to be transmitted.
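Step a of both the training and inference stages cuts the cloud into normalized 256-point fragments. A minimal NumPy sketch of that fragment extraction follows; the function names are ours, and details the patent does not state (for example, how the 5% surface coverage is enforced) are replaced by a simple nearest-neighbor candidate pool.

```python
import numpy as np

def farthest_point_sample(points, k, start=0):
    """Greedy farthest point sampling: indices of k well-spread points."""
    chosen = [start]
    dist = np.linalg.norm(points - points[start], axis=1)
    for _ in range(k - 1):
        nxt = int(dist.argmax())
        chosen.append(nxt)
        dist = np.minimum(dist, np.linalg.norm(points - points[nxt], axis=1))
    return np.asarray(chosen)

def extract_fragments(cloud, frag_size=256):
    """Cut a point cloud (P, 3) into fragments normalized to a unit sphere."""
    n_seeds = max(1, len(cloud) // frag_size)        # seed rule of step a
    seeds = np.random.choice(len(cloud), n_seeds, replace=False)
    fragments = []
    for s in seeds:
        # The patent applies farthest point sampling around each seed; here it
        # is run over the seed's nearest-neighbor pool for simplicity.
        d = np.linalg.norm(cloud - cloud[s], axis=1)
        pool = cloud[np.argsort(d)[: frag_size * 4]]
        frag = pool[farthest_point_sample(pool, frag_size)]
        center = frag.mean(axis=0)                   # normalize the fragment
        scale = np.linalg.norm(frag - center, axis=1).max()
        fragments.append((frag - center) / scale)
    return fragments

# Example: a random 4096-point cloud yields 16 fragments of shape (256, 3).
frags = extract_fragments(np.random.rand(4096, 3))
```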
In practice, when reconstructing point cloud information from the point cloud features through the point cloud recovery and reconstruction module, a specific implementation may proceed as follows:

(1) Training stage

a. The point cloud features (N, M) pass through one 2D convolutional layer, giving point cloud features of unified dimension (N, 128). This layer ensures that key point cloud features compressed by different inference models all come out with the same feature dimension. The convolutional layer has 128 output channels, a 1×1 kernel, a 1×1 stride, and no padding.
b. The unified point cloud features, denoted here F with shape (N, 128), are passed through an upsampling operation that increases the number of point features, giving upsampled point cloud features F_up with shape (256, 128). The specific steps of the upsampling operation are:
The features F (N, 128) are duplicated r times to obtain (rN, 128), where r is the upsampling ratio, equal to 256/N.

Using a 2D grid mechanism, a unique 2D vector is generated for each copy and appended to the feature vector of every point in that copy, giving features (rN, 128+2).

A self-attention unit and two 2D convolutional layers then generate the upsampled point cloud features F_up. The two convolutional layers have 256 and 128 output channels, 1×1 kernels, 1×1 strides, and no padding.
c. A downsampling operation is applied to the point cloud features F_up (256, 128) to obtain point cloud features F_down at the same scale as F, with shape (N, 128). The specific steps of the downsampling operation are:

The point cloud features F_up are reshaped, by simply rearranging rows and columns, into (N, r×128) features,

and two 2D convolutional layers generate the point cloud features F_down. The two convolutional layers have 256 and 128 output channels, 1×1 kernels, 1×1 strides, and no padding.

d. The point cloud features F_down are subtracted from the point cloud features F, giving residual point cloud features Δ of dimension (N, 128).

e. The same upsampling operation is applied to the residual features Δ (N, 128), giving upsampled residual features Δ_up of dimension (256, 128).

f. The features Δ_up are added to the features F_up, giving the expanded point cloud features of dimension (256, 128).

g. The expanded point cloud features (256, 128) pass through two 2D convolutional layers for coordinate reconstruction, giving the reconstructed point cloud of dimension (256, 3). The two convolutional layers have 64 and 3 output channels, 1×1 kernels, 1×1 strides, and no padding.
In the point cloud recovery and reconstruction module, steps c-f form the up-down-up expansion unit and step g the final point set generation part. All point cloud feature samples obtained from transmission are fed into the deep neural network; through steps a-g, forward propagation, and backpropagation, the loss function is computed and the network weights are updated to train the neural network model. A sketch of the expansion unit is given below.
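The following PyTorch sketch gives one plausible reading of the up-down-up expansion unit and coordinate reconstruction (steps b-g), assuming the features have already been unified to 128 dimensions by step a. It uses the F / F_up / Δ naming above; the self-attention unit is simplified to standard single-head dot-product attention, and all class and variable names are ours rather than the patent's.

```python
import torch
import torch.nn as nn

class UpUnit(nn.Module):
    """Upsampling op of step b: duplicate r times, append a unique 2D grid
    code per copy, one (simplified) self-attention unit, two 1x1 convs."""
    def __init__(self, r):
        super().__init__()
        self.r = r
        self.attn = nn.MultiheadAttention(embed_dim=130, num_heads=1, batch_first=True)
        self.convs = nn.Sequential(nn.Conv2d(130, 256, 1), nn.ReLU(),
                                   nn.Conv2d(256, 128, 1))

    def forward(self, f):                                  # f: (B, N, 128)
        B, N, _ = f.shape
        f = f.repeat(1, self.r, 1)                         # (B, rN, 128), r copies
        u = torch.linspace(-0.2, 0.2, self.r, device=f.device)
        codes = torch.stack([u, -u], dim=1)                # (r, 2), unique per copy
        codes = codes.repeat_interleave(N, dim=0)          # (rN, 2)
        f = torch.cat([f, codes.unsqueeze(0).expand(B, -1, -1)], dim=-1)
        f, _ = self.attn(f, f, f)                          # (B, rN, 130)
        f = self.convs(f.permute(0, 2, 1).unsqueeze(-1))   # (B, 128, rN, 1)
        return f.squeeze(-1).permute(0, 2, 1)              # (B, rN, 128)

class DownUnit(nn.Module):
    """Downsampling op of step c: fold (rN, 128) into (N, r*128), two 1x1 convs."""
    def __init__(self, r):
        super().__init__()
        self.r = r
        self.convs = nn.Sequential(nn.Conv2d(r * 128, 256, 1), nn.ReLU(),
                                   nn.Conv2d(256, 128, 1))

    def forward(self, f_up):                               # (B, rN, 128)
        B, rN, C = f_up.shape
        N = rN // self.r
        f = f_up.reshape(B, self.r, N, C).permute(0, 2, 1, 3).reshape(B, N, self.r * C)
        f = self.convs(f.permute(0, 2, 1).unsqueeze(-1))
        return f.squeeze(-1).permute(0, 2, 1)              # (B, N, 128)

class ExpansionUnit(nn.Module):
    """Steps c-g: up-down-up expansion plus coordinate reconstruction."""
    def __init__(self, r):
        super().__init__()
        self.up1, self.up2, self.down = UpUnit(r), UpUnit(r), DownUnit(r)
        self.coord = nn.Sequential(nn.Conv2d(128, 64, 1), nn.ReLU(),
                                   nn.Conv2d(64, 3, 1))    # step g: 64, 3 channels

    def forward(self, f):                # f: unified features F, (B, N, 128)
        f_up = self.up1(f)               # step b: F_up, (B, 256, 128)
        f_down = self.down(f_up)         # step c: F_down, (B, N, 128)
        delta_up = self.up2(f - f_down)  # steps d-e: upsampled residual
        f_final = f_up + delta_up        # step f
        xyz = self.coord(f_final.permute(0, 2, 1).unsqueeze(-1))
        return xyz.squeeze(-1).permute(0, 2, 1)            # (B, 256, 3)

r = 256 // 16                            # e.g. N = 16 transmitted features
xyz = ExpansionUnit(r)(torch.rand(1, 16, 128))             # (1, 256, 3) fragment
```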
(2) Inference stage

a. The optimal inference model is selected according to network fluctuations.

b. All point cloud feature samples obtained from transmission are fed into the deep neural network and, through steps a-g, forward inference recovers the reconstructed fragment point cloud information.

c. All fragment reconstructions are put together, and farthest point sampling selects the same number of points as the original input point cloud, reconstructing the final point cloud.
With the above technical solution, the present invention extracts features from the point cloud video stream that originally needs to be transmitted, transmits only a subset of key point cloud features, and finally recovers and reconstructs them at the receiving end, achieving the effect that, visually, the original point cloud video is what gets transmitted.

At runtime, the invention is entirely AI-driven: a deep neural network automatically accomplishes feature extraction and reconstruction, taking the raw point cloud coordinates directly as input. No other operations are required, such as information preprocessing, manually deriving the geometric distribution of the point cloud or the importance of its attributes, or selecting codecs; the deep neural network only needs to be trained and then deployed to the actual application, and splitting the deep neural network of the present invention yields exactly the two processes of feature extraction and reconstruction.

The system of the present invention is an end-to-end trained neural network covering feature extraction and reconstruction: it can be trained whenever data is available, which makes it more intelligent and removes the need to attend to internal details, and it can also select a suitable inference model according to network conditions, giving users a better experience.
Meanwhile, the present invention can achieve a compression ratio as high as 30.72× with acceptable precision loss, realizing real-time transmission in existing sub-5G environments and greatly reducing the amount of transmitted data.
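For orientation only (our arithmetic, not stated in the patent): a fragment of 256 points with three coordinates each contains 768 values, so transmitting an N×M feature with N·M = 25 values per fragment (for example N = 5, M = 5) gives 768 / 25 = 30.72, which matches the quoted ratio.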
In summary, the present invention can significantly reduce the transmission volume and energy consumption of point cloud video streams. It avoids the cumbersome multi-stage processing of traditional transmission schemes and greatly reduces the amount of transmitted data, making it better suited to existing network environments. The present invention also takes the dynamics and instability of the network environment into account, incorporating them into the end-to-end network design and training, and provides an adaptive transmission control algorithm to balance transmission delay against reconstruction accuracy.
The above description covers only preferred embodiments of the invention and is not intended to limit it; any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the invention shall fall within its scope of protection.
Claims (8)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110985757.0A CN113810736B (en) | 2021-08-26 | 2021-08-26 | An AI-driven real-time point cloud video transmission method and system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110985757.0A CN113810736B (en) | 2021-08-26 | 2021-08-26 | An AI-driven real-time point cloud video transmission method and system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113810736A (en) | 2021-12-17 |
CN113810736B (en) | 2022-11-01 |
Family
ID=78894093
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110985757.0A Active CN113810736B (en) | 2021-08-26 | 2021-08-26 | An AI-driven real-time point cloud video transmission method and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113810736B (en) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110691243A (en) * | 2019-10-10 | 2020-01-14 | 叠境数字科技(上海)有限公司 | Point cloud geometric compression method based on deep convolutional network |
CN113256640A (en) * | 2021-05-31 | 2021-08-13 | 北京理工大学 | Method and device for partitioning network point cloud and generating virtual environment based on PointNet |
Family Cites Families (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
LU100465B1 (en) * | 2017-10-05 | 2019-04-09 | Applications Mobiles Overview Inc | System and method for object recognition |
CN110012279B (en) * | 2018-01-05 | 2020-11-17 | 上海交通大学 | 3D point cloud data-based view-division compression and transmission method and system |
CN108320330A (en) * | 2018-01-23 | 2018-07-24 | 河北中科恒运软件科技股份有限公司 | Real-time three-dimensional model reconstruction method and system based on deep video stream |
WO2020189983A1 (en) * | 2019-03-18 | 2020-09-24 | Samsung Electronics Co., Ltd. | Method and apparatus for accessing and transferring point cloud content in 360-degree video environment |
WO2020190114A1 (en) * | 2019-03-21 | 2020-09-24 | 엘지전자 주식회사 | Point cloud data transmission device, point cloud data transmission method, point cloud data reception device, and point cloud data reception method |
CN111901601B (en) * | 2019-05-06 | 2023-03-31 | 上海交通大学 | Code rate allocation method for unequal error protection in dynamic point cloud data transmission |
US11210815B2 (en) * | 2019-08-09 | 2021-12-28 | Intel Corporation | Point cloud playback mechanism |
US11729243B2 (en) * | 2019-09-20 | 2023-08-15 | Intel Corporation | Dash-based streaming of point cloud content based on recommended viewports |
CN111783838A (en) * | 2020-06-05 | 2020-10-16 | 东南大学 | A point cloud feature space representation method for laser SLAM |
CN112672168B (en) * | 2020-12-14 | 2022-10-18 | 深圳大学 | Point cloud compression method and device based on graph convolution |
CN113141526B (en) * | 2021-04-27 | 2022-06-07 | 合肥工业大学 | Point cloud video adaptive transmission method with joint resource allocation driven by QoE |
- 2021-08-26 CN CN202110985757.0A patent/CN113810736B/en active Active
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110691243A (en) * | 2019-10-10 | 2020-01-14 | 叠境数字科技(上海)有限公司 | Point cloud geometric compression method based on deep convolutional network |
CN113256640A (en) * | 2021-05-31 | 2021-08-13 | 北京理工大学 | Method and device for partitioning network point cloud and generating virtual environment based on PointNet |
Also Published As
Publication number | Publication date |
---|---|
CN113810736A (en) | 2021-12-17 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109377530B (en) | A Binocular Depth Estimation Method Based on Deep Neural Network | |
CN110599395B (en) | Target image generation method, device, server and storage medium | |
US20240212252A1 (en) | Method and apparatus for training video generation model, storage medium, and computer device | |
Akyildiz et al. | Holographic-type communication: A new challenge for the next decade | |
Xia et al. | WiserVR: Semantic communication enabled wireless virtual reality delivery | |
Zhu et al. | A semantic-aware transmission with adaptive control scheme for volumetric video service | |
CN113315972A (en) | Video semantic communication method and system based on hierarchical knowledge expression | |
WO2022205755A1 (en) | Texture generation method and apparatus, device, and storage medium | |
CN111462274A (en) | A method and system for human image synthesis based on SMPL model | |
CN115359173A (en) | Virtual multi-viewpoint video generation method, device, electronic device and storage medium | |
Shao et al. | Point cloud in the air | |
Bing et al. | Collaborative image compression and classification with multi-task learning for visual Internet of Things | |
CN108765549A (en) | A kind of product three-dimensional display method and device based on artificial intelligence | |
CN110782503B (en) | Face image synthesis method and device based on two-branch depth correlation network | |
CN116189281A (en) | End-to-end human behavior classification method and system based on spatio-temporal adaptive fusion | |
Ruan et al. | Point cloud compression with implicit neural representations: A unified framework | |
CN115100707A (en) | Model training method, video information generation method, device and storage medium | |
CN113810736B (en) | An AI-driven real-time point cloud video transmission method and system | |
CN115131196A (en) | Image processing method, system, storage medium and terminal equipment | |
CN116958451B (en) | Model processing, image generating method, image generating device, computer device and storage medium | |
CN117173333A (en) | Meta universe-based enhanced multi-mode virtual reality system and method | |
CN113822114A (en) | Image processing method, related equipment and computer readable storage medium | |
CN118230391A (en) | A 3D face enhanced recognition system based on point cloud and RGB image | |
EP4164221A1 (en) | Processing image data | |
CN116235429B (en) | Method, apparatus and computer readable storage medium for media streaming |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |