CN114554220B

CN114554220B - A Fixed Scene Video Overlimit Compression and Decoding Method Based on Abstract Features

Info

Publication number: CN114554220B
Application number: CN202210038155.9A
Authority: CN
Inventors: 黄宏博; 陈伟骏; 孙牧野; 李萌
Original assignee: Beijing Information Science and Technology University
Current assignee: Beijing Information Science and Technology University
Priority date: 2022-01-13
Filing date: 2022-01-13
Publication date: 2023-07-28
Anticipated expiration: 2042-01-13
Also published as: CN114554220A

Abstract

The invention discloses a fixed scene video overrun compression and decoding method based on abstract features, which comprises the following steps: 1) Extracting a background image of an original video by adopting a background modeling method and performing compression coding; 2) Extracting abstract features from a foreground target by adopting a foreground target extraction module comprising algorithms such as instance segmentation, key point detection and the like; 3) Taking a snapshot of the foreground object and compressing the snapshot; 4) Compressing video background compression data, foreground object abstract features and snapshot compression data; 5) Decompressing the video compression data by pre-decoding; 6) Inputting the abstract features of the foreground target and the snapshot of the foreground target into a trained generator for generating an countermeasure network; 7) Fusing the generated foreground target decoding image of each frame with the background image; 8) Reconstructing the fused video frames to obtain a decoded video. The invention has extremely high compression ratio aiming at the video of the fixed scene, remarkably improves the storage efficiency and prolongs the storage time of the monitoring video.

Description

A Fixed Scene Video Overlimit Compression and Decoding Method Based on Abstract Features

技术领域technical field

本发明涉及计算机视觉的深度学习技术领域，具体为一种基于抽象特征的固定场景视频超限压缩与解码方法。The invention relates to the technical field of deep learning of computer vision, in particular to an abstract feature-based overlimit compression and decoding method for fixed-scene video.

背景技术Background technique

常见的对于视频数据的压缩编码主要是基于纹理、边缘、图像块的移动等底层特征来去除冗余信息，并未充分考虑视频内容所包含的高层抽象特征。深度学习在计算机视觉领域的蓬勃发展为图像和视频的高层抽象理解带来了技术可行性。深度卷积神经网络在大数据和高性能并行计算的支撑下，对图像和视频等高层特征提取带来了革命性的变革。不同于传统基于手工设计的图像特征提取方式，卷积神经网络可以自动在大数据中提取表达能力更强的高层特征。这些高层特征在图像理解和视频结构化中起到了至关重要的作用。借助于深度卷积神经网络模型的高层特征提取能力，在普遍可获取的视频大数据基础上，提取视频中表达性更强的高层抽象特征信息，去除视频中大量存在的抽象冗余，将可以大幅度提升视频压缩性能，减少存储空间和传输带宽，为视频更好的持久化存储和传输带来新的思路。The common compression coding of video data is mainly based on low-level features such as texture, edge, and image block movement to remove redundant information, and does not fully consider the high-level abstract features contained in video content. The vigorous development of deep learning in the field of computer vision has brought technical feasibility to high-level abstract understanding of images and videos. With the support of big data and high-performance parallel computing, deep convolutional neural networks have brought revolutionary changes to the extraction of high-level features such as images and videos. Unlike traditional image feature extraction methods based on manual design, convolutional neural networks can automatically extract high-level features with stronger expressive capabilities in big data. These high-level features play a crucial role in image understanding and video structuring. With the help of the high-level feature extraction capability of the deep convolutional neural network model, on the basis of widely available video big data, extracting more expressive high-level abstract feature information in the video and removing a large number of abstract redundancy in the video will be able to Greatly improve video compression performance, reduce storage space and transmission bandwidth, and bring new ideas for better persistent video storage and transmission.

因此，如何提供一种通过从视频中提取高层抽象特征信息，以提高压缩比的的视频压缩方法是本领域技术人员亟需解决的问题。Therefore, how to provide a video compression method that improves the compression ratio by extracting high-level abstract feature information from the video is an urgent problem to be solved by those skilled in the art.

发明内容Contents of the invention

(一)解决的技术问题(1) Solved technical problems

针对现有技术的不足，本发明提供了一种基于抽象特征的固定场景视频超限压缩与解码方法，通过从视频中提取高层抽象特征信息进行存储，大幅减少了存储空间，以解决上述的技术问题。Aiming at the deficiencies of the prior art, the present invention provides a fixed-scene video overcompression and decoding method based on abstract features. By extracting high-level abstract feature information from the video for storage, the storage space is greatly reduced to solve the above-mentioned technical problems. question.

(二)技术方案(2) Technical solutions

为实现上述的目的，本发明提供如下技术方案：一种基于抽象特征的固定场景视频超限压缩与解码方法，其中包含了编码器与解码器。本方法包括以下步骤：In order to achieve the above object, the present invention provides the following technical solution: a fixed-scene video overcompression and decoding method based on abstract features, which includes an encoder and a decoder. This method comprises the following steps:

1.视频压缩。1. Video compression.

将原视频拆解为图像帧，送入编码器进行处理。编码器包含两个模块：背景建模与前景目标提取。Disassemble the original video into image frames and send them to the encoder for processing. The encoder consists of two modules: background modeling and foreground object extraction.

背景建模模块使用基于混合高斯模型的背景建模算法，对每一帧视频进行前景减除，得到背景图像。所有视频帧处理完后，将多帧背景图像求并集得到单张背景图像，然后进行离散余弦变换，量化与熵编码，得到视频背景压缩数据。The background modeling module uses a background modeling algorithm based on a mixed Gaussian model to perform foreground subtraction on each frame of video to obtain a background image. After all the video frames are processed, the multi-frame background images are combined to obtain a single background image, and then discrete cosine transform, quantization and entropy coding are performed to obtain video background compressed data.

前景目标提取模块由基于卷积神经网络的实例分割模型与关键点检测模型组成，对图像帧进行物体实例分割与关键点检测，获得前景目标抽象特征。所述前景目标抽象特征包含前景目标的形状特征和关键点特征。The foreground target extraction module is composed of an instance segmentation model and a key point detection model based on a convolutional neural network, and performs object instance segmentation and key point detection on image frames to obtain abstract features of foreground targets. The abstract feature of the foreground object includes the shape feature and key point feature of the foreground object.

处理完全部视频帧后，使用基于目标检测框IOU阈值的方法进行帧间目标匹配，得到帧间前景目标的对应关系，然后对每一个前景目标提取快照。提取快照的算法步骤为：对每一个前景目标的多帧形状特征，只保留实例分割模型输出的置信度最高的一帧形状特征，利用该形状特征从原视频帧中抠出该前景目标的图像，便得到了该前景目标的快照，将快照进行离散余弦变换，量化与熵编码，得到前景目标快照压缩数据。提取快照的目的是保存该前景目标的细节特征，例如颜色纹理等。After processing all the video frames, use the method based on the IOU threshold of the target detection frame to perform inter-frame target matching to obtain the corresponding relationship between foreground targets between frames, and then extract a snapshot for each foreground target. The algorithm steps for extracting snapshots are as follows: For each multi-frame shape feature of the foreground object, only retain the shape feature of the frame with the highest confidence output by the instance segmentation model, and use the shape feature to extract the image of the foreground object from the original video frame , the snapshot of the foreground object is obtained, and the snapshot is subjected to discrete cosine transform, quantization and entropy coding to obtain the compressed data of the foreground object snapshot. The purpose of extracting snapshots is to preserve the detailed features of the foreground object, such as color texture and so on.

最后将前景目标抽象特征、快照压缩数据与背景压缩数据进行压缩打包，得到视频压缩数据。视频压缩完成。Finally, the abstract feature of the foreground object, the snapshot compressed data and the background compressed data are compressed and packaged to obtain the video compressed data. Video compression is complete.

在编码器中，背景建模模块将原视频的背景编码为单张压缩图像，从而实现背景冗余信息的去除；前景目标提取模块通过提取前景目标抽象特征与快照，对原视频中每个前景目标只保存多帧抽象特征与单帧快照压缩数据，从而实现前景冗余信息的去除。相比传统视频压缩编码，本发明的编码方式大大减少了需要保存的数据容量，从而实现了超限压缩。In the encoder, the background modeling module encodes the background of the original video into a single compressed image, thereby realizing the removal of background redundant information; the foreground target extraction module extracts the abstract features and snapshots of the foreground target, The goal is to save only multi-frame abstract features and single-frame snapshot compressed data, so as to achieve the removal of foreground redundant information. Compared with traditional video compression coding, the coding method of the present invention greatly reduces the data capacity to be saved, thereby realizing over-limit compression.

2.视频预解码2. Video pre-decoding

当用户需要观看视频时，首先进行视频预解码。将编码器最后压缩打包的视频压缩数据解压，恢复出前景目标抽象特征、前景目标快照与视频背景图像。When a user needs to watch a video, first perform video pre-decoding. Decompress the video compression data compressed and packaged by the encoder at the end, and restore the abstract features of the foreground object, the snapshot of the foreground object and the video background image.

3.视频解码。3. Video decoding.

本发明的解码器由基于生成对抗网络架构的卷积神经网络模型组成，其中包含了生成器与判别器。生成器的输入为前景目标快照与前景目标抽象特征，输出为前景目标解码图像；判别器负责在生成器训练时辅助生成器提高生成图像的质量，输入为生成器生成的前景目标解码图像与真实视频帧中的前景目标图像，输出为介于0到1的数值，代表判别器判断输入图像可能是生成图像(0)或者真实图像(1)。The decoder of the present invention is composed of a convolutional neural network model based on a generated confrontation network architecture, which includes a generator and a discriminator. The input of the generator is the snapshot of the foreground target and the abstract features of the foreground target, and the output is the decoded image of the foreground target; the discriminator is responsible for assisting the generator to improve the quality of the generated image during the training of the generator, and the input is the decoded image of the foreground target generated by the generator and the real The foreground target image in the video frame is output as a value between 0 and 1, which means that the discriminator judges that the input image may be a generated image (0) or a real image (1).

(1)解码器训练过程(1) Decoder training process

训练过程的目标函数为：L＝L_GAN+L_L1+L_VGG The objective function of the training process is: L=L _GAN +L _L1 +L _VGG

其中：in:

为生成对抗损失，I_S与I_t分别为前景目标快照与需要生成的真实前景目标图像，R_S与R_t为根据I_S图像与I_t图像的关键点生成的响应图，以便输入到生成器中。为生成器生成的前景目标解码图像，z 为随机噪声。To generate the adversarial loss, _IS and _It are the foreground target snapshot and the real foreground target image to be generated respectively, _RS and R _t are the response maps generated according to the key points of the _IS image and _It image, so as to be input to the generation device. Decoded image for the foreground object generated by the generator, z is random noise.

其中：in:

为L1损失，计算生成器生成图像与真实图像的最小绝对误差。For the L1 loss, the minimum absolute error between the generator-generated image and the real image is computed.

其中：in:

为感知损失，通过将生成器生成的前景目标解码图像与真实前景目标图像输入到公开的VGG预训练网络模型，计算两者在深层特征图的最小平方差。For the perceptual loss, by inputting the decoded image of the foreground object generated by the generator and the real foreground object image into the public VGG pre-trained network model, the least square difference between the two in the deep feature map is calculated.

训练结束后，解码器只需保留生成器。After training, the decoder only needs to keep the generator.

(2)解码器解码过程(2) Decoder decoding process

读取每个前景目标的多帧抽象特征与快照，送入解码器中的生成器。生成器模型从多帧抽象特征获取目标的姿态、骨架等信息，从快照中获取目标的颜色、纹理等信息，将以上信息融合处理，生成前景目标解码图像。The multi-frame abstract features and snapshots of each foreground object are read and fed to the generator in the decoder. The generator model obtains information such as the pose and skeleton of the target from the abstract features of multiple frames, and obtains information such as the color and texture of the target from the snapshot, and fuses the above information to generate a decoded image of the foreground target.

读取视频背景图像，将所有生成的前景目标解码图像与背景图像融合，得到重建视频帧。将所有重建视频帧合并重构，得到解码视频。The video background image is read, and all generated foreground target decoded images are fused with the background image to obtain the reconstructed video frame. Merge and reconstruct all the reconstructed video frames to obtain the decoded video.

(三)有益效果(3) Beneficial effects

与现有技术相比，本发明提供了一种基于抽象特征的固定场景视频超限压缩与解码方法，具备以下有益效果：该基于抽象特征的固定场景视频超限压缩与解码方法，针对固定场景视频具有很高的压缩比，极大节约了存储空间资源。实验证明，针对不同长度、出现目标数量不同的固定场景视频，本方法存储的压缩数据容量仅为使用H264编码视频的1/40至1/3，实现了超越传统视频压缩编码的高压缩比。本发明可以应用于各类智能监控系统，显著延长监控视频的存储时长，并且在压缩过程中提取的目标抽象特征，可以用于异常行为检测、交通流量监测等。Compared with the prior art, the present invention provides a fixed-scene video overcompression and decoding method based on abstract features, which has the following beneficial effects: the abstract feature-based fixed-scene video over-limit compression and decoding method is aimed at fixed scene Video has a high compression ratio, which greatly saves storage space resources. Experiments have proved that for fixed-scene videos with different lengths and different numbers of targets, the compressed data capacity stored by this method is only 1/40 to 1/3 of the video encoded by H264, achieving a high compression ratio beyond traditional video compression coding. The invention can be applied to various intelligent monitoring systems, significantly prolonging the storage time of monitoring videos, and the abstract features of objects extracted during the compression process can be used for abnormal behavior detection, traffic flow monitoring and the like.

附图说明Description of drawings

图1为本发明提出的基于抽象特征的固定场景视频超限压缩与解码方法框架图。FIG. 1 is a frame diagram of an abstract feature-based fixed-scene video overcompression and decoding method proposed by the present invention.

具体实施方式Detailed ways

下面将结合本发明实施例中的附图，对本发明实施例中的技术方案进行清楚、完整地描述，显然，所描述的实施例仅仅是本发明一部分实施例，而不是全部的实施例。基于本发明中的实施例，本领域普通技术人员在没有做出创造性劳动前提下所获得的所有其他实施例，都属于本发明保护的范围。The following will clearly and completely describe the technical solutions in the embodiments of the present invention with reference to the accompanying drawings in the embodiments of the present invention. Obviously, the described embodiments are only some, not all, embodiments of the present invention. Based on the embodiments of the present invention, all other embodiments obtained by persons of ordinary skill in the art without making creative efforts belong to the protection scope of the present invention.

本发明提出的固定场景视频超限压缩与解码方法的整体结构如图1所示，主要由两部分组成，编码器与解码器。压缩时，将原视频输入视频编码器后得到视频压缩数据；解码时首先将视频压缩数据进行预解码，然后输入解码器后生成解码视频。The overall structure of the fixed scene video overcompression and decoding method proposed by the present invention is shown in FIG. 1 , which mainly consists of two parts, an encoder and a decoder. When compressing, the original video is input to the video encoder to obtain video compressed data; when decoding, the video compressed data is firstly pre-decoded, and then input to the decoder to generate decoded video.

1.视频压缩步骤1. Video compression steps

步骤1)卷积神经网络初始化，将实例分割模型，关键点检测模型加载到GPU中。Step 1) The convolutional neural network is initialized, and the instance segmentation model and the key point detection model are loaded into the GPU.

步骤2)混合高斯背景模型初始化。Step 2) Mixed Gaussian background model initialization.

步骤3)读取原视频的第i帧。Step 3) read the i-th frame of the original video.

步骤4)将视频帧输入到混合高斯背景模型中，进行匹配与模型权值更新，得到高斯背景建模结果bg_i。Step 4) Input the video frame into the mixed Gaussian background model, perform matching and model weight update, and obtain the Gaussian background modeling result bg _i .

步骤5)使用实例分割模型对当前视频帧进行实例分割，得到m个前景目标的实例分割结果S_i＝{box_j，mask_j|j＝1,2,…,m}；其中box_j为视频帧中第j个前景目标的矩形检测框(x_min,y_min,x_max,y_max)，mask_j为视频帧中第j个前景目标的掩膜，掩膜是一个长宽与视频帧相等的二值图像，对应目标出现的区域为1，其他区域为0。后续步骤中，前景目标检测框等同于本步骤中的box_j，前景目标的形状特征等同于本步骤中的m ask_j，前景目标时空信息等同于当前帧序号i加上本步骤中的box_j(i，x _min,y_min,x_max,y_max)。Step 5) Use the instance segmentation model to perform instance segmentation on the current video frame, and obtain the instance segmentation results of m foreground objects S _i ={box _j , mask _j |j=1,2,...,m}; where box _j is the video The rectangular detection frame (x_min, y_min, x_max, y_max) of the jth foreground object in the frame, mask _j is the mask of the jth foreground object in the video frame, and the mask is a binary image whose length and width are equal to the video frame , the area corresponding to the target appearance is 1, and the other areas are 0. In the subsequent steps, the detection frame of the foreground object is equivalent to box _j in this step, the shape feature of the foreground object is equivalent to mask _j in this step, and the spatiotemporal information of the foreground object is equal to the current frame number i plus box _j in this step (i, x_min, y_min, x_max, y_max).

步骤6)使用关键点检测模型对每个检出的前景目标进行关键点检测，得到前景目标的关键点坐标(x0,y0,x1,y1...)。Step 6) Use the key point detection model to perform key point detection on each detected foreground object, and obtain the key point coordinates (x0, y0, x1, y1...) of the foreground object.

步骤7)重复执行步骤3至步骤6，直至处理完所有视频帧。Step 7) Repeat steps 3 to 6 until all video frames are processed.

步骤8)读取步骤5)中实例分割模型检出的前景目标时空信息(i，x_min,y_min,x_max,y_max)。使用基于目标检测框IOU阈值的方法进行帧间前景目标匹配，得到多个匹配列表，每个列表中包含每个前景目标的多帧时空信息，按照时间顺序排列。例如视频中出现了p个目标，这p个目标分别出现了q₁,q₂…q_p帧，则得到p个长度分别为为q₁,q₂…q_p的匹配列表，列表中每一项为该目标的在不同帧中的时空信息。Step 8) Read the spatio-temporal information (i, x_min, y_min, x_max, y_max) of the foreground object detected by the instance segmentation model in step 5). Using the method based on the IOU threshold of the target detection frame for inter-frame foreground target matching, multiple matching lists are obtained, and each list contains multi-frame spatio-temporal information of each foreground target, arranged in time order. For example, p targets appear in the video, and these p targets appear in q ₁ , q ₂ ... q _p frames respectively, then p matching lists with lengths q ₁ , q ₂ ... q _p respectively are obtained, and each Item is the spatio-temporal information of the target in different frames.

步骤9)对于每一个前景目标取快照，步骤为：根据步骤8)中每一个匹配列表的前景目标时空信息，读取每一个前景目标的多帧形状特征，然后只保留实例分割模型输出的置信度最高的一帧形状特征，使用该形状特征从原视频帧中抠出该前景目标的图像，便得到了该前景目标的快照。将快照进行离散余弦变换，量化与熵编码，得到前景目标快照压缩数据，使用该快照的时空信息作为文件名(i_s,x_min_s,y_min_s,x_max_s,y_max_s.jpg) 进行保存。Step 9) Take a snapshot for each foreground object, the steps are: according to the foreground object spatio-temporal information of each matching list in step 8), read the multi-frame shape features of each foreground object, and then only keep the confidence output of the instance segmentation model The shape feature of a frame with the highest degree is used, and the image of the foreground object is extracted from the original video frame by using the shape feature, and a snapshot of the foreground object is obtained. Perform discrete cosine transform, quantization, and entropy coding on the snapshot to obtain the compressed data of the foreground target snapshot, and use the spatiotemporal information of the snapshot as the file name (i _s , x_min _s , y_min _s , x_max _s , y_max _s .jpg) to save.

步骤10)将每个前景目标的快照文件名(i_s,x_min_s,y_min_s,x_max_s,y_ max_s.jpg)与多帧时空信息(i,x_min,y_min,x_max,y_max)、多帧关键点坐标(x0,y0,x1,y1...)合并，写入csv文件保存，为前景目标抽象特征文件。至此，对于每个前景目标，只保留了单帧快照+多帧目标抽象特征。Step 10) Combine the snapshot file name (i _s , x_min _s , y_min _s , x_max _s , y_ max _s .jpg) of each foreground target with the multi-frame spatio-temporal information (i, x_min, y_min, x_max, y_max), multi-frame The key point coordinates (x0, y0, x1, y1...) are merged and written into a csv file for saving, which is an abstract feature file of the foreground target. So far, for each foreground object, only single-frame snapshot + multi-frame object abstract features are preserved.

步骤11)将从步骤4)得到的每一帧的背景图像序列{bg_i|i＝ 1,2,…,n}求并集bg＝{bg₁ U bg₂ U bg₃ ··· U bg_n}，得到完整的视频背景图像，然后进行离散余弦变换，量化与熵编码，得到视频背景压缩数据。Step 11) Find the union bg={bg ₁ _U bg ₂ U bg ₃ U bg _n } to obtain a complete video background image, and then perform discrete cosine transform, quantization and entropy coding to obtain video background compressed data.

步骤12)将步骤9)得到的前景目标快照压缩数据，步骤10)得到的前景目标抽象特征文件以及步骤11)的得到的视频背景压缩数据作为整体进行压缩打包，得到视频压缩数据。Step 12) compress and package the foreground object snapshot compressed data obtained in step 9), the foreground object abstract feature file obtained in step 10) and the video background compressed data obtained in step 11) as a whole to obtain video compressed data.

2.视频解码步骤2. Video decoding steps

步骤1)预解码，将视频压缩数据解压，恢复出前景目标抽象特征，前景目标快照与视频背景图像。Step 1) pre-decoding, decompressing the compressed video data, recovering the abstract features of the foreground object, the snapshot of the foreground object and the video background image.

步骤2)卷积神经网络初始化，将训练好的生成器网络模型加载到GPU 中。Step 2) Initialize the convolutional neural network, and load the trained generator network model into the GPU.

步骤3)读取前景目标抽象特征文件，获取前景目标快照文件名和与其对应的多帧抽象特征。Step 3) Read the abstract feature file of the foreground object, and obtain the snapshot file name of the foreground object and its corresponding multi-frame abstract feature.

步骤4)将前景目标快照、前景目标快照的抽象特征和待生成前景目标解码图像的抽象特征输入到生成器模型中，生成前景目标解码图像。Step 4) Input the snapshot of the foreground object, the abstract features of the snapshot of the foreground object, and the abstract features of the decoded image of the foreground object to be generated into the generator model to generate the decoded image of the foreground object.

步骤5)重复执行步骤3至步骤4，直到前景目标抽象特征文件读取完毕，所有前景目标全部解码完成。Step 5) Repeat step 3 to step 4 until the abstract feature file of the foreground object is read and all the foreground objects are decoded.

步骤6)读取视频背景图像。Step 6) read the video background image.

步骤7)将每一帧的前景目标解码图像与视频背景图像融合，生成重建视频帧，直到视频所有帧均完成重建，将所有重建视频帧合并，得到解码视频。Step 7) Fusion the decoded image of the foreground target with the video background image of each frame to generate a reconstructed video frame until all frames of the video are reconstructed, and merge all the reconstructed video frames to obtain the decoded video.

需要说明的是，在本文中，诸如第一和第二等之类的关系术语仅仅用来将一个实体或者操作与另一个实体或操作区分开来，而不一定要求或者暗示这些实体或操作之间存在任何这种实际的关系或者顺序。而且，术语“包括”、“包含”或者其任何其他变体意在涵盖非排他性的包含，从而使得包括一系列要素的过程、方法、物品或者设备不仅包括那些要素，而且还包括没有明确列出的其他要素，或者是还包括为这种过程、方法、物品或者设备所固有的要素。It should be noted that in this article, relational terms such as first and second are only used to distinguish one entity or operation from another entity or operation, and do not necessarily require or imply that there is a relationship between these entities or operations. There is no such actual relationship or order between them. Furthermore, the term "comprises", "comprises" or any other variation thereof is intended to cover a non-exclusive inclusion such that a process, method, article, or apparatus comprising a set of elements includes not only those elements, but also includes elements not expressly listed. other elements of or also include elements inherent in such a process, method, article, or device.

尽管已经示出和描述了本发明的实施例，对于本领域普通技术人员而言，可以理解在不脱离本发明的原理和精神的情况下可以对这些实施例进行多种变化、修改、替换和变型，本发明的范围由所附权利要求及其等同物限定。Although the embodiments of the present invention have been shown and described, it will be understood by those skilled in the art that various changes, modifications, substitutions and variants, the scope of the invention is defined by the appended claims and their equivalents.

Claims

1. a fixed scene video overcompression and decoding method based on abstract features, is characterized in that comprising the following steps:

Use the encoder to compress the fixed scene video data;

Using a decoder to decode the compressed video data to obtain a decoded video;

In the encoder, a background modeling method is used to extract the background image of the original video, and then the extracted background image is compressed and encoded to obtain video background compressed data;

In the encoder, the foreground object extraction module including the object instance segmentation and key point detection algorithm is used to extract the features of the foreground object in the video frame to obtain the abstract feature of the foreground object, and the abstract feature of the foreground object includes the shape feature of the foreground object and key features;

The encoder utilizes the foreground target snapshot extraction algorithm to extract a snapshot of the foreground target obtained by the foreground target extraction module, compress and encode the snapshot, and obtain the compressed data of the foreground target snapshot;

The encoder extracts abstract features and snapshots of foreground objects, and only saves abstract features and snapshot compressed data for each foreground object in the video;

For the video background, only the video background compressed data is saved, and the abstract features of the foreground target, the snapshot compressed data and the background compressed data are compressed and packaged to obtain the video compressed data;

When decoding, the compressed video data is decompressed, and the abstract features of the foreground object, the snapshot of the foreground object and the video background image are recovered;

In the decoder, a deep learning model based on a Generative Adversarial Network is used for video decoding;

In the decoder, the abstract feature of the foreground target and the snapshot of the foreground target are input into the generator of the generative confrontation network, and the decoded image of the foreground target is reconstructed;

The decoder fuses the decoded image of the foreground target with the background image of each frame in the video to obtain a reconstructed video frame, and merges and reconstructs all the reconstructed video frames to obtain a decoded video.